Bio


1977 B.A., Chemistry and Biology, University of Rochester, NY
1978-1982 Ph.D. California Institute of Technology, CA Advisor: Dr. Norman Davidson
1982-1986 Postdoctoral Research Stanford University School of Medicine, CA Advisor: Dr. Ronald Davis
1986-2009 Faculty Dept of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT
2009-present Dept of Genetics, Stanford University School of Medicine, Stanford, CA

Academic Appointments


Administrative Appointments


  • Director, Center for Genomics and Personalized Medicine (2009 - Present)
  • Chair, Dept. of Genetics (2009 - Present)

Current Research and Scholarly Interests


We are presently in an omics revolution in which genomes and other omes can be readily characterized. Our laboratory uses a variety of approaches to analyze genomes and regulatory networks. Our research focuses on yeast, a model organism ideally suited to genetic analysis, and on humans.

1) Transcriptomes
We developed RNA sequencing to annotate the yeast and human transcriptomes. We discovered that the eukaryotic transcriptome is much more complex than previously appreciated and that embryonic stem cells have more transcript isoforms than differentiated cells.

2) Transcription Factor Binding Networks
We have also developed methods for mapping transcription factor binding sites throughout the genome. We have used these methods to build regulatory maps and to help decipher the combinatorial regulatory code: which factors work together to regulate which genes. Using this approach, we have mapped pathways crucial for metabolism and inflammation.

3) Integrated Regulatory Networks
In addition to transcription factor binding networks, we have also been mapping phosphorylation and metabolite-protein interaction networks. These studies have revealed novel global regulators and key points in integrated regulatory networks.

4) Variation
We have been analyzing differences between individuals and species at two levels: DNA sequence variation and variation in regulatory information. We developed paired-end sequencing for humans and found that humans have extensive structural variation (SV), i.e., deletions, insertions and inversions. This is likely to be a major cause of phenotypic variation and human disease. In addition, by mapping binding-site differences among different yeast strains and humans, we have found that individuals differ much more in their regulatory information than in their coding sequences. We can correlate these differences with SNPs and SVs, thereby associating noncoding DNA differences with regulatory information.

5) Human Disease
Finally, we are applying omics approaches, including genome sequencing, transcriptomics, proteomics, metabolomics, DNA methylation and microbiome assays, to the analysis of human disease. These integrative omics approaches are being applied to help understand the molecular basis of disease and to support the development of diagnostics and therapeutics.

Clinical Trials


  • Understanding and Diagnosing Allergic Disease in Twins (Recruiting)

    The purpose of this study is to gain better understanding of how the immune system works in twins with and without allergic disease. Healthy volunteers are not specifically targeted. Healthy non-allergic study participants may be found through the course of evaluation for the presence of allergies.


Journal Articles


  • Whole-exome sequencing identifies tetratricopeptide repeat domain 7A (TTC7A) mutations for combined immunodeficiency with intestinal atresias JOURNAL OF ALLERGY AND CLINICAL IMMUNOLOGY Chen, R., Giliani, S., Lanzi, G., Mias, G. I., Lonardi, S., Dobbs, K., Manis, J., Im, H., Gallagher, J. E., Phanstiel, D. H., Euskirchen, G., Lacroute, P., Bettinger, K., Moratto, D., Weinacht, K., Montin, D., Gallo, E., Mangili, G., Porta, F., Notarangelo, L. D., Pedretti, S., Al-Herz, W., Alfahdli, W., Comeau, A. M., Traister, R. S., Pai, S., Carella, G., Facchetti, F., Nadeau, K. C., Snyder, M., Notarangelo, L. D. 2013; 132 (3): 656-?

    Abstract

    Combined immunodeficiency with multiple intestinal atresias (CID-MIA) is a rare hereditary disease characterized by intestinal obstructions and profound immune defects. We sought to determine the underlying genetic causes of CID-MIA by analyzing the exomic sequences of 5 patients and their healthy direct relatives from 5 unrelated families. We performed whole-exome sequencing on 5 patients with CID-MIA and 10 healthy direct family members belonging to 5 unrelated families with CID-MIA. We also performed targeted Sanger sequencing for the candidate gene tetratricopeptide repeat domain 7A (TTC7A) on 3 additional patients with CID-MIA. Through analysis and comparison of the exomic sequence of the subjects from these 5 families, we identified biallelic damaging mutations in the TTC7A gene, for a total of 7 distinct mutations. Targeted TTC7A gene sequencing in 3 additional unrelated patients with CID-MIA revealed biallelic deleterious mutations in 2 of them, as well as an aberrant splice product in the third patient. Staining of normal thymus showed that the TTC7A protein is expressed in thymic epithelial cells, as well as in thymocytes. Moreover, severe lymphoid depletion was observed in the thymus and peripheral lymphoid tissues from 2 patients with CID-MIA. We identified deleterious mutations of the TTC7A gene in 8 unrelated patients with CID-MIA and demonstrated that the TTC7A protein is expressed in the thymus. Our results strongly suggest that TTC7A gene defects cause CID-MIA.

    View details for DOI 10.1016/j.jaci.2013.06.013

    View details for Web of Science ID 000323612000018

    View details for PubMedID 23830146

  • Systematic functional regulatory assessment of disease-associated variants. Proceedings of the National Academy of Sciences of the United States of America Karczewski, K. J., Dudley, J. T., Kukurba, K. R., Chen, R., Butte, A. J., Montgomery, S. B., Snyder, M. 2013; 110 (23): 9607-9612

    Abstract

    Genome-wide association studies have discovered many genetic loci associated with disease traits, but the functional molecular basis of these associations is often unresolved. Genome-wide regulatory and gene expression profiles measured across individuals and diseases reflect downstream effects of genetic variation and may allow for functional assessment of disease-associated loci. Here, we present a unique approach for systematic integration of genetic disease associations, transcription factor binding among individuals, and gene expression data to assess the functional consequences of variants associated with hundreds of human diseases. In an analysis of genome-wide binding profiles of NF-κB, we find that disease-associated SNPs are enriched in NF-κB binding regions overall, and specifically for inflammatory-mediated diseases, such as asthma, rheumatoid arthritis, and coronary artery disease. Using genome-wide variation in transcription factor-binding data, we find that NF-κB binding is often correlated with disease-associated variants in a genotype-specific and allele-specific manner. Furthermore, we show that this binding variation is often related to expression of nearby genes, which are also found to have altered expression in independent profiling of the variant-associated disease condition. Thus, using this integrative approach, we provide a unique means to assign putative function to many disease-associated SNPs.

    View details for DOI 10.1073/pnas.1219099110

    View details for PubMedID 23690573

  • Extensive genetic variation in somatic human tissues PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA O'Huallachain, M., Karczewski, K. J., Weissman, S. M., Urban, A. E., Snyder, M. P. 2012; 109 (44): 18018-18023

    Abstract

    Genetic variation between individuals has been extensively investigated, but differences between tissues within individuals are far less understood. It is commonly assumed that all healthy cells that arise from the same zygote possess the same genomic content, with a few known exceptions in the immune system and germ line. However, a growing body of evidence shows that genomic variation exists between differentiated tissues. We investigated the scope of somatic genomic variation between tissues within humans. Analysis of copy number variation by high-resolution array-comparative genomic hybridization in diverse tissues from six unrelated subjects reveals a significant number of intraindividual genomic changes between tissues. Many (79%) of these events affect genes. Our results have important consequences for understanding normal genetic and phenotypic variation within individuals, and they have significant implications for both the etiology of genetic diseases such as cancer and for immortalized cell lines that might be used in research and therapeutics.

    View details for DOI 10.1073/pnas.1213736109

    View details for Web of Science ID 000311149900070

    View details for PubMedID 23043118

  • An integrated encyclopedia of DNA elements in the human genome NATURE Dunham, I., Kundaje, A., Aldred, S. F., Collins, P. J., Davis, C., Doyle, F., Epstein, C. B., Frietze, S., Harrow, J., Kaul, R., Khatun, J., Lajoie, B. R., Landt, S. G., Lee, B., Pauli, F., Rosenbloom, K. R., Sabo, P., Safi, A., Sanyal, A., Shoresh, N., Simon, J. M., Song, L., Trinklein, N. D., Altshuler, R. C., Birney, E., Brown, J. B., Cheng, C., Djebali, S., Dong, X., Dunham, I., Ernst, J., Furey, T. S., Gerstein, M., Giardine, B., Greven, M., Hardison, R. C., Harris, R. S., Herrero, J., Hoffman, M. M., Iyer, S., Kellis, M., Khatun, J., Kheradpour, P., Kundaje, A., Lassmann, T., Li, Q., Lin, X., Marinov, G. K., Merkel, A., Mortazavi, A., Parker, S. C., Reddy, T. E., Rozowsky, J., Schlesinger, F., Thurman, R. E., Wang, J., Ward, L. D., Whitfield, T. W., Wilder, S. P., Wu, W., Xi, H. S., Yip, K. Y., Zhuang, J., Bernstein, B. E., Birney, E., Dunham, I., Green, E. D., Gunter, C., Snyder, M., Pazin, M. J., Lowdon, R. F., Dillon, L. A., Adams, L. B., Kelly, C. J., Zhang, J., Wexler, J. R., Green, E. D., Good, P. J., Feingold, E. A., Bernstein, B. E., Birney, E., Crawford, G. E., Dekker, J., Elnitski, L., Farnham, P. J., Gerstein, M., Giddings, M. C., Gingeras, T. R., Green, E. D., Guigo, R., Hardison, R. C., Hubbard, T. J., Kellis, M., Kent, W. J., Lieb, J. D., Margulies, E. H., Myers, R. M., Snyder, M., Stamatoyannopoulos, J. A., Tenenbaum, S. A., Weng, Z., White, K. P., Wold, B., Khatun, J., Yu, Y., Wrobel, J., Risk, B. A., Gunawardena, H. P., Kuiper, H. C., Maier, C. W., Xie, L., Chen, X., Giddings, M. C., Bernstein, B. E., Epstein, C. B., Shoresh, N., Ernst, J., Kheradpour, P., Mikkelsen, T. S., Gillespie, S., Goren, A., Ram, O., Zhang, X., Wang, L., Issner, R., Coyne, M. J., Durham, T., Ku, M., Truong, T., Ward, L. D., Altshuler, R. C., Eaton, M. L., Kellis, M., Djebali, S., Davis, C. 
A., Merkel, A., Dobin, A., Lassmann, T., Mortazavi, A., Tanzer, A., Lagarde, J., Lin, W., Schlesinger, F., Xue, C., Marinov, G. K., Khatun, J., Williams, B. A., Zaleski, C., Rozowsky, J., Roeder, M., Kokocinski, F., Abdelhamid, R. F., Alioto, T., Antoshechkin, I., Baer, M. T., Batut, P., Bell, I., Bell, K., Chakrabortty, S., Chen, X., Chrast, J., Curado, J., Derrien, T., Drenkow, J., Dumais, E., Dumais, J., Duttagupta, R., Fastuca, M., Fejes-Toth, K., Ferreira, P., Foissac, S., Fullwood, M. J., Gao, H., Gonzalez, D., Gordon, A., Gunawardena, H. P., Howald, C., Jha, S., Johnson, R., Kapranov, P., King, B., Kingswood, C., Li, G., Luo, O. J., Park, E., Preall, J. B., Presaud, K., Ribeca, P., Risk, B. A., Robyr, D., Ruan, X., Sammeth, M., Sandhu, K. S., Schaeffer, L., See, L., Shahab, A., Skancke, J., Suzuki, A. M., Takahashi, H., Tilgner, H., Trout, D., Walters, N., Wang, H., Wrobel, J., Yu, Y., Hayashizaki, Y., Harrow, J., Gerstein, M., Hubbard, T. J., Reymond, A., Antonarakis, S. E., Hannon, G. J., Giddings, M. C., Ruan, Y., Wold, B., Carninci, P., Guigo, R., Gingeras, T. R., Rosenbloom, K. R., Sloan, C. A., Learned, K., Malladi, V. S., Wong, M. C., Barber, G., Cline, M. S., Dreszer, T. R., Heitner, S. G., Karolchik, D., Kent, W. J., Kirkup, V. M., Meyer, L. R., Long, J. C., Maddren, M., Raney, B. J., Furey, T. S., Song, L., Grasfeder, L. L., Giresi, P. G., Lee, B., Battenhouse, A., Sheffield, N. C., Simon, J. M., Showers, K. A., Safi, A., London, D., Bhinge, A. A., Shestak, C., Schaner, M. R., Kim, S. K., Zhang, Z. Z., Mieczkowski, P. A., Mieczkowska, J. O., Liu, Z., McDaniell, R. M., Ni, Y., Rashid, N. U., Kim, M. J., Adar, S., Zhang, Z., Wang, T., Winter, D., Keefe, D., Birney, E., Iyer, V. R., Lieb, J. D., Crawford, G. E., Li, G., Sandhu, K. S., Zheng, M., Wang, P., Luo, O. J., Shahab, A., Fullwood, M. J., Ruan, X., Ruan, Y., Myers, R. M., Pauli, F., Williams, B. A., Gertz, J., Marinov, G. K., Reddy, T. E., Vielmetter, J., Partridge, E. C., Trout, D., Varley, K. 
E., Gasper, C., Bansal, A., Pepke, S., Jain, P., Amrhein, H., Bowling, K. M., Anaya, M., Cross, M. K., King, B., Muratet, M. A., Antoshechkin, I., Newberry, K. M., McCue, K., Nesmith, A. S., Fisher-Aylor, K. I., Pusey, B., DeSalvo, G., Parker, S. L., Balasubramanian, S., Davis, N. S., Meadows, S. K., Eggleston, T., Gunter, C., Newberry, J. S., Levy, S. E., Absher, D. M., Mortazavi, A., Wong, W. H., Wold, B., Blow, M. J., Visel, A., Pennachio, L. A., Elnitski, L., Margulies, E. H., Parker, S. C., Petrykowska, H. M., Abyzov, A., Aken, B., Barrell, D., Barson, G., Berry, A., Bignell, A., Boychenko, V., Bussotti, G., Chrast, J., Davidson, C., Derrien, T., Despacio-Reyes, G., Diekhans, M., Ezkurdia, I., Frankish, A., Gilbert, J., Gonzalez, J. M., Griffiths, E., Harte, R., Hendrix, D. A., Howald, C., Hunt, T., Jungreis, I., Kay, M., Khurana, E., Kokocinski, F., Leng, J., Lin, M. F., Loveland, J., Lu, Z., Manthravadi, D., Mariotti, M., Mudge, J., Mukherjee, G., Notredame, C., Pei, B., Rodriguez, J. M., Saunders, G., Sboner, A., Searle, S., Sisu, C., Snow, C., Steward, C., Tanzer, A., Tapanari, E., Tress, M. L., van Baren, M. J., Walters, N., Washietl, S., Wilming, L., Zadissa, A., Zhang, Z., Brent, M., Haussler, D., Kellis, M., Valencia, A., Gerstein, M., Reymond, A., Guigo, R., Harrow, J., Hubbard, T. J., Landt, S. G., Frietze, S., Abyzov, A., Addleman, N., Alexander, R. P., Auerbach, R. K., Balasubramanian, S., Bettinger, K., Bhardwaj, N., Boyle, A. P., Cao, A. R., Cayting, P., Charos, A., Cheng, Y., Cheng, C., Eastman, C., Euskirchen, G., Fleming, J. D., Grubert, F., Habegger, L., Hariharan, M., Harmanci, A., Iyengar, S., Jin, V. X., Karczewski, K. J., Kasowski, M., Lacroute, P., Lam, H., Lamarre-Vincent, N., Leng, J., Lian, J., Lindahl-Allen, M., Min, R., Miotto, B., Monahan, H., Moqtaderi, Z., Mu, X. 
J., O'Geen, H., Ouyang, Z., Patacsil, D., Pei, B., Raha, D., Ramirez, L., Reed, B., Rozowsky, J., Sboner, A., Shi, M., Sisu, C., Slifer, T., Witt, H., Wu, L., Xu, X., Yan, K., Yang, X., Yip, K. Y., Zhang, Z., Struhl, K., Weissman, S. M., Gerstein, M., Farnham, P. J., Snyder, M., Tenenbaum, S. A., Penalva, L. O., Doyle, F., Karmakar, S., Landt, S. G., Bhanvadia, R. R., Choudhury, A., Domanus, M., Ma, L., Moran, J., Patacsil, D., Slifer, T., Victorsen, A., Yang, X., Snyder, M., White, K. P., Auer, T., Centanin, L., Eichenlaub, M., Gruhl, F., Heermann, S., Hoeckendorf, B., Inoue, D., Kellner, T., Kirchmaier, S., Mueller, C., Reinhardt, R., Schertel, L., Schneider, S., Sinn, R., Wittbrodt, B., Wittbrodt, J., Weng, Z., Whitfield, T. W., Wang, J., Collins, P. J., Aldred, S. F., Trinklein, N. D., Partridge, E. C., Myers, R. M., Dekker, J., Jain, G., Lajoie, B. R., Sanyal, A., Balasundaram, G., Bates, D. L., Byron, R., Canfield, T. K., Diegel, M. J., Dunn, D., Ebersol, A. K., Frum, T., Garg, K., Gist, E., Hansen, R. S., Boatman, L., Haugen, E., Humbert, R., Jain, G., Johnson, A. K., Johnson, E. M., Kutyavin, T. V., Lajoie, B. R., Lee, K., Lotakis, D., Maurano, M. T., Neph, S. J., Neri, F. V., Nguyen, E. D., Qu, H., Reynolds, A. P., Roach, V., Rynes, E., Sabo, P., Sanchez, M. E., Sandstrom, R. S., Sanyal, A., Shafer, A. O., Stergachis, A. B., Thomas, S., Thurman, R. E., Vernot, B., Vierstra, J., Vong, S., Wang, H., Weaver, M. A., Yan, Y., Zhang, M., Akey, J. M., Bender, M., Dorschner, M. O., Groudine, M., MacCoss, M. J., Navas, P., Stamatoyannopoulos, G., Kaul, R., Dekker, J., Stamatoyannopoulos, J. A., Dunham, I., Beal, K., Brazma, A., Flicek, P., Herrero, J., Johnson, N., Keefe, D., Lukk, M., Luscombe, N. M., Sobral, D., Vaquerizas, J. M., Wilder, S. P., Batzoglou, S., Sidow, A., Hussami, N., Kyriazopoulou-Panagiotopoulou, S., Libbrecht, M. W., Schaub, M. A., Kundaje, A., Hardison, R. C., Miller, W., Giardine, B., Harris, R. S., Wu, W., Bickel, P. 
J., Banfai, B., Boley, N. P., Brown, J. B., Huang, H., Li, Q., Li, J. J., Noble, W. S., Bilmes, J. A., Buske, O. J., Hoffman, M. M., Sahu, A. D., Kharchenko, P. V., Park, P. J., Baker, D., Taylor, J., Weng, Z., Iyer, S., Dong, X., Greven, M., Lin, X., Wang, J., Xi, H. S., Zhuang, J., Gerstein, M., Alexander, R. P., Balasubramanian, S., Cheng, C., Harmanci, A., Lochovsky, L., Min, R., Mu, X. J., Rozowsky, J., Yan, K., Yip, K. Y., Birney, E. 2012; 489 (7414): 57-74

    Abstract

    The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

    View details for DOI 10.1038/nature11247

    View details for Web of Science ID 000308347000039

    View details for PubMedID 22955616

  • Architecture of the human regulatory network derived from ENCODE data NATURE Gerstein, M. B., Kundaje, A., Hariharan, M., Landt, S. G., Yan, K., Cheng, C., Mu, X. J., Khurana, E., Rozowsky, J., Alexander, R., Min, R., Alves, P., Abyzov, A., Addleman, N., Bhardwaj, N., Boyle, A. P., Cayting, P., Charos, A., Chen, D. Z., Cheng, Y., Clarke, D., Eastman, C., Euskirchen, G., Frietze, S., Fu, Y., Gertz, J., Grubert, F., Harmanci, A., Jain, P., Kasowski, M., Lacroute, P., Leng, J., Lian, J., Monahan, H., O'Geen, H., Ouyang, Z., Partridge, E. C., Patacsil, D., Pauli, F., Raha, D., Ramirez, L., Reddy, T. E., Reed, B., Shi, M., Slifer, T., Wang, J., Wu, L., Yang, X., Yip, K. Y., Zilberman-Schapira, G., Batzoglou, S., Sidow, A., Farnham, P. J., Myers, R. M., Weissman, S. M., Snyder, M. 2012; 489 (7414): 91-100

    Abstract

    Transcription factors bind in a combinatorial fashion to specify the on-and-off states of genes; the ensemble of these binding events forms a regulatory network, constituting the wiring diagram for a cell. To examine the principles of the human transcriptional regulatory network, we determined the genomic binding information of 119 transcription-related factors in over 450 distinct experiments. We found the combinatorial, co-association of transcription factors to be highly context specific: distinct combinations of factors bind at specific genomic locations. In particular, there are significant differences in the binding proximal and distal to genes. We organized all the transcription factor binding into a hierarchy and integrated it with other genomic information (for example, microRNA regulation), forming a dense meta-network. Factors at different levels have different properties; for instance, top-level transcription factors more strongly influence expression and middle-level ones co-regulate targets to mitigate information-flow bottlenecks. Moreover, these co-regulations give rise to many enriched network motifs (for example, noise-buffering feed-forward loops). Finally, more connected network components are under stronger selection and exhibit a greater degree of allele-specific activity (that is, differential binding to the two parental alleles). The regulatory information obtained in this study will be crucial for interpreting personal genome sequences and understanding basic principles of human biology and disease.

    View details for DOI 10.1038/nature11245

    View details for Web of Science ID 000308347000042

    View details for PubMedID 22955619

  • Linking disease associations with regulatory information in the human genome GENOME RESEARCH Schaub, M. A., Boyle, A. P., Kundaje, A., Batzoglou, S., Snyder, M. 2012; 22 (9): 1748-1759

    Abstract

    Genome-wide association studies have been successful in identifying single nucleotide polymorphisms (SNPs) associated with a large number of phenotypes. However, an associated SNP is likely part of a larger region of linkage disequilibrium. This makes it difficult to precisely identify the SNPs that have a biological link with the phenotype. We have systematically investigated the association of multiple types of ENCODE data with disease-associated SNPs and show that there is significant enrichment for functional SNPs among the currently identified associations. This enrichment is strongest when integrating multiple sources of functional information and when highest confidence disease-associated SNPs are used. We propose an approach that integrates multiple types of functional data generated by the ENCODE Consortium to help identify "functional SNPs" that may be associated with the disease phenotype. Our approach generates putative functional annotations for up to 80% of all previously reported associations. We show that for most associations, the functional SNP most strongly supported by experimental evidence is a SNP in linkage disequilibrium with the reported association rather than the reported SNP itself. Our results show that the experimental data sets generated by the ENCODE Consortium can be successfully used to suggest functional hypotheses for variants associated with diseases and other phenotypes.

    View details for DOI 10.1101/gr.136127.111

    View details for Web of Science ID 000308272800016

    View details for PubMedID 22955986

  • Annotation of functional variation in personal genomes using RegulomeDB GENOME RESEARCH Boyle, A. P., Hong, E. L., Hariharan, M., Cheng, Y., Schaub, M. A., Kasowski, M., Karczewski, K. J., Park, J., Hitz, B. C., Weng, S., Cherry, J. M., Snyder, M. 2012; 22 (9): 1790-1797

    Abstract

    As the sequencing of healthy and disease genomes becomes more commonplace, detailed annotation provides interpretation for individual variation responsible for normal and disease phenotypes. Current approaches focus on direct changes in protein coding genes, particularly nonsynonymous mutations that directly affect the gene product. However, most individual variation occurs outside of genes and, indeed, most markers generated from genome-wide association studies (GWAS) identify variants outside of coding segments. Identification of potential regulatory changes that perturb these sites will lead to a better localization of truly functional variants and interpretation of their effects. We have developed a novel approach and database, RegulomeDB, which guides interpretation of regulatory variants in the human genome. RegulomeDB includes high-throughput, experimental data sets from ENCODE and other sources, as well as computational predictions and manual annotations to identify putative regulatory potential and identify functional variants. These data sources are combined into a powerful tool that scores variants to help separate functional variants from a large pool and provides a small set of putative sites with testable hypotheses as to their function. We demonstrate the applicability of this tool to the annotation of noncoding variants from 69 full sequenced genomes as well as that of a personal genome, where thousands of functionally associated variants were identified. Moreover, we demonstrate a GWAS where the database is able to quickly identify the known associated functional variant and provide a hypothesis as to its function. Overall, we expect this approach and resource to be valuable for the annotation of human genome sequences.

    View details for DOI 10.1101/gr.137323.112

    View details for Web of Science ID 000308272800019

    View details for PubMedID 22955989

  • ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia GENOME RESEARCH Landt, S. G., Marinov, G. K., Kundaje, A., Kheradpour, P., Pauli, F., Batzoglou, S., Bernstein, B. E., Bickel, P., Brown, J. B., Cayting, P., Chen, Y., DeSalvo, G., Epstein, C., Fisher-Aylor, K. I., Euskirchen, G., Gerstein, M., Gertz, J., Hartemink, A. J., Hoffman, M. M., Iyer, V. R., Jung, Y. L., Karmakar, S., Kellis, M., Kharchenko, P. V., Li, Q., Liu, T., Liu, X. S., Ma, L., Milosavljevic, A., Myers, R. M., Park, P. J., Pazin, M. J., Perry, M. D., Raha, D., Reddy, T. E., Rozowsky, J., Shoresh, N., Sidow, A., Slattery, M., Stamatoyannopoulos, J. A., Tolstorukov, M. Y., White, K. P., Xi, S., Farnham, P. J., Lieb, J. D., Wold, B. J., Snyder, M. 2012; 22 (9): 1813-1831

    Abstract

    Chromatin immunoprecipitation (ChIP) followed by high-throughput DNA sequencing (ChIP-seq) has become a valuable and widely used approach for mapping the genomic location of transcription-factor binding and histone modifications in living cells. Despite its widespread use, there are considerable differences in how these experiments are conducted, how the results are scored and evaluated for quality, and how the data and metadata are archived for public use. These practices affect the quality and utility of any global ChIP experiment. Through our experience in performing ChIP-seq experiments, the ENCODE and modENCODE consortia have developed a set of working standards and guidelines for ChIP experiments that are updated routinely. The current guidelines address antibody validation, experimental replication, sequencing depth, data and metadata reporting, and data quality assessment. We discuss how ChIP quality, assessed in these ways, affects different uses of ChIP-seq data. All data sets used in the analysis have been deposited for public viewing and downloading at the ENCODE (http://encodeproject.org/ENCODE/) and modENCODE (http://www.modencode.org/) portals.

    View details for DOI 10.1101/gr.136184.111

    View details for Web of Science ID 000308272800021

    View details for PubMedID 22955991

  • Personal Omics Profiling Reveals Dynamic Molecular and Medical Phenotypes CELL Chen, R., Mias, G. I., Li-Pook-Than, J., Jiang, L., Lam, H. Y., Chen, R., Miriami, E., Karczewski, K. J., Hariharan, M., Dewey, F. E., Cheng, Y., Clark, M. J., Im, H., Habegger, L., Balasubramanian, S., O'Huallachain, M., Dudley, J. T., Hillenmeyer, S., Haraksingh, R., Sharon, D., Euskirchen, G., Lacroute, P., Bettinger, K., Boyle, A. P., Kasowski, M., Grubert, F., Seki, S., Garcia, M., Whirl-Carrillo, M., Gallardo, M., Blasco, M. A., Greenberg, P. L., Snyder, P., Klein, T. E., Altman, R. B., Butte, A. J., Ashley, E. A., Gerstein, M., Nadeau, K. C., Tang, H., Snyder, M. 2012; 148 (6): 1293-1307

    Abstract

    Personalized medicine is expected to benefit from combining genomic information with regular monitoring of physiological states by multiple high-throughput methods. Here, we present an integrative personal omics profile (iPOP), an analysis that combines genomic, transcriptomic, proteomic, metabolomic, and autoantibody profiles from a single individual over a 14 month period. Our iPOP analysis revealed various medical risks, including type 2 diabetes. It also uncovered extensive, dynamic changes in diverse molecular components and biological pathways across healthy and diseased conditions. Extremely high-coverage genomic and transcriptomic data, which provide the basis of our iPOP, revealed extensive heteroallelic changes during healthy and diseased states and an unexpected RNA editing mechanism. This study demonstrates that longitudinal iPOP can be used to interpret healthy and diseased states by connecting genomic information with additional dynamic omics activity.

    View details for DOI 10.1016/j.cell.2012.02.009

    View details for Web of Science ID 000301889500023

    View details for PubMedID 22424236

  • Detecting and annotating genetic variations using the HugeSeq pipeline NATURE BIOTECHNOLOGY Lam, H. Y., Pan, C., Clark, M. J., Lacroute, P., Chen, R., Haraksingh, R., O'Huallachain, M., Gerstein, M. B., Kidd, J. M., Bustamante, C. D., Snyder, M. 2012; 30 (3): 226-229

    View details for Web of Science ID 000301303800013

    View details for PubMedID 22398614

  • Extensive Promoter-Centered Chromatin Interactions Provide a Topological Basis for Transcription Regulation CELL Li, G., Ruan, X., Auerbach, R. K., Sandhu, K. S., Zheng, M., Wang, P., Poh, H. M., Goh, Y., Lim, J., Zhang, J., Sim, H. S., Peh, S. Q., Mulawadi, F. H., Ong, C. T., Orlov, Y. L., Hong, S., Zhang, Z., Landt, S., Raha, D., Euskirchen, G., Wei, C., Ge, W., Wang, H., Davis, C., Fisher-Aylor, K. I., Mortazavi, A., Gerstein, M., Gingeras, T., Wold, B., Sun, Y., Fullwood, M. J., Cheung, E., Liu, E., Sung, W., Snyder, M., Ruan, Y. 2012; 148 (1-2): 84-98

    Abstract

    Higher-order chromosomal organization for transcription regulation is poorly understood in eukaryotes. Using genome-wide Chromatin Interaction Analysis with Paired-End-Tag sequencing (ChIA-PET), we mapped long-range chromatin interactions associated with RNA polymerase II in human cells and uncovered widespread promoter-centered intragenic, extragenic, and intergenic interactions. These interactions further aggregated into higher-order clusters, wherein proximal and distal genes were engaged through promoter-promoter interactions. Most genes with promoter-promoter interactions were active and transcribed cooperatively, and some interacting promoters could influence each other implying combinatorial complexity of transcriptional controls. Comparative analyses of different cell lines showed that cell-specific chromatin interactions could provide structural frameworks for cell-specific transcription, and suggested significant enrichment of enhancer-promoter interactions for cell-specific functions. Furthermore, genetically-identified disease-associated noncoding elements were found to be spatially engaged with corresponding genes through long-range interactions. Overall, our study provides insights into transcription regulation by three-dimensional chromatin interactions for both housekeeping and cell-specific genes in human cells.

    View details for DOI 10.1016/j.cell.2011.12.014

    View details for Web of Science ID 000299540700016

    View details for PubMedID 22265404

  • Dissecting phosphorylation networks: lessons learned from yeast EXPERT REVIEW OF PROTEOMICS Mok, J., Zhu, X., Snyder, M. 2011; 8 (6): 775-786

    Abstract

    Protein phosphorylation continues to be regarded as one of the most important post-translational modifications found in eukaryotes and has been implicated in key roles in the development of a number of human diseases. In order to elucidate roles for the 518 human kinases, phosphorylation has routinely been studied using the budding yeast Saccharomyces cerevisiae as a model system. In recent years, a number of technologies have emerged to globally map phosphorylation in yeast. In this article, we review these technologies and discuss how these phosphorylation mapping efforts have shed light on our understanding of kinase signaling pathways and eukaryotic proteomic networks in general.

    View details for DOI 10.1586/EPR.11.64

    View details for Web of Science ID 000297299000013

    View details for PubMedID 22087660

  • Genomic binding sites of the yeast cell-cycle transcription factors SBF and MBF NATURE Iyer, V. R., Horak, C. E., Scafe, C. S., Botstein, D., Snyder, M., Brown, P. O. 2001; 409 (6819): 533-538

    Abstract

    Proteins interact with genomic DNA to bring the genome to life; and these interactions also define many functional features of the genome. SBF and MBF are sequence-specific transcription factors that activate gene expression during the G1/S transition of the cell cycle in yeast. SBF is a heterodimer of Swi4 and Swi6, and MBF is a heterodimer of Mbp1 and Swi6 (refs 1, 3). The related Swi4 and Mbp1 proteins are the DNA-binding components of the respective factors, and Swi6 may have a regulatory function. A small number of SBF and MBF target genes have been identified. Here we define the genomic binding sites of the SBF and MBF transcription factors in vivo, by using DNA microarrays. In addition to the previously characterized targets, we have identified about 200 new putative targets. Our results support the hypothesis that SBF activated genes are predominantly involved in budding, and in membrane and cell-wall biosynthesis, whereas DNA replication and repair are the dominant functions among MBF activated genes. The functional specialization of these factors may provide a mechanism for independent regulation of distinct molecular processes that normally occur in synchrony during the mitotic cell cycle.

    View details for Web of Science ID 000166570500053

    View details for PubMedID 11206552

  • Impacts of variation in the human genome on gene regulation. Journal of molecular biology Haraksingh, R. R., Snyder, M. P. 2013; 425 (21): 3970-3977

    Abstract

    Recent advances in fast and inexpensive DNA sequencing have enabled the extensive study of genomic and transcriptomic variation in humans. Human genomic variation is composed of sequence and structural changes including single-nucleotide and multinucleotide variants, short insertions or deletions (indels), larger copy number variants, and similarly sized copy neutral inversions and translocations. It is now well established that any two genomes differ extensively and that structural changes constitute the most prominent source of this variation. There have also been major technological advances in RNA sequencing to globally quantify and describe diversity in transcripts. Large consortia such as the 1000 Genomes Project and the Encyclopedia of DNA Elements Project are producing increasingly comprehensive maps outlining the regions of the human genome containing variants and functional elements, respectively. Integration of genetic variation data and extensive annotation of functional genomic elements, along with the ability to measure global transcription, allow the impacts of genetic variants on gene expression to be resolved. There are several well-established models by which genetic variants affect gene regulation depending on the type, nature, and position of the variant with respect to the affected genes. These effects can be manifested in two ways: changes to transcript sequences and isoforms by coding variants, and changes to transcript abundance by dosage or regulatory variants. Here, we review the current state of how genetic variations impact gene regulation locally and globally in the human genome.

    View details for DOI 10.1016/j.jmb.2013.07.015

    View details for PubMedID 23871684

  • Variation and genetic control of protein abundance in humans NATURE Wu, L., Candille, S. I., Choi, Y., Xie, D., Jiang, L., Li-Pook-Than, J., Tang, H., Snyder, M. 2013; 499 (7456): 79-82

    Abstract

    Gene expression differs among individuals and populations and is thought to be a major determinant of phenotypic variation. Although variation and genetic loci responsible for RNA expression levels have been analysed extensively in human populations, our knowledge is limited regarding the differences in human protein abundance and the genetic basis for this difference. Variation in messenger RNA expression is not a perfect surrogate for protein expression because the latter is influenced by an array of post-transcriptional regulatory mechanisms, and, empirically, the correlation between protein and mRNA levels is generally modest. Here we used isobaric tag-based quantitative mass spectrometry to determine relative protein levels of 5,953 genes in lymphoblastoid cell lines from 95 diverse individuals genotyped in the HapMap Project. We found that protein levels are heritable molecular phenotypes that exhibit considerable variation between individuals, populations and sexes. Levels of specific sets of proteins involved in the same biological process covary among individuals, indicating that these processes are tightly regulated at the protein level. We identified cis-pQTLs (protein quantitative trait loci), including variants not detected by previous transcriptome studies. This study demonstrates the feasibility of high-throughput human proteome quantification that, when integrated with DNA variation and transcriptome information, adds a new dimension to the characterization of gene expression regulation.

    View details for DOI 10.1038/nature12223

    View details for Web of Science ID 000321285600037

    View details for PubMedID 23676674

  • Genome Wide Proteomics of ERBB2 and EGFR and Other Oncogenic Pathways in Inflammatory Breast Cancer. Journal of proteome research Zhang, E. Y., Cristofanilli, M., Robertson, F., Reuben, J. M., Mu, Z., Beavis, R. C., Im, H., Snyder, M., Hofree, M., Ideker, T., Omenn, G. S., Fanayan, S., Jeong, S., Paik, Y., Zhang, A. F., Wu, S., Hancock, W. S. 2013; 12 (6): 2805-2817

    Abstract

    In this study we selected three breast cancer cell lines (SKBR3, SUM149 and SUM190) with different oncogene expression levels involved in ERBB2 and EGFR signaling pathways as a model system for the evaluation of selective integration of subsets of transcriptomic and proteomic data. We assessed the oncogene status with reads per kilobase per million mapped reads (RPKM) values for ERBB2 (14.4, 400, and 300 for SUM149, SUM190, and SKBR3, respectively) and for EGFR (60.1, not detected, and 1.4 for the same 3 cell lines). We then used RNA-Seq data to identify those oncogenes with significant transcript levels in these cell lines (total 31) and interrogated the corresponding proteomics data sets for proteins with significant interaction values with these oncogenes. The number of observed interactors for each oncogene showed a significant range, e.g., 4.2% (JAK1) to 27.3% (MYC). The percentage is measured as a fraction of the total protein interactions in a given data set vs total interactors for that oncogene in STRING (Search Tool for the Retrieval of Interacting Genes/Proteins, version 9.0) and I2D (Interologous Interaction Database, version 1.95). This approach allowed us to focus on 4 main oncogenes, ERBB2, EGFR, MYC, and GRB2, for pathway analysis. We used bioinformatics sites GeneGo, PathwayCommons and NCI receptor signaling networks to identify pathways that contained the four main oncogenes and had good coverage in the transcriptomic and proteomic data sets as well as a significant number of oncogene interactors. The four pathways identified were ERBB signaling, EGFR1 signaling, integrin outside-in signaling, and validated targets of C-MYC transcriptional activation. The greater dynamic range of the RNA-Seq values allowed the use of transcript ratios to correlate observed protein values with the relative levels of the ERBB2 and EGFR transcripts in each of the four pathways. This provided us with potential proteomic signatures for the SUM149 and SUM190 cell lines, growth factor receptor-bound protein 7 (GRB7), Crk-like protein (CRKL) and Catenin delta-1 (CTNND1) for ERBB signaling; caveolin 1 (CAV1), plectin (PLEC) for EGFR signaling; filamin A (FLNA) and actinin alpha1 (ACTN1) (associated with high levels of EGFR transcript) for integrin signalings; branched chain amino-acid transaminase 1 (BCAT1), carbamoyl-phosphate synthetase (CAD), nucleolin (NCL) (high levels of EGFR transcript); transferrin receptor (TFRC), metadherin (MTDH) (high levels of ERBB2 transcript) for MYC signaling; S100-A2 protein (S100A2), caveolin 1 (CAV1), Serpin B5 (SERPINB5), stratifin (SFN), PYD and CARD domain containing (PYCARD), and EPH receptor A2 (EPHA2) for PI3K signaling, p53 subpathway. Future studies of inflammatory breast cancer (IBC), from which the cell lines were derived, will be used to explore the significance of these observations.

    View details for DOI 10.1021/pr4001527

    View details for PubMedID 23647160

  • Overview of high throughput sequencing technologies to elucidate molecular pathways in cardiovascular diseases. Circulation research Churko, J. M., Mantalas, G. L., Snyder, M. P., Wu, J. C. 2013; 112 (12): 1613-1623

    Abstract

    High throughput sequencing technologies have become essential in studies on genomics, epigenomics, and transcriptomics. Although sequencing information has traditionally been elucidated using a low throughput technique called Sanger sequencing, high throughput sequencing technologies are capable of sequencing multiple DNA molecules in parallel, enabling hundreds of millions of DNA molecules to be sequenced at a time. This advantage allows high throughput sequencing to be used to create large data sets, generating more comprehensive insights into the cellular genomic and transcriptomic signatures of various diseases and developmental stages. Within high throughput sequencing technologies, whole exome sequencing can be used to identify novel variants and other mutations that may underlie many genetic cardiac disorders, whereas RNA sequencing can be used to analyze how the transcriptome changes. Chromatin immunoprecipitation sequencing and methylation sequencing can be used to identify epigenetic changes, whereas ribosome sequencing can be used to determine which mRNA transcripts are actively being translated. In this review, we will outline the differences in various sequencing modalities and examine the main sequencing platforms on the market in terms of their relative read depths, speeds, and costs. Finally, we will discuss the development of future sequencing platforms and how these new technologies may improve on current sequencing platforms. Ultimately, these sequencing technologies will be instrumental in further delineating how the cardiovascular system develops and how perturbations in DNA and RNA can lead to cardiovascular disease.

    View details for DOI 10.1161/CIRCRESAHA.113.300939

    View details for PubMedID 23743227

  • iPOP Goes the World: Integrated Personalized Omics Profiling and the Road toward Improved Health Care. Chemistry & biology Li-Pook-Than, J., Snyder, M. 2013; 20 (5): 660-666

    Abstract

    The health of an individual depends upon their DNA as well as upon environmental factors (environome or exposome). It is expected that although the genome is the blueprint of an individual, its analysis with that of the other omes such as the DNA methylome, the transcriptome, proteome, and metabolome will further provide a dynamic assessment of the physiology and health state of an individual. This review will help to categorize the current progress of omics analyses and how omics integration can be used for medical research. We believe that integrative personal omics profiling (iPOP) is a stepping stone to a new road to personalized health care and may improve disease risk assessment, accuracy of diagnosis, disease monitoring, targeted treatments, and understanding the biological processes of disease states for their prevention.

    View details for DOI 10.1016/j.chembiol.2013.05.001

    View details for PubMedID 23706632

  • Identification of Potential Glycan Cancer Markers with Sialic Acid Attached to Sialic Acid and Up-regulated Fucosylated Galactose Structures in Epidermal Growth Factor Receptor Secreted from A431 Cell Line. Molecular & cellular proteomics Wu, S., Taylor, A. D., Lu, Q., Hanash, S. M., Im, H., Snyder, M., Hancock, W. S. 2013; 12 (5): 1239-1249

    Abstract

    We have used powerful HPLC-mass spectrometric approaches to characterize the secreted form of epidermal growth factor receptor (sEGFR). We demonstrated that the amino acid sequence lacked the cytoplasmic domain and was consistent with the primary sequence reported for EGFR purified from a human plasma pool. One of the sEGFR forms, attributed to the alternative RNA splicing, was also confirmed by transcriptional analysis (RNA sequencing). Two unusual types of glycan structures were observed in sEGFR as compared with membrane-bound EGFR from the A431 cell line. The unusual glycan structures were di-sialylated glycans (sialic acid attached to sialic acid) at Asn-151 and N-acetylhexosamine attached to a branched fucosylated galactose with N-acetylglucosamine moieties (HexNAc-(Fuc)Gal-GlcNAc) at Asn-420. These unusual glycans at specific sites were either present at a much lower level or were not observable in membrane-bound EGFR present in the A431 cell lysate. The observation of these di-sialylated glycan structures was consistent with the observed expression of the corresponding α-N-acetylneuraminide α-2,8-sialyltransferase 2 (ST8SiA2) and α-N-acetylneuraminide α-2,8-sialyltransferase 4 (ST8SiA4), by quantitative real time RT-PCR. The connectivity present at the branched fucosylated galactose was also confirmed by methylation of the glycans followed by analysis with sequential fragmentation in mass spectrometry. We hypothesize that the presence of such glycan structures could promote secretion via anionic or steric repulsion mechanisms and thus facilitate the observation of these glycan forms in the secreted fractions. We plan to use this model system to facilitate the search for novel glycan structures present at specific sites in sEGFR as well as other secreted oncoproteins such as Erbb2 as markers of disease progression in blood samples from cancer patients.

    View details for DOI 10.1074/mcp.M112.024554

    View details for PubMedID 23371026

  • Preparation of recombinant protein spotted arrays for proteome-wide identification of kinase targets. Current protocols in protein science / editorial board, John E. Coligan ... [et al.] Im, H., Snyder, M. 2013; Chapter 27: Unit 27 4-?

    Abstract

    Protein microarrays allow unique approaches for interrogating global protein interaction networks. Protein arrays can be divided into two categories: antibody arrays and functional protein arrays. Antibody arrays consist of various antibodies and are appropriate for profiling protein abundance and modifications. Functional full-length protein arrays employ full-length proteins with various post-translational modifications. A key advantage of the latter is rapid parallel processing of large number of proteins for studying highly controlled biochemical activities, protein-protein interactions, protein-nucleic acid interactions, and protein-small molecule interactions. This unit presents a protocol for constructing functional yeast protein microarrays for global kinase substrate identification. This approach enables the rapid determination of protein interaction networks in yeast on a proteome-wide level. The same methodology can be readily applied to higher eukaryotic systems with careful consideration of overexpression strategy.

    View details for DOI 10.1002/0471140864.ps2704s72

    View details for PubMedID 23546622

  • Comparative annotation of functional regions in the human genome using epigenomic data NUCLEIC ACIDS RESEARCH Won, K., Zhang, X., Wang, T., Ding, B., Raha, D., Snyder, M., Ren, B., Wang, W. 2013; 41 (8): 4423-4432

    Abstract

    Epigenetic regulation is dynamic and cell-type dependent. The recently available epigenomic data in multiple cell types provide an unprecedented opportunity for a comparative study of epigenetic landscape. We developed a machine-learning method called ChroModule to annotate the epigenetic states in eight ENCyclopedia Of DNA Elements cell types. The trained model successfully captured the characteristic histone-modification patterns associated with regulatory elements, such as promoters and enhancers, and showed superior performance on identifying enhancers compared with the state-of-the-art methods. In addition, given the fixed number of epigenetic states in the model, ChroModule allows straightforward illustration of epigenetic variability in multiple cell types. Using this feature, we found that invariable and variable epigenetic states across cell types correspond to housekeeping functions and stimulus response, respectively. In particular, we observed that enhancers, but not the other regulatory elements, dictate cell specificity, as similar cell types share common enhancers, and cell-type-specific enhancers are often bound by transcription factors playing critical roles in that cell type. More interestingly, we found some genomic regions are dormant in one cell type but primed to become active in other cell types. These observations highlight the usefulness of ChroModule in comparative analysis and interpretation of multiple epigenomes.

    View details for DOI 10.1093/nar/gkt143

    View details for Web of Science ID 000318569700014

    View details for PubMedID 23482391

  • A Major Epigenetic Programming Mechanism Guided by piRNAs DEVELOPMENTAL CELL Huang, X. A., Yin, H., Sweeney, S., Raha, D., Snyder, M., Lin, H. 2013; 24 (5): 502-516

    Abstract

    A central enigma in epigenetics is how epigenetic factors are guided to specific genomic sites for their function. Previously, we reported that a Piwi-piRNA complex associates with the piRNA-complementary site in the Drosophila genome and regulates its epigenetic state. Here, we report that Piwi-piRNA complexes bind to numerous piRNA-complementary sequences throughout the genome, implicating piRNAs as a major mechanism that guides Piwi and Piwi-associated epigenetic factors to program the genome. To test this hypothesis, we demonstrate that inserting piRNA-complementary sequences to an ectopic site leads to Piwi, HP1a, and Su(var)3-9 recruitment to the site as well as H3K9me2/3 enrichment and reduced RNA polymerase II association, indicating that piRNA is both necessary and sufficient to recruit Piwi and epigenetic factors to specific genomic sites. Piwi deficiency drastically changed the epigenetic landscape and polymerase II profile throughout the genome, revealing the Piwi-piRNA mechanism as a major epigenetic programming mechanism in Drosophila.

    View details for DOI 10.1016/j.devcel.2013.01.023

    View details for Web of Science ID 000316163000005

    View details for PubMedID 23434410

  • Accurate Identification and Analysis of Human mRNA Isoforms Using Deep Long Read Sequencing G3-GENES GENOMES GENETICS Tilgner, H., Raha, D., Habegger, L., Mohiuddin, M., Gerstein, M., Snyder, M. 2013; 3 (3): 387-397

    Abstract

    Precise identification of RNA-coding regions and transcriptomes of eukaryotes is a significant problem in biology. Currently, eukaryote transcriptomes are analyzed using deep short-read sequencing experiments of complementary DNAs. The resulting short-reads are then aligned against a genome and annotated junctions to infer biological meaning. Here we use long-read complementary DNA datasets for the analysis of a eukaryotic transcriptome and generate two large datasets in the human K562 and HeLa S3 cell lines. Both data sets comprised at least 4 million reads and had median read lengths greater than 500 bp. We show that annotation-independent alignments of these reads provide partial gene structures that are very much in-line with annotated gene structures, 15% of which have not been obtained in a previous de novo analysis of short reads. For long-noncoding RNAs (i.e., lncRNA) genes, however, we find an increased fraction of novel gene structures among our alignments. Other important aspects of transcriptome analysis, such as the description of cell type-specific splicing, can be performed in an accurate, reliable and completely annotation-free manner, making it ideal for the analysis of transcriptomes of newly sequenced genomes. Furthermore, we demonstrate that long read sequence can be assembled into full-length transcripts with considerable success. Our method is applicable to all long read sequencing technologies.

    View details for DOI 10.1534/g3.112.004812

    View details for Web of Science ID 000315950000002

    View details for PubMedID 23450794

  • Extensive Transcript Diversity and Novel Upstream Open Reading Frame Regulation in Yeast G3-GENES GENOMES GENETICS Waern, K., Snyder, M. 2013; 3 (2): 343-352

    Abstract

    To understand the diversity of transcripts in yeast (Saccharomyces cerevisiae) we analyzed the transcriptional landscapes for cells grown under 18 different environmental conditions. Each sample was analyzed using RNA-sequencing, and a total of 670,446,084 uniquely mapped reads and 377,263 poly-adenylated end tags were produced. Consistent with previous studies, we find that the majority of yeast genes are expressed under one or more different conditions. By directly comparing the 5' and 3' ends of the transcribed regions, we find extensive differences in transcript ends across many conditions, especially those of stationary phase, growth in grape juice, and salt stimulation, suggesting differential choice of transcription start and stop sites is pervasive in yeast. Relative to the exponential growth condition (i.e., YPAD), transcripts differing at the 5' ends and 3' ends are predicted to differ in their annotated start codon in 21 genes and their annotated stop codon in 63 genes. Many (431) upstream open reading frames (uORFs) are found in alternate 5' ends and are significantly enriched in transcripts produced during the salt response. Mutational analysis of five genes with uORFs revealed that two sets of uORFs increase the expression of a reporter construct, indicating a role in activation which had not been reported previously, whereas two other uORFs decreased expression. In addition, RNA binding protein motifs are statistically enriched for alternate ends under many conditions. Overall, these results demonstrate enormous diversity of transcript ends, and that this heterogeneity is regulated under different environmental conditions. Moreover, transcript end diversity has important biological implications for the regulation of gene expression. In addition, our data also serve as a valuable resource for the scientific community.

    View details for DOI 10.1534/g3.112.003640

    View details for Web of Science ID 000314881600019

    View details for PubMedID 23390610

  • SeqFold: Genome-scale reconstruction of RNA secondary structure integrating high-throughput sequencing data GENOME RESEARCH Ouyang, Z., Snyder, M. P., Chang, H. Y. 2013; 23 (2): 377-387

    Abstract

    We present an integrative approach, SeqFold, that combines high-throughput RNA structure profiling data with computational prediction for genome-scale reconstruction of RNA secondary structures. SeqFold transforms experimental RNA structure information into a structure preference profile (SPP) and uses it to select stable RNA structure candidates representing the structure ensemble. Under a high-dimensional classification framework, SeqFold efficiently matches a given SPP to the most likely cluster of structures sampled from the Boltzmann-weighted ensemble. SeqFold is able to incorporate diverse types of RNA structure profiling data, including parallel analysis of RNA structure (PARS), selective 2'-hydroxyl acylation analyzed by primer extension sequencing (SHAPE-Seq), fragmentation sequencing (FragSeq) data generated by deep sequencing, and conventional SHAPE data. Using the known structures of a wide range of mRNAs and noncoding RNAs as benchmarks, we demonstrate that SeqFold outperforms or matches existing approaches in accuracy and is more robust to noise in experimental data. Application of SeqFold to reconstruct the secondary structures of the yeast transcriptome reveals the diverse impact of RNA secondary structure on gene regulation, including translation efficiency, transcription initiation, and protein-RNA interactions. SeqFold can be easily adapted to incorporate any new types of high-throughput RNA structure profiling data and is widely applicable to analyze RNA structures in any transcriptome.

    View details for DOI 10.1101/gr.138545.112

    View details for Web of Science ID 000314323100016

    View details for PubMedID 23064747

  • Two methods for full-length RNA sequencing for low quantities of cells and single cells PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Pan, X., Durrett, R. E., Zhu, H., Tanaka, Y., Li, Y., Zi, X., Marjani, S. L., Euskirchen, G., Ma, C., LaMotte, R. H., Park, I., Snyder, M. P., Mason, C. E., Weissman, S. M. 2013; 110 (2): 594-599

    Abstract

    The ability to determine the gene expression pattern in low quantities of cells or single cells is important for resolving a variety of problems in many biological disciplines. A robust description of the expression signature of a single cell requires determination of the full-length sequence of the expressed mRNAs in the cell, yet existing methods have either 3' biased or variable transcript representation. Here, we report our protocols for the amplification and high-throughput sequencing of very small amounts of RNA for sequencing using procedures of either semirandom primed PCR or phi29 DNA polymerase-based DNA amplification, for the cDNA generated with oligo-dT and/or random oligonucleotide primers. Unlike existing methods, these protocols produce relatively uniformly distributed sequences covering the full length of almost all transcripts independent of their sizes, from 1,000 to 10 cells, and even with single cells. Both protocols produced satisfactory detection/coverage of the abundant mRNAs from a single K562 erythroleukemic cell or a single dorsal root ganglion neuron. The phi29-based method produces long products with less noise, uses an isothermal reaction, and is simple to practice. The semirandom primed PCR procedure is more sensitive and reproducible at low transcript levels or with low quantities of cells. These methods provide tools for mRNA sequencing or RNA sequencing when only low quantities of cells, a single cell, or even degraded RNA are available for profiling.

    View details for DOI 10.1073/pnas.1217322109

    View details for Web of Science ID 000313906600047

    View details for PubMedID 23267071

  • Multimodal Dynamic Profiling of Healthy and Diseased States for Future Personalized Health Care CLINICAL PHARMACOLOGY & THERAPEUTICS Mias, G. I., Snyder, M. 2013; 93 (1): 29-32

    View details for DOI 10.1038/clpt.2012.204

    View details for Web of Science ID 000312618200021

    View details for PubMedID 23187877

  • High-throughput sequencing for biology and medicine MOLECULAR SYSTEMS BIOLOGY Soon, W. W., Hariharan, M., Snyder, M. P. 2013; 9

    Abstract

    Advances in genome sequencing have progressed at a rapid pace, with increased throughput accompanied by plunging costs. But these advances go far beyond faster and cheaper. High-throughput sequencing technologies are now routinely being applied to a wide range of important topics in biology and medicine, often allowing researchers to address important biological questions that were not possible before. In this review, we discuss these innovative new approaches, including ever finer analyses of transcriptome dynamics, genome structure and genomic variation, and provide an overview of the new insights into complex biological systems catalyzed by these technologies. We also assess the impact of genotyping, genome sequencing and personal omics profiling on medical applications, including diagnosis and disease monitoring. Finally, we review recent developments in single-cell sequencing, and conclude with a discussion of possible future advances and obstacles for sequencing in biology and health.

    View details for DOI 10.1038/msb.2012.61

    View details for Web of Science ID 000314415800010

    View details for PubMedID 23340846

  • Systematic investigation of protein-small molecule interactions IUBMB LIFE Li, X., Wang, X., Snyder, M. 2013; 65 (1): 2-8

    Abstract

    Cell signaling is extensively wired between cellular components to sustain cell proliferation, differentiation, and adaptation. The interaction network is often manifested in how protein function is regulated through interacting with other cellular components including small molecule metabolites. While many biochemical interactions have been established as reactions between protein enzymes and their substrates and products, much less is known at the system level about how small metabolites regulate protein functions through allosteric binding. In the past decade, study of protein-small molecule interactions has been lagging behind other types of interactions. Recent technological advances have explored several high-throughput platforms to reveal many "unexpected" protein-small molecule interactions that could have profound impact on our understanding of cell signaling. These interactions will help bridge gaps in existing regulatory loops of cell signaling and serve as new targets for medical intervention. In this review, we summarize recent advances of systematic investigation of protein-metabolite/small molecule interactions, and discuss the potential impact of such studies on both biological research and medicine.

    View details for DOI 10.1002/iub.1111

    View details for Web of Science ID 000312886200002

    View details for PubMedID 23225626

  • A Chromosome-centric Human Proteome Project (C-HPP) to Characterize the Sets of Proteins Encoded in Chromosome 17 JOURNAL OF PROTEOME RESEARCH Liu, S., Im, H., Bairoch, A., Cristofanilli, M., Chen, R., Deutsch, E. W., Dalton, S., Fenyo, D., Fanayan, S., Gates, C., Gaudet, P., Hincapie, M., Hanash, S., Kim, H., Jeong, S., Lundberg, E., Mias, G., Menon, R., Mu, Z., Nice, E., Paik, Y., Uhlen, M., Wells, L., Wu, S., Yan, F., Zhang, F., Zhang, Y., Snyder, M., Omenn, G. S., Beavis, R. C., Hancock, W. S. 2013; 12 (1): 45-57

    Abstract

    We report progress assembling the parts list for chromosome 17 and illustrate the various processes that we have developed to integrate available data from diverse genomic and proteomic knowledge bases. As primary resources, we have used GPMDB, neXtProt, PeptideAtlas, Human Protein Atlas (HPA), and GeneCards. All sites share the common resource of Ensembl for the genome modeling information. We have defined the chromosome 17 parts list with the following information: 1169 protein-coding genes, the numbers of proteins confidently identified by various experimental approaches as documented in GPMDB, neXtProt, PeptideAtlas, and HPA, examples of typical data sets obtained by RNASeq and proteomic studies of epithelial derived tumor cell lines (disease proteome) and a normal proteome (peripheral mononuclear cells), reported evidence of post-translational modifications, and examples of alternative splice variants (ASVs). We have constructed a list of the 59 "missing" proteins as well as 201 proteins that have inconclusive mass spectrometric (MS) identifications. In this report we have defined a process to establish a baseline for the incorporation of new evidence on protein identification and characterization as well as related information from transcriptome analyses. This initial list of "missing" proteins that will guide the selection of appropriate samples for discovery studies as well as antibody reagents. Also we have illustrated the significant diversity of protein variants (including post-translational modifications, PTMs) using regions on chromosome 17 that contain important oncogenes. We emphasize the need for mandated deposition of proteomics data in public databases, the further development of improved PTM, ASV, and single nucleotide variant (SNV) databases, and the construction of Web sites that can integrate and regularly update such information. 
    In addition, we describe the distribution of both clustered and scattered sets of protein families on the chromosome. Since chromosome 17 is rich in cancer-associated genes, we have focused on the clustering of cancer-associated genes in such genomic regions and have used the ERBB2 amplicon as an example of the value of a proteogenomic approach in which one integrates transcriptomic with proteomic information and captures evidence of coexpression through coordinated regulation.

    View details for DOI 10.1021/pr300985j

    View details for Web of Science ID 000313156300007

    View details for PubMedID 23259914

  • Exome sequencing by targeted enrichment. Current protocols in molecular biology / edited by Frederick M. Ausubel ... [et al.] Clark, M. J., Chen, R., Snyder, M. 2013; Chapter 7: Unit7 12-?

    Abstract

    This unit describes methods for targeted enrichment of the exon-coding portions of the genome using Agilent SureSelect Human All Exon 50 Mb and Roche Nimblegen SeqCap EZ Exome platforms. Each platform targets and enriches a large overlapping portion of the greater human exome. The protocols here describe the biochemical procedures used to enrich exomic DNA with each platform, including recommended modifications to the manufacturers' protocols. In addition, a brief description of the sequencing protocol and estimation of the needed amount of sequencing for each platform is included. Finally, a detailed analytical pipeline for processing the subsequent data is described. These protocols focus specifically on human exome sequencing platforms, but can be applied with some modification to other organisms and targeted enrichment approaches.

    View details for DOI 10.1002/0471142727.mb0712s102

    View details for PubMedID 23547016

  • The variable somatic genome CELL CYCLE O'Huallachain, M., Weissman, S. M., Snyder, M. P. 2013; 12 (1): 5-6

    View details for DOI 10.4161/cc.23069

    View details for Web of Science ID 000313414700003

    View details for PubMedID 23255102

  • Proteogenomic Analysis of Human Colon Carcinoma Cell Lines LIM1215, LIM1899, and LIM2405. Journal of proteome research Fanayan, S., Smith, J. T., Lee, L. Y., Yan, F., Snyder, M., Hancock, W. S., Nice, E. 2013

    Abstract

    As part of the genome-wide and chromosome-centric human proteomic project (C-HPP), we have integrated a shotgun proteomics approach and a genome-wide transcriptomic approach (RNA-Seq) of a set of human colon cancer cell lines (LIM1215, LIM1899 and LIM2405) that were selected to represent a wide range of pathological states of colorectal cancer. The combination of a standard proteomics approach (1D-gel electrophoresis coupled to LC/ion trap mass spectrometry) and RNA-Seq allowed us to exploit the greater depth of the transcriptomics measurement (~9800 transcripts per cell line) versus the protein observations (~1900 protein identifications per cell line). Conversely, the proteomics data were helpful in identifying both cancer associated proteins with differential expression patterns as well as protein networks and pathways which appear to be deregulated in these cell lines. Examples of potential markers include mortalin, nucleophosmin, ezrin, LASP1, alpha and beta forms of spectrin, exportin, the carcinoembryonic antigen family, EGFR and MET. Interaction analyses identified the large intermediate filament family, the protein folding network and adapter proteins in focal adhesion networks, which included the CDC42 and RHOA signaling pathways that may have potential for identifying phenotypic states representing poorly and moderately differentiated states of CRC, with or without metastases.

    View details for PubMedID 23458625

  • Tissue-specific direct targets of Caenorhabditis elegans Rb/E2F dictate distinct somatic and germline programs. Genome biology Kudron, M., Niu, W., Lu, Z., Wang, G., Gerstein, M., Snyder, M., Reinke, V. 2013; 14 (1): R5

    Abstract

    BACKGROUND: The tumor suppressor Rb/E2F regulates gene expression to control differentiation in multiple tissues during development, although how it directs tissue-specific gene regulation in vivo is poorly understood. RESULTS: We determined the genome-wide binding profiles for Caenorhabditis elegans Rb/E2F-like components in the germline, in the intestine and broadly throughout the soma, and uncovered highly tissue-specific binding patterns and target genes. Chromatin association by LIN-35, the C. elegans ortholog of Rb, is impaired in the germline but robust in the soma, a characteristic that might govern differential effects on gene expression in the two cell types. In the intestine, LIN-35 and the heterochromatin protein HPL-2, the ortholog of Hp1, coordinately bind at many sites lacking E2F. Finally, selected direct target genes contribute to the soma-to-germline transformation of lin-35 mutants, including mes-4, a soma-specific target that promotes H3K36 methylation, and csr-1, a germline-specific target that functions in a 22G small RNA pathway. CONCLUSIONS: In sum, identification of tissue-specific binding profiles and effector target genes reveals important insights into the mechanisms by which Rb/E2F controls distinct cell fates in vivo.

    View details for PubMedID 23347407

  • Promise of personalized omics to precision medicine WILEY INTERDISCIPLINARY REVIEWS-SYSTEMS BIOLOGY AND MEDICINE Chen, R., Snyder, M. 2013; 5 (1): 73-82

    Abstract

    The rapid development of high-throughput technologies and computational frameworks enables the examination of biological systems in unprecedented detail. The ability to study biological phenomena at omics levels in turn is expected to lead to significant advances in personalized and precision medicine. Patients can be treated according to their own molecular characteristics. Individual omes as well as the integrated profiles of multiple omes, such as the genome, the epigenome, the transcriptome, the proteome, the metabolome, the antibodyome, and other omics information are expected to be valuable for health monitoring, preventative measures, and precision medicine. Moreover, omics technologies have the potential to transform medicine from traditional symptom-oriented diagnosis and treatment of diseases toward disease prevention and early diagnostics. We discuss here the advances and challenges in systems biology-powered personalized medicine at its current stage, as well as a prospective view of future personalized health care at the end of this review.

    View details for DOI 10.1002/wsbm.1198

    View details for Web of Science ID 000312736200005

    View details for PubMedID 23184638

  • Centromere-Like Regions in the Budding Yeast Genome PLOS GENETICS Lefrancois, P., Auerbach, R. K., Yellman, C. M., Roeder, G. S., Snyder, M. 2013; 9 (1)

    Abstract

    Accurate chromosome segregation requires centromeres (CENs), the DNA sequences where kinetochores form, to attach chromosomes to microtubules. In contrast to most eukaryotes, which have broad centromeres, Saccharomyces cerevisiae possesses sequence-defined point CENs. Chromatin immunoprecipitation followed by sequencing (ChIP-Seq) reveals colocalization of four kinetochore proteins at novel, discrete, non-centromeric regions, especially when levels of the centromeric histone H3 variant, Cse4 (a.k.a. CENP-A or CenH3), are elevated. These regions of overlapping protein binding enhance the segregation of plasmids and chromosomes and have thus been termed Centromere-Like Regions (CLRs). CLRs form in close proximity to S. cerevisiae CENs and share characteristics typical of both point and regional CENs. CLR sequences are conserved among related budding yeasts. Many genomic features characteristic of CLRs are also associated with these conserved homologous sequences from closely related budding yeasts. These studies provide general and important insights into the origin and evolution of centromeres.

    View details for DOI 10.1371/journal.pgen.1003209

    View details for Web of Science ID 000314651500052

    View details for PubMedID 23349633

  • Copy Number Variation detection from 1000 Genomes project exon capture sequencing data BMC BIOINFORMATICS Wu, J., Grzeda, K. R., Stewart, C., Grubert, F., Urban, A. E., Snyder, M. P., Marth, G. T. 2012; 13

    Abstract

    DNA capture technologies combined with high-throughput sequencing now enable cost-effective, deep-coverage, targeted sequencing of complete exomes. This is well suited for SNP discovery and genotyping. However there has been little attention devoted to Copy Number Variation (CNV) detection from exome capture datasets despite the potentially high impact of CNVs in exonic regions on protein function. As members of the 1000 Genomes Project analysis effort, we investigated 697 samples in which 931 genes were targeted and sampled with 454 or Illumina paired-end sequencing. We developed a rigorous Bayesian method to detect CNVs in the genes, based on read depth within target regions. Despite substantial variability in read coverage across samples and targeted exons, we were able to identify 107 heterozygous deletions in the dataset. The experimentally determined false discovery rate (FDR) of the cleanest dataset from the Wellcome Trust Sanger Institute is 12.5%. We were able to substantially improve the FDR in a subset of gene deletion candidates that were adjacent to another gene deletion call (17 calls). The estimated sensitivity of our call-set was 45%. This study demonstrates that exonic sequencing datasets, collected both in population based and medical sequencing projects, will be a useful substrate for detecting genic CNV events, particularly deletions. Based on the number of events we found and the sensitivity of the methods in the present dataset, we estimate on average 16 genic heterozygous deletions per individual genome. Our power analysis informs ongoing and future projects about sequencing depth and uniformity of read coverage required for efficient detection.

    View details for DOI 10.1186/1471-2105-13-305

    View details for Web of Science ID 000314688600001

    View details for PubMedID 23157288

  • Genome interpretation and assembly-recent progress and next steps. Nature biotechnology Baker, S., Joecker, A., Church, G., Snyder, M., West, J., Salzberg, S., Worthey, E., Smith, T., Wang, J., Reid, J. G. 2012; 30 (11): 1081-1083

    View details for DOI 10.1038/nbt.2425

    View details for PubMedID 23138307

  • Michael Snyder. Interview by Asher Mullard. Nature reviews. Drug discovery Snyder, M. 2012; 11 (10): 744-?

    View details for DOI 10.1038/nrd3867

    View details for PubMedID 23023673

  • Systems biology: personalized medicine for the future? CURRENT OPINION IN PHARMACOLOGY Chen, R., Snyder, M. 2012; 12 (5): 623-628

    Abstract

    Systems biology is actively transforming the field of modern health care from symptom-based disease diagnosis and treatment to precision medicine in which patients are treated based on their individual characteristics. Development of high-throughput technologies such as high-throughput sequencing and mass spectrometry has enabled scientists and clinicians to examine genomes, transcriptomes, proteomes, metabolomes, and other omics information in unprecedented detail. The combined 'omics' information leads to a global profiling of health and disease, and provides new approaches for personalized health monitoring and preventative medicine. In this article, we review the efforts of systems biology in personalized medicine in the past 2 years, and discuss in detail achievements and concerns, as well as highlights and hurdles for future personalized health care.

    View details for DOI 10.1016/j.coph.2012.07.011

    View details for Web of Science ID 000310478800017

    View details for PubMedID 22858243

  • SWI/SNF Chromatin-remodeling Factors: Multiscale Analyses and Diverse Functions JOURNAL OF BIOLOGICAL CHEMISTRY Euskirchen, G., Auerbach, R. K., Snyder, M. 2012; 287 (37): 30897-30905

    Abstract

    Chromatin-remodeling enzymes play essential roles in many biological processes, including gene expression, DNA replication and repair, and cell division. Although one such complex, SWI/SNF, has been extensively studied, new discoveries are still being made. Here, we review SWI/SNF biochemistry; highlight recent genomic and proteomic advances; and address the role of SWI/SNF in human diseases, including cancer and viral infections. These studies have greatly increased our understanding of complex nuclear processes.

    View details for DOI 10.1074/jbc.R111.309302

    View details for Web of Science ID 000308791300003

    View details for PubMedID 22952240

  • Ubiquitous heterogeneity and asymmetry of the chromatin environment at regulatory elements GENOME RESEARCH Kundaje, A., Kyriazopoulou-Panagiotopoulou, S., Libbrecht, M., Smith, C. L., Raha, D., Winters, E. E., Johnson, S. M., Snyder, M., Batzoglou, S., Sidow, A. 2012; 22 (9): 1735-1747

    Abstract

    Gene regulation at functional elements (e.g., enhancers, promoters, insulators) is governed by an interplay of nucleosome remodeling, histone modifications, and transcription factor binding. To enhance our understanding of gene regulation, the ENCODE Consortium has generated a wealth of ChIP-seq data on DNA-binding proteins and histone modifications. We additionally generated nucleosome positioning data on two cell lines, K562 and GM12878, by MNase digestion and high-depth sequencing. Here we relate 14 chromatin signals (12 histone marks, DNase, and nucleosome positioning) to the binding sites of 119 DNA-binding proteins across a large number of cell lines. We developed a new method for unsupervised pattern discovery, the Clustered AGgregation Tool (CAGT), which accounts for the inherent heterogeneity in signal magnitude, shape, and implicit strand orientation of chromatin marks. We applied CAGT on a total of 5084 data set pairs to obtain an exhaustive catalog of high-resolution patterns of histone modifications and nucleosome positioning signals around bound transcription factors. Our analyses reveal extensive heterogeneity in how histone modifications are deposited, and how nucleosomes are positioned around binding sites. With the exception of the CTCF/cohesin complex, asymmetry of nucleosome positioning is predominant. Asymmetry of histone modifications is also widespread, for all types of chromatin marks examined, including promoter, enhancer, elongation, and repressive marks. The fine-resolution signal shapes discovered by CAGT unveiled novel correlation patterns between chromatin marks, nucleosome positioning, and sequence content. Meta-analyses of the signal profiles revealed a common vocabulary of chromatin signals shared across multiple cell lines and binding proteins.

    View details for DOI 10.1101/gr.136366.111

    View details for Web of Science ID 000308272800015

    View details for PubMedID 22955985

  • A highly integrated and complex PPARGC1A transcription factor binding network in HepG2 cells GENOME RESEARCH Charos, A. E., Reed, B. D., Raha, D., Szekely, A. M., Weissman, S. M., Snyder, M. 2012; 22 (9): 1668-1679

    Abstract

    PPARGC1A is a transcriptional coactivator that binds to and coactivates a variety of transcription factors (TFs) to regulate the expression of target genes. PPARGC1A plays a pivotal role in regulating energy metabolism and has been implicated in several human diseases, most notably type II diabetes. Previous studies have focused on the interplay between PPARGC1A and individual TFs, but little is known about how PPARGC1A combines with all of its partners across the genome to regulate transcriptional dynamics. In this study, we describe a core PPARGC1A transcriptional regulatory network operating in HepG2 cells treated with forskolin. We first mapped the genome-wide binding sites of PPARGC1A using chromatin-IP followed by high-throughput sequencing (ChIP-seq) and uncovered overrepresented DNA sequence motifs corresponding to known and novel PPARGC1A network partners. We then profiled six of these site-specific TF partners using ChIP-seq and examined their network connectivity and combinatorial binding patterns with PPARGC1A. Our analysis revealed extensive overlap of targets including a novel link between PPARGC1A and HSF1, a TF regulating the conserved heat shock response pathway that is misregulated in diabetes. Importantly, we found that different combinations of TFs bound to distinct functional sets of genes, thereby helping to reveal the combinatorial regulatory code for metabolic and other cellular processes. In addition, the different TFs often bound near the promoters and coding regions of each other's genes suggesting an intricate network of interdependent regulation. Overall, our study provides an important framework for understanding the systems-level control of metabolic gene expression in humans.

    View details for DOI 10.1101/gr.127761.111

    View details for Web of Science ID 000308272800009

    View details for PubMedID 22955979

  • Deep sequencing of subcellular RNA fractions shows splicing to be predominantly co-transcriptional in the human genome but inefficient for lncRNAs GENOME RESEARCH Tilgner, H., Knowles, D. G., Johnson, R., Davis, C. A., Chakrabortty, S., Djebali, S., Curado, J., Snyder, M., Gingeras, T. R., Guigo, R. 2012; 22 (9): 1616-1625

    Abstract

    Splicing remains an incompletely understood process. Recent findings suggest that chromatin structure participates in its regulation. Here, we analyze the RNA from subcellular fractions obtained through RNA-seq in the cell line K562. We show that in the human genome, splicing occurs predominantly during transcription. We introduce the coSI measure, based on RNA-seq reads mapping to exon junctions and borders, to assess the degree of splicing completion around internal exons. We show that, as expected, splicing is almost fully completed in cytosolic polyA+ RNA. In chromatin-associated RNA (which includes the RNA that is being transcribed), for 5.6% of exons, the removal of the surrounding introns is fully completed, compared with 0.3% of exons for which no intron-removal has occurred. The remaining exons exist as a mixture of spliced and fewer unspliced molecules, with a median coSI of 0.75. Thus, most RNAs undergo splicing while being transcribed: "co-transcriptional splicing." Consistent with co-transcriptional spliceosome assembly and splicing, we have found significant enrichment of spliceosomal snRNAs in chromatin-associated RNA compared with other cellular RNA fractions and other nonspliceosomal snRNAs. CoSI scores decrease along the gene, pointing to a "first transcribed, first spliced" rule, yet more downstream exons carry other characteristics, favoring rapid, co-transcriptional intron removal. Exons with low coSI values, that is, in the process of being spliced, are enriched with chromatin marks, consistent with a role for chromatin in splicing during transcription. For alternative exons and long noncoding RNAs, splicing tends to occur later, and the latter might remain unspliced in some cases.

    View details for DOI 10.1101/gr.134445.111

    View details for Web of Science ID 000308272800004

    View details for PubMedID 22955974

  • VAT: a computational framework to functionally annotate variants in personal genomes within a cloud-computing environment BIOINFORMATICS Habegger, L., Balasubramanian, S., Chen, D. Z., Khurana, E., Sboner, A., Harmanci, A., Rozowsky, J., Clarke, D., Snyder, M., Gerstein, M. 2012; 28 (17): 2267-2269

    Abstract

    The functional annotation of variants obtained through sequencing projects is generally assumed to be a simple intersection of genomic coordinates with genomic features. However, complexities arise for several reasons, including the differential effects of a variant on alternatively spliced transcripts, as well as the difficulty in assessing the impact of small insertions/deletions and large structural variants. Taking these factors into consideration, we developed the Variant Annotation Tool (VAT) to functionally annotate variants from multiple personal genomes at the transcript level as well as obtain summary statistics across genes and individuals. VAT also allows visualization of the effects of different variants, integrates allele frequencies and genotype data from the underlying individuals and facilitates comparative analysis between different groups of individuals. VAT can either be run through a command-line interface or as a web application. Finally, in order to enable on-demand access and to minimize unnecessary transfers of large data files, VAT can be run as a virtual machine in a cloud-computing environment. VAT is implemented in C and PHP. The VAT web service, Amazon Machine Image, source code and detailed documentation are available at vat.gersteinlab.org.

    View details for DOI 10.1093/bioinformatics/bts368

    View details for Web of Science ID 000308019200008

    View details for PubMedID 22743228

  • Understanding transcriptional regulation by integrative analysis of transcription factor binding data GENOME RESEARCH Cheng, C., Alexander, R., Min, R., Leng, J., Yip, K. Y., Rozowsky, J., Yan, K., Dong, X., Djebali, S., Ruan, Y., Davis, C. A., Carninci, P., Lassman, T., Gingeras, T. R., Guigo, R., Birney, E., Weng, Z., Snyder, M., Gerstein, M. 2012; 22 (9): 1658-1667

    Abstract

    Statistical models have been used to quantify the relationship between gene expression and transcription factor (TF) binding signals. Here we apply the models to the large-scale data generated by the ENCODE project to study transcriptional regulation by TFs. Our results reveal a notable difference in the prediction accuracy of expression levels of transcription start sites (TSSs) captured by different technologies and RNA extraction protocols. In general, the expression levels of TSSs with high CpG content are more predictable than those with low CpG content. For genes with alternative TSSs, the expression levels of downstream TSSs are more predictable than those of the upstream ones. Different TF categories and specific TFs vary substantially in their contributions to predicting expression. Between two cell lines, the differential expression of TSS can be precisely reflected by the difference of TF-binding signals in a quantitative manner, arguing against the conventional on-and-off model of TF binding. Finally, we explore the relationships between TF-binding signals and other chromatin features such as histone modifications and DNase hypersensitivity for determining expression. The models imply that these features regulate transcription in a highly coordinated manner.

    View details for DOI 10.1101/gr.136838.111

    View details for Web of Science ID 000308272800008

    View details for PubMedID 22955978

  • Sequence features and chromatin structure around the genomic regions bound by 119 human transcription factors GENOME RESEARCH Wang, J., Zhuang, J., Iyer, S., Lin, X., Whitfield, T. W., Greven, M. C., Pierce, B. G., Dong, X., Kundaje, A., Cheng, Y., Rando, O. J., Birney, E., Myers, R. M., Noble, W. S., Snyder, M., Weng, Z. 2012; 22 (9): 1798-1812

    Abstract

    Chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-seq) has become the dominant technique for mapping transcription factor (TF) binding regions genome-wide. We performed an integrative analysis centered around 457 ChIP-seq data sets on 119 human TFs generated by the ENCODE Consortium. We identified highly enriched sequence motifs in most data sets, revealing new motifs and validating known ones. The motif sites (TF binding sites) are highly conserved evolutionarily and show distinct footprints upon DNase I digestion. We frequently detected secondary motifs in addition to the canonical motifs of the TFs, indicating tethered binding and cobinding between multiple TFs. We observed significant position and orientation preferences between many cobinding TFs. Genes specifically expressed in a cell line are often associated with a greater occurrence of nearby TF binding in that cell line. We observed cell-line-specific secondary motifs that mediate the binding of the histone deacetylase HDAC2 and the enhancer-binding protein EP300. TF binding sites are located in GC-rich, nucleosome-depleted, and DNase I sensitive regions, flanked by well-positioned nucleosomes, and many of these features show cell type specificity. The GC-richness may be beneficial for regulating TF binding because, when unoccupied by a TF, these regions are occupied by nucleosomes in vivo. We present the results of our analysis in a TF-centric web repository Factorbook (http://factorbook.org) and will continually update this repository as more ENCODE data are generated.

    View details for DOI 10.1101/gr.139105.112

    View details for Web of Science ID 000308272800020

    View details for PubMedID 22955990

  • A Genome-Scale Resource for In Vivo Tag-Based Protein Function Exploration in C. elegans CELL Sarov, M., Murray, J. I., Schanze, K., Pozniakovski, A., Niu, W., Angermann, K., Hasse, S., Rupprecht, M., Vinis, E., Tinney, M., Preston, E., Zinke, A., Enst, S., Teichgraber, T., Janette, J., Reis, K., Janosch, S., Schloissnig, S., Ejsmont, R. K., Slightam, C., Xu, X., Kim, S. K., Reinke, V., Stewart, A. F., Snyder, M., Waterston, R. H., Hyman, A. A. 2012; 150 (4): 855-866

    Abstract

    Understanding the in vivo dynamics of protein localization and their physical interactions is important for many problems in biology. To enable systematic protein function interrogation in a multicellular context, we built a genome-scale transgenic platform for in vivo expression of fluorescent- and affinity-tagged proteins in Caenorhabditis elegans under endogenous cis regulatory control. The platform combines computer-assisted transgene design, massively parallel DNA engineering, and next-generation sequencing to generate a resource of 14,637 genomic DNA transgenes, which covers 73% of the proteome. The multipurpose tag used allows any protein of interest to be localized in vivo or affinity purified using standard tag-based assays. We illustrate the utility of the resource by systematic chromatin immunopurification and automated 4D imaging, which produced detailed DNA binding and cell/tissue distribution maps for key transcription factor proteins.

    View details for DOI 10.1016/j.cell.2012.08.001

    View details for Web of Science ID 000308002300018

    View details for PubMedID 22901814

  • Discovery of Stress Responsive DNA Regulatory Motifs in Arabidopsis PLOS ONE Ma, S., Bachan, S., Porto, M., Bohnert, H. J., Snyder, M., Dinesh-Kumar, S. P. 2012; 7 (8)

    Abstract

    The discovery of DNA regulatory motifs in the sequenced genomes using computational methods remains challenging. Here, we present MotifIndexer--a comprehensive strategy for de novo identification of DNA regulatory motifs at a genome level. Using word-counting methods, we indexed the existence of every 8-mer oligo composed of bases A, C, G, T, r, y, s, w, m, k, n or 12-mer oligo composed of A, C, G, T, n, in the promoters of all predicted genes of the Arabidopsis thaliana genome and of selected stress-induced co-expressed genes. From this analysis, we identified a number of over-represented motifs. Among these, major critical motifs were identified using a position filter. We used a model based on uniform distribution and the z-scores derived from this model to describe position bias. Interestingly, many motifs showed position bias towards the transcription start site. We extended this model to show biased distribution of motifs in the genomes of both A. thaliana and rice. We also used MotifIndexer to identify conserved motifs in co-expressed gene groups from two Arabidopsis species, A. thaliana and A. lyrata. This new comparative genomics method does not depend on alignments of homologous gene promoter sequences.

    View details for DOI 10.1371/journal.pone.0043198

    View details for Web of Science ID 000307500100069

    View details for PubMedID 22912824

  • Investigating metabolite-protein interactions: An overview of available techniques METHODS Yang, G. X., Li, X., Snyder, M. 2012; 57 (4): 459-466

    Abstract

    Metabolites comprise the molar majority of chemical substances in living cells, and metabolite-protein interactions are expected to be quite common. Many interactions have already been identified and have been shown to be involved in the regulation of different types of cellular processes including signaling events, enzyme activities, protein localizations and interactions. Recent technological advances have greatly facilitated the detection of metabolite-protein interactions at high sensitivity and some of these have been applied on a large scale. In this manuscript, we review the available in vitro, in silico and in vivo technologies for mapping small-molecule-protein interactions. Although some of these were developed for drug-protein interactions they can be applied for mapping metabolite-protein interactions. Information gained from the use of these approaches can be applied to the manipulation of cellular processes and therapeutic applications.

    View details for DOI 10.1016/j.ymeth.2012.06.013

    View details for Web of Science ID 000309625600009

    View details for PubMedID 22750303

  • Patient-Specific Induced Pluripotent Stem Cells as a Model for Familial Dilated Cardiomyopathy SCIENCE TRANSLATIONAL MEDICINE Sun, N., Yazawa, M., Liu, J., Han, L., Sanchez-Freire, V., Abilez, O. J., Navarrete, E. G., Hu, S., Wang, L., Lee, A., Pavlovic, A., Lin, S., Chen, R., Hajjar, R. J., Snyder, M. P., Dolmetsch, R. E., Butte, M. J., Ashley, E. A., Longaker, M. T., Robbins, R. C., Wu, J. C. 2012; 4 (130)

    Abstract

    Characterized by ventricular dilatation, systolic dysfunction, and progressive heart failure, dilated cardiomyopathy (DCM) is the most common form of cardiomyopathy in patients. DCM is the most common diagnosis leading to heart transplantation and places a significant burden on healthcare worldwide. The advent of induced pluripotent stem cells (iPSCs) offers an exceptional opportunity for creating disease-specific cellular models, investigating underlying mechanisms, and optimizing therapy. Here, we generated cardiomyocytes from iPSCs derived from patients in a DCM family carrying a point mutation (R173W) in the gene encoding sarcomeric protein cardiac troponin T. Compared to control healthy individuals in the same family cohort, cardiomyocytes derived from iPSCs from DCM patients exhibited altered regulation of calcium ion (Ca(2+)), decreased contractility, and abnormal distribution of sarcomeric α-actinin. When stimulated with a β-adrenergic agonist, DCM iPSC-derived cardiomyocytes showed characteristics of cellular stress such as reduced beating rates, compromised contraction, and a greater number of cells with abnormal sarcomeric α-actinin distribution. Treatment with β-adrenergic blockers or overexpression of sarcoplasmic reticulum Ca(2+) adenosine triphosphatase (Serca2a) improved the function of iPSC-derived cardiomyocytes from DCM patients. Thus, iPSC-derived cardiomyocytes from DCM patients recapitulate to some extent the morphological and functional phenotypes of DCM and may serve as a useful platform for exploring disease mechanisms and for drug screening.

    View details for DOI 10.1126/scitranslmed.3003552

    View details for Web of Science ID 000303045900004

    View details for PubMedID 22517884

  • A core erythroid transcriptional network is repressed by a master regulator of myelo-lymphoid differentiation PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Wontakal, S. N., Guo, X., Smith, C., MacCarthy, T., Bresnick, E. H., Bergman, A., Snyder, M. P., Weissman, S. M., Zheng, D., Skoultchi, A. I. 2012; 109 (10): 3832-3837

    Abstract

    Two mechanisms that play important roles in cell fate decisions are control of a "core transcriptional network" and repression of alternative transcriptional programs by antagonizing transcription factors. Whether these two mechanisms operate together is not known. Here we report that GATA-1, SCL, and Klf1 form an erythroid core transcriptional network by co-occupying >300 genes. Importantly, we find that PU.1, a negative regulator of terminal erythroid differentiation, is a highly integrated component of this network. GATA-1, SCL, and Klf1 act to promote, whereas PU.1 represses expression of many of the core network genes. PU.1 also represses the genes encoding GATA-1, SCL, Klf1, and important GATA-1 cofactors. Conversely, in addition to repressing PU.1 expression, GATA-1 also binds to and represses >100 PU.1 myelo-lymphoid gene targets in erythroid progenitors. Mathematical modeling further supports that this dual mechanism of repressing both the opposing upstream activator and its downstream targets provides a synergistic, robust mechanism for lineage specification. Taken together, these results amalgamate two key developmental principles, namely, regulation of a core transcriptional network and repression of an alternative transcriptional program, thereby enhancing our understanding of the mechanisms that establish cellular identity.

    View details for DOI 10.1073/pnas.1121019109

    View details for Web of Science ID 000301117700049

    View details for PubMedID 22357756

  • Tcf7 Is an Important Regulator of the Switch of Self-Renewal and Differentiation in a Multipotential Hematopoietic Cell Line PLOS GENETICS Wu, J. Q., Seay, M., Schulz, V. P., Hariharan, M., Tuck, D., Lian, J., Du, J., Shi, M., Ye, Z., Gerstein, M., Snyder, M. P., Weissman, S. 2012; 8 (3)

    Abstract

    A critical problem in biology is understanding how cells choose between self-renewal and differentiation. To generate a comprehensive view of the mechanisms controlling early hematopoietic precursor self-renewal and differentiation, we used systems-based approaches and murine EML multipotential hematopoietic precursor cells as a primary model. EML cells give rise to a mixture of self-renewing Lin-SCA+CD34+ cells and partially differentiated non-renewing Lin-SCA-CD34- cells in a cell autonomous fashion. We identified and validated the HMG box protein TCF7 as a regulator in this self-renewal/differentiation switch that operates in the absence of autocrine Wnt signaling. We found that Tcf7 is the most down-regulated transcription factor when CD34+ cells switch into CD34- cells, using RNA-Seq. We subsequently identified the target genes bound by TCF7, using ChIP-Seq. We show that TCF7 and RUNX1 (AML1) bind to each other's promoter regions and that TCF7 is necessary for the production of the short isoforms, but not the long isoforms of RUNX1, suggesting that TCF7 and the short isoforms of RUNX1 function coordinately in regulation. Tcf7 knock-down experiments and Gene Set Enrichment Analyses suggest that TCF7 plays a dual role in promoting the expression of genes characteristic of self-renewing CD34+ cells while repressing genes activated in partially differentiated CD34- state. Finally a network of up-regulated transcription factors of CD34+ cells was constructed. Factors that control hematopoietic stem cell (HSC) establishment and development, cell growth, and multipotency were identified. These studies in EML cells demonstrate fundamental cell-intrinsic properties of the switch between self-renewal and differentiation, and yield valuable insights for manipulating HSCs and other differentiating systems.

    View details for DOI 10.1371/journal.pgen.1002565

    View details for Web of Science ID 000302254800041

    View details for PubMedID 22412390

  • The Chromosome-Centric Human Proteome Project for cataloging proteins encoded in the genome NATURE BIOTECHNOLOGY Paik, Y., Jeong, S., Omenn, G. S., Uhlen, M., Hanash, S., Cho, S. Y., Lee, H., Na, K., Choi, E., Yan, F., Zhang, F., Zhang, Y., Snyder, M., Cheng, Y., Chen, R., Marko-Varga, G., Deutsch, E. W., Kim, H., Kwon, J., Aebersold, R., Bairoch, A., Taylor, A. D., Kim, K. Y., Lee, E., Hochstrasser, D., Legrain, P., Hancock, W. S. 2012; 30 (3): 221-223

    View details for Web of Science ID 000301303800011

    View details for PubMedID 22398612

  • Correlation of Global MicroRNA Expression With Basal Cell Carcinoma Subtype G3-GENES GENOMES GENETICS Heffelfinger, C., Ouyang, Z., Engberg, A., Leffell, D. J., Hanlon, A. M., Gordon, P. B., Zheng, W., Zhao, H., Snyder, M. P., Bale, A. E. 2012; 2 (2): 279-286

    Abstract

    Basal cell carcinomas (BCCs) are the most common cancers in the United States. The histologic appearance distinguishes several subtypes, each of which can have a different biologic behavior. In this study, global miRNA expression was quantified by high-throughput sequencing in nodular BCCs, a subtype that is slow growing, and infiltrative BCCs, aggressive tumors that extend through the dermis and invade structures such as cutaneous nerves. Principal components analysis correctly classified seven of eight infiltrative tumors on the basis of miRNA expression. The remaining tumor, on pathology review, contained a mixture of nodular and infiltrative elements. Nodular tumors did not cluster tightly, likely reflecting broader histopathologic diversity in this class, but trended toward forming a group separate from infiltrative BCCs. Quantitative polymerase chain reaction assays were developed for six of the miRNAs that showed significant differences between the BCC subtypes, and five of these six were validated in a replication set of four infiltrative and three nodular tumors. The expression level of miR-183, a miRNA that inhibits invasion and metastasis in several types of malignancies, was consistently lower in infiltrative than nodular tumors and could be one element underlying the difference in invasiveness. These results represent the first miRNA profiling study in BCCs and demonstrate that miRNA gene expression may be involved in tumor pathogenesis and particularly in determining the aggressiveness of these malignancies.

    View details for DOI 10.1534/g3.111.001115

    View details for Web of Science ID 000312411000015

    View details for PubMedID 22384406

  • An encyclopedia of mouse DNA elements (Mouse ENCODE). Genome biology Stamatoyannopoulos, J. A., Snyder, M., Hardison, R., Ren, B., Gingeras, T., Gilbert, D. M., Groudine, M., Bender, M., Kaul, R., Canfield, T., Giste, E., Johnson, A., Zhang, M., Balasundaram, G., Byron, R., Roach, V., Sabo, P. J., Sandstrom, R., Stehling, A. S., Thurman, R. E., Weissman, S. M., Cayting, P., Hariharan, M., Lian, J., Cheng, Y., Landt, S. G., Ma, Z., Wold, B. J., Dekker, J., Crawford, G. E., Keller, C. A., Wu, W., Morrissey, C., Kumar, S. A., Mishra, T., Jain, D., Byrska-Bishop, M., Blankenberg, D., Lajoie, B. R., Jain, G., Sanyal, A., Chen, K. B., Denas, O., Taylor, J., Blobel, G. A., Weiss, M. J., Pimkin, M., Deng, W., Marinov, G. K., Williams, B. A., Fisher-Aylor, K. I., Desalvo, G., Kiralusha, A., Trout, D., Amrhein, H., Mortazavi, A., Edsall, L., McCleary, D., Kuan, S., Shen, Y., Yue, F., Ye, Z., Davis, C. A., Zaleski, C., Jha, S., Xue, C., Dobin, A., Lin, W., Fastuca, M., Wang, H., Guigo, R., Djebali, S., Lagarde, J., Ryba, T., Sasaki, T., Malladi, V. S., Cline, M. S., Kirkup, V. M., Learned, K., Rosenbloom, K. R., Kent, W. J., Feingold, E. A., Good, P. J., Pazin, M., Lowdon, R. F., Adams, L. B. 2012; 13 (8): 418

    Abstract

    To complement the human Encyclopedia of DNA Elements (ENCODE) project and to enable a broad range of mouse genomics efforts, the Mouse ENCODE Consortium is applying the same experimental pipelines developed for human ENCODE to annotate the mouse genome.

    View details for PubMedID 22889292

  • Q & A: the Snyderome GENOME BIOLOGY Snyder, M. 2012; 13 (3)

    Abstract

    Michael Snyder answers Genome Biology's questions on the human and professional stories underlying his Snyderome integrative omics project.

    View details for DOI 10.1186/gb-2012-13-3-147

    View details for Web of Science ID 000308544200010

    View details for PubMedID 22424393

  • Phosphorylation of Yeast Transcription Factors Correlates with the Evolution of Novel Sequence and Function JOURNAL OF PROTEOME RESEARCH Kaganovich, M., Snyder, M. 2012; 11 (1): 261-268

    Abstract

    Gene duplication is a significant source of novel genes and the dynamics of gene duplicate retention vs loss are poorly understood, particularly in terms of the functional and regulatory specialization of their gene products. We compiled a comprehensive data set of S. cerevisiae phosphosites to study the role of phosphorylation in yeast paralog divergence. We found that proteins coded by duplicated genes created in the Whole Genome Duplication (WGD) event and in a period prior to the WGD are significantly more phosphorylated than other duplicates or singletons. Though the amino acid sequence of each paralog of a given pair tends to diverge fairly similarly from their common ortholog in a related species, the phosphorylated amino acids tend to diverge in sequence from the ortholog at different rates. We observed that transcription factors (TFs) are disproportionately present among the set of duplicate genes and among the set of proteins that are phosphorylated. Interestingly, TFs that occur on higher levels of the transcription network hierarchy (i.e., tend to regulate other TFs) tend to be more phosphorylated than lower-level TFs. We found that TF paralog divergence in expression, binding, and sequence correlates with the abundance of phosphosites. Overall, these studies have important implications for understanding divergence of gene function and regulation in eukaryotes.

    View details for DOI 10.1021/pr201065k

    View details for Web of Science ID 000298827700024

    View details for PubMedID 22141333

  • Interpretome: a freely available, modular, and secure personal genome interpretation engine. Pacific Symposium on Biocomputing Karczewski, K. J., Tirrell, R. P., Cordero, P., Tatonetti, N. P., Dudley, J. T., Salari, K., Snyder, M., Altman, R. B., Kim, S. K. 2012: 339-350

    Abstract

    The decreasing cost of genotyping and genome sequencing has ushered in an era of genomic personalized medicine. More than 100,000 individuals have been genotyped by direct-to-consumer genetic testing services, which offer a glimpse into the interpretation and exploration of a personal genome. However, these interpretations, which require extensive manual curation, are subject to the preferences of the company and are not customizable by the individual. Academic institutions teaching personalized medicine, as well as genetic hobbyists, may prefer to customize their analysis and have full control over the content and method of interpretation. We present the Interpretome, a system for private genome interpretation, which contains all genotype information in client-side interpretation scripts, supported by server-side databases. We provide state-of-the-art analyses for teaching clinical implications of personal genomics, including disease risk assessment and pharmacogenomics. Additionally, we have implemented client-side algorithms for ancestry inference, demonstrating the power of these methods without excessive computation. Finally, the modular nature of the system allows for plugin capabilities for custom analyses. This system will allow for personal genome exploration without compromising privacy, facilitating hands-on courses in genomics and personalized medicine.

    View details for PubMedID 22174289

  • Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors GENOME BIOLOGY Yip, K. Y., Cheng, C., Bhardwaj, N., Brown, J. B., Leng, J., Kundaje, A., Rozowsky, J., Birney, E., Bickel, P., Snyder, M., Gerstein, M. 2012; 13 (9)

    Abstract

    Transcription factors function by binding different classes of regulatory elements. The Encyclopedia of DNA Elements (ENCODE) project has recently produced binding data for more than 100 transcription factors from about 500 ChIP-seq experiments in multiple cell types. While this large amount of data creates a valuable resource, it is nonetheless overwhelmingly complex and simultaneously incomplete since it covers only a small fraction of all human transcription factors. As part of the consortium effort in providing a concise abstraction of the data for facilitating various types of downstream analyses, we constructed statistical models that capture the genomic features of three paired types of regions by machine-learning methods: firstly, regions with active or inactive binding; secondly, those with extremely high or low degrees of co-binding, termed HOT and LOT regions; and finally, regulatory modules proximal or distal to genes. From the distal regulatory modules, we developed computational pipelines to identify potential enhancers, many of which were validated experimentally. We further associated the predicted enhancers with potential target transcripts and the transcription factors involved. For HOT regions, we found a significant fraction of transcription factor binding without clear sequence motifs and showed that this observation could be related to strong DNA accessibility of these regions. Overall, the three pairs of regions exhibit intricate differences in chromosomal locations, chromatin features, factors that bind them, and cell-type specificity. Our machine learning approach enables us to identify features potentially general to all transcription factors, including those not included in the data.

    View details for DOI 10.1186/gb-2012-13-9-r48

    View details for Web of Science ID 000313182600001

    View details for PubMedID 22950945

  • Characterization of Enhancer Function from Genome-Wide Analyses ANNUAL REVIEW OF GENOMICS AND HUMAN GENETICS, VOL 13 Maston, G. A., Landt, S. G., Snyder, M., Green, M. R. 2012; 13: 29-57

    Abstract

    There has been a recent surge in the use of genome-wide methodologies to identify and annotate the transcriptional regulatory elements in the human genome. Here we review some of these methodologies and the conceptual insights about transcription regulation that have been gained from the use of genome-wide studies. It has become clear that the binding of transcription factors is itself a highly regulated process, and binding does not always appear to have functional consequences. Numerous properties have now been associated with regulatory elements that may be useful in their identification. Several aspects of enhancer function have been shown to be more widespread than was previously appreciated, including the highly combinatorial nature of transcription factor binding, the postinitiation regulation of many target genes, and the binding of enhancers at early stages to maintain their competence during development. Going forward, the integration of multiple genome-wide data sets should become a standard approach to elucidate higher-order regulatory interactions.

    View details for DOI 10.1146/annurev-genom-090711-163723

    View details for Web of Science ID 000310143800002

    View details for PubMedID 22703170

  • A High-Resolution Whole-Genome Map of Key Chromatin Modifications in the Adult Drosophila melanogaster PLOS GENETICS Yin, H., Sweeney, S., Raha, D., Snyder, M., Lin, H. 2011; 7 (12)

    Abstract

    Epigenetic research has been focused on cell-type-specific regulation; less is known about common features of epigenetic programming shared by diverse cell types within an organism. Here, we report a modified method for chromatin immunoprecipitation and deep sequencing (ChIP-Seq) and its use to construct a high-resolution map of the Drosophila melanogaster key histone marks, heterochromatin protein 1a (HP1a) and RNA polymerase II (polII). These factors are mapped at 50-bp resolution genome-wide and at 5-bp resolution for regulatory sequences of genes, which reveals fundamental features of chromatin modification landscape shared by major adult Drosophila cell types: the enrichment of both heterochromatic and euchromatic marks in transposons and repetitive sequences, the accumulation of HP1a at transcription start sites with stalled polII, the signatures of histone code and polII level/position around the transcriptional start sites that predict both the mRNA level and functionality of genes, and the enrichment of elongating polII within exons at splicing junctions. These features, likely conserved among diverse epigenomes, reveal general strategies for chromatin modifications.

    View details for DOI 10.1371/journal.pgen.1002380

    View details for Web of Science ID 000299167900003

    View details for PubMedID 22194694

  • Genome-Wide Mapping of Copy Number Variation in Humans: Comparative Analysis of High Resolution Array Platforms PLOS ONE Haraksingh, R. R., Abyzov, A., Gerstein, M., Urban, A. E., Snyder, M. 2011; 6 (11)

    Abstract

    Accurate and efficient genome-wide detection of copy number variants (CNVs) is essential for understanding human genomic variation, genome-wide CNV association type studies, cytogenetics research and diagnostics, and independent validation of CNVs identified from sequencing based technologies. Numerous, array-based platforms for CNV detection exist utilizing array Comparative Genome Hybridization (aCGH), Single Nucleotide Polymorphism (SNP) genotyping or both. We have quantitatively assessed the abilities of twelve leading genome-wide CNV detection platforms to accurately detect Gold Standard sets of CNVs in the genome of HapMap CEU sample NA12878, and found significant differences in performance. The technologies analyzed were the NimbleGen 4.2 M, 2.1 M and 3×720 K Whole Genome and CNV focused arrays, the Agilent 1×1 M CGH and High Resolution and 2×400 K CNV and SNP+CGH arrays, the Illumina Human Omni1Quad array and the Affymetrix SNP 6.0 array. The Gold Standards used were a 1000 Genomes Project sequencing-based set of 3997 validated CNVs and an ultra high-resolution aCGH-based set of 756 validated CNVs. We found that sensitivity, total number, size range and breakpoint resolution of CNV calls were highest for CNV focused arrays. Our results are important for cost effective CNV detection and validation for both basic and clinical applications.

    View details for DOI 10.1371/journal.pone.0027859

    View details for Web of Science ID 000298168100021

    View details for PubMedID 22140474

  • Construction and Analysis of an Integrated Regulatory Network Derived from High-Throughput Sequencing Data PLOS COMPUTATIONAL BIOLOGY Cheng, C., Yan, K., Hwang, W., Qian, J., Bhardwaj, N., Rozowsky, J., Lu, Z. J., Niu, W., Alves, P., Kato, M., Snyder, M., Gerstein, M. 2011; 7 (11)

    Abstract

    We present a network framework for analyzing multi-level regulation in higher eukaryotes based on systematic integration of various high-throughput datasets. The network, namely the integrated regulatory network, consists of three major types of regulation: TF→gene, TF→miRNA and miRNA→gene. We identified the target genes and target miRNAs for a set of TFs based on the ChIP-Seq binding profiles, the predicted targets of miRNAs using annotated 3'UTR sequences and conservation information. Making use of the system-wide RNA-Seq profiles, we classified transcription factors into positive and negative regulators and assigned a sign for each regulatory interaction. Other types of edges such as protein-protein interactions and potential intra-regulations between miRNAs based on the embedding of miRNAs in their host genes were further incorporated. We examined the topological structures of the network, including its hierarchical organization and motif enrichment. We found that transcription factors downstream of the hierarchy distinguish themselves by expressing more uniformly at various tissues, have more interacting partners, and are more likely to be essential. We found an over-representation of notable network motifs, including a FFL in which a miRNA cost-effectively shuts down a transcription factor and its target. We used data of C. elegans from the modENCODE project as a primary model to illustrate our framework, but further verified the results using other two data sets. As more and more genome-wide ChIP-Seq and RNA-Seq data becomes available in the near future, our methods of data integration have various potential applications.

    View details for DOI 10.1371/journal.pcbi.1002190

    View details for Web of Science ID 000297263700001

    View details for PubMedID 22125477

  • Performance comparison of exome DNA sequencing technologies NATURE BIOTECHNOLOGY Clark, M. J., Chen, R., Lam, H. Y., Karczewski, K. J., Chen, R., Euskirchen, G., Butte, A. J., Snyder, M. 2011; 29 (10): 908-U206

    Abstract

    Whole exome sequencing by high-throughput sequencing of target-enriched genomic DNA (exome-seq) has become common in basic and translational research as a means of interrogating the interpretable part of the human genome at relatively low cost. We present a comparison of three major commercial exome sequencing platforms from Agilent, Illumina and Nimblegen applied to the same human blood sample. Our results suggest that the Nimblegen platform, which is the only one to use high-density overlapping baits, covers fewer genomic regions than the other platforms but requires the least amount of sequencing to sensitively detect small variants. Agilent and Illumina are able to detect a greater total number of variants with additional sequencing. Illumina captures untranslated regions, which are not targeted by the Nimblegen and Agilent platforms. We also compare exome sequencing and whole genome sequencing (WGS) of the same sample, demonstrating that exome sequencing can detect additional small variants missed by WGS.

    View details for DOI 10.1038/nbt.1975

    View details for Web of Science ID 000296273000017

    View details for PubMedID 21947028

  • Phased Whole-Genome Genetic Risk in a Family Quartet Using a Major Allele Reference Sequence PLOS GENETICS Dewey, F. E., Chen, R., Cordero, S. P., Ormond, K. E., Caleshu, C., Karczewski, K. J., Whirl-Carrillo, M., Wheeler, M. T., Dudley, J. T., Byrnes, J. K., Cornejo, O. E., Knowles, J. W., Woon, M., Sangkuhl, K., Gong, L., Thorn, C. F., Hebert, J. M., Capriotti, E., David, S. P., Pavlovic, A., West, A., Thakuria, J. V., Ball, M. P., Zaranek, A. W., Rehm, H. L., Church, G. M., West, J. S., Bustamante, C. D., Snyder, M., Altman, R. B., Klein, T. E., Butte, A. J., Ashley, E. A. 2011; 7 (9)

    Abstract

    Whole-genome sequencing harbors unprecedented potential for characterization of individual and family genetic variation. Here, we develop a novel synthetic human reference sequence that is ethnically concordant and use it for the analysis of genomes from a nuclear family with history of familial thrombophilia. We demonstrate that the use of the major allele reference sequence results in improved genotype accuracy for disease-associated variant loci. We infer recombination sites to the lowest median resolution demonstrated to date (< 1,000 base pairs). We use family inheritance state analysis to control sequencing error and inform family-wide haplotype phasing, allowing quantification of genome-wide compound heterozygosity. We develop a sequence-based methodology for Human Leukocyte Antigen typing that contributes to disease risk prediction. Finally, we advance methods for analysis of disease and pharmacogenomic risk across the coding and non-coding genome that incorporate phased variant data. We show these methods are capable of identifying multigenic risk for inherited thrombophilia and informing the appropriate pharmacological therapy. These ethnicity-specific, family-based approaches to interpretation of genetic variation are emblematic of the next generation of genetic risk assessment using whole-genome sequencing.

    View details for DOI 10.1371/journal.pgen.1002280

    View details for Web of Science ID 000295419100031

    View details for PubMedID 21935354

  • Arabidopsis RTNLB1 and RTNLB2 Reticulon-Like Proteins Regulate Intracellular Trafficking and Activity of the FLS2 Immune Receptor PLANT CELL Lee, H. Y., Bowen, C. H., Popescu, G. V., Kang, H., Kato, N., Ma, S., Dinesh-Kumar, S., Snyder, M., Popescu, S. C. 2011; 23 (9): 3374-3391

    Abstract

    Receptors localized at the plasma membrane are critical for the recognition of pathogens. The molecular determinants that regulate receptor transport to the plasma membrane are poorly understood. In a screen for proteins that interact with the FLAGELLIN-SENSITIVE2 (FLS2) receptor using Arabidopsis thaliana protein microarrays, we identified the reticulon-like protein RTNLB1. We showed that FLS2 interacts in vivo with both RTNLB1 and its homolog RTNLB2 and that a Ser-rich region in the N-terminal tail of RTNLB1 is critical for the interaction with FLS2. Transgenic plants that lack RTNLB1 and RTNLB2 (rtnlb1 rtnlb2) or overexpress RTNLB1 (RTNLB1ox) exhibit reduced activation of FLS2-dependent signaling and increased susceptibility to pathogens. In both rtnlb1 rtnlb2 and RTNLB1ox, FLS2 accumulation at the plasma membrane was significantly affected compared with the wild type. Transient overexpression of RTNLB1 led to FLS2 retention in the endoplasmic reticulum (ER) and affected FLS2 glycosylation but not FLS2 stability. Removal of the critical N-terminal Ser-rich region or either of the two Tyr-dependent sorting motifs from RTNLB1 causes partial reversion of the negative effects of excess RTNLB1 on FLS2 transport out of the ER and accumulation at the membrane. The results are consistent with a model whereby RTNLB1 and RTNLB2 regulate the transport of newly synthesized FLS2 to the plasma membrane.

    View details for DOI 10.1105/tpc.111.089656

    View details for Web of Science ID 000296739100025

    View details for PubMedID 21949153

  • Cooperative transcription factor associations discovered using regulatory variation PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Karczewski, K. J., Tatonetti, N. P., Landt, S. G., Yang, X., Slifer, T., Altman, R. B., Snyder, M. 2011; 108 (32): 13353-13358

    Abstract

    Regulation of gene expression at the transcriptional level is achieved by complex interactions of transcription factors operating at their target genes. Dissecting the specific combination of factors that bind each target is a significant challenge. Here, we describe in detail the Allele Binding Cooperativity test, which uses variation in transcription factor binding among individuals to discover combinations of factors and their targets. We developed the ALPHABIT (a large-scale process to hunt for allele binding interacting transcription factors) pipeline, which includes statistical analysis of binding sites followed by experimental validation, and demonstrate that this method predicts transcription factors that associate with NF-κB. Our method successfully identifies factors that have been known to work with NF-κB (E2A, STAT1, IRF2), but whose global coassociation and sites of cooperative action were not known. In addition, we identify a unique coassociation (EBF1) that had not been reported previously. We present a general approach for discovering combinatorial models of regulation and advance our understanding of the genetic basis of variation in transcription factor binding.

    View details for DOI 10.1073/pnas.1103105108

    View details for Web of Science ID 000293691400076

    View details for PubMedID 21828005

  • A Comprehensive Map of Mobile Element Insertion Polymorphisms in Humans PLOS GENETICS Stewart, C., Kural, D., Stroemberg, M. P., Walker, J. A., Konkel, M. K., Stuetz, A. M., Urban, A. E., Grubert, F., Lam, H. Y., Lee, W., Busby, M., Indap, A. R., Garrison, E., Huff, C., Xing, J., Snyder, M. P., Jorde, L. B., Batzer, M. A., Korbel, J. O., Marth, G. T. 2011; 7 (8)

    Abstract

    As a consequence of the accumulation of insertion events over evolutionary time, mobile elements now comprise nearly half of the human genome. The Alu, L1, and SVA mobile element families are still duplicating, generating variation between individual genomes. Mobile element insertions (MEI) have been identified as causes for genetic diseases, including hemophilia, neurofibromatosis, and various cancers. Here we present a comprehensive map of 7,380 MEI polymorphisms from the 1000 Genomes Project whole-genome sequencing data of 185 samples in three major populations detected with two detection methods. This catalog enables us to systematically study mutation rates, population segregation, genomic distribution, and functional properties of MEI polymorphisms and to compare MEI to SNP variation from the same individuals. Population allele frequencies of MEI and SNPs are described, broadly, by the same neutral ancestral processes despite vastly different mutation mechanisms and rates, except in coding regions where MEI are virtually absent, presumably due to strong negative selection. A direct comparison of MEI and SNP diversity levels suggests a differential mobile element insertion rate among populations.

    View details for DOI 10.1371/journal.pgen.1002236

    View details for Web of Science ID 000294297000031

    View details for PubMedID 21876680

  • AlleleSeq: analysis of allele-specific expression and binding in a network framework MOLECULAR SYSTEMS BIOLOGY Rozowsky, J., Abyzov, A., Wang, J., Alves, P., Raha, D., Harmanci, A., Leng, J., Bjornson, R., Kong, Y., Kitabayashi, N., Bhardwaj, N., Rubin, M., Snyder, M., Gerstein, M. 2011; 7

    Abstract

    To study allele-specific expression (ASE) and binding (ASB), that is, differences between the maternally and paternally derived alleles, we have developed a computational pipeline (AlleleSeq). Our pipeline initially constructs a diploid personal genome sequence (and corresponding personalized gene annotation) using genomic sequence variants (SNPs, indels, and structural variants), and then identifies allele-specific events with significant differences in the number of mapped reads between maternal and paternal alleles. There are many technical challenges in the construction and alignment of reads to a personal diploid genome sequence that we address, for example, bias of reads mapping to the reference allele. We have applied AlleleSeq to variation data for NA12878 from the 1000 Genomes Project as well as matched, deeply sequenced RNA-Seq and ChIP-Seq data sets generated for this purpose. In addition to observing fairly widespread allele-specific behavior within individual functional genomic data sets (including results consistent with X-chromosome inactivation), we can study the interaction between ASE and ASB. Furthermore, we investigate the coordination between ASE and ASB from multiple transcription factors events using a regulatory network framework. Correlation analyses and network motifs show mostly coordinated ASB and ASE.
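
    The read-count comparison described in this abstract can be illustrated with a small sketch. The abstract does not say which statistical test the pipeline uses, so the exact two-sided binomial test below is an assumption chosen for illustration, and the function name `allele_imbalance_p` is hypothetical rather than part of AlleleSeq. It asks how surprising a maternal/paternal read split would be if both alleles were equally likely to produce a read.

```python
from math import comb


def allele_imbalance_p(maternal_reads: int, paternal_reads: int,
                       expected_maternal: float = 0.5) -> float:
    """Exact two-sided binomial p-value for an allelic read imbalance.

    Illustrative assumption: under the null hypothesis each read comes
    from the maternal allele with probability `expected_maternal`.
    """
    n = maternal_reads + paternal_reads
    p = expected_maternal
    # Probability of every possible maternal read count from 0 to n.
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[maternal_reads]
    # Sum the outcomes that are no more likely than the observed one
    # (small tolerance guards against floating-point ties).
    total = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, total)


if __name__ == "__main__":
    # A balanced site and a strongly skewed site (made-up counts).
    print(allele_imbalance_p(5, 5))
    print(allele_imbalance_p(10, 0))
```

    In practice many sites are tested at once, so a multiple-testing correction would normally follow; that step is omitted here.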

    View details for DOI 10.1038/msb.2011.54

    View details for Web of Science ID 000294537800003

    View details for PubMedID 21811232

  • Identification of genomic indels and structural variations using split reads BMC GENOMICS Zhang, Z. D., Du, J., Lam, H., Abyzov, A., Urban, A. E., Snyder, M., Gerstein, M. 2011; 12

    Abstract

    Recent studies have demonstrated the genetic significance of insertions, deletions, and other more complex structural variants (SVs) in the human population. With the development of the next-generation sequencing technologies, high-throughput surveys of SVs on the whole-genome level have become possible. Here we present split-read identification, calibrated (SRiC), a sequence-based method for SV detection. We start by mapping each read to the reference genome in standard fashion using gapped alignment. Then to identify SVs, we score each of the many initial mappings with an assessment strategy designed to take into account both sequencing and alignment errors (e.g. scoring more highly events gapped in the center of a read). All current SV calling methods have multilevel biases in their identifications due to both experimental and computational limitations (e.g. calling more deletions than insertions). A key aspect of our approach is that we calibrate all our calls against synthetic data sets generated from simulations of high-throughput sequencing (with realistic error models). This allows us to calculate sensitivity and the positive predictive value under different parameter-value scenarios and for different classes of events (e.g. long deletions vs. short insertions). We run our calculations on representative data from the 1000 Genomes Project. Coupling the observed numbers of events on chromosome 1 with the calibrations gleaned from the simulations (for different length events) allows us to construct a relatively unbiased estimate for the total number of SVs in the human genome across a wide range of length scales. We estimate in particular that an individual genome contains ~670,000 indels/SVs. Compared with the existing read-depth and read-pair approaches for SV identification, our method can pinpoint the exact breakpoints of SV events, reveal the actual sequence content of insertions, and cover the whole size spectrum for deletions. Moreover, with the advent of the third-generation sequencing technologies that produce longer reads, we expect our method to be even more useful.

    View details for DOI 10.1186/1471-2164-12-375

    View details for Web of Science ID 000294205500001

    View details for PubMedID 21787423
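
    The calibration idea in the abstract above (combining observed call counts with sensitivity and positive predictive value from simulations) can be illustrated with a minimal sketch. This is not the SRiC implementation: the function name, the event classes, and every number below are hypothetical placeholders, and the correction formula (observed × PPV / sensitivity) is one simple way such a calibration might be applied.

```python
# Hypothetical illustration only: names and numbers are assumptions,
# not values or code from the SRiC paper.

def calibrated_count(observed: int, ppv: float, sensitivity: float) -> float:
    """Estimate the true number of events from an observed call count.

    observed * ppv    -> calls expected to be real (discounts false positives)
    ... / sensitivity -> scales up for real events the caller missed
    """
    if not (0.0 <= ppv <= 1.0) or not (0.0 < sensitivity <= 1.0):
        raise ValueError("ppv must be in [0, 1] and sensitivity in (0, 1]")
    return observed * ppv / sensitivity


# event class -> (observed calls, PPV, sensitivity); placeholder values.
classes = {
    "short deletion": (1000, 0.90, 0.60),
    "short insertion": (400, 0.80, 0.40),
}

# Apply the correction per class, then sum across classes.
estimates = {name: calibrated_count(*vals) for name, vals in classes.items()}
total = sum(estimates.values())
print(estimates, total)
```

    Correcting each event class separately matters because, as the abstract notes, sensitivity and PPV differ between classes (for example long deletions versus short insertions).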

  • Metabolites as global regulators: A new view of protein regulation BIOESSAYS Li, X., Snyder, M. 2011; 33 (7): 485-489

    View details for DOI 10.1002/bies.201100026

    View details for Web of Science ID 000292710500002

    View details for PubMedID 21495048

  • The Human Proteome Project: Current State and Future Direction MOLECULAR & CELLULAR PROTEOMICS Legrain, P., Aebersold, R., Archakov, A., Bairoch, A., Bala, K., Beretta, L., Bergeron, J., Borchers, C. H., Corthals, G. L., Costello, C. E., Deutsch, E. W., Domon, B., Hancock, W., He, F., Hochstrasser, D., Marko-Varga, G., Salekdeh, G. H., Sechi, S., Snyder, M., Srivastava, S., Uhlen, M., Wu, C. H., Yamamoto, T., Paik, Y., Omenn, G. S. 2011; 10 (7)

    Abstract

    After the successful completion of the Human Genome Project, the Human Proteome Organization has recently officially launched a global Human Proteome Project (HPP), which is designed to map the entire human protein set. Given the lack of protein-level evidence for about 30% of the estimated 20,300 protein-coding genes, a systematic global effort will be necessary to achieve this goal with respect to protein abundance, distribution, subcellular localization, interaction with other biomolecules, and functions at specific time points. As a general experimental strategy, HPP research groups will use the three working pillars for HPP: mass spectrometry, antibody capture, and bioinformatics tools and knowledge bases. The HPP participants will take advantage of the output and cross-analyses from the ongoing Human Proteome Organization initiatives and a chromosome-centric protein mapping strategy, termed C-HPP, with which many national teams are currently engaged. In addition, numerous biologically driven and disease-oriented projects will be stimulated and facilitated by the HPP. Timely planning with proper governance of HPP will deliver a protein parts list, reagents, and tools for protein studies and analyses, and a stronger basis for personalized medicine. The Human Proteome Organization urges each national research funding agency and the scientific community at large to identify their preferred pathways to participate in aspects of this highly promising project in a HPP consortium of funders and investigators.

    View details for DOI 10.1074/mcp.M111.009993

    View details for Web of Science ID 000292541500012

    View details for PubMedID 21742803

  • Landscape of Next-Generation Sequencing Technologies ANALYTICAL CHEMISTRY Niedringhaus, T. P., Milanova, D., Kerby, M. B., Snyder, M. P., Barron, A. E. 2011; 83 (12): 4327-4341

    View details for DOI 10.1021/ac2010857

    View details for Web of Science ID 000291499800001

    View details for PubMedID 21612267

  • A Large Gene Network in Immature Erythroid Cells Is Controlled by the Myeloid and B Cell Transcriptional Regulator PU.1 PLOS GENETICS Wontakal, S. N., Guo, X., Will, B., Shi, M., Raha, D., Mahajan, M. C., Weissman, S., Snyder, M., Steidl, U., Zheng, D., Skoultchi, A. I. 2011; 7 (6)

    Abstract

    PU.1 is a hematopoietic transcription factor that is required for the development of myeloid and B cells. PU.1 is also expressed in erythroid progenitors, where it blocks erythroid differentiation by binding to and inhibiting the main erythroid promoting factor, GATA-1. However, other mechanisms by which PU.1 affects the fate of erythroid progenitors have not been thoroughly explored. Here, we used ChIP-Seq analysis for PU.1 and gene expression profiling in erythroid cells to show that PU.1 regulates an extensive network of genes that constitute major pathways for controlling growth and survival of immature erythroid cells. By analyzing fetal liver erythroid progenitors from mice with low PU.1 expression, we also show that the earliest erythroid committed cells are dramatically reduced in vivo. Furthermore, we find that PU.1 also regulates many of the same genes and pathways in other blood cells, leading us to propose that PU.1 is a multifaceted factor with overlapping, as well as distinct, functions in several hematopoietic lineages.

    View details for DOI 10.1371/journal.pgen.1001392

    View details for Web of Science ID 000292386300004

    View details for PubMedID 21695229

  • CNVnator: An approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing GENOME RESEARCH Abyzov, A., Urban, A. E., Snyder, M., Gerstein, M. 2011; 21 (6): 974-984

    Abstract

    Copy number variation (CNV) in the genome is a complex phenomenon, and not completely understood. We have developed a method, CNVnator, for CNV discovery and genotyping from read-depth (RD) analysis of personal genome sequencing. Our method is based on combining the established mean-shift approach with additional refinements (multiple-bandwidth partitioning and GC correction) to broaden the range of discovered CNVs. We calibrated CNVnator using the extensive validation performed by the 1000 Genomes Project. Because of this, we could use CNVnator for CNV discovery and genotyping in a population and characterization of atypical CNVs, such as de novo and multi-allelic events. Overall, for CNVs accessible by RD, CNVnator has high sensitivity (86%-96%), low false-discovery rate (3%-20%), high genotyping accuracy (93%-95%), and high resolution in breakpoint discovery (<200 bp in 90% of cases with high sequencing coverage). Furthermore, CNVnator is complementary in a straightforward way to split-read and read-pair approaches: It misses CNVs created by retrotransposable elements, but more than half of the validated CNVs that it identifies are not detected by split-read or read-pair. By genotyping CNVs in the CEPH, Yoruba, and Chinese-Japanese populations, we estimated that at least 11% of all CNV loci involve complex, multi-allelic events, a considerably higher estimate than reported earlier. Moreover, among these events, we observed cases with allele distribution strongly deviating from Hardy-Weinberg equilibrium, possibly implying selection on certain complex loci. Finally, by combining discovery and genotyping, we identified six potential de novo CNVs in two family trios.

    View details for DOI 10.1101/gr.114876.110

    View details for Web of Science ID 000291153400017

    View details for PubMedID 21324876
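
    The abstract above mentions GC correction of read depth without giving a formula. As a rough sketch, one common form of such a correction rescales each bin's read depth by the ratio of the genome-wide mean depth to the mean depth of bins with the same GC content. The function name and the toy data are hypothetical, and this is an assumption about the general technique rather than CNVnator's actual code.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sketch of a GC correction for binned read depth (RD).
# Assumed formula: RD_corrected = RD_raw * global_mean_RD / mean_RD(bins with same GC%).

def gc_correct(read_depth, gc_percent):
    """Return GC-corrected read depth for each bin."""
    if len(read_depth) != len(gc_percent):
        raise ValueError("read_depth and gc_percent must have the same length")
    global_mean = mean(read_depth)

    # Group raw depths by the GC percentage of their bin.
    by_gc = defaultdict(list)
    for rd, gc in zip(read_depth, gc_percent):
        by_gc[gc].append(rd)
    gc_mean = {gc: mean(values) for gc, values in by_gc.items()}

    # Rescale each bin; bins in a GC class with zero mean depth stay at zero.
    return [
        rd * global_mean / gc_mean[gc] if gc_mean[gc] > 0 else 0.0
        for rd, gc in zip(read_depth, gc_percent)
    ]


# Toy data: bins with 60% GC have systematically higher depth than 40% GC bins.
raw = [10, 20, 30, 60]
gc = [40, 40, 60, 60]
corrected = gc_correct(raw, gc)
print(corrected)
```

    After correction, every GC class has the same mean depth, so remaining differences between bins are less likely to reflect GC bias alone.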

  • Genome-wide chromatin occupancy analysis reveals a role for ASH2 in transcriptional pausing NUCLEIC ACIDS RESEARCH Perez-Lluch, S., Blanco, E., Carbonell, A., Raha, D., Snyder, M., Serras, F., Corominas, M. 2011; 39 (11): 4628-4639

    Abstract

    An important mechanism for gene regulation involves chromatin changes via histone modification. One such modification is histone H3 lysine 4 trimethylation (H3K4me3), which requires histone methyltransferase complexes (HMT) containing the trithorax-group (trxG) protein ASH2. Mutations in ash2 cause a variety of pattern formation defects in the Drosophila wing. We have identified genome-wide binding of ASH2 in wing imaginal discs using chromatin immunoprecipitation combined with sequencing (ChIP-Seq). Our results show that genes with functions in development and transcriptional regulation are activated by ASH2 via H3K4 trimethylation in nearby nucleosomes. We have characterized the occupancy of phosphorylated forms of RNA Polymerase II and histone marks associated with activation and repression of transcription. ASH2 occupancy correlates with phosphorylated forms of RNA Polymerase II and histone activating marks in expressed genes. Additionally, RNA Polymerase II phosphorylation on serine 5 and H3K4me3 are reduced in ash2 mutants in comparison to wild-type flies. Finally, we have identified specific motifs associated with ASH2 binding in genes that are differentially expressed in ash2 mutants. Our data suggest that recruitment of the ASH2-containing HMT complexes is context specific and point to a function of ASH2 and H3K4me3 in transcriptional pausing control.

    View details for DOI 10.1093/nar/gkq1322

    View details for Web of Science ID 000291755000015

    View details for PubMedID 21310711

  • Diverse protein kinase interactions identified by protein microarrays reveal novel connections between cellular processes GENES & DEVELOPMENT Fasolo, J., Sboner, A., Sun, M. G., Yu, H., Chen, R., Sharon, D., Kim, P. M., Gerstein, M., Snyder, M. 2011; 25 (7): 767-778

    Abstract

    Protein kinases are key regulators of cellular processes. In spite of considerable effort, a full understanding of the pathways they participate in remains elusive. We globally investigated the proteins that interact with the majority of yeast protein kinases using protein microarrays. Eighty-five kinases were purified and used to probe yeast proteome microarrays. One-thousand-twenty-three interactions were identified, and the vast majority were novel. Coimmunoprecipitation experiments indicate that many of these interactions occurred in vivo. Many novel links of kinases to previously distinct cellular pathways were discovered. For example, the well-studied Kss1 filamentous pathway was found to bind components of diverse cellular pathways, such as those of the stress response pathway and the Ccr4-Not transcriptional/translational regulatory complex; genetic tests revealed that these different components operate in the filamentation pathway in vivo. Overall, our results indicate that kinases operate in a highly interconnected network that coordinates many activities of the proteome. Our results further demonstrate that protein microarrays uncover a diverse set of interactions not observed previously.

    View details for DOI 10.1101/gad.1998811

    View details for Web of Science ID 000289062700010

    View details for PubMedID 21460040

  • A User's Guide to the Encyclopedia of DNA Elements (ENCODE) PLOS BIOLOGY Myers, R. M., Stamatoyannopoulos, J., Snyder, M., Dunham, I., Hardison, R. C., Bernstein, B. E., Gingeras, T. R., Kent, W. J., Birney, E., Wold, B., Crawford, G. E., Bernstein, B. E., Epstein, C. B., Shoresh, N., Ernst, J., Mikkelsen, T. S., Kheradpour, P., Zhang, X., Wang, L., Issner, R., Coyne, M. J., Durham, T., Ku, M., Thanh Truong, T., Ward, L. D., Altshuler, R. C., Lin, M. F., Kellis, M., Gingeras, T. R., Davis, C. A., Kapranov, P., Dobin, A., Zaleski, C., Schlesinger, F., Batut, P., Chakrabortty, S., Jha, S., Lin, W., Drenkow, J., Wang, H., Bell, K., Gao, H., Bell, I., Dumais, E., Dumais, J., Antonarakis, S. E., Ucla, C., Borel, C., Guigo, R., Djebali, S., Lagarde, J., Kingswood, C., Ribeca, P., Sammeth, M., Alioto, T., Merkel, A., Tilgner, H., Carninci, P., Hayashizaki, Y., Lassmann, T., Takahashi, H., Abdelhamid, R. F., Hannon, G., Fejes-Toth, K., Preall, J., Gordon, A., Sotirova, V., Reymond, A., Howald, C., Graison, E. A., Chrast, J., Ruan, Y., Ruan, X., Shahab, A., Poh, W. T., Wei, C., Crawford, G. E., Furey, T. S., Boyle, A. P., Sheffield, N. C., Song, L., Shibata, Y., Vales, T., Winter, D., Zhang, Z., London, D., Wang, T., Birney, E., Keefe, D., Iyer, V. R., Lee, B., McDaniell, R. M., Liu, Z., Battenhouse, A., Bhinge, A. A., Lieb, J. D., Grasfeder, L. L., Showers, K. A., Giresi, P. G., Kim, S. K., Shestak, C., Myers, R. M., Pauli, F., Reddy, T. E., Gertz, J., Partridge, E. C., Jain, P., Sprouse, R. O., Bansal, A., Pusey, B., Muratet, M. A., Varley, K. E., Bowling, K. M., Newberry, K. M., Nesmith, A. S., Dilocker, J. A., Parker, S. L., Waite, L. L., Thibeault, K., Roberts, K., Absher, D. M., Wold, B., Mortazavi, A., Williams, B., Marinov, G., Trout, D., Pepke, S., King, B., McCue, K., Kirilusha, A., DeSalvo, G., Fisher-Aylor, K., Amrhein, H., Vielmetter, J., Sherlock, G., Sidow, A., Batzoglou, S., Rauch, R., Kundaje, A., Libbrecht, M., Margulies, E. H., Parker, S. 
C., Elnitski, L., Green, E. D., Hubbard, T., Harrow, J., Searle, S., Kokocinski, F., Aken, B., Frankish, A., Hunt, T., Despacio-Reyes, G., Kay, M., Mukherjee, G., Bignell, A., Saunders, G., Boychenko, V., Brent, M., van Baren, M. J., Brown, R. H., Gerstein, M., Khurana, E., Balasubramanian, S., Zhang, Z., Lam, H., Cayting, P., Robilotto, R., Lu, Z., Guigo, R., Derrien, T., Tanzer, A., Knowles, D. G., Mariotti, M., Kent, W. J., Haussler, D., Harte, R., Diekhans, M., Kellis, M., Lin, M., Kheradpour, P., Ernst, J., Reymond, A., Howald, C., Graison, E. A., Chrast, J., Valencia, A., Tress, M., Manuel Rodriguez, J., Snyder, M., Landt, S. G., Raha, D., Shi, M., Euskirchen, G., Grubert, F., Kasowski, M., Lian, J., Cayting, P., Lacroute, P., Xu, Y., Monahan, H., Patacsil, D., Slifer, T., Yang, X., Charos, A., Reed, B., Wu, L., Auerbach, R. K., Habegger, L., Hariharan, M., Rozowsky, J., Abyzov, A., Weissman, S. M., Gerstein, M., Struhl, K., Lamarre-Vincent, N., Lindahl-Allen, M., Miotto, B., Moqtaderi, Z., Fleming, J. D., Newburger, P., Farnham, P. J., Frietze, S., O'Geen, H., Xu, X., Blahnik, K. R., Cao, A. R., Iyengar, S., Stamatoyannopoulos, J. A., Kaul, R., Thurman, R. E., Wang, H., Navas, P. A., Sandstrom, R., Sabo, P. J., Weaver, M., Canfield, T., Lee, K., Neph, S., Roach, V., Reynolds, A., Johnson, A., Rynes, E., Giste, E., Vong, S., Neri, J., Frum, T., Johnson, E. M., Nguyen, E. D., Ebersol, A. K., Sanchez, M. E., Sheffer, H. H., Lotakis, D., Haugen, E., Humbert, R., Kutyavin, T., Shafer, T., Dekker, J., Lajoie, B. R., Sanyal, A., Kent, W. J., Rosenbloom, K. R., Dreszer, T. R., Raney, B. J., Barber, G. P., Meyer, L. R., Sloan, C. A., Malladi, V. S., Cline, M. S., Learned, K., Swing, V. K., Zweig, A. S., Rhead, B., Fujita, P. A., Roskin, K., Karolchik, D., Kuhn, R. M., Haussler, D., Birney, E., Dunham, I., Wilder, S. P., Keefe, D., Sobral, D., Herrero, J., Beal, K., Lukk, M., Brazma, A., Vaquerizas, J. M., Luscombe, N. M., Bickel, P. J., Boley, N., Brown, J. 
B., Li, Q., Huang, H., Gerstein, M., Habegger, L., Sboner, A., Rozowsky, J., Auerbach, R. K., Yip, K. Y., Cheng, C., Yan, K., Bhardwaj, N., Wang, J., Lochovsky, L., Jee, J., Gibson, T., Leng, J., Du, J., Hardison, R. C., Harris, R. S., Song, G., Miller, W., Haussler, D., Roskin, K., Suh, B., Wang, T., Paten, B., Noble, W. S., Hoffman, M. M., Buske, O. J., Weng, Z., Dong, X., Wang, J., Xi, H., Tenenbaum, S. A., Doyle, F., Penalva, L. O., Chittur, S., Tullius, T. D., Parker, S. C., White, K. P., Karmakar, S., Victorsen, A., Jameel, N., Bild, N., Grossman, R. L., Snyder, M., Landt, S. G., Yang, X., Patacsil, D., Slifer, T., Dekker, J., Lajoie, B. R., Sanyal, A., Weng, Z., Whitfield, T. W., Wang, J., Collins, P. J., Trinklein, N. D., Partridge, E. C., Myers, R. M., Giddings, M. C., Chen, X., Khatun, J., Maier, C., Yu, Y., Gunawardena, H., Risk, B., Feingold, E. A., Lowdon, R. F., Dillon, L. A., Good, P. J. 2011; 9 (4)

  • Diverse Roles and Interactions of the SWI/SNF Chromatin Remodeling Complex Revealed Using Global Approaches PLOS GENETICS Euskirchen, G. M., Auerbach, R. K., Davidov, E., Gianoulis, T. A., Zhong, G., Rozowsky, J., Bhardwaj, N., Gerstein, M. B., Snyder, M. 2011; 7 (3)

    Abstract

    A systems understanding of nuclear organization and events is critical for determining how cells divide, differentiate, and respond to stimuli and for identifying the causes of diseases. Chromatin remodeling complexes such as SWI/SNF have been implicated in a wide variety of cellular processes including gene expression, nuclear organization, centromere function, and chromosomal stability, and mutations in SWI/SNF components have been linked to several types of cancer. To better understand the biological processes in which chromatin remodeling proteins participate, we globally mapped binding regions for several components of the SWI/SNF complex throughout the human genome using ChIP-Seq. SWI/SNF components were found to lie near regulatory elements integral to transcription (e.g. 5' ends, RNA Polymerases II and III, and enhancers) as well as regions critical for chromosome organization (e.g. CTCF, lamins, and DNA replication origins). Interestingly we also find that certain configurations of SWI/SNF subunits are associated with transcripts that have higher levels of expression, whereas other configurations of SWI/SNF factors are associated with transcripts that have lower levels of expression. To further elucidate the association of SWI/SNF subunits with each other as well as with other nuclear proteins, we also analyzed SWI/SNF immunoprecipitated complexes by mass spectrometry. Individual SWI/SNF factors are associated with their own family members, as well as with cellular constituents such as nuclear matrix proteins, key transcription factors, and centromere components, implying a ubiquitous role in gene regulation and nuclear function. We find an overrepresentation of both SWI/SNF-associated regions and proteins in cell cycle and chromosome organization. Taken together the results from our ChIP and immunoprecipitation experiments suggest that SWI/SNF facilitates gene regulation and genome function more broadly and through a greater diversity of interactions than previously appreciated.

    View details for DOI 10.1371/journal.pgen.1002008

    View details for Web of Science ID 000288996600042

    View details for PubMedID 21408204

  • Mapping copy number variation by population-scale genome sequencing NATURE Mills, R. E., Walter, K., Stewart, C., Handsaker, R. E., Chen, K., Alkan, C., Abyzov, A., Yoon, S. C., Ye, K., Cheetham, R. K., Chinwalla, A., Conrad, D. F., Fu, Y., Grubert, F., Hajirasouliha, I., Hormozdiari, F., Iakoucheva, L. M., Iqbal, Z., Kang, S., Kidd, J. M., Konkel, M. K., Korn, J., Khurana, E., Kural, D., Lam, H. Y., Leng, J., Li, R., Li, Y., Lin, C., Luo, R., Mu, X. J., Nemesh, J., Peckham, H. E., Rausch, T., Scally, A., Shi, X., Stromberg, M. P., Stuetz, A. M., Urban, A. E., Walker, J. A., Wu, J., Zhang, Y., Zhang, Z. D., Batzer, M. A., Ding, L., Marth, G. T., McVean, G., Sebat, J., Snyder, M., Wang, J., Ye, K., Eichler, E. E., Gerstein, M. B., Hurles, M. E., Lee, C., McCarroll, S. A., Korbel, J. O. 2011; 470 (7332): 59-65

    Abstract

    Genomic structural variants (SVs) are abundant in humans, differing from other forms of variation in extent, origin and functional impact. Despite progress in SV characterization, the nucleotide resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (that is, copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications. Most SVs (53%) were mapped to nucleotide resolution, which facilitated analysing their origin and functional impact. We examined numerous whole and partial gene deletions with a genotyping approach and observed a depletion of gene disruptions amongst high frequency deletions. Furthermore, we observed differences in the size spectra of SVs originating from distinct formation mechanisms, and constructed a map of SV hotspots formed by common mechanisms. Our analytical framework and SV map serves as a resource for sequencing-based association studies.

    View details for DOI 10.1038/nature09708

    View details for Web of Science ID 000286886400033

    View details for PubMedID 21293372

  • Prediction and characterization of noncoding RNAs in C. elegans by integrating conservation, secondary structure, and high-throughput sequencing and array data GENOME RESEARCH Lu, Z. J., Yip, K. Y., Wang, G., Shou, C., Hillier, L. W., Khurana, E., Agarwal, A., Auerbach, R., Rozowsky, J., Cheng, C., Kato, M., Miller, D. M., Slack, F., Snyder, M., Waterston, R. H., Reinke, V., Gerstein, M. B. 2011; 21 (2): 276-285

    Abstract

    We present an integrative machine learning method, incRNA, for whole-genome identification of noncoding RNAs (ncRNAs). It combines a large amount of expression data, RNA secondary-structure stability, and evolutionary conservation at the protein and nucleic-acid level. Using the incRNA model and data from the modENCODE consortium, we are able to separate known C. elegans ncRNAs from coding sequences and other genomic elements with a high level of accuracy (97% AUC on an independent validation set), and find more than 7000 novel ncRNA candidates, among which more than 1000 are located in the intergenic regions of C. elegans genome. Based on the validation set, we estimate that 91% of the approximately 7000 novel ncRNA candidates are true positives. We then analyze 15 novel ncRNA candidates by RT-PCR, detecting the expression for 14. In addition, we characterize the properties of all the novel ncRNA candidates and find that they have distinct expression patterns across developmental stages and tend to use novel RNA structural families. We also find that they are often targeted by specific transcription factors (~59% of intergenic novel ncRNA candidates). Overall, our study identifies many new potential ncRNAs in C. elegans and provides a method that can be adapted to other organisms.

    View details for DOI 10.1101/gr.110189.110

    View details for Web of Science ID 000286804100013

    View details for PubMedID 21177971

  • Diverse transcription factor binding features revealed by genome-wide ChIP-seq in C. elegans GENOME RESEARCH Niu, W., Lu, Z. J., Zhong, M., Sarov, M., Murray, J. I., Brdlik, C. M., Janette, J., Chen, C., Alves, P., Preston, E., Slightham, C., Jiang, L., Hyman, A. A., Kim, S. K., Waterston, R. H., Gerstein, M., Snyder, M., Reinke, V. 2011; 21 (2): 245-254

    Abstract

    Regulation of gene expression by sequence-specific transcription factors is central to developmental programs and depends on the binding of transcription factors with target sites in the genome. To date, most such analyses in Caenorhabditis elegans have focused on the interactions between a single transcription factor with one or a few select target genes. As part of the modENCODE Consortium, we have used chromatin immunoprecipitation coupled with high-throughput DNA sequencing (ChIP-seq) to determine the genome-wide binding sites of 22 transcription factors (ALR-1, BLMP-1, CEH-14, CEH-30, EGL-27, EGL-5, ELT-3, EOR-1, GEI-11, HLH-1, LIN-11, LIN-13, LIN-15B, LIN-39, MAB-5, MDL-1, MEP-1, PES-1, PHA-4, PQM-1, SKN-1, and UNC-130) at diverse developmental stages. For each factor we determined candidate gene targets, both coding and non-coding. The typical binding sites of almost all factors are within a few hundred nucleotides of the transcript start site. Most factors target a mixture of coding and non-coding target genes, although one factor preferentially binds to non-coding RNA genes. We built a regulatory network among the 22 factors to determine their functional relationships to each other and found that some factors appear to act preferentially as regulators and others as target genes. Examination of the binding targets of three related HOX factors--LIN-39, MAB-5, and EGL-5--indicates that these factors regulate genes involved in cellular migration, neuronal function, and vulval differentiation, consistent with their known roles in these developmental processes. Ultimately, the comprehensive mapping of transcription factor binding sites will identify features of transcriptional networks that regulate C. elegans developmental processes.

    View details for DOI 10.1101/gr.114587.110

    View details for Web of Science ID 000286804100010

    View details for PubMedID 21177963

  • RSEQtools: a modular framework to analyze RNA-Seq data using compact, anonymized data summaries BIOINFORMATICS Habegger, L., Sboner, A., Gianoulis, T. A., Rozowsky, J., Agarwal, A., Snyder, M., Gerstein, M. 2011; 27 (2): 281-283

    Abstract

    The advent of next-generation sequencing for functional genomics has given rise to quantities of sequence information that are often so large that they are difficult to handle. Moreover, sequence reads from a specific individual can contain sufficient information to potentially identify and genetically characterize that person, raising privacy concerns. In order to address these issues, we have developed the Mapped Read Format (MRF), a compact data summary format for both short and long read alignments that enables the anonymization of confidential sequence information, while allowing one to still carry out many functional genomics studies. We have developed a suite of tools (RSEQtools) that use this format for the analysis of RNA-Seq experiments. These tools consist of a set of modules that perform common tasks such as calculating gene expression values, generating signal tracks of mapped reads and segmenting that signal into actively transcribed regions. Moreover, the tools can readily be used to build customizable RNA-Seq workflows. In addition to the anonymization afforded by MRF, this format also facilitates the decoupling of the alignment of reads from downstream analyses. Availability and implementation: RSEQtools is implemented in C and the source code is available at http://rseqtools.gersteinlab.org/.

    View details for DOI 10.1093/bioinformatics/btq643

    View details for Web of Science ID 000286215200025

    View details for PubMedID 21134889
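
    The abstract above lists "calculating gene expression values" among the common RNA-Seq tasks. As a small illustration of what such a calculation can look like, the sketch below computes RPKM (reads per kilobase of transcript per million mapped reads), a widely used normalization. RSEQtools itself is written in C and the abstract does not say which measure it uses, so the choice of RPKM, the function name, and the example numbers are all assumptions.

```python
# Hypothetical sketch: RPKM as one common gene expression measure.
# Not taken from RSEQtools; names and numbers are illustrative only.

def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Example: 500 reads on a 2,000 bp gene in a library of 10 million mapped reads.
value = rpkm(read_count=500, gene_length_bp=2000, total_mapped_reads=10_000_000)
print(value)
```

    Normalizing by both gene length and library size makes values comparable across genes of different lengths and across samples sequenced to different depths.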

  • The human proteome project: Current state and future direction. Molecular & cellular proteomics : MCP Legrain, P., Aebersold, R., Archakov, A., Bairoch, A., Bala, K., Beretta, L., Bergeron, J., Borchers, C., Corthals, G. L., Costello, C. E., Deutsch, E. W., Domon, B., Hancock, W., He, F., Hochstrasser, D., Marko-Varga, G., Salekdeh, G. H., Sechi, S., Snyder, M., Srivastava, S., Uhlen, M., Hu, C. H., Yamamoto, T., Paik, Y. K., Omenn, G. S. 2011

    Abstract

    After successful completion of the Human Genome Project (HGP), HUPO has recently officially launched a global Human Proteome Project (HPP) which is designed to map the entire human protein set. Given the presence of about 30% undisclosed proteins out of 20,300 protein gene products, a systematic global effort is necessary to achieve this goal with respect to protein abundance, distribution, subcellular localization, interaction with other biomolecules, and functions at specific time points. As a general experimental strategy, HPP groups employ the three working pillars for HPP: mass spectrometry, antibody capture, and bioinformatics tools and knowledge base. The HPP participants will take advantage of the output and cross-analyses from the ongoing HUPO initiatives and a chromosome-based protein mapping strategy, termed C-HPP with many national teams currently engaged. In addition, numerous biologically-driven projects will be stimulated and facilitated by the HPP. Timely planning with proper governance of HPP will deliver a protein parts list, reagents and tools for protein studies and analyses, and a stronger basis for personalized medicine. HUPO urges each national research funding agency and the scientific community at large to identify their preferred pathways to participate in aspects of this highly promising project in a HPP consortium of funders and investigators.

    View details for PubMedID 21531903

  • The CRIT framework for identifying cross patterns in systems biology and application to chemogenomics GENOME BIOLOGY Gianoulis, T. A., Agarwal, A., Snyder, M., Gerstein, M. B. 2011; 12 (3)

    Abstract

    Biological data is often tabular but finding statistically valid connections between entities in a sequence of tables can be problematic--for example, connecting particular entities in a drug property table to gene properties in a second table, using a third table associating genes with drugs. Here we present an approach (CRIT) to find connections such as these and show how it can be applied in a variety of genomic contexts including chemogenomics data.

    View details for DOI 10.1186/gb-2011-12-3-r32

    View details for Web of Science ID 000291309200012

    View details for PubMedID 21453526

  • Regulatory Variation Within and Between Species ANNUAL REVIEW OF GENOMICS AND HUMAN GENETICS, VOL 12 Zheng, W., Gianoulis, T. A., Karczewski, K. J., Zhao, H., Snyder, M. 2011; 12: 327-346

    Abstract

    Understanding how individuals differ from one another and from closely related species is a fundamental problem in biology. Recent evidence suggests that much of the variation both within and between species is due to differential gene regulation. Here we review differential gene regulation focusing on evolutionary-developmental (evo-devo) biology, global comparison of genomic sequences, whole-genome gene expression, and transcription factor (TF) binding profiles. We also explore the relationship between divergence rate of regulatory sequences, coding sequences, and TF binding events using several different measures and discuss their implications in the context of evolution of regulatory networks. Finally, we discuss the current status and future challenges in relating regulatory variation to the divergence across and within species.

    View details for DOI 10.1146/annurev-genom-082908-150139

    View details for Web of Science ID 000295819900014

    View details for PubMedID 21721942

  • Kinase substrate interactions. Methods in molecular biology (Clifton, N.J.) Smith, M. G., Ptacek, J., Snyder, M. 2011; 723: 201-212

    Abstract

    Kinases have become popular therapeutic targets primarily due to their integral role in cell cycle and tumor progression. The efficacy of high-throughput screening efforts is dependent on the development of high quality multiplex tools capable of replacing lower-throughput technologies such as mass spectroscopy or solution-based assays for the study of kinase-substrate interactions. Functional protein microarrays are comprised of thousands of immobilized proteins on glass slides that have been used successfully to identify protein-protein interactions. Here, we describe the application of functional protein microarrays for the identification of the phosphorylation targets of individual protein kinases using highly sensitive radioactive detection and robust informatics algorithms.

    View details for DOI 10.1007/978-1-61779-043-0_13

    View details for PubMedID 21370067

  • Measuring the Evolutionary Rewiring of Biological Networks PLOS COMPUTATIONAL BIOLOGY Shou, C., Bhardwaj, N., Lam, H. Y., Yan, K., Kim, P. M., Snyder, M., Gerstein, M. B. 2011; 7 (1)

    Abstract

    We have accumulated a large amount of biological network data and expect even more to come. Soon, we anticipate being able to compare many different biological networks as we commonly do for molecular sequences. It has long been believed that many of these networks change, or "rewire", at different rates. It is therefore important to develop a framework to quantify the differences between networks in a unified fashion. We developed such a formalism based on analogy to simple models of sequence evolution, and used it to conduct a systematic study of network rewiring on all the currently available biological networks. We found that, similar to sequences, biological networks show a decreased rate of change at large time divergences, because of saturation in potential substitutions. However, different types of biological networks consistently rewire at different rates. Using comparative genomics and proteomics data, we found a consistent ordering of the rewiring rates: transcription regulatory, phosphorylation regulatory, genetic interaction, miRNA regulatory, protein interaction, and metabolic pathway network, from fast to slow. This ordering was found in all comparisons we did of matched networks between organisms. To gain further intuition on network rewiring, we compared our observed rewirings with those obtained from simulation. We also investigated how readily our formalism could be mapped to other network contexts; in particular, we showed how it could be applied to analyze changes in a range of "commonplace" networks such as family trees, co-authorships and linux-kernel function dependencies.

    View details for DOI 10.1371/journal.pcbi.1001050

    View details for Web of Science ID 000286652100009

    View details for PubMedID 21253555
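
    As a rough illustration of what quantifying rewiring between two matched networks can look like, the sketch below compares two edge lists and reports the fraction of edges that differ, optionally divided by a divergence time. This is not the formalism from the paper: the function names, the treatment of edges as undirected, and the simple per-time normalization are all assumptions made for illustration.

```python
def rewiring_fraction(edges_a, edges_b):
    """Fraction of edges present in either network that are not shared by both.

    Assumption: edges are undirected pairs over a matched (orthologous) node set.
    """
    a = {frozenset(e) for e in edges_a}
    b = {frozenset(e) for e in edges_b}
    union = a | b
    if not union:
        return 0.0
    # Symmetric difference = edges gained or lost between the two networks.
    return len(a ^ b) / len(union)


def rewiring_rate(edges_a, edges_b, divergence_time):
    """Hypothetical rate: changed-edge fraction per unit of divergence time."""
    return rewiring_fraction(edges_a, edges_b) / divergence_time


network_a = [("A", "B"), ("B", "C"), ("C", "D")]
network_b = [("A", "B"), ("B", "C"), ("A", "D")]
print(rewiring_fraction(network_a, network_b))
print(rewiring_rate(network_a, network_b, divergence_time=100.0))
```

    A linear rate like this one ignores the saturation at large divergence times that the abstract describes, so it would only be a reasonable approximation for closely related networks.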

  • RNA sequencing. Methods in molecular biology (Clifton, N.J.) Waern, K., Nagalakshmi, U., Snyder, M. 2011; 759: 125-132

    Abstract

    This chapter describes the RNA sequencing (RNA-Seq) protocol, whereby RNA from yeast cells is prepared for sequencing on an Illumina Genome Analyzer. The protocol can easily be altered to use RNA from a different organism. This chapter covers RNA extraction, cDNA synthesis, cDNA fragmentation, and Illumina cDNA library generation and contains some brief remarks on bioinformatic analysis.

    View details for DOI 10.1007/978-1-61779-173-4_8

    View details for PubMedID 21863485

  • Integrative Analysis of the Caenorhabditis elegans Genome by the modENCODE Project SCIENCE Gerstein, M. B., Lu, Z. J., Van Nostrand, E. L., Cheng, C., Arshinoff, B. I., Liu, T., Yip, K. Y., Robilotto, R., Rechtsteiner, A., Ikegami, K., Alves, P., Chateigner, A., Perry, M., Morris, M., Auerbach, R. K., Feng, X., Leng, J., Vielle, A., Niu, W., Rhrissorrakrai, K., Agarwal, A., Alexander, R. P., Barber, G., Brdlik, C. M., Brennan, J., Brouillet, J. J., Carr, A., Cheung, M., Clawson, H., Contrino, S., Dannenberg, L. O., Dernburg, A. F., Desai, A., Dick, L., Dose, A. C., Du, J., Egelhofer, T., Ercan, S., Euskirchen, G., Ewing, B., Feingold, E. A., Gassmann, R., Good, P. J., Green, P., Gullier, F., Gutwein, M., Guyer, M. S., Habegger, L., Han, T., Henikoff, J. G., Henz, S. R., Hinrichs, A., Holster, H., Hyman, T., Iniguez, A. L., Janette, J., Jensen, M., Kato, M., Kent, W. J., Kephart, E., Khivansara, V., Khurana, E., Kim, J. K., Kolasinska-Zwierz, P., Lai, E. C., Latorre, I., Leahey, A., Lewis, S., Lloyd, P., Lochovsky, L., Lowdon, R. F., Lubling, Y., Lyne, R., MacCoss, M., Mackowiak, S. D., Mangone, M., McKay, S., Mecenas, D., Merrihew, G., Miller, D. M., Muroyama, A., Murray, J. I., Ooi, S., Pham, H., Phippen, T., Preston, E. A., Rajewsky, N., Raetsch, G., Rosenbaum, H., Rozowsky, J., Rutherford, K., Ruzanov, P., Sarov, M., Sasidharan, R., Sboner, A., Scheid, P., Segal, E., Shin, H., Shou, C., Slack, F. J., Slightam, C., Smith, R., Spencer, W. C., Stinson, E. O., Taing, S., Takasaki, T., Vafeados, D., Voronina, K., Wang, G., Washington, N. L., Whittle, C. M., Wu, B., Yan, K., Zeller, G., Zha, Z., Zhong, M., Zhou, X., Ahringer, J., Strome, S., Gunsalus, K. C., Micklem, G., Liu, X. S., Reinke, V., Kim, S. K., Hillier, L. W., Henikoff, S., Piano, F., Snyder, M., Stein, L., Lieb, J. D., Waterston, R. H. 2010; 330 (6012): 1775-1787

    Abstract

    We systematically generated large-scale data sets to improve genome annotation for the nematode Caenorhabditis elegans, a key model organism. These data sets include transcriptome profiling across a developmental time course, genome-wide identification of transcription factor-binding sites, and maps of chromatin organization. From this, we created more complete and accurate gene models, including alternative splice forms and candidate noncoding RNAs. We constructed hierarchical networks of transcription factor-binding and microRNA interactions and discovered chromosomal locations bound by an unusually large number of transcription factors. Different patterns of chromatin composition and histone modification were revealed between chromosome arms and centers, with similarly prominent differences between autosomes and the X chromosome. Integrating data types, we built statistical models relating chromatin, transcription factor binding, and gene expression. Overall, our analyses ascribed putative functions to most of the conserved genome.

    View details for DOI 10.1126/science.1196914

    View details for Web of Science ID 000285603700031

    View details for PubMedID 21177976

  • Extensive In Vivo Metabolite-Protein Interactions Revealed by Large-Scale Systematic Analyses CELL Li, X., Gianoulis, T. A., Yip, K. Y., Gerstein, M., Snyder, M. 2010; 143 (4): 639-650

    Abstract

    Natural small compounds comprise most cellular molecules and bind proteins as substrates, products, cofactors, and ligands. However, a large-scale investigation of in vivo protein-small metabolite interactions has not been performed. We developed a mass spectrometry assay for the large-scale identification of in vivo protein-hydrophobic small metabolite interactions in yeast and analyzed compounds that bind ergosterol biosynthetic proteins and protein kinases. Many of these proteins bind small metabolites; a few interactions were previously known, but the vast majority are new. Importantly, many key regulatory proteins such as protein kinases bind metabolites. Ergosterol was found to bind many proteins and may function as a general regulator. It is required for the activity of Ypk1, a mammalian AKT/SGK kinase homolog. Our study defines potential key regulatory steps in lipid biosynthetic pathways and suggests that small metabolites may play a more general role as regulators of protein activity and function than previously appreciated.

    View details for DOI 10.1016/j.cell.2010.09.048

    View details for Web of Science ID 000284149100020

    View details for PubMedID 21035178

  • Yeast proteomics and protein microarrays JOURNAL OF PROTEOMICS Chen, R., Snyder, M. 2010; 73 (11): 2147-2157

    Abstract

    Our understanding of biological processes as well as human diseases has improved greatly thanks to studies on model organisms such as yeast. The power of scientific approaches with yeast lies in its relatively simple genome, its facile classical and molecular genetics, as well as the evolutionary conservation of many basic biological mechanisms. However, even in this simple model organism, systems biology studies, especially proteomic studies had been an intimidating task. During the past decade, powerful high-throughput technologies in proteomic research have been developed for yeast including protein microarray technology. The protein microarray technology allows the interrogation of protein-protein, protein-DNA, protein-small molecule interaction networks as well as post-translational modification networks in a large-scale, high-throughput manner. With this technology, many groundbreaking findings have been established in studies with the budding yeast Saccharomyces cerevisiae, most of which could have been unachievable with traditional approaches. Discovery of these networks has profound impact on explicating biological processes with a proteomic point of view, which may lead to a better understanding of normal biological phenomena as well as various human diseases.

    View details for DOI 10.1016/j.jprot.2010.08.003

    View details for Web of Science ID 000283903000008

    View details for PubMedID 20728591

  • Comprehensive annotation of the transcriptome of the human fungal pathogen Candida albicans using RNA-seq GENOME RESEARCH Bruno, V. M., Wang, Z., Marjani, S. L., Euskirchen, G. M., Martin, J., Sherlock, G., Snyder, M. 2010; 20 (10): 1451-1458

    Abstract

    Candida albicans is the major invasive fungal pathogen of humans, causing diseases ranging from superficial mucosal infections to disseminated, systemic infections that are often life-threatening. We have used massively parallel high-throughput sequencing of cDNA (RNA-seq) to generate a high-resolution map of the C. albicans transcriptome under several different environmental conditions. We have quantitatively determined all of the regions that are transcribed under these different conditions, and have identified 602 novel transcriptionally active regions (TARs) and numerous novel introns that are not represented in the current genome annotation. Interestingly, the expression of many of these TARs is regulated in a condition-specific manner. This comprehensive transcriptome analysis significantly enhances the current genome annotation of C. albicans, a necessary framework for a complete understanding of the molecular mechanisms of pathogenesis for this important eukaryotic pathogen.

    View details for DOI 10.1101/gr.109553.110

    View details for Web of Science ID 000282375000015

    View details for PubMedID 20810668

  • Annotating non-coding regions of the genome NATURE REVIEWS GENETICS Alexander, R. P., Fang, G., Rozowsky, J., Snyder, M., Gerstein, M. B. 2010; 11 (8): 559-571

    Abstract

    Most of the human genome consists of non-protein-coding DNA. Recently, progress has been made in annotating these non-coding regions through the interpretation of functional genomics experiments and comparative sequence analysis. One can conceptualize functional genomics analysis as involving a sequence of steps: turning the output of an experiment into a 'signal' at each base pair of the genome; smoothing this signal and segmenting it into small blocks of initial annotation; and then clustering these small blocks into larger derived annotations and networks. Finally, one can relate functional genomics annotations to conserved units and measures of conservation derived from comparative sequence analysis.

    View details for DOI 10.1038/nrg2814

    View details for Web of Science ID 000279988800012

    View details for PubMedID 20628352
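
    The first two steps described in the abstract above (a per-base signal that is smoothed and then segmented into small blocks) can be sketched in a few lines. The moving-average smoother, the fixed threshold, and the function names are assumptions chosen for brevity, not the methods reviewed in the article.

```python
def smooth(signal, window=3):
    """Moving average over a per-base signal; windows are truncated at the ends."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def segment(signal, threshold):
    """Return half-open (start, end) blocks where the signal is >= threshold."""
    blocks = []
    start = None
    for i, value in enumerate(signal):
        if value >= threshold and start is None:
            start = i  # a block opens here
        elif value < threshold and start is not None:
            blocks.append((start, i))  # the block closes just before i
            start = None
    if start is not None:
        blocks.append((start, len(signal)))
    return blocks


raw = [0, 0, 5, 6, 7, 0, 0, 4, 4]
print(segment(raw, threshold=3))
print(segment(smooth(raw), threshold=3))
```

    Real pipelines typically replace the fixed threshold with a statistical model of background signal; the clustering of blocks into larger annotations is not shown here.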

  • MOTIPS: Automated Motif Analysis for Predicting Targets of Modular Protein Domains BMC BIOINFORMATICS Lam, H. Y., Kim, P. M., Mok, J., Tonikian, R., Sidhu, S. S., Turk, B. E., Snyder, M., Gerstein, M. B. 2010; 11

    Abstract

    Many protein interactions, especially those involved in signaling, involve short linear motifs consisting of 5-10 amino acid residues that interact with modular protein domains such as the SH3 binding domains and the kinase catalytic domains. One straightforward way of identifying these interactions is by scanning for matches to the motif against all the sequences in a target proteome. However, predicting domain targets by motif sequence alone without considering other genomic and structural information has been shown to be lacking in accuracy. We developed an efficient search algorithm to scan the target proteome for potential domain targets and to increase the accuracy of each hit by integrating a variety of pre-computed features, such as conservation, surface propensity, and disorder. The integration is performed using naïve Bayes and a training set of validated experiments. By integrating a variety of biologically relevant features to predict domain targets, we demonstrated a notably improved prediction of modular protein domain targets. Combined with emerging high-resolution data of domain specificities, we believe that our approach can assist in the reconstruction of many signaling pathways.

    View details for DOI 10.1186/1471-2105-11-243

    View details for Web of Science ID 000279728900007

    View details for PubMedID 20459839
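
    To make the naïve Bayes integration step concrete, the sketch below combines binary features of a candidate motif hit into a posterior probability that the hit is a true target. The feature names, likelihood values, and function names are invented for illustration and are not taken from MOTIPS; in practice the likelihoods would be estimated from a training set of validated interactions.

```python
import math


def naive_bayes_log_odds(features, likelihoods, prior_true=0.5):
    """Sum per-feature log-likelihood ratios, assuming features are independent.

    likelihoods[name][value] = (P(value | true target), P(value | non-target))
    """
    log_odds = math.log(prior_true / (1.0 - prior_true))
    for name, value in features.items():
        p_true, p_false = likelihoods[name][value]
        log_odds += math.log(p_true / p_false)
    return log_odds


def posterior(log_odds):
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical likelihood table (illustrative numbers only).
likelihoods = {
    "conserved": {True: (0.8, 0.4), False: (0.2, 0.6)},
    "disordered": {True: (0.6, 0.3), False: (0.4, 0.7)},
}
hit = {"conserved": True, "disordered": True}
print(posterior(naive_bayes_log_odds(hit, likelihoods)))
```

    Working in log-odds keeps the combination numerically stable when many features are multiplied together.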

  • Genomic binding profiles of functionally distinct RNA polymerase III transcription complexes in human cells NATURE STRUCTURAL & MOLECULAR BIOLOGY Moqtaderi, Z., Wang, J., Raha, D., White, R. J., Snyder, M., Weng, Z., Struhl, K. 2010; 17 (5): 635-U139

    Abstract

    Genome-wide occupancy profiles of five components of the RNA polymerase III (Pol III) machinery in human cells identified the expected tRNA and noncoding RNA targets and revealed many additional Pol III-associated loci, mostly near short interspersed elements (SINEs). Several genes are targets of an alternative transcription factor IIIB (TFIIIB) containing Brf2 instead of Brf1 and have extremely low levels of TFIIIC. Strikingly, expressed Pol III genes, unlike nonexpressed Pol III genes, are situated in regions with a pattern of histone modifications associated with functional Pol II promoters. TFIIIC alone associates with numerous ETC loci, via the B box or a novel motif. ETCs are often near CTCF binding sites, suggesting a potential role in chromosome organization. Our results suggest that human Pol III complexes associate preferentially with regions near functional Pol II promoters and that TFIIIC-mediated recruitment of TFIIIB is regulated in a locus-specific manner.

    View details for DOI 10.1038/nsmb.1794

    View details for Web of Science ID 000277330700020

    View details for PubMedID 20418883

  • Genetic analysis of variation in transcription factor binding in yeast NATURE Zheng, W., Zhao, H., Mancera, E., Steinmetz, L. M., Snyder, M. 2010; 464 (7292): 1187-U106

    Abstract

    Variation in transcriptional regulation is thought to be a major cause of phenotypic diversity. Although widespread differences in gene expression among individuals of a species have been observed, studies to examine the variability of transcription factor binding on a global scale have not been performed, and thus the extent and underlying genetic basis of transcription factor binding diversity is unknown. By mapping differences in transcription factor binding among individuals, here we present the genetic basis of such variation on a genome-wide scale. Whole-genome Ste12-binding profiles were determined using chromatin immunoprecipitation coupled with DNA sequencing in pheromone-treated cells of 43 segregants of a cross between two highly diverged yeast strains and their parental lines. We identified extensive Ste12-binding variation among individuals, and mapped underlying cis- and trans-acting loci responsible for such variation. We showed that most transcription factor binding variation is cis-linked, and that many variations are associated with polymorphisms residing in the binding motifs of Ste12 as well as those of several proposed Ste12 cofactors. We also identified two trans-factors, AMN1 and FLO8, that modulate Ste12 binding to promoters of more than ten genes under alpha-factor treatment. Neither of these two genes was previously known to regulate Ste12, and we suggest that they may be mediators of gene activity and phenotypic diversity. Ste12 binding strongly correlates with gene expression for more than 200 genes, indicating that binding variation is functional. Many of the variable-bound genes are involved in cell wall organization and biogenesis. Overall, these studies identified genetic regulators of molecular diversity among individuals and provide new insights into mechanisms of gene regulation.

    View details for DOI 10.1038/nature08934

    View details for Web of Science ID 000276891100036

    View details for PubMedID 20237471

  • Variation in Transcription Factor Binding Among Humans SCIENCE Kasowski, M., Grubert, F., Heffelfinger, C., Hariharan, M., Asabere, A., Waszak, S. M., Habegger, L., Rozowsky, J., Shi, M., Urban, A. E., Hong, M., Karczewski, K. J., Huber, W., Weissman, S. M., Gerstein, M. B., Korbel, J. O., Snyder, M. 2010; 328 (5975): 232-235

    Abstract

    Differences in gene expression may play a major role in speciation and phenotypic diversity. We examined genome-wide differences in transcription factor (TF) binding in several humans and a single chimpanzee by using chromatin immunoprecipitation followed by sequencing. The binding sites of RNA polymerase II (PolII) and a key regulator of immune responses, nuclear factor kappaB (p65), were mapped in 10 lymphoblastoid cell lines, and 25 and 7.5% of the respective binding regions were found to differ between individuals. Binding differences were frequently associated with single-nucleotide polymorphisms and genomic structural variants, and these differences were often correlated with differences in gene expression, suggesting functional consequences of binding variation. Furthermore, comparing PolII binding between humans and chimpanzee suggests extensive divergence in TF binding. Our results indicate that many differences in individuals and species occur at the level of TF binding, and they provide insight into the genetic events responsible for these differences.

    View details for DOI 10.1126/science.1183621

    View details for Web of Science ID 000276459600043

    View details for PubMedID 20299548

  • Molecular Mechanisms of Ethanol-Induced Pathogenesis Revealed by RNA-Sequencing PLOS PATHOGENS Camarena, L., Bruno, V., Euskirchen, G., Poggio, S., Snyder, M. 2010; 6 (4)

    Abstract

    Acinetobacter baumannii is a common pathogen whose recent resistance to drugs has emerged as a major health problem. Ethanol has been found to increase the virulence of A. baumannii in Dictyostelium discoideum and Caenorhabditis elegans models of infection. To better understand the causes of this effect, we examined the transcriptional profile of A. baumannii grown in the presence or absence of ethanol using RNA-Seq. Using the Illumina/Solexa platform, a total of 43,453,960 reads (35 nt) were obtained, of which 3,596,474 mapped uniquely to the genome. Our analysis revealed that ethanol induces the expression of 49 genes that belong to different functional categories. A strong induction was observed for genes encoding metabolic enzymes, indicating that ethanol is efficiently assimilated. In addition, we detected the induction of genes encoding stress proteins, including upsA, hsp90, groEL and lon as well as permeases, efflux pumps and a secreted phospholipase C. In stationary phase, ethanol strongly induced several genes involved with iron assimilation and a high-affinity phosphate transport system, indicating that A. baumannii makes a better use of the iron and phosphate resources in the medium when ethanol is used as a carbon source. To evaluate the role of phospholipase C (Plc1) in virulence, we generated and analyzed a deletion mutant for plc1. This strain exhibits a modest, but reproducible, reduction in the cytotoxic effect caused by A. baumannii on epithelial cells, suggesting that phospholipase C is important for virulence. Overall, our results indicate the power of applying RNA-Seq to identify key modulators of bacterial pathogenesis. We suggest that the effect of ethanol on the virulence of A. baumannii is multifactorial and includes a general stress response and other specific components such as phospholipase C.

    View details for DOI 10.1371/journal.ppat.1000834

    View details for Web of Science ID 000277722400007

    View details for PubMedID 20368969

  • Dynamic transcriptomes during neural differentiation of human embryonic stem cells revealed by short, long, and paired-end sequencing PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Wu, J. Q., Habegger, L., Noisa, P., Szekely, A., Qiu, C., Hutchison, S., Raha, D., Egholm, M., Lin, H., Weissman, S., Cui, W., Gerstein, M., Snyder, M. 2010; 107 (11): 5254-5259

    Abstract

    To examine the fundamental mechanisms governing neural differentiation, we analyzed the transcriptome changes that occur during the differentiation of hESCs into the neural lineage. Undifferentiated hESCs as well as cells at three stages of early neural differentiation, N1 (early initiation), N2 (neural progenitor), and N3 (early glial-like), were analyzed using a combination of single read, paired-end read, and long read RNA sequencing. The results revealed enormous complexity in gene transcription and splicing dynamics during neural cell differentiation. We found previously unannotated transcripts and spliced isoforms specific for each stage of differentiation. Interestingly, splicing isoform diversity is highest in undifferentiated hESCs and decreases upon differentiation, a phenomenon we call isoform specialization. During neural differentiation, we observed differential expression of many types of genes, including those involved in key signaling pathways, and a large number of extracellular receptors exhibit stage-specific regulation. These results provide a valuable resource for studying neural differentiation and reveal insights into the mechanisms underlying in vitro neural differentiation of hESCs, such as neural fate specification, neural progenitor cell identity maintenance, and the transition from a predominantly neuronal state into one with increased gliogenic potential.

    View details for DOI 10.1073/pnas.0914114107

    View details for Web of Science ID 000275714300079

    View details for PubMedID 20194744

  • Personal genome sequencing: current approaches and challenges GENES & DEVELOPMENT Snyder, M., Du, J., Gerstein, M. 2010; 24 (5): 423-431

    Abstract

    The revolution in DNA sequencing technologies has now made it feasible to determine the genome sequences of many individuals; i.e., "personal genomes." Genome sequences of cells and tissues from both normal and disease states have been determined. Using current approaches, whole human genome sequences are not typically assembled and determined de novo, but, instead, variations relative to a reference sequence are identified. We discuss the current state of personal genome sequencing, the main steps involved in determining a genome sequence (i.e., identifying single-nucleotide polymorphisms [SNPs] and structural variations [SVs], assembling new sequences, and phasing haplotypes), and the challenges and performance metrics for evaluating the accuracy of the reconstruction. Finally, we consider the possible individual and societal benefits of personal genome sequences.

    View details for DOI 10.1101/gad.1864110

    View details for Web of Science ID 000275055900001

    View details for PubMedID 20194435

  • X chromosome-wide analyses of genomic DNA methylation states and gene expression in male and female neutrophils PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Yasukochi, Y., Maruyama, O., Mahajan, M. C., Padden, C., Euskirchen, G. M., Schulz, V., Hirakawa, H., Kuhara, S., Pan, X., Newburger, P. E., Snyder, M., Weissman, S. M. 2010; 107 (8): 3704-3709

    Abstract

    The DNA methylation status of human X chromosomes from male and female neutrophils was identified by high-throughput sequencing of HpaII and MspI digested fragments. In the intergenic and intragenic regions on the X chromosome, the sites outside CpG islands were heavily hypermethylated to the same degree in both genders. Nearly half of X chromosome promoters were either hypomethylated or hypermethylated in both females and males. Nearly one third of X chromosome promoters were a mixture of hypomethylated and heterogeneously methylated sites in females and were hypomethylated in males. Thus, a large fraction of genes that are silenced on the inactive X chromosome are hypomethylated in their promoter regions. These genes frequently belong to the evolutionarily younger strata of the X chromosome. The promoters that were hypomethylated at more than two sites contained most of the genes that escaped silencing on the inactive X chromosome. The overall levels of expression of X-linked genes were indistinguishable in females and males, regardless of the methylation state of the inactive X chromosome. Thus, in addition to DNA methylation, other factors are involved in the fine tuning of gene dosage compensation in neutrophils.

    View details for DOI 10.1073/pnas.0914812107

    View details for Web of Science ID 000275130900077

    View details for PubMedID 20133578

  • Close association of RNA polymerase II and many transcription factors with Pol III genes PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Raha, D., Wang, Z., Moqtaderi, Z., Wu, L., Zhong, G., Gerstein, M., Struhl, K., Snyder, M. 2010; 107 (8): 3639-3644

    Abstract

    Transcription of the eukaryotic genomes is carried out by three distinct RNA polymerases I, II, and III, whereby each polymerase is thought to independently transcribe a distinct set of genes. To investigate a possible relationship of RNA polymerases II and III, we mapped their in vivo binding sites throughout the human genome by using ChIP-Seq in two different cell lines, GM12878 and K562 cells. Pol III was found to bind near many known genes as well as several previously unidentified target genes. RNA-Seq studies indicate that a majority of the bound genes are expressed, although a subset are not, suggestive of stalling by RNA polymerase III. Pol II was found to bind near many known Pol III genes, including tRNA, U6, HVG, hY, 7SK and previously unidentified Pol III target genes. Similarly, in vivo binding studies also reveal that a number of transcription factors normally associated with Pol II transcription, including c-Fos, c-Jun and c-Myc, also tightly associate with most Pol III-transcribed genes. Inhibition of Pol II activity using alpha-amanitin reduced expression of a number of Pol III genes (e.g., U6, hY, HVG), suggesting that Pol II plays an important role in regulating their transcription. These results indicate that, contrary to previous expectations, polymerases can often work with one another to globally coordinate gene expression.

    View details for DOI 10.1073/pnas.0911315106

    View details for Web of Science ID 000275130900066

    View details for PubMedID 20139302

  • Deciphering Protein Kinase Specificity Through Large-Scale Analysis of Yeast Phosphorylation Site Motifs SCIENCE SIGNALING Mok, J., Kim, P. M., Lam, H. Y., Piccirillo, S., Zhou, X., Jeschke, G. R., Sheridan, D. L., Parker, S. A., Desai, V., Jwa, M., Cameroni, E., Niu, H., Good, M., Remenyi, A., Ma, J. N., Sheu, Y., Sassi, H. E., Sopko, R., Chan, C. S., De Virgilio, C., Hollingsworth, N. M., Lim, W. A., Stern, D. F., Stillman, B., Andrews, B. J., Gerstein, M. B., Snyder, M., Turk, B. E. 2010; 3 (109)

    Abstract

    Phosphorylation is a universal mechanism for regulating cell behavior in eukaryotes. Although protein kinases target short linear sequence motifs on their substrates, the rules for kinase substrate recognition are not completely understood. We used a rapid peptide screening approach to determine consensus phosphorylation site motifs targeted by 61 of the 122 kinases in Saccharomyces cerevisiae. By correlating these motifs with kinase primary sequence, we uncovered previously unappreciated rules for determining specificity within the kinase family, including a residue determining P-3 arginine specificity among members of the CMGC [CDK (cyclin-dependent kinase), MAPK (mitogen-activated protein kinase), GSK (glycogen synthase kinase), and CDK-like] group of kinases. Furthermore, computational scanning of the yeast proteome enabled the prediction of thousands of new kinase-substrate relationships. We experimentally verified several candidate substrates of the Prk1 family of kinases in vitro and in vivo and identified a protein substrate of the kinase Vhs1. Together, these results elucidate how kinase catalytic domains recognize their phosphorylation targets and suggest general avenues for the identification of previously unknown kinase substrates across eukaryotes.

    View details for DOI 10.1126/scisignal.2000482

    View details for Web of Science ID 000275647900005

    View details for PubMedID 20159853
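
    The computational scanning step mentioned in the abstract above can be pictured as sliding a position-specific scoring matrix across each protein and keeping windows centered on a serine or threonine that score above a cutoff. The sketch below assumes a toy five-position matrix (P-3 to P+1) with made-up weights; the function names, weights, and threshold are hypothetical and not the scoring scheme used in the paper.

```python
def score_site(window, pssm):
    """Sum the per-position weights for each residue in the window."""
    return sum(pssm[i].get(aa, 0.0) for i, aa in enumerate(window))


def scan_protein(sequence, pssm, center, threshold):
    """Return (phosphosite index, window, score) for S/T-centered windows >= threshold."""
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if window[center] not in "ST":
            continue  # only serine/threonine can be the phosphoacceptor here
        score = score_site(window, pssm)
        if score >= threshold:
            hits.append((start + center, window, score))
    return hits


# Toy matrix: favors Arg at P-3 and Pro at P+1 (illustrative weights only).
toy_pssm = [{"R": 2.0}, {}, {}, {"S": 1.0, "T": 1.0}, {"P": 1.5}]
print(scan_protein("AARAASPKKRGGTPA", toy_pssm, center=3, threshold=4.0))
```

    A real scan would use weights derived from peptide screening data and would usually combine the motif score with additional context, since sequence matches alone produce many false positives.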

  • Genome-Wide Identification of Binding Sites Defines Distinct Functions for Caenorhabditis elegans PHA-4/FOXA in Development and Environmental Response PLOS GENETICS Zhong, M., Niu, W., Lu, Z. J., Sarov, M., Murray, J. I., Janette, J., Raha, D., Sheaffer, K. L., Lam, H. Y., Preston, E., Slightham, C., Hillier, L. W., Brock, T., Agarwal, A., Auerbach, R., Hyman, A. A., Gerstein, M., Mango, S. E., Kim, S. K., Waterston, R. H., Reinke, V., Snyder, M. 2010; 6 (2)

    Abstract

    Transcription factors are key components of regulatory networks that control development, as well as the response to environmental stimuli. We have established an experimental pipeline in Caenorhabditis elegans that permits global identification of the binding sites for transcription factors using chromatin immunoprecipitation and deep sequencing. We describe and validate this strategy, and apply it to the transcription factor PHA-4, which plays critical roles in organ development and other cellular processes. We identified thousands of binding sites for PHA-4 during formation of the embryonic pharynx, and also found a role for this factor during the starvation response. Many binding sites were found to shift dramatically between embryos and starved larvae, from developmentally regulated genes to genes involved in metabolism. These results indicate distinct roles for this regulator in two different biological processes and demonstrate the versatility of transcription factors in mediating diverse biological roles.

    View details for DOI 10.1371/journal.pgen.1000848

    View details for Web of Science ID 000275262700016

    View details for PubMedID 20174564

  • Nucleotide-resolution analysis of structural variants using BreakSeq and a breakpoint library NATURE BIOTECHNOLOGY Lam, H. Y., Mu, X. J., Stuetz, A. M., Tanzer, A., Cayting, P. D., Snyder, M., Kim, P. M., Korbel, J. O., Gerstein, M. B. 2010; 28 (1): 47-U76

    Abstract

    Structural variants (SVs) are a major source of human genomic variation; however, characterizing them at nucleotide resolution remains challenging. Here we assemble a library of breakpoints at nucleotide resolution from collating and standardizing ~2,000 published SVs. For each breakpoint, we infer its ancestral state (through comparison to primate genomes) and its mechanism of formation (e.g., nonallelic homologous recombination, NAHR). We characterize breakpoint sequences with respect to genomic landmarks, chromosomal location, sequence motifs and physical properties, finding that the occurrence of insertions and deletions is more balanced than previously reported and that NAHR-formed breakpoints are associated with relatively rigid, stable DNA helices. Finally, we demonstrate an approach, BreakSeq, for scanning the reads from short-read sequenced genomes against our breakpoint library to accurately identify previously overlooked SVs, which we then validate by PCR. As new data become available, we expect our BreakSeq approach will become more sensitive and facilitate rapid SV genotyping of personal genomes.

    View details for DOI 10.1038/nbt.1600

    View details for Web of Science ID 000273430400020

    View details for PubMedID 20037582
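
    The idea of scanning short reads against a breakpoint library can be sketched as follows: build a junction sequence from the flanks on either side of a known breakpoint, then accept a read only if it matches the junction and overlaps both sides by a minimum number of bases. This is a simplified illustration using exact string matching; the function names, flank length, and overlap cutoff are assumptions and do not describe the BreakSeq implementation.

```python
def build_junction(left_flank, right_flank, flank=10):
    """Join the last `flank` bases left of a breakpoint to the first `flank` bases right of it."""
    return left_flank[-flank:] + right_flank[:flank]


def read_spans_junction(read, junction, flank=10, min_overlap=4):
    """True if the read matches the junction exactly and covers both sides of the breakpoint."""
    start = junction.find(read)
    while start != -1:
        end = start + len(read)
        # The breakpoint sits at index `flank`; require min_overlap bases on each side.
        if start <= flank - min_overlap and end >= flank + min_overlap:
            return True
        start = junction.find(read, start + 1)
    return False


junction = build_junction("ACGTACGTAC", "TTGGCCAATT")
print(read_spans_junction("CGTACTTGGC", junction))  # crosses the breakpoint
print(read_spans_junction("ACGTACGT", junction))    # lies entirely on the left flank
```

    Requiring overlap on both sides is what distinguishes a read that supports the variant from one that merely matches sequence shared with the reference.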

  • ChIP-Seq: Using High-Throughput DNA Sequencing for Genome-Wide Identification of Transcription Factor Binding Sites METHODS IN ENZYMOLOGY, VOL 470: GUIDE TO YEAST GENETICS Lefrancois, P., Zheng, W., Snyder, M. 2010; 470: 77-104

    Abstract

    Much of eukaryotic gene regulation is mediated by binding of transcription factors near or within their target genes. Transcription factor binding sites (TFBS) are often identified globally using chromatin immunoprecipitation (ChIP) in which specific protein-DNA interactions are isolated using an antibody against the factor of interest. Coupling ChIP with high-throughput DNA sequencing allows identification of TFBS in a direct, unbiased fashion; this technique is termed ChIP-Sequencing (ChIP-Seq). In this chapter, we describe the yeast ChIP-Seq procedure, including the protocols for ChIP, input DNA preparation, and Illumina DNA sequencing library preparation. Descriptions of Illumina sequencing and data processing and analysis are also included. The use of multiplex short-read sequencing (i.e., barcoding) enables the analysis of many ChIP samples simultaneously, which is especially valuable for organisms with small genomes such as yeast.

    View details for DOI 10.1016/S0076-6879(10)70004-5

    View details for Web of Science ID 000275827900004

    View details for PubMedID 20946807
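
    Multiplexed (barcoded) sequencing relies on sorting reads back to their source samples before analysis. A minimal sketch of that demultiplexing step is shown below, assuming the barcode is a fixed-length prefix of each read and that exact matches are required; the barcode length, sample names, and function name are hypothetical and not taken from the chapter's protocol.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample, barcode_len=4):
    """Group reads by sample using a fixed-length 5' barcode; the barcode is trimmed off."""
    by_sample = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_len], read[barcode_len:]
        # Reads with an unrecognized barcode are kept separately rather than dropped.
        sample = barcode_to_sample.get(barcode, "unassigned")
        by_sample[sample].append(insert)
    return dict(by_sample)


barcodes = {"ACGT": "chip_A", "TGCA": "chip_B"}
reads = ["ACGTTTTT", "TGCAGGGG", "NNNNCCCC"]
print(demultiplex(reads, barcodes))
```

    Production tools usually tolerate one mismatch in the barcode, which requires barcodes designed to differ from each other at several positions.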

  • RNA-Seq: a method for comprehensive transcriptome analysis. Current protocols in molecular biology / edited by Frederick M. Ausubel ... [et al.] Nagalakshmi, U., Waern, K., Snyder, M. 2010; Chapter 4: Unit 4.11.1-13

    Abstract

    A recently developed technique called RNA Sequencing (RNA-Seq) uses massively parallel sequencing to allow transcriptome analyses of genomes at a far higher resolution than is available with Sanger sequencing- and microarray-based methods. In the RNA-Seq method, complementary DNAs (cDNAs) generated from the RNA of interest are directly sequenced using next-generation sequencing technologies. The reads obtained from this can then be aligned to a reference genome in order to construct a whole-genome transcriptome map. RNA-Seq has been used successfully to precisely quantify transcript levels, confirm or revise previously annotated 5' and 3' ends of genes, and map exon/intron boundaries. This unit describes protocols for performing RNA-Seq using the Illumina sequencing platform.

    View details for DOI 10.1002/0471142727.mb0411s89

    View details for PubMedID 20069539

  • Systems biology approaches to disease marker discovery DISEASE MARKERS Sharon, D., Chen, R., Snyder, M. 2010; 28 (4): 209-224

    Abstract

    Our understanding of human disease and potential therapeutics is improving rapidly. In order to take advantage of these developments it is important to be able to identify disease markers. Many new high-throughput genomics and proteomics technologies are being implemented to identify candidate disease markers. These technologies include protein microarrays, next-generation DNA sequencing and mass spectrometry platforms. Such methods are particularly important for elucidating the repertoire of molecular markers in the genome, transcriptome, proteome and metabolome of patients with diseases such as cancer, autoimmune diseases, and viral infections, resulting from the disruption of many biological pathways. These new technologies have identified many potential disease markers. These markers are expected to be valuable to achieve the promise of truly personalized medicine.

    View details for DOI 10.3233/DMA-2010-0707

    View details for Web of Science ID 000279321200003

    View details for PubMedID 20534906

  • EBNA1 regulates cellular gene expression by binding cellular promoters PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Canaan, A., Haviv, I., Urban, A. E., Schulz, V. P., Hartman, S., Zhang, Z., Palejev, D., Deisseroth, A. B., Lacy, J., Snyder, M., Gerstein, M., Weissman, S. M. 2009; 106 (52): 22421-22426

    Abstract

    Epstein-Barr virus (EBV) is associated with several types of lymphomas and epithelial tumors including Burkitt's lymphoma (BL), HIV-associated lymphoma, posttransplant lymphoproliferative disorder, and nasopharyngeal carcinoma. EBV nuclear antigen 1 (EBNA1) is expressed in all EBV-associated tumors and is required for latency and transformation. EBNA1 initiates latent viral replication in B cells, maintains the viral genome copy number, and regulates transcription of other EBV-encoded latent genes. These activities are mediated through the ability of EBNA1 to bind viral DNA. To further elucidate the role of EBNA1 in the host cell, we have examined the effect of EBNA1 on cellular gene expression by microarray analysis using the B cell BJAB and the epithelial 293 cell lines transfected with EBNA1. Analysis of the data revealed distinct profiles of cellular gene changes in BJAB and 293 cell lines. Subsequently, chromatin immunoprecipitation revealed a direct binding of EBNA1 to cellular promoters. We have correlated EBNA1-bound promoters with changes in gene expression. Sequence analysis of the 100 promoters most enriched revealed a DNA motif that differs from the EBNA1 binding site in the EBV genome.

    View details for DOI 10.1073/pnas.0911676106

    View details for Web of Science ID 000273178700069

    View details for PubMedID 20080792

  • Mapping accessible chromatin regions using Sono-Seq PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Auerbach, R. K., Euskirchen, G., Rozowsky, J., Lamarre-Vincent, N., Moqtaderi, Z., Lefrancois, P., Struhl, K., Gerstein, M., Snyder, M. 2009; 106 (35): 14926-14931

    Abstract

    Disruptions in local chromatin structure often indicate features of biological interest such as regulatory regions. We find that sonication of cross-linked chromatin, when combined with a size-selection step and massively parallel short-read sequencing, can be used as a method (Sono-Seq) to map locations of high chromatin accessibility in promoter regions. Sono-Seq sites frequently correspond to actively transcribed promoter regions, as evidenced by their co-association with RNA Polymerase II ChIP regions, transcription start sites, histone H3 lysine 4 trimethylation (H3K4me3) marks, and CpG islands; signals over other sites, such as those bound by the CTCF insulator, are also observed. The pattern of breakage by Sono-Seq overlaps with, but is distinct from, that observed for FAIRE and DNase I hypersensitive sites. Our results demonstrate that Sono-Seq can be a useful and simple method by which to map many local alterations in chromatin structure. Furthermore, our results provide insights into the mapping of binding sites by using ChIP-Seq experiments and the value of reference samples that should be used in such experiments.

    View details for DOI 10.1073/pnas.0905443106

    View details for Web of Science ID 000269481000036

    View details for PubMedID 19706456

  • Global analysis of the glycoproteome in Saccharomyces cerevisiae reveals new roles for protein glycosylation in eukaryotes MOLECULAR SYSTEMS BIOLOGY Kung, L. A., Tao, S., Qian, J., Smith, M. G., Snyder, M., Zhu, H. 2009; 5

    Abstract

    To further understand the roles of protein glycosylation in eukaryotes, we globally identified glycan-containing proteins in yeast. A fluorescent lectin binding assay was developed and used to screen protein microarrays containing over 5000 proteins purified from yeast. A total of 534 yeast proteins were identified that bound either Concanavalin A (ConA) or Wheat-Germ Agglutinin (WGA); 406 of them were novel. Among the novel glycoproteins, 45 were validated by mobility shift upon treatment with EndoH and PNGase F, thereby extending the number of validated yeast glycoproteins to 350. In addition to many components of the secretory pathway, we identified other types of proteins, such as transcription factors and mitochondrial proteins. To further explore the role of glycosylation in mitochondrial function, the localization of four mitochondrial proteins was examined in the presence and absence of tunicamycin, an inhibitor of N-linked protein glycosylation. For two proteins, localization to the mitochondria is diminished upon tunicamycin treatment, indicating that protein glycosylation is important for protein function. Overall, our studies greatly extend our understanding of protein glycosylation in eukaryotes through the cataloguing of glycoproteins, and describe a novel role for protein glycosylation in mitochondrial protein function and localization.

    View details for DOI 10.1038/msb.2009.64

    View details for Web of Science ID 000270456400002

    View details for PubMedID 19756047

  • The genetic architecture of Down syndrome phenotypes revealed by high-resolution analysis of human segmental trisomies PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Korbel, J. O., Tirosh-Wagner, T., Urban, A. E., Chen, X., Kasowski, M., Dai, L., Grubert, F., Erdman, C., Gao, M. C., Lange, K., Sobel, E. M., Barlow, G. M., Aylsworth, A. S., Carpenter, N. J., Clark, R. D., Cohen, M. Y., Doran, E., Falik-Zaccai, T., Lewin, S. O., Lott, I. T., McGillivray, B. C., Moeschler, J. B., Pettenati, M. J., Pueschel, S. M., Rao, K. W., Shaffer, L. G., Shohat, M., Van Riper, A. J., Warburton, D., Weissman, S., Gerstein, M. B., Snyder, M., Korenberg, J. R. 2009; 106 (29): 12031-12036

    Abstract

    Down syndrome (DS), or trisomy 21, is a common disorder associated with several complex clinical phenotypes. Although several hypotheses have been put forward, it is unclear as to whether particular gene loci on chromosome 21 (HSA21) are sufficient to cause DS and its associated features. Here we present a high-resolution genetic map of DS phenotypes based on an analysis of 30 subjects carrying rare segmental trisomies of various regions of HSA21. By using state-of-the-art genomics technologies we mapped segmental trisomies at exon-level resolution and identified discrete regions of 1.8-16.3 Mb likely to be involved in the development of 8 DS phenotypes, 4 of which are congenital malformations, including acute megakaryocytic leukemia, transient myeloproliferative disorder, Hirschsprung disease, duodenal stenosis, imperforate anus, severe mental retardation, DS-Alzheimer Disease, and DS-specific congenital heart disease (DSCHD). Our DS-phenotypic maps located DSCHD to a <2-Mb interval. Furthermore, the map enabled us to present evidence against the necessary involvement of other loci as well as specific hypotheses that have been put forward in relation to the etiology of DS, i.e., the presence of a single DS consensus region and the sufficiency of DSCR1 and DYRK1A, or APP, in causing several severe DS phenotypes. Our study demonstrates the value of combining advanced genomics with cohorts of rare patients for studying DS, a prototype for the role of copy-number variation in complex disease.

    View details for DOI 10.1073/pnas.0813248106

    View details for Web of Science ID 000268178400040

    View details for PubMedID 19597142

  • Integrating Sequencing Technologies in Personal Genomics: Optimal Low Cost Reconstruction of Structural Variants PLOS COMPUTATIONAL BIOLOGY Du, J., Bjornson, R. D., Zhang, Z. D., Kong, Y., Snyder, M., Gerstein, M. B. 2009; 5 (7)

    Abstract

    The goal of human genome re-sequencing is obtaining an accurate assembly of an individual's genome. Recently, there has been great excitement in the development of many technologies for this (e.g. medium and short read sequencing from companies such as 454 and SOLiD, and high-density oligo-arrays from Affymetrix and NimbleGen), with even more expected to appear. The costs and sensitivities of these technologies differ considerably from each other. As an important goal of personal genomics is to reduce the cost of re-sequencing to an affordable point, it is worthwhile to consider optimally integrating technologies. Here, we build a simulation toolbox that will help us optimally combine different technologies for genome re-sequencing, especially in reconstructing large structural variants (SVs). SV reconstruction is considered the most challenging step in human genome re-sequencing. (It is sometimes even harder than de novo assembly of small genomes because of the duplications and repetitive sequences in the human genome.) To this end, we formulate canonical problems that are representative of issues in reconstruction and are of small enough scale to be computationally tractable and simulatable. Using semi-realistic simulations, we show how we can combine different technologies to optimally solve the assembly at low cost. With mappability maps, our simulations efficiently handle the inhomogeneous repeat-containing structure of the human genome and the computational complexity of practical assembly algorithms. They quantitatively show how combining different read lengths is more cost-effective than using one length, how an optimal mixed sequencing strategy for reconstructing large novel SVs usually also gives accurate detection of SNPs/indels, how paired-end reads can improve reconstruction efficiency, and how adding in arrays is more efficient than just sequencing for disentangling some complex SVs. Our strategy should facilitate the sequencing of human genomes at maximum accuracy and low cost.

    View details for DOI 10.1371/journal.pcbi.1000432

    View details for Web of Science ID 000269220100023

    View details for PubMedID 19593373

  • Unlocking the secrets of the genome NATURE Celniker, S. E., Dillon, L. A., Gerstein, M. B., Gunsalus, K. C., Henikoff, S., Karpen, G. H., Kellis, M., Lai, E. C., Lieb, J. D., MacAlpine, D. M., Micklem, G., Piano, F., Snyder, M., Stein, L., White, K. P., Waterston, R. H. 2009; 459 (7249): 927-930

    View details for DOI 10.1038/459927a

    View details for Web of Science ID 000267063500031

    View details for PubMedID 19536255

  • Dynamic and complex transcription factor binding during an inducible response in yeast GENES & DEVELOPMENT Ni, L., Bruce, C., Hart, C., Leigh-Bell, J., Gelperin, D., Umansky, L., Gerstein, M. B., Snyder, M. 2009; 23 (11): 1351-1363

    Abstract

    Complex biological processes are often regulated, at least in part, by the binding of transcription factors to their targets. Recently, considerable effort has been made to analyze the binding of relevant factors to the suite of targets they regulate, thereby generating a regulatory circuit map. However, for most studies the dynamics of binding have not been analyzed, and thus the temporal order of events and mechanisms by which this occurs are poorly understood. We globally analyzed in detail the temporal order of binding of several key factors involved in the salt response of yeast to their target genes. Analysis of Yap4 and Sko1 binding to their target genes revealed multiple temporal classes of binding patterns: (1) constant binding, (2) rapid induction, (3) slow induction, and (4) transient induction. These results demonstrate that individual transcription factors can have multiple binding patterns and help define the different types of temporal binding patterns used in eukaryotic gene regulation. To investigate these binding patterns further, we also analyzed the binding of seven other key transcription factors implicated in osmotic regulation, including Hot1, Msn1, Msn2, Msn4, Skn7, and Yap6, and found significant coassociation among the different factors at their gene targets. Moreover, the binding of several key factors was correlated with distinct classes of Yap4- and Sko1-binding patterns and with distinct types of genes. Gene expression studies revealed association of Yap4, Sko1, and other transcription factor-binding patterns with different gene expression patterns. The integration and analysis of binding and expression information reveals a complex dynamic and hierarchical circuit in which specific combinations of transcription factors target distinct sets of genes at discrete times to coordinate a rapid and important biological response.

    View details for DOI 10.1101/gad.1781909

    View details for Web of Science ID 000266524100009

    View details for PubMedID 19487574

  • A myelopoiesis-associated regulatory intergenic noncoding RNA transcript within the human HOXA cluster BLOOD Zhang, X., Lian, Z., Padden, C., Gerstein, M. B., Rozowsky, J., Snyder, M., Gingeras, T. R., Kapranov, P., Weissman, S. M., Newburger, P. E. 2009; 113 (11): 2526-2534

    Abstract

    We have identified an intergenic transcriptional activity that is located between the human HOXA1 and HOXA2 genes, shows myeloid-specific expression, and is up-regulated during granulocytic differentiation. The novel gene, termed HOTAIRM1 (HOX antisense intergenic RNA myeloid 1), is transcribed antisense to the HOXA genes and originates from the same CpG island that embeds the start site of HOXA1. The transcript appears to be a noncoding RNA containing no long open-reading frame; sucrose gradient analysis shows no association with polyribosomal fractions. HOTAIRM1 is the most prominent intergenic transcript expressed and up-regulated during induced granulocytic differentiation of NB4 promyelocytic leukemia and normal human hematopoietic cells; its expression is specific to the myeloid lineage. Its induction during retinoic acid (RA)-driven granulocytic differentiation is through RA receptor and may depend on the expression of myeloid cell development factors targeted by RA signaling. Knockdown of HOTAIRM1 quantitatively blunted RA-induced expression of HOXA1 and HOXA4 during the myeloid differentiation of NB4 cells, and selectively attenuated induction of transcripts for the myeloid differentiation genes CD11b and CD18, but did not noticeably impact the more distal HOXA genes. These findings suggest that HOTAIRM1 plays a role in myelopoiesis through modulation of gene expression in the HOXA cluster.

    View details for DOI 10.1182/blood-2008-06-162164

    View details for Web of Science ID 000264110600021

    View details for PubMedID 19144990

  • A high throughput embryonic stem cell screen identifies Oct-2 as a bifunctional regulator of neuronal differentiation GENES & DEVELOPMENT Theodorou, E., Dalembert, G., Heffelfinger, C., White, E., Weissman, S., Corcoran, L., Snyder, M. 2009; 23 (5): 575-588

    Abstract

    Neuronal differentiation is a complex process that involves a plethora of regulatory steps. To identify transcription factors that influence neuronal differentiation we developed a high throughput screen using embryonic stem (ES) cells. Seven-hundred human transcription factor clones were stably introduced into mouse ES (mES) cells and screened for their ability to induce neuronal differentiation of mES cells. Twenty-four factors that are capable of inducing neuronal differentiation were identified, including four known effectors of neuronal differentiation, 11 factors with limited evidence of involvement in regulating neuronal differentiation, and nine novel factors. One transcription factor, Oct-2, was studied in detail and found to be a bifunctional regulator: It can either repress or induce neuronal differentiation, depending on the particular isoform. Ectopic expression experiments demonstrate that isoform Oct-2.4 represses neuronal differentiation, whereas Oct-2.2 activates neuron formation. Consistent with a role in neuronal differentiation, Oct-2.2 expression is induced during differentiation, and cells depleted of Oct-2 and its homolog Oct-1 have a reduced capacity to differentiate into neurons. Our results reveal a number of transcription factors potentially important for mammalian neuronal differentiation, and indicate that Oct-2 may serve as a binary switch to repress differentiation in precursor cells and induce neuronal differentiation later during neuronal development.

    View details for DOI 10.1101/gad.1772509

    View details for Web of Science ID 000263918500005

    View details for PubMedID 19270158

  • Quantifying environmental adaptation of metabolic pathways in metagenomics PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Gianoulis, T. A., Raes, J., Patel, P. V., Bjornson, R., Korbel, J. O., Letunic, I., Yamada, T., Paccanaro, A., Jensen, L. J., Snyder, M., Bork, P., Gerstein, M. B. 2009; 106 (5): 1374-1379

    Abstract

    Recently, approaches have been developed to sample the genetic content of heterogeneous environments (metagenomics). However, by what means these sequences link distinct environmental conditions with specific biological processes is not well understood. Thus, a major challenge is how the usage of particular pathways and subnetworks reflects the adaptation of microbial communities across environments and habitats, i.e., how network dynamics relates to environmental features. Previous research has treated environments as discrete, somewhat simplified classes (e.g., terrestrial vs. marine), and searched for obvious metabolic differences among them (i.e., treating the analysis as a typical classification problem). However, environmental differences result from combinations of many factors, which often vary only slightly. Therefore, we introduce an approach that employs correlation and regression to relate multiple, continuously varying factors defining an environment to the extent of particular microbial pathways present in a geographic site. Moreover, rather than looking only at individual correlations (one-to-one), we adapted canonical correlation analysis and related techniques to define an ensemble of weighted pathways that maximally covaries with a combination of environmental variables (many-to-many), which we term a metabolic footprint. Applied to available aquatic datasets, we identified footprints predictive of their environment that can potentially be used as biosensors. For example, we show a strong multivariate correlation between the energy-conversion strategies of a community and multiple environmental gradients (e.g., temperature). Moreover, we identified covariation in amino acid transport and cofactor synthesis, suggesting that limiting amounts of cofactor can (partially) explain increased import of amino acids in nutrient-limited conditions.

    View details for DOI 10.1073/pnas.0808022106

    View details for Web of Science ID 000263074600018

    View details for PubMedID 19164758

  • Efficient yeast ChIP-Seq using multiplex short-read DNA sequencing BMC GENOMICS Lefrancois, P., Euskirchen, G. M., Auerbach, R. K., Rozowsky, J., Gibson, T., Yellman, C. M., Gerstein, M., Snyder, M. 2009; 10

    Abstract

    Short-read high-throughput DNA sequencing technologies provide new tools to answer biological questions. However, high cost and low throughput limit their widespread use, particularly in organisms with smaller genomes such as S. cerevisiae. Although ChIP-Seq in mammalian cell lines is replacing array-based ChIP-chip as the standard for transcription factor binding studies, ChIP-Seq in yeast is still underutilized compared to ChIP-chip. We developed a multiplex barcoding system that allows simultaneous sequencing and analysis of multiple samples using Illumina's platform. We applied this method to analyze the chromosomal distributions of three yeast DNA binding proteins (Ste12, Cse4 and RNA PolII) and a reference sample (input DNA) in a single experiment and demonstrate its utility for rapid and accurate results at reduced costs. We developed a barcoding ChIP-Seq method for the concurrent analysis of transcription factor binding sites in yeast. Our multiplex strategy generated high quality data that was indistinguishable from data obtained with non-barcoded libraries. None of the barcoded adapters induced differences relative to a non-barcoded adapter when applied to the same DNA sample. We used this method to map the binding sites for Cse4, Ste12 and Pol II throughout the yeast genome and we found 148 binding targets for Cse4, 823 targets for Ste12 and 2508 targets for PolII. Cse4 was strongly bound to all yeast centromeres as expected and the remaining non-centromeric targets correspond to highly expressed genes in rich media. The presence of Cse4 non-centromeric binding sites was not reported previously. We designed a multiplex short-read DNA sequencing method to perform efficient ChIP-Seq in yeast and other small genome model organisms. This method produces accurate results with higher throughput and reduced cost. Given constant improvements in high-throughput sequencing technologies, increasing multiplexing will be possible to further decrease costs per sample and to accelerate the completion of large consortium projects such as modENCODE.
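
    The multiplexing idea in this abstract can be illustrated with a minimal sketch, assuming each read starts with a short barcode that identifies its sample. The barcode sequences, the 4-base barcode length and the function name are hypothetical and are not the adapters described in the paper.

```python
from collections import defaultdict

# Hypothetical barcodes; the abstract does not give the real adapter sequences.
BARCODES = {"ACGT": "Ste12", "TGCA": "Cse4", "GATC": "PolII", "CTAG": "input"}
BARCODE_LEN = 4

def demultiplex(reads):
    """Group reads by their leading barcode and strip the barcode from each read."""
    by_sample = defaultdict(list)
    for read in reads:
        sample = BARCODES.get(read[:BARCODE_LEN], "unassigned")
        # Unassigned reads are kept intact so they can be inspected later.
        insert = read[BARCODE_LEN:] if sample != "unassigned" else read
        by_sample[sample].append(insert)
    return dict(by_sample)

reads = ["ACGTTTGACC", "TGCAGGATTA", "ACGTCCCAAA", "NNNNACGTAC"]
result = demultiplex(reads)
```

    After this step each sample's reads could be aligned and scored separately, as if they had been sequenced in their own lane.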

    View details for DOI 10.1186/1471-2164-10-37

    View details for Web of Science ID 000264970100002

    View details for PubMedID 19159457

  • Proteomic-Based Detection of a Protein Cluster Dysregulated during Cardiovascular Development Identifies Biomarkers of Congenital Heart Defects PLOS ONE Nath, A. K., Krauthammer, M., Li, P., Davidov, E., Butler, L. C., Copel, J., Katajamaa, M., Oresic, M., Buhimschi, I., Buhimschi, C., Snyder, M., Madri, J. A. 2009; 4 (1)

    Abstract

    Cardiovascular development is vital for embryonic survival and growth. Early gestation embryo loss or malformation has been linked to yolk sac vasculopathy and congenital heart defects (CHDs). However, the molecular pathways that underlie these structural defects in humans remain largely unknown, hindering the development of molecular-based diagnostic tools and novel therapies. Murine embryos were exposed to high glucose, a condition known to induce cardiovascular defects in both animal models and humans. We further employed a mass spectrometry-based proteomics approach to identify proteins differentially expressed in embryos with defects from those with normal cardiovascular development. The proteins detected by mass spectrometry (WNT16, ST14, Pcsk1, Jumonji, Morca2a, TRPC5, and others) were validated by Western blotting and immunofluorescent staining of the yolk sac and heart. The proteins within the proteomic dataset clustered to adhesion/migration, differentiation, transport, and insulin signaling pathways. A functional role for several proteins (WNT16, ADAM15 and NOGO-A/B) was demonstrated in an ex vivo model of heart development. Additionally, a successful application of a cluster of protein biomarkers (WNT16, ST14 and Pcsk1) as a prenatal screen for CHDs was confirmed in a study of human amniotic fluid (AF) samples from women carrying normal fetuses and those with CHDs. The novel finding that WNT16, ST14 and Pcsk1 protein levels increase in fetuses with CHDs suggests that these proteins may play a role in the etiology of human CHDs. The information gained through this bedside-to-bench translational approach contributes to a more complete understanding of the protein pathways dysregulated during cardiovascular development and provides novel avenues for diagnostic and therapeutic interventions, beneficial to fetuses at risk for CHDs.

    View details for DOI 10.1371/journal.pone.0004221

    View details for Web of Science ID 000265481900004

    View details for PubMedID 19156209

  • PEMer: a computational framework with simulation-based error models for inferring genomic structural variants from massive paired-end sequencing data GENOME BIOLOGY Korbel, J. O., Abyzov, A., Mu, X. J., Carriero, N., Cayting, P., Zhang, Z., Snyder, M., Gerstein, M. B. 2009; 10 (2)

    Abstract

    Personal-genomics endeavors, such as the 1000 Genomes project, are generating maps of genomic structural variants by analyzing ends of massively sequenced genome fragments. To process these we developed Paired-End Mapper (PEMer; http://sv.gersteinlab.org/pemer). This comprises an analysis pipeline, compatible with several next-generation sequencing platforms; simulation-based error models, yielding confidence-values for each structural variant; and a back-end database. The simulations demonstrated high structural variant reconstruction efficiency for PEMer's coverage-adjusted multi-cutoff scoring-strategy and showed its relative insensitivity to base-calling errors.
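
    The basic paired-end signal that such a pipeline builds on can be sketched as follows: when the two ends of a fragment map much farther apart on the reference than the fragment is long, a deletion in the sample is suggested, and when they map much closer together, an insertion is suggested. This is only an illustration with a single fixed cutoff. It does not reproduce PEMer's coverage-adjusted multi-cutoff scoring or its simulation-based error models, and the function name, expected span and cutoff are hypothetical.

```python
def classify_pair(left_pos, right_pos, expected_span, cutoff):
    """Label one mapped read pair by comparing its reference span to the expected span."""
    span = right_pos - left_pos
    if span > expected_span + cutoff:
        return "deletion"    # ends map farther apart than the fragment is long
    if span < expected_span - cutoff:
        return "insertion"   # ends map closer together than the fragment is long
    return "concordant"

# Hypothetical mapped positions for three read pairs from 3 kb fragments.
pairs = [(1000, 4000), (5000, 12000), (20000, 21200)]
labels = [classify_pair(l, r, expected_span=3000, cutoff=1000) for l, r in pairs]
```

    In practice a call would rest on several independent pairs supporting the same event rather than on a single pair.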

    View details for DOI 10.1186/gb-2009-10-2-r23

    View details for Web of Science ID 000266345600020

    View details for PubMedID 19236709

  • RNA-Seq: a revolutionary tool for transcriptomics NATURE REVIEWS GENETICS Wang, Z., Gerstein, M., Snyder, M. 2009; 10 (1): 57-63

    Abstract

    RNA-Seq is a recently developed approach to transcriptome profiling that uses deep-sequencing technologies. Studies using this method have already altered our view of the extent and complexity of eukaryotic transcriptomes. RNA-Seq also provides a far more precise measurement of levels of transcripts and their isoforms than other methods. This article describes the RNA-Seq approach, the challenges associated with its application, and the advances made so far in characterizing several eukaryote transcriptomes.

    View details for DOI 10.1038/nrg2484

    View details for Web of Science ID 000261866500012

    View details for PubMedID 19015660

  • MAPK target networks in Arabidopsis thaliana revealed using functional protein microarrays GENES & DEVELOPMENT Popescu, S. C., Popescu, G. V., Bachan, S., Zhang, Z., Gerstein, M., Snyder, M., Dinesh-Kumar, S. P. 2009; 23 (1): 80-92

    Abstract

    Signaling through mitogen-activated protein kinase (MPK) cascades is a complex and fundamental process in eukaryotes, requiring MPK-activating kinases (MKKs) and MKK-activating kinases (MKKKs). However, to date only a limited number of MKK-MPK interactions and MPK phosphorylation substrates have been revealed. We determined which Arabidopsis thaliana MKKs preferentially activate 10 different MPKs in vivo and used the activated MPKs to probe high-density protein microarrays to determine their phosphorylation targets. Our analyses revealed known and novel signaling modules encompassing 570 MPK phosphorylation substrates; these substrates were enriched in transcription factors involved in the regulation of development, defense, and stress responses. Selected MPK substrates were validated by in planta reconstitution experiments. A subset of activated and wild-type MKKs induced cell death, indicating a possible role for these MKKs in the regulation of cell death. Interestingly, MKK7- and MKK9-induced death requires Sgt1, a known regulator of cell death induced during plant innate immunity. Our predicted MKK-MPK phosphorylation network constitutes a valuable resource to understand the function and specificity of MPK signaling systems.

    View details for DOI 10.1101/gad.1740009

    View details for Web of Science ID 000262369700008

    View details for PubMedID 19095804

  • PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls NATURE BIOTECHNOLOGY Rozowsky, J., Euskirchen, G., Auerbach, R. K., Zhang, Z. D., Gibson, T., Bjornson, R., Carriero, N., Snyder, M., Gerstein, M. B. 2009; 27 (1): 66-75

    Abstract

    Chromatin immunoprecipitation (ChIP) followed by tag sequencing (ChIP-seq) using high-throughput next-generation instrumentation is fast replacing chromatin immunoprecipitation followed by genome tiling array analysis (ChIP-chip) as the preferred approach for mapping of sites of transcription-factor binding and chromatin modification. Using two deeply sequenced data sets for human RNA polymerase II and STAT1, each with matching input-DNA controls, we describe a general scoring approach to address unique challenges in ChIP-seq data analysis. Our approach is based on the observation that sites of potential binding are strongly correlated with signal peaks in the control, likely revealing features of open chromatin. We develop a two-pass strategy called PeakSeq to compensate for this signal, which is caused by open chromatin and revealed by inclusion of the controls. The first pass identifies putative binding sites and compensates for genomic variation in the 'mappability' of sequences. The second pass filters out sites not significantly enriched compared to the normalized control, computing precise enrichments and significances. Our scoring procedure enables us to optimize experimental design by estimating the depth of sequencing required for a desired level of coverage and demonstrating that more than two replicates provide only a marginal gain in information.
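
    The second pass described above, comparing each putative site against the normalized control, might look roughly like the sketch below. The one-sided binomial test, the 0.01 threshold, the scaling factor and all names are assumptions made for illustration. The abstract states only that enrichments and significances are computed.

```python
from math import comb

def binomial_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def score_region(chip_tags, control_tags, scale):
    """Return (enrichment, p_value) for one putative binding site."""
    scaled_control = control_tags * scale          # normalize control to ChIP depth
    enrichment = chip_tags / max(scaled_control, 1.0)
    n = chip_tags + round(scaled_control)
    # Under the null, a tag is equally likely to come from ChIP or scaled control.
    return enrichment, binomial_tail(chip_tags, n)

# Hypothetical tag counts per site: (ChIP tags, control tags).
regions = {"siteA": (40, 10), "siteB": (12, 11)}
scale = 1.0  # hypothetical normalization factor
kept = {name for name, (chip, ctrl) in regions.items()
        if score_region(chip, ctrl, scale)[1] < 0.01}
```

    Here siteA is retained because its ChIP signal far exceeds the control, while siteB is filtered out because a comparable peak appears in the control.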

    View details for DOI 10.1038/nbt.1518

    View details for Web of Science ID 000262471200025

    View details for PubMedID 19122651

  • Protein microarrays. Methods in molecular biology (Clifton, N.J.) Fasolo, J., Snyder, M. 2009; 548: 209-222

    Abstract

    Protein microarrays containing nearly the entire yeast proteome have been constructed. They are typically prepared by overexpression and high-throughput purification and printing onto microscope slides. The arrays can be used to screen nearly the entire proteome in an unbiased fashion and have enormous utility for a variety of applications. These include protein-protein interactions, identification of novel lipid- and nucleic acid-binding proteins, and finding targets of small molecules, protein kinases, and other modification enzymes. Protein microarrays are thus powerful tools for individual studies as well as systematic characterization of proteins and their biochemical activities and regulation.

    View details for DOI 10.1007/978-1-59745-540-4_12

    View details for PubMedID 19521827

  • Global identification of protein kinase substrates by protein microarray analysis NATURE PROTOCOLS Mok, J., Im, H., Snyder, M. 2009; 4 (12): 1820-1827

    Abstract

    Herein, we describe a protocol for the global identification of in vitro substrates targeted by protein kinases using protein microarray technology. Large numbers of fusion proteins tagged at their carboxy-termini are purified in 96-well format and spotted in duplicate onto amino-silane-coated slides in a spatially addressable manner. These arrays are incubated in the presence of purified kinase and radiolabeled ATP, and then washed, dried and analyzed by autoradiography. The extent of phosphorylation of each spot is quantified and normalized, and proteins that are reproducibly phosphorylated in the presence of the active kinase relative to control slides are scored as positive substrates. This approach enables the rapid determination of kinase-substrate relationships on a proteome-wide scale, and although developed using yeast, has since been adapted to higher eukaryotic systems. Expression, purification and printing of the yeast proteome require about 3 weeks. Afterwards, each kinase assay takes approximately 3 h to perform.
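
    The scoring step at the end of this protocol can be sketched as a simple filter, assuming normalized intensities are already available for both duplicate spots of every protein on the kinase and control slides. The 2-fold threshold, the protein names and the function name are hypothetical. The abstract does not state a cutoff.

```python
def positive_substrates(kinase_slide, control_slide, min_ratio=2.0):
    """Return proteins whose duplicate spots both exceed the control by min_ratio."""
    hits = []
    for protein, spots in kinase_slide.items():
        controls = control_slide[protein]
        # Require every duplicate spot to pass, so one bright spot is not enough.
        if all(s >= min_ratio * max(c, 1e-9) for s, c in zip(spots, controls)):
            hits.append(protein)
    return sorted(hits)

# Hypothetical normalized intensities for duplicate spots.
kinase = {"ProtA": (900.0, 850.0), "ProtB": (400.0, 120.0), "ProtC": (110.0, 95.0)}
control = {"ProtA": (100.0, 110.0), "ProtB": (100.0, 100.0), "ProtC": (100.0, 100.0)}
hits = positive_substrates(kinase, control)
```

    Requiring both duplicates to pass is one way to encode "reproducibly phosphorylated"; ProtB fails because only one of its two spots is elevated.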

    View details for DOI 10.1038/nprot.2009.194

    View details for Web of Science ID 000274226100011

    View details for PubMedID 20010933

  • MSB: A mean-shift-based approach for the analysis of structural variation in the genome GENOME RESEARCH Wang, L., Abyzov, A., Korbel, J. O., Snyder, M., Gerstein, M. 2009; 19 (1): 106-117

    Abstract

    Genome structural variation includes segmental duplications, deletions, and other rearrangements, and array-based comparative genomic hybridization (array-CGH) is a popular technology for determining this. Drawing relevant conclusions from array-CGH requires computational methods for partitioning the chromosome into segments of elevated, reduced, or unchanged copy number. Several approaches have been described, most of which attempt to explicitly model the underlying distribution of data based on particular assumptions. Often, they optimize likelihood functions for estimating model parameters, by expectation maximization or related approaches; however, this requires good parameter initialization through prespecifying the number of segments. Moreover, convergence is difficult to achieve, since many parameters are required to characterize an experiment. To overcome these limitations, we propose a nonparametric method without a global criterion to be optimized. Our method involves mean-shift-based (MSB) procedures; it considers the observed array-CGH signal as sampling from a probability-density function, uses a kernel-based approach to estimate local gradients for this function, and iteratively follows them to determine local modes of the signal. Overall, our method achieves robust discontinuity-preserving smoothing, thus accurately segmenting chromosomes into regions of duplication and deletion. It does not require the number of segments as input, nor does its convergence depend on this. We successfully applied our method to both simulated data and array-CGH experiments on glioblastoma and adenocarcinoma. We show that it performs at least as well as, and often better than, 10 previously published algorithms. Finally, we show that our approach can be extended to segmenting the signal resulting from the depth-of-coverage of mapped reads from next-generation sequencing.

    View details for DOI 10.1101/gr.080069.108

    View details for Web of Science ID 000262200000010

    View details for PubMedID 19037015
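
    The mean-shift idea in the abstract above can be illustrated with a minimal sketch. This is not the published MSB implementation: the Gaussian kernel, the joint position/intensity feature space, the bandwidths (`spatial_bw`, `range_bw`), and the function name are all assumptions chosen for illustration. Each probe is moved iteratively toward the local mode of a kernel density estimate, which smooths within a segment while preserving the jump between segments.

```python
import math


def mean_shift_smooth(signal, spatial_bw=3.0, range_bw=0.2, max_iter=50, tol=1e-4):
    """Discontinuity-preserving smoothing of a 1-D signal by mean shift.

    Hypothetical sketch: each point (index, value) is shifted toward the
    local mode of a Gaussian kernel density in the joint
    position/intensity space. Bandwidths are illustrative defaults.
    """
    n = len(signal)
    smoothed = []
    for i in range(n):
        pos, val = float(i), float(signal[i])
        for _ in range(max_iter):
            w_sum = pos_sum = val_sum = 0.0
            for j in range(n):
                dp = (j - pos) / spatial_bw        # distance along the chromosome
                dv = (signal[j] - val) / range_bw  # distance in signal intensity
                w = math.exp(-0.5 * (dp * dp + dv * dv))
                w_sum += w
                pos_sum += w * j
                val_sum += w * signal[j]
            new_pos, new_val = pos_sum / w_sum, val_sum / w_sum
            shift = abs(new_pos - pos) + abs(new_val - val)
            pos, val = new_pos, new_val
            if shift < tol:  # converged to a local mode
                break
        smoothed.append(val)
    return smoothed


# Toy log-ratio profile: a copy-neutral segment followed by a gained segment.
toy = [0.05 * (-1) ** i for i in range(10)] + [1.0 + 0.05 * (-1) ** i for i in range(10)]
result = mean_shift_smooth(toy)
```

    Note that the number of segments is never supplied; segments emerge as runs of probes that converge to the same mode, which matches the property the abstract emphasizes.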

  • Mismatch oligonucleotides in human and yeast: guidelines for probe design on tiling microarrays BMC GENOMICS Seringhaus, M., Rozowsky, J., Royce, T., Nagalakshmi, U., Jee, J., Snyder, M., Gerstein, M. 2008; 9

    Abstract

    Mismatched oligonucleotides are widely used on microarrays to differentiate specific from nonspecific hybridization. While many experiments rely on such oligos, the hybridization behavior of various degrees of mismatch (MM) structure has not been extensively studied. Here, we present the results of two large-scale microarray experiments on S. cerevisiae and H. sapiens genomic DNA, to explore MM oligonucleotide behavior with real sample mixtures under tiling-array conditions. We examined all possible nucleotide substitutions at the central position of 36-nucleotide probes, and found that nonspecific binding by MM oligos depends upon the individual nucleotide substitutions they incorporate: C-->A, C-->G and T-->A (yielding purine-purine mispairs) are most disruptive, whereas A-->X were least disruptive. We also quantify a marked GC skew effect: substitutions raising probe GC content exhibit higher intensity (and vice versa). This skew is small in highly-expressed regions (+/- 0.5% of total intensity range) and large (+/- 2% or more) elsewhere. Multiple mismatches per oligo are largely additive in effect: each MM added in a distributed fashion causes an additional 21% intensity drop relative to PM, three-fold more disruptive than adding adjacent mispairs (7% drop per MM). We investigate several parameters for oligonucleotide design, including the effects of each central nucleotide substitution on array signal intensity and of multiple MM per oligo. To avoid GC skew, individual substitutions should not alter probe GC content. RNA sample mixture complexity may increase the amount of nonspecific hybridization, magnify GC skew and boost the intensity of MM oligos at all levels.

    View details for DOI 10.1186/1471-2164-9-635

    View details for Web of Science ID 000264109200001

    View details for PubMedID 19117516
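
    The design guideline that a substitution should not alter probe GC content can be made concrete with a small sketch. The function names, the choice of index `len(probe) // 2` as the "central" position of an even-length probe, and the restriction to uppercase A/C/G/T are assumptions for illustration, not details taken from the paper.

```python
# Swapping A<->T or C<->G changes the base but leaves GC content unchanged.
GC_NEUTRAL = {"A": "T", "T": "A", "C": "G", "G": "C"}


def gc_content(seq):
    """Fraction of bases in seq that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def gc_neutral_mismatch(probe):
    """Return probe with its central base replaced by a GC-neutral mismatch.

    Hypothetical helper: 'central' is taken to be index len(probe) // 2.
    """
    mid = len(probe) // 2
    return probe[:mid] + GC_NEUTRAL[probe[mid]] + probe[mid + 1:]


perfect_match = "ACGT" * 9               # illustrative 36-nt probe
mismatch = gc_neutral_mismatch(perfect_match)
```

    A GC-neutral swap addresses only the skew effect; the abstract also reports that specific substitutions differ in how disruptive they are, which this sketch does not model.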

  • Analysis of copy number variants and segmental duplications in the human genome: Evidence for a change in the process of formation in recent evolutionary history GENOME RESEARCH Kim, P. M., Lam, H. Y., Urban, A. E., Korbel, J. O., Affourtit, J., Grubert, F., Chen, X., Weissman, S., Snyder, M., Gerstein, M. B. 2008; 18 (12): 1865-1874

    Abstract

    Segmental duplications (SDs) are operationally defined as >1 kb stretches of duplicated DNA with high sequence identity. They arise from copy number variants (CNVs) fixed in the population. To investigate the formation of SDs and CNVs, we examine their large-scale patterns of co-occurrence with different repeats. Alu elements, a major class of genomic repeats, had previously been identified as prime drivers of SD formation. We also observe this association; however, we find that it sharply decreases for younger SDs. Continuing this trend, we find only weak associations of CNVs with Alus. Similarly, we find an association of SDs with processed pseudogenes, which is decreasing for younger SDs and absent entirely for CNVs. Next, we find that SDs are significantly co-localized with each other, resulting in a highly skewed "power-law" distribution and chromosomal hotspots. We also observe a significant association of CNVs with SDs, but find that an SD-mediated mechanism only accounts for some CNVs (<28%). Overall, our results imply that a shift in predominant formation mechanism occurred in recent history: approximately 40 million years ago, during the "Alu burst" in retrotransposition activity, non-allelic homologous recombination, first mediated by Alus and then by the newly formed CNVs themselves, was the main driver of genome rearrangements; however, its relative importance has decreased markedly since then, with proportionally more events now stemming from other repeats and from non-homologous end-joining. In addition to a coarse-grained analysis, we performed targeted sequencing of 67 CNVs and then analyzed a combined set of 270 CNVs (540 breakpoints) to verify our conclusions.

    View details for DOI 10.1101/gr.081422.108

    View details for Web of Science ID 000261398900002

    View details for PubMedID 18842824

  • A procedure for highly specific, sensitive, and unbiased whole-genome amplification PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Pan, X., Urban, A. E., Palejev, D., Schulz, V., Grubert, F., Hu, Y., Snyder, M., Weissman, S. M. 2008; 105 (40): 15499-15504

    Abstract

    Highly specific amplification of complex DNA pools without bias or template-independent products (TIPs) remains a challenge. We have developed a method using phi29 DNA polymerase and trehalose and optimized control of amplification to create micrograms of specific amplicons without TIPs from down to subfemtograms of DNA. With an input of as little as 0.5-2.5 ng of human gDNA or a few cells, the product could be close to native DNA in locus representation. The amplicons from 5 and 0.5 ng of DNA faithfully demonstrated all previously known heterozygous segmental duplications and deletions (3 Mb to 18 kb) located on chromosome 22 and even a homozygous deletion smaller than 1 kb with high-resolution chromosome-wide comparative genomic hybridization. With 550k Infinium BeadChip SNP typing, the >99.7% accuracy compared favorably with results on unamplified DNA. Importantly, underrepresentation of chromosome termini that occurred with GenomiPhi v2 was greatly rescued with the present procedure, and the call rate and accuracy of SNP typing were also improved for the amplicons with a 0.5-ng, partially degraded DNA input. In addition, the amplification proceeded logarithmically in terms of total yield before saturation; intact cells were amplified >50 times more efficiently than an equivalent amount of extracted DNA; and the locus imbalance for amplicons with 0.1 ng or lower input of DNA was variable, whereas for higher input it was largely reproducible. This procedure facilitates genomic analysis with single cells or other traces of DNA, and generates products suitable for analysis by massively parallel sequencing as well as microarray hybridization.

    View details for DOI 10.1073/pnas.0808028105

    View details for Web of Science ID 000260360500052

    View details for PubMedID 18832167

  • Modeling ChIP Sequencing In Silico with Applications PLOS COMPUTATIONAL BIOLOGY Zhang, Z. D., Rozowsky, J., Snyder, M., Chang, J., Gerstein, M. 2008; 4 (8)

    Abstract

    ChIP sequencing (ChIP-seq) is a new method for genomewide mapping of protein binding sites on DNA. It has generated much excitement in functional genomics. To score data and determine adequate sequencing depth, both the genomic background and the binding sites must be properly modeled. To develop a computational foundation to tackle these issues, we first performed a study to characterize the observed statistical nature of this new type of high-throughput data. By linking sequence tags into clusters, we show that there are two components to the distribution of tag counts observed in a number of recent experiments: an initial power-law distribution and a subsequent long right tail. Then we develop in silico ChIP-seq, a computational method to simulate the experimental outcome by placing tags onto the genome according to particular assumed distributions for the actual binding sites and for the background genomic sequence. In contrast to current assumptions, our results show that both the background and the binding sites need to have a markedly nonuniform distribution in order to correctly model the observed ChIP-seq data, with, for instance, the background tag counts modeled by a gamma distribution. On the basis of these results, we extend an existing scoring approach by using a more realistic genomic-background model. This enables us to identify transcription-factor binding sites in ChIP-seq data in a statistically rigorous fashion.

    View details for DOI 10.1371/journal.pcbi.1000158

    View details for Web of Science ID 000260041300021

    View details for PubMedID 18725927
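
    The simulation idea described in the abstract above, placing tags on the genome with a nonuniform background, can be sketched as follows. This is an illustrative toy, not the authors' in silico ChIP-seq method: the fixed-size bins, the Poisson sampling step, the additive `enrichment` at binding sites, and all parameter values are assumptions. Only the use of a gamma distribution for the background is taken from the abstract.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson variate with mean lam (Knuth's multiplication method)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_tag_counts(n_bins, site_bins, shape=2.0, scale=1.0,
                        enrichment=20.0, seed=0):
    """Simulate per-bin tag counts for a toy genome.

    Hypothetical model: each bin gets a background tag rate drawn from a
    gamma distribution (nonuniform background); bins listed in site_bins
    receive an extra 'enrichment' rate; observed counts are Poisson draws.
    """
    rng = random.Random(seed)
    sites = set(site_bins)
    counts = []
    for b in range(n_bins):
        rate = rng.gammavariate(shape, scale)  # gamma-distributed background
        if b in sites:
            rate += enrichment                 # assumed binding-site signal
        counts.append(poisson(rng, rate))
    return counts


counts = simulate_tag_counts(200, site_bins=[10, 50])
background_mean = sum(counts) / len(counts)
```

    Drawing the rate from a gamma distribution before the Poisson step yields overdispersed counts, which is one simple way to represent the markedly nonuniform background the abstract calls for.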

  • A genomic analysis of RNA polymerase II modification and chromatin architecture related to 3' end RNA polyadenylation GENOME RESEARCH Lian, Z., Karpikov, A., Lian, J., Mahajan, M. C., Hartman, S., Gerstein, M., Snyder, M., Weissman, S. M. 2008; 18 (8): 1224-1237

    Abstract

    Genomic analyses have been applied extensively to analyze the process of transcription initiation in mammalian cells, but less to transcript 3' end formation and transcription termination. We used a novel approach to prepare 3' end fragments from polyadenylated RNA, and mapped the position of the poly(A) addition site using oligonucleotide arrays tiling 1% of the human genome. This approach revealed more 3' ends than had been annotated. The distribution of these ends relative to RNA polymerase II (PolII) and di- and trimethylated lysine 4 and lysine 36 of histone H3 was compared. A substantial fraction of unannotated 3' ends of RNA are intronic and antisense to the embedding gene. Poly(A) ends of annotated messages lie on average 2 kb upstream of the end of PolII binding (termination). Near the termination sites, and in some internal sites, unphosphorylated and C-terminal domain (CTD) serine 2 phosphorylated PolII (POLR2A) accumulate, suggesting pausing of the polymerase and perhaps dephosphorylation prior to release. Lysine 36 trimethylation occurs across transcribed genes, sometimes alternating with stretches of DNA in which lysine 36 dimethylation is more prominent. Lysine 36 methylation decreases at or near the site of polyadenylation, sometimes disappearing before disappearance of phosphorylated RNA PolII or release of PolII from DNA. Our results suggest that transcription termination involves loss of histone 3 lysine 36 methylation and later release of RNA polymerase. The latter is often associated with polymerase pausing. Overall, our study reveals extensive sites of poly(A) addition and provides insights into the events that occur during 3' end formation.

    View details for DOI 10.1101/gr.075804.107

    View details for Web of Science ID 000258116100004

    View details for PubMedID 18487515

  • Genome-Wide Occupancy of SREBP1 and Its Partners NFY and SP1 Reveals Novel Functional Roles and Combinatorial Regulation of Distinct Classes of Genes PLOS GENETICS Reed, B. D., Charos, A. E., Szekely, A. M., Weissman, S. M., Snyder, M. 2008; 4 (7)

    Abstract

    The sterol regulatory element-binding protein (SREBP) family member SREBP1 is a critical transcriptional regulator of cholesterol and fatty acid metabolism and has been implicated in insulin resistance, diabetes, and other diet-related diseases. We globally identified the promoters occupied by SREBP1 and its binding partners NFY and SP1 in a human hepatocyte cell line using chromatin immunoprecipitation combined with genome tiling arrays (ChIP-chip). We find that SREBP1 occupies the promoters of 1,141 target genes involved in diverse biological pathways, including novel targets with roles in lipid metabolism and insulin signaling. We also identify a conserved SREBP1 DNA-binding motif in SREBP1 target promoters, and we demonstrate that many SREBP1 target genes are transcriptionally activated by treatment with insulin and glucose using gene expression microarrays. Finally, we show that SREBP1 cooperates extensively with NFY and SP1 throughout the genome and that unique combinations of these factors target distinct functional pathways. Our results provide insight into the regulatory circuitry in which SREBP1 and its network partners coordinate a complex transcriptional response in the liver with cues from the diet.

    View details for DOI 10.1371/journal.pgen.1000133

    View details for Web of Science ID 000260410600005

    View details for PubMedID 18654640

  • The transcriptional landscape of the yeast genome defined by RNA sequencing SCIENCE Nagalakshmi, U., Wang, Z., Waern, K., Shou, C., Raha, D., Gerstein, M., Snyder, M. 2008; 320 (5881): 1344-1349

    Abstract

    The identification of untranslated regions, introns, and coding regions within an organism remains challenging. We developed a quantitative sequencing-based method called RNA-Seq for mapping transcribed regions, in which complementary DNA fragments are subjected to high-throughput sequencing and mapped to the genome. We applied RNA-Seq to generate a high-resolution transcriptome map of the yeast genome and demonstrated that most (74.5%) of the nonrepetitive sequence of the yeast genome is transcribed. We confirmed many known and predicted introns and demonstrated that others are not actively used. Alternative initiation codons and upstream open reading frames also were identified for many yeast genes. We also found unexpected 3'-end heterogeneity and the presence of many overlapping genes. These results indicate that the yeast transcriptome is more complex than previously appreciated.

    View details for DOI 10.1126/science.1158441

    View details for Web of Science ID 000256441100046

    View details for PubMedID 18451266

  • The current excitement about copy-number variation: how it relates to gene duplications and protein families CURRENT OPINION IN STRUCTURAL BIOLOGY Korbel, J. O., Kim, P. M., Chen, X., Urban, A. E., Weissman, S., Snyder, M., Gerstein, M. B. 2008; 18 (3): 366-374

    Abstract

    Following recent technological advances there has been an increasing interest in genome structural variants (SVs), in particular copy-number variants (CNVs)--large-scale duplications and deletions. Although not immediately evident, CNV surveys make a conceptual connection between the fields of population genetics and protein families, in particular with regard to the stability and expandability of families. The mechanisms giving rise to CNVs can be considered as fundamental processes underlying gene duplication and loss; duplicated genes being the results of 'successful' copies, fixed and maintained in the population. Conversely, many 'unsuccessful' duplicates remain in the genome as pseudogenes. Here, we survey studies on CNVs, highlighting issues related to protein families. In particular, CNVs tend to affect specific gene functional categories, such as those associated with environmental response, and are depleted in genes related to basic cellular processes. Furthermore, CNVs occur more often at the periphery of the protein interaction network. In comparison, protein families associated with successful and unsuccessful duplicates are associated with similar functional categories but are differentially placed in the interaction network. These trends are likely reflective of CNV formation biases and natural selection, both of which differentially influence distinct protein families.

    View details for DOI 10.1016/j.sbi.2008.02.005

    View details for Web of Science ID 000257539100013

    View details for PubMedID 18511261

  • Leptin affects endocardial cushion formation by modulating EMT and migration via Akt signaling cascades JOURNAL OF CELL BIOLOGY Nath, A. K., Brown, R. M., Michaud, M., Sierra-Honigmann, M. R., Snyder, M., Madri, J. A. 2008; 181 (2): 367-380

    Abstract

    Blood circulation is dependent on heart valves to direct blood flow through the heart and great vessels. Valve development relies on epithelial to mesenchymal transition (EMT), a central feature of embryonic development and metastatic cancer. Abnormal EMT and remodeling contribute to the etiology of several congenital heart defects. Leptin and its receptor were detected in the mouse embryonic heart. Using an ex vivo model of cardiac EMT, the inhibition of leptin results in a signal transducer and activator of transcription 3 and Snail/vascular endothelial cadherin-independent decrease in EMT and migration. Our data suggest that an Akt signaling pathway underlies the observed phenotype. Furthermore, loss of leptin phenocopied the functional inhibition of alphavbeta3 integrin receptor and resulted in decreased alphavbeta3 integrin and matrix metalloprotease 2, suggesting that the leptin signaling pathway is involved in adhesion and migration processes. This study adds leptin to the repertoire of factors that mediate EMT and, for the first time, demonstrates a role for the interleukin 6 family in embryonic EMT.

    View details for Web of Science ID 000255410300018

    View details for PubMedID 18411306

  • Myo2p, a class V myosin in budding yeast, associates with a large ribonucleic acid-protein complex that contains mRNAs and subunits of the RNA-processing body RNA-A PUBLICATION OF THE RNA SOCIETY Chang, W., Zaarour, R. F., Reck-Peterson, S., Rinn, J., Singer, R. H., Snyder, M., Novick, P., Mooseker, M. S. 2008; 14 (3): 491-502

    Abstract

    Myo2p is an essential class V myosin in budding yeast with several identified functions in organelle trafficking and spindle orientation. The present study demonstrates that Myo2p is a component of a large RNA-containing complex (Myo2p-RNP) that is distinct from polysomes based on sedimentation analysis and lack of ribosomal subunits in the Myo2p-RNP. Microarray analysis of RNAs that coimmunoprecipitate with Myo2p revealed the presence of a large number of mRNAs in this complex. The Myo2p-RNA complex is in part composed of the RNA processing body (P-body) based on coprecipitation with P-body protein subunits and partial colocalization of Myo2p with P-bodies. P-body disassembly is delayed in the motor mutant, myo2-66, indicating that Myo2p may facilitate the release of mRNAs from the P-body.

    View details for DOI 10.1261/rna.665008

    View details for Web of Science ID 000253565400012

    View details for PubMedID 18218704

  • Systematic analysis of transcribed loci in ENCODE regions using RACE sequencing reveals extensive transcription in the human genome GENOME BIOLOGY QianWu, J., Du, J., Rozowsky, J., Zhang, Z., Urban, A. E., Euskirchen, G., Weissman, S., Gerstein, M., Snyder, M. 2008; 9 (1)

    Abstract

    Recent studies of the mammalian transcriptome have revealed a large number of additional transcribed regions and extraordinary complexity in transcript diversity. However, there is still much uncertainty regarding precisely what portion of the genome is transcribed, the exact structures of these novel transcripts, and the levels of the transcripts produced. We have interrogated the transcribed loci in 420 selected ENCyclopedia Of DNA Elements (ENCODE) regions using rapid amplification of cDNA ends (RACE) sequencing. We analyzed annotated known gene regions, but primarily we focused on novel transcriptionally active regions (TARs), which were previously identified by high-density oligonucleotide tiling arrays and on random regions that were not believed to be transcribed. We found RACE sequencing to be very sensitive and were able to detect low levels of transcripts in specific cell types that were not detectable by microarrays. We also observed many instances of sense-antisense transcripts; further analysis suggests that many of the antisense transcripts (but not all) may be artifacts generated from the reverse transcription reaction. Our results show that the majority of the novel TARs analyzed (60%) are connected to other novel TARs or known exons. Of previously unannotated random regions, 17% were shown to produce overlapping transcripts. Furthermore, it is estimated that 9% of the novel transcripts encode proteins. We conclude that RACE sequencing is an efficient, sensitive, and highly accurate method for characterization of the transcriptome of specific cell/tissue types. Using this method, it appears that much of the genome is represented in polyA+ RNA. Moreover, a fraction of the novel RNAs can encode protein and are likely to be functional.

    View details for DOI 10.1186/gb-2008-9-1-r3

    View details for Web of Science ID 000253779800011

    View details for PubMedID 18173853

  • The development of protein microarrays and their applications in DNA-protein and protein-protein interaction analyses of Arabidopsis transcription factors MOLECULAR PLANT Gong, W., He, K., Covington, M., Dinesh-Kumar, S. P., Snyder, M., Harmer, S. L., Zhu, Y., Deng, X. W. 2008; 1 (1): 27-41

    Abstract

    We used our collection of Arabidopsis transcription factor (TF) ORFeome clones to construct protein microarrays containing as many as 802 TF proteins. These protein microarrays were used for both protein-DNA and protein-protein interaction analyses. For protein-DNA interaction studies, we examined AP2/ERF family TFs and their cognate cis-elements. By careful comparison of the DNA-binding specificity of 13 TFs on the protein microarray with previous non-microarray data, we showed that protein microarrays provide an efficient and high throughput tool for genome-wide analysis of TF-DNA interactions. This microarray protein-DNA interaction analysis allowed us to derive a comprehensive view of DNA-binding profiles of AP2/ERF family proteins in Arabidopsis. It also revealed four TFs that bound the EE (evening element) and had the expected phased gene expression under clock-regulation, thus providing a basis for further functional analysis of their roles in clock regulation of gene expression. We also developed procedures for detecting protein interactions using this TF protein microarray and discovered four novel partners that interact with HY5, which can be validated by yeast two-hybrid assays. Thus, plant TF protein microarrays offer an attractive high-throughput alternative to traditional techniques for TF functional characterization on a global scale.

    View details for DOI 10.1093/mp/ssm009

    View details for Web of Science ID 000259068900005

    View details for PubMedID 19802365

  • RNA polymerase II stalling: loading at the start prepares genes for a sprint GENOME BIOLOGY Wu, J. Q., Snyder, M. 2008; 9 (5)

    Abstract

    Stalling of RNA polymerase II near the promoter has recently been found to be much more common than previously thought. Genome-wide surveys of the phenomenon suggest that it is likely to be a rate-limiting control on gene activation that poises developmental and stimulus-responsive genes for prompt expression when inducing signals are received.

    View details for DOI 10.1186/gb-2008-9-5-220

    View details for Web of Science ID 000257564800002

    View details for PubMedID 18466645

  • Identification of differentially expressed proteins in ovarian cancer using high-density protein microarrays PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Hudson, M. E., Pozdnyakova, I., Haines, K., Mor, G., Snyder, M. 2007; 104 (44): 17494-17499

    Abstract

    Ovarian cancer is a leading cause of death, yet an understanding of many aspects of the biology of the disease and a routine means of its detection are lacking. We have used protein microarrays and autoantibodies from cancer patients to identify proteins that are aberrantly expressed in ovarian tissue. Sera from 30 cancer patients and 30 healthy individuals were used to probe microarrays containing 5,005 human proteins. Ninety-four antigens were identified that exhibited enhanced reactivity from sera in cancer patients relative to control sera. The differential reactivity of four antigens was tested by using immunoblot analysis and tissue microarrays. Lamin A/C, SSRP1, and RALBP1 were found to exhibit increased expression in the cancer tissue relative to controls. The combined signals from multiple antigens proved to be a robust test to identify cancerous ovarian tissue. These antigens were also reactive with tissue from other types of cancer and thus are not specific to ovarian cancer. Overall our studies identified candidate tissue marker proteins for ovarian cancer and demonstrate that protein microarrays provide a powerful approach to identify proteins aberrantly expressed in disease states.

    View details for DOI 10.1073/pnas.0708572104

    View details for Web of Science ID 000250638400048

    View details for PubMedID 17954908

  • Paired-end mapping reveals extensive structural variation in the human genome SCIENCE Korbel, J. O., Urban, A. E., Affourtit, J. P., Godwin, B., Grubert, F., Simons, J. F., Kim, P. M., Palejev, D., Carriero, N. J., Du, L., Taillon, B. E., Chen, Z., Tanzer, A., Saunders, A. C., Chi, J., Yang, F., Carter, N. P., Hurles, M. E., Weissman, S. M., Harkins, T. T., Gerstein, M. B., Egholm, M., Snyder, M. 2007; 318 (5849): 420-426

    Abstract

    Structural variation of the genome involves kilobase- to megabase-sized deletions, duplications, insertions, inversions, and complex combinations of rearrangements. We introduce high-throughput and massive paired-end mapping (PEM), a large-scale genome-sequencing method to identify structural variants (SVs) approximately 3 kilobases (kb) or larger that combines the rescue and capture of paired ends of 3-kb fragments, massive 454 sequencing, and a computational approach to map DNA reads onto a reference genome. PEM was used to map SVs in an African and in a putatively European individual and identified shared and divergent SVs relative to the reference genome. Overall, we fine-mapped more than 1000 SVs and documented that the number of SVs among humans is much larger than initially hypothesized; many of the SVs potentially affect gene function. The breakpoint junction sequences of more than 200 SVs were determined with a novel pooling strategy and computational analysis. Our analysis provided insights into the mechanisms of SV formation in humans.

    View details for DOI 10.1126/science.1149504

    View details for Web of Science ID 000250230400038

    View details for PubMedID 17901297

  • Transcription factor binding site identification in yeast: a comparison of high-density oligonucleotide and PCR-based microarray platforms FUNCTIONAL & INTEGRATIVE GENOMICS Borneman, A. R., Zhang, Z. D., Rozowsky, J., Seringhaus, M. R., Gerstein, M., Snyder, M. 2007; 7 (4): 335-345

    Abstract

    In recent years, techniques have been developed to map transcription factor binding sites using chromatin immunoprecipitation combined with DNA microarrays (chIP chip). Initially, polymerase chain reaction (PCR)-based DNA arrays were used for the chIP chip procedure; however, high-density oligonucleotide (HDO) arrays, which allow for the production of thousands more features per array, have emerged as a competing array platform. To compare the two platforms, data from chIP chip analysis performed for three factors (Tec1, Ste12, and Sok2) using both HDO and PCR arrays under identical experimental conditions were compared. HDO arrays provided increased reproducibility and sensitivity, detecting approximately three times more binding events than the PCR arrays while also showing increased accuracy. The increased resolution provided by the HDO arrays also allowed for the identification of multiple binding peaks in close proximity and of novel binding events such as binding within ORFs. The HDO array platform provides a far more robust array system by all measures than PCR-based arrays, all of which is directly attributable to the large number of probes available.

    View details for DOI 10.1007/s10142-007-0054-7

    View details for Web of Science ID 000249808300006

    View details for PubMedID 17638031

  • Arabidopsis protein microarrays for the high-throughput identification of protein-protein interactions. Plant signaling & behavior Popescu, S. C., Snyder, M., Dinesh-Kumar, S. 2007; 2 (5): 416-420

    Abstract

    Protein microarray technology has emerged as a powerful new approach for the study of thousands of proteins simultaneously. Protein microarrays have been used for a wide variety of applications for the human and yeast systems. In a recent study, we demonstrated that Arabidopsis functional protein microarrays can be generated and employed to characterize the function of plant proteins. The arrayed proteins were produced using an optimized large-scale plant-based expression system. In a proof-of-concept study, 173 known and novel potential substrates of calmodulin (CaM) and calmodulin-like proteins (CML) were identified in an unbiased and high-throughput manner. The information documented here on novel potential CaM targets provides new testable hypotheses in the area of CaM/Ca(2+)-regulated processes and represents a resource of functional information for the scientific community.

    View details for PubMedID 19704619

  • Divergence of transcription factor binding sites across related yeast species SCIENCE Borneman, A. R., Gianoulis, T. A., Zhang, Z. D., Yu, H., Rozowsky, J., Seringhaus, M. R., Wang, L. Y., Gerstein, M., Snyder, M. 2007; 317 (5839): 815-819

    Abstract

    Characterization of interspecies differences in gene regulation is crucial for understanding the molecular basis of both phenotypic diversity and evolution. By means of chromatin immunoprecipitation and DNA microarray analysis, the divergence in the binding sites of the pseudohyphal regulators Ste12 and Tec1 was determined in the yeasts Saccharomyces cerevisiae, S. mikatae, and S. bayanus under pseudohyphal conditions. We have shown that most of these sites have diverged across these species, far exceeding the interspecies variation in orthologous genes. A group of Ste12 targets was shown to be bound only in S. mikatae and S. bayanus under pseudohyphal conditions. Many of these genes are targets of Ste12 during mating in S. cerevisiae, indicating that specialization between the two pathways has occurred in this species. Transcription factor binding sites have therefore diverged substantially faster than ortholog content. Thus, gene regulation resulting from transcription factor binding is likely to be a major cause of divergence between related species.

    View details for DOI 10.1126/science.1140748

    View details for Web of Science ID 000248624500044

    View details for PubMedID 17690298

  • Systematic prediction and validation of breakpoints associated with copy-number variants in the human genome PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Korbel, J. O., Urban, A. E., Grubert, F., Du, J., Royce, T. E., Starr, P., Zhong, G., Emanuel, B. S., Weissman, S. M., Snyder, M., Gerstein, M. B. 2007; 104 (24): 10110-10115

    Abstract

    Copy-number variants (CNVs) are an abundant form of genetic variation in humans. However, approaches for determining exact CNV breakpoint sequences (physical deletion or duplication boundaries) across individuals, crucial for associating genotype to phenotype, have been lacking so far, and the vast majority of CNVs have been reported with approximate genomic coordinates only. Here, we report an approach, called BreakPtr, for fine-mapping CNVs (available from http://breakptr.gersteinlab.org). We statistically integrate both sequence characteristics and data from high-resolution comparative genome hybridization experiments in a discrete-valued, bivariate hidden Markov model. Incorporation of nucleotide-sequence information allows us to take into account the fact that recently duplicated sequences (e.g., segmental duplications) often coincide with breakpoints. In anticipation of an upcoming increase in CNV data, we developed an iterative, "active" approach to initially scoring with a preliminary model, performing targeted validations, retraining the model, and then rescoring, and a flexible parameterization system that intuitively collapses from a full model of 2,503 parameters to a core one of only 10. Using our approach, we accurately mapped >400 breakpoints on chromosome 22 and a region of chromosome 11, refining the boundaries of many previously approximately mapped CNVs. Four predicted breakpoints flanked known disease-associated deletions. We validated an additional four predicted CNV breakpoints by sequencing. Overall, our results suggest a predictive resolution of approximately 300 bp. This level of resolution enables more precise correlations between CNVs and across individuals than previously possible, allowing the study of CNV population frequencies. Further, it enabled us to demonstrate a clear Mendelian pattern of inheritance for one of the CNVs.

    View details for DOI 10.1073/pnas.0703834104

    View details for Web of Science ID 000247363000036

    View details for PubMedID 17551006

  • Assessing the performance of different high-density tiling microarray strategies for mapping transcribed regions of the human genome GENOME RESEARCH Emanuelsson, O., Nagalakshmi, U., Zheng, D., Rozowsky, J. S., Urban, A. E., Du, J., Lian, Z., Stolc, V., Weissman, S., Snyder, M., Gerstein, M. B. 2007; 17 (6): 886-897

    Abstract

    Genomic tiling microarrays have become a popular tool for interrogating the transcriptional activity of large regions of the genome in an unbiased fashion. There are several key parameters associated with each tiling experiment (e.g., experimental protocols and genomic tiling density). Here, we assess the role of these parameters as they are manifest in different tiling-array platforms used for transcription mapping. First, we analyze how a number of published tiling-array experiments agree with established gene annotation on human chromosome 22. We observe that the transcription detected from high-density arrays correlates substantially better with annotation than that from other array types. Next, we analyze the transcription-mapping performance of the two main high-density oligonucleotide array platforms in the ENCODE regions of the human genome. We hybridize identical biological samples and develop several ways of scoring the arrays and segmenting the genome into transcribed and nontranscribed regions, with the aim of making the platforms most comparable to each other. Finally, we develop a platform comparison approach based on agreement with known annotation. Overall, we find that the performance improves with more data points per locus, coupled with statistical scoring approaches that properly take advantage of this, where this larger number of data points arises from higher genomic tiling density and the use of replicate arrays and mismatches. While we do find significant differences in the performance of the two high-density platforms, we also find that they complement each other to some extent. Finally, our experiments reveal a significant amount of novel transcription outside of known genes, and an appreciable sample of this was validated by independent experiments.

    View details for DOI 10.1101/gr.5014606

    View details for Web of Science ID 000247226900020

    View details for PubMedID 17119069

  • Mapping of transcription factor binding regions in mammalian cells by ChIP: Comparison of array- and sequencing-based technologies GENOME RESEARCH Euskirchen, G. M., Rozowsky, J. S., Wei, C., Lee, W. H., Zhang, Z. D., Hartman, S., Emanuelsson, O., Stolc, V., Weissman, S., Gerstein, M. B., Ruan, Y., Snyder, M. 2007; 17 (6): 898-909

    Abstract

    Recent progress in mapping transcription factor (TF) binding regions can largely be credited to chromatin immunoprecipitation (ChIP) technologies. We compared strategies for mapping TF binding regions in mammalian cells using two different ChIP schemes: ChIP with DNA microarray analysis (ChIP-chip) and ChIP with DNA sequencing (ChIP-PET). We first investigated parameters central to obtaining robust ChIP-chip data sets by analyzing STAT1 targets in the ENCODE regions of the human genome, and then compared ChIP-chip to ChIP-PET. We devised methods for scoring and comparing results among various tiling arrays and examined parameters such as DNA microarray format, oligonucleotide length, hybridization conditions, and the use of competitor Cot-1 DNA. The best performance was achieved with high-density oligonucleotide arrays, oligonucleotides ≥50 bases (b), the presence of competitor Cot-1 DNA and hybridizations conducted in microfluidics stations. When target identification was evaluated as a function of array number, 80%-86% of targets were identified with three or more arrays. Comparison of ChIP-chip with ChIP-PET revealed strong agreement for the highest ranked targets with less overlap for the low ranked targets. With advantages and disadvantages unique to each approach, we found that ChIP-chip and ChIP-PET are frequently complementary in their relative abilities to detect STAT1 targets for the lower ranked targets; each method detected validated targets that were missed by the other method. The most comprehensive list of STAT1 binding regions is obtained by merging results from ChIP-chip and ChIP-sequencing. Overall, this study provides information for robust identification, scoring, and validation of TF targets using ChIP-based technologies.

    View details for DOI 10.1101/gr.5583007

    View details for Web of Science ID 000247226900021

    View details for PubMedID 17568005
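The rank-dependent agreement described in the abstract above (strong overlap for top-ranked targets, less for lower-ranked ones) can be illustrated with a minimal sketch. It assumes each method yields a ranked list of target identifiers; the function name and the identifier-based matching are hypothetical simplifications, since real binding regions would be matched by genomic overlap rather than by name.

```python
def overlap_by_rank(ranked_a, ranked_b, cutoffs):
    """Fraction of top-k targets shared by two ranked lists, for each cutoff k.

    ranked_a, ranked_b: target identifiers ordered from best to worst score.
    Targets are compared by identifier only (an assumption of this sketch).
    """
    result = {}
    for k in cutoffs:
        top_a, top_b = set(ranked_a[:k]), set(ranked_b[:k])
        # Guard against cutoffs longer than either list.
        denom = min(k, len(ranked_a), len(ranked_b))
        result[k] = len(top_a & top_b) / denom if denom else 0.0
    return result
```

Calling it with increasing cutoffs shows how agreement changes as lower-ranked targets are included.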

  • Integrated analysis of experimental data sets reveals many novel promoters in 1% of the human genome GENOME RESEARCH Trinklein, N. D., Karaoz, U., Wu, J., Halees, A., Aldred, S. F., Collins, P. J., Zheng, D., Zhang, Z. D., Gerstein, M. B., Snyder, M., Myers, R. M., Weng, Z. 2007; 17 (6): 720-731

    Abstract

    The regulation of transcriptional initiation in the human genome is a critical component of global gene regulation, but a complete catalog of human promoters currently does not exist. In order to identify regulatory regions, we developed four computational methods to integrate 129 sets of ENCODE-wide chromatin immunoprecipitation data. They collectively predicted 1393 regions. Roughly 47% of the regions were unique to one method, as each method makes different assumptions about the data. Overall, predicted regions tend to localize to highly conserved, DNase I hypersensitive, and actively transcribed regions in the genome. Interestingly, a significant portion of the regions overlaps with annotated 3'-UTRs, suggesting that some of them might regulate anti-sense transcription. The majority of the predicted regions are >2 kb away from the 5'-ends of previously annotated human cDNAs and hence are novel. These novel regions may regulate unannotated transcripts or may represent new alternative transcription start sites of known genes. We tested 163 such regions for promoter activity in four cell lines using transient transfection assays, and 25% of them showed transcriptional activity above background in at least one cell line. We also performed 5'-RACE experiments on 62 novel regions, and 76% of the regions were associated with the 5'-ends of at least two RACE products. Our results suggest that there are at least 35% more functional promoters in the human genome than currently annotated.

    View details for DOI 10.1101/gr.5716607

    View details for Web of Science ID 000247226900006

    View details for PubMedID 17567992

  • Statistical analysis of the genomic distribution and correlation of regulatory elements in the ENCODE regions GENOME RESEARCH Zhang, Z. D., Paccanaro, A., Fu, Y., Weissman, S., Weng, Z., Chang, J., Snyder, M., Gerstein, M. B. 2007; 17 (6): 787-797

    Abstract

    The comprehensive inventory of functional elements in 44 human genomic regions carried out by the ENCODE Project Consortium enables for the first time a global analysis of the genomic distribution of transcriptional regulatory elements. In this study we developed an intuitive and yet powerful approach to analyze the distribution of regulatory elements found in many different ChIP-chip experiments on a 10 to 100-kb scale. First, we focus on the overall chromosomal distribution of regulatory elements in the ENCODE regions and show that it is highly nonuniform. We demonstrate, in fact, that regulatory elements are associated with the location of known genes. Further examination on a local, single-gene scale shows an enrichment of regulatory elements near both transcription start and end sites. Our results indicate that overall these elements are clustered into regulatory rich "islands" and poor "deserts." Next, we examine how consistent the nonuniform distribution is between different transcription factors. We perform on all the factors a multivariate analysis in the framework of a biplot, which enhances biological signals in the experiments. This groups transcription factors into sequence-specific and sequence-nonspecific clusters. Moreover, with experimental variation carefully controlled, detailed correlations show that the distribution of sites was generally reproducible for a specific factor between different laboratories and microarray platforms. Data sets associated with histone modifications have particularly strong correlations. Finally, we show how the correlations between factors change when only regulatory elements far from the transcription start sites are considered.

    View details for DOI 10.1101/gr.5573107

    View details for Web of Science ID 000247226900011

    View details for PubMedID 17567997

  • The DART classification of unannotated transcription within the ENCODE regions: Associating transcription with known and novel loci GENOME RESEARCH Rozowsky, J. S., Newburger, D., Sayward, F., Wu, J., Jordan, G., Korbel, J. O., Nagalakshmi, U., Yang, J., Zheng, D., Guigo, R., Gingeras, T. R., Weissman, S., Miller, P., Snyder, M., Gerstein, M. B. 2007; 17 (6): 732-745

    Abstract

    For the approximately 1% of the human genome in the ENCODE regions, only about half of the transcriptionally active regions (TARs) identified with tiling microarrays correspond to annotated exons. Here we categorize this large amount of "unannotated transcription." We use a number of disparate features to classify the 6988 novel TARs: array expression profiles across cell lines and conditions, sequence composition, phylogenetic profiles (presence/absence of syntenic conservation across 17 species), and locations relative to genes. In the classification, we first filter out TARs with unusual sequence composition and those likely resulting from cross-hybridization. We then associate some of those remaining with proximal exons having correlated expression profiles. Finally, we cluster unclassified TARs into putative novel loci, based on similar expression and phylogenetic profiles. To encapsulate our classification, we construct a Database of Active Regions and Tools (DART.gersteinlab.org). DART has special facilities for rapidly handling and comparing many sets of TARs and their heterogeneous features, synchronizing across builds, and interfacing with other resources. Overall, we find that approximately 14% of the novel TARs can be associated with known genes, while approximately 21% can be clustered into approximately 200 novel loci. We observe that TARs associated with genes are enriched in the potential to form structural RNAs and many novel TAR clusters are associated with nearby promoters. To benchmark our classification, we design a set of experiments for testing the connectivity of novel TARs. Overall, we find that 18 of the 46 connections tested validate by RT-PCR and four of five sequenced PCR products confirm connectivity unambiguously.

    View details for DOI 10.1101/gr.5696007

    View details for Web of Science ID 000247226900007

    View details for PubMedID 17567993

  • What is a gene, post-ENCODE? History and updated definition GENOME RESEARCH Gerstein, M. B., Bruce, C., Rozowsky, J. S., Zheng, D., Du, J., Korbel, J. O., Emanuelsson, O., Zhang, Z. D., Weissman, S., Snyder, M. 2007; 17 (6): 669-681

    Abstract

    While sequencing of the human genome surprised us with how many protein-coding genes there are, it did not fundamentally change our perspective on what a gene is. In contrast, the complex patterns of dispersed regulation and pervasive transcription uncovered by the ENCODE project, together with non-genic conservation and the abundance of noncoding RNA genes, have challenged the notion of the gene. To illustrate this, we review the evolution of operational definitions of a gene over the past century--from the abstract elements of heredity of Mendel and Morgan to the present-day ORFs enumerated in the sequence databanks. We then summarize the current ENCODE findings and provide a computational metaphor for the complexity. Finally, we propose a tentative update to the definition of a gene: A gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products. Our definition side-steps the complexities of regulation and transcription by removing the former altogether from the definition and arguing that final, functional gene products (rather than intermediate transcripts) should be used to group together entities associated with a single gene. It also manifests how integral the concept of biological function is in defining genes.

    View details for DOI 10.1101/gr.6339607

    View details for Web of Science ID 000247226900002

    View details for PubMedID 17567988

  • Pseudogenes in the ENCODE regions: Consensus annotation, analysis of transcription, and evolution GENOME RESEARCH Zheng, D., Frankish, A., Baertsch, R., Kapranov, P., Reymond, A., Choo, S. W., Lu, Y., Denoeud, F., Antonarakis, S. E., Snyder, M., Ruan, Y., Wei, C., Gingeras, T. R., Guigo, R., Harrow, J., Gerstein, M. B. 2007; 17 (6): 839-851

    Abstract

    Arising from either retrotransposition or genomic duplication of functional genes, pseudogenes are "genomic fossils" valuable for exploring the dynamics and evolution of genes and genomes. Pseudogene identification is an important problem in computational genomics, and is also critical for obtaining an accurate picture of a genome's structure and function. However, no consensus computational scheme for defining and detecting pseudogenes has been developed thus far. As part of the ENCyclopedia Of DNA Elements (ENCODE) project, we have compared several distinct pseudogene annotation strategies and found that different approaches and parameters often resulted in rather distinct sets of pseudogenes. We subsequently developed a consensus approach for annotating pseudogenes (derived from protein coding genes) in the ENCODE regions, resulting in 201 pseudogenes, two-thirds of which originated from retrotransposition. A survey of orthologs for these pseudogenes in 28 vertebrate genomes showed that a significant fraction (approximately 80%) of the processed pseudogenes are primate-specific sequences, highlighting the increasing retrotransposition activity in primates. Analysis of sequence conservation and variation also demonstrated that most pseudogenes evolve neutrally, and processed pseudogenes appear to have lost their coding potential immediately or soon after their emergence. In order to explore the functional implication of pseudogene prevalence, we have extensively examined the transcriptional activity of the ENCODE pseudogenes. We performed systematic series of pseudogene-specific RACE analyses. These, together with complementary evidence derived from tiling microarrays and high throughput sequencing, demonstrated that at least a fifth of the 201 pseudogenes are transcribed in one or more cell lines or tissues.

    View details for DOI 10.1101/gr.5586307

    View details for Web of Science ID 000247226900016

    View details for PubMedID 17568002

  • Structured RNAs in the ENCODE selected regions of the human genome GENOME RESEARCH Washietl, S., Pedersen, J. S., Korbel, J. O., Stocsits, C., Gruber, A. R., Hackermueller, J., Hertel, J., Lindemeyer, M., Reiche, K., Tanzer, A., Ucla, C., Wyss, C., Antonarakis, S. E., Denoeud, F., Lagarde, J., Drenkow, J., Kapranov, P., Gingeras, T. R., Guigo, R., Snyder, M., Gerstein, M. B., Reymond, A., Hofacker, I. L., Stadler, P. F. 2007; 17 (6): 852-864

    Abstract

    Functional RNA structures play an important role both in the context of noncoding RNA transcripts as well as regulatory elements in mRNAs. Here we present a computational study to detect functional RNA structures within the ENCODE regions of the human genome. Since structural RNAs in general lack characteristic signals in primary sequence, comparative approaches evaluating evolutionary conservation of structures are most promising. We have used three recently introduced programs based on either phylogenetic-stochastic context-free grammar (EvoFold) or energy directed folding (RNAz and AlifoldZ), yielding several thousand candidate structures (corresponding to approximately 2.7% of the ENCODE regions). EvoFold has its highest sensitivity in highly conserved and relatively AU-rich regions, while RNAz favors slightly GC-rich regions, resulting in a relatively small overlap between methods. Comparison with the GENCODE annotation points to functional RNAs in all genomic contexts, with a slightly increased density in 3'-UTRs. While we estimate a significant false discovery rate of approximately 50%-70%, many of the predictions can be further substantiated by additional criteria: 248 loci are predicted by both RNAz and EvoFold, and an additional 239 RNAz or EvoFold predictions are supported by the (more stringent) AlifoldZ algorithm. Five hundred seventy RNAz structure predictions fall into regions that show signs of selection pressure also on the sequence level (i.e., conserved elements). More than 700 predictions overlap with noncoding transcripts detected by oligonucleotide tiling arrays. One hundred seventy-five selected candidates were tested by RT-PCR in six tissues, and expression could be verified in 43 cases (24.6%).

    View details for DOI 10.1101/gr.5650707

    View details for Web of Science ID 000247226900017

    View details for PubMedID 17568003

  • Getting connected: analysis and principles of biological networks GENES & DEVELOPMENT Zhu, X., Gerstein, M., Snyder, M. 2007; 21 (9): 1010-1024

    Abstract

    The execution of complex biological processes requires the precise interaction and regulation of thousands of molecules. Systematic approaches to study large numbers of proteins, metabolites, and their modification have revealed complex molecular networks. These biological networks are significantly different from random networks and often exhibit ubiquitous properties in terms of their structure and organization. Analyzing these networks provides novel insights in understanding basic mechanisms controlling normal cellular processes and disease pathologies.

    View details for DOI 10.1101/gad.1528707

    View details for Web of Science ID 000246154100002

    View details for PubMedID 17473168

  • Differential binding of calmodulin-related proteins to their targets revealed through high-density Arabidopsis protein microarrays PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Popescu, S. C., Popescu, G. V., Bachan, S., Zhang, Z., Seay, M., Gerstein, M., Snyder, M., Dinesh-Kumar, S. P. 2007; 104 (11): 4730-4735

    Abstract

    Calmodulins (CaMs) are the most ubiquitous calcium sensors in eukaryotes. A number of CaM-binding proteins have been identified through classical methods, and many proteins have been predicted to bind CaMs based on their structural homology with known targets. However, multicellular organisms typically contain many CaM-like (CML) proteins, and a global identification of their targets and specificity of interaction is lacking. In an effort to develop a platform for large-scale analysis of proteins in plants we have developed a protein microarray and used it to study the global analysis of CaM/CML interactions. An Arabidopsis thaliana expression collection containing 1,133 ORFs was generated and used to produce proteins with an optimized medium-throughput plant-based expression system. Protein microarrays were prepared and screened with several CaMs/CMLs. A large number of previously known and novel CaM/CML targets were identified, including transcription factors, receptor and intracellular protein kinases, F-box proteins, RNA-binding proteins, and proteins of unknown function. Multiple CaM/CML proteins bound many binding partners, but the majority of targets were specific to one or a few CaMs/CMLs indicating that different CaM family members function through different targets. Based on our analyses, the emergent CaM/CML interactome is more extensive than previously predicted. Our results suggest that calcium functions through distinct CaM/CML proteins to regulate a wide range of targets and cellular activities.

    View details for DOI 10.1073/pnas.0611615104

    View details for Web of Science ID 000244972700086

    View details for PubMedID 17360592

  • New insights into Acinetobacter baumannii pathogenesis revealed by high-density pyrosequencing and transposon mutagenesis GENES & DEVELOPMENT Smith, M. G., Gianoulis, T. A., Pukatzki, S., Mekalanos, J. J., Ornston, L. N., Gerstein, M., Snyder, M. 2007; 21 (5): 601-614

    Abstract

    Acinetobacter baumannii has emerged as an important and problematic human pathogen as it is the causative agent of several types of infections including pneumonia, meningitis, septicemia, and urinary tract infections. We explored the pathogenic content of this harmful pathogen using a combination of DNA sequencing and insertional mutagenesis. The genome of this organism was sequenced using a strategy involving high-density pyrosequencing, a novel, rapid method of high-throughput sequencing. Excluding the rDNA repeats, the assembled genome is 3,976,746 base pairs (bp) and has 3830 ORFs. A significant fraction of ORFs (17.2%) are located in 28 putative alien islands, indicating that the genome has acquired a large amount of foreign DNA. Consistent with its role in pathogenesis, a remarkable number of the islands (16) contain genes implicated in virulence, indicating the organism devotes a considerable portion of its genes to pathogenesis. The largest island contains elements homologous to the Legionella/Coxiella Type IV secretion apparatus. Type IV secretion systems have been demonstrated to be important for virulence in other organisms and thus are likely to help mediate pathogenesis of A. baumannii. Insertional mutagenesis generated avirulent isolates of A. baumannii and verified that six of the islands contain virulence genes, including two novel islands containing genes that lacked homology with others in the databases. The DNA sequencing approach described in this study allows the rapid elucidation of the DNA sequence of any microbe and, when combined with genetic screens, can identify many novel genes important for microbial pathogenesis.

    View details for DOI 10.1101/gad.1510307

    View details for Web of Science ID 000244760600011

    View details for PubMedID 17344419

  • Positional artifacts in microarrays: experimental verification and construction of COP, an automated detection tool NUCLEIC ACIDS RESEARCH Yu, H., Nguyen, K., Royce, T., Qian, J., Nelson, K., Snyder, M., Gerstein, M. 2007; 35 (2)

    Abstract

    Microarray technology is currently one of the most widely used technologies in biology. Many studies focus on inferring the function of an unknown gene from its co-expressed genes. Here, we are able to show that there are two types of positional artifacts in microarray data introducing spurious correlations between genes. First, we find that genes that are close on the microarray chips tend to have higher correlations between their expression profiles. We call this the 'chip artifact'. Our calculations suggest that the carry-over during the printing process is one of the major sources of this type of artifact, which is later confirmed by our experiments. Based on our experiments, the measured intensity of a microarray spot contains 0.1% (for fully-hybridized spots) to 93% (for un-hybridized ones) of noise resulting from this artifact. Secondly, we, for the first time, show that genes that are close on the microtiter plates in microarray experiments also tend to have higher correlations. We call this the 'plate artifact'. Both types of artifacts exist with different severity in all cDNA microarray experiments that we analyzed. Therefore, we develop an automated web tool, COP (COrrelations by Positional artifacts), to detect these artifacts in microarray experiments. COP has been integrated with the microarray data normalization tool, ExpressYourself, which is available at http://bioinfo.mbb.yale.edu/ExpressYourself/. Together, the two can eliminate most of the common noise in microarray data.

    View details for DOI 10.1093/nar/gkl871

    View details for Web of Science ID 000243993600001

    View details for PubMedID 17158151
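The 'chip artifact' described in the abstract above (spots that are close on the chip show higher correlation between expression profiles) can be illustrated with a minimal sketch. This is not COP's implementation: the function names, the grid-distance definition, and the `near` parameter are assumptions made for illustration. The sketch compares the mean Pearson correlation of neighboring spot pairs with that of all other pairs.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def _mean(values):
    return sum(values) / len(values) if values else float("nan")


def mean_correlation_by_distance(spots, near=1):
    """Compare correlations of nearby versus distant spot pairs.

    spots: dict mapping (row, col) chip position -> expression profile.
    near:  maximum grid distance (in spots) for a pair to count as nearby.
    Returns (mean correlation of nearby pairs, mean correlation of other pairs).
    """
    near_vals, far_vals = [], []
    for (p, xp), (q, xq) in combinations(spots.items(), 2):
        # Chebyshev distance: number of grid steps, diagonals allowed.
        dist = max(abs(p[0] - q[0]), abs(p[1] - q[1]))
        (near_vals if dist <= near else far_vals).append(pearson(xp, xq))
    return _mean(near_vals), _mean(far_vals)
```

A markedly higher mean for nearby pairs than for distant pairs would be consistent with a positional artifact, although co-located genes that are genuinely co-regulated would produce the same signal.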

  • Protein microarray technology MECHANISMS OF AGEING AND DEVELOPMENT Hall, D. A., Ptacek, J., Snyder, M. 2007; 128 (1): 161-167

    Abstract

    Protein chips have emerged as a promising approach for a wide variety of applications including the identification of protein-protein interactions, protein-phospholipid interactions, small molecule targets, and substrates of proteins kinases. They can also be used for clinical diagnostics and monitoring disease states. This article reviews current methods in the generation and applications of protein microarrays.

    View details for DOI 10.1016/j.mad.2006.11.021

    View details for Web of Science ID 000244301700024

    View details for PubMedID 17126887

  • Tilescope: online analysis pipeline for high-density tiling microarray data GENOME BIOLOGY Zhang, Z. D., Rozowsky, J., Lam, H. Y., Du, J., Snyder, M., Gerstein, M. 2007; 8 (5)

    Abstract

    We developed Tilescope, a fully integrated data processing pipeline for analyzing high-density tiling-array data (http://tilescope.gersteinlab.org). In a completely automated fashion, Tilescope will normalize signals between channels and across arrays, combine replicate experiments, score each array element, and identify genomic features. The program is designed with a modular, three-tiered architecture, facilitating parallelism, and a graphic user-friendly interface, presenting results in an organized web page, downloadable for further analysis.

    View details for DOI 10.1186/gb-2007-8-5-r81

    View details for Web of Science ID 000246983100034

    View details for PubMedID 17501994
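The four stages named in the Tilescope abstract above (normalize, combine replicates, score each element, identify features) can be sketched as a chain of small functions. The specific choices here are assumptions made for illustration and are not taken from Tilescope itself: median centering for normalization, a per-probe mean across replicates, a sliding-window median as the score, and a fixed threshold for feature calling. All function names are hypothetical.

```python
from statistics import median


def normalize(arrays):
    """Median-center each array so that replicates are comparable."""
    centered = []
    for a in arrays:
        m = median(a)
        centered.append([v - m for v in a])
    return centered


def combine(arrays):
    """Average replicate arrays probe by probe."""
    return [sum(vals) / len(vals) for vals in zip(*arrays)]


def score(signal, half_window=1):
    """Score each probe by the median signal in a window centered on it."""
    n = len(signal)
    return [
        median(signal[max(0, i - half_window):min(n, i + half_window + 1)])
        for i in range(n)
    ]


def call_features(scores, threshold):
    """Return (start, end) probe index pairs, end exclusive, of runs above threshold."""
    features, start = [], None
    for i, s in enumerate(scores):
        if s > threshold and start is None:
            start = i
        elif s <= threshold and start is not None:
            features.append((start, i))
            start = None
    if start is not None:
        features.append((start, len(scores)))
    return features


def pipeline(arrays, threshold=1.0):
    """Run all four stages on a list of replicate arrays of equal length."""
    return call_features(score(combine(normalize(arrays))), threshold)
```

Keeping each stage as a separate function mirrors the modular design the abstract mentions, so one stage can be swapped without touching the others.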

  • A supervised hidden markov model framework for efficiently segmenting tiling array data in transcriptional and chIP-chip experiments: systematically incorporating validated biological knowledge BIOINFORMATICS Du, J., Rozowsky, J. S., Korbel, J. O., Zhang, Z. D., Royce, T. E., Schultz, M. H., Snyder, M., Gerstein, M. 2006; 22 (24): 3016-3024

    Abstract

    Large-scale tiling array experiments are becoming increasingly common in genomics. In particular, the ENCODE project requires the consistent segmentation of many different tiling array datasets into 'active regions' (e.g. finding transfrags from transcriptional data and putative binding sites from ChIP-chip experiments). Previously, such segmentation was done in an unsupervised fashion mainly based on characteristics of the signal distribution in the tiling array data itself. Here we propose a supervised framework for doing this. It has the advantage of explicitly incorporating validated biological knowledge into the model and allowing for formal training and testing. In particular, we use a hidden Markov model (HMM) framework, which is capable of explicitly modeling the dependency between neighboring probes and whose extended version (the generalized HMM) also allows explicit description of state duration density. We introduce a formal definition of the tiling-array analysis problem, and explain how we can use this to describe sampling small genomic regions for experimental validation to build up a gold-standard set for training and testing. We then describe various ideal and practical sampling strategies (e.g. maximizing signal entropy within a selected region versus using gene annotation or known promoters as positives for transcription or ChIP-chip data, respectively). For the practical sampling and training strategies, we show how the size and noise in the validated training data affects the performance of an HMM applied to the ENCODE transcriptional and ChIP-chip experiments. In particular, we show that the HMM framework is able to efficiently process tiling array data as well as or better than previous approaches. For the idealized sampling strategies, we show how we can assess their performance in a simulation framework and how a maximum entropy approach, which samples sub-regions with very different signal intensities, gives the maximally performing gold-standard. This latter result has strong implications for the optimum way medium-scale validation experiments should be carried out to verify the results of the genome-scale tiling array experiments.

    View details for DOI 10.1093/bioinformatics/btl515

    View details for Web of Science ID 000242715200008

    View details for PubMedID 17038339
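The supervised idea in the abstract above can be illustrated with a minimal two-state HMM. Its parameters are estimated by counting over probes with validated labels, and a new array is then segmented with the Viterbi algorithm. This sketch is not the authors' implementation. It assumes probe signals have already been discretized into symbols (0 = low, 1 = high), uses add-one smoothing, and omits the generalized HMM with explicit state durations. All names are hypothetical.

```python
import math

STATES = (0, 1)  # 0 = background, 1 = active region (labels assumed for this sketch)


def _log_normalize(row):
    """Convert a row of counts to log probabilities."""
    total = sum(row)
    return [math.log(v / total) for v in row]


def train(signals, labels, n_symbols=2):
    """Estimate start, transition and emission log probabilities by counting.

    signals: list of sequences of discretized probe signals (ints < n_symbols).
    labels:  list of sequences of validated states (0 or 1), same shapes.
    All counts start at 1 (add-one smoothing).
    """
    start = [1.0 for _ in STATES]
    trans = [[1.0 for _ in STATES] for _ in STATES]
    emit = [[1.0] * n_symbols for _ in STATES]
    for obs, lab in zip(signals, labels):
        start[lab[0]] += 1
        for t, (o, s) in enumerate(zip(obs, lab)):
            emit[s][o] += 1
            if t > 0:
                trans[lab[t - 1]][s] += 1
    return (_log_normalize(start),
            [_log_normalize(r) for r in trans],
            [_log_normalize(r) for r in emit])


def viterbi(obs, start, trans, emit):
    """Most likely state path for one sequence of discretized probe signals."""
    score = [start[s] + emit[s][obs[0]] for s in STATES]
    back = []
    for o in obs[1:]:
        prev = score
        # Best predecessor state for each current state.
        ptr = [max(STATES, key=lambda p: prev[p] + trans[p][s]) for s in STATES]
        score = [prev[ptr[s]] + trans[ptr[s]][s] + emit[s][o] for s in STATES]
        back.append(ptr)
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

Because transitions between states are learned to be rare when the training regions are long, an isolated high probe in a background stretch tends to be absorbed into the background, which reflects the dependency between neighboring probes that the abstract highlights.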

  • High-throughput methods of regulatory element discovery BIOTECHNIQUES Hudson, M. E., Snyder, M. 2006; 41 (6): 673-?

    Abstract

    With the number of organisms whose genomes have been sequenced, a vast amount of information concerning the genetic structure of an organism's genome has been collected. However, effective experimental means to study how this information is accessed have only recently been developed. In this review, three basic methods for identifying regions of protein-DNA interaction will be introduced. The first two, chromatin immunoprecipitation (ChIP)-chip and ChIP-PET (for paired-end ditag), rely on the enrichment provided by chromosomal immunoprecipitation to interrogate the genomic sequence for the interaction sites of a protein of interest. In contrast, protein microarrays allow the identification of DNA-binding proteins that interact with a DNA sequence of interest. These complementary methods of exploring protein-DNA interactions will increase our fundamental knowledge of how the information contained within the genome sequence is accessed and processed.

    View details for Web of Science ID 000242737100019

    View details for PubMedID 17191608

  • HTRA1 promoter polymorphism in wet age-related macular degeneration SCIENCE DeWan, A., Liu, M., Hartman, S., Zhang, S. S., Liu, D. T., Zhao, C., Tam, P. O., Chan, W. M., Lam, D. S., Snyder, M., Barnstable, C., Pang, C. P., Hoh, J. 2006; 314 (5801): 989-992

    Abstract

    Age-related macular degeneration (AMD), the most common cause of irreversible vision loss in individuals aged older than 50 years, is classified as either wet (neovascular) or dry (nonneovascular). Inherited variation in the complement factor H gene is a major risk factor for drusen in dry AMD. Here we report that a single-nucleotide polymorphism in the promoter region of HTRA1, a serine protease gene on chromosome 10q26, is a major genetic risk factor for wet AMD. A whole-genome association mapping strategy was applied to a Chinese population, yielding a P value of <10^-11. Individuals with the risk-associated genotype were estimated to have a likelihood of developing wet AMD 10 times that of individuals with the wild-type genotype.

    View details for DOI 10.1126/science.1133807

    View details for Web of Science ID 000241896000052

    View details for PubMedID 17053108

  • Charging it up: global analysis of protein phosphorylation TRENDS IN GENETICS Ptacek, J., Snyder, M. 2006; 22 (10): 545-554

    Abstract

    Protein phosphorylation affects most, if not all, cellular activities in eukaryotes and is essential for cell proliferation and development. An estimated 30% of cellular proteins are phosphorylated, representing the phosphoproteome, and phosphorylation can alter a protein's function, activity, localization and stability. Recent studies for large-scale identification of phosphosites using mass spectrometry are revealing the components of the phosphoproteome. The development of new tools, such as kinase assays using modified kinases or protein microarrays, enables rapid kinase substrate identification. The dynamics of specific phosphorylation events can now be monitored using mass spectrometry, single-cell analysis by flow cytometry, or fluorescent reporters. Together, these techniques are beginning to elucidate cellular processes and pathways regulated by phosphorylation, in addition to global regulatory networks.

    View details for DOI 10.1016/j.tig.2006.08.005

    View details for Web of Science ID 000241268400006

    View details for PubMedID 16908088

  • Predicting essential genes in fungal genomes GENOME RESEARCH Seringhaus, M., Paccanaro, A., Borneman, A., Snyder, M., Gerstein, M. 2006; 16 (9): 1126-1135

    Abstract

    Essential genes are required for an organism's viability, and the ability to identify these genes in pathogens is crucial to directed drug development. Predicting essential genes through computational methods is appealing because it circumvents expensive and difficult experimental screens. Most such prediction is based on homology mapping to experimentally verified essential genes in model organisms. We present here a different approach, one that relies exclusively on sequence features of a gene to estimate essentiality and offers a promising way to identify essential genes in unstudied or uncultured organisms. We identified 14 characteristic sequence features potentially associated with essentiality, such as localization signals, codon adaptation, GC content, and overall hydrophobicity. Using the well-characterized baker's yeast Saccharomyces cerevisiae, we employed a simple Bayesian framework to measure the correlation of each of these features with essentiality. We then employed the 14 features to learn the parameters of a machine learning classifier capable of predicting essential genes. We trained our classifier on known essential genes in S. cerevisiae and applied it to the closely related and relatively unstudied yeast Saccharomyces mikatae. We assessed predictive success in two ways: First, we compared all of our predictions with those generated by homology mapping between these two species. Second, we verified a subset of our predictions with eight in vivo knockouts in S. mikatae, and we present here the first experimentally confirmed essential genes in this species.

    View details for DOI 10.1101/gr.5144106

    View details for Web of Science ID 000240238600007

    View details for PubMedID 16899653
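
    One simple way to score how strongly a single sequence feature is associated with essentiality, in the spirit of the Bayesian step described in the abstract above, is a smoothed log-likelihood ratio. The sketch below assumes binary features and add-one smoothing; the function name and the toy data are made up for illustration and do not reproduce the paper's 14 features, its data, or its classifier.

```python
import math


def feature_log_odds(has_feature, is_essential):
    """Log of P(feature | essential) / P(feature | non-essential).

    Both arguments are equal-length sequences of booleans (one entry per gene).
    Add-one smoothing keeps the ratio finite when a count is zero.
    """
    pairs = list(zip(has_feature, is_essential))
    essential_total = sum(1 for _, e in pairs if e)
    other_total = len(pairs) - essential_total
    essential_with = sum(1 for f, e in pairs if f and e)
    other_with = sum(1 for f, e in pairs if f and not e)
    p_essential = (essential_with + 1) / (essential_total + 2)
    p_other = (other_with + 1) / (other_total + 2)
    return math.log(p_essential / p_other)


# Toy data: the feature tracks essentiality perfectly in the first case
# and is unrelated to it in the second.
informative = feature_log_odds([1, 1, 1, 0, 0, 0], [1, 1, 1, 0, 0, 0])
uninformative = feature_log_odds([1, 0, 1, 0], [1, 1, 0, 0])
```

    A positive score means the feature is more common among essential genes, a negative score the reverse, and a score near zero means the feature carries little information. Summing such scores across features gives a naive Bayes style predictor, which is one possible classifier and not necessarily the one the authors trained.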

  • Proteome chips for whole-organism assays NATURE REVIEWS MOLECULAR CELL BIOLOGY Kung, L. A., Snyder, M. 2006; 7 (8): 617-622

    Abstract

    Over the past 5 years, protein-chip technology has emerged as a useful tool for the study of many kinds of protein interactions and biochemical activities. The construction of Saccharomyces cerevisiae whole-proteome arrays has enabled further studies of such interactions in a proteome-wide context. Here, we explore some of the recent advances that have been made at the '-omic' level using protein microarrays.

    View details for DOI 10.1038/nrm1941

    View details for Web of Science ID 000239240000019

    View details for PubMedID 16723973

  • Defined culture conditions of human embryonic stem cells PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Lu, J., Hou, R. H., Booth, C. J., Yang, S. H., Snyder, M. 2006; 103 (15): 5688-5693

    Abstract

    Human embryonic stem cells (hESCs) are pluripotent cells that have the potential to differentiate into any tissue in the human body; therefore, they are a valuable resource for regenerative medicine, drug screening, and developmental studies. However, the clinical application of hESCs is hampered by the difficulties of eliminating animal products in the culture medium and/or the complexity of conditions required to support hESC growth. We have developed a simple medium [termed hESC Cocktail (HESCO)] containing basic fibroblast growth factor, Wnt3a, April (a proliferation-inducing ligand)/BAFF (B cell-activating factor belonging to TNF), albumin, cholesterol, insulin, and transferrin, which is sufficient for hESC self-renewal and proliferation. Cells grown in HESCO were maintained in an undifferentiated state as determined by using six different stem cell markers, and their genomic integrity was confirmed by karyotyping. Cells cultured in HESCO readily form embryoid bodies in tissue culture and teratomas in mice. In both cases, the cells differentiated into each of the three cell lineages, ectoderm, endoderm, and mesoderm, indicating that they maintained their pluripotency. The use of a minimal medium sufficient for hESC growth is expected to greatly facilitate clinical application and developmental studies of hESCs.

    View details for DOI 10.1073/pnas.0601383103

    View details for Web of Science ID 000236896200012

    View details for PubMedID 16595624

  • High-resolution mapping of DNA copy alterations in human chromosome 22 using high-density tiling oligonucleotide arrays PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Urban, A. E., Korbel, J. O., Selzer, R., Richmond, T., Hacker, A., Popescu, G. V., Clubells, J. F., Green, R., Emanuel, B. S., Gerstein, M. B., Weissman, S. M., Snyder, M. 2006; 103 (12): 4534-4539

    Abstract

    Deletions and amplifications of the human genomic sequence (copy number polymorphisms) are the cause of numerous diseases and a potential cause of phenotypic variation in the normal population. Comparative genomic hybridization (CGH) has been developed as a useful tool for detecting alterations in DNA copy number that involve blocks of DNA several kilobases or larger in size. We have developed high-resolution CGH (HR-CGH) to detect accurately and with relatively little bias the presence and extent of chromosomal aberrations in human DNA. Maskless array synthesis was used to construct arrays containing 385,000 oligonucleotides with isothermal probes of 45-85 bp in length; arrays tiling the beta-globin locus and chromosome 22q were prepared. Arrays with a 9-bp tiling path were used to map a 622-bp heterozygous deletion in the beta-globin locus. Arrays with an 85-bp tiling path were used to analyze DNA from patients with copy number changes in the pericentromeric region of chromosome 22q. Heterozygous deletions and duplications as well as partial triploidies and partial tetraploidies of portions of chromosome 22q were mapped with high resolution (typically up to 200 bp) in each patient, and the precise breakpoints of two deletions were confirmed by DNA sequencing. Additional peaks potentially corresponding to known and novel additional CNPs were also observed. Our results demonstrate that HR-CGH allows the detection of copy number changes in the human genome at an unprecedented level of resolution.

    View details for DOI 10.1073/pnas.0511340103

    View details for Web of Science ID 000236362600039

    View details for PubMedID 16537408

  • Severe acute respiratory syndrome diagnostics using a coronavirus protein microarray PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Zhu, H., Hu, S. H., Jona, G., Zhu, X. W., Kreiswirth, N., Willey, B. M., Mazzulli, T., Liu, G. Z., Song, Q. F., Chen, P., Cameron, M., Tyler, A., Wang, J., Wen, J., Chen, W. J., Compton, S., Snyder, M. 2006; 103 (11): 4011-4016

    Abstract

    To monitor severe acute respiratory syndrome (SARS) infection, a coronavirus protein microarray that harbors proteins from SARS coronavirus (SARS-CoV) and five additional coronaviruses was constructed. These microarrays were used to screen approximately 400 Canadian sera from the SARS outbreak, including samples from confirmed SARS-CoV cases, respiratory illness patients, and healthcare professionals. A computer algorithm that uses multiple classifiers to predict samples from SARS patients was developed and used to predict 206 sera from Chinese fever patients. The test assigned patients into two distinct groups: those with antibodies to SARS-CoV and those without. The microarray also identified patients with sera reactive against other coronavirus proteins. Our results correlated well with an indirect immunofluorescence test and demonstrated that viral infection can be monitored for many months after infection. We show that protein microarrays can serve as a rapid, sensitive, and simple tool for large-scale identification of viral-specific antibodies in sera.

    View details for Web of Science ID 000236429300016

    View details for PubMedID 16537477

  • Target hub proteins serve as master regulators of development in yeast GENES & DEVELOPMENT Borneman, A. R., Leigh-Bell, J. A., Yu, H. Y., Bertone, P., Gerstein, M., Snyder, M. 2006; 20 (4): 435-448

    Abstract

    To understand the organization of the transcriptional networks that govern cell differentiation, we have investigated the transcriptional circuitry controlling pseudohyphal development in Saccharomyces cerevisiae. The binding targets of Ste12, Tec1, Sok2, Phd1, Mga1, and Flo8 were globally mapped across the yeast genome. The factors and their targets form a complex binding network, containing patterns characteristic of autoregulation, feedback and feed-forward loops, and cross-talk. Combinatorial binding to intergenic regions was commonly observed, which allowed for the identification of a novel binding association between Mga1 and Flo8, in which Mga1 requires Flo8 for binding to promoter regions. Further analysis of the network showed that the promoters of MGA1 and PHD1 were bound by all of the factors used in this study, identifying them as key target hubs. Overexpression of either of these two proteins specifically induced pseudohyphal growth under noninducing conditions, highlighting them as master regulators of the system. Our results indicate that target hubs can serve as master regulators whose activity is sufficient for the induction of complex developmental responses and therefore represent important regulatory nodes in biological networks.

    View details for DOI 10.1101/gad.1389306

    View details for Web of Science ID 000235428600007

    View details for PubMedID 16449570

  • Yeast as a model for human disease. Current protocols in human genetics / editorial board, Jonathan L. Haines ... [et al.] Smith, M. G., Snyder, M. 2006; Chapter 15: Unit 15 6-?

    Abstract

    The sequencing of the human genome promised the identification of disease-causing genes and, subsequently, therapies for those diseases. However, when identifying the genetic basis of a disease, it is not uncommon to discover an abnormal protein whose normal function is unknown. The genetic manipulations required to assign function to genes are often extremely difficult, if not impossible, in human cells. Model organisms have been used to facilitate understanding of gene function because of the ease of genetic manipulations and because many features of eukaryotic physiology have been conserved across phyla. Yeast is a simple eukaryote with a tractable genome, a short generation time, and a large network of researchers who have generated a vast arsenal of research tools. These traits make yeast ideally suited to help reveal the function of genes implicated in human disease.

    View details for DOI 10.1002/0471142905.hg1506s48

    View details for PubMedID 18428391

  • Design optimization methods for genomic DNA tiling arrays GENOME RESEARCH Bertone, P., Trifonov, V., Rozowsky, J. S., Schubert, F., Emanuelsson, O., Karro, J., Kao, M. Y., Snyder, M., Gerstein, M. 2006; 16 (2): 271-281

    Abstract

    A recent development in microarray research entails the unbiased coverage, or tiling, of genomic DNA for the large-scale identification of transcribed sequences and regulatory elements. A central issue in designing tiling arrays is that of arriving at a single-copy tile path, as significant sequence cross-hybridization can result from the presence of non-unique probes on the array. Due to the fragmentation of genomic DNA caused by the widespread distribution of repetitive elements, the problem of obtaining adequate sequence coverage increases with the sizes of subsequence tiles that are to be included in the design. This becomes increasingly problematic when considering complex eukaryotic genomes that contain many thousands of interspersed repeats. The general problem of sequence tiling can be framed as finding an optimal partitioning of non-repetitive subsequences over a prescribed range of tile sizes, on a DNA sequence comprising repetitive and non-repetitive regions. Exact solutions to the tiling problem become computationally infeasible when applied to large genomes, but successive optimizations are developed that allow their practical implementation. These include an efficient method for determining the degree of similarity of many oligonucleotide sequences over large genomes, and two algorithms for finding an optimal tile path composed of longer sequence tiles. The first algorithm, a dynamic programming approach, finds an optimal tiling in linear time and space; the second applies a heuristic search to reduce the space complexity to a constant requirement. A Web resource has also been developed, accessible at http://tiling.gersteinlab.org, to generate optimal tile paths from user-provided DNA sequences.

    View details for DOI 10.1101/gr.4455906

    View details for Web of Science ID 000235122000015

    View details for PubMedID 16365382
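
    The tiling problem framed in the abstract above (placing tiles of a prescribed size range on non-repetitive sequence only) lends itself to dynamic programming. The sketch below is a simplified illustration, not the authors' algorithm: it assumes the sequence has already been reduced to a per-base repeat mask, and it maximizes the number of bases covered by non-overlapping tiles. The function name and the toy mask are hypothetical.

```python
def max_tiled_bases(is_repeat, min_tile, max_tile):
    """Maximum number of bases coverable by non-overlapping, repeat-free tiles.

    is_repeat: sequence of booleans, True where the base lies in a repeat.
    Tiles must have a length between min_tile and max_tile inclusive.
    """
    n = len(is_repeat)
    # repeats_before[i] = number of repeat bases among the first i positions,
    # so a window [start, end) is repeat-free when the two counts are equal.
    repeats_before = [0] * (n + 1)
    for i, flag in enumerate(is_repeat):
        repeats_before[i + 1] = repeats_before[i] + (1 if flag else 0)

    # best[end] = best coverage achievable using only the first `end` bases.
    best = [0] * (n + 1)
    for end in range(1, n + 1):
        best[end] = best[end - 1]  # option: leave base end - 1 uncovered
        for size in range(min_tile, min(max_tile, end) + 1):
            start = end - size
            if repeats_before[end] == repeats_before[start]:
                best[end] = max(best[end], best[start] + size)
    return best[n]


# Toy mask: 10 unique bases, a 2-base repeat, then 5 unique bases.
mask = [False] * 10 + [True] * 2 + [False] * 5
flexible = max_tiled_bases(mask, 4, 6)
fixed = max_tiled_bases(mask, 4, 4)
```

    For a fixed tile-size range the running time grows linearly with sequence length. Allowing a range of sizes recovers coverage that a single fixed size loses at the edges of repeat-free segments, which is the trade-off the abstract points to.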

  • ProCAT: a data analysis approach for protein microarrays GENOME BIOLOGY Zhu, X., Gerstein, M., Snyder, M. 2006; 7 (11)

    Abstract

    Protein microarrays provide a versatile method for the analysis of many protein biochemical activities. Existing DNA microarray analytical methods do not translate to protein microarrays due to differences between the technologies. Here we report a new approach, ProCAT, which corrects for background bias and spatial artifacts, identifies significant signals, filters nonspecific spots, and normalizes the resulting signal to protein abundance. ProCAT provides a powerful and flexible new approach for analyzing many types of protein microarrays.

    View details for DOI 10.1186/gb-2006-7-11-r110

    View details for Web of Science ID 000243967000014

    View details for PubMedID 17109749

  • Extrapolating traditional DNA microarray statistics to tiling and protein microarray technologies DNA MICROARRAYS, PART B: DATABASES AND STATISTICS Royce, T. E., Rozowsky, J. S., Luscombe, N. M., Emanuelsson, O., Yu, H., Zhu, X., Snyder, M., Gerstein, M. B. 2006; 411: 282-311

    Abstract

    A credit to microarray technology is its broad application. Two experiments--the tiling microarray experiment and the protein microarray experiment--are exemplars of the versatility of microarrays. With the technology's expanding list of uses, the corresponding bioinformatics must evolve in step. There currently exists a rich literature developing statistical techniques for analyzing traditional gene-centric DNA microarrays, so the first challenge in analyzing the advanced technologies is to identify which of the existing statistical protocols are relevant and where and when revised methods are needed. A second challenge is making these often very technical ideas accessible to the broader microarray community. The aim of this chapter is to present some of the most widely used statistical techniques for normalizing and scoring traditional microarray data and indicate their potential utility for analyzing the newer protein and tiling microarray experiments. In so doing, we will assume little or no prior training in statistics on the part of the reader. Areas covered include background correction, intensity normalization, spatial normalization, and the testing of statistical significance.

    View details for DOI 10.1016/S0076-6879(06)11015-0

    View details for Web of Science ID 000244506300015

    View details for PubMedID 16939796

  • Genomic analysis of insertion behavior and target specificity of mini-Tn7 and Tn3 transposons in Saccharomyces cerevisiae NUCLEIC ACIDS RESEARCH Seringhaus, M., Kumar, A., Hartigan, J., Snyder, M., Gerstein, M. 2006; 34 (8)

    Abstract

    Transposons are widely employed as tools for gene disruption. Ideally, they should display unbiased insertion behavior, and incorporate readily into any genomic DNA to which they are exposed. However, many transposons preferentially insert at specific nucleotide sequences. It is unclear to what extent such bias affects their usefulness as mutagenesis tools. Here, we examine insertion site specificity and global insertion behavior of two mini-transposons previously used for large-scale gene disruption in Saccharomyces cerevisiae: Tn3 and Tn7. Using an expanded set of insertion data, we confirm that Tn3 displays marked preference for the AT-rich 5 bp consensus site TA[A/T]TA, whereas Tn7 displays negligible target site preference. On a genome level, both transposons display marked non-uniform insertion behavior: certain sites are targeted far more often than expected, and both distributions depart drastically from Poisson. Thus, to compare their insertion behavior on a genome level, we developed a windowed Kolmogorov-Smirnov (K-S) test to analyze transposon insertion distributions in sequence windows of various sizes. We find that when scored in large windows (>300 bp), both Tn3 and Tn7 distributions appear uniform, whereas in smaller windows, Tn7 appears uniform while Tn3 does not. Thus, both transposons are effective tools for gene disruption, but Tn7 does so with less duplication and a more uniform distribution, better approximating the behavior of the ideal transposon.

    View details for DOI 10.1093/nar/gkl184

    View details for Web of Science ID 000237697000001

    View details for PubMedID 16648358
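
    The abstract above does not spell out how the windowed Kolmogorov-Smirnov (K-S) test is computed, so the following is only one plausible reading, sketched under stated assumptions: the cumulative fraction of insertions is compared with the cumulative fraction of sequence length, but only at window boundaries, so clustering inside a window is invisible at that window size. The function name and the toy insertion positions are hypothetical.

```python
from bisect import bisect_left


def windowed_ks_statistic(positions, length, window):
    """Largest gap between observed and uniform cumulative fractions,
    evaluated only at multiples of `window` (an assumed definition)."""
    if not positions:
        raise ValueError("need at least one insertion position")
    xs = sorted(positions)
    n = len(xs)
    largest_gap = 0.0
    edge = window
    while edge < length:
        observed = bisect_left(xs, edge) / n  # fraction of insertions before edge
        expected = edge / length              # fraction expected if uniform
        largest_gap = max(largest_gap, abs(observed - expected))
        edge += window
    return largest_gap


# Toy data: two tight clusters, one at the start of each half of a 1000-bp region.
insertions = [5, 6, 7, 8, 505, 506, 507, 508]
coarse = windowed_ks_statistic(insertions, 1000, 500)
fine = windowed_ks_statistic(insertions, 1000, 10)
```

    On this toy input the statistic is zero with 500-bp windows and large with 10-bp windows, mirroring the abstract's observation that a distribution can look uniform at coarse resolution and non-uniform at fine resolution. Converting the statistic into a P value would need the K-S null distribution, which is left out here.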

  • Novel transcribed regions in the human genome COLD SPRING HARBOR SYMPOSIA ON QUANTITATIVE BIOLOGY Rozowsky, J., Wu, J., Lian, Z., Nagalakshmi, U., Korbel, J. O., Kapranov, P., Zheng, D., Dyke, S., Newburger, P., Miller, P., Gingeras, T. R., Weissman, S., Gerstein, M., Snyder, M. 2006; 71: 111-116

    Abstract

    We have used genomic tiling arrays to identify transcribed regions throughout the human genome. Analysis of the mapping results of RNA isolated from five cell/tissue types, NB4 cells, NB4 cells treated with retinoic acid (RA), NB4 cells treated with 12-O-tetradecanoylphorbol-13-acetate (TPA), neutrophils, and placenta, throughout the ENCODE region reveals a large number of novel transcribed regions. Interestingly, neutrophils exhibit a great deal of novel expression in several intronic regions. Comparison of the hybridization results of NB4 cells treated with different stimuli relative to untreated cells reveals that many new regions are expressed upon cell differentiation. One such region is the Hox locus, which contains a large number of novel regions expressed in a number of cell types. Analysis of the trinucleotide composition of the novel transcribed regions reveals that it is similar to that of known exons. These results suggest that many of the novel transcribed regions may have a functional role.

    View details for Web of Science ID 000245962800015

    View details for PubMedID 17381286

  • Global changes in STAT target selection and transcription regulation upon interferon treatments GENES & DEVELOPMENT Hartman, S. E., Bertone, P., Nath, A. K., Royce, T. E., Gerstein, M., Weissman, S., Snyder, M. 2005; 19 (24): 2953-2968

    Abstract

    The STAT (signal transducer and activator of transcription) proteins play a crucial role in the regulation of gene expression, but their targets and the manner in which they select them remain largely unknown. Using chromatin immunoprecipitation and DNA microarray analysis (ChIP-chip), we have identified the regions of human chromosome 22 bound by STAT1 and STAT2 in interferon-treated cells. Analysis of the genomic loci proximal to these binding sites introduced new candidate STAT1 and STAT2 target genes, several of which are affiliated with proliferation and apoptosis. The genes on chromosome 22 that exhibited interferon-induced up- or down-regulated expression were determined and correlated with the STAT-binding site information, revealing the potential regulatory effects of STAT1 and STAT2 on their target genes. Importantly, the comparison of STAT1-binding sites upon interferon (IFN)-gamma and IFN-alpha treatments revealed dramatic changes in binding locations between the two treatments. The IFN-alpha induction revealed nonconserved STAT1 occupancy at IFN-gamma-induced sites, as well as novel sites of STAT1 binding not evident in IFN-gamma-treated cells. Many of these correlated with binding by STAT2, but others were STAT2 independent, suggesting that multiple mechanisms direct STAT1 binding to its targets under different activation conditions. Overall, our results reveal a wealth of new information regarding IFN/STAT-binding targets and also fundamental insights into mechanisms of regulation of gene expression in different cell states.

    View details for DOI 10.1101/gad.1371305

    View details for Web of Science ID 000234095500004

    View details for PubMedID 16319195

  • Global analysis of protein phosphorylation in yeast NATURE Ptacek, J., Devgan, G., Michaud, G., Zhu, H., Zhu, X. W., Fasolo, J., Guo, H., Jona, G., Breitkreutz, A., Sopko, R., McCartney, R. R., Schmidt, M. C., Rachidi, N., Lee, S. J., Mah, A. S., Meng, L., Stark, M. J., Stern, D. F., De Virgilio, C., Tyers, M., Andrews, B., Gerstein, M., Schweitzer, B., Predki, P. F., Snyder, M. 2005; 438 (7068): 679-684

    Abstract

    Protein phosphorylation is estimated to affect 30% of the proteome and is a major regulatory mechanism that controls many basic cellular processes. Until recently, our biochemical understanding of protein phosphorylation on a global scale has been extremely limited; only one half of the yeast kinases have known in vivo substrates and the phosphorylating kinase is known for fewer than 160 phosphoproteins. Here we describe, with the use of proteome chip technology, the in vitro substrates recognized by most yeast protein kinases: we identified over 4,000 phosphorylation events involving 1,325 different proteins. These substrates represent a broad spectrum of different biochemical functions and cellular roles. Distinct sets of substrates were recognized by each protein kinase, including closely related kinases of the protein kinase A family and four cyclin-dependent kinases that vary only in their cyclin subunits. Although many substrates reside in the same cellular compartment or belong to the same functional category as their phosphorylating kinase, many others do not, indicating possible new roles for several kinases. Furthermore, integration of the phosphorylation results with protein-protein interaction and transcription factor binding data revealed novel regulatory modules. Our phosphorylation results have been assembled into a first-generation phosphorylation map for yeast. Because many yeast proteins and pathways are conserved, these results will provide insights into the mechanisms and roles of protein phosphorylation in many eukaryotes.

    View details for DOI 10.1038/nature04187

    View details for Web of Science ID 000233593100053

    View details for PubMedID 16319894

  • Advances in functional protein microarray technology FEBS JOURNAL Bertone, P., Snyder, M. 2005; 272 (21): 5400-5411

    Abstract

    Numerous innovations in high-throughput protein production and microarray surface technologies have enabled the development of addressable formats for proteins ordered at high spatial density. Protein array implementations have largely focused on antibody arrays for high-throughput protein profiling. However, it is also possible to construct arrays of full-length, functional proteins from a library of expression clones. The advent of protein-based microarrays allows the global observation of biochemical activities on an unprecedented scale, where hundreds or thousands of proteins can be simultaneously screened for protein-protein, protein-nucleic acid, and small molecule interactions. This technology holds great potential for basic molecular biology research, disease marker identification, toxicological response profiling and pharmaceutical target screening.

    View details for DOI 10.1111/j.1742-4658.2005.04970.x

    View details for Web of Science ID 000232772200003

    View details for PubMedID 16262682

  • A pilot study of transcription unit analysis in rice using oligonucleotide tiling-path microarray PLANT MOLECULAR BIOLOGY Stolc, V., Li, L., Wang, X. F., Li, X. Y., Su, N., Tongprasit, W., Han, B., Xue, Y. B., Li, J. Y., Snyder, M., Gerstein, M., Wang, J., Deng, X. W. 2005; 59 (1): 137-149

    Abstract

    As the international efforts to sequence the rice genome are completed, an immediate challenge and opportunity is to comprehensively and accurately define all transcription units in the rice genome. Here we describe a strategy of using high-density oligonucleotide tiling-path microarrays to map transcription of the japonica rice genome. In a pilot experiment to test this approach, one array representing the reverse strand of the last 11.2 Mb sequence of chromosome 10 was analyzed in detail based on a mathematical model developed in this study. Analysis of the array data detected 77% of the reference gene models in a mixture of four RNA populations. Moreover, significant transcriptional activities were found in many of the previously annotated intergenic regions. These preliminary results demonstrate the utility of genome tiling microarrays in evaluating annotated rice gene models and in identifying novel transcription units that will facilitate rice genome annotation.

    View details for DOI 10.1007/s11103-005-6164-5

    View details for Web of Science ID 000232498000012

    View details for PubMedID 16217608

  • Issues in the analysis of oligonucleotide tiling microarrays for transcript mapping TRENDS IN GENETICS Royce, T. E., Rozowsky, J. S., Bertone, P., Samanta, M., Stolc, V., Weissman, S., Snyder, M., Gerstein, M. 2005; 21 (8): 466-475

    Abstract

    Traditional microarrays use probes complementary to known genes to quantitate the differential gene expression between two or more conditions. Genomic tiling microarray experiments differ in that probes that span a genomic region at regular intervals are used to detect the presence or absence of transcription. This difference means the same sets of biases and the methods for addressing them are unlikely to be relevant to both types of experiment. We introduce the informatics challenges arising in the analysis of tiling microarray experiments as open problems to the scientific community and present initial approaches for the analysis of this nascent technology.

    View details for DOI 10.1016/j.tig.2005.06.007

    View details for Web of Science ID 000231209200010

    View details for PubMedID 15979196

  • Prospects and challenges in proteomics PLANT PHYSIOLOGY Bertone, P., Snyder, M. 2005; 138 (2): 560-562

    View details for DOI 10.1104/pp.104.900154

    View details for Web of Science ID 000229774200009

    View details for PubMedID 15955915

  • Sexual dimorphism in mammalian gene expression TRENDS IN GENETICS Rinn, J. L., Snyder, M. 2005; 21 (5): 298-305

    Abstract

    Males and females have obvious phenotypic differences; they also differ in health, life span and cognitive abilities, and respond differently to diseases such as anemia, coronary heart disease, hypertension and renal dysfunction. Although the anatomical, hormonal and chemical differences between the sexes are well known, there are few molecular descriptors for gender-specific physiological traits and health risks. Recent studies using microarrays and other methods have made significant progress towards elucidating the molecular differences between mammalian sexes in a variety of tissues and towards identifying the transcription factors that regulate sex-biased gene expression. These findings are providing new insights into the molecular and genetic differences that dictate the different behaviors and physiologies of mammalian sexes.

    View details for DOI 10.1016/j.tig.2005.03.005

    View details for Web of Science ID 000229143800012

    View details for PubMedID 15851067

  • Applications of DNA tiling arrays to experimental genome annotation and regulatory pathway discovery CHROMOSOME RESEARCH Bertone, P., Gerstein, M., Snyder, M. 2005; 13 (3): 259-274

    Abstract

    Microarrays have become a popular and important technology for surveying global patterns in gene expression and regulation. A number of innovative experiments have extended microarray applications beyond the measurement of mRNA expression levels, in order to uncover aspects of large-scale chromosome function and dynamics. This has been made possible due to the recent development of tiling arrays, where all non-repetitive DNA comprising a chromosome or locus is represented at various sequence resolutions. Since tiling arrays are designed to contain the entire DNA sequence without prior consultation of existing gene annotation, they enable the discovery of novel transcribed sequences and regulatory elements through the unbiased interrogation of genomic loci. The implementation of such methods for the global analysis of large eukaryotic genomes presents significant technical challenges. Nonetheless, tiling arrays are expected to become instrumental for the genome-wide identification and characterization of functional elements. Combined with computational methods to relate these data and map the complex interactions of transcriptional regulators, tiling array experiments can provide insight toward a more comprehensive understanding of fundamental molecular and cellular processes.

    View details for DOI 10.1007/s10577-005-2165-0

    View details for Web of Science ID 000228868500005

    View details for PubMedID 15868420

  • Global analysis of protein function using protein microarrays MECHANISMS OF AGEING AND DEVELOPMENT Smith, M. G., Jona, G., Ptacek, J., Devgan, G., Zhu, H., Zhu, X. W., Snyder, M. 2005; 126 (1): 171-175

    Abstract

    Protein microarrays containing thousands of proteins arrayed at high density can be prepared and probed for a wide variety of activities, thereby allowing the large scale analysis of many proteins simultaneously. In addition to identifying the activities of many previously uncharacterized proteins, protein microarrays can reveal new activities of well-characterized proteins, thus providing new insights about the functions of these proteins. Below, we describe the construction and use of protein microarrays and their applications using yeast as a model system.

    View details for DOI 10.1016/j.mad.2004.09.019

    View details for Web of Science ID 000226564000022

    View details for PubMedID 15610776

  • Global identification of human transcribed sequences with genome tiling arrays SCIENCE Bertone, P., Stolc, V., Royce, T. E., Rozowsky, J. S., Urban, A. E., Zhu, X. W., Rinn, J. L., Tongprasit, W., Samanta, M., Weissman, S., Gerstein, M., Snyder, M. 2004; 306 (5705): 2242-2246

    Abstract

    Elucidating the transcribed regions of the genome constitutes a fundamental aspect of human biology, yet this remains an outstanding problem. To comprehensively identify coding sequences, we constructed a series of high-density oligonucleotide tiling arrays representing sense and antisense strands of the entire nonrepetitive sequence of the human genome. Transcribed sequences were located across the genome via hybridization to complementary DNA samples, reverse-transcribed from polyadenylated RNA obtained from human liver tissue. In addition to identifying many known and predicted genes, we found 10,595 transcribed sequences not detected by other methods. A large fraction of these are located in intergenic regions distal from previously annotated genes and exhibit significant homology to other mammalian proteins.

    View details for DOI 10.1126/science.1103388

    View details for Web of Science ID 000225950000042

    View details for PubMedID 15539566

  • DNA replication-timing analysis of human chromosome 22 at high resolution and different developmental states PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA White, E. J., Emanuelsson, O., Scalzo, D., Royce, T., Kosak, S., Oakeley, E. J., Weissman, S., Gerstein, M., Groudine, M., Snyder, M., Schubeler, D. 2004; 101 (51): 17771-17776

    Abstract

    Duplication of the genome during the S phase of the cell cycle does not occur simultaneously; rather, different sequences are replicated at different times. The replication timing of specific sequences can change during development; however, the determinants of this dynamic process are poorly understood. To gain insights into the contribution of developmental state, genomic sequence, and transcriptional activity to replication timing, we investigated the timing of DNA replication at high resolution along an entire human chromosome (chromosome 22) in two different cell types. The pattern of replication timing was correlated with respect to annotated genes, gene expression, novel transcribed regions of unknown function, sequence composition, and cytological features. We observed that chromosome 22 contains regions of early- and late-replicating domains of 100 kb to 2 Mb, many (but not all) of which are associated with previously described chromosomal bands. In both cell types, expressed sequences are replicated earlier than nontranscribed regions. However, several highly transcribed regions replicate late. Overall, the DNA replication-timing profiles of the two different cell types are remarkably similar, with only nine regions of difference observed. In one case, this difference reflects the differential expression of an annotated gene that resides in this region. Novel transcribed regions with low coding potential exhibit a strong propensity for early DNA replication. Although the cellular function of such transcripts is poorly understood, our results suggest that their activity is linked to the replication-timing program.

    View details for DOI 10.1073/pnas.0408170101

    View details for Web of Science ID 000225951500038

    View details for PubMedID 15591350

  • Regulation of gene expression by a metabolic enzyme SCIENCE Hall, D. A., Zhu, H., Zhu, X. W., Royce, T., Gerstein, M., Snyder, M. 2004; 306 (5695): 482-484

    Abstract

    Gene expression in eukaryotes is normally believed to be controlled by transcriptional regulators that activate genes encoding structural proteins and enzymes. To identify previously unrecognized DNA binding activities, a yeast proteome microarray was screened with DNA probes; Arg5,6, a well-characterized mitochondrial enzyme involved in arginine biosynthesis, was identified. Chromatin immunoprecipitation experiments revealed that Arg5,6 is associated with specific nuclear and mitochondrial loci in vivo, and Arg5,6 binds to specific fragments in vitro. Deletion of Arg5,6 causes altered transcript levels of both nuclear and mitochondrial target genes. These results indicate that metabolic enzymes can directly regulate eukaryotic gene expression.

    View details for Web of Science ID 000224626500052

    View details for PubMedID 15486299

  • Large-scale mutagenesis of the yeast genome using a Tn7-derived multipurpose transposon GENOME RESEARCH Kumar, A., Seringhaus, M., Biery, M. C., Sarnovsky, R. J., Umansky, L., Piccirillo, S., Heidtman, M., Cheung, K. H., Dobry, C. J., Gerstein, M. B., Craig, N. L., Snyder, M. 2004; 14 (10A): 1975-1986

    Abstract

    We present here an unbiased and extremely versatile insertional library of yeast genomic DNA generated by in vitro mutagenesis with a multipurpose element derived from the bacterial transposon Tn7. This mini-Tn7 element has been engineered such that a single insertion can be used to generate a lacZ fusion, gene disruption, and epitope-tagged gene product. Using this transposon, we generated a plasmid-based library of approximately 300,000 mutant alleles; by high-throughput screening in yeast, we identified and sequenced 9032 insertions affecting 2613 genes (45% of the genome). From analysis of 7176 insertions, we found little bias in Tn7 target-site selection in vitro. In contrast, we also sequenced 10,174 Tn3 insertions and found a markedly stronger preference for an AT-rich 5-base pair target sequence. We further screened 1327 insertion alleles in yeast for hypersensitivity to the chemotherapeutic cisplatin. Fifty-one genes were identified, including four functionally uncharacterized genes and 25 genes involved in DNA repair, replication, transcription, and chromatin structure. In total, the collection reported here constitutes the largest plasmid-based set of sequenced yeast mutant alleles to date and, as such, should be singularly useful for gene and genome-wide functional analysis.

    View details for DOI 10.1101/gr.2875304

    View details for Web of Science ID 000224405900017

    View details for PubMedID 15466296

  • Genomic analysis of regulatory network dynamics reveals large topological changes NATURE Luscombe, N. M., Babu, M. M., Yu, H. Y., Snyder, M., Teichmann, S. A., Gerstein, M. 2004; 431 (7006): 308-312

    Abstract

    Network analysis has been applied widely, providing a unifying language to describe disparate systems ranging from social interactions to power grids. It has recently been used in molecular biology, but so far the resulting networks have only been analysed statically. Here we present the dynamics of a biological network on a genomic scale, by integrating transcriptional regulatory information and gene-expression data for multiple conditions in Saccharomyces cerevisiae. We develop an approach for the statistical analysis of network dynamics, called SANDY, combining well-known global topological measures, local motifs and newly derived statistics. We uncover large changes in underlying network architecture that are unexpected given current viewpoints and random simulations. In response to diverse stimuli, transcription factors alter their interactions to varying degrees, thereby rewiring the network. A few transcription factors serve as permanent hubs, but most act transiently only during certain conditions. By studying sub-network structures, we show that environmental responses facilitate fast signal propagation (for example, with short regulatory cascades), whereas the cell cycle and sporulation direct temporal progression through multiple stages (for example, with highly inter-connected transcription factors). Indeed, to drive the latter processes forward, phase-specific transcription factors inter-regulate serially, and ubiquitously active transcription factors layer above them in a two-tiered hierarchy. We anticipate that many of the concepts presented here--particularly the large-scale topological changes and hub transience--will apply to other biological networks, including complex sub-systems in higher eukaryotes.

    View details for DOI 10.1038/nature02782

    View details for Web of Science ID 000223864000041

    View details for PubMedID 15372033

  • Major molecular differences between mammalian sexes are involved in drug metabolism and renal function DEVELOPMENTAL CELL Rinn, J. L., Rozowsky, J. S., Laurenzi, I. J., Petersen, P. H., Zou, K. Y., Zhong, W. M., Gerstein, M., Snyder, M. 2004; 6 (6): 791-800

    Abstract

    Many anatomical differences exist between males and females; these are manifested on a molecular level by different hormonal environments. Although several molecular differences in adult tissues have been identified, a comprehensive investigation of the gene expression differences between males and females has not been performed. We surveyed the expression patterns of 13,977 mouse genes in male and female hypothalamus, kidney, liver, and reproductive tissues. Extensive differential gene expression was observed not only in the reproductive tissues, but also in the kidney and liver. The differentially expressed genes are involved in drug and steroid metabolism, osmotic regulation, or as yet unresolved cellular roles. In contrast, very few molecular differences were observed between the male and female hypothalamus in both mice and humans. We conclude that there are persistent differences in gene expression between adult males and females. These molecular differences have important implications for the physiological differences between males and females.

    View details for Web of Science ID 000222443200012

    View details for PubMedID 15177028

  • CREB binds to multiple loci on human chromosome 22 MOLECULAR AND CELLULAR BIOLOGY Euskirchen, G., Royce, T. E., Bertone, P., Martone, R., Rinn, J. L., Nelson, F. K., Sayward, F., Luscombe, N. M., Miller, P., Gerstein, M., Weissman, S., Snyder, M. 2004; 24 (9): 3804-3814

    Abstract

    The cyclic AMP-responsive element-binding protein (CREB) is an important transcription factor that can be activated by hormonal stimulation and regulates neuronal function and development. An unbiased, global analysis of where CREB binds has not been performed. We have mapped for the first time the binding distribution of CREB along an entire human chromosome. Chromatin immunoprecipitation of CREB-associated DNA and subsequent hybridization of the associated DNA to a genomic DNA microarray containing all of the nonrepetitive DNA of human chromosome 22 revealed 215 binding sites corresponding to 192 different loci and 100 annotated potential gene targets. We found binding near or within many genes involved in signal transduction and neuronal function. We also found that only a small fraction of CREB binding sites lay near well-defined 5' ends of genes; the majority of sites were found elsewhere, including introns and unannotated regions. Several of the latter lay near novel unannotated transcriptionally active regions. Few CREB targets were found near full-length cyclic AMP response element sites; the majority contained shorter versions or close matches to this sequence. Several of the CREB targets were altered in their expression by treatment with forskolin; interestingly, both induced and repressed genes were found. Our results provide novel molecular insights into how CREB mediates its functions in humans.

    View details for DOI 10.1128/MCB.24.9.3804-3814.2004

    View details for Web of Science ID 000220898100021

    View details for PubMedID 15082775

  • Microbial synergy via an ethanol-triggered pathway MOLECULAR AND CELLULAR BIOLOGY Smith, M. G., Des Etages, S. G., Snyder, M. 2004; 24 (9): 3874-3884

    Abstract

    We have discovered a microbial interaction between yeast, bacteria, and nematodes. Upon coculturing, Saccharomyces cerevisiae stimulated the growth of several species of Acinetobacter, including A. baumannii, A. haemolyticus, A. johnsonii, and A. radioresistens, as well as several natural isolates of Acinetobacter. This enhanced growth was due to a diffusible factor that was shown to be ethanol by chemical assays and evaluation of strains lacking ADH1, ADH3, and ADH5, as all three genes are involved in ethanol production by yeast. This effect is specific to ethanol: methanol, butanol, and dimethyl sulfoxide were unable to stimulate growth to any appreciable level. Low doses of ethanol not only stimulated growth to a higher cell density but also served as a signaling molecule: in the presence of ethanol, Acinetobacter species were able to withstand the toxic effects of salt, indicating that ethanol alters cell physiology. Furthermore, ethanol-fed A. baumannii displayed increased pathogenicity when confronted with a predator, Caenorhabditis elegans. Our results are consistent with the concept that ethanol can serve as a signaling molecule which can affect bacterial physiology and survival.

    View details for DOI 10.1128/MCB.24.9.3874-3884.2004

    View details for Web of Science ID 000220898100027

    View details for PubMedID 15082781

  • Regulation of polarized growth initiation and termination cycles by the polarisome and Cdc42 regulators JOURNAL OF CELL BIOLOGY Bidlingmaier, S., Snyder, M. 2004; 164 (2): 207-218

    Abstract

    The dynamic regulation of polarized cell growth allows cells to form structures of defined size and shape. We have studied the regulation of polarized growth using mating yeast as a model. Haploid yeast cells treated with a high concentration of pheromone form successive mating projections that initiate and terminate growth with regular periodicity. The mechanisms that control the frequency of growth initiation and termination under these conditions are not well understood. We found that the polarisome components Spa2, Pea2, and Bni1 and the Cdc42 regulators Cdc24 and Bem3 control the timing and frequency of projection formation. Loss of polarisome components and mutation of Cdc24 decrease the frequency of projection formation, while loss of Bem3 increases the frequency of projection formation. We found that polarisome components and the cell fusion proteins Fus1 and Fus2 are important for the termination of projection growth. Our results define the first molecular regulators that control the timing of growth initiation and termination during eukaryotic cell differentiation.

    View details for DOI 10.1083/jcb.200307065

    View details for Web of Science ID 000188370500006

    View details for PubMedID 14734532

  • Microarrays to characterize protein interactions on a whole-proteome scale PROTEOMICS Schweitzer, B., Predki, P., Snyder, M. 2003; 3 (11): 2190-2199

    Abstract

    Protein microarrays contain a defined set of proteins spotted and analyzed at high density, and can be generally classified into two categories: protein profiling arrays and functional protein arrays. Functional protein arrays can be made up of any type of protein, and therefore have a diverse set of useful applications. Advantages of these arrays include low reagent consumption, rapid interpretation of results, and the ability to easily control experimental conditions. The ultimate form of a functional protein array consists of all of the proteins encoded by the genome of an organism; such an array would be the whole proteome equivalent of the whole genome DNA arrays that are now available. While proteome microarrays may not have reached the stage of maturity of DNA microarrays, recent developments have shown that many of the barriers holding back the technology can be overcome. Arrays of this type have already been used to rapidly screen large numbers of proteins simultaneously for biochemical activities, protein-protein interactions, protein-lipid interactions, protein-nucleic acid interactions, and protein-small molecule interactions. Eventually, functional protein arrays will be used to facilitate various steps in the drug discovery and early development processes that are currently bottlenecks in the drug development continuum.

    View details for DOI 10.1002/pmic.200300610

    View details for Web of Science ID 000186582500015

    View details for PubMedID 14595818

  • Negative regulation of calcineurin signaling by Hrr25p, a yeast homolog of casein kinase I GENES & DEVELOPMENT Kafadar, K. A., Zhu, H., Snyder, M., Cyert, M. S. 2003; 17 (21): 2698-2708

    Abstract

    Calcineurin is a Ca2+/calmodulin-regulated protein phosphatase required for Saccharomyces cerevisiae to respond to a variety of environmental stresses. Calcineurin promotes cell survival during stress by dephosphorylating and activating the Zn-finger transcription factor Crz1p/Tcn1p. Using a high-throughput assay, we screened 119 yeast kinases for their ability to phosphorylate Crz1p in vitro and identified the casein kinase I homolog Hrr25p. Here we show that Hrr25p negatively regulates Crz1p activity and nuclear localization in vivo. Hrr25p binds to and phosphorylates Crz1p in vitro and in vivo. Overexpression of Hrr25p decreases Crz1p-dependent transcription and antagonizes its Ca2+-induced nuclear accumulation. In the absence of Hrr25p, activation of Crz1p by Ca2+/calcineurin is potentiated. These findings represent the first identification of a negative regulator for Crz1p, and establish a novel physiological role for Hrr25p in antagonizing calcineurin signaling.

    View details for DOI 10.1101/gad.1140603

    View details for Web of Science ID 000186299700011

    View details for PubMedID 14597664

  • A Bayesian networks approach for predicting protein-protein interactions from genomic data SCIENCE Jansen, R., Yu, H. Y., Greenbaum, D., Kluger, Y., Krogan, N. J., Chung, S. B., Emili, A., Snyder, M., Greenblatt, J. F., Gerstein, M. 2003; 302 (5644): 449-453

    Abstract

    We have developed an approach using Bayesian networks to predict protein-protein interactions genome-wide in yeast. Our method naturally weights and combines into reliable predictions genomic features only weakly associated with interaction (e.g., messenger RNA coexpression, coessentiality, and colocalization). In addition to de novo predictions, it can integrate often noisy, experimental interaction data sets. We observe that at given levels of sensitivity, our predictions are more accurate than the existing high-throughput experimental data sets. We validate our predictions with TAP (tandem affinity purification) tagging experiments. Our analysis, which gives a comprehensive view of yeast interactions, is available at genecensus.org/intint.

    View details for Web of Science ID 000185963200044

    View details for PubMedID 14564010
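
    The abstract above describes weighting and combining several weakly predictive genomic features into one interaction prediction. A minimal sketch of one simple way to do that follows, assuming the features are conditionally independent (a naive Bayes simplification, which may not match the published model). The function names, the prior odds, and the likelihood ratios are hypothetical values chosen for illustration only.

```python
import math


def posterior_odds(prior_odds, likelihood_ratios):
    """Combine a prior with per-feature likelihood ratios.

    Assumes conditional independence between features (naive Bayes),
    so the posterior odds are the prior odds times the product of the
    likelihood ratios. Sums are taken in log space for stability.
    """
    log_odds = math.log(prior_odds)
    for lr in likelihood_ratios:
        log_odds += math.log(lr)
    return math.exp(log_odds)


def odds_to_probability(odds):
    """Convert odds to a probability in [0, 1)."""
    return odds / (1.0 + odds)


# Hypothetical numbers: a low prior, and three weak features
# (for example coexpression, coessentiality, colocalization).
example_odds = posterior_odds(1.0 / 600.0, [10.0, 3.0, 2.0])
example_probability = odds_to_probability(example_odds)
```

    In this sketch no single feature is decisive, but their product moves the odds substantially, which is the general idea behind combining weak evidence.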

  • Distribution of NF-kappa B-binding sites across human chromosome 22 PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Martone, R., Euskirchen, G., Bertone, P., Hartman, S., Royce, T. E., Luscombe, N. M., Rinn, J. L., Nelson, F. K., Miller, P., Gerstein, M., Weissman, S., Snyder, M. 2003; 100 (21): 12247-12252

    Abstract

    We have mapped the chromosomal binding site distribution of a transcription factor in human cells. The NF-kappaB family of transcription factors plays an essential role in regulating the induction of genes involved in several physiological processes, including apoptosis, immunity, and inflammation. The binding sites of the NF-kappaB family member p65 were determined by using chromatin immunoprecipitation and a genomic microarray of human chromosome 22 DNA. Sites of binding were observed along the entire chromosome in both coding and noncoding regions, with an enrichment at the 5' end of genes. Strikingly, a significant proportion of binding was seen in intronic regions, demonstrating that transcription factor binding is not restricted to promoter regions. NF-kappaB binding was also found at genes whose expression was regulated by tumor necrosis factor alpha, a known inducer of NF-kappaB-dependent gene expression, as well as adjacent to genes whose expression is not affected by tumor necrosis factor alpha. Many of these latter genes are either known to be activated by NF-kappaB under other conditions or are consistent with NF-kappaB's role in the immune and apoptotic responses. Our results suggest that binding is not restricted to promoter regions and that NF-kappaB binding occurs at a significant number of genes whose expression is not altered, thereby suggesting that binding alone is not sufficient for gene activation.

    View details for DOI 10.1073/pnas.2135255100

    View details for Web of Science ID 000186024300058

    View details for PubMedID 14527995

  • Cytoskeletal activation of a checkpoint kinase MOLECULAR CELL Hanrahan, J., Snyder, M. 2003; 12 (3): 663-673

    Abstract

    The assembly of cytoskeletal structures is coupled to other cellular processes. We have studied the molecular mechanism by which assembly of the yeast septin cytoskeleton is monitored and coordinated with cell cycle progression by analyzing a key regulatory protein kinase, Hsl1, that becomes activated only when the septin cytoskeleton is properly assembled. We first identified a regulatory region of Hsl1 that physically associates with the kinase domain and found that it performs an autoinhibitory function both in vivo and in vitro. Several septin binding domains lie near and overlap the inhibitory domain; these are important for Hsl1 function, and binding of two septins, Cdc11 and Cdc12, relieves the autoinhibition imposed by the kinase inhibitory domain in vitro. Our results suggest that binding to multiple septins activates Hsl1 kinase activity, thereby promoting cell cycle progression. The high conservation of Hsl1 indicates that similar mechanisms may monitor cytoskeletal organization in other eukaryotes.

    View details for Web of Science ID 000185613800015

    View details for PubMedID 14527412

  • Specific protein targeting during cell differentiation: Polarized localization of Fus1p during mating depends on Chs5p in Saccharomyces cerevisiae EUKARYOTIC CELL Santos, B., Snyder, M. 2003; 2 (4): 821-825

    Abstract

    In budding yeast, chs5 mutants are defective in chitin synthesis and cell fusion during mating. Chs5p is a late-Golgi protein required for the polarized transport of the chitin synthase Chs3p to the membrane. Here we show that Chs5p is also essential for the polarized targeting of Fus1p, but not of other cell fusion proteins, to the membrane during mating.

    View details for DOI 10.1128/EC.2.4.821-825.2003

    View details for Web of Science ID 000184803000018

    View details for PubMedID 12912901

  • ExpressYourself: a modular platform for processing and visualizing microarray data NUCLEIC ACIDS RESEARCH Luscombe, N. M., Royce, T. E., Bertone, P., Echols, N., Horak, C. E., Chang, J. T., Snyder, M., Gerstein, M. 2003; 31 (13): 3477-3482

    Abstract

    DNA microarrays are widely used in biological research; by analyzing differential hybridization on a single microarray slide, one can detect changes in mRNA expression levels, increases in DNA copy numbers and the location of transcription factor binding sites on a genomic scale. Having performed the experiments, the major challenge is to process large, noisy datasets in order to identify the specific array elements that are significantly differentially hybridized. This normally requires aggregating different, often incompatible programs into a multi-step pipeline. Here we present ExpressYourself, a fully integrated platform for processing microarray data. In completely automated fashion, it will correct the background array signal, normalize the Cy5 and Cy3 signals, score levels of differential hybridization, combine the results of replicate experiments, filter problematic regions of the array and assess the quality of individual and replicate experiments. ExpressYourself is designed with a highly modular architecture so various types of microarray analysis algorithms can readily be incorporated as they are developed; for example, the system currently implements several normalization methods, including those that simultaneously consider signal intensity and slide location. The processed data are presented using a web-based graphical interface to facilitate comparison with the original images of the array slides. In particular, ExpressYourself is able to regenerate images of the original microarray after applying various steps of processing, which greatly facilitates identification of position-specific artifacts. The program is freely available for use at http://bioinfo.mbb.yale.edu/expressyourself.

    View details for DOI 10.1093/nar/gkg628

    View details for Web of Science ID 000183832900040

    View details for PubMedID 12824348
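
    The first steps of the pipeline described above (background correction, Cy5/Cy3 normalization, scoring differential hybridization) can be illustrated with a small sketch. This is not the ExpressYourself implementation: it assumes simple background subtraction and global median centering of log2 ratios, whereas the platform is described as offering several normalization methods. The function name, the `floor` parameter, and the input values are hypothetical.

```python
import math
from statistics import median


def normalized_log_ratios(cy5, cy3, bg5, bg3, floor=1.0):
    """Return median-centered log2(Cy5/Cy3) ratios, one per spot.

    Each channel is background-subtracted and clamped to `floor`
    so the logarithm is always defined. Median centering assumes
    that most spots are not differentially hybridized.
    """
    ratios = []
    for red, green, red_bg, green_bg in zip(cy5, cy3, bg5, bg3):
        red_signal = max(red - red_bg, floor)
        green_signal = max(green - green_bg, floor)
        ratios.append(math.log2(red_signal / green_signal))
    center = median(ratios)
    return [value - center for value in ratios]


# Hypothetical three-spot slide with zero background.
example = normalized_log_ratios(
    cy5=[200.0, 400.0, 800.0],
    cy3=[100.0, 100.0, 100.0],
    bg5=[0.0, 0.0, 0.0],
    bg3=[0.0, 0.0, 0.0],
)
```

    Spots whose centered log ratio lies far from zero would be candidates for differential hybridization; a real analysis would also combine replicates and account for intensity- and position-dependent effects.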

  • Recent developments in analytical and functional protein microarrays CURRENT OPINION IN MOLECULAR THERAPEUTICS Jona, G., Snyder, M. 2003; 5 (3): 271-277

    Abstract

    In recent years, the genomes of many different organisms have been fully sequenced and annotated. As a consequence of this information, a number of methods have emerged to study the function of many genes and proteins in parallel. One recent approach for the large-scale analysis of proteins is the use of protein microarrays in which hundreds to thousands of proteins are arrayed and assayed simultaneously. Protein arrays can be used for assessing protein levels and following disease markers, identifying biochemical activities, analyzing post-translational modifications, building interaction networks, and for drug discovery and development. In this review, we discuss the construction of different types of protein arrays, and their numerous and diverse applications.

    View details for Web of Science ID 000184024600009

    View details for PubMedID 12870437

  • Genomics - Defining genes in the genomics era SCIENCE Snyder, M., Gerstein, M. 2003; 300 (5617): 258-260

    View details for Web of Science ID 000182135400032

    View details for PubMedID 12690176

  • Molecular dissection of a yeast septin: Distinct domains are required for septin interaction, localization, and function MOLECULAR AND CELLULAR BIOLOGY Casamayor, A., Snyder, M. 2003; 23 (8): 2762-2777

    Abstract

    The septins are a family of cytoskeletal proteins present in animal and fungal cells. They were first identified for their essential role in cytokinesis, but more recently, they have been found to play an important role in many cellular processes, including bud site selection, chitin deposition, cell compartmentalization, and exocytosis. Septin proteins self-associate into filamentous structures that, in yeast cells, form a cortical ring at the mother bud neck. Members of the septin family share common structural domains: a GTPase domain in the central region of the protein, a stretch of basic residues at the amino terminus, and a predicted coiled-coil domain at the carboxy terminus. We have studied the role of each domain in the Saccharomyces cerevisiae septin Cdc11 and found that the three domains are responsible for distinct and sometimes overlapping functions. All three domains are important for proper localization and function in cytokinesis and morphogenesis. The basic region was found to bind the phosphoinositides phosphatidylinositol 4-phosphate and phosphatidylinositol 5-phosphate. The coiled-coil domain is important for interaction with Cdc3 and Bem4. The GTPase domain is involved in Cdc11-septin interaction and targeting to the mother bud neck. Surprisingly, GTP binding appears to be dispensable for Cdc11 function, localization, and lipid binding. Thus, we find that septins are multifunctional proteins with specific domains involved in distinct molecular interactions required for assembly, localization, and function within the cell.

    View details for DOI 10.1128/MCB.23.8.2762-2777.2003

    View details for Web of Science ID 000182049900012

    View details for PubMedID 12665577

  • The transcriptional activity of human Chromosome 22 GENES & DEVELOPMENT Rinn, J. L., Euskirchen, G., Bertone, P., Martone, R., Luscombe, N. M., Hartman, S., Harrison, P. M., Nelson, F. K., Miller, P., Gerstein, M., Weissman, S., Snyder, M. 2003; 17 (4): 529-540

    Abstract

    A DNA microarray representing nearly all of the unique sequences of human Chromosome 22 was constructed and used to measure global transcriptional activity in placental poly(A)(+) RNA. We found that many of the known, related and predicted genes are expressed. More importantly, our study reveals twice as many transcribed bases as have been reported previously. Many of the newly discovered expressed fragments were verified by RNA blot analysis and a novel technique called differential hybridization mapping (DHM). Interestingly, a significant fraction of these novel fragments are expressed antisense to previously annotated introns. The coding potential of these novel expressed regions is supported by their sequence conservation in the mouse genome. This study has greatly increased our understanding of the biological information encoded on a human chromosome. To facilitate the dissemination of these results to the scientific community, we have developed a comprehensive Web resource to present the findings of this study and other features of human Chromosome 22 at http://array.mbb.yale.edu/chr22.

    View details for DOI 10.1101/gad.1055203

    View details for Web of Science ID 000181276200011

    View details for PubMedID 12600945

  • Protein chip technology CURRENT OPINION IN CHEMICAL BIOLOGY Zhu, H., Snyder, M. 2003; 7 (1): 55-63

    Abstract

    Microarray technology has become a crucial tool for large-scale and high-throughput biology. It allows fast, easy and parallel detection of thousands of addressable elements in a single experiment. In the past few years, protein microarray technology has shown its great potential in basic research, diagnostics and drug discovery. It has been applied to analyse antibody-antigen, protein-protein, protein-nucleic-acid, protein-lipid and protein-small-molecule interactions, as well as enzyme-substrate interactions. Recent progress in the field of protein chips includes surface chemistry, capture molecule attachment, protein labeling and detection methods, high-throughput protein/antibody production, and applications to analyse entire proteomes.

    View details for DOI 10.1016/S1367-5931(02)00005-4

    View details for Web of Science ID 000180868900009

    View details for PubMedID 12547427

  • Identification of novel functional elements in the human genome COLD SPRING HARBOR SYMPOSIA ON QUANTITATIVE BIOLOGY Lian, Z., Euskirchen, G., Rinn, J., Martone, R., Bertone, P., Hartman, S., Royce, T., Nelson, K., Sayward, F., Luscombe, N., Yang, J., Li, J. L., Miller, P., Urban, A. E., Gerstein, M., Weissman, S., Snyder, M. 2003; 68: 317-322

    View details for Web of Science ID 000222969300037

    View details for PubMedID 15338632

  • Proteomics ANNUAL REVIEW OF BIOCHEMISTRY Zhu, H., Bilgin, M., Snyder, M. 2003; 72: 783-812

    Abstract

    Fueled by ever-growing DNA sequence information, proteomics, the large-scale analysis of proteins, has become one of the most important disciplines for characterizing gene function, for building functional linkages between protein molecules, and for providing insight into the mechanisms of biological processes in a high-throughput mode. It is now possible to examine the expression of more than 1000 proteins using mass spectrometry technology coupled with various separation methods. High-throughput yeast two-hybrid approaches and analysis of protein complexes using affinity tag purification have yielded valuable protein-protein interaction maps. Large-scale protein tagging and subcellular localization projects have provided considerable information about protein function. Finally, recent developments in protein microarray technology provide a versatile tool to study protein-protein, protein-nucleic acid, protein-lipid, enzyme-substrate, and protein-drug interactions. Other types of microarrays, though not fully developed, also show great potential in diagnostics, protein profiling, and drug identification and validation. This review discusses high-throughput technologies for proteome analysis and their applications. Also discussed are the approaches used for the integrated analysis of the voluminous sets of data generated by proteome analysis conducted on a global scale.

    View details for DOI 10.1146/annurev.biochem.72.121801.161511

    View details for Web of Science ID 000185092500024

    View details for PubMedID 14527327

  • Proteomic approaches for the global analysis of proteins BIOTECHNIQUES Michaud, G. A., Snyder, M. 2002; 33 (6): 1308-1316

    Abstract

    Improvements in technology that allow miniaturization and high-throughput analyses of thousands of genes and gene products have changed the focus and scope of research and development in both academia and industry. It is now possible to study entire proteomes with the goals of elucidating protein expression, subcellular localization, biochemical activities, and their regulation. Alterations in different cell types and conditions and in normal and disease states can be revealed. This wealth of information not only has facilitated our basic understanding of many biological processes but also has enormous potential for drug discovery and development.

    View details for Web of Science ID 000179996500022

    View details for PubMedID 12503317

  • The alpha-factor receptor C-terminus is important for mating projection formation and orientation in Saccharomyces cerevisiae CELL MOTILITY AND THE CYTOSKELETON Vallier, L. G., Segall, J. E., Snyder, M. 2002; 53 (4): 251-266

    Abstract

    Successful mating of MATa Saccharomyces cerevisiae cells is dependent on Ste2p, the alpha-factor receptor. Besides receiving the pheromone signal and transducing it through the G-protein coupled MAP kinase pathway, Ste2p is active in the establishment and orientation of the mating projection. We investigated the role of the carboxyl terminus of the receptor in mating projection formation and orientation using a spatial gradient assay. Cells carrying the ste2-T326 mutation, truncating 105 of the 135 amino acids in the receptor tail including a motif necessary for its ligand-mediated internalization, display slow onset of projection formation, abnormal shmoo morphology, and reduced ability to orient the mating projection toward a pheromone source. This reduction was due to the increased loss of mating projection orientation in a pheromone gradient. Cells with a mutated endocytosis motif were defective in reorientation in a pheromone gradient. ste2-Delta296 cells, which carry a complete truncation of the Ste2p tail, exhibit a severe defect in projection formation, and those projections that do form are unable to orient in a pheromone gradient. These results suggest a complex role for the Ste2p carboxy-terminal tail in the formation, orientation, and directional adjustment of the mating projection, and that endocytosis of the receptor is important for this process. In addition, mutations in RSR1/BUD1 and SPA2, genes necessary for budding polarity, exhibited little or no defect in formation or orientation of mating projections. We conclude that mating projection orientation depends upon the carboxyl terminus of the pheromone receptor and not the directional machinery used in budding.

    View details for DOI 10.1002/cm.10073

    View details for Web of Science ID 000179314000001

    View details for PubMedID 12378535

  • A novel mitochondrial protein, Tar1p, is encoded on the antisense strand of the nuclear 25S rDNA GENES & DEVELOPMENT Coelho, P. S., Bryan, A. C., Kumar, A., Shadel, G. S., Snyder, M. 2002; 16 (21): 2755-2760

    Abstract

    In eukaryotes, it is widely assumed that genes coding for proteins and structural RNAs do not overlap. Using a transposon-tagging strategy to globally analyze the Saccharomyces cerevisiae genome for expressed genes, we identified multiple insertions in an open reading frame that is contained fully within and transcribed antisense to the 25S rRNA gene in the nuclear rDNA repeat region on Chromosome XII. Expression of this gene, TAR1 (Transcript Antisense to Ribosomal RNA), can be detected at the RNA and protein levels, and the primary sequence of the corresponding 124-amino-acid protein is conserved in several yeast species. Tar1p was found to localize to mitochondria, and overexpression of the protein suppresses the respiration-deficient petite phenotype of a point mutation in mitochondrial RNA polymerase that affects mitochondrial gene expression and mtDNA stability. These findings indicate that coding information for protein and structural RNAs can overlap, raising issues regarding the coevolution of such complex genes, and also suggest that rDNA transcription and mitochondrial function are coordinately regulated in eukaryotic cells.

    View details for DOI 10.1101/gad.1035002

    View details for Web of Science ID 000179027900004

    View details for PubMedID 12414727

  • A dynamic approach to mapping coordinates between microplates and microarrays JOURNAL OF BIOMEDICAL INFORMATICS Cheung, K. H., Hager, J., Nelson, K., White, K., Li, Y. L., Snyder, M., Williams, K., Miller, P. 2002; 35 (5-6): 306-312

    Abstract

    The retrieval of useful data from spotted microarray slides requires keeping track of which microplate well and DNA sample correspond to each spot on each array slide. Existing approaches are closely coupled with the type of arrayer in use and are computer operating-system-specific. To support the microarray researcher community at large who use different arrayers and computer platforms, increased flexibility, generality, and portability of these approaches are required. In this paper, we describe a general algorithm that correlates the well positions of DNA samples in each microplate to the positions of the spots on each array slide. Based on this algorithm, we have implemented a flexible and platform-independent program named MicroArray Convolutor (MAC) that provides a Web solution allowing the user to: (a) import a text file that identifies the DNA samples and their well locations, (b) select a transformation method that converts data in 96-well plate format into 384-well plate format, and (c) specify the output format of the array lists dependent on the configuration of the array platform as well as the downstream analysis software chosen for the array. MAC and its source code can be accessed via the following Web address: http://ymd.med.yale.edu/kei-cgi/kc_mac_dev8.pl.

    View details for DOI 10.1016/S1532-0464(03)00033-9

    View details for Web of Science ID 000184879000004

    View details for PubMedID 12968779

  • Global analysis of gene expression in yeast FUNCTIONAL & INTEGRATIVE GENOMICS Horak, C. E., Snyder, M. 2002; 2 (4-5): 171-180

    Abstract

    In the past decade, there has been an intense effort to comprehensively catalogue the expressed genes in the yeast Saccharomyces cerevisiae and to determine the absolute and relative abundance of transcript and protein levels under different cellular conditions. Several methods have been developed to monitor gene expression: DNA microarray analysis, Serial Analysis of Gene Expression (SAGE), kinetic RT-PCR and monitoring expression of beta-galactosidase fusion proteins. These techniques have been used to measure transcript and protein abundance in different developmental states and under different environmental stimuli. A wealth of expression data for yeast is now publicly available through several web sites. The expression information that exists has the obvious benefits of providing a better understanding of the gene expression patterns that accompany changes in a yeast cell's environmental and developmental states. This data has also, however, provided clues to unraveling the complicated questions surrounding gene regulation: why and how is gene expression controlled?

    View details for PubMedID 12192590

  • Functional profiling of the Saccharomyces cerevisiae genome NATURE Giaever, G., Chu, A. M., Ni, L., Connelly, C., Riles, L., Veronneau, S., Dow, S., Lucau-Danila, A., Anderson, K., Andre, B., Arkin, A. P., Astromoff, A., El Bakkoury, M., Bangham, R., Benito, R., Brachat, S., Campanaro, S., Curtiss, M., Davis, K., Deutschbauer, A., Entian, K. D., Flaherty, P., Foury, F., Garfinkel, D. J., Gerstein, M., Gotte, D., Guldener, U., Hegemann, J. H., Hempel, S., Herman, Z., Jaramillo, D. F., Kelly, D. E., Kelly, S. L., Kotter, P., LaBonte, D., Lamb, D. C., Lan, N., Liang, H., Liao, H., Liu, L., Luo, C. Y., Lussier, M., Mao, R., Menard, P., Ooi, S. L., Revuelta, J. L., Roberts, C. J., Rose, M., Ross-Macdonald, P., Scherens, B., Schimmack, G., Shafer, B., Shoemaker, D. D., Sookhai-Mahadeo, S., Storms, R. K., Strathern, J. N., Valle, G., Voet, M., Volckaert, G., Wang, C. Y., Ward, T. R., Wilhelmy, J., Winzeler, E. A., Yang, Y. H., Yen, G., Youngman, E., Yu, K. X., Bussey, H., Boeke, J. D., Snyder, M., Philippsen, P., Davis, R. W., Johnston, M. 2002; 418 (6896): 387-391

    Abstract

    Determining the effect of gene deletion is a fundamental approach to understanding gene function. Conventional genetic screens exhibit biases, and genes contributing to a phenotype are often missed. We systematically constructed a nearly complete collection of gene-deletion mutants (96% of annotated open reading frames, or ORFs) of the yeast Saccharomyces cerevisiae. DNA sequences dubbed 'molecular bar codes' uniquely identify each strain, enabling their growth to be analysed in parallel and the fitness contribution of each gene to be quantitatively assessed by hybridization to high-density oligonucleotide arrays. We show that previously known and new genes are necessary for optimal growth under six well-studied conditions: high salt, sorbitol, galactose, pH 8, minimal medium and nystatin treatment. Less than 7% of genes that exhibit a significant increase in messenger RNA expression are also required for optimal growth in four of the tested conditions. Our results validate the yeast gene-deletion collection as a valuable resource for functional genomics.

    View details for DOI 10.1038/nature00935

    View details for Web of Science ID 000177009700029

    View details for PubMedID 12140549

  • Large-scale identification of genes important for apical growth in Saccharomyces cerevisiae by directed allele replacement technology (DART) screening FUNCTIONAL & INTEGRATIVE GENOMICS Bidlingmaier, S., Snyder, M. 2002; 1 (6): 345-356

    Abstract

    In Saccharomyces cerevisiae, apical bud growth occurs for a brief period in G1 when the deposition of membrane and cell wall is restricted to the tip of the growing bud. To identify genes important for apical bud growth, we have utilized a novel transposon-based mutagenesis system termed DART (Directed Allele Replacement Technology) that allows the rapid transfer of defined insertion alleles into any strain background. A total of 4,810 insertion alleles affecting 1,392 different yeast genes were transferred into a cdc34-2 mutant strain that arrests in the apical growth phase when grown at the restrictive temperature of 37 degrees C. We identified 29 insertion alleles, containing mutations in 17 different genes (SMY1, SPA2, PAN1, SLA1, SLA2, CBK1, SEC22, FAB1, VPS36, VID22, RAS2, ECM33, OPI3, API1/YDR372c, API2/YDR525w, API3/YKR020w, and API4/YNL051w), which alter the elongated bud morphology of cdc34-2 cells arrested in the apical growth phase. Upon treatment with mating pheromone at 25 degrees C, cells containing insertion alleles affecting ten of these genes (SMY1, SPA2, PAN1, SLA1, SLA2, CBK1, FAB1, VPS36, VID22, and API2/YDR525w) form abnormal mating projections. Additionally, cells containing insertion alleles affecting SEC22, RAS2, API1/YDR372c, API3/YKR020w, and API4/YNL051w display severe mating projection formation defects at the elevated temperature of 37 degrees C. DART mutagenesis has many advantages over traditional mutagenesis methods and will be a useful tool for dissecting gene networks important for biological processes.

    View details for PubMedID 11957109

  • Bud-site selection and cell polarity in budding yeast CURRENT OPINION IN MICROBIOLOGY Casamayor, A., Snyder, M. 2002; 5 (2): 179-186

    Abstract

    Polarized growth involves a hierarchy of events such as selection of the growth site, polarization of the cytoskeleton to the selected growth site, and transport of secretory vesicles containing components required for growth. The budding yeast Saccharomyces cerevisiae is an excellent model system for the study of polarized cell growth. A large number of proteins have been found to be involved in these processes, although their mechanisms of action are not yet well-understood. Recent discoveries have helped elucidate many of the processes involved in cell polarity and bud-site selection in yeast and have modified the traditional view of cellular structures involved in these processes. This review focuses on recent advances on the roles of cortical tags, GTPases and the cytoskeleton in the generation and maintenance of cell polarity in yeast.

    View details for Web of Science ID 000175460500009

    View details for PubMedID 11934615

  • 'Omic' approaches for unraveling signaling networks CURRENT OPINION IN CELL BIOLOGY Zhu, H., Snyder, M. 2002; 14 (2): 173-179

    Abstract

    Signaling pathways are crucial for cell differentiation and response to cellular environments. Recently, a large number of approaches for the global analysis of genes and proteins have been described. These have provided important new insights into the components of different pathways and the molecular and cellular responses of these pathways. This review covers genomic and proteomic (collectively referred to as "omic") approaches for the global analysis of cell signaling, including gene expression profiling and analysis, protein-protein interaction methods, protein microarrays, mass spectroscopy and gene-disruption and engineering approaches.

    View details for DOI 10.1016/S0955-0674(02)00315-0

    View details for Web of Science ID 000174193300007

    View details for PubMedID 11891116

  • Carbohydrate analysis prepares to enter the "omics" era CHEMISTRY & BIOLOGY Bidlingmaier, S., Snyder, M. 2002; 9 (4): 400-401

    Abstract

    In this issue, Houseman and Mrksich describe a carbohydrate array preparation method that can be used to analyze protein-carbohydrate interactions and to characterize the substrate specificity of a carbohydrate-modifying enzyme. Carbohydrate chips were prepared by a novel procedure that allows the covalent attachment of carbohydrate-diene conjugates to a specially engineered monolayer surface. The surface presents a precisely controllable ratio of reactive benzoquinone and inert ethylene glycol groups. Nonspecific adsorption of proteins to the surface is extremely low, and the surface is compatible with popular detection techniques. The immobilization technique was demonstrated to be compatible with recently developed automated solid phase carbohydrate synthesis methods, paving the way for the development of highly complex carbohydrate arrays.

    View details for Web of Science ID 000175379100002

    View details for PubMedID 11983329

  • Subcellular localization of the yeast proteome GENES & DEVELOPMENT Kumar, A., Agarwal, S., Heyman, J. A., Matson, S., Heidtman, M., Piccirillo, S., Umansky, L., Drawid, A., Jansen, R., Liu, Y., Cheung, K. H., Miller, P., Gerstein, M., Roeder, G. S., Snyder, M. 2002; 16 (6): 707-719

    Abstract

    Protein localization data are a valuable information resource helpful in elucidating eukaryotic protein function. Here, we report the first proteome-scale analysis of protein localization within any eukaryote. Using directed topoisomerase I-mediated cloning strategies and genome-wide transposon mutagenesis, we have epitope-tagged 60% of the Saccharomyces cerevisiae proteome. By high-throughput immunolocalization of tagged gene products, we have determined the subcellular localization of 2744 yeast proteins. Extrapolating these data through a computational algorithm employing Bayesian formalism, we define the yeast localizome (the subcellular distribution of all 6100 yeast proteins). We estimate the yeast proteome to encompass approximately 5100 soluble proteins and >1000 transmembrane proteins. Our results indicate that 47% of yeast proteins are cytoplasmic, 13% mitochondrial, 13% exocytic (including proteins of the endoplasmic reticulum and secretory vesicles), and 27% nuclear/nucleolar. A subset of nuclear proteins was further analyzed by immunolocalization using surface-spread preparations of meiotic chromosomes. Of these proteins, 38% were found associated with chromosomal DNA. As determined from phenotypic analyses of nuclear proteins, 34% are essential for spore viability, a percentage nearly twice as great as that observed for the proteome as a whole. In total, this study presents experimentally derived localization data for 955 proteins of previously unknown function: nearly half of all functionally uncharacterized proteins in yeast. To facilitate access to these data, we provide a searchable database featuring 2900 fluorescent micrographs at http://ygac.med.yale.edu.

    View details for DOI 10.1101/gad.970902

    View details for Web of Science ID 000174516500007

    View details for PubMedID 11914276

  • GATA-1 binding sites mapped in the beta-globin locus by using mammalian chIp-chip analysis PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Horak, C. E., Mahajan, M. C., Luscombe, N. M., Gerstein, M., Weissman, S. M., Snyder, M. 2002; 99 (5): 2924-2929

    Abstract

    The expression of the beta-like globin genes is intricately regulated by a series of both general and tissue-restricted transcription factors. The hematopoietic lineage-specific transcription factor GATA-1 is important for erythroid differentiation and has been implicated in regulating the expression of the erythroid-specific genes including the genes of the beta-globin locus. In the human erythroleukemic K562 cell line, only one DNA region has been identified previously as a putative site of GATA-1 interaction by in vivo footprinting studies. We mapped GATA-1 binding throughout the beta-globin locus by using chIp-chip analysis of K562 cells. We found that GATA-1 binds in a region encompassing the HS2 core element, as was previously identified, and an additional region of GATA-1 binding upstream of the gammaG gene. This approach will be of general utility for mapping transcription factor binding sites within the beta-globin locus and throughout the genome.

    View details for DOI 10.1073/pnas.052706999

    View details for Web of Science ID 000174284600061

    View details for PubMedID 11867748

  • A question of size: the eukaryotic proteome and the problems in defining it NUCLEIC ACIDS RESEARCH Harrison, P. M., Kumar, A., Lang, N., Snyder, M., Gerstein, M. 2002; 30 (5): 1083-1090

    Abstract

    We discuss the problems in defining the extent of the proteomes for completely sequenced eukaryotic organisms (i.e. the total number of protein-coding sequences), focusing on yeast, worm, fly and human. (i) Six years after completion of its genome sequence, the true size of the yeast proteome is still not defined. New small genes are still being discovered, and a large number of existing annotations are being called into question, with these questionable ORFs (qORFs) comprising up to one-fifth of the 'current' proteome. We discuss these in the context of an ideal genome-annotation strategy that considers the proteome as a rigorously defined subset of all possible coding sequences ('the orfome'). (ii) Despite the greater apparent complexity of the fly (more cells, more complex physiology, longer lifespan), the nematode worm appears to have more genes. To explain this, we compare the annotated proteomes of worm and fly, relating to both genome-annotation and genome evolution issues. (iii) The unexpectedly small size of the gene complement estimated for the complete human genome provoked much public debate about the nature of biological complexity. However, in the first instance, for the human genome, the relationship between gene number and proteome size is far from simple. We survey the current estimates for the numbers of human genes and, from this, we estimate a range for the size of the human proteome. The determination of this is substantially hampered by the unknown extent of the cohort of pseudogenes ('dead' genes), in combination with the prevalence of alternative splicing. (Further information relating to yeast is available at http://genecensus.org/yeast/orfome)

    View details for Web of Science ID 000174229900001

    View details for PubMedID 11861898

  • A small reservoir of disabled ORFs in the yeast genome and its implications for the dynamics of proteome evolution JOURNAL OF MOLECULAR BIOLOGY Harrison, P., Kumar, A., Lan, N., Echols, N., Snyder, M., Gerstein, M. 2002; 316 (3): 409-419

    Abstract

    We surveyed the sequenced Saccharomyces cerevisiae genome (strain S288C) comprehensively for open reading frames (ORFs) that could encode full-length proteins but contain obvious mid-sequence disablements (frameshifts or premature stop codons). These pseudogenic features are termed disabled ORFs (dORFs). Using homology to annotated yeast ORFs and non-yeast proteins plus a simple region extension procedure, we have found 183 dORFs. Combined with the 38 existing annotations for potential dORFs, we have a total pool of up to 221 dORFs, corresponding to less than approximately 3% of the proteome. Additionally, we found 20 pairs of annotated ORFs for yeast that could be merged into a single ORF (termed a mORF) by read-through of the intervening stop codon, and may comprise a complete ORF in other yeast strains. Focussing on a core pool of 98 dORFs with a verifying protein homology, we find that most dORFs are substantially decayed, with approximately 90% having two or more disablements, and approximately 60% having four or more. dORFs are much more yeast-proteome specific than live yeast genes (having about half the chance that they are related to a non-yeast protein). They show a dramatically increased density at the telomeres of chromosomes, relative to genes. A microarray study shows that some dORFs are expressed even though they carry multiple disablements, and thus may be more resistant to nonsense-mediated decay. Many of the dORFs may be involved in responding to environmental stresses, as the largest functional groups include growth inhibition, flocculation, and the SRP/TIP1 family. Our results have important implications for proteome evolution. The characteristics of the dORF population suggest the sorts of genes that are likely to fall in and out of usage (and vary in copy number) in a strain-specific way and highlight the role of subtelomeric regions in engendering this diversity. 
    Our results also have important implications for the effects of the [PSI+] prion. The dORFs disabled by only a single stop and the mORFs (together totalling 35) provide an estimate for the extent of the sequence population that can be resurrected readily through the demonstrated ability of the [PSI+] prion to cause nonsense-codon read-through. Also, the dORFs and mORFs that we find have properties (e.g. growth inhibition, flocculation, vanadate resistance, stress response) that are potentially related to the ability of [PSI+] to engender substantial phenotypic variation in yeast strains under different environmental conditions. (See genecensus.org/pseudogene for further information.)

    View details for DOI 10.1006/jmbi.2001.5343

    View details for Web of Science ID 000174216400001

    View details for PubMedID 11866506

  • An integrated approach for finding overlooked genes in yeast NATURE BIOTECHNOLOGY Kumar, A., Harrison, P. M., Cheung, K. H., Lan, N., Echols, N., Bertone, P., Miller, P., Gerstein, M. B., Snyder, M. 2002; 20 (1): 58-63

    Abstract

    We report here the discovery of 137 previously unappreciated genes in yeast through a widely applicable and highly scalable approach integrating methods of gene-trapping, microarray-based expression analysis, and genome-wide homology searching. Our approach is a multistep process in which expressed sequences are first trapped using a modified transposon that produces protein fusions to beta-galactosidase (beta-gal); non-annotated open reading frames (ORFs) translated as beta-gal chimeras are selected as a candidate pool of potential genes. To verify expression of these sequences, labeled RNA is hybridized against a microarray of oligonucleotides designed to detect gene transcripts in a strand-specific manner. In complement to this experimental method, novel genes are also identified in silico by homology to previously annotated proteins. As these methods are capable of identifying both short ORFs and antisense ORFs, our approach provides an effective supplement to current gene-finding schemes. In total, the genes discovered using this approach constitute 2% of the yeast genome and represent a wealth of overlooked biology.

    View details for Web of Science ID 000173031600037

    View details for PubMedID 11753363

  • Insertional mutagenesis: Transposon-insertion libraries as mutagens in yeast GUIDE TO YEAST GENETICS AND MOLECULAR AND CELL BIOLOGY, PT B Kumar, A., Vidan, S., Snyder, M. 2002; 350: 219-229

    View details for Web of Science ID 000176466300012

    View details for PubMedID 12073314

  • ChIP-chip: A genomic approach for identifying transcription factor binding sites GUIDE TO YEAST GENETICS AND MOLECULAR AND CELL BIOLOGY, PT B Horak, C. E., Snyder, M. 2002; 350: 469-483

    View details for Web of Science ID 000176466300026

    View details for PubMedID 12073330

  • The TRIPLES database: a community resource for yeast molecular biology NUCLEIC ACIDS RESEARCH Kumar, A., Cheung, K. H., Tosches, N., Masiar, P., Liu, Y., Miller, P., Snyder, M. 2002; 30 (1): 73-75

    Abstract

    TRIPLES is a web-accessible database of TRansposon-Insertion Phenotypes, Localization and Expression in Saccharomyces cerevisiae, a relational database housing nearly half a million data points generated from an ongoing study using large-scale transposon mutagenesis to characterize gene function in yeast. At present, TRIPLES contains three principal data sets (i.e. phenotypic data, protein localization data and expression data) for over 3500 annotated yeast genes as well as several hundred non-annotated open reading frames. In addition, the TRIPLES web site provides online order forms linked to each data set so that users may request any strain or reagent generated from this project free of charge. In response to user requests, the TRIPLES web site has undergone several recent modifications. Our localization data have been supplemented with approximately 500 fluorescent micrographs depicting actual staining patterns observed upon indirect immunofluorescence analysis of indicated epitope-tagged proteins. These localization data, as well as all other data sets within TRIPLES, are now available in full as tab-delimited text. To accommodate increased reagent requests, all orders are now cataloged in a separate database, and users are notified immediately of order receipt and shipment. Also, TRIPLES is one of five sites incorporated into the new functional analysis tool Function Junction provided by the Saccharomyces Genome Database. TRIPLES may be accessed from the Yale Genome Analysis Center (YGAC) homepage at http://ygac.med.yale.edu.

    View details for Web of Science ID 000173077100018

    View details for PubMedID 11752258

  • YMD: A microarray database for large-scale gene expression analysis AMIA 2002 SYMPOSIUM, PROCEEDINGS Cheung, K. H., White, K., Hager, J., Gerstein, M., Reinke, V., Nelson, K., Masiar, P., Srivastava, R., Li, Y. L., Li, J., Zhao, H. Y., Li, J. M., Allison, D. B., Snyder, M., Miller, P., Williams, K. 2002: 140-144

    Abstract

    The use of microarray technology to perform parallel analysis of the expression pattern of a large number of genes in a single experiment has created a new frontier of medical research. The vast amount of gene expression data generated from multiple microarray experiments requires a robust database system that allows efficient data storage, retrieval, secure access, data dissemination, and integrated data analyses. To address the growing needs of microarray researchers at Yale and their collaborators, we have built the Yale Microarray Database (YMD). YMD is Web-accessible with the following features: (i) a Web program that tracks DNA samples between source plates and arrays, (ii) the capability of finding common genes/clones across different array platforms, (iii) an image file server, (iv) laboratory-based user management and access privileges, (v) project management, (vi) template data entry, (vii) linking gene expression data to annotation databases for functional analysis. YMD is currently being used on a pilot basis by several laboratories for different organisms and array platforms.

    View details for Web of Science ID 000189418100029

    View details for PubMedID 12463803

  • Phosphorylation of gamma-tubulin regulates microtubule organization in budding yeast DEVELOPMENTAL CELL Vogel, J., Drapkin, B., Oomen, J., Beach, D., Bloom, K., Snyder, M. 2001; 1 (5): 621-631

    Abstract

    gamma-Tubulin is essential for microtubule nucleation in yeast and other organisms; whether this protein is regulated in vivo has not been explored. We show that the budding yeast gamma-tubulin (Tub4p) is phosphorylated in vivo. Hyperphosphorylated Tub4p isoforms are restricted to G1. A conserved tyrosine near the carboxy terminus (Tyr445) is required for phosphorylation in vivo. A point mutation, Tyr445 to Asp, causes cells to arrest prior to anaphase. The frequency of new microtubules appearing in the SPB region and the number of microtubules are increased in tub4-Y445D cells, suggesting this mutation promotes microtubule assembly. These data suggest that modification of gamma-tubulin is important for controlling microtubule number, thereby influencing microtubule organization and function during the yeast cell cycle.

    View details for Web of Science ID 000175301700008

    View details for PubMedID 11709183

  • A filamentous growth response mediated by the yeast mating pathway GENETICS Erdman, S., Snyder, M. 2001; 159 (3): 919-928

    Abstract

    Haploid cells of the budding yeast Saccharomyces cerevisiae respond to mating pheromones by arresting their cell-division cycle in G1 and differentiating into a cell type capable of locating and fusing with mating partners. Yeast cells undergo chemotactic cell surface growth when pheromones are present above a threshold level for morphogenesis; however, the morphogenetic responses of cells to levels of pheromone below this threshold have not been systematically explored. Here we show that MATa haploid cells exposed to low levels of the alpha-factor mating pheromone undergo a novel cellular response: cells modulate their division patterns and cell shape, forming colonies composed of filamentous chains of cells. Time-lapse analysis of filament formation shows that its dynamics are distinct from that of pseudohyphal growth; during pheromone-induced filament formation, daughter cells are delayed relative to mother cells with respect to the timing of bud emergence. Filament formation requires the RSR1(BUD1), BUD8, SLK1/BCK1, and SPA2 genes and many elements of the STE11/STE7 MAP kinase pathway; this response is also independent of FAR1, a gene involved in orienting cell polarization during the mating response. We suggest that mating yeast cells undergo a complex response to low levels of pheromone that may enhance the ability of cells to search for mating partners through the modification of cell shape and alteration of cell-division patterns.

    View details for Web of Science ID 000172665800002

    View details for PubMedID 11729141

  • Global analysis of protein activities using proteome chips SCIENCE Zhu, H., Bilgin, M., Bangham, R., Hall, D., Casamayor, A., Bertone, P., Lan, N., Jansen, R., Bidlingmaier, S., Houfek, T., Mitchell, T., Miller, P., Dean, R. A., Gerstein, M., Snyder, M. 2001; 293 (5537): 2101-2105

    Abstract

    To facilitate studies of the yeast proteome, we cloned 5800 open reading frames and overexpressed and purified their corresponding proteins. The proteins were printed onto slides at high spatial density to form a yeast proteome microarray and screened for their ability to interact with proteins and phospholipids. We identified many new calmodulin- and phospholipid-interacting proteins; a common potential binding motif was identified for many of the calmodulin-binding proteins. Thus, microarrays of an entire eukaryotic proteome can be prepared and screened for diverse biochemical activities. The microarrays can also be used to screen protein-drug interactions and to detect posttranslational modifications.

    View details for Web of Science ID 000171028700077

    View details for PubMedID 11474067

  • A genomic study of the bipolar bud site selection pattern in Saccharomyces cerevisiae MOLECULAR BIOLOGY OF THE CELL Ni, L., Snyder, M. 2001; 12 (7): 2147-2170

    Abstract

    A genome-wide screen of 4168 homozygous diploid yeast deletion strains has been performed to identify nonessential genes that participate in the bipolar budding pattern. By examining bud scar patterns representing the sites of previous cell divisions, 127 mutants representing three different phenotypes were found: unipolar, axial-like, and random. From this screen, 11 functional classes of known genes were identified, including those involved in actin-cytoskeleton organization, general bud site selection, cell polarity, vesicular transport, cell wall synthesis, protein modification, transcription, nuclear function, translation, and other functions. Four characterized genes that were not known previously to participate in bud site selection were also found to be important for the haploid axial budding pattern. In addition to known genes, we found 22 novel genes (20 are designated BUD13-BUD32) important for bud site selection. Deletion of one resulted in unipolar budding exclusively from the proximal pole, suggesting that this gene plays an important role in diploid distal budding. Mutations in 20 other novel BUD genes produced a random budding phenotype and one produced an axial-like budding defect. Several of the novel Bud proteins were fused to green fluorescence protein; two proteins were found to localize to sites of polarized cell growth (i.e., the bud tip in small budded cells and the neck in cells undergoing cytokinesis), similar to that postulated for the bipolar signals and proteins that target cell division site tags to their proper location in the cell. Four others localized to the nucleus, suggesting that they play a role in gene expression. The bipolar distal marker Bud8 was localized in a number of mutants; many showed an altered Bud8-green fluorescence protein localization pattern. 
    Through the genome-wide identification and analysis of different mutants involved in bipolar bud site selection, an integrated pathway for this process is presented in which proximal and distal bud site selection tags are synthesized and localized at their appropriate poles, thereby directing growth at those sites. Genome-wide screens of defined collections of mutants hold significant promise for dissecting many biological processes in yeast.

    View details for Web of Science ID 000170350300019

    View details for PubMedID 11452010

  • Genome-wide transposon mutagenesis in yeast. Current protocols in molecular biology / edited by Frederick M. Ausubel ... [et al.] Kumar, A., Snyder, M. 2001; Chapter 13: Unit13 3-?

    Abstract

    This unit provides comprehensive protocols for the use of insertional libraries generated by shuttle mutagenesis. From the basic protocol, a small aliquot of insertional library DNA may be used to mutagenize yeast, producing strains containing a single transposon insertion within a transcribed and translated region of the genome. This transposon-mutagenized bank of yeast strains may be screened for any desired mutant phenotype. Alternatively, since the transposon contains a reporter gene lacking its start codon and promoter, transposon-tagged strains may also be screened for specific patterns of gene expression. Strains of interest may be characterized by vectorette PCR (protocol provided) in order to locate the precise genomic site of transposon insertion within each mutant. A method by which Cre/lox recombination may be used to reduce the transposon in yeast to a small insertion element encoding an epitope tag is described. This tag serves as a tool by which transposon-mutagenized gene products may be analyzed further (e.g., localized to a discrete subcellular site).

    View details for DOI 10.1002/0471142727.mb1303s51

    View details for PubMedID 18265099

  • Emerging technologies in yeast genomics NATURE REVIEWS GENETICS Kumar, A., Snyder, M. 2001; 2 (4): 302-312

    Abstract

    The genomic revolution is undeniable: in the past year alone, the term 'genomics' was found in nearly 500 research articles, and at least 6 journals are devoted solely to genomic biology. More than just a buzzword, molecular biology has genuinely embraced genomics (the systematic, large-scale study of genomes and their functions). With its facile genetics, the budding yeast Saccharomyces cerevisiae has emerged as an important model organism in the development of many current genomic methodologies. These techniques have greatly influenced the manner in which biology is studied in yeast and in other organisms. In this review, we summarize the most promising technologies in yeast genomics.

    View details for Web of Science ID 000167837900015

    View details for PubMedID 11283702

  • The Cbk1p pathway is important for polarized cell growth and cell separation in Saccharomyces cerevisiae MOLECULAR AND CELLULAR BIOLOGY Bidlingmaier, S., Weiss, E. L., Seidel, C., Drubin, D. G., Snyder, M. 2001; 21 (7): 2449-2462

    Abstract

    During the early stages of budding, cell wall remodeling and polarized secretion are concentrated at the bud tip (apical growth). The CBK1 gene, encoding a putative serine/threonine protein kinase, was identified in a screen designed to isolate mutations that affect apical growth. Analysis of cbk1Delta cells reveals that Cbk1p is required for efficient apical growth, proper mating projection morphology, bipolar bud site selection in diploid cells, and cell separation. Epitope-tagged Cbk1p localizes to both sides of the bud neck in late anaphase, just prior to cell separation. CBK1 and another gene, HYM1, were previously identified in a screen for genes involved in transcriptional repression and proposed to function in the same pathway. Deletion of HYM1 causes phenotypes similar to those observed in cbk1Delta cells and disrupts the bud neck localization of Cbk1p. Whole-genome transcriptional analysis of cbk1Delta suggests that the kinase regulates the expression of a number of genes with cell wall-related functions, including two genes required for efficient cell separation: the chitinase-encoding gene CTS1 and the glucanase-encoding gene SCW11. The Ace2p transcription factor is required for expression of CTS1 and has been shown to physically interact with Cbk1p. Analysis of ace2Delta cells reveals that Ace2p is required for cell separation but not for polarized growth. Our results suggest that Cbk1p and Hym1p function to regulate two distinct cell morphogenesis pathways: an ACE2-independent pathway that is required for efficient apical growth and mating projection formation and an ACE2-dependent pathway that is required for efficient cell separation following cytokinesis. Cbk1p is most closely related to the Neurospora crassa Cot-1; Schizosaccharomyces pombe Orb6; Caenorhabditis elegans, Drosophila, and human Ndr; and Drosophila and mammalian WARTS/LATS kinases. Many Cbk1-related kinases have been shown to regulate cellular morphology.

    View details for Web of Science ID 000167451500019

    View details for PubMedID 11259593

  • Protein arrays and microarrays CURRENT OPINION IN CHEMICAL BIOLOGY Zhu, H., Snyder, M. 2001; 5 (1): 40-45

    Abstract

    In the past, studies of protein activities have focused on studying a single protein at a time, which is often time-consuming and expensive. Recently, with the sequencing of entire genomes, large-scale proteome analysis has begun. Arrays of proteins have been used for the determination of subcellular localization, analysis of protein-protein interactions and biochemical analysis of protein function. New protein-microarray technologies have been introduced that enable the high-throughput analysis of protein activities. These have the potential to revolutionize the analysis of entire proteomes.

    View details for Web of Science ID 000167051500006

    View details for PubMedID 11166646

  • Large-scale mutagenesis: yeast genetics in the genome era CURRENT OPINION IN BIOTECHNOLOGY Vidan, S., Snyder, M. 2001; 12 (1): 28-34

    Abstract

    The completion of the DNA sequence of the budding yeast Saccharomyces cerevisiae resulted in the identification of a large number of genes. However, the function of most of these genes is not known. One of the best ways to determine gene function is to carry out mutational and phenotypic analysis. In recent years, several approaches have been developed for the mutational analysis of yeast genes on a large scale. These include transposon-based insertional mutagenesis, and systematic deletions using PCR-based approaches. These projects have produced collections of yeast strains and plasmid alleles that can be screened using novel approaches. Analysis of these collections by the scientific community promises to reveal a great deal of biological information about this organism.

    View details for Web of Science ID 000167209900005

    View details for PubMedID 11167069

  • The carboxy terminus of Tub4p is required for gamma-tubulin function in budding yeast JOURNAL OF CELL SCIENCE Vogel, J., Snyder, M. 2000; 113 (21): 3871-3882

    Abstract

    The role of gamma-tubulin in microtubule nucleation is well established; however, its function in other aspects of microtubule organization is unknown. The carboxy termini of alpha/beta-tubulins influence the assembly and stability of microtubules. We investigated the role of the carboxy terminus of yeast gamma-tubulin (Tub4p) in microtubule organization. This region consists of a conserved domain (DSYLD) and an acidic tail. Cells expressing truncations lacking the DSYLD domain, tail or both regions are temperature sensitive for growth. Growth defects of tub4 mutants lacking either or both carboxy-terminal domains are suppressed by the microtubule destabilizing drug benomyl. tub4 carboxy-terminal mutants arrest as large budded cells with short bipolar spindles positioned at the bud neck. Electron microscopic analysis of wild-type and CTR mutant cells reveals that SPBs are tightly associated with the bud neck/cortex by cytoplasmic microtubules in mutants lacking the tail region (tub4-delta 444, tub4-delta 448). Mutants lacking the DSYLD residues (tub4-delta 444, tub4-delta DSYLD) form many cytoplasmic microtubules. We propose that the carboxy terminus of Tub4p is required for re-organization of the microtubules upon completion of nuclear migration, and facilitates spindle elongation into the bud.

    View details for Web of Science ID 000165515000019

    View details for PubMedID 11034914

  • Analysis of yeast protein kinases using protein chips NATURE GENETICS Zhu, H., Klemic, J. F., Chang, S., Bertone, P., Casamayor, A., Klemic, K. G., Smith, D., Gerstein, M., Reed, M. A., Snyder, M. 2000; 26 (3): 283-289

    Abstract

    We have developed a novel protein chip technology that allows the high-throughput analysis of biochemical activities, and used this approach to analyse nearly all of the protein kinases from Saccharomyces cerevisiae. Protein chips are disposable arrays of microwells in silicone elastomer sheets placed on top of microscope slides. The high density and small size of the wells allows for high-throughput batch processing and simultaneous analysis of many individual samples. Only small amounts of protein are required. Of 122 known and predicted yeast protein kinases, 119 were overexpressed and analysed using 17 different substrates and protein chips. We found many novel activities and that a large number of protein kinases are capable of phosphorylating tyrosine. The tyrosine phosphorylating enzymes often share common amino acid residues that lie near the catalytic region. Thus, our study identified a number of novel features of protein kinases and demonstrates that protein chip technology is useful for high-throughput screening of protein biochemical activity.

    View details for Web of Science ID 000165176500015

    View details for PubMedID 11062466

  • Polarized growth controls cell shape and bipolar bud site selection in Saccharomyces cerevisiae MOLECULAR AND CELLULAR BIOLOGY Sheu, Y. J., Barral, Y., Snyder, M. 2000; 20 (14): 5235-5247

    Abstract

    We examined the relationship between polarized growth and division site selection, two fundamental processes important for proper development of eukaryotes. Diploid Saccharomyces cerevisiae cells exhibit an ellipsoidal shape and a specific division pattern (a bipolar budding pattern). We found that the polarity genes SPA2, PEA2, BUD6, and BNI1 participate in a crucial step of bud morphogenesis, apical growth. Deleting these genes results in round cells and diminishes bud elongation in mutants that exhibit pronounced apical growth. Examination of distribution of the polarized secretion marker Sec4 demonstrates that spa2Delta, pea2Delta, bud6Delta, and bni1Delta mutants fail to concentrate Sec4 at the bud tip during apical growth and at the division site during repolarization just prior to cytokinesis. Moreover, cell surface expansion is not confined to the distal tip of the bud in these mutants. In addition, we found that the p21-activated kinase homologue Ste20 is also important for both apical growth and bipolar bud site selection. We further examined how the duration of polarized growth affects bipolar bud site selection by using mutations in cell cycle regulators that control the timing of growth phases. The grr1Delta mutation enhances apical growth by stabilizing G(1) cyclins and increases the distal-pole budding in diploids. Prolonging polarized growth phases by disrupting the G(2)/M cyclin gene CLB2 enhances the accuracy of bud site selection in wild-type, spa2Delta, and ste20Delta cells, whereas shortening the polarized growth phases by deleting SWE1 decreases the fidelity of bipolar budding. This study reports the identification of components required for apical growth and demonstrates the critical role of polarized growth in bipolar bud site selection. We propose that apical growth and repolarization at the site of cytokinesis are crucial for establishing spatial cues used by diploid yeast cells to position division planes.

    View details for Web of Science ID 000087820000027

    View details for PubMedID 10866679

  • The Kar3p kinesin-related protein forms a novel heterodimeric structure with its associated protein Cik1p MOLECULAR BIOLOGY OF THE CELL Barrett, J. G., Manning, B. D., Snyder, M. 2000; 11 (7): 2373-2385

    Abstract

    Proteins that physically associate with members of the kinesin superfamily are critical for the functional diversity observed for these microtubule motor proteins. However, quaternary structures of complexes between kinesins and kinesin-associated proteins are poorly defined. We have analyzed the nature of the interaction between the Kar3 motor protein, a minus-end-directed kinesin from yeast, and its associated protein Cik1. Extraction experiments demonstrate that Kar3p and Cik1p are tightly associated. Mapping of the interaction domains of the two proteins by two-hybrid analyses indicates that Kar3p and Cik1p associate in a highly specific manner along the lengths of their respective coiled-coil domains. Sucrose gradient velocity centrifugation and gel filtration experiments were used to determine the size of the Kar3-Cik1 complex from both mating pheromone-treated cells and vegetatively growing cells. These experiments predict a size for this complex that is consistent with that of a heterodimer containing one Kar3p subunit and one Cik1p subunit. Finally, immunoprecipitation of epitope-tagged and untagged proteins confirms that only one subunit each of Kar3p and Cik1p is present in the Kar3-Cik1 complex. These findings demonstrate that the Kar3-Cik1 complex has a novel heterodimeric structure not observed previously for kinesin complexes.

    View details for Web of Science ID 000088184800016

    View details for PubMedID 10888675

  • Drivers and passengers wanted! The role of kinesin-associated proteins TRENDS IN CELL BIOLOGY Manning, B. D., Snyder, M. 2000; 10 (7): 281-289

    Abstract

    Members of the kinesin superfamily of proteins participate in a wide variety of cellular processes. Although much attention has been devoted to the structural and biophysical properties of the force-generating motor domain of kinesins, the factors controlling the functional specificity of each kinesin have only recently been examined. Genetic and biochemical approaches have identified two classes of proteins that associate physically with the diverse non-motor domains of kinesins. These proteins can be divided into two general classes: first, those that form tight complexes with the kinesin and are instrumental in directing the distinct function of the motor (i.e. drivers) and, second, those proteins that might transiently interact with the motor or be an integral part of the motor's cargo (i.e. passengers). Here, we discuss known kinesin-binding proteins, and how they might participate in the activity of their motor partners.

    View details for Web of Science ID 000087769300004

    View details for PubMedID 10856931

  • Genome-wide mutant collections: toolboxes for functional genomics CURRENT OPINION IN MICROBIOLOGY Coelho, P. S., Kumar, A., Snyder, M. 2000; 3 (3): 309-315

    Abstract

    The sequencing of entire genomes has led to the identification of many genes. A future challenge will be to determine the function of all of the genes of an organism. One of the best ways to ascertain function is to disrupt genes and determine the phenotype of the resulting organism. Novel large-scale approaches for generating gene disruptions and analyzing the resulting phenotype are underway in the budding yeast Saccharomyces cerevisiae and other organisms including flies, Mycoplasma, worms, plants and mice. These approaches and mutant collections will be extremely valuable to the scientific community and will dramatically alter the manner in which science is performed in the future.

    View details for Web of Science ID 000087635200015

    View details for PubMedID 10851164

  • An integrated web interface for large-scale characterization of sequence data. Functional & integrative genomics Cheung, K. H., Kumar, A., Snyder, M., Miller, P. 2000; 1 (1): 70-75

    Abstract

    Large-scale genome projects require the analysis of large amounts of raw data. This analysis often involves the application of a chain of biology-based programs. Many of these programs are difficult to operate because they are non-integrated, command-line driven, and platform-dependent. The problem is compounded when the number of data files involved is large, making navigation and status-tracking difficult. To demonstrate how this problem can be addressed, we have created a platform-independent Web front end that integrates a set of programs used in a genomic project analyzing gene function by transposon mutagenesis in Saccharomyces cerevisiae. In particular, these programs help define a large number of transposon insertion events within the yeast genome, identifying both the precise site of transposon insertion as well as potential open reading frames disrupted by this insertion event. Our Web interface facilitates this analysis by performing the following tasks. Firstly, it allows each of the analysis programs to be launched against multiple directories of data files. Secondly, it allows the user to view, download, and upload files generated by the programs. Thirdly, it indicates which sets of data directories have been processed by each program. Although designed specifically to aid in this project, our interface exemplifies a general approach by which independent software programs may be integrated into an efficient protocol for large-scale genomic data processing.

    View details for PubMedID 11793223

  • Compartmentalization of the cell cortex by septins is required for maintenance of cell polarity in yeast MOLECULAR CELL Barral, Y., Mermall, V., Mooseker, M. S., Snyder, M. 2000; 5 (5): 841-851

    Abstract

    Formation and maintenance of specialized plasma membrane domains are crucial for many biological processes, such as cell polarization and signaling. During isotropic bud growth, the yeast cell periphery is divided into two domains: the bud surface, an active site of exocytosis and growth, and the relatively quiescent surface of the mother cell. We found that cells lacking septins at the bud neck failed to maintain the exocytosis and morphogenesis factors Spa2, Sec3, Sec5, and Myo2 in the bud during isotropic growth. Furthermore, we found that septins were required for proper regulation of actin patch stability; septin-defective cells permitted to enter isotropic growth lost actin and growth polarity. We propose that septins maintain cell polarity by specifying a boundary between cortical domains.

    View details for Web of Science ID 000087332500008

    View details for PubMedID 10882120

  • Sbe2p and Sbe22p, two homologous Golgi proteins involved in yeast cell wall formation MOLECULAR BIOLOGY OF THE CELL Santos, B., Snyder, M. 2000; 11 (2): 435-452

    Abstract

    The cell wall of fungal cells is important for cell integrity and cell morphogenesis and protects against harmful environmental conditions. The yeast cell wall is a complex structure consisting mainly of mannoproteins, glucan, and chitin. The molecular mechanisms by which the cell wall components are synthesized and transported to the cell surface are poorly understood. We have identified and characterized two homologous yeast proteins, Sbe2p and Sbe22p, through their suppression of a chs5 spa2 mutant strain defective in chitin synthesis and cell morphogenesis. Although sbe2 and sbe22 null mutants are viable, sbe2 sbe22 cells display several phenotypes indicative of defects in cell integrity and cell wall structure. First, sbe2 sbe22 cells display a sorbitol-remediable lysis defect at 37 degrees C and are hypersensitive to SDS and calcofluor. Second, electron microscopic analysis reveals that sbe2 sbe22 cells have an aberrant cell wall structure with a reduced mannoprotein layer. Finally, immunofluorescence experiments reveal that in small-budded cells, sbe2 sbe22 mutants mislocalize Chs3p, a protein involved in chitin synthesis. In addition, sbe2 sbe22 diploids have a bud-site selection defect, displaying a random budding pattern. A Sbe2p-GFP fusion protein localizes to cytoplasmic patches, and Sbe2p cofractionates with Golgi proteins. Deletion of CHS5, which encodes a Golgi protein involved in the transport of Chs3p to the cell periphery, is lethal in combination with disruption of SBE2 and SBE22. Thus, we suggest a model in which Sbe2p and Sbe22p are involved in the transport of cell wall components from the Golgi apparatus to the cell surface periphery in a pathway independent of Chs5p.

    View details for Web of Science ID 000085478500003

    View details for PubMedID 10679005

  • TRIPLES: a database of gene function in Saccharomyces cerevisiae NUCLEIC ACIDS RESEARCH Kumar, A., Cheung, K. H., Ross-Macdonald, P., Coelho, P. S., Miller, P., Snyder, M. 2000; 28 (1): 81-84

    Abstract

    Using a novel multipurpose mini-transposon, we have generated a collection of defined mutant alleles for the analysis of disruption phenotypes, protein localization, and gene expression in Saccharomyces cerevisiae. To catalog this unique data set, we have developed TRIPLES, a Web-accessible database of TRansposon-Insertion Phenotypes, Localization and Expression in Saccharomyces. Encompassing over 250 000 data points, TRIPLES provides convenient access to information from nearly 7800 transposon-mutagenized yeast strains; within TRIPLES, complete data reports of each strain may be viewed in table format, or if desired, downloaded as tab-delimited text files. Each report contains external links to corresponding entries within the Saccharomyces Genome Database and International Nucleic Acid Sequence Data Library (GenBank). Unlike other yeast databases, TRIPLES also provides on-line order forms linked to each clone report; users may immediately request any desired strain free-of-charge by submitting a completed form. In addition to presenting a wealth of information for over 2300 open reading frames, TRIPLES constitutes an important medium for the distribution of useful reagents throughout the yeast scientific community. Maintained by the Yale Genome Analysis Center, TRIPLES may be accessed at http://ycmi.med.yale.edu/ygac/triples.htm

    View details for Web of Science ID 000084896300021

    View details for PubMedID 10592187

  • gamma-tubulin of budding yeast CENTROSOME IN CELL REPLICATION AND EARLY DEVELOPMENT Vogel, J., Snyder, M. 2000; 49: 75-104

    View details for Web of Science ID 000165501500004

    View details for PubMedID 11005015

  • High-throughput methods for the large-scale analysis of gene function by transposon tagging APPLICATIONS OF CHIMERIC GENES AND HYBRID PROTEINS, PT C Kumar, A., Des Etages, S. A., Coelho, P. S., Roeder, G. S., Snyder, M. 2000; 328: 550-574

    View details for Web of Science ID 000166565300033

    View details for PubMedID 11075366

  • Large-scale analysis of the yeast genome by transposon tagging and gene disruption NATURE Ross-Macdonald, P., Coelho, P. S., Roemer, T., Agarwal, S., Kumar, A., Jansen, R., Cheung, K. H., Sheehan, A., Symoniatis, D., Umansky, L., Heldtman, M., Nelson, F. K., Iwasaki, H., Hager, K., Gerstein, M., Miller, P., Roeder, G. S., Snyder, M. 1999; 402 (6760): 413-418

    Abstract

    Economical methods by which gene function may be analysed on a genomic scale are relatively scarce. To fill this need, we have developed a transposon-tagging strategy for the genome-wide analysis of disruption phenotypes, gene expression and protein localization, and have applied this method to the large-scale analysis of gene function in the budding yeast Saccharomyces cerevisiae. Here we present the largest collection of defined yeast mutants ever generated within a single genetic background--a collection of over 11,000 strains, each carrying a transposon inserted within a region of the genome expressed during vegetative growth and/or sporulation. These insertions affect nearly 2,000 annotated genes, representing about one-third of the 6,200 predicted genes in the yeast genome. We have used this collection to determine disruption phenotypes for nearly 8,000 strains using 20 different growth conditions; the resulting data sets were clustered to identify groups of functionally related genes. We have also identified over 300 previously non-annotated open reading frames and analysed by indirect immunofluorescence over 1,300 transposon-tagged proteins. In total, our study encompasses over 260,000 data points, constituting the largest functional analysis of the yeast genome ever undertaken.

    View details for Web of Science ID 000083913600057

    View details for PubMedID 10586881

  • Functional characterization of the S. cerevisiae genome by gene deletion and parallel analysis SCIENCE Winzeler, E. A., Shoemaker, D. D., Astromoff, A., Liang, H., Anderson, K., Andre, B., Bangham, R., Benito, R., Boeke, J. D., Bussey, H., Chu, A. M., Connelly, C., Davis, K., Dietrich, F., Dow, S. W., El Bakkoury, M., Foury, F., Friend, S. H., Gentalen, E., Giaever, G., Hegemann, J. H., Jones, T., Laub, M., Liao, H., Liebundguth, N., Lockhart, D. J., Lucau-Danila, A., Lussier, M., M'Rabet, N., Menard, P., Mittmann, M., Pai, C., Rebischung, C., Revuelta, J. L., Riles, L., Roberts, C. J., Ross-Macdonald, P., Scherens, B., Snyder, M., Sookhai-Mahadeo, S., Storms, R. K., Veronneau, S., Voet, M., Volckaert, G., Ward, T. R., Wysocki, R., Yen, G. S., Yu, K. X., Zimmermann, K., Philippsen, P., Johnston, M., Davis, R. W. 1999; 285 (5429): 901-906

    Abstract

    The functions of many open reading frames (ORFs) identified in genome-sequencing projects are unknown. New, whole-genome approaches are required to systematically determine their function. A total of 6925 Saccharomyces cerevisiae strains were constructed, by a high-throughput strategy, each with a precise deletion of one of 2026 ORFs (more than one-third of the ORFs in the genome). Of the deleted ORFs, 17 percent were essential for viability in rich medium. The phenotypes of more than 500 deletion strains were assayed in parallel. Of the deletion strains, 40 percent showed quantitative growth defects in either rich or minimal medium.

    View details for Web of Science ID 000081860900053

    View details for PubMedID 10436161

  • Differential regulation of the Kar3p kinesin-related protein by two associated proteins, Cik1p and Vik1p JOURNAL OF CELL BIOLOGY Manning, B. D., Barrett, J. G., Wallace, J. A., Granok, H., Snyder, M. 1999; 144 (6): 1219-1233

    Abstract

    The mechanisms by which kinesin-related proteins interact with other proteins to carry out specific cellular processes are poorly understood. The kinesin-related protein, Kar3p, has been implicated in many microtubule functions in yeast. Some of these functions require interaction with the Cik1 protein (Page, B.D., L.L. Satterwhite, M.D. Rose, and M. Snyder. 1994. J. Cell Biol. 124:507-519). We have identified a Saccharomyces cerevisiae gene, named VIK1, encoding a protein with sequence and structural similarity to Cik1p. The Vik1 protein is detected in vegetatively growing cells but not in mating pheromone-treated cells. Vik1p physically associates with Kar3p in a complex separate from that of the Kar3p-Cik1p complex. Vik1p localizes to the spindle-pole body region in a Kar3p-dependent manner. Reciprocally, concentration of Kar3p at the spindle poles during vegetative growth requires the presence of Vik1p, but not Cik1p. Phenotypic analysis suggests that Cik1p and Vik1p are involved in different Kar3p functions. Disruption of VIK1 causes increased resistance to the microtubule depolymerizing drug benomyl and partially suppresses growth defects of cik1Delta mutants. The vik1Delta and kar3Delta mutations, but not cik1Delta, partially suppress the temperature-sensitive growth defect of strains lacking the function of two other yeast kinesin-related proteins, Cin8p and Kip1p. Our results indicate that Kar3p forms functionally distinct complexes with Cik1p and Vik1p to participate in different microtubule-mediated events within the same cell.

    View details for Web of Science ID 000079470900011

    View details for PubMedID 10087265

  • Nim1-related kinases coordinate cell cycle progression with the organization of the peripheral cytoskeleton in yeast GENES & DEVELOPMENT Barral, Y., Parra, M., Bidlingmaier, S., Snyder, M. 1999; 13 (2): 176-187

    Abstract

    The mechanisms that couple cell cycle progression with the organization of the peripheral cytoskeleton are poorly understood. In Saccharomyces cerevisiae, the Swe1 protein has been shown previously to phosphorylate and inactivate the cyclin-dependent kinase, Cdc28, thereby delaying the onset of mitosis. The nim1-related protein kinase, Hsl1, induces entry into mitosis by negatively regulating Swe1. We have found that Hsl1 physically associates with the septin cytoskeleton in vivo and that Hsl1 kinase activity depends on proper septin function. Genetic analysis indicates that two additional Hsl1-related kinases, Kcc4 and Gin4, act redundantly with Hsl1 to regulate Swe1. Kcc4, like Hsl1 and Gin4, was found to localize to the bud neck in a septin-dependent fashion. Interestingly, hsl1 kcc4 gin4 triple mutants develop a cellular morphology extremely similar to that of septin mutants. Consistent with the idea that Hsl1, Kcc4, and Gin4 link entry into mitosis to proper septin organization, we find that septin mutants incubated at the restrictive temperature trigger a Swe1-dependent mitotic delay that is necessary to maintain cell viability. These results reveal for the first time how cells monitor the organization of their cytoskeleton and demonstrate the existence of a cell cycle checkpoint that responds to defects in the peripheral cytoskeleton. Moreover, Hsl1, Kcc4, and Gin4 have homologs in higher eukaryotes, suggesting that the regulation of Swe1/Wee1 by this class of kinases is highly conserved.

    View details for Web of Science ID 000078395100007

    View details for PubMedID 9925642

  • Transposon mutagenesis for the analysis of protein production, function, and localization CDNA PREPARATION AND CHARACTERIZATION Ross-Macdonald, P., Sheehan, A., Friddle, C., Roeder, G. S., Snyder, M. 1999; 303: 512-532

    View details for Web of Science ID 000081913000029

    View details for PubMedID 10349663

  • Spa2p interacts with cell polarity proteins and signaling components involved in yeast cell morphogenesis MOLECULAR AND CELLULAR BIOLOGY Sheu, Y. J., Santos, B., Fortin, N., Costigan, C., Snyder, M. 1998; 18 (7): 4053-4069

    Abstract

    The yeast protein Spa2p localizes to growth sites and is important for polarized morphogenesis during budding, mating, and pseudohyphal growth. To better understand the role of Spa2p in polarized growth, we analyzed regions of the protein important for its function and proteins that interact with Spa2p. Spa2p interacts with Pea2p and Bud6p (Aip3p) as determined by the two-hybrid system; all of these proteins exhibit similar localization patterns, and spa2Delta, pea2Delta, and bud6Delta mutants display similar phenotypes, suggesting that these three proteins are involved in the same biological processes. Coimmunoprecipitation experiments demonstrate that Spa2p and Pea2p are tightly associated with each other in vivo. Velocity sedimentation experiments suggest that a significant portion of Spa2p, Pea2p, and Bud6p cosediment, raising the possibility that these proteins form a large, 12S multiprotein complex. Bud6p has been shown previously to interact with actin, suggesting that the 12S complex functions to regulate the actin cytoskeleton. Deletion analysis revealed that multiple regions of Spa2p are involved in its localization to growth sites. One of the regions involved in Spa2p stability and localization interacts with Pea2p; this region contains a conserved domain, SHD-II. Although a portion of Spa2p is sufficient for localization of itself and Pea2p to growth sites, only the full-length protein is capable of complementing spa2 mutant defects, suggesting that other regions are required for Spa2p function. By using the two-hybrid system, Spa2p and Bud6p were also found to interact with components of two mitogen-activated protein kinase (MAPK) pathways important for polarized cell growth. 
    Spa2p interacts with Ste11p (MAPK kinase [MEK] kinase) and Ste7p (MEK) of the mating signaling pathway as well as with the MEKs Mkk1p and Mkk2p of the Slt2p (Mpk1p) MAPK pathway; for both Mkk1p and Ste7p, the Spa2p-interacting region was mapped to the N-terminal putative regulatory domain. Bud6p interacts with Ste11p. The MEK-interacting region of Spa2p corresponds to the highly conserved SHD-I domain, which is shown to be important for mating and MAPK signaling. spa2 mutants exhibit reduced levels of pheromone signaling and an elevated level of Slt2p kinase activity. We thus propose that Spa2p, Pea2p, and Bud6p function together, perhaps as a complex, to promote polarized morphogenesis through regulation of the actin cytoskeleton and signaling pathways.

    View details for Web of Science ID 000074380100044

    View details for PubMedID 9632790

  • Pheromone-regulated genes required for yeast mating differentiation JOURNAL OF CELL BIOLOGY Erdman, S., Lin, L., Malczynski, M., Snyder, M. 1998; 140 (3): 461-483

    Abstract

    Yeast cells mate by an inducible pathway that involves agglutination, mating projection formation, cell fusion, and nuclear fusion. To obtain insight into the mating differentiation of Saccharomyces cerevisiae, we carried out a large-scale transposon tagging screen to identify genes whose expression is regulated by mating pheromone. 91,200 transformants containing random lacZ insertions were screened for beta-galactosidase (beta-gal) expression in the presence and absence of alpha factor, and 189 strains containing pheromone-regulated lacZ insertions were identified. Transposon insertion alleles corresponding to 20 genes that are novel or had not previously been known to be pheromone regulated were examined for effects on the mating process. Mutations in four novel genes, FIG1, FIG2, KAR5/FIG3, and FIG4, were found to cause mating defects. Three of the proteins encoded by these genes, Fig1p, Fig2p, and Fig4p, are dispensable for cell polarization in uniform concentrations of mating pheromone, but are required for normal cell polarization in mating mixtures, conditions that involve cell-cell communication. Fig1p and Fig2p are also important for cell fusion and conjugation bridge shape, respectively. The fourth protein, Kar5p/Fig3p, is required for nuclear fusion. Fig1p and Fig2p are likely to act at the cell surface as Fig1::beta-gal and Fig2::beta-gal fusion proteins localize to the periphery of mating cells. Fig4p is a member of a family of eukaryotic proteins that contain a domain homologous to the yeast Sac1p. Our results indicate that a variety of novel genes are expressed specifically during mating differentiation to mediate proper cell morphogenesis, cell fusion, and other steps of the mating process.

    View details for Web of Science ID 000072026300002

    View details for PubMedID 9456310

  • The Spa2-related protein, Sph1p, is important for polarized growth in yeast JOURNAL OF CELL SCIENCE Roemer, T., Vallier, L., Sheu, Y. J., Snyder, M. 1998; 111: 479-494

    Abstract

    The Saccharomyces cerevisiae protein Sph1p is both structurally and functionally related to the polarity protein, Spa2p. Sph1p and Spa2p are predicted to share three 100-amino acid domains each exceeding 30% sequence identity, and the amino-terminal domain of each protein contains a direct repeat common to Homo sapiens and Caenorhabditis elegans protein sequences. sph1- and spa2-deleted cells possess defects in mating projection morphology and pseudohyphal growth. sph1(Delta) spa2(Delta) double mutants also exhibit a strong haploid invasive growth defect and an exacerbated mating projection defect relative to either sph1(Delta) or spa2(Delta) single mutants. Consistent with a role in polarized growth, Sph1p localizes to growth sites in a cell cycle-dependent manner: Sph1p concentrates as a cortical patch at the presumptive bud site in unbudded cells, at the tip of small, medium and large buds, and at the bud neck prior to cytokinesis. In pheromone-treated cells, Sph1p localizes to the tip of the mating projection. Proper localization of Sph1p to sites of active growth during budding and mating requires Spa2p. Sph1p interacts in the two-hybrid system with three mitogen-activated protein (MAP) kinase kinases (MAPKKs): Mkk1p and Mkk2p, which function in the cell wall integrity/cell polarization MAP kinase pathway, and Ste7p, which operates in the pheromone and pseudohyphal signaling response pathways. Sph1p also interacts weakly with STE11, the MAPKKK known to activate STE7. Moreover, two-hybrid interactions between SPH1 and STE7 and STE11 occur independently of STE5, a proposed scaffolding protein which interacts with several members of this MAP kinase module. We speculate that Spa2p and Sph1p may function during pseudohyphal and haploid invasive growth to help tether this MAP kinase module to sites of polarized growth. Our results indicate that Spa2p and Sph1p comprise two related proteins important for the control of cell morphogenesis in yeast.

    View details for Web of Science ID 000072336900007

    View details for PubMedID 9443897

  • Transposon tagging I: A novel system for monitoring protein production, function and localization YEAST GENE ANALYSIS Ross-Macdonald, P., Sheehan, A., Friddle, C., Roeder, G. S., Snyder, M. 1998; 26: 161-179

  • Cell polarity and morphogenesis in budding yeast ANNUAL REVIEW OF MICROBIOLOGY Madden, K., Snyder, M. 1998; 52: 687-744

    Abstract

    Eukaryotic cells respond to intracellular and extracellular cues to direct asymmetric cell growth and division. The yeast Saccharomyces cerevisiae undergoes polarized growth at several times during budding and mating and is a useful model organism for studying asymmetric growth and division. In recent years, many regulatory and cytoskeletal components important for directing and executing growth have been identified, and molecular mechanisms have been elucidated in yeast. Key signaling pathways that regulate polarization during the cell cycle and mating response have been described. Since many of the components important for polarized cell growth are conserved in other organisms, the basic mechanisms mediating polarized cell growth are likely to be universal among eukaryotes.

    View details for Web of Science ID 000076541000021

    View details for PubMedID 9891811

  • The Rho-GEF Rom2p localizes to sites of polarized cell growth and participates in cytoskeletal functions in Saccharomyces cerevisiae MOLECULAR BIOLOGY OF THE CELL Manning, B. D., Padmanabha, R., Snyder, M. 1997; 8 (10): 1829-1844

    Abstract

    Rom2p is a GDP/GTP exchange factor for Rho1p and Rho2p GTPases; Rho proteins have been implicated in control of actin cytoskeletal rearrangements. ROM2 and RHO2 were identified in a screen for high-copy number suppressors of cik1 delta, a mutant defective in microtubule-based processes in Saccharomyces cerevisiae. A Rom2p::3XHA fusion protein localizes to sites of polarized cell growth, including incipient bud sites, tips of small buds, and tips of mating projections. Disruption of ROM2 results in temperature-sensitive growth defects at 11 degrees C and 37 degrees C. rom2 delta cells exhibit morphological defects. At permissive temperatures, rom2 delta cells often form elongated buds and fail to form normal mating projections after exposure to pheromone; at the restrictive temperature, small budded cells accumulate. High-copy number plasmids containing either ROM2 or RHO2 suppress the temperature-sensitive growth defects of cik1 delta and kar3 delta strains. KAR3 encodes a kinesin-related protein that interacts with Cik1p. Furthermore, rom2 delta strains exhibit increased sensitivity to the microtubule depolymerizing drug benomyl. These results suggest a role for Rom2p in both polarized morphogenesis and functions of the microtubule cytoskeleton.

    View details for Web of Science ID A1997YB66300001

    View details for PubMedID 9348527

  • Human dishevelled genes constitute a DHR-containing multigene family GENOMICS Semenov, M. V., Snyder, M. 1997; 42 (2): 302-310

    Abstract

    Three human genes encoding proteins homologous to Drosophila Dishevelled protein were cloned and characterized. Amino acid similarity between the different Dishevelled proteins is concentrated in three highly conserved regions. Two of these regions do not exhibit significant sequence similarity with other known proteins; the third is similar to the discs-large homology region, which was first found in a Drosophila Discs-large tumor suppressor protein (also known as GLGF or PDZ domain). We produced antibodies against human Dishevelled-2 and demonstrated that it is a phosphoprotein and can be detected in all cell lines and human embryonic tissues examined. Indirect immunofluorescence indicates that it is found throughout the cytoplasm. Our results indicate that the human dishevelled genes constitute a multigene family and that Dishevelled proteins are highly conserved among metazoans.

    View details for Web of Science ID A1997XE86000015

    View details for PubMedID 9192851

  • SBF cell cycle regulator as a target of the yeast PKC-MAP kinase pathway SCIENCE Madden, K., Sheu, Y. J., Baetz, K., Andrews, B., Snyder, M. 1997; 275 (5307): 1781-1784

    Abstract

    Protein kinase C (PKC) signaling is highly conserved among eukaryotes and has been implicated in the regulation of cellular processes such as cell proliferation and growth. In the budding yeast, PKC1 functions to activate the SLT2(MPK1) mitogen-activated protein (MAP) kinase cascade, which is required for the maintenance of cell integrity during asymmetric cell growth. Genetic studies, coimmunoprecipitation experiments, and analysis of protein phosphorylation in vivo and in vitro indicate that the SBF transcription factor (composed of Swi4p and Swi6p), an important regulator of gene expression at the G1 to S phase cell cycle transition, is a target of the Slt2p(Mpk1p) MAP kinase. These studies provide evidence for a direct role of the PKC1 pathway in the regulation of the yeast cell cycle and cell growth and indicate that conserved signaling pathways can act to control key regulators of cell division.

    View details for Web of Science ID A1997WP05600038

    View details for PubMedID 9065400

  • Targeting of chitin synthase 3 to polarized growth sites in yeast requires Chs5p and Myo2p JOURNAL OF CELL BIOLOGY Santos, B., Snyder, M. 1997; 136 (1): 95-110

    Abstract

    Chitin is an essential structural component of the yeast cell wall whose deposition is regulated throughout the yeast life cycle. The temporal and spatial regulation of chitin synthesis was investigated during vegetative growth and mating of Saccharomyces cerevisiae by localization of the putative catalytic subunit of chitin synthase III, Chs3p, and its regulator, Chs5p. Immunolocalization of epitope-tagged Chs3p revealed a novel localization pattern that is cell cycle-dependent. Chs3p is polarized as a diffuse ring at the incipient bud site and at the neck between the mother and bud in small-budded cells; it is not found at the neck in large-budded cells containing a single nucleus. In large-budded cells undergoing cytokinesis, it reappears as a ring at the neck. In cells responding to mating pheromone, Chs3p is found throughout the projection. The appearance of Chs3p at cortical sites correlates with times that chitin synthesis is expected to occur. In addition to its localization at the incipient bud site and neck, Chs3p is also found in cytoplasmic patches in cells at different stages of the cell cycle. Epitope-tagged Chs5p also localizes to cytoplasmic patches; these patches contain Kex2p, a late Golgi-associated enzyme. Unlike Chs3p, Chs5p does not accumulate at the incipient bud site or neck. Nearly all Chs3p patches contain Chs5p, whereas some Chs5p patches lack detectable Chs3p. In the absence of Chs5p, Chs3p localizes in cytoplasmic patches, but it is no longer found at the neck or the incipient bud site, indicating that Chs5p is required for the polarization of Chs3p. Furthermore, Chs5p localization is not affected either by temperature shift or by the myo2-66 mutation; however, Chs3p polarization is affected by temperature shift and myo2-66. We suggest a model in which Chs3p polarization to cortical sites in yeast is dependent on both Chs5p and the actin cytoskeleton/Myo2p.

    View details for Web of Science ID A1997WC96100009

    View details for PubMedID 9008706

  • A multipurpose transposon system for analyzing protein production, localization, and function in Saccharomyces cerevisiae PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Ross-Macdonald, P., Sheehan, A., Roeder, G. S., Snyder, M. 1997; 94 (1): 190-195

    Abstract

    Analysis of the function of a particular gene product typically involves determining the expression profile of the gene, the subcellular location of the protein, and the phenotype of a null strain lacking the protein. Conditional alleles of the gene are often created as an additional tool. We have developed a multifunctional, transposon-based system that simultaneously generates constructs for all the above analyses and is suitable for mutagenesis of any given Saccharomyces cerevisiae gene. Depending on the transposon used, the yeast gene is fused to a coding region for beta-galactosidase or green fluorescent protein. Gene expression can therefore be monitored by chemical or fluorescence assays. The transposons create insertion mutations in the target gene, allowing phenotypic analysis. The transposon can be reduced by cre-lox site-specific recombination to a smaller element that leaves an epitope tag inserted in the encoded protein. In addition to its utility for a variety of immunodetection purposes, the epitope tag element also has the potential to create conditional alleles of the target gene. We demonstrate these features of the transposons by mutagenesis of the SPA2, ARP100, SER1, and BDF1 genes.

    View details for Web of Science ID A1997WC34700036

    View details for PubMedID 8990184

  • Selection of polarized growth sites in yeast TRENDS IN CELL BIOLOGY Roemer, T., Vallier, L. G., Snyder, M. 1996; 6 (11): 434-441

    Abstract

    The budding yeast Saccharomyces cerevisiae responds to intracellular and extracellular cues to direct cell growth. Genetic analysis has revealed many components that participate in this process and has provided insight into the mechanisms by which these proteins function. Several of these components, such as the septins, pheromone receptors and GTPase proteins, have homologues in multicellular eukaryotes, suggesting that many aspects of polarized cell growth may be conserved throughout evolution. This review discusses our current understanding of the molecular mechanisms of growth-site selection during the different stages of the yeast life cycle.

    View details for Web of Science ID A1996VT35300007

    View details for PubMedID 15157515

  • Selection of axial growth sites in yeast requires Axl2p, a novel plasma membrane glycoprotein GENES & DEVELOPMENT Roemer, T., Madden, K., Chang, J. T., Snyder, M. 1996; 10 (7): 777-793

    Abstract

    Spa2p and Cdc10p both participate in bud site selection and cell morphogenesis in yeast, and spa2delta cdc10-10 cells are inviable. To identify additional components important for these processes in yeast, a colony-sectoring assay was used to isolate high-copy suppressors of the spa2delta cdc10-10 lethality. One such gene, AXL2, has been characterized in detail. axl2 cells are defective in bud site selection in haploid cells and bud in a bipolar fashion. Genetic analysis indicates that AXL2 falls into the same epistasis group as BUD3. Axl2p is predicted to be a type I transmembrane protein. Tunicamycin treatment experiments, biochemical fractionation and extraction experiments, and proteinase K protection experiments collectively indicate that Axl2p is an integral membrane glycoprotein at the plasma membrane. Indirect immunofluorescence experiments using either Axl2p tagged with three copies of a hemagglutinin epitope or high-copy AXL2 and anti-Axl2p antibodies reveal a unique localization pattern for Axl2p. The protein is present as a patch at the incipient bud site and in emerging buds, and at the bud periphery in small-budded cells. In cells containing medium-sized or large buds, Axl2p is located as a ring at the neck. Thus, Axl2p is a novel membrane protein critical for selecting proper growth sites in yeast. We suggest that Axl2p acts as an anchor in the plasma membrane that helps direct new growth components and/or polarity establishment components to the cortical axial budding site.

    View details for Web of Science ID A1996UE55800001

    View details for PubMedID 8846915

  • Target gene identification: Target specific transcriptional activation by three murine homeodomain VP16 hybrid proteins in Saccharomyces cerevisiae JOURNAL OF EXPERIMENTAL ZOOLOGY Friedman-Einat, M., Einat, P., Snyder, M., Ruddle, F. 1996; 274 (3): 145-156

    Abstract

    The mammalian homeodomain proteins encoded by Hox genes play an important role in embryonic development by providing positional cues which define developmental identities along the anteroposterior axis of developing organisms. These proteins bind DNA specifically through their homeodomain to sequences containing ATTA cores, and thereby are thought to exert their effect by regulating downstream genes. Little is known about the specificity of binding of homeodomain proteins to their sequences and the identity of their target genes. We have developed a transcriptional activation assay in yeast which employs a homeobox/VP16 fusion gene as a transcriptional activator and a target construct in which test fragments of DNA are inserted upstream of a reporter gene. Using this assay, we compared transcriptional activation by three chimeric proteins containing the homeodomains of the mouse homeobox genes, Hoxa-5, Hoxb-6, and Hoxc-8. When tested on previously defined target sequences, strong differential specificities of activation were observed. In an effort to identify enhancers that normally respond to homeodomain transcriptional activators, random fragments of mouse genomic DNA were cloned upstream of the reporter gene. Genomic DNA fragments with distinct activation profiles were obtained and were found to share matches beyond the ATTA core with previously described enhancers. These results demonstrate that the transcriptional activation system in yeast can be used as a convenient system to detect DNA motifs which bind homeodomain proteins, and subsequently, to identify authentic target genes responsive to Hox gene proteins.

    View details for Web of Science ID A1996UC28400001

    View details for PubMedID 8882492

  • Highly divergent gamma-tubulin gene is essential for cell growth and proper microtubule organization in Saccharomyces cerevisiae JOURNAL OF CELL BIOLOGY Sobel, S. G., Snyder, M. 1995; 131 (6): 1775-1788

    Abstract

    A Saccharomyces cerevisiae gamma-tubulin-related gene, TUB4, has been characterized. The predicted amino acid sequence of the Tub4 protein (Tub4p) is 29-38% identical to members of the gamma-tubulin family. Indirect immunofluorescence experiments using a strain containing an epitope-tagged Tub4p indicate that Tub4p resides at the spindle pole body throughout the yeast cell cycle. Deletion of the TUB4 gene indicates that Tub4p is essential for yeast cell growth. Tub4p-depleted cells arrest during nuclear division; most arrested cells contain a large bud, replicated DNA, and a single nucleus. Immunofluorescence and nuclear staining experiments indicate that cells depleted of Tub4p contain defects in the organization of both cytoplasmic and nuclear microtubule arrays; such cells exhibit nuclear migration failure, defects in spindle formation, and/or aberrantly long cytoplasmic microtubule arrays. These data indicate that the S. cerevisiae gamma-tubulin protein is an important SPB component that organizes both cytoplasmic and nuclear microtubule arrays.

    View details for Web of Science ID A1995TN76000011

    View details for PubMedID 8557744

  • Two short autoepitopes on the nuclear dot antigen are similar to epitopes encoded by the Epstein-Barr virus PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Xie, K. W., Snyder, M. 1995; 92 (5): 1639-1643

    Abstract

    To understand the relationship between antibodies present in patients with anti-nuclear dot (ND) autoimmune disease and the proteins they recognize, epitopes that react with the autoantibodies were mapped. A panel of fusion proteins containing different portions of the ND protein were overproduced in Escherichia coli. Immunoblot analysis with anti-ND antibodies revealed that most (10 of 12) sera recognize two major autoepitopes that are each a maximum of 8 amino acids long. The other two sera recognize one of the two epitopes. In addition to the short linear autoepitopes, a conformational epitope appears to be present on the ND antigen. Each of the two linear epitope sequences shares sequence similarities with those of several viral proteins found in the databases. Furthermore, two fusion proteins containing short Epstein-Barr virus (EBV) protein sequences that are similar to the ND epitopes were recognized by the human autoimmune sera, indicating that the autoepitopes are present in EBV protein sequences. Our results are consistent with the hypothesis that ND autoimmune disease might be associated with EBV infections.

    View details for Web of Science ID A1995QK07700081

    View details for PubMedID 7878031

  • Methods for large-scale analysis of gene expression, protein localization, and disruption phenotypes in Saccharomyces cerevisiae METHODS IN MOLECULAR AND CELLULAR BIOLOGY Ross-Macdonald, P., Burns, N., Malczynski, M., Sheehan, A., Roeder, S., Snyder, M. 1995; 5 (5): 298-308

  • The spindle pole body of yeast CHROMOSOMA Snyder, M. 1994; 103 (6): 369-380

    Abstract

    Microtubule organizing centers play an essential cellular role in nucleating microtubule assembly and establishing the microtubule array. The microtubule organizing center of yeast, the spindle pole body (SPB), shares many functions and properties with those of other organisms. In recent years considerable new information has been generated concerning components associated with the SPB, and the mechanism by which it duplicates. This article reviews our current view of the cytology and molecular composition of the SPB of the budding yeast, Saccharomyces cerevisiae, and the fission yeast, Schizosaccharomyces pombe. Genetic studies in these organisms have revealed information about how the SPB duplicates and separates, and its roles during vegetative growth, mating and meiosis.

    View details for Web of Science ID A1994PT81500001

    View details for PubMedID 7859557

  • SLK1, a yeast homolog of MAP kinase activators, has a RAS/cAMP-independent role in nutrient sensing MOLECULAR & GENERAL GENETICS Costigan, C., Snyder, M. 1994; 243 (3): 286-296

    Abstract

    The Saccharomyces cerevisiae SLK1 protein is implicated in nutrient sensing and growth control. Under nutrient-limiting conditions, slk1 mutants fail to undergo cell cycle arrest. The role of the SLK1 protein in nutrient sensing was examined with respect to the cAMP-dependent protein kinase (PKA) pathway, which has a well characterized role in growth control in yeast, and by the analysis of dominant SLK1 alleles that affect the nutrient response of wild-type cells. Interactions with the PKA pathway were examined by phenotypic analysis of double mutants of slk1 and various PKA pathway mutants. Combining the slk1-delta mutation with a mutation that is thought to constitutively activate the PKA pathway, pde2, resulted in enhanced growth control defects. The combination of slk1-delta with mutations that inhibit the PKA pathway, cdc25 and ras1, ras2, failed to alleviate the slk1 cell cycle arrest defect and lowered the permissive temperature for growth. Furthermore, bcy1 tpk1 tpk2 tpk3w (bcy1 tpkw) mutants, which have constitutive, low-level, cAMP-independent kinase activity, exhibit nutrient sensing, which is eliminated in the slk1 bcy1 tpkw mutants. These results implicated SLK1 in PKA-independent growth control in yeast. The amino-terminal, noncatalytic region of the SLK1 protein may be important in the regulation of SLK1 function in growth control. Overexpression of this region caused starvation sensitivity in wild-type cells by interfering with SLK1 protein function.

    View details for Web of Science ID A1994NM20300005

    View details for PubMedID 8190082

  • Large-scale analysis of gene expression, protein localization, and gene disruption in Saccharomyces cerevisiae GENES & DEVELOPMENT Burns, N., Grimwade, B., Ross-Macdonald, P. B., Choi, E. Y., Finberg, K., Roeder, G. S., Snyder, M. 1994; 8 (9): 1087-1105

    Abstract

    We have developed a large-scale screen to identify genes expressed at different times during the life cycle of Saccharomyces cerevisiae and to determine the subcellular locations of many of the encoded gene products. Diploid yeast strains containing random lacZ insertions throughout the genome have been constructed by transformation with a mutagenized genomic library. Twenty-eight hundred transformants containing fusion genes expressed during vegetative growth and 55 transformants containing meiotically induced fusion genes have been identified. Based on the frequency of transformed strains producing beta-galactosidase, we estimate that 80-86% of the yeast genome (excluding the rDNA) contains open reading frames expressed in vegetative cells and that there are 93-135 meiotically induced genes. Indirect immunofluorescence analysis of 2373 strains carrying fusion genes expressed in vegetative cells has identified 245 fusion proteins that localize to discrete locations in the cell, including the nucleus, mitochondria, endoplasmic reticulum, cytoplasmic dots, spindle pole body, and microtubules. The DNA sequence adjacent to the lacZ gene has been determined for 91 vegetative fusion genes whose products have been localized and for 43 meiotically induced fusions. Although most fusions represent genes unidentified previously, many correspond to known genes, including some whose expression has not been studied previously and whose products have not been localized. For example, Sec21-beta-gal fusion proteins yield a Golgi-like staining pattern, Ty1-beta-gal fusion proteins localize to cytoplasmic dots, and the meiosis-specific Mek1/Mre4-beta-gal and Spo11-beta-gal fusion proteins reside in the nucleus. The phenotypes in haploid cells have been analyzed for 59 strains containing chromosomal fusion genes expressed during vegetative growth; 9 strains fail to form colonies indicating that the disrupted genes are essential. 
    Fifteen additional strains display slow growth or are impaired for growth on specific media or in the presence of inhibitors. Of 39 meiotically induced fusion genes examined, 14 disruptions confer defects in spore formation or spore viability in homozygous diploids. Our results will allow researchers who identify a yeast gene to determine immediately whether that gene is expressed at a specific time during the life cycle and whether its gene product localizes to a specific subcellular location.

    View details for Web of Science ID A1994NJ94700008

    View details for PubMedID 7926789

  • NHP6A and NHP6B, which encode HMG1-like proteins, are candidates for downstream components of the yeast SLT2 mitogen-activated protein kinase pathway MOLECULAR AND CELLULAR BIOLOGY Costigan, C., Kolodrubetz, D., Snyder, M. 1994; 14 (4): 2391-2403

    Abstract

    The yeast SLK1 (BCK1) gene encodes a mitogen-activated protein kinase (MAPK) activator protein which functions upstream in a protein kinase cascade that converges on the MAPK Slt2p (Mpk1p). Dominant alleles of SLK1 have been shown to bypass the conditional lethality of a protein kinase C mutation, pkc1-delta, suggesting that Pkc1p may regulate Slk1p function. Slk1p has an important role in morphogenesis and growth control, and deletions of the SLK1 gene are lethal in a spa2-delta mutant background. To search for genes that interact with the SLK1-SLT2 pathway, a synthetic lethal suppression screen was carried out. Genes which in multiple copies suppress the synthetic lethality of slk1-1 spa2-delta were identified, and one, the NHP6A gene, has been extensively characterized. The NHP6A gene and the closely related NHP6B gene were shown previously to encode HMG1-like chromatin-associated proteins. We demonstrate here that these genes are functionally redundant and that multiple copies of either NHP6A or NHP6B suppress slk1-delta and slt2-delta. Strains from which both NHP6 genes were deleted (nhp6-delta mutants) share many phenotypes with pkc1-delta, slk1-delta, and slt2-delta mutants. nhp6-delta cells display a temperature-sensitive growth defect that is rescued by the addition of 1 M sorbitol to the medium, and they are sensitive to starvation. nhp6-delta strains also exhibit a variety of morphological and cytoskeletal defects. At the restrictive temperature for growth, nhp6-delta mutant cells contain elongated buds and enlarged necks. Many cells have patches of chitin staining on their cell surfaces, and chitin deposition is enhanced at the necks of budded cells. nhp6-delta cells display a defect in actin polarity and often accumulate large actin chunks. Genetic and phenotypic analysis indicates that NHP6A and NHP6B function downstream of SLT2. 
    Our results indicate that the Slt2p MAPK pathway in Saccharomyces cerevisiae may mediate its function in cell growth and morphogenesis, at least in part, through high-mobility group proteins.

    View details for Web of Science ID A1994NC05700018

    View details for PubMedID 8139543

  • MUTATIONS IN PRG1, A YEAST PROTEASOME-RELATED GENE, CAUSE DEFECTS IN NUCLEAR DIVISION AND ARE SUPPRESSED BY DELETION OF A MITOTIC CYCLIN GENE PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Friedman, H., Snyder, M. 1994; 91 (6): 2031-2035

    Abstract

    Proteasomes are ubiquitous complexes exhibiting proteolytic activity in vitro. The function(s) of these enzymes in vivo is not known. To investigate the in vivo role of proteasomes, four temperature-sensitive alleles of the Saccharomyces cerevisiae proteasome-related gene, PRG1, were constructed and analyzed. At both the permissive and restrictive temperatures, many prg1 cells have a large bud, contain replicated DNA, and have their nucleus positioned at the neck with a short spindle. These different phenotypes indicate a defect in nuclear division. Consistent with a nuclear division defect, prg1 mutant strains lose a dispensable chromosome at a higher frequency than wild-type cells. Importantly, deletion of CLB2, a gene encoding a mitotic cyclin, suppresses the temperature-sensitive growth phenotype of prg1 mutant strains. Our results indicate that proteasomes are important for nuclear division and suggest that they participate in degradation of the Clb2 protein (Clb2p).

    View details for Web of Science ID A1994NC04300014

    View details for PubMedID 8134345

  • LOCALIZATION OF THE KAR3 KINESIN HEAVY CHAIN-RELATED PROTEIN REQUIRES THE CIK1 INTERACTING PROTEIN JOURNAL OF CELL BIOLOGY Page, B. D., Satterwhite, L. L., Rose, M. D., Snyder, M. 1994; 124 (4): 507-519

    Abstract

    The Kar3 protein (Kar3p), a protein related to kinesin heavy chain, and the Cik1 protein (Cik1p) appear to participate in the same cellular processes in S. cerevisiae. Phenotypic analysis of mutants indicates that both CIK1 and KAR3 participate in spindle formation and karyogamy. In addition, the expression of both genes is induced by pheromone treatment. In vegetatively growing cells, both Cik1::beta-gal and Kar3::beta-gal fusions localize to the spindle pole body (SPB), and after pheromone treatment both fusion proteins localize to the spindle pole body and cytoplasmic microtubules. The dependence of Cik1p and Kar3p localization upon one another was investigated by indirect immunofluorescence of fusion proteins in pheromone-treated cells. The Cik1p::beta-gal fusion does not localize to the SPB or microtubules in a kar3 delta strain, and the Kar3p::beta-gal fusion protein does not localize to microtubule-associated structures in a cik1 delta strain. Thus, these proteins appear to be interdependent for localization to the SPB and microtubules. Analysis by both the two-hybrid system and co-immunoprecipitation experiments indicates that Cik1p and Kar3p interact, suggesting that they are part of the same protein complex. These data indicate that interaction between a putative kinesin heavy chain-related protein and another protein can determine the localization of motor activity and thereby affect the functional specificity of the motor complex.

    View details for Web of Science ID A1994MW69100010

    View details for PubMedID 8106549

  • NUCLEAR DOT ANTIGENS MAY SPECIFY TRANSCRIPTIONAL DOMAINS IN THE NUCLEUS MOLECULAR AND CELLULAR BIOLOGY Xie, K. W., Lambie, E. J., Snyder, M. 1993; 13 (10): 6170-6179

    Abstract

    A bank of 892 human autoimmune serum samples was screened by indirect immunofluorescence on human tissue culture HT-29 cells. Seven serum samples that stain 4 to 10 bright dots in cell lines of several different mammals, including humans, monkeys, rats, and pigs, were identified. Immunofluorescence experiments indicate that these antigens, called nuclear dot (ND) antigens, are distinct from splicing complexes, kinetochores, and other known nuclear structures. An ND antigen recognized by these sera was cloned by immunoscreening a human lambda gt11 expression library. Analysis of seven cDNA clones for the ND antigen indicates that several mRNAs exist, perhaps derived through alternative splicing mechanisms. One major form of the message has an open reading frame of 1,440 bp capable of encoding a 53,000-M(r) protein. Treatment of cells with detergent, salt, or RNase A fails to remove the ND antigen from the nucleus. However, incubation with DNase I obliterates ND staining, indicating that the ND protein directly or indirectly associates with nuclear DNA. Fusion of the ND protein to a LexA DNA binding domain activates transcription in Saccharomyces cerevisiae. A 75-amino-acid domain that activates transcription in both yeast and primate cells has been identified. We suggest that ND antigens may participate in the activation of transcription of specific regions of the genome.

    View details for Web of Science ID A1993LY42400023

    View details for PubMedID 8413218

  • COMPONENTS REQUIRED FOR CYTOKINESIS ARE IMPORTANT FOR BUD SITE SELECTION IN YEAST JOURNAL OF CELL BIOLOGY Flescher, E. G., Madden, K., Snyder, M. 1993; 122 (2): 373-386

    Abstract

    Polarized cell division is a fundamental process that occurs in a variety of organisms; it is responsible for the proper positioning of daughter cells and the correct segregation of cytoplasmic components. The SPA2 gene of yeast encodes a nonessential protein that localizes to sites of cell growth and to the site of cytokinesis. spa2 mutants exhibit slightly altered budding patterns. In this report, a genetic screen was used to isolate a novel ochre allele of CDC10, cdc10-10; strains containing this mutation require the SPA2 gene for growth. CDC10 encodes a conserved potential GTP-binding protein that previously has been shown to localize to the bud neck and to be important for cytokinesis. The genetic interaction of cdc10-10 and spa2 suggests a role for SPA2 in cytokinesis. Most importantly, strains that contain a cdc10-10 mutation and those containing mutations affecting other putative neck filament proteins do not form buds at their normal proximal location. The finding that a component involved in cytokinesis is also important in bud site selection provides strong evidence for the cytokinesis tag model; i.e., critical components at the site of cytokinesis are involved in determining the next site of polarized growth and division.

    View details for Web of Science ID A1993LM58400008

    View details for PubMedID 8320260

  • CARBON SOURCE INDUCES GROWTH OF STATIONARY PHASE YEAST-CELLS, INDEPENDENT OF CARBON SOURCE METABOLISM YEAST Granot, D., Snyder, M. 1993; 9 (5): 465-479

    Abstract

    Nutrients regulate the proliferation of many eukaryotic cells: in the absence of sufficient nutrients vegetatively growing cells will enter stationary (G0 like) phase; in the presence of sufficient nutrients non-proliferative cells will begin growth. Previously we have shown that glucose is the critical nutrient which stimulates a variety of growth-related events in the yeast Saccharomyces cerevisiae (Granot and Snyder, 1991). This paper describes six new aspects of the induction of cell growth events by nutrients in S. cerevisiae. First, all carbon sources tested, both fermentable and non-fermentable, induce growth-related events in stationary phase cells, suggesting that the carbon source is the critical nutrient which stimulates growth. Second, the continuous presence of glucose is not necessary for the induction of growth events, but rather a short 'pulse' of glucose followed by an incubation period in water will induce growth events. Third, growth stimulation by glucose occurs in the absence of the SNF3 high affinity glucose transporter. Fourth, growth stimulation occurs independent of carbon source phosphorylation and carbon source metabolism. Fifth, growth induction by carbon source does not require protein synthesis or extracellular calcium. Sixth, following stimulation by carbon source, the cells remain induced for more than 2 h after removal of the carbon source. We suggest a general model in which different carbon sources act as signals to induce the earliest growth events during or following their entry into the cell and that these growth events do not depend upon metabolism of the carbon source.

    View details for Web of Science ID A1993LD24400002

    View details for PubMedID 8322510

  • NUCLEAR-PORE COMPLEX ANTIGENS DELINEATE NUCLEAR-ENVELOPE DYNAMICS IN VEGETATIVE AND CONJUGATING SACCHAROMYCES-CEREVISIAE YEAST Copeland, C. S., Snyder, M. 1993; 9 (3): 235-249

    Abstract

    In the yeast Saccharomyces cerevisiae, the nucleus undergoes dramatic shape changes during mitosis and mating. We have studied nuclear envelope dynamics during the processes of mitosis and conjugation using nuclear pore complexes as a marker for the nuclear envelope in wild-type cells and several cell-division-cycle (cdc) mutants. Three monoclonal antibodies are described that recognize nuclear pore complex-related antigens in S. cerevisiae. One of these antibodies, RL1, has been extensively characterized by Gerace and colleagues and recognizes nuclear pore complexes in mammalian and amphibian cells. By indirect immunofluorescence of yeast cells, all three antibodies yield a discontinuous nuclear rim stain. All three react with multiple nuclear-enriched proteins in immunoblots, including the nucleoporin protein encoded by the NSP1 gene. When the antibodies were used in immunofluorescence experiments on mating cells, the nuclear pore complex staining pattern proved to be a sensitive indicator of nuclear fusion. Nuclei with closely apposed spindle pole bodies and unfused nuclear envelopes could be readily distinguished. Marked shape changes were observed in nuclei during fusion and segregation of the diploid nucleus into the zygotic bud. In cdc14 and cdc15 mutants that arrest late in mitosis, the elongated nuclear envelope extension that stretches between daughter nuclei during telophase was preserved. In cytokinesis-defective mutants (cdc3, cdc10, cdc11 and cdc12), the elongated nuclear envelope was usually resolved into two daughter nuclei in the absence of cytokinesis. These results indicate that nuclear envelope division is mechanically distinguishable from chromosome segregation, nucleolar segregation and cytokinesis.

    View details for Web of Science ID A1993KV94000003

    View details for PubMedID 8488725

  • CHROMOSOME SEGREGATION IN YEAST ANNUAL REVIEW OF MICROBIOLOGY Page, B. D., Snyder, M. 1993; 47: 231-261

    Abstract

    Because of their genetic tractability, much has been learned concerning the mechanisms of chromosome segregation in budding yeast, Saccharomyces cerevisiae, and fission yeast, Schizosaccharomyces pombe. This chapter reviews the cytology and molecular and cell biology of mitosis in both of these yeasts. Current knowledge about the components of the mitotic spindle apparatus, including spindle pole bodies, centromeres, and microtubule components and motors, is summarized. Mechanisms of mitosis such as establishment and positioning of the mitotic spindle apparatus, anaphase A, and anaphase B are reviewed.

    View details for Web of Science ID A1993MA27600009

    View details for PubMedID 8257099

  • A HOMOLOG OF THE PROTEASOME-RELATED RING10 GENE IS ESSENTIAL FOR YEAST-CELL GROWTH GENE Friedman, H., Goebel, M., Snyder, M. 1992; 122 (1): 203-206

    Abstract

    Proteasomes are intracellular protein complexes displaying multiproteolytic activities. These complexes have been implicated in the antigen degradation process that generates peptides associated with the major histocompatibility complex (MHC) class-I molecule. RING10 and RING12 are genes encoded by the class-II region of the human MHC that have sequence homology to proteasome-encoding genes. We have identified a yeast gene, called PRG1, that encodes a protein predicted to contain 55.6% sequence identity to 80% of the RING10 gene product. Genomic disruption of PRG1 revealed that it is essential for yeast cell growth. These data strongly indicate that the antigen-processing system present in vertebrates evolved from a basic cellular process present in all organisms.

    View details for Web of Science ID A1992KB09800027

    View details for PubMedID 1452031

  • THE NUCLEAR-MITOTIC APPARATUS PROTEIN IS IMPORTANT IN THE ESTABLISHMENT AND MAINTENANCE OF THE BIPOLAR MITOTIC SPINDLE APPARATUS MOLECULAR BIOLOGY OF THE CELL Yang, C. H., Snyder, M. 1992; 3 (11): 1259-1267

    Abstract

    The formation and maintenance of the bipolar mitotic spindle apparatus require a complex and balanced interplay of several mechanisms, including the stabilization and separation of polar microtubules and the action of various microtubule motors. Nonmicrotubule elements are also present throughout the spindle apparatus and have been proposed to provide a structural support for the spindle. The Nuclear-Mitotic Apparatus protein (NuMA) is an abundant 240 kD protein that is present in the nucleus of interphase cells and concentrates in the polar regions of the spindle apparatus during mitosis. Sequence analysis indicates that NuMA possesses an unusually long alpha-helical central region characteristic of many filament forming proteins. In this report we demonstrate that microinjection of anti-NuMA antibodies into interphase and prophase cells results in a failure to form a mitotic spindle apparatus. Furthermore, injection of metaphase cells results in the collapse of the spindle apparatus into a monopolar microtubule array. These results identify for the first time a nontubulin component important for both the establishment and stabilization of the mitotic spindle apparatus in multicellular organisms. We suggest that nonmicrotubule structural components may be important for these processes.

    View details for Web of Science ID A1992JZ62000007

    View details for PubMedID 1457830

  • SPECIFICATION OF SITES FOR POLARIZED GROWTH IN SACCHAROMYCES-CEREVISIAE AND THE INFLUENCE OF EXTERNAL FACTORS ON SITE SELECTION MOLECULAR BIOLOGY OF THE CELL Madden, K., Snyder, M. 1992; 3 (9): 1025-1035

    Abstract

    Many eucaryotic cell types exhibit polarized cell growth and polarized cell division at nonrandom sites. The sites of polarized growth were investigated in G1 arrested haploid Saccharomyces cerevisiae cells. When yeast cells are arrested during G1 either by treatment with alpha-factor or by shifting temperature-sensitive cdc28-1 cells to the restrictive temperature, the cells form a projection. Staining with Calcofluor reveals that in both cases the projection usually forms at axial sites (i.e., next to the previous bud scar); these are the same sites where bud formation is expected to occur. These results indicate that sites of polarized growth are specified before the end of G1. Sites of polarized growth can be influenced by external conditions. Cells grown to stationary phase and diluted into fresh medium preferentially select sites for polarized growth opposite the previous bud scar (i.e., distal sites). Incubation of cells in a mating mixture results in projection formation at nonaxial sites: presumably cells form projections toward their mating partner. These observations have important implications in understanding three aspects of cell polarity in yeast: 1) how yeast cell shape is influenced by growth conditions, 2) how sites of polarized growth are chosen, and 3) the pathway by which polarity is affected and redirected during the mating process.

    View details for Web of Science ID A1992JR05700009

    View details for PubMedID 1421575

  • CIK1 - A DEVELOPMENTALLY REGULATED SPINDLE POLE BODY-ASSOCIATED PROTEIN IMPORTANT FOR MICROTUBULE FUNCTIONS IN SACCHAROMYCES-CEREVISIAE GENES & DEVELOPMENT Page, B. D., Snyder, M. 1992; 6 (8): 1414-1429

    Abstract

    A genetic screen was devised to identify genes important for spindle pole body (SPB) and/or microtubule functions. Four mutants defective in both nuclear fusion (karyogamy) and chromosome maintenance were isolated; these mutants termed cik (for chromosome instability and karyogamy) define three complementation groups. The CIK1 gene was cloned and characterized. Sequence analysis of the CIK1 gene predicts that the CIK1 protein is 594 amino acids in length and possesses a central 300-amino-acid coiled-coil domain. Two different CIK1-beta-galactosidase fusions localize to the SPB region in vegetative cells, and antibodies against the authentic protein detect CIK1 in the SPB region of alpha-factor-treated cells. Evaluation of cells deleted for CIK1 (cik1-delta) indicates that CIK1 is important for the formation or maintenance of a spindle apparatus. Longer and slightly more microtubule bundles are visible in cik1-delta strains than in wild type. Thus, CIK1 encodes a SPB-associated component that is important for proper organization of microtubule arrays and the establishment of a spindle during vegetative growth. Furthermore, the CIK1 gene is essential for karyogamy, and the level of the CIK1 protein at the SPB appears to be dramatically induced by alpha-factor treatment. These results indicate that molecular changes occur at the microtubule-organizing center (MTOC) as the yeast cell prepares for karyogamy and imply that specialization of the MTOC or its associated microtubules occurs in preparation for particular microtubule functions in the yeast life cycle.

    View details for Web of Science ID A1992JH59900005

    View details for PubMedID 1644287

  • NUMA - AN UNUSUALLY LONG COILED-COIL RELATED PROTEIN IN THE MAMMALIAN NUCLEUS JOURNAL OF CELL BIOLOGY Yang, C. H., Lambie, E. J., Snyder, M. 1992; 116 (6): 1303-1317

    Abstract

    A bank of 892 autoimmune sera was screened by indirect immunofluorescence on mammalian cells. Six sera were identified that recognize an antigen(s) with a cell cycle-dependent localization pattern. In interphase cells, the antibodies stained the nucleus and in mitotic cells the spindle apparatus was recognized. Immunological criteria indicate that the antigen recognized by at least one of these sera corresponds to a previously identified protein called the nuclear mitotic apparatus protein (NuMA). A cDNA which partially encodes NuMA was cloned from a lambda gt11 human placental cDNA expression library, and overlapping cDNA clones that encode the entire gene were isolated. DNA sequence analysis of the clones has identified a long open reading frame capable of encoding a protein of 238 kD. Analysis of the predicted protein sequence suggests that NuMA contains an unusually large central alpha-helical domain of 1,485 amino acids flanked by nonhelical terminal domains. The central domain is similar to coiled-coil regions in structural proteins such as myosin heavy chains, cytokeratins, and nuclear lamins which are capable of forming filaments. Double immunofluorescence experiments performed with anti-NuMA and antilamin antibodies indicate that NuMA dissociates from condensing chromosomes during early prophase, before the complete disintegration of the nuclear lamina. As mitosis progresses, NuMA reassociates with telophase chromosomes very early during nuclear reformation, before substantial accumulation of lamins on chromosomal surfaces is evident. These results indicate that the NuMA proteins may be a structural component of the nucleus and may be involved in the early steps of nuclear reformation during telophase.

    View details for Web of Science ID A1992HH74900001

    View details for PubMedID 1541630

  • A SYNTHETIC LETHAL SCREEN IDENTIFIES SLK1, A NOVEL PROTEIN-KINASE HOMOLOG IMPLICATED IN YEAST-CELL MORPHOGENESIS AND CELL-GROWTH MOLECULAR AND CELLULAR BIOLOGY Costigan, C., Gehrung, S., Snyder, M. 1992; 12 (3): 1162-1178

    Abstract

    The Saccharomyces cerevisiae SPA2 protein localizes at sites involved in polarized cell growth in budding cells and mating cells. spa2 mutants have defects in projection formation during mating but are healthy during vegetative growth. A synthetic lethal screen was devised to identify mutants that require the SPA2 gene for vegetative growth. One mutant, called slk-1 (for synthetic lethal kinase), has been characterized extensively. The SLK1 gene has been cloned, and sequence analysis predicts that the SLK1 protein is 1,478 amino acid residues in length. Approximately 300 amino acids at the carboxy terminus exhibit sequence similarity with the catalytic domains of protein kinases. Disruption mutations have been constructed in the SLK1 gene. slk1 null mutants cannot grow at 37 degrees C, but many cells can grow at 30, 24, and 17 degrees C. Dead slk1 mutant cells usually have aberrant cell morphologies, and many cells are very small, approximately one-half the diameter of wild-type cells. Surviving slk1 cells also exhibit morphogenic defects; these cells are impaired in their ability to form projections upon exposure to mating pheromones. During vegetative growth, a higher fraction of slk1 cells are unbudded compared with wild-type cells, and under nutrient limiting conditions, slk1 cells exhibit defects in cell cycle arrest. The different slk1 mutant defects are partially rescued by an extra copy of the SSD1/SRK1 gene. SSD1/SRK1 has been independently isolated as a suppressor of mutations in genes involved in growth control, sit4, pde2, bcy1, and ins1 (A. Sutton, D. Immanuel, and K.T. Arndt, Mol. Cell. Biol. 11:2133-2148, 1991; R.B. Wilson, A.A. Brenner, T.B. White, M.J. Engler, J.P. Gaughran, and K. Tatchell, Mol. Cell. Biol. 11:3369-3373, 1991). These data suggest that SLK1 plays a role in both cell morphogenesis and the control of cell growth. We speculate that SLK1 may be a regulatory link for these two cellular processes.

    View details for Web of Science ID A1992HE83800026

    View details for PubMedID 1545797

  • THE NUF1 GENE ENCODES AN ESSENTIAL COILED-COIL RELATED PROTEIN THAT IS A POTENTIAL COMPONENT OF THE YEAST NUCLEOSKELETON JOURNAL OF CELL BIOLOGY Mirzayan, C., Copeland, C. S., Snyder, M. 1992; 116 (6): 1319-1332

    Abstract

    In an attempt to identify structural components of the yeast nucleus, subcellular fractions of yeast nuclei were prepared and used as immunogens to generate complex polyclonal antibodies. One such serum was used to screen a yeast genomic lambda gt11 expression library. A clone encoding a gene called NUF1 (for nuclear filament-related) was identified and extensively characterized. Antibodies to NUF1 fusion proteins were generated, and affinity-purified antibodies were used for immunoblot analysis and indirect immunofluorescence localization. The NUF1 protein is 110 kD in molecular mass and localizes to the yeast nucleus in small granular patches. Intranuclear staining is present in cells at all stages of the cell cycle. The NUF1 protein of yeast is tightly associated with the nucleus; it was not removed by extraction of nuclei with nonionic detergent or salt, or treatment with RNase and DNase. Sequence analysis of the NUF1 gene predicts a protein 945 amino acids in length that contains three domains: a large 627 residue central domain predicted to form a coiled-coil structure flanked by nonhelical amino-terminal and carboxy-terminal regions. Disruption of the NUF1 gene indicates that it is necessary for yeast cell growth. These results indicate that NUF1 encodes an essential coiled-coil protein within the yeast nucleus; we speculate that NUF1 is a component of the yeast nucleoskeleton. In addition, immunofluorescence results indicate that mammalian cells contain a NUF1-related nuclear protein. These data in conjunction with those in the accompanying manuscript (Yang et al., 1992) lead to the hypothesis that an internal coiled-coil filamentous system may be a general structural component of the eukaryotic nucleus.

    View details for Web of Science ID A1992HH74900002

    View details for PubMedID 1541631

  • Cell polarity and morphogenesis in Saccharomyces cerevisiae. Trends in cell biology Madden, K., Costigan, C., Snyder, M. 1992; 2 (1): 22-29

    Abstract

    Polarized cell growth and division are fundamental to cellular differentiation and tissue formation in eukaryotes. Analysis of cell polarity in the budding yeast Saccharomyces cerevisiae has allowed the identification of many regulatory, secretory and cytoskeletal components involved in these processes, as well as the elucidation of various steps in these events. Many of these components and processes may be similar in other eukaryotes.

    View details for PubMedID 14731634

  • THE KNS1 GENE OF SACCHAROMYCES-CEREVISIAE ENCODES A NONESSENTIAL PROTEIN-KINASE HOMOLOG THAT IS DISTANTLY RELATED TO MEMBERS OF THE CDC28/CDC2 GENE FAMILY MOLECULAR & GENERAL GENETICS Padmanabha, R., Gehrung, S., Snyder, M. 1991; 229 (1): 1-9

    Abstract

    A novel protein kinase homologue (KNS1) has been identified in Saccharomyces cerevisiae. KNS1 contains an open reading frame of 720 codons. The carboxy-terminal portion of the predicted protein sequence is similar to that of many other protein kinases, exhibiting 36% identity to the cdc2 gene product of Schizosaccharomyces pombe and 34% identity to the CDC28 gene product of S. cerevisiae. Deletion mutations were constructed in the KNS1 gene. kns1 mutants grow at the same rate as wild-type cells using several different carbon sources. They mate at normal efficiencies, and they sporulate successfully. No defects were found in entry into or exit from stationary phase. Thus, the KNS1 gene is not essential for cell growth and a variety of other cellular processes in yeast.

    View details for Web of Science ID A1991GF17600001

    View details for PubMedID 1910150

  • STUDIES CONCERNING THE TEMPORAL AND GENETIC-CONTROL OF CELL POLARITY IN SACCHAROMYCES-CEREVISIAE JOURNAL OF CELL BIOLOGY Snyder, M., Gehrung, S., Page, B. D. 1991; 114 (3): 515-532

    Abstract

    The establishment of cell polarity was examined in the budding yeast, S. cerevisiae. The distribution of a polarized protein, the SPA2 protein, was followed throughout the yeast cell cycle using synchronized cells and cdc mutants. The SPA2 protein localizes to a patch at the presumptive bud site of G1 cells. Later it concentrates at the bud tip in budded cells. At cytokinesis, the SPA2 protein is at the neck between the mother and daughter cells. Analysis of unbudded haploid cells has suggested a series of events that occurs during G1. The SPA2 patch is established very early in G1, while the spindle pole body resides on the distal side of the nucleus. Later, microtubules emanating from the spindle pole body intersect the SPA2 crescent, and the nucleus probably rotates towards the SPA2 patch. By middle G1, most cells contain the SPB on the side of the nucleus proximal to the SPA2 patch, and a long extranuclear microtubule bundle intersects this patch. We suggest that a microtubule capture site exists in the SPA2 staining region that stabilizes the long microtubule bundle; this capture site may be responsible for rotation of the nucleus. Cells containing a polarized distribution of the SPA2 protein also possess a polarized distribution of actin spots in the same region, although the actin staining is much more diffuse. Moreover, cdc4 mutants, which form multiple buds at the restrictive temperature, exhibit simultaneous staining of the SPA2 protein and actin spots in a subset of the bud tips. spa2 mutants contain a polarized distribution of actin spots, and act1-1 and act1-2 mutants often contain a polarized distribution of the SPA2 protein suggesting that the SPA2 protein is not required for localization of the actin spots and the actin spots are not required for localization of the SPA2 protein.
    cdc24 mutants, which fail to form buds at the restrictive temperature, fail to exhibit polarized localization of the SPA2 protein and actin spots, indicating that the CDC24 protein is directly or indirectly responsible for controlling the polarity of these proteins. Based on the cell cycle distribution of the SPA2 protein, a "cytokinesis tag" model is proposed to explain the mechanism of the non-random positioning of bud sites in haploid yeast cells.

    View details for Web of Science ID A1991FY14900012

    View details for PubMedID 1860883

  • GLUCOSE INDUCES CAMP-INDEPENDENT GROWTH-RELATED CHANGES IN STATIONARY-PHASE CELLS OF SACCHAROMYCES-CEREVISIAE PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Granot, D., Snyder, M. 1991; 88 (13): 5724-5728

    Abstract

    Nutrients play a critical role in the decision to initiate a new cell cycle. Addition of nutrients to arrested cells such as stationary-phase cells and spores induces them to begin growth. We have analyzed the nutrients required to induce early cellular events in yeast. When stationary-phase cells or spores are incubated in the presence of only glucose, morphological and physiological changes characteristic of mitotically growing cells are induced and, in the absence of additional nutrients to support growth, the cells rapidly lose viability. Preincubation of stationary-phase cells in the presence of glucose decreases the time required to reach bud emergence upon the subsequent addition of rich medium. These processes are specifically induced by D-glucose and not by other components such as nitrogen source or L-glucose. The glucose-induced events are independent of the adenylate cyclase pathway, since strains with a temperature-sensitive mutation in either the adenylate cyclase gene (CDC35) or its regulator (CDC25) undergo glucose-induced cellular changes when incubated at the restrictive temperature. We suggest that glucose triggers events in the induction of a new mitotic cell cycle and that these events are either prior to the adenylate cyclase pathway or are in an alternative pathway.

    View details for Web of Science ID A1991FU90100051

    View details for PubMedID 1648229

  • SEGREGATION OF THE NUCLEOLUS DURING MITOSIS IN BUDDING AND FISSION YEAST CELL MOTILITY AND THE CYTOSKELETON Granot, D., Snyder, M. 1991; 20 (1): 47-54

    Abstract

    The segregation of the nucleolus during mitosis was examined in Saccharomyces cerevisiae and Schizosaccharomyces pombe by indirect immunofluorescence using antibodies directed to highly conserved nucleolar antigens. In mitotic S. pombe cells, the nucleolus appears to trail the bulk of the DNA. In wild-type cells of S. cerevisiae, the nucleolus segregates alongside the bulk of the genomic DNA. Based on its distance from the centromere, we would expect the rDNA in both organisms to segregate behind the majority of the genomic DNA, if telomeric regions trail centromeric regions as in other eukaryotes. We therefore suggest that in S. cerevisiae the nucleolus is attached to other parts of the nucleus which enable it to segregate along with the bulk of the DNA. The segregation of the nucleolus in topoisomerase mutants and nuclear division mutants of S. cerevisiae was also investigated. In cdc14 mutants which arrest at late anaphase, the vast majority of the DNA is separated, but the nucleolar antigens remain extended between the mother and daughter cells. Thus, the CDC14 gene of S. cerevisiae appears to be important for the separation of the nucleolus at mitosis.

    View details for Web of Science ID A1991GD91400005

    View details for PubMedID 1661641

  • THE SPA2 GENE OF SACCHAROMYCES-CEREVISIAE IS IMPORTANT FOR PHEROMONE-INDUCED MORPHOGENESIS AND EFFICIENT MATING JOURNAL OF CELL BIOLOGY Gehrung, S., Snyder, M. 1990; 111 (4): 1451-1464

    Abstract

    Upon exposure to mating pheromone, Saccharomyces cerevisiae undergoes cellular differentiation to form a morphologically distinct cell called a "shmoo". Double staining experiments revealed that both the SPA2 protein and actin localize to the shmoo tip which is the site of polarized cell growth. Actin concentrates as spots throughout the shmoo projection, while SPA2 localizes as a sharp patch at the shmoo tip. DNA sequence analysis of the SPA2 gene revealed an open reading frame 1,466 codons in length; the predicted protein sequence contains many internal repeats including a nine amino acid sequence that is imperfectly repeated 25 times. Portions of the SPA2 sequence exhibit a low-level similarity to proteins containing coiled-coil structures. Yeast cells containing a large deletion of the SPA2 gene are similar in growth rate to wild-type cells. However, spa2 mutant cells are impaired in their ability to form shmoos upon exposure to mating pheromone, and they do not mate efficiently with other spa2 mutant cells. Thus, we suggest that the SPA2 protein plays a critical role in cellular morphogenesis during mating, perhaps as a cytoskeletal protein.

    View details for Web of Science ID A1990EA35400012

    View details for PubMedID 2211820

  • HIGHER-ORDER STRUCTURE IS PRESENT IN THE YEAST NUCLEUS - AUTOANTIBODY PROBES DEMONSTRATE THAT THE NUCLEOLUS LIES OPPOSITE THE SPINDLE POLE BODY CHROMOSOMA Yang, C. H., Lambie, E. J., Hardin, J., Craft, J., Snyder, M. 1989; 98 (2): 123-128

    Abstract

    A panel of sera from 892 autoimmune patients was screened by indirect immunofluorescence on mammalian cells. Seventy-three sera were identified that recognize the nucleolus. Three of these sera appear to stain the nucleolus in yeast, suggesting that they recognize highly conserved antigens. These three sera also immunoprecipitate mammalian U3 snRNA-containing particles, which reside in the nucleolus and have been implicated in rRNA processing. Double immunofluorescence experiments with anti-nucleolus and anti-tubulin antibodies revealed a novel form of non-random nuclear organization in yeast. The spindle pole body and the nucleolus, both of which are associated with the nuclear envelope, preferentially localize at opposite ends of the nucleus. Organization of these and other components into specific regions of the nucleus may be important for optimizing their proper function.

    View details for Web of Science ID A1989AK25300008

    View details for PubMedID 2673672

  • THE SPA2 PROTEIN OF YEAST LOCALIZES TO SITES OF CELL-GROWTH JOURNAL OF CELL BIOLOGY Snyder, M. 1989; 108 (4): 1419-1429

    Abstract

    A yeast gene, SPA2, was isolated with human anti-spindle pole autoantibodies. The SPA2 gene was fused to the Escherichia coli trpE gene, and polyclonal antibodies were prepared to the fusion protein. Immunofluorescence experiments indicate that the SPA2 gene product has a sharply polarized distribution in yeast cells. In budded cells the SPA2 protein is present at the tip of the bud; in unbudded cells, it is localized to one edge of the cell. When a-cells are induced to form shmoos with alpha-factor, the SPA2 protein is found at the tip of the shmoo. These areas of SPA2 localization correspond to cellular sites expected to be involved in bud formation and/or cell growth. The SPA2 antigen is present in a-cells, alpha-cells, and a/alpha-diploid cells, but is absent in mutant cells in which the SPA2 gene has been disrupted. spa2 mutant cells are viable, but display defects in the direction and control of cell growth. Compared to wild-type cells, spa2 mutant cells have slightly altered budding patterns. Entry into stationary phase is impaired for spa2 mutants, and mutants with one particular allele, spa2-7, form multiple buds under nutrient-limiting conditions. Thus, SPA2 is a newly identified yeast gene that is involved in the direction and control of cell division, and whose gene product localizes to the site of cell growth.

    View details for Web of Science ID A1989T953300022

    View details for PubMedID 2647769

  • GENOMIC ORGANIZATION OF TRANSFER-RNA AND AMINOACYL-TRNA SYNTHETASE GENES FOR 2 AMINO-ACIDS IN SACCHAROMYCES-CEREVISIAE GENOMICS Kolman, C. J., Snyder, M., Soll, D. 1988; 3 (3): 201-206

    Abstract

    The genomic organization in Saccharomyces cerevisiae of the tRNA and aminoacyl-tRNA synthetase genes for two amino acids was investigated. Aspartic acid and serine were chosen for the study because of the number and diversity of their tRNA gene sequences and the availability of cloned tRNA and aminoacyl-tRNA synthetase genes. Chromosome assignments were determined by hybridization to DNA gel blots of chromosomal DNA resolved by contour-clamped homogeneous electric field gel electrophoresis. Our results show that the tRNA and the cognate synthetase genes in such a family are dispersed and, therefore, cannot be regulated via a mechanism dependent on close proximity of genes. In general, the genome of S. cerevisiae contains randomly dispersed tRNA genes that are transcribed individually. We have supported and expanded this view by applying the facile method of contour-clamped homogeneous electric field gel electrophoresis to the investigation of these small multigene families.

    View details for Web of Science ID A1988R066400004

    View details for PubMedID 3066745

Conference Proceedings


  • A metadata framework for interoperating heterogeneous genome data using XML Cheung, K. H., Deshpande, A. M., Tosches, N., Nath, S., Agrawal, A., Miller, P., Kumar, A., Snyder, M. BMJ PUBLISHING GROUP. 2001: 110-114

    Abstract

    The rapid advances in the Human Genome Project and genomic technologies have produced massive amounts of data populated in a large number of network-accessible databases. These technological advances and the associated data can have a great impact on biomedicine and healthcare. To answer many of the biologically or medically important questions, researchers often need to integrate data from a number of independent but related genome databases. One common practice is to download data sets (text files) from various genome Web sites and process them with local programs. One main problem with this approach is that these programs are written on a case-by-case basis because the data sets involved are heterogeneous in structure. To address this problem, we define metadata that maps these heterogeneously structured files into a common eXtensible Markup Language (XML) structure to facilitate data interoperation. We illustrate this approach by interoperating two sets of essential yeast genes that are stored in two yeast genome databases (MIPS and YPD).

    View details for Web of Science ID 000172263400024

    View details for PubMedID 11825164

  • Graphically-enabled integration of bioinformatics tools allowing parallel execution Cheung, K. H., Miller, P., Sherman, A., Weston, S., Stratmann, E., Schultz, M., Snyder, M., Kumar, A. HANLEY & BELFUS INC. 2000: 141-145

    Abstract

    Rapid analysis of large amounts of genomic data is of great biological as well as medical interest. This type of analysis will greatly benefit from the ability to rapidly assemble a set of related analysis programs and to exploit the power of parallel computing. TurboGenomics, which is a software package currently in its alpha-testing phase, allows integration of heterogeneous software components to be done graphically. In addition, the tool is capable of making the integrated components run in parallel. To demonstrate these abilities, we use the tool to develop a Web-based application that allows integrated access to a set of large-scale sequence data analysis programs used by a transposon-insertion based yeast genome project. We also contrast the differences in building such an application with and without using the TurboGenomics software.

    View details for Web of Science ID 000170207500030

    View details for PubMedID 11079861