Mike Cherry's Profile | Stanford Profiles

Academic Appointments

Emeritus Faculty, Acad Council, Genetics
Member, Bio-X

Administrative Appointments

Advisory Board, WormBase, Caenorhabditis community database (2002 - 2018)
Advisory Board, The Blueprint Initiative, Interaction Database (2003 - 2005)
Advisory Board, dictyBase, Dictyostelium discoideum community database (2003 - 2008)
Advisory Board, EcoCyc, E. coli genetics and pathway resource (2004 - 2008)
Advisory Board, TIGR Rice Genome Annotation Project (2004 - 2010)
External Consultants Panel, NHGRI ENCODE and modENCODE projects (2005 - 2011)
Member, Academic Council Committee on Libraries (C-LIB) (2008 - 2010)
Advisory Board, FlyBase, Drosophila Knowledgebase (2008 - 2018)
Executive Board, International Society of Biocuration (2010 - 2015)
Member, Committee on Academic Computing and Information Systems (C-ACIS) (2012 - 2015)
President, International Society of Biocuration (2015 - 2016)
Chair, Committee on Academic Computing and Information Systems (C-ACIS) (2015 - 2018)
External Scientific Advisers, 4D-Nucleome Common Fund Project (2015 - 2019)
Advisory Board, FaceBase, Comprehensive data and resources for craniofacial researchers. USC (2015 - 2024)
Advisory Board, Laboratory of Neuro Imaging Resource (LONIR), USC (2015 - 2024)
Advisory Board, XenBase, Xenopus Knowledgebase, Univ of Calgary (2016 - 2024)
Advisory Board, GlyGen, Glycoscience Informatics Resource, George Washington University (2020 - 2024)
Advisory Board, ZFIN, Zebrafish Genome Database, Univ of Oregon (2021 - 2024)
Member, Committee on Academic Computing and Information Systems (C-ACIS) (2024 - Present)

Honors & Awards

Ira Herskowitz Award, Genetics Society of America (August 2018)

Professional Education

Ph.D., University of California, Molecular Biology (1985)
B.S., Purdue University, Biochemistry (1979)
B.S., Purdue University, Biological Sciences (1979)

Community and International Work

Stanford at The Tech, San Jose

Topic

Public Understanding of Genetics

Partnering Organization(s)

The Tech Museum of San Jose

Location

Bay Area

Ongoing Project

Yes

Opportunities for Student Involvement

Yes

Current Research and Scholarly Interests

The Cherry lab identifies, validates and integrates scientific information into encyclopedic databases essential for investigation as well as scientific education. Published results of scientific experimentation are a foundation of our understanding of the natural world and provide motivation for new experiments. The combination of in-depth understanding reported in the literature with computational analyses is an essential ingredient of modern biological research. Mastery of the volumes of published literature requires comprehensive databases that provide the facts and underlying experimental data in publicly accessible ways. Curation, extraction and sorting of factual experimental data from peer-reviewed journal articles are necessary to acquire these data from their source. Large quantitative datasets from global studies extend our knowledge of genes, their products and their interactions. Integrating quantitative datasets with curated, focused experimental results creates unique comprehensive databases. My group creates such essential databases and makes them available to scientists and educators seeking to understand experimental results and to teach scientific knowledge.

The exploration of the genes and other important elements of a genome involves the use of previous results to aid the design of experiments that explore, for example, gene regulation, protein function, and the interaction of these processes. New technologies are being applied to the determination of many molecular interactions of the components of chromosomes and the specific controls for the generation of the many cell types that create an organism from a single set of chromosomes. These methods create very large datasets that cannot be appreciated without computational methods and access to databases of scientific results.

The Cherry lab specializes in designing and managing a public database of information for the budding yeast Saccharomyces cerevisiae and has recently begun applying its expertise to human genomic information. Our current projects address three areas of research: engineering for the design of databases and software for the effective integration of complex experimental results; defining standards for eukaryotic genomic data that measure reliability and quality; and developing vocabularies that enhance communication between researchers, and between computational resources. This research involves the collection and standardization of experimental results, the incorporation of detailed descriptions of these data into complex biological models, the application of flexible search and retrieval tools, and the distribution of the integrated information to accelerate discovery.

Three major bioinformatics resources funded by the National Institutes of Health are provided by the lab. The Saccharomyces Genome Database project is the foremost database on a single organism. It is the archetype of all such databases because of its high quality, rich design, completeness, ease of use, and facilitation of scientific discovery. The Gene Ontology Consortium invented a structured vocabulary for the specification and description of the function of genes, their involvement in biological processes and their location within subcellular complexes and components. This innovative knowledgebase has unified biological nomenclature and is crucial for the analysis of biological results. The ENCODE Data Coordination Center provides an essential component for the analysis and use of large-scale studies of the human genome. Our work specifies the accurate and complete submission of human genomic experimental results, verifies the data quality, specifies and compiles the experimental details of each dataset, integrates data with existing human genome databases, and distributes these results with their analyses via a portal that serves the diverse biomedical research community of skilled bioinformaticists, biologists, and educators.

2025-26 Courses

Independent Studies (7)
- Biomedical Informatics Teaching Methods
  BMDS 295 (Aut, Win, Spr)
- Directed Reading
  BMDS 299 (Aut, Win, Spr)
- Directed Reading in Genetics
  GENE 299 (Aut, Win, Spr, Sum)
- Graduate Research
  GENE 399 (Aut, Win, Spr, Sum)
- Medical Scholars Research
  BMDS 370 (Aut, Win, Spr)
- Medical Scholars Research
  GENE 370 (Aut, Win, Spr, Sum)
- Supervised Study
  GENE 260 (Aut, Win, Spr, Sum)

Graduate and Fellowship Programs

Genetics (PhD Program)

All Publications

The Gene Ontology knowledgebase in 2026 NUCLEIC ACIDS RESEARCH Aleksander, S. A., Balhoff, J. P., Carbon, S., Cherry, J., Ebert, D., Feuermann, M., Gaudet, P., Harris, N. L., Hill, D. P., Kalita, P., Lee, R., Mi, H., Moxon, S., Mungall, C. J., Muruganujan, A., Mushayahama, T., Sternberg, P. W., Thomas, P. D., Van Auken, K., Wong, E. D., Wood, V., Ramsey, J., Siegele, D. A., Chisholm, R. L., Dodson, R., Fey, P., Aspromonte, M., Nugnes, M., Naser, X., Tosatto, S. C. E., Giglio, M., Nadendla, S., Antonazzo, G., Attrill, H., Brown, N. H., Dos Santos, G., Marygold, S., Roper, K., Strelets, V., Tabone, C. J., Thurmond, J., Zhou, P., Zaru, R., Lovering, R. C., Logie, C., Chen, D., Naba, A., Christie, K., Corbani, L., Ni, L., Sitnikov, D., Smith, C., Seager, J., Cooper, L., Elser, J., Jaiswal, P., Gupta, P., Naithani, S., Carme, P., Rutherford, K., De Pons, J. L., Dwinell, M. R., Hayman, G., Kaldunski, M. L., Kwitek, A. E., Laulederkind, S. J. F., Tutaj, M. A., Vedi, M., Wang, S., D'eustachio, P., Aimo, L., Axelsen, K., Bridge, A., Hyka-Nouspikel, N., Morgat, A., Goldbold, G., Engel, S. R., Miyasato, S. R., Nash, R. S., Sherlock, G., Weng, S., Bakker, E., Berardini, T. Z., Reiser, L., Auchincloss, A., Argoud-Puy, G., Blatter, M., Boutet, E., Breuza, L., Casals-Casas, C., Coudert, E., Estreicher, A., Famiglietti, M., Gos, A., Gruaz-Gumowski, N., Hulo, C., Jungo, F., Mercier, P., Lieberherr, D., Masson, P., Pedruzzi, I., Pourcel, L., Poux, S., Rivoire, C., Sundaram, S., Bateman, A., Adesina, A., Bowler-Barnett, E., Carpentier, D., Denny, P., Ignatchenko, A., Ishtiaq, R., Lock, A., Lussi, Y., Magrane, M., Martin, M. J., Orchard, S., Raposo, P., Speretta, E., Tyagi, N., Urakova, N., Warner, K., Yu, C., Chan, J., Diamantakis, S., Quinton-Tulloch, M., Raciti, D., Fisher, M., James-Zorn, C., Ponferrada, V., Zorn, A., Howe, D., Ramachandran, S., Ruzicka, L., Westerfield, M. 2025

Abstract

The Gene Ontology (GO) knowledgebase (https://geneontology.org) is a comprehensive resource describing the functions of genes. The GO knowledgebase is regularly updated and improved. We describe here the major updates that have been made in the past 3 years. The ontology and annotations have been expanded and revised, particularly in several areas of biology: cellular metabolism, multi-organism interactions (e.g. host-pathogen), extracellular matrix proteins, chromatin remodeling (e.g. the "histone code"), and noncoding RNA functions. We have released version 2 of a comprehensive set of integrated, reviewed annotations for human genes, which we call the "functionome." We have also dramatically increased the number of GO-CAM models, with over 1500 models of metabolic and signaling pathways, primarily in human, mouse, budding and fission yeast, and fruit fly. Finally, we discuss our current recommendations and future prospects of AI in the use and development of GO.

View details for DOI 10.1093/nar/gkaf1292

View details for Web of Science ID 001642129700001

View details for PubMedID 41413728

The IGVF catalog-from genetic variation to function. Nucleic acids research Li, D., Liu, S., Assis, P. R., Li, M., Dong, S., Whaling, I., Jolanki, O., Kagda, M., Zhang, W., Macias-Velasco, J. F., Liu, T., Cody, S., Antonacci-Fulton, L., Huang, Y., Liu, J., Montgomery, M. T., Zeiberg, D., Jain, S., Pejaver, V., Bergquist, T., Chen, Y., Radivojac, P., Gersbach, C. A., Sherpa, R. N., Castro, C. P., Boyle, A. P., Starita, L. M., Fowler, D. M., Ahituv, N., Dey, K. K., Majoros, W. H., Reddy, T. E., Craven, M., Sinha, R., Sverchkov, Y., Cai, X., Nzima, M. Z., Calderwood, M. A., Rozowsky, J., Gerstein, M., Ma, J., Yue, F., Cherry, J. M., Love, M. I., Engreitz, J. M., Hitz, B. C., Wang, T. 2025

Abstract

Genomic variation between individuals is essential for understanding how differences in the genome sequence affect molecular and cellular processes. The Impact of Genomic Variation on Function (IGVF) Consortium aims to uncover the relationships among genomic variation, genome function, and phenotypes by combining experimental techniques, such as single-cell mapping and genomic perturbation assays, with computational approaches such as machine learning-based predictive modeling. The IGVF Data and Administrative Coordinating Centers collect, analyze, and disseminate data and results from across the consortium through an open-source platform called the IGVF Catalog. This resource includes, but is not limited to, data on the effects of coding variants on protein abundance and function, noncoding variants on enhancer activity (measured by MPRA or predicted computationally), and associations between variants and quantitative traits. All data are organized within a graph database comprising over 50 types of data collections with nearly 3 billion nodes and over 7.5 billion edges. The Catalog offers public API endpoints (https://api.catalogkg.igvf.org/) and a user-friendly interface for exploring, querying, and visualizing the data at https://catalog.igvf.org. We expect that this open-access platform will support the broader scientific community to advance our understanding of how genomic variation influences biology and disease.

View details for DOI 10.1093/nar/gkaf1341

View details for PubMedID 41359121

Data navigation on the ENCODE portal. Nature communications Kagda, M. S., Lam, B., Litton, C., Small, C., Sloan, C. A., Spragins, E., Tanaka, F., Whaling, I., Gabdank, I., Youngworth, I., Strattan, J. S., Hilton, J., Jou, J., Au, J., Lee, J. W., Andreeva, K., Graham, K., Lin, K., Simison, M., Jolanki, O., Sud, P., Assis, P., Adenekan, P., Miyasato, S., Zhong, W., Luo, Y., Myers, Z., Cherry, J. M., Hitz, B. C. 2025; 16 (1): 9592

Abstract

Spanning two decades, the collaborative ENCODE project aims to identify all the functional elements within human and mouse genomes. To best serve the scientific community, the comprehensive ENCODE data including results from 23,000+ functional genomics experiments, 800+ functional elements characterization experiments and 60,000+ results from integrative computational analyses are available on an open-access data-portal ( https://www.encodeproject.org/ ). The final phase of the project includes data from several novel assays aimed at characterization and validation of genomic elements. In addition to developing and maintaining the data portal, the Data Coordination Center (DCC) implemented and utilised uniform processing pipelines to generate uniformly processed data. Here we report recent updates to the data portal including a redesigned home page, an improved search interface, new custom-designed pages highlighting biologically related datasets and an enhanced cart interface for data visualisation plus user-friendly data download options. A summary of data generated using uniform processing pipelines is also provided.

View details for DOI 10.1038/s41467-025-64343-9

View details for PubMedID 41168159

View details for PubMedCentralID 5389787

Gene Spatial Integration: enhancing spatial transcriptomics analysis via deep learning and batch effect mitigation. Bioinformatics (Oxford, England) Pratama, R., Hilton, J., Cherry, J. M., Song, G. 2025

Abstract

Spatial transcriptomics (ST) is a groundbreaking technique for studying the correlation between cellular organization within a tissue and their physiological and pathological properties. Every facet of spatial information, including cell/spot proximity, distribution, and dimensionality, is significant. Most methods lean heavily on proximity for ST analysis, each resulting in useful insights but still leaving other aspects untapped. In addition, samples procured at different times, different donors, and by different technologies introduce a batch effects problem that hinders the statistical approach employed by most analysis tools. Addressing these challenges, we have developed a deep learning method for analyzing integrated multiple ST data, focusing on the distribution aspect. Furthermore, our method aims to leverage single-cell analysis tools. Our study introduces Gene Spatial Integration (GSI), a data integration pipeline utilizing representation learning approach to extract spatial distribution of genes into the same feature space as gene expression features. We employ Autoencoder network to extract spatial embedding, facilitating the projection of spatial features into gene expression feature space. Our approach allows for seamless integration of multiple samples with minimum detriment, increasing the performance of the ST data analysis tool. We show application of our method on human DLPFC dataset. Our method consistently improves the performance of the clustering of Seurat tools, with the most significant increase observed in sample 151673, almost doubling the ARI score from 0.225 to 0.405. We also combine our pipeline with the clustering of GraphST, achieving a significantly higher ARI score in sample 151672 from 0.614 to 0.795. This result reveals the potential of gene distribution spatial aspect, also emphasizes the impact of integration and batch effect removal in developing a refined analysis in understanding tissue characteristics. Implementation of GSI is accessible at https://github.com/Riandanis/Spatial_Integration_GSI. Supplementary data are available at Bioinformatics online.

View details for DOI 10.1093/bioinformatics/btaf350

View details for PubMedID 40511994

CZ CELLxGENE Discover: a single-cell data platform for scalable exploration, analysis and modeling of aggregated data. Nucleic acids research Abdulla, S., Aevermann, B., Assis, P., Badajoz, S., Bell, S. M., Bezzi, E., Cakir, B., Chaffer, J., Chambers, S., Cherry, J. M., Chi, T., Chien, J., Dorman, L., Garcia-Nieto, P., Gloria, N., Hastie, M., Hegeman, D., Hilton, J., Huang, T., Infeld, A., Istrate, A. M., Jelic, I., Katsuya, K., Kim, Y. J., Liang, K., Lin, M., Lombardo, M., Marshall, B., Martin, B., McDade, F., Megill, C., Patel, N., Predeus, A., Raymor, B., Robatmili, B., Rogers, D., Rutherford, E., Sadgat, D., Shin, A., Small, C., Smith, T., Sridharan, P., Tarashansky, A., Tavares, N., Thomas, H., Tolopko, A., Urisko, M., Yan, J., Yeretssian, G., Zamanian, J., Mani, A., Cool, J., Carr, A. 2024

Abstract

Hundreds of millions of single cells have been analyzed using high-throughput transcriptomic methods. The cumulative knowledge within these datasets provides an exciting opportunity for unlocking insights into health and disease at the level of single cells. Meta-analyses that span diverse datasets building on recent advances in large language models and other machine-learning approaches pose exciting new directions to model and extract insight from single-cell data. Despite the promise of these and emerging analytical tools for analyzing large amounts of data, the sheer number of datasets, data models and accessibility remains a challenge. Here, we present CZ CELLxGENE Discover (cellxgene.cziscience.com), a data platform that provides curated and interoperable single-cell data. Available via a free-to-use online data portal, CZ CELLxGENE hosts a growing corpus of community-contributed data of over 93 million unique cells. Curated, standardized and associated with consistent cell-level metadata, this collection of single-cell transcriptomic data is the largest of its kind and growing rapidly via community contributions. A suite of tools and features enables accessibility and reusability of the data via both computational and visual interfaces to allow researchers to explore individual datasets, perform cross-corpus analysis, and run meta-analyses of tens of millions of cells across studies and tissues at the resolution of single cells.

View details for DOI 10.1093/nar/gkae1142

View details for PubMedID 39607691

Saccharomyces Genome Database: Advances in Genome Annotation, Expanded Biochemical Pathways, and Other Key Enhancements. Genetics Engel, S. R., Aleksander, S., Nash, R. S., Wong, E. D., Weng, S., Miyasato, S. R., Sherlock, G., Cherry, J. M. 2024

Abstract

Budding yeast (Saccharomyces cerevisiae) is the most extensively characterized eukaryotic model organism and has long been used to gain insight into the fundamentals of genetics, cellular biology, and the functions of specific genes and proteins. The Saccharomyces Genome Database (SGD) is a scientific resource that provides information about the genome and biology of S. cerevisiae. For more than 30 years, SGD has maintained the genetic nomenclature, chromosome maps, and functional annotation for budding yeast along with search and analysis tools to explore these data. Here we describe recent updates at SGD, including the two most recent reference genome annotation updates, expanded biochemical pathways representation, changes to SGD search and data files, and other enhancements to the SGD website and user interface. These activities are part of our continuing effort to promote insights gained from yeast to enable the discovery of functional relationships between sequence and gene products in fungi and higher eukaryotes.

View details for DOI 10.1093/genetics/iyae185

View details for PubMedID 39530598

Updates to the Alliance of Genome Resources central infrastructure GENETICS Aleksander, S. A., Anagnostopoulos, A. V., Antonazzo, G., Arnaboldi, V., Attrill, H., Becerra, A., Bello, S. M., Blodgett, O., Bradford, Y. M., Bult, C. J., Cain, S., Calvi, B. R., Carbon, S., Chan, J., Chen, W. J., Cherry, J., Cho, J., Crosby, M. A., De Pons, J. L., D'Eustachio, P., Diamantakis, S., Dolan, M. E., dos Santos, G., Dyer, S., Ebert, D., Engel, S. R., Fashena, D., Fisher, M., Foley, S., Gibson, A. C., Gollapally, V. R., Gramates, L., Grove, C. A., Hale, P., Harris, T., Hayman, G., Hu, Y., James-Zorn, C., Karimi, K., Karra, K., Kishore, R., Kwitek, A. E., Laulederkind, S. J. F., Lee, R., Longden, I., Luypaert, M., Markarian, N., Marygold, S. J., Matthews, B., McAndrews, M. S., Millburn, G., Miyasato, S., Motenko, H., Moxon, S., Muller, H., Mungall, C. J., Muruganujan, A., Mushayahama, T., Nash, R. S., Nuin, P., Paddock, H., Pells, T., Perrimon, N., Pich, C., Quinton-Tulloch, M., Raciti, D., Ramachandran, S., Richardson, J. E., Gelbart, S., Ruzicka, L., Schindelman, G., Shaw, D. R., Sherlock, G., Shrivatsav, A., Singer, A., Smith, C. M., Smith, C. L., Smith, J. R., Stein, L., Sternberg, P. W., Tabone, C. J., Thomas, P. D., Thorat, K., Thota, J., Tomczuk, M., Trovisco, V., Tutaj, M. A., Urbano, J., Van Auken, K., Van Slyke, C. E., Vize, P. D., Wang, Q., Weng, S., Westerfield, M., Wilming, L. G., Wong, E. D., Wright, A., Yook, K., Zhou, P., Zorn, A., Zytkovicz, M., Alliance Genome Resources Consortium 2024; 227 (1)

Abstract

The Alliance of Genome Resources (Alliance) is an extensible coalition of knowledgebases focused on the genetics and genomics of intensively studied model organisms. The Alliance is organized as individual knowledge centers with strong connections to their research communities and a centralized software infrastructure, discussed here. Model organisms currently represented in the Alliance are budding yeast, Caenorhabditis elegans, Drosophila, zebrafish, frog, laboratory mouse, laboratory rat, and the Gene Ontology Consortium. The project is in a rapid development phase to harmonize knowledge, store it, analyze it, and present it to the community through a web portal, direct downloads, and application programming interfaces (APIs). Here, we focus on developments over the last 2 years. Specifically, we added and enhanced tools for browsing the genome (JBrowse), downloading sequences, mining complex data (AllianceMine), visualizing pathways, full-text searching of the literature (Textpresso), and sequence similarity searching (SequenceServer). We enhanced existing interactive data tables and added an interactive table of paralogs to complement our representation of orthology. To support individual model organism communities, we implemented species-specific "landing pages" and will add disease-specific portals soon; in addition, we support a common community forum implemented in Discourse software. We describe our progress toward a central persistent database to support curation, the data modeling that underpins harmonization, and progress toward a state-of-the-art literature curation system with integrated artificial intelligence and machine learning (AI/ML).

View details for DOI 10.1093/genetics/iyae049

View details for Web of Science ID 001287647500001

View details for PubMedID 38552170

View details for PubMedCentralID PMC11075569

Annotating and prioritizing human non-coding variants with RegulomeDB v.2. Nature genetics Dong, S., Zhao, N., Spragins, E., Kagda, M. S., Li, M., Assis, P., Jolanki, O., Luo, Y., Cherry, J. M., Boyle, A. P., Hitz, B. C. 2023; 55 (5): 724-726

View details for DOI 10.1038/s41588-023-01365-3

View details for PubMedID 37173523

View details for PubMedCentralID 3431494

The Gene Ontology Knowledgebase in 2023. Genetics Gene Ontology Consortium, Aleksander, S. A., Balhoff, J., Carbon, S., Cherry, J. M., Drabkin, H. J., Ebert, D., Feuermann, M., Gaudet, P., Harris, N. L., Hill, D. P., Lee, R., Mi, H., Moxon, S., Mungall, C. J., Muruganugan, A., Mushayahama, T., Sternberg, P. W., Thomas, P. D., Van Auken, K., Ramsey, J., Siegele, D. A., Chisholm, R. L., Fey, P., Aspromonte, M. C., Nugnes, M. V., Quaglia, F., Tosatto, S., Giglio, M., Nadendla, S., Antonazzo, G., Attrill, H., Dos Santos, G., Marygold, S., Strelets, V., Tabone, C. J., Thurmond, J., Zhou, P., Ahmed, S. H., Asanitthong, P., Buitrago, D. L., Erdol, M. N., Gage, M. C., Kadhum, M. A., Li, K. Y., Long, M., Michalak, A., Pesala, A., Pritazahra, A., Saverimuttu, S. C., Su, R., Thurlow, K. E., Lovering, R. C., Logie, C., Oliferenko, S., Blake, J., Christie, K., Corbani, L., Dolan, M. E., Drabkin, H. J., Hill, D. P., Ni, L., Sitnikov, D., Smith, C., Cuzick, A., Seager, J., Cooper, L., Elser, J., Jaiswal, P., Gupta, P., Jaiswal, P., Naithani, S., Lera-Ramirez, M., Rutherford, K., Wood, V., De Pons, J. L., Dwinell, M. R., Hayman, G. T., Kaldunski, M. L., Kwitek, A. E., Laulederkind, S. J., Tutaj, M. A., Vedi, M., Wang, S., D'Eustachio, P., Aimo, L., Axelsen, K., Bridge, A., Hyka-Nouspikel, N., Morgat, A., Aleksander, S. A., Cherry, J. M., Engel, S. R., Karra, K., Miyasato, S. R., Nash, R. S., Skrzypek, M. S., Weng, S., Wong, E. D., Bakker, E., Berardini, T. Z., Reiser, L., Auchincloss, A., Axelsen, K., Argoud-Puy, G., Blatter, M., Boutet, E., Breuza, L., Bridge, A., Casals-Casas, C., Coudert, E., Estreicher, A., Famiglietti, M. L., Feuermann, M., Gos, A., Gruaz-Gumowski, N., Hulo, C., Hyka-Nouspikel, N., Jungo, F., Le Mercier, P., Lieberherr, D., Masson, P., Morgat, A., Pedruzzi, I., Pourcel, L., Poux, S., Rivoire, C., Sundaram, S., Bateman, A., Bowler-Barnett, E., Bye-A-Jee, H., Denny, P., Ignatchenko, A., Ishtiaq, R., Lock, A., Lussi, Y., Magrane, M., Martin, M. J., Orchard, S., Raposo, P., Speretta, E., Tyagi, N., Warner, K., Zaru, R., Diehl, A. D., Lee, R., Chan, J., Diamantakis, S., Raciti, D., Zarowiecki, M., Fisher, M., James-Zorn, C., Ponferrada, V., Zorn, A., Ramachandran, S., Ruzicka, L., Westerfield, M. 2023

Abstract

The Gene Ontology (GO) knowledgebase (http://geneontology.org) is a comprehensive resource concerning the functions of genes and gene products (proteins and non-coding RNAs). GO annotations cover genes from organisms across the tree of life as well as viruses, though most gene function knowledge currently derives from experiments carried out in a relatively small number of model organisms. Here, we provide an updated overview of the GO knowledgebase, as well as the efforts of the broad, international consortium of scientists that develops, maintains and updates the GO knowledgebase. The GO knowledgebase consists of three components: 1) the Gene Ontology - a computational knowledge structure describing functional characteristics of genes; 2) GO annotations - evidence-supported statements asserting that a specific gene product has a particular functional characteristic; and 3) GO Causal Activity Models (GO-CAMs) - mechanistic models of molecular "pathways" (GO biological processes) created by linking multiple GO annotations using defined relations. Each of these components is continually expanded, revised and updated in response to newly published discoveries, and receives extensive QA checks, reviews and user feedback. For each of these components, we provide a description of the current contents, recent developments to keep the knowledgebase up to date with new discoveries, as well as guidance on how users can best make use of the data we provide. We conclude with future directions for the project.

View details for DOI 10.1093/genetics/iyad031

View details for PubMedID 36866529

Dive into Epigenetics and Gene Regulation - Navigation using the ENCODE Portal Au, J. N., Gabdank, I., Luo, Y., Kagda, M., Lam, B., Youngworth, I., Adenekan, P., Baymuradov, U. K., Miyasato, S., Simison, M., Graham, K., Jolanki, O., Jou, J. P., Lee, J., Litton, C., Lin, K. Z., O'Neill, E., Sud, P., Tanaka, F., Strattan, J. S., Hitz, B. C., Cherry, J. M. SPRINGERNATURE. 2020: 744

View details for Web of Science ID 000598482602403

Incorporation of a unified protein abundance dataset into the Saccharomyces genome database. Database : the journal of biological databases and curation Nash, R. S., Weng, S. n., Karra, K. n., Wong, E. D., Engel, S. R., Cherry, J. M. 2020; 2020

Abstract

The identification and accurate quantitation of protein abundance has been a major objective of proteomics research. Abundance studies have the potential to provide users with data that can be used to gain a deeper understanding of protein function and regulation and can also help identify cellular pathways and modules that operate under various environmental stress conditions. One of the central missions of the Saccharomyces Genome Database (SGD; https://www.yeastgenome.org) is to work with researchers to identify and incorporate datasets of interest to the wider scientific community, thereby enabling hypothesis-driven research. A large number of studies have detailed efforts to generate proteome-wide abundance data, but deeper analyses of these data have been hampered by the inability to compare results between studies. Recently, a unified protein abundance dataset was generated through the evaluation of more than 20 abundance datasets, which were normalized and converted to common measurement units, in this case molecules per cell. We have incorporated these normalized protein abundance data and associated metadata into the SGD database, as well as the SGD YeastMine data warehouse, resulting in the addition of 56 487 values for untreated cells grown in either rich or defined media and 28 335 values for cells treated with environmental stressors. Abundance data for protein-coding genes are displayed in a sortable, filterable table on Protein pages, available through Locus Summary pages. A median abundance value was incorporated, and a median absolute deviation was calculated for each protein-coding gene and incorporated into SGD. These values are displayed in the Protein section of the Locus Summary page. The inclusion of these data has enhanced the quality and quantity of protein experimental information presented at SGD and provides opportunities for researchers to access and utilize the data to further their research.

View details for DOI 10.1093/database/baaa008

View details for PubMedID 32128557

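The per-protein summary statistics named in the abstract above, a median abundance and a median absolute deviation (MAD), can be illustrated with a minimal sketch. The function name and the abundance values are hypothetical and are not taken from SGD. The abstract does not say whether a scaling constant is applied to the MAD, so this sketch applies none.

```python
from statistics import median


def median_and_mad(values):
    """Return (median, median absolute deviation) for a list of numbers."""
    if not values:
        raise ValueError("need at least one abundance value")
    med = median(values)
    # MAD: the median of the absolute distances from the median.
    mad = median(abs(v - med) for v in values)
    return med, mad


# Hypothetical abundances (molecules per cell) reported for one protein
# by five different studies; one study is a clear outlier.
abundances = [1200, 1500, 1350, 9000, 1400]
med, mad = median_and_mad(abundances)
print(med, mad)
```

Both statistics are robust to outliers: the single very large value in the hypothetical list has little influence on either result, which is one reason these measures suit data pooled from many independent studies.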
The ENCODE Portal as an Epigenomics Resource. Current protocols in bioinformatics Jou, J., Gabdank, I., Luo, Y., Lin, K., Sud, P., Myers, Z., Hilton, J. A., Kagda, M. S., Lam, B., O'Neill, E., Adenekan, P., Graham, K., Baymuradov, U. K., R Miyasato, S., Strattan, J. S., Jolanki, O., Lee, J., Litton, C., Y Tanaka, F., Hitz, B. C., Cherry, J. M. 2019; 68 (1): e89

Abstract

The Encyclopedia of DNA Elements (ENCODE) web portal hosts genomic data generated by the ENCODE Consortium, Genomics of Gene Regulation, The NIH Roadmap Epigenomics Consortium, and the modENCODE and modERN projects. The goal of the ENCODE project is to build a comprehensive map of the functional elements of the human and mouse genomes. Currently, the portal database stores over 500 TB of raw and processed data from over 15,000 experiments spanning assays that measure gene expression, DNA accessibility, DNA and RNA binding, DNA methylation, and 3D chromatin structure across numerous cell lines, tissue types, and differentiation states with selected genetic and molecular perturbations. The ENCODE portal provides unrestricted access to the aforementioned data and relevant metadata as a service to the scientific community. The metadata model captures the details of the experiments, raw and processed data files, and processing pipelines in human and machine-readable form and enables the user to search for specific data either using a web browser or programmatically via REST API. Furthermore, ENCODE data can be freely visualized or downloaded for additional analyses. © 2019 The Authors. Basic Protocol: Query the portal Support Protocol 1: Batch downloading Support Protocol 2: Using the cart to download files Support Protocol 3: Visualize data Alternate Protocol: Query building and programmatic access.

View details for DOI 10.1002/cpbi.89

View details for PubMedID 31751002

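The programmatic access mentioned in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the portal's documented client: the `/search/` path, the query parameter names (`type`, `searchTerm`, `limit`, `format`) and the `@graph` response key are assumptions about the REST API and should be checked against the portal's own help pages. The helper names and the sample response are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "https://www.encodeproject.org"  # portal address given in the abstracts


def build_search_url(search_term, object_type="Experiment", limit=5):
    """Build a search URL that requests JSON rather than an HTML page.

    The path and parameter names are assumptions about the portal's API.
    """
    params = {
        "type": object_type,
        "searchTerm": search_term,
        "limit": limit,
        "format": "json",
    }
    return f"{BASE_URL}/search/?{urlencode(params)}"


def accessions(response):
    """Collect accession strings from a parsed search response.

    Assumes results are listed under the "@graph" key.
    """
    return [item["accession"] for item in response.get("@graph", [])
            if "accession" in item]


def fetch(url):
    """Download and parse a JSON response (needs network access)."""
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=30) as resp:
        return json.load(resp)


url = build_search_url("DNA methylation")
print(url)

# Hypothetical response fragment, used here so the sketch runs offline.
sample = {"@graph": [{"accession": "EXAMPLE001"}, {"title": "no accession"}]}
print(accessions(sample))
```

To query the live portal, pass the printed URL to `fetch(url)` and hand the result to `accessions(...)`. The network call is kept in its own function so the URL construction and response handling can be tried without a connection.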
Integration of macromolecular complex data into the Saccharomyces Genome Database. Database : the journal of biological databases and curation Wong, E. D., Skrzypek, M. S., Weng, S., Binkley, G., Meldal, B. H., Perfetto, L., Orchard, S. E., Engel, S. R., Cherry, J. M., SGD Project 2019; 2019

Abstract

Proteins seldom function individually. Instead, they interact with other proteins or nucleic acids to form stable macromolecular complexes that play key roles in important cellular processes and pathways. One of the goals of Saccharomyces Genome Database (SGD; www.yeastgenome.org) is to provide a complete picture of budding yeast biological processes. To this end, we have collaborated with the Molecular Interactions team that provides the Complex Portal database at EMBL-EBI to manually curate the complete yeast complexome. These data, from a total of 589 complexes, were previously available only in SGD's YeastMine data warehouse (yeastmine.yeastgenome.org) and the Complex Portal (www.ebi.ac.uk/complexportal). We have now incorporated these macromolecular complex data into the SGD core database and designed complex-specific reports to make these data easily available to researchers. These web pages contain referenced summaries focused on the composition and function of individual complexes. In addition, detailed information about how subunits interact within the complex, their stoichiometry and the physical structure are displayed when such information is available. Finally, we generate network diagrams displaying subunits and Gene Ontology annotations that are shared between complexes. Information on macromolecular complexes will continue to be updated in collaboration with the Complex Portal team and curated as more data become available.

View details for PubMedID 30715277
Curated protein information in the Saccharomyces genome database. Database : the journal of biological databases and curation Hellerstedt, S. T., Nash, R. S., Weng, S., Paskov, K. M., Wong, E. D., Karra, K., Engel, S. R., Cherry, J. M. 2017

Abstract

Due to recent advancements in the production of experimental proteomic data, the Saccharomyces Genome Database (SGD; www.yeastgenome.org) has been expanding its protein curation activities to make new data types available to our users. Because of broad interest in post-translational modifications (PTM) and their importance to protein function and regulation, we have recently started incorporating expertly curated PTM information on individual protein pages. Here we also present the inclusion of new abundance and protein half-life data obtained from high-throughput proteome studies. These new data types have been included with the aim of facilitating cellular biology research. Database URL: www.yeastgenome.org.

View details for DOI 10.1093/database/bax011

View details for Web of Science ID 000397530600002

View details for PubMedID 28365727
Saccharomyces genome database informs human biology. Nucleic acids research Skrzypek, M. S., Nash, R. S., Wong, E. D., MacPherson, K. A., Hellerstedt, S. T., Engel, S. R., Karra, K., Weng, S., Sheppard, T. K., Binkley, G., Simison, M., Miyasato, S. R., Cherry, J. M. 2017

Abstract

The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is an expertly curated database of literature-derived functional information for the model organism budding yeast, Saccharomyces cerevisiae. SGD constantly strives to synergize new types of experimental data and bioinformatics predictions with existing data, and to organize them into a comprehensive and up-to-date information resource. The primary mission of SGD is to facilitate research into the biology of yeast and to provide this wealth of information to advance, in many ways, research on other organisms, even those as evolutionarily distant as humans. To build such a bridge between biological kingdoms, SGD is curating data regarding yeast-human complementation, in which a human gene can successfully replace the function of a yeast gene, and/or vice versa. These data are manually curated from published literature, made available for download, and incorporated into a variety of analysis tools provided by SGD.

View details for PubMedID 29140510
ENCODE data at the ENCODE portal. Nucleic acids research Sloan, C. A., Chan, E. T., Davidson, J. M., Malladi, V. S., Strattan, J. S., Hitz, B. C., Gabdank, I., Narayanan, A. K., Ho, M., Lee, B. T., Rowe, L. D., Dreszer, T. R., Roe, G., Podduturi, N. R., Tanaka, F., Hong, E. L., Cherry, J. M. 2016; 44 (D1): D726-32

Abstract

The Encyclopedia of DNA Elements (ENCODE) Project is in its third phase of creating a comprehensive catalog of functional elements in the human genome. This phase of the project includes an expansion of assays that measure diverse RNA populations, identify proteins that interact with RNA and DNA, probe regions of DNA hypersensitivity, and measure levels of DNA methylation in a wide range of cell and tissue types to identify putative regulatory elements. To date, results for almost 5000 experiments have been released for use by the scientific community. These data are available for searching, visualization and download at the new ENCODE Portal (www.encodeproject.org). The revamped ENCODE Portal provides new ways to browse and search the ENCODE data based on the metadata that describe the assays as well as summaries of the assays that focus on data provenance. In addition, it is a flexible platform that allows integration of genomic data from multiple projects. The portal experience was designed to improve access to ENCODE data by relying on metadata that allow reusability and reproducibility of the experiments.

View details for DOI 10.1093/nar/gkv1160

View details for PubMedID 26527727

View details for PubMedCentralID PMC4702836
Principles of metadata organization at the ENCODE data coordination center. Database : the journal of biological databases and curation Hong, E. L., Sloan, C. A., Chan, E. T., Davidson, J. M., Malladi, V. S., Strattan, J. S., Hitz, B. C., Gabdank, I., Narayanan, A. K., Ho, M., Lee, B. T., Rowe, L. D., Dreszer, T. R., Roe, G. R., Podduturi, N. R., Tanaka, F., Hilton, J. A., Cherry, J. M. 2016; 2016

Abstract

The Encyclopedia of DNA Elements (ENCODE) Data Coordinating Center (DCC) is responsible for organizing, describing and providing access to the diverse data generated by the ENCODE project. The description of these data, known as metadata, includes the biological sample used as input, the protocols and assays performed on these samples, the data files generated from the results and the computational methods used to analyze the data. Here, we outline the principles and philosophy used to define the ENCODE metadata in order to create a metadata standard that can be applied to diverse assays and multiple genomic projects. In addition, we present how the data are validated and used by the ENCODE DCC in creating the ENCODE Portal (https://www.encodeproject.org/). Database URL: www.encodeproject.org.

View details for DOI 10.1093/database/baw001

View details for PubMedID 26980513

View details for PubMedCentralID PMC4792520
From one to many: expanding the Saccharomyces cerevisiae reference genome panel. Database : the journal of biological databases and curation Engel, S. R., Weng, S., Binkley, G., Paskov, K., Song, G., Cherry, J. M. 2016; 2016

Abstract

In recent years, thousands of Saccharomyces cerevisiae genomes have been sequenced to varying degrees of completion. The Saccharomyces Genome Database (SGD) has long been the keeper of the original eukaryotic reference genome sequence, which was derived primarily from S. cerevisiae strain S288C. Because new technologies are pushing S. cerevisiae annotation past the limits of any system based exclusively on a single reference sequence, SGD is actively working to expand the original S. cerevisiae systematic reference sequence from a single genome to a multi-genome reference panel. We first commissioned the sequencing of additional genomes and their automated analysis using the AGAPE pipeline. Here we describe our curation strategy to produce manually reviewed high-quality genome annotations in order to elevate 11 of these additional genomes to Reference status. Database URL: http://www.yeastgenome.org/.

View details for DOI 10.1093/database/baw020

View details for PubMedID 26989152

View details for PubMedCentralID PMC4795930
The Saccharomyces Genome Database: Advanced Searching Methods and Data Mining. Cold Spring Harbor protocols Cherry, J. M. 2015; 2015 (12): pdb.prot088906

Abstract

At the core of the Saccharomyces Genome Database (SGD) are chromosomal features that encode a product. These include protein-coding genes and major noncoding RNA genes, such as tRNA and rRNA genes. The basic entry point into SGD is a gene or open-reading frame name that leads directly to the locus summary information page. A keyword describing function, phenotype, selective condition, or text from abstracts will also provide a door into the SGD. A DNA or protein sequence can be used to identify a gene or a chromosomal region using BLAST. Protein and DNA sequence identifiers, PubMed and NCBI IDs, author names, and function terms are also valid entry points. The information in SGD has been gathered and is maintained by a group of scientific biocurators and software developers who are devoted to providing researchers with up-to-date information from the published literature, connections to all the major research resources, and tools that allow the data to be explored. All the collected information cannot be represented or summarized for every possible question; therefore, it is necessary to be able to search the structured data in the database. This protocol describes the YeastMine tool, which provides an advanced search capability via an interactive tool. The SGD also archives results from microarray expression experiments, and a strategy designed to explore these data using the SPELL (Serial Pattern of Expression Levels Locator) tool is provided.

View details for DOI 10.1101/pdb.prot088906

View details for PubMedID 26631124

View details for PubMedCentralID PMC5673598
Integrative chromatin state annotation of 234 human ENCODE4 cell types using Segway. Genome research Farahbod, M., Diab, A., Sud, P., Kagda, M. S., Whaling, I., Foroozandeh, M., Goel, I., Daneshpajouh, H., Hitz, B., Cherry, J. M., Libbrecht, M. W. 2025

Abstract

The fourth and final phase of the ENCODE consortium has newly profiled epigenetic activity in hundreds of human tissues. Chromatin state annotations created by segmentation and genome annotation (SAGA) methods such as Segway have emerged as the predominant integrative summary of such data sets. Here, we present the ENCODE4 Catalog of Segway Annotations, a set of sample-specific genome-wide chromatin state annotations of 234 human biosamples inferred from 1,794 genomics experiments. This catalog identifies genomic elements, accurately captures cell type-specific regulatory patterns, and facilitates discovery of elements involved in phenotype and disease.

View details for DOI 10.1101/gr.280633.125

View details for PubMedID 41052933
Deciphering the impact of genomic variation on function. Nature 2024; 633 (8028): 47-57

Abstract

Our genomes influence nearly every aspect of human biology, from molecular and cellular functions to phenotypes in health and disease. Studying the differences in DNA sequence between individuals (genomic variation) could reveal previously unknown mechanisms of human biology, uncover the basis of genetic predispositions to diseases, and guide the development of new diagnostic tools and therapeutic agents. Yet, understanding how genomic variation alters genome function to influence phenotype has proved challenging. To unlock these insights, we need a systematic and comprehensive catalogue of genome function and the molecular and cellular effects of genomic variants. Towards this goal, the Impact of Genomic Variation on Function (IGVF) Consortium will combine approaches in single-cell mapping, genomic perturbations and predictive modelling to investigate the relationships among genomic variation, genome function and phenotypes. IGVF will create maps across hundreds of cell types and states describing how coding variants alter protein activity, how noncoding variants change the regulation of gene expression, and how such effects connect through gene-regulatory and protein-interaction networks. These experimental data, computational predictions and accompanying standards and pipelines will be integrated into an open resource that will catalyse community efforts to explore how our genomes influence biology and disease across populations.

View details for DOI 10.1038/s41586-024-07510-0

View details for PubMedID 39232149

View details for PubMedCentralID 7405896
The ENCODE Uniform Analysis Pipelines. Research square Hitz, B. C., Lee, J. W., Jolanki, O., Kagda, M. S., Graham, K., Sud, P., Gabdank, I., Strattan, J. S., Sloan, C. A., Dreszer, T., Rowe, L. D., Podduturi, N. R., Malladi, V. S., Chan, E. T., Davidson, J. M., Ho, M., Miyasato, S., Simison, M., Tanaka, F., Luo, Y., Whaling, I., Hong, E. L., Lee, B. T., Sandstrom, R., Rynes, E., Nelson, J., Nishida, A., Ingersoll, A., Buckley, M., Frerker, M., Kim, D. S., Boley, N., Trout, D., Dobin, A., Rahmanian, S., Wyman, D., Balderrama-Gutierrez, G., Reese, F., Durand, N. C., Dudchenko, O., Weisz, D., Rao, S. S., Blackburn, A., Gkountaroulis, D., Sadr, M., Olshansky, M., Eliaz, Y., Nguyen, D., Bochkov, I., Shamim, M. S., Mahajan, R., Aiden, E., Gingeras, T., Heath, S., Hirst, M., Kent, W. J., Kundaje, A., Mortazavi, A., Wold, B., Cherry, J. M. 2023

Abstract

The Encyclopedia of DNA Elements (ENCODE) project is a collaborative effort to create a comprehensive catalog of functional elements in the human genome. The current database comprises more than 19,000 functional genomics experiments across more than 1,000 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure and the regulatory and transcriptional landscape of the Homo sapiens and Mus musculus genomes. All experimental data, metadata, and associated computational analyses created by the ENCODE consortium are submitted to the Data Coordination Center (DCC) for validation, tracking, storage, and distribution to community resources and the scientific community. The ENCODE project has engineered and distributed uniform processing pipelines in order to promote data provenance and reproducibility as well as allow interoperability between genomic resources and other consortia. All data files, reference genome versions, software versions, and parameters used by the pipelines are captured and available via the ENCODE Portal. The pipeline code, developed using Docker and Workflow Description Language (WDL; https://openwdl.org/), is publicly available on GitHub, with images available on Docker Hub (https://hub.docker.com), enabling access to a diverse range of biomedical researchers. ENCODE pipelines maintained and used by the DCC can be installed to run on personal computers, local HPC clusters, or in cloud computing environments via Cromwell. Access to the pipelines and data via the cloud allows small labs to use the data or software without access to institutional compute clusters. Standardization of the computational methodologies for analysis and quality control leads to comparable results from different ENCODE collections, a prerequisite for successful integrative analyses.

View details for DOI 10.21203/rs.3.rs-3111932/v1

View details for PubMedID 37503119

View details for PubMedCentralID PMC10371165
The EN-TEx resource of multi-tissue personal epigenomes & variant-impact models. Cell Rozowsky, J., Gao, J., Borsari, B., Yang, Y. T., Galeev, T., Gürsoy, G., Epstein, C. B., Xiong, K., Xu, J., Li, T., Liu, J., Yu, K., Berthel, A., Chen, Z., Navarro, F., Sun, M. S., Wright, J., Chang, J., Cameron, C. J., Shoresh, N., Gaskell, E., Drenkow, J., Adrian, J., Aganezov, S., Aguet, F., Balderrama-Gutierrez, G., Banskota, S., Corona, G. B., Chee, S., Chhetri, S. B., Cortez Martins, G. C., Danyko, C., Davis, C. A., Farid, D., Farrell, N. P., Gabdank, I., Gofin, Y., Gorkin, D. U., Gu, M., Hecht, V., Hitz, B. C., Issner, R., Jiang, Y., Kirsche, M., Kong, X., Lam, B. R., Li, S., Li, B., Li, X., Lin, K. Z., Luo, R., Mackiewicz, M., Meng, R., Moore, J. E., Mudge, J., Nelson, N., Nusbaum, C., Popov, I., Pratt, H. E., Qiu, Y., Ramakrishnan, S., Raymond, J., Salichos, L., Scavelli, A., Schreiber, J. M., Sedlazeck, F. J., See, L. H., Sherman, R. M., Shi, X., Shi, M., Sloan, C. A., Strattan, J. S., Tan, Z., Tanaka, F. Y., Vlasova, A., Wang, J., Werner, J., Williams, B., Xu, M., Yan, C., Yu, L., Zaleski, C., Zhang, J., Ardlie, K., Cherry, J. M., Mendenhall, E. M., Noble, W. S., Weng, Z., Levine, M. E., Dobin, A., Wold, B., Mortazavi, A., Ren, B., Gillis, J., Myers, R. M., Snyder, M. P., Choudhary, J., Milosavljevic, A., Schatz, M. C., Bernstein, B. E., Guigó, R., Gingeras, T. R., Gerstein, M. 2023; 186 (7): 1493-1511.e40

Abstract

Understanding how genetic variants impact molecular phenotypes is a key goal of functional genomics, currently hindered by reliance on a single haploid reference genome. Here, we present the EN-TEx resource of 1,635 open-access datasets from four donors (∼30 tissues × ∼15 assays). The datasets are mapped to matched, diploid genomes with long-read phasing and structural variants, instantiating a catalog of >1 million allele-specific loci. These loci exhibit coordinated activity along haplotypes and are less conserved than corresponding, non-allele-specific ones. Surprisingly, a deep-learning transformer model can predict the allele-specific activity based only on local nucleotide-sequence context, highlighting the importance of transcription-factor-binding motifs particularly sensitive to variants. Furthermore, combining EN-TEx with existing genome annotations reveals strong associations between allele-specific and GWAS loci. It also enables models for transferring known eQTLs to difficult-to-profile tissues (e.g., from skin to heart). Overall, EN-TEx provides rich data and generalizable models for more accurate personal functional genomics.

View details for DOI 10.1016/j.cell.2023.02.018

View details for PubMedID 37001506
Saccharomyces Genome Database Update: Server Architecture, Pan-Genome Nomenclature, and External Resources. Genetics Wong, E. D., Miyasato, S. R., Aleksander, S., Karra, K., Nash, R. S., Skrzypek, M. S., Weng, S., Engel, S. R., Cherry, J. M. 2023

Abstract

As one of the first model organism knowledgebases, Saccharomyces Genome Database (SGD) has been supporting the scientific research community since 1993. As technologies and research evolve, so does SGD: from updates in software architecture, to curation of novel data types, to incorporation of data from, and collaboration with, other knowledgebases. We are continuing to make steps toward providing the community with an S. cerevisiae pan-genome. Here we describe software upgrades, a new nomenclature system for genes not found in the reference strain, and additions to gene pages. With these improvements, we aim to remain a leading resource for students, researchers, and the broader scientific community.

View details for DOI 10.1093/genetics/iyac191

View details for PubMedID 36607068
Describing the Impact of Genomic Variation on Function (IGVF) Consortium submitted on behalf of the IGVF Consortium members Fulton, L., Wang, T., Yue, F., Hitz, B., Cherry, J. ELSEVIER SCIENCE INC. 2022: S219

View details for DOI 10.1016/j.gim.2022.01.384

View details for Web of Science ID 000796586200125
ClinGen Variant Curation Interface: a variant classification platform for the application of evidence criteria from ACMG/AMP guidelines. Genome medicine Preston, C. G., Wright, M. W., Madhavrao, R., Harrison, S. M., Goldstein, J. L., Luo, X., Wand, H., Wulf, B., Cheung, G., Mandell, M. E., Tong, H., Cheng, S., Iacocca, M. A., Pineda, A. L., Popejoy, A. B., Dalton, K., Zhen, J., Dwight, S. S., Babb, L., DiStefano, M., O'Daniel, J. M., Lee, K., Riggs, E. R., Zastrow, D. B., Mester, J. L., Ritter, D. I., Patel, R. Y., Subramanian, S. L., Milosavljevic, A., Berg, J. S., Rehm, H. L., Plon, S. E., Cherry, J. M., Bustamante, C. D., Costa, H. A., Clinical Genome Resource (ClinGen) 2022; 14 (1): 6

Abstract

BACKGROUND: Identification of clinically significant genetic alterations involved in human disease has been dramatically accelerated by developments in next-generation sequencing technologies. However, the infrastructure and accessible comprehensive curation tools necessary for analyzing an individual patient genome and interpreting genetic variants to inform healthcare management have been lacking. RESULTS: Here we present the ClinGen Variant Curation Interface (VCI), a global open-source variant classification platform for supporting the application of evidence criteria and classification of variants based on the ACMG/AMP variant classification guidelines. The VCI is among a suite of tools developed by the NIH-funded Clinical Genome Resource (ClinGen) Consortium and supports an FDA-recognized human variant curation process. Essential to this is the ability to enable collaboration and peer review across ClinGen Expert Panels supporting users in comprehensively identifying, annotating, and sharing relevant evidence while making variant pathogenicity assertions. To facilitate evidence-based improvements in human variant classification, the VCI is publicly available to the genomics community. Navigation workflows support users providing guidance to comprehensively apply the ACMG/AMP evidence criteria and document provenance for asserting variant classifications. CONCLUSIONS: The VCI offers a central platform for clinical variant classification that fills a gap in the learning healthcare system, facilitates widespread adoption of standards for clinical curation, and is available at https://curation.clinicalgenome.org.

View details for DOI 10.1186/s13073-021-01004-8

View details for PubMedID 35039090
New Data and Collaborations at the Saccharomyces Genome Database: Updated reference genome, alleles, and the Alliance of Genome Resources. Genetics Engel, S. R., Wong, E. D., Nash, R. S., Aleksander, S., Alexander, M., Douglass, E., Karra, K., Miyasato, S. R., Simison, M., Skrzypek, M. S., Weng, S., Cherry, J. M. 2022

Abstract

Saccharomyces cerevisiae is used to provide fundamental understanding of eukaryotic genetics, gene product function, and cellular biological processes. Saccharomyces Genome Database (SGD) has been supporting the yeast research community since 1993, serving as its de facto hub. Over the years, SGD has maintained the genetic nomenclature, chromosome maps, and functional annotation, and developed various tools and methods for analysis and curation of a variety of emerging data types. More recently, SGD and six other model organism focused knowledgebases have come together to create the Alliance of Genome Resources to develop sustainable genome information resources that promote and support the use of various model organisms to understand the genetic and genomic bases of human biology and disease. Here we describe recent activities at SGD, including the latest reference genome annotation update, the development of a curation system for mutant alleles, and new pages addressing homology across model organisms as well as the use of yeast to study human disease.

View details for DOI 10.1093/genetics/iyab224

View details for PubMedID 34897464
The Gene Ontology resource: enriching a GOld mine. Nucleic acids research Carbon, S., Douglass, E., Good, B. M., Unni, D. R., Harris, N. L., Mungall, C. J., Basu, S., Chisholm, R. L., Dodson, R. J., Hartline, E., Fey, P., Thomas, P. D., Albou, L., Ebert, D., Kesling, M. J., Mi, H., Muruganujan, A., Huang, X., Mushayahama, T., LaBonte, S. A., Siegele, D. A., Antonazzo, G., Attrill, H., Brown, N. H., Garapati, P., Marygold, S. J., Trovisco, V., Dos Santos, G., Falls, K., Tabone, C., Zhou, P., Goodman, J. L., Strelets, V. B., Thurmond, J., Garmiri, P., Ishtiaq, R., Rodriguez-Lopez, M., Acencio, M. L., Kuiper, M., Laegreid, A., Logie, C., Lovering, R. C., Kramarz, B., Saverimuttu, S. C. C., Pinheiro, S. M., Gunn, H., Su, R., Thurlow, K. E., Chibucos, M., Giglio, M., Nadendla, S., Munro, J., Jackson, R., Duesbury, M. J., Del-Toro, N., Meldal, B. H. M., Paneerselvam, K., Perfetto, L., Porras, P., Orchard, S., Shrivastava, A., Chang, H., Finn, R., Mitchell, A., Rawlings, N., Richardson, L., Sangrador-Vegas, A., Blake, J. A., Christie, K. R., Dolan, M. E., Drabkin, H. J., Hill, D. P., Ni, L., Sitnikov, D. M., Harris, M. A., Oliver, S. G., Rutherford, K., Wood, V., Hayles, J., Bahler, J., Bolton, E. R., De Pons, J. L., Dwinell, M. R., Hayman, G., Kaldunski, M. L., Kwitek, A. E., Laulederkind, S. J. F., Plasterer, C., Tutaj, M. A., Vedi, M., Wang, S., D'Eustachio, P., Matthews, L., Balhoff, J. P., Aleksander, S. A., Alexander, M. J., Cherry, J., Engel, S. R., Gondwe, F., Karra, K., Miyasato, S. R., Nash, R. S., Simison, M., Skrzypek, M. S., Weng, S., Wong, E. D., Feuermann, M., Gaudet, P., Morgat, A., Bakker, E., Berardini, T. Z., Reiser, L., Subramaniam, S., Huala, E., Arighi, C. N., Auchincloss, A., Axelsen, K., Argoud-Puy, G., Bateman, A., Blatter, M., Boutet, E., Bowler, E., Breuza, L., Bridge, A., Britto, R., Bye-A-Jee, H., Casas, C., Coudert, E., Denny, P., Estreicher, A., Famiglietti, M., Georghiou, G., Gos, A., Gruaz-Gumowski, N., Hatton-Ellis, E., Hulo, C., Ignatchenko, A., Jungo, F., Laiho, K., Le Mercier, P., Lieberherr, D., Lock, A., Lussi, Y., MacDougall, A., Magrane, M., Martin, M. J., Masson, P., Natale, D. A., Hyka-Nouspikel, N., Orchard, S., Pedruzzi, I., Pourcel, L., Poux, S., Pundir, S., Rivoire, C., Speretta, E., Sundaram, S., Tyagi, N., Warner, K., Zaru, R., Wu, C. H., Diehl, A. D., Chan, J. N., Grove, C., Lee, R. Y. N., Muller, H., Raciti, D., Van Auken, K., Sternberg, P. W., Berriman, M., Paulini, M., Howe, K., Gao, S., Wright, A., Stein, L., Howe, D. G., Toro, S., Westerfield, M., Jaiswal, P., Cooper, L., Elser, J., Gene Ontology Consortium 2021; 49 (D1): D325–D334

Abstract

The Gene Ontology Consortium (GOC) provides the most comprehensive resource currently available for computable knowledge regarding the functions of genes and gene products. Here, we report the advances of the consortium over the past two years. The new GO-CAM annotation framework was notably improved, and we formalized the model with a computational schema to check and validate the rapidly increasing repository of 2838 GO-CAMs. In addition, we describe the impacts of several collaborations to refine GO and report a 10% increase in the number of GO annotations, a 25% increase in annotated gene products, and over 9,400 new scientific articles annotated. As the project matures, we continue our efforts to review older annotations in light of newer findings, and, to maintain consistency with other ontologies. As a result, 20 000 annotations derived from experimental data were reviewed, corresponding to 2.5% of experimental GO annotations. The website (http://geneontology.org) was redesigned for quick access to documentation, downloads and tools. To maintain an accurate resource and support traceability and reproducibility, we have made available a historical archive covering the past 15 years of GO data with a consistent format and file structure for both the ontology and annotations.

View details for DOI 10.1093/nar/gkaa1113

View details for Web of Science ID 000608437800042

View details for PubMedID 33290552

View details for PubMedCentralID PMC7779012
Data Sanitization to Reduce Private Information Leakage from Functional Genomics. Cell Gursoy, G., Emani, P., Brannon, C. M., Jolanki, O. A., Harmanci, A., Strattan, J. S., Cherry, J. M., Miranker, A. D., Gerstein, M. 2020; 183 (4): 905

Abstract

The generation of functional genomics datasets is surging, because they provide insight into gene regulation and organismal phenotypes (e.g., genes upregulated in cancer). The intent behind functional genomics experiments is not necessarily to study genetic variants, yet they pose privacy concerns due to their use of next-generation sequencing. Moreover, there is a great incentive to broadly share raw reads for better statistical power and general research reproducibility. Thus, we need new modes of sharing beyond traditional controlled-access models. Here, we develop a data-sanitization procedure allowing raw functional genomics reads to be shared while minimizing privacy leakage, enabling principled privacy-utility trade-offs. Our protocol works with traditional Illumina-based assays and newer technologies such as 10x single-cell RNA sequencing. It involves quantifying the privacy leakage in reads by statistically linking study participants to known individuals. We carried out these linkages using data from highly accurate reference genomes and more realistic environmental samples.

View details for DOI 10.1016/j.cell.2020.09.036

View details for PubMedID 33186529
Perspectives on ENCODE. Nature ENCODE Project Consortium, Snyder, M. P., Gingeras, T. R., Moore, J. E., Weng, Z., Gerstein, M. B., Ren, B., Hardison, R. C., Stamatoyannopoulos, J. A., Graveley, B. R., Feingold, E. A., Pazin, M. J., Pagan, M., Gilchrist, D. A., Hitz, B. C., Cherry, J. M., Bernstein, B. E., Mendenhall, E. M., Zerbino, D. R., Frankish, A., Flicek, P., Myers, R. M., Abascal, F., Acosta, R., Addleman, N. J., Adrian, J., Afzal, V., Aken, B., Akiyama, J. A., Jammal, O. A., Amrhein, H., Anderson, S. M., Andrews, G. R., Antoshechkin, I., Ardlie, K. G., Armstrong, J., Astley, M., Banerjee, B., Barkal, A. A., Barnes, I. H., Barozzi, I., Barrell, D., Barson, G., Bates, D., Baymuradov, U. K., Bazile, C., Beer, M. A., Beik, S., Bender, M. A., Bennett, R., Bouvrette, L. P., Bernstein, B. E., Berry, A., Bhaskar, A., Bignell, A., Blue, S. M., Bodine, D. M., Boix, C., Boley, N., Borrman, T., Borsari, B., Boyle, A. P., Brandsmeier, L. A., Breschi, A., Bresnick, E. H., Brooks, J. A., Buckley, M., Burge, C. B., Byron, R., Cahill, E., Cai, L., Cao, L., Carty, M., Castanon, R. G., Castillo, A., Chaib, H., Chan, E. T., Chee, D. R., Chee, S., Chen, H., Chen, H., Chen, J., Chen, S., Cherry, J. M., Chhetri, S. B., Choudhary, J. S., Chrast, J., Chung, D., Clarke, D., Cody, N. A., Coppola, C. J., Coursen, J., D'Ippolito, A. M., Dalton, S., Danyko, C., Davidson, C., Davila-Velderrain, J., Davis, C. A., Dekker, J., Deran, A., DeSalvo, G., Despacio-Reyes, G., Dewey, C. N., Dickel, D. E., Diegel, M., Diekhans, M., Dileep, V., Ding, B., Djebali, S., Dobin, A., Dominguez, D., Donaldson, S., Drenkow, J., Dreszer, T. R., Drier, Y., Duff, M. O., Dunn, D., Eastman, C., Ecker, J. R., Edwards, M. D., El-Ali, N., Elhajjajy, S. I., Elkins, K., Emili, A., Epstein, C. B., Evans, R. C., Ezkurdia, I., Fan, K., Farnham, P. J., Farrell, N., Feingold, E. A., Ferreira, A., Fisher-Aylor, K., Fitzgerald, S., Flicek, P., Foo, C. 
S., Fortier, K., Frankish, A., Freese, P., Fu, S., Fu, X., Fu, Y., Fukuda-Yuzawa, Y., Fulciniti, M., Funnell, A. P., Gabdank, I., Galeev, T., Gao, M., Giron, C. G., Garvin, T. H., Gelboin-Burkhart, C. A., Georgolopoulos, G., Gerstein, M. B., Giardine, B. M., Gifford, D. K., Gilbert, D. M., Gilchrist, D. A., Gillespie, S., Gingeras, T. R., Gong, P., Gonzalez, A., Gonzalez, J. M., Good, P., Goren, A., Gorkin, D. U., Graveley, B. R., Gray, M., Greenblatt, J. F., Griffiths, E., Groudine, M. T., Grubert, F., Gu, M., Guigo, R., Guo, H., Guo, Y., Guo, Y., Gursoy, G., Gutierrez-Arcelus, M., Halow, J., Hardison, R. C., Hardy, M., Hariharan, M., Harmanci, A., Harrington, A., Harrow, J. L., Hashimoto, T. B., Hasz, R. D., Hatan, M., Haugen, E., Hayes, J. E., He, P., He, Y., Heidari, N., Hendrickson, D., Heuston, E. F., Hilton, J. A., Hitz, B. C., Hochman, A., Holgren, C., Hou, L., Hou, S., Hsiao, Y. E., Hsu, S., Huang, H., Hubbard, T. J., Huey, J., Hughes, T. R., Hunt, T., Ibarrientos, S., Issner, R., Iwata, M., Izuogu, O., Jaakkola, T., Jameel, N., Jansen, C., Jiang, L., Jiang, P., Johnson, A., Johnson, R., Jungreis, I., Kadaba, M., Kasowski, M., Kasparian, M., Kato, M., Kaul, R., Kawli, T., Kay, M., Keen, J. C., Keles, S., Keller, C. A., Kelley, D., Kellis, M., Kheradpour, P., Kim, D. S., Kirilusha, A., Klein, R. J., Knoechel, B., Kuan, S., Kulik, M. J., Kumar, S., Kundaje, A., Kutyavin, T., Lagarde, J., Lajoie, B. R., Lambert, N. J., Lazar, J., Lee, A. Y., Lee, D., Lee, E., Lee, J. W., Lee, K., Leslie, C. S., Levy, S., Li, B., Li, H., Li, N., Li, X., Li, Y. I., Li, Y., Li, Y., Li, Y., Lian, J., Libbrecht, M. W., Lin, S., Lin, Y., Liu, D., Liu, J., Liu, P., Liu, T., Liu, X. S., Liu, Y., Liu, Y., Long, M., Lou, S., Loveland, J., Lu, A., Lu, Y., Lecuyer, E., Ma, L., Mackiewicz, M., Mannion, B. J., Mannstadt, M., Manthravadi, D., Marinov, G. K., Martin, F. J., Mattei, E., McCue, K., McEown, M., McVicker, G., Meadows, S. K., Meissner, A., Mendenhall, E. M., Messer, C. 
L., Meuleman, W., Meyer, C., Miller, S., Milton, M. G., Mishra, T., Moore, D. E., Moore, H. M., Moore, J. E., Moore, S. H., Moran, J., Mortazavi, A., Mudge, J. M., Munshi, N., Murad, R., Myers, R. M., Nandakumar, V., Nandi, P., Narasimha, A. M., Narayanan, A. K., Naughton, H., Navarro, F. C., Navas, P., Nazarovs, J., Nelson, J., Neph, S., Neri, F. J., Nery, J. R., Nesmith, A. R., Newberry, J. S., Newberry, K. M., Ngo, V., Nguyen, R., Nguyen, T. B., Nguyen, T., Nishida, A., Noble, W. S., Novak, C. S., Novoa, E. M., Nunez, B., O'Donnell, C. W., Olson, S., Onate, K. C., Otterman, E., Ozadam, H., Pagan, M., Palden, T., Pan, X., Park, Y., Partridge, E. C., Paten, B., Pauli-Behn, F., Pazin, M. J., Pei, B., Pennacchio, L. A., Perez, A. R., Perry, E. H., Pervouchine, D. D., Phalke, N. N., Pham, Q., Phanstiel, D. H., Plajzer-Frick, I., Pratt, G. A., Pratt, H. E., Preissl, S., Pritchard, J. K., Pritykin, Y., Purcaro, M. J., Qin, Q., Quinones-Valdez, G., Rabano, I., Radovani, E., Raj, A., Rajagopal, N., Ram, O., Ramirez, L., Ramirez, R. N., Rausch, D., Raychaudhuri, S., Raymond, J., Razavi, R., Reddy, T. E., Reimonn, T. M., Ren, B., Reymond, A., Reynolds, A., Rhie, S. K., Rinn, J., Rivera, M., Rivera-Mulia, J. C., Roberts, B., Rodriguez, J. M., Rozowsky, J., Ryan, R., Rynes, E., Salins, D. N., Sandstrom, R., Sasaki, T., Sathe, S., Savic, D., Scavelli, A., Scheiman, J., Schlaffner, C., Schloss, J. A., Schmitges, F. W., See, L. H., Sethi, A., Setty, M., Shafer, A., Shan, S., Sharon, E., Shen, Q., Shen, Y., Sherwood, R. I., Shi, M., Shin, S., Shoresh, N., Siebenthall, K., Sisu, C., Slifer, T., Sloan, C. A., Smith, A., Snetkova, V., Snyder, M. P., Spacek, D. V., Srinivasan, S., Srivas, R., Stamatoyannopoulos, G., Stamatoyannopoulos, J. A., Stanton, R., Steffan, D., Stehling-Sun, S., Strattan, J. S., Su, A., Sundararaman, B., Suner, M., Syed, T., Szynkarek, M., Tanaka, F. Y., Tenen, D., Teng, M., Thomas, J. A., Toffey, D., Tress, M. L., Trout, D. 
E., Trynka, G., Tsuji, J., Upchurch, S. A., Ursu, O., Uszczynska-Ratajczak, B., Uziel, M. C., Valencia, A., Biber, B. V., van der Velde, A. G., Van Nostrand, E. L., Vaydylevich, Y., Vazquez, J., Victorsen, A., Vielmetter, J., Vierstra, J., Visel, A., Vlasova, A., Vockley, C. M., Volpi, S., Vong, S., Wang, H., Wang, M., Wang, Q., Wang, R., Wang, T., Wang, W., Wang, X., Wang, Y., Watson, N. K., Wei, X., Wei, Z., Weisser, H., Weissman, S. M., Welch, R., Welikson, R. E., Weng, Z., Westra, H., Whitaker, J. W., White, C., White, K. P., Wildberg, A., Williams, B. A., Wine, D., Witt, H. N., Wold, B., Wolf, M., Wright, J., Xiao, R., Xiao, X., Xu, J., Xu, J., Yan, K., Yan, Y., Yang, H., Yang, X., Yang, Y., Yardimci, G. G., Yee, B. A., Yeo, G. W., Young, T., Yu, T., Yue, F., Zaleski, C., Zang, C., Zeng, H., Zeng, W., Zerbino, D. R., Zhai, J., Zhan, L., Zhan, Y., Zhang, B., Zhang, J., Zhang, J., Zhang, K., Zhang, L., Zhang, P., Zhang, Q., Zhang, X., Zhang, Y., Zhang, Z., Zhao, Y., Zheng, Y., Zhong, G., Zhou, X., Zhu, Y., Zimmerman, J. 2020; 583 (7818): 693–98

Abstract

The Encyclopedia of DNA Elements (ENCODE) Project launched in 2003 with the long-term goal of developing a comprehensive map of functional elements in the human genome. These included genes, biochemical regions associated with gene regulation (for example, transcription factor binding sites, open chromatin, and histone marks) and transcript isoforms. The marks serve as sites for candidate cis-regulatory elements (cCREs) that may serve functional roles in regulating gene expression. The project has been extended to model organisms, particularly the mouse. In the third phase of ENCODE, nearly a million and more than 300,000 cCRE annotations have been generated for human and mouse, respectively, and these have provided a valuable resource for the scientific community.

View details for DOI 10.1038/s41586-020-2449-8

View details for PubMedID 32728248
An atlas of dynamic chromatin landscapes in mouse fetal development. Nature Gorkin, D. U., Barozzi, I., Zhao, Y., Zhang, Y., Huang, H., Lee, A. Y., Li, B., Chiou, J., Wildberg, A., Ding, B., Zhang, B., Wang, M., Strattan, J. S., Davidson, J. M., Qiu, Y., Afzal, V., Akiyama, J. A., Plajzer-Frick, I., Novak, C. S., Kato, M., Garvin, T. H., Pham, Q. T., Harrington, A. N., Mannion, B. J., Lee, E. A., Fukuda-Yuzawa, Y., He, Y., Preissl, S., Chee, S., Han, J. Y., Williams, B. A., Trout, D., Amrhein, H., Yang, H., Cherry, J. M., Wang, W., Gaulton, K., Ecker, J. R., Shen, Y., Dickel, D. E., Visel, A., Pennacchio, L. A., Ren, B. 2020; 583 (7818): 744–51

Abstract

The Encyclopedia of DNA Elements (ENCODE) project has established a genomic resource for mammalian development, profiling a diverse panel of mouse tissues at 8 developmental stages from 10.5 days after conception until birth, including transcriptomes, methylomes and chromatin states. Here we systematically examined the state and accessibility of chromatin in the developing mouse fetus. In total we performed 1,128 chromatin immunoprecipitation with sequencing (ChIP-seq) assays for histone modifications and 132 assay for transposase-accessible chromatin using sequencing (ATAC-seq) assays for chromatin accessibility across 72 distinct tissue-stages. We used integrative analysis to develop a unified set of chromatin state annotations, infer the identities of dynamic enhancers and key transcriptional regulators, and characterize the relationship between chromatin state and accessibility during developmental gene regulation. We also leveraged these data to link enhancers to putative target genes and demonstrate tissue-specific enrichments of sequence variants associated with disease in humans. The mouse ENCODE data sets provide a compendium of resources for biomedical researchers and achieve, to our knowledge, the most comprehensive view of chromatin dynamics during mammalian fetal development to date.

View details for DOI 10.1038/s41586-020-2093-3

View details for PubMedID 32728240
CNN-Peaks: ChIP-Seq peak detection pipeline using convolutional neural networks that imitate human visual inspection. Scientific reports Oh, D., Strattan, J. S., Hur, J. K., Bento, J., Urban, A. E., Song, G., Cherry, J. M. 2020; 10 (1): 7933

Abstract

ChIP-seq is one of the core experimental resources available to understand genome-wide epigenetic interactions and identify the functional elements associated with diseases. The analysis of ChIP-seq data is important but poses a difficult computational challenge, due to the presence of irregular noise and bias on various levels. Although many peak-calling methods have been developed, the current computational tools still require, in some cases, human manual inspection using data visualization. However, the huge volumes of ChIP-seq data make it almost impossible for human researchers to manually uncover all the peaks. Recently developed convolutional neural networks (CNN), which are capable of achieving human-like classification accuracy, can be applied to this challenging problem. In this study, we design a novel supervised learning approach for identifying ChIP-seq peaks using CNNs, and integrate it into a software pipeline called CNN-Peaks. We use data labeled by human researchers who annotate the presence or absence of peaks in some genomic segments, as training data for our model. The trained model is then applied to predict peaks in previously unseen genomic segments from multiple ChIP-seq datasets including benchmark datasets commonly used for validation of peak calling methods. We observe a performance superior to that of previous methods.

View details for DOI 10.1038/s41598-020-64655-4

View details for PubMedID 32404971
Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature Moore, J. E., Purcaro, M. J., Pratt, H. E., Epstein, C. B., Shoresh, N., Adrian, J., Kawli, T., Davis, C. A., Dobin, A., Kaul, R., Halow, J., Van Nostrand, E. L., Freese, P., Gorkin, D. U., Shen, Y., He, Y., Mackiewicz, M., Pauli-Behn, F., Williams, B. A., Mortazavi, A., Keller, C. A., Zhang, X. O., Elhajjajy, S. I., Huey, J., Dickel, D. E., Snetkova, V., Wei, X., Wang, X., Rivera-Mulia, J. C., Rozowsky, J., Zhang, J., Chhetri, S. B., Zhang, J., Victorsen, A., White, K. P., Visel, A., Yeo, G. W., Burge, C. B., Lécuyer, E., Gilbert, D. M., Dekker, J., Rinn, J., Mendenhall, E. M., Ecker, J. R., Kellis, M., Klein, R. J., Noble, W. S., Kundaje, A., Guigó, R., Farnham, P. J., Cherry, J. M., Myers, R. M., Ren, B., Graveley, B. R., Gerstein, M. B., Pennacchio, L. A., Snyder, M. P., Bernstein, B. E., Wold, B., Hardison, R. C., Gingeras, T. R., Stamatoyannopoulos, J. A., Weng, Z. 2020; 583 (7818): 699–710

Abstract

The human and mouse genomes contain instructions that specify RNAs and proteins and govern the timing, magnitude, and cellular context of their production. To better delineate these elements, phase III of the Encyclopedia of DNA Elements (ENCODE) Project has expanded analysis of the cell and tissue repertoires of RNA transcription, chromatin structure and modification, DNA methylation, chromatin looping, and occupancy by transcription factors and RNA-binding proteins. Here we summarize these efforts, which have produced 5,992 new experimental datasets, including systematic determinations across mouse fetal development. All data are available through the ENCODE data portal (https://www.encodeproject.org), including phase II ENCODE and Roadmap Epigenomics data. We have developed a registry of 926,535 human and 339,815 mouse candidate cis-regulatory elements, covering 7.9 and 3.4% of their respective genomes, by integrating selected datatypes associated with gene regulation, and constructed a web-based server (SCREEN; http://screen.encodeproject.org) to provide flexible, user-defined access to this resource. Collectively, the ENCODE data and registry provide an expansive resource for the scientific community to build a better understanding of the organization and function of the human and mouse genomes.

View details for DOI 10.1038/s41586-020-2493-4

View details for PubMedID 32728249
The Alliance of Genome Resources: Building a Modern Data Ecosystem for Model Organism Databases GENETICS Bult, C. J., Blake, J. A., Calvi, B. R., Cherry, J., DiFrancesco, V., Fullem, R., Howe, K. L., Kaufman, T., Mungall, C., Perrimon, N., Shimoyama, M., Sternberg, P. W., Thomas, P., Westerfield, M., Alliance Genome Resources Consorti 2019; 213 (4): 1189–96

Abstract

Model organisms are essential experimental platforms for discovering gene functions, defining protein and genetic networks, uncovering functional consequences of human genome variation, and for modeling human disease. For decades, researchers who use model organisms have relied on Model Organism Databases (MODs) and the Gene Ontology Consortium (GOC) for expertly curated annotations, and for access to integrated genomic and biological information obtained from the scientific literature and public data archives. Through the development and enforcement of data and semantic standards, these genome resources provide rapid access to the collected knowledge of model organisms in human readable and computation-ready formats that would otherwise require countless hours for individual researchers to assemble on their own. Since their inception, the MODs for the predominant biomedical model organisms [Mus sp (laboratory mouse), Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans, Danio rerio, and Rattus norvegicus] along with the GOC have operated as a network of independent, highly collaborative genome resources. In 2016, these six MODs and the GOC joined forces as the Alliance of Genome Resources (the Alliance). By implementing shared programmatic access methods and data-specific web pages with a unified "look and feel," the Alliance is tackling barriers that have limited the ability of researchers to easily compare common data types and annotations across model organisms. To adapt to the rapidly changing landscape for evaluating and funding core data resources, the Alliance is building a modern, extensible, and operationally efficient "knowledge commons" for model organisms using shared, modular infrastructure.

View details for DOI 10.1534/genetics.119.302523

View details for Web of Science ID 000501177400004

View details for PubMedID 31796553

View details for PubMedCentralID PMC6893393
Transcriptome visualization and data availability at the Saccharomyces Genome Database. Nucleic acids research Ng, P. C., Wong, E. D., MacPherson, K. A., Aleksander, S., Argasinska, J., Dunn, B., Nash, R. S., Skrzypek, M. S., Gondwe, F., Jha, S., Karra, K., Weng, S., Miyasato, S., Simison, M., Engel, S. R., Cherry, J. M. 2019

Abstract

The Saccharomyces Genome Database (SGD; www.yeastgenome.org) maintains the official annotation of all genes in the Saccharomyces cerevisiae reference genome and aims to elucidate the function of these genes and their products by integrating manually curated experimental data. Technological advances have allowed researchers to profile RNA expression and identify transcripts at high resolution. These data can be configured in web-based genome browser applications for display to the general public. Accordingly, SGD has incorporated published transcript isoform data in our instance of JBrowse, a genome visualization platform. This resource will help clarify S. cerevisiae biological processes by furthering studies of transcriptional regulation, untranslated regions, genome engineering, and expression quantification in S. cerevisiae.

View details for DOI 10.1093/nar/gkz892

View details for PubMedID 31612944
RNAcentral: a hub of information for non-coding RNA sequences (vol 47, pg D221, 2019) NUCLEIC ACIDS RESEARCH Sweeney, B. A., Petrov, A. I., Burkov, B., Finn, R. D., Bateman, A., Szymanski, M., Karlowski, W. M., Gorodkin, J., Seemann, S. E., Cannone, J. J., Gutell, R. R., Fey, P., Basu, S., Kay, S., Cochrane, G., Billis, K., Emmert, D., Marygold, S. J., Huntley, R. P., Lovering, R. C., Frankish, A., Chan, P. P., Lowe, T. M., Bruford, E., Seal, R., Vandesompele, J., Volders, P., Paraskevopoulou, M., Ma, L., Zhang, Z., Griffiths-Jones, S., Bujnicki, J. M., Boccaletto, P., Blake, J. A., Bult, C. J., Chen, R., Zhao, Y., Wood, V., Rutherford, K., Rivas, E., Cole, J., Laulederkind, S. J. F., Shimoyama, M., Gillespie, M. E., Orlic-Milacic, M., Kalvari, I., Nawrocki, E., Engel, S. R., Cherry, J., Berardini, T. Z., Hatzigeorgiou, A., Karagkouni, D., Howe, K., Davis, P., Dinger, M., He, S., Yoshihama, M., Kenmochi, N., Stadler, P. F., Williams, K. P., RNAcentral Consortium, SILVA Team 2019; 47 (D1): D1250–D1251

View details for DOI 10.1093/nar/gky1206

View details for Web of Science ID 000462587400170
The Gene Ontology Resource: 20 years and still GOing strong NUCLEIC ACIDS RESEARCH Carbon, S., Douglass, E., Dunn, N., Good, B., Harris, N. L., Lewis, S. E., Mungall, C. J., Basu, S., Chisholm, R. L., Dodson, R. J., Hartline, E., Fey, P., Thomas, P. D., Albou, L. P., Ebert, D., Kesling, M. J., Mi, H., Muruganujian, A., Huang, X., Poudel, S., Mushayahama, T., Hu, J. C., LaBonte, S. A., Siegele, D. A., Antonazzo, G., Attrill, H., Brown, N. H., Fexova, S., Garapati, P., Jones, T. M., Marygold, S. J., Millburn, G. H., Rey, A. J., Trovisco, V., dos Santos, G., Emmert, D. B., Falls, K., Zhou, P., Goodman, J. L., Strelets, V. B., Thurmond, J., Courtot, M., Osumi-Sutherland, D., Parkinson, H., Roncaglia, P., Acencio, M. L., Kuiper, M., Laegreid, A., Logie, C., Lovering, R. C., Huntley, R. P., Denny, P., Campbell, N. H., Kramarz, B., Acquaah, V., Ahmad, S. H., Chen, H., Rawson, J. H., Chibucos, M. C., Giglio, M., Nadendla, S., Tauber, R., Duesbury, M. J., Del-Toro, N., Meldal, B. M., Perfetto, L., Porras, P., Orchard, S., Shrivastava, A., Xie, Z., Chang, H. Y., Finn, R. D., Mitchell, A. L., Rawlings, N. D., Richardson, L., Sangrador-Vegas, A., Blake, J. A., Christie, K. R., Dolan, M. E., Drabkin, H. J., Hill, D. P., Ni, L., Sitnikov, D., Harris, M. A., Oliver, S. G., Ruther-Ford, K., Wood, V., Hayles, J., Bahler, J., Lock, A., Bolton, E. R., De Pons, J., Dwinell, M., Hayman, G. T., Laulederkind, S. F., Shimoyama, M., Tutaj, M., Wang, S., D'Eustachio, P., Matthews, L., Balhoff, J. P., Aleksander, S. A., Binkley, G., Dunn, B. L., Cherry, J. M., Engel, S. R., Gondwe, F., Karra, K., MacPherson, K. A., Miyasato, S. R., Nash, R. S., Ng, P. C., Sheppard, T. K., Shrivatsav, A. P., Simison, M., Skrzypek, M. S., Weng, S., Wong, E. D., Feuermann, M., Gaudet, P., Bakker, E., Berardini, T. 
Z., Reiser, L., Subramaniam, S., Huala, E., Arighi, C., Auchincloss, A., Axelsen, K., Argoud-Puy, G., Bateman, A., Bely, B., Blatter, M., Boutet, E., Breuza, L., Bridge, A., Britto, R., Bye-A-Jee, H., Casals-Casas, C., Coudert, E., Estreicher, A., Famiglietti, L., Garmiri, P., Georghiou, G., Gos, A., Gruaz-Gumowski, N., Hatton-Ellis, E., Hinz, U., Hulo, C., Ignatchenko, A., Jungo, F., Keller, G., Laiho, K., Lemercier, P., Lieberherr, D., Lussi, Y., Mac-Dougall, A., Magrane, M., Martin, M. J., Masson, P., Natale, D. A., Hyka-Nouspikel, N., Pedruzzi, I., Pichler, K., Poux, S., Rivoire, C., Rodriguez-Lopez, M., Sawford, T., Speretta, E., Shypitsyna, A., Stutz, A., Sundaram, S., Tognolli, M., Tyagi, N., Warner, K., Zaru, R., Wu, C., Diehl, A. D., Chan, J., Cho, J., Gao, S., Grove, C., Harrison, M. C., Howe, K., Lee, R., Mendel, J., Muller, H., Raciti, D., Van Auken, K., Berriman, M., Stein, L., Sternberg, P. W., Howe, D., Toro, S., Westerfield, M., Gene Ontology Consortium 2019; 47 (D1): D330–D338

Abstract

The Gene Ontology resource (GO; http://geneontology.org) provides structured, computable knowledge regarding the functions of genes and gene products. Founded in 1998, GO has become widely adopted in the life sciences, and its contents are under continual improvement, both in quantity and in quality. Here, we report the major developments of the GO resource during the past two years. Each monthly release of the GO resource is now packaged and given a unique identifier (DOI), enabling GO-based analyses on a specific release to be reproduced in the future. The molecular function ontology has been refactored to better represent the overall activities of gene products, with a focus on transcription regulator activities. Quality assurance efforts have been ramped up to address potentially out-of-date or inaccurate annotations. New evidence codes for high-throughput experiments now enable users to filter out annotations obtained from these sources. GO-CAM, a new framework for representing gene function that is more expressive than standard GO annotations, has been released, and users can now explore the growing repository of these models. We also provide the 'GO ribbon' widget for visualizing GO annotations to a gene; the widget can be easily embedded in any web page.

View details for DOI 10.1093/nar/gky1055

View details for Web of Science ID 000462587400049

View details for PubMedID 30395331

View details for PubMedCentralID PMC6323945
New developments on the Encyclopedia of DNA Elements (ENCODE) data portal. Nucleic acids research Luo, Y., Hitz, B. C., Gabdank, I., Hilton, J. A., Kagda, M. S., Lam, B., Myers, Z., Sud, P., Jou, J., Lin, K., Baymuradov, U. K., Graham, K., Litton, C., Miyasato, S. R., Strattan, J. S., Jolanki, O., Lee, J. W., Tanaka, F. Y., Adenekan, P., O'Neill, E., Cherry, J. M. 2019

Abstract

The Encyclopedia of DNA Elements (ENCODE) is an ongoing collaborative research project aimed at identifying all the functional elements in the human and mouse genomes. Data generated by the ENCODE consortium are freely accessible at the ENCODE portal (https://www.encodeproject.org/), which is developed and maintained by the ENCODE Data Coordinating Center (DCC). Since the initial portal release in 2013, the ENCODE DCC has updated the portal to make ENCODE data more findable, accessible, interoperable and reusable. Here, we report on recent updates, including new ENCODE data and assays, ENCODE uniform data processing pipelines, new visualization tools, a dataset cart feature, unrestricted public access to ENCODE data on the cloud (Amazon Web Services open data registry, https://registry.opendata.aws/encode-project/) and more comprehensive tutorials and documentation.

View details for DOI 10.1093/nar/gkz1062

View details for PubMedID 31713622
Integrative Meta-Assembly Pipeline (IMAP): Chromosome-level genome assembler combining multiple de novo assemblies. PloS one Song, G., Lee, J., Kim, J., Kang, S., Lee, H., Kwon, D., Lee, D., Lang, G. I., Cherry, J. M., Kim, J. 2019; 14 (8): e0221858

Abstract

Genomic data have become major resources to understand complex mechanisms at fine-scale temporal and spatial resolution in functional and evolutionary genetic studies, including human diseases, such as cancers. Recently, a large number of whole genomes of evolving populations of yeast (Saccharomyces cerevisiae W303 strain) were sequenced in a time-dependent manner to identify temporal evolutionary patterns. For this type of study, a chromosome-level sequence assembly of the strain or population at time zero is required to compare with the genomes derived later. However, there is no fully automated computational approach in experimental evolution studies to establish the chromosome-level genome assembly using unique features of sequencing data. In this study, we developed a new software pipeline, the integrative meta-assembly pipeline (IMAP), to build chromosome-level genome sequence assemblies by generating and combining multiple initial assemblies using three de novo assemblers from short-read sequencing data. We significantly improved the continuity and accuracy of the genome assembly using a large collection of sequencing data and hybrid assembly approaches. We validated our pipeline by generating chromosome-level assemblies of yeast strains W303 and SK1, and compared our results with assemblies built using long-read sequencing and various assembly evaluation metrics. We also constructed chromosome-level sequence assemblies of S. cerevisiae strain Sigma1278b, and three commonly used fungal strains: Aspergillus nidulans A713, Neurospora crassa 73, and Thielavia terrestris CBS 492.74, for which long-read sequencing data are not yet available. Finally, we examined the effect of IMAP parameters, such as reference and resolution, on the quality of the final assembly of the yeast strains W303 and SK1. We developed a cost-effective pipeline to generate chromosome-level sequence assemblies using only short-read sequencing data. Our pipeline combines the strengths of reference-guided and meta-assembly approaches. Our pipeline is available online at http://github.com/jkimlab/IMAP including a Docker image, as well as a Perl script, to help users install the IMAP package, including several prerequisite programs. Users can use IMAP to easily build the chromosome-level assembly for the genome of their interest.

View details for DOI 10.1371/journal.pone.0221858

View details for PubMedID 31454399
Prevention of data duplication for high throughput sequencing repositories DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION Gabdank, I., Chan, E. T., Davidson, J. M., Hilton, J. A., Davis, C. A., Baymuradov, U. K., Narayanan, A., Onate, K. C., Graham, K., Miyasato, S. R., Dreszer, T. R., Strattan, J., Jolanki, O., Tanaka, F. Y., Hitz, B. C., Sloan, C. A., Cherry, J. 2018

Abstract

https://www.encodeproject.org/.

View details for PubMedID 29688363
Updated regulation curation model at the Saccharomyces Genome Database DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION Engel, S. R., Skrzypek, M. S., Hellerstedt, S. T., Wong, E. D., Nash, R. S., Weng, S., Binkley, G., Sheppard, T. K., Karra, K., Cherry, J. 2018

Abstract

http://www.yeastgenome.org.

View details for PubMedID 29688362
Evaluating the Clinical Validity of Gene-Disease Associations: An Evidence-Based Framework Developed by the Clinical Genome Resource. American journal of human genetics Strande, N. T., Riggs, E. R., Buchanan, A. H., Ceyhan-Birsoy, O., DiStefano, M., Dwight, S. S., Goldstein, J., Ghosh, R., Seifert, B. A., Sneddon, T. P., Wright, M. W., Milko, L. V., Cherry, J. M., Giovanni, M. A., Murray, M. F., O'Daniel, J. M., Ramos, E. M., Santani, A. B., Scott, A. F., Plon, S. E., Rehm, H. L., Martin, C. L., Berg, J. S. 2017; 100 (6): 895-906

Abstract

With advances in genomic sequencing technology, the number of reported gene-disease relationships has rapidly expanded. However, the evidence supporting these claims varies widely, confounding accurate evaluation of genomic variation in a clinical setting. Despite the critical need to differentiate clinically valid relationships from less well-substantiated relationships, standard guidelines for such evaluation do not currently exist. The NIH-funded Clinical Genome Resource (ClinGen) has developed a framework to define and evaluate the clinical validity of gene-disease pairs across a variety of Mendelian disorders. In this manuscript we describe a proposed framework to evaluate relevant genetic and experimental evidence supporting or contradicting a gene-disease relationship and the subsequent validation of this framework using a set of representative gene-disease pairs. The framework provides a semiquantitative measurement for the strength of evidence of a gene-disease relationship that correlates to a qualitative classification: "Definitive," "Strong," "Moderate," "Limited," "No Reported Evidence," or "Conflicting Evidence." Within the ClinGen structure, classifications derived with this framework are reviewed and confirmed or adjusted based on clinical expertise of appropriate disease experts. Detailed guidance for utilizing this framework and access to the curation interface is available on our website. This evidence-based, systematic method to assess the strength of gene-disease relationships will facilitate more knowledgeable utilization of genomic variants in clinical and research settings.

View details for DOI 10.1016/j.ajhg.2017.04.015

View details for PubMedID 28552198
SnoVault and encodeD: A novel object-based storage system and applications to ENCODE metadata PLOS ONE Hitz, B. C., Rowe, L. D., Podduturi, N. R., Glick, D. I., Baymuradov, U. K., Malladi, V. S., Chan, E. T., Davidson, J. M., Gabdank, I., Narayana, A. K., Onate, K. C., Hilton, J., Ho, M. C., Lee, B. T., Miyasato, S. R., Dreszer, T. R., Sloan, C. A., Strattan, J. S., Tanaka, F. Y., Hong, E. L., Cherry, J. M. 2017; 12 (4)

Abstract

The Encyclopedia of DNA Elements (ENCODE) project is an ongoing collaborative effort to create a comprehensive catalog of functional elements initiated shortly after the completion of the Human Genome Project. The current database exceeds 6500 experiments across more than 450 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory and transcriptional landscape of the H. sapiens and M. musculus genomes. All ENCODE experimental data, metadata, and associated computational analyses are submitted to the ENCODE Data Coordination Center (DCC) for validation, tracking, storage, unified processing, and distribution to community resources and the scientific community. As the volume of data increases, the identification and organization of experimental details becomes increasingly intricate and demands careful curation. The ENCODE DCC has created a general purpose software system, known as SnoVault, that supports metadata and file submission, a database used for metadata storage, web pages for displaying the metadata and a robust API for querying the metadata. The software is fully open-source; code and installation instructions can be found at http://github.com/ENCODE-DCC/snovault/ (for the generic database) and http://github.com/ENCODE-DCC/encoded/ (to store genomic data in the manner of ENCODE). The core database engine, SnoVault (which is completely independent of ENCODE, genomic data, or bioinformatic data) has been released as a separate Python package.

View details for DOI 10.1371/journal.pone.0175310

View details for Web of Science ID 000399955200049

View details for PubMedID 28403240
Outreach and online training services at the Saccharomyces Genome Database DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION MacPherson, K. A., Starr, B., Wong, E. D., Dalusag, K. S., Hellerstedt, S. T., Lang, O., Nash, R. S., Skrzypek, M. S., Engel, S. R., Cherry, J. M. 2017

Abstract

The Saccharomyces Genome Database (SGD; www.yeastgenome.org), the primary genetics and genomics resource for the budding yeast S. cerevisiae, provides free public access to expertly curated information about the yeast genome and its gene products. As the central hub for the yeast research community, SGD engages in a variety of social outreach efforts to inform our users about new developments, promote collaboration, increase public awareness of the importance of yeast to biomedical research, and facilitate scientific discovery. Here we describe these various outreach methods, from networking at scientific conferences to the use of online media such as blog posts and webinars, and include our perspectives on the benefits provided by outreach activities for model organism databases. http://www.yeastgenome.org.

View details for DOI 10.1093/database/bax002

View details for Web of Science ID 000397529600002

View details for PubMedID 28365719
Active Interaction Mapping Reveals the Hierarchical Organization of Autophagy. Molecular cell Kramer, M. H., Farré, J., Mitra, K., Yu, M. K., Ono, K., Demchak, B., Licon, K., Flagg, M., Balakrishnan, R., Cherry, J. M., Subramani, S., Ideker, T. 2017; 65 (4): 761-774 e5

Abstract

We have developed a general progressive procedure, Active Interaction Mapping, to guide assembly of the hierarchy of functions encoding any biological system. Using this process, we assemble an ontology of functions comprising autophagy, a central recycling process implicated in numerous diseases. A first-generation model, built from existing gene networks in Saccharomyces, captures most known autophagy components in broad relation to vesicle transport, cell cycle, and stress response. Systematic analysis identifies synthetic-lethal interactions as most informative for further experiments; consequently, we saturate the model with 156,364 such measurements across autophagy-activating conditions. These targeted interactions provide more information about autophagy than all previous datasets, producing a second-generation ontology of 220 functions. Approximately half are previously unknown; we confirm roles for Gyp1 at the phagophore-assembly site, Atg24 in cargo engulfment, Atg26 in cytoplasm-to-vacuole targeting, and Ssd1, Did4, and others in selective and non-selective autophagy. The procedure and autophagy hierarchy are at http://atgo.ucsd.edu/.

View details for DOI 10.1016/j.molcel.2016.12.024

View details for PubMedID 28132844
Expansion of the Gene Ontology knowledgebase and resources NUCLEIC ACIDS RESEARCH Carbon, S., Dietze, H., Lewis, S. E., Mungall, C. J., Munoz-Torres, M. C., Basu, S., Chisholm, R. L., Dodson, R. J., Fey, P., Thomas, P. D., Mi, H., Muruganujan, A., Huang, X., Poudel, S., Hu, J. C., Aleksander, S. A., McIntosh, B. K., Renfro, D. P., Siegele, D. A., Antonazzo, G., Attrill, H., Brown, N. H., Marygold, S. J., McQuilton, P., Ponting, L., Millburn, G. H., Rey, A. J., Stefancsik, R., Tweedie, S., Falls, K., Schroeder, A. J., Courtot, M., Osumi-Sutherland, D., Parkinson, H., Roncaglia, P., Lovering, R. C., Foulger, R. E., Huntley, R. P., Denny, P., Campbell, N. H., Kramarz, B., Patel, S., Buxton, J. L., Umrao, Z., Deng, A. T., Alrohaif, H., Mitchell, K., Ratnaraj, F., Omer, W., Rodriguez-Lopez, M., Chibucos, M. C., Giglio, M., Nadendla, S., Duesbury, M. J., Koch, M., Meldal, B. H., Melidoni, A., Porras, P., Orchard, S., Shrivastava, A., Chang, H. Y., Finn, R. D., Fraser, M., Mitchell, A. L., Nuka, G., Potter, S., Rawlings, N. D., Richardson, L., Sangrador-Vegas, A., Young, S. Y., Blake, J. A., Christie, K. R., Dolan, M. E., Drabkin, H. J., Hill, D. P., Ni, L., Sitnikov, D., Harris, M. A., Hayles, J., Oliver, S. G., Rutherford, K., Wood, V., Bahler, J., Lock, A., De Pons, J., Dwinell, M., Shimoyama, M., Laulederkind, S., Hayman, G. T., Tutaj, M., Wang, S., D'Eustachio, P., Matthews, L., Balhoff, J. P., Balakrishnan, R., Binkley, G., Cherry, J. M., Costanzo, M. C., Engel, S. R., Miyasato, S. R., Nash, R. S., Simison, M., Skrzypek, M. S., Weng, S., Wong, E. D., Feuermann, M., Gaudet, P., Berardini, T. Z., Li, D., Muller, B., Reiser, L., Huala, E., Argasinska, J., Arighi, C., Auchincloss, A., Axelsen, K., Argoud-Puy, G., Bateman, A., Bely, B., Blatter, M., Bonilla, C., Bougueleret, L., Boutet, E., Breuza, L., Bridge, A., Britto, R., Hye-A-Bye, H., Casals, C., Cibrian-Uhalte, E., Coudert, E., Cusin, I., Duek-Roggli, P., Estreicher, A., Famiglietti, L., Gane, P., Garmiri, P., Georghiou, G., Gos, A., Gruaz-Gumowski, N., Hatton-Ellis, E., Hinz, U., Holmes, A., Hulo, C., Jungo, F., Keller, G., Laiho, K., Lemercier, P., Lieberherr, D., MacDougall, A., Magrane, M., Martin, M. J., Masson, P., Natale, D. A., O'Donovan, C., Pedruzzi, I., Pichler, K., Poggioli, D., Poux, S., Rivoire, C., Roechert, B., Sawford, T., Schneider, M., Speretta, E., Shypitsyna, A., Stutz, A., Sundaram, S., Tognolli, M., Wu, C., Xenarios, I., Yeh, L., Chan, J., Gao, S., Howe, K., Kishore, R., Lee, R., Li, Y., Lomax, J., Muller, H., Raciti, D., Van Auken, K., Berriman, M., Stein, L., Kersey, P., Sternberg, P. W., Howe, D., Westerfield, M. 2017; 45 (D1): D331-D338

Abstract

The Gene Ontology (GO) is a comprehensive resource of computable knowledge regarding the functions of genes and gene products. As such, it is extensively used by the biomedical research community for the analysis of -omics and related data. Our continued focus is on improving the quality and utility of the GO resources, and we welcome and encourage input from researchers in all areas of biology. In this update, we summarize the current contents of the GO knowledgebase, and present several new features and improvements that have been made to the ontology, the annotations and the tools. Among the highlights are 1) developments that facilitate access to, and application of, the GO knowledgebase, and 2) extensions to the resource as well as increasing support for descriptions of causal models of biological systems and network biology. To learn more, visit http://geneontology.org/.

View details for DOI 10.1093/nar/gkw1108

View details for Web of Science ID 000396575500049

View details for PubMedID 27899567
RNAcentral: a comprehensive database of non-coding RNA sequences NUCLEIC ACIDS RESEARCH Petrov, A. I., Kay, S. J., Kalvari, I., Howe, K. L., Gray, K. A., Bruford, E. A., Kersey, P. J., Cochrane, G., Finn, R. D., Bateman, A., Kozomara, A., Griffiths-Jones, S., Frankish, A., Zwieb, C. W., Lau, B. Y., Williams, K. P., Chan, P. P., Lowe, T. M., Cannone, J. J., Gutell, R. R., Machnicka, M. A., Bujnicki, J. M., Yoshihama, M., Kenmochi, N., Chai, B., Cole, J. R., Szymanski, M., Karlowski, W. M., Wood, V., Huala, E., Berardini, T. Z., Zhao, Y., Chen, R., Zhu, W., Paraskevopoulou, M. D., Vlachos, I. S., Hatzigeorgiou, A. G., Ma, L., Zhang, Z., Puetz, J., Stadler, P. F., McDonald, D., Basu, S., Fey, P., Engel, S. R., Cherry, J. M., Volders, P., Mestdagh, P., Wower, J., Clark, M., Quek, X. C., Dinger, M. E. 2017; 45 (D1): D128-D134

Abstract

RNAcentral is a database of non-coding RNA (ncRNA) sequences that aggregates data from specialised ncRNA resources and provides a single entry point for accessing ncRNA sequences of all ncRNA types from all organisms. Since its launch in 2014, RNAcentral has integrated twelve new resources, taking the total number of collaborating databases to 22, and began importing new types of data, such as modified nucleotides from MODOMICS and PDB. We created new species-specific identifiers that refer to unique RNA sequences within the context of a single species. The website has been subject to continuous improvements focusing on text and sequence similarity searches as well as genome browsing functionality. All RNAcentral data is provided for free and is available for browsing, bulk downloads, and programmatic access at http://rnacentral.org/.

View details for DOI 10.1093/nar/gkw1008

View details for Web of Science ID 000396575500020

View details for PubMedCentralID PMC5210518
The Encyclopedia of DNA elements (ENCODE): data portal update. Nucleic acids research Davis, C. A., Hitz, B. C., Sloan, C. A., Chan, E. T., Davidson, J. M., Gabdank, I., Hilton, J. A., Jain, K., Baymuradov, U. K., Narayanan, A. K., Onate, K. C., Graham, K., Miyasato, S. R., Dreszer, T. R., Strattan, J. S., Jolanki, O., Tanaka, F. Y., Cherry, J. M. 2017

Abstract

The Encyclopedia of DNA Elements (ENCODE) Data Coordinating Center has developed the ENCODE Portal database and website as the source for the data and metadata generated by the ENCODE Consortium. Two principles have motivated the design. First, experimental protocols, analytical procedures and the data themselves should be made publicly accessible through a coherent, web-based search and download interface. Second, the same interface should serve carefully curated metadata that record the provenance of the data and justify its interpretation in biological terms. Since its initial release in 2013 and in response to recommendations from consortium members and the wider community of scientists who use the Portal to access ENCODE data, the Portal has been regularly updated to better reflect these design principles. Here we report on these updates, including results from new experiments, uniformly processed data from other projects, new visualization tools and more comprehensive metadata to describe experiments and analyses. Additionally, the Portal is now home to (meta)data from related projects including Genomics of Gene Regulation, Roadmap Epigenome Project, Model organism ENCODE (modENCODE) and modERN. The Portal now makes available over 13000 datasets and their accompanying metadata and can be accessed at: https://www.encodeproject.org/.

View details for PubMedID 29126249
XenMine: A genomic interaction tool for the Xenopus community. Developmental biology Reid, C. D., Karra, K., Chang, J., Piskol, R., Li, Q., Li, J. B., Cherry, J. M., Baker, J. C. 2016

Abstract

The Xenopus community has embraced recent advances in sequencing technology, resulting in the accumulation of numerous RNA-Seq and ChIP-Seq datasets. However, easily accessing and comparing datasets generated by multiple laboratories is challenging. Thus, we have created a central space to view, search and analyze data, providing essential information on gene expression changes and regulatory elements present in the genome. XenMine (www.xenmine.org) is a user-friendly website containing published genomic datasets from both Xenopus tropicalis and Xenopus laevis. We have established an analysis pipeline where all published datasets are uniformly processed with the latest genome releases. Information from these datasets can be extracted and compared using an array of pre-built or custom templates. With these search tools, users can easily extract sequences for all putative regulatory domains surrounding a gene of interest, identify the expression values of a gene of interest over developmental time, and analyze lists of genes for gene ontology terms and publications. Additionally, XenMine hosts an in-house genome browser that allows users to visualize all available ChIP-Seq data, extract specifically marked sequences, and aid in identifying important regulatory elements within the genome. Altogether, XenMine is an excellent tool for visualizing, accessing and querying analyzed datasets rapidly and efficiently.

View details for DOI 10.1016/j.ydbio.2016.02.034

View details for PubMedID 27157655
The Saccharomyces Genome Database Variant Viewer. Nucleic acids research Sheppard, T. K., Hitz, B. C., Engel, S. R., Song, G., Balakrishnan, R., Binkley, G., Costanzo, M. C., Dalusag, K. S., Demeter, J., Hellerstedt, S. T., Karra, K., Nash, R. S., Paskov, K. M., Skrzypek, M. S., Weng, S., Wong, E. D., Cherry, J. M. 2016; 44 (D1): D698-702

Abstract

The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is the authoritative community resource for the Saccharomyces cerevisiae reference genome sequence and its annotation. In recent years, we have moved toward increased representation of sequence variation and allelic differences within S. cerevisiae. The publication of numerous additional genomes has motivated the creation of new tools for their annotation and analysis. Here we present the Variant Viewer: a dynamic open-source web application for the visualization of genomic and proteomic differences. Multiple sequence alignments have been constructed across high quality genome sequences from 11 different S. cerevisiae strains and stored in the SGD. The alignments and summaries are encoded in JSON and used to create a two-tiered dynamic view of the budding yeast pan-genome, available at http://www.yeastgenome.org/variant-viewer.

View details for DOI 10.1093/nar/gkv1250

View details for PubMedID 26578556

View details for PubMedCentralID PMC4702884
Integration of new alternative reference strain genome sequences into the Saccharomyces genome database. Database : the journal of biological databases and curation Song, G., Balakrishnan, R., Binkley, G., Costanzo, M. C., Dalusag, K., Demeter, J., Engel, S., Hellerstedt, S. T., Karra, K., Hitz, B. C., Nash, R. S., Paskov, K., Sheppard, T., Skrzypek, M., Weng, S., Wong, E., Cherry, J. M. 2016; 2016

Abstract

The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) is the authoritative community resource for the Saccharomyces cerevisiae reference genome sequence and its annotation. To provide a wider scope of genetic and phenotypic variation in yeast, the genome sequences and their corresponding annotations from 11 alternative S. cerevisiae reference strains have been integrated into SGD. Genomic and protein sequence information for genes from these strains is now available on the Sequence and Protein tabs of the corresponding Locus Summary pages. We illustrate how these genome sequences can be utilized to aid our understanding of strain-specific functional and phenotypic differences. Database URL: www.yeastgenome.org.

View details for DOI 10.1093/database/baw074

View details for PubMedID 27252399

View details for PubMedCentralID PMC4888754
Providing Access to Genomic Variant Knowledge in a Healthcare Setting: A Vision for the ClinGen Electronic Health Records Workgroup. Clinical pharmacology and therapeutics Overby, C. L., Heale, B., Aronson, S., Cherry, J. M., Dwight, S., Milosavljevic, A., Nelson, T., Niehaus, A., Weaver, M. A., Ramos, E. M., Williams, M. S. 2016; 99 (2): 157–60

Abstract

The Clinical Genome Resource (ClinGen) is a National Institutes of Health (NIH)-funded collaborative program that brings together a variety of projects designed to provide high-quality, curated information on clinically relevant genes and variants. ClinGen's EHR (Electronic Health Record) Workgroup aims to ensure that ClinGen is accessible to providers and patients through EHR and related systems. This article describes the current scope of these efforts and progress to date. The ClinGen public portal can be accessed at www.clinicalgenome.org.

View details for PubMedID 26418054
The Saccharomyces Genome Database: A Tool for Discovery. Cold Spring Harbor protocols Cherry, J. M. 2015; 2015 (12): pdb.top083840

Abstract

The Saccharomyces Genome Database (SGD) is the main community repository of information for the budding yeast, Saccharomyces cerevisiae. The SGD has collected published results on chromosomal features, including genes and their products, and has become an encyclopedia of information on the biology of the yeast cell. This information includes gene and gene product function, phenotype, interactions, regulation, complexes, and pathways. All information has been integrated into a unique web resource, accessible via http://yeastgenome.org. The website also provides custom tools to allow useful searches and visualization of data. The experimentally defined functions of genes, mutant phenotypes, and sequence homologies archived in the SGD provide a platform for understanding many fields of biological research. The mission of SGD is to provide public access to all published experimental results on yeast to aid life science students, educators, and researchers. As such, the SGD has become an essential tool for the design of experiments and for the analysis of experimental results.

View details for DOI 10.1101/pdb.top083840

View details for PubMedID 26631132

View details for PubMedCentralID PMC5673599
The Saccharomyces Genome Database: Exploring Biochemical Pathways and Mutant Phenotypes. Cold Spring Harbor protocols Cherry, J. M. 2015; 2015 (12): pdb.prot088898

Abstract

Many biochemical processes, and the proteins and cofactors involved, have been defined for the eukaryote Saccharomyces cerevisiae. This understanding has been largely derived through the awesome power of yeast genetics. The proteins responsible for the reactions that build complex molecules and generate energy for the cell have been integrated into web-based tools that provide classical views of pathways. The Yeast Pathways in the Saccharomyces Genome Database (SGD) is, however, the only database created from manually curated literature annotations. In this protocol, gene function is explored using phenotype annotations to enable hypotheses to be formulated about a gene's action. A common use of the SGD is to understand more about a gene that was identified via a phenotypic screen or found to interact with a gene/protein of interest. There are still many genes that do not yet have an experimentally defined function and so the information currently available can be used to speculate about their potential function. Typically, computational annotations based on sequence similarity are used to predict gene function. In addition, annotations are sometimes available for phenotypes of mutations in the gene of interest. Integrated results for a few example genes will be explored in this protocol. This will be instructive for the exploration of details that aid the analysis of experimental results and the establishment of connections within the yeast literature.

View details for DOI 10.1101/pdb.prot088898

View details for PubMedID 26631123

View details for PubMedCentralID PMC5673601
The Saccharomyces Genome Database: Gene Product Annotation of Function, Process, and Component. Cold Spring Harbor protocols Cherry, J. M. 2015; 2015 (12): pdb.prot088914

Abstract

An ontology is a highly structured form of controlled vocabulary. Each entry in the ontology is commonly called a term. These terms are used when talking about an annotation. However, each term has a definition that, like the definition of a word found within a dictionary, provides the complete usage and detailed explanation of the term. It is critical to consult a term's definition because the distinction between terms can be subtle. The use of ontologies in biology started as a way of unifying communication between scientific communities and to provide a standard dictionary for different topics, including molecular functions, biological processes, mutant phenotypes, chemical properties and structures. The creation of ontology terms and their definitions often requires debate to reach agreement but the result has been a unified descriptive language used to communicate knowledge. In addition to terms and definitions, ontologies require a relationship used to define the type of connection between terms. In an ontology, a term can have more than one parent term, the term above it in an ontology, as well as more than one child, the term below it in the ontology. Many ontologies are used to construct annotations in the Saccharomyces Genome Database (SGD), as in all modern biological databases; however, Gene Ontology (GO), a descriptive system used to categorize gene function, is the most extensively used ontology in SGD annotations. Examples included in this protocol illustrate the structure and features of this ontology.
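The structure described above (terms with definitions, typed relationships, and more than one parent or child per term) can be sketched as a small directed acyclic graph. This is only an illustrative sketch: the `TOY:` identifiers, term names, definitions, and the `Term`, `ancestors`, and `children` names are hypothetical and are not taken from GO or SGD.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Term:
    term_id: str
    name: str
    definition: str
    # Maps each parent term ID to the relationship type linking child to parent.
    parents: Dict[str, str] = field(default_factory=dict)


def ancestors(terms: Dict[str, Term], term_id: str) -> Set[str]:
    """Return the IDs of all terms reachable by following parent links."""
    seen: Set[str] = set()
    stack = [term_id]
    while stack:
        current = stack.pop()
        for parent_id in terms[current].parents:
            if parent_id not in seen:
                seen.add(parent_id)
                stack.append(parent_id)
    return seen


def children(terms: Dict[str, Term], term_id: str) -> Set[str]:
    """Return the IDs of all terms that list term_id as a direct parent."""
    return {t.term_id for t in terms.values() if term_id in t.parents}


# Hypothetical toy ontology; TOY:4 has two parents with different relationships.
TERMS = {
    t.term_id: t
    for t in [
        Term("TOY:1", "process", "Any biological process (toy root term)."),
        Term("TOY:2", "transport", "Movement of a substance.", {"TOY:1": "is_a"}),
        Term("TOY:3", "catabolism", "Breakdown of a substance.", {"TOY:1": "is_a"}),
        Term(
            "TOY:4",
            "vesicle-mediated degradation",
            "Breakdown of cargo delivered by vesicles.",
            {"TOY:2": "is_a", "TOY:3": "part_of"},
        ),
    ]
}

print(sorted(ancestors(TERMS, "TOY:4")))
print(sorted(children(TERMS, "TOY:1")))
```

Storing the relationship type on each parent link keeps the distinction between kinds of connection explicit, and walking the parent links collects every broader term under which an annotation to the specific term would also be found.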

View details for DOI 10.1101/pdb.prot088914

View details for PubMedID 26631125

View details for PubMedCentralID PMC5673600
The Saccharomyces Genome Database: Exploring Genome Features and Their Annotations. Cold Spring Harbor protocols Cherry, J. M. 2015; 2015 (12): pdb.prot088922

Abstract

Genomic-scale assays result in data that provide information over the entire genome. Such base pair resolution data cannot be summarized easily except via a graphical viewer. A genome browser is a tool that displays genomic data and experimental results as horizontal tracks. Genome browsers allow searches for a chromosomal coordinate or a feature, such as a gene name, but they do not allow searches by function or upstream binding site. Entry into a genome browser requires that you identify the gene name or chromosomal coordinates for a region of interest. A track provides a representation for genomic results and is displayed as a row of data shown as line segments to indicate regions of the chromosome with a feature. Another type of track presents a graph or wiggle plot that indicates the processed signal intensity computed for a particular experiment or set of experiments. Wiggle plots are typical for genomic assays such as the various next-generation sequencing methods (e.g., chromatin immunoprecipitation [ChIP]-seq or RNA-seq), where a peak in the plot represents DNA binding, histone modification, or the mapping of an RNA sequence. Here we explore the browser that has been built into the Saccharomyces Genome Database (SGD).
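As a rough illustration of the signal behind a wiggle-style track, the sketch below counts how many mapped reads overlap each base of a region. The read coordinates, the half-open interval convention, and the `coverage` function are assumptions made for this example; they do not describe how the SGD browser computes its tracks.

```python
from typing import List, Tuple


def coverage(reads: List[Tuple[int, int]], region_start: int, region_end: int) -> List[int]:
    """Count reads overlapping each base of the half-open region [region_start, region_end)."""
    signal = [0] * (region_end - region_start)
    for start, end in reads:
        # Clip each read to the region so out-of-range bases are ignored.
        lo = max(start, region_start)
        hi = min(end, region_end)
        for pos in range(lo, hi):
            signal[pos - region_start] += 1
    return signal


# Hypothetical reads as half-open (start, end) coordinates.
reads = [(2, 6), (4, 8), (5, 7)]
signal = coverage(reads, 0, 10)
print(signal)  # [0, 0, 1, 1, 2, 3, 2, 1, 0, 0]
```

Plotting `signal` against position gives the familiar peak shape, with the highest value where the most reads pile up.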

View details for DOI 10.1101/pdb.prot088922

View details for PubMedID 26631126

View details for PubMedCentralID PMC5673602
Gene Ontology Consortium: going forward NUCLEIC ACIDS RESEARCH Blake, J. A., Christie, K. R., Dolan, M. E., Drabkin, H. J., Hill, D. P., Ni, L., Sitnikov, D., Burgess, S., Buza, T., Gresham, C., McCarthy, F., Pillai, L., Wang, H., Carbon, S., Dietze, H., Lewis, S. E., Mungall, C. J., Munoz-Torres, M. C., Feuermann, M., Gaudet, P., Basu, S., Chisholm, R. L., Dodson, R. J., Fey, P., Mi, H., Thomas, P. D., Muruganujan, A., Poudel, S., Hu, J. C., Aleksander, S. A., McIntosh, B. K., Renfro, D. P., Siegele, D. A., Attrill, H., Brown, N. H., Tweedie, S., Lomax, J., Osumi-Sutherland, D., Parkinson, H., Roncaglia, P., Lovering, R. C., Talmud, P. J., Humphries, S. E., Denny, P., Campbell, N. H., Foulger, R. E., Chibucos, M. C., Giglio, M. G., Chang, H. Y., Finn, R., Fraser, M., Mitchell, A., Nuka, G., Pesseat, S., Sangrador, A., Scheremetjew, M., Young, S. Y., Stephan, R., Harris, M. A., Oliver, S. G., Rutherford, K., Wood, V., Bahler, J., Lock, A., Kersey, P. J., McDowall, M. D., Staines, D. M., Dwinell, M., Shimoyama, M., Laulederkind, S., Hayman, G. T., Wang, S. J., Petri, V., D'Eustachio, P., Matthews, L., Balakrishnan, R., Binkley, G., Cherry, J. M., Costanzo, M. C., Demeter, J., Dwight, S. S., Engel, S. R., Hitz, B. C., Inglis, D. O., Lloyd, P., Miyasato, S. R., Paskov, K., Roe, G., Simison, M., Nash, R. S., Skrzypek, M. S., Weng, S., Wong, E. D., Berardini, T. Z., Li, D., Huala, E., Argasinska, J., Arighi, C., Auchincloss, A., Axelsen, K., Argoud-Puy, G., Bateman, A., Bely, B., Blatter, M. C., Bonilla, C., Bougueleret, L., Boutet, E., Breuza, L., Bridge, A., Britto, R., Casals, C., Cibrian-Uhalte, E., Coudert, E., Cusin, I., Duek-Roggli, P., Estreicher, A., Famiglietti, L., Gane, P., Garmiri, P., Gos, A., Gruaz-Gumowski, N., Hatton-Ellis, E., Hinz, U., Hulo, C., Huntley, R., Jungo, F., Keller, G., Laiho, K., Lemercier, P., Lieberherr, D., MacDougall, A., Magrane, M., Martin, M., Masson, P., Mutowo, P., O'Donovan, C., Pedruzzi, I., Pichler, K., Poggioli, D., Poux, S., Rivoire, C., Roechert, B., Sawford, T., Schneider, M., Shypitsyna, A., Stutz, A., Sundaram, S., Tognolli, M., Wu, C., Xenarios, I., Chan, J., Kishore, R., Sternberg, P. W., Van Auken, K., Muller, H. M., Done, J., Li, Y., Howe, D., Westerfield, M. 2015; 43 (D1): D1049-D1056

Abstract

The Gene Ontology (GO; http://www.geneontology.org) is a community-based bioinformatics resource that supplies information about gene product function using ontologies to represent biological knowledge. Here we describe improvements and expansions to several branches of the ontology, as well as updates that have allowed us to more efficiently disseminate the GO and capture feedback from the research community. The Gene Ontology Consortium (GOC) has expanded areas of the ontology such as cilia-related terms, cell-cycle terms and multicellular organism processes. We have also implemented new tools for generating ontology terms based on a set of logical rules making use of templates, and we have made efforts to increase our use of logical definitions. The GOC has a new and improved web site summarizing new developments and documentation, serving as a portal to GO data. Users can perform GO enrichment analysis, and search the GO for terms, annotations to gene products, and associated metadata across multiple species using the all-new AmiGO 2 browser. We encourage and welcome the input of the research community in all biological areas in our continued effort to improve the Gene Ontology.

View details for DOI 10.1093/nar/gku1179

View details for Web of Science ID 000350210400154

View details for PubMedCentralID PMC4383973
RNAcentral: an international database of ncRNA sequences NUCLEIC ACIDS RESEARCH Petrov, A. I., Kay, S. J., Gibson, R., Kulesha, E., Staines, D., Bruford, E. A., Wright, M. W., Burge, S., Finn, R. D., Kersey, P. J., Cochrane, G., Bateman, A., Griffiths-Jones, S., Harrow, J., Chan, P. P., Lowe, T. M., Zwieb, C. W., Wower, J., Williams, K. P., Hudson, C. M., Gutell, R., Clark, M. B., Dinger, M., Quek, X. C., Bujnicki, J. M., Chua, N., Liu, J., Wang, H., Skogerbo, G., Zhao, Y., Chen, R., Zhu, W., Cole, J. R., Chai, B., Huang, H., Huang, H., Cherry, J. M., Hatzigeorgiou, A., Pruitt, K. D. 2015; 43 (D1): D123-D129

Abstract

The field of non-coding RNA biology has been hampered by the lack of availability of a comprehensive, up-to-date collection of accessioned RNA sequences. Here we present the first release of RNAcentral, a database that collates and integrates information from an international consortium of established RNA sequence databases. The initial release contains over 8.1 million sequences, including representatives of all major functional classes. A web portal (http://rnacentral.org) provides free access to data, search functionality, cross-references, source code and an integrated genome browser for selected species.

View details for DOI 10.1093/nar/gku991

View details for Web of Science ID 000350210400020

View details for PubMedCentralID PMC4384043
AGAPE (Automated Genome Analysis PipelinE) for pan-genome analysis of Saccharomyces cerevisiae. PloS one Song, G., Dickins, B. J., Demeter, J., Engel, S., Gallagher, J., Choe, K., Dunn, B., Snyder, M., Cherry, J. M. 2015; 10 (3)

Abstract

The characterization and public release of genome sequences from thousands of organisms is expanding the scope for genetic variation studies. However, understanding the phenotypic consequences of genetic variation remains a challenge in eukaryotes due to the complexity of the genotype-phenotype map. One approach to this is the intensive study of model systems for which diverse sources of information can be accumulated and integrated. Saccharomyces cerevisiae is an extensively studied model organism, with well-known protein functions and thoroughly curated phenotype data. To develop and expand the available resources linking genomic variation with function in yeast, we aim to model the pan-genome of S. cerevisiae. To initiate the yeast pan-genome, we newly sequenced or re-sequenced the genomes of 25 strains that are commonly used in the yeast research community using advanced sequencing technology at high quality. We also developed a pipeline for automated pan-genome analysis, which integrates the steps of assembly, annotation, and variation calling. To assign strain-specific functional annotations, we identified genes that were not present in the reference genome. We classified these according to their presence or absence across strains and characterized each group of genes with known functional and phenotypic features. The functional roles of novel genes not found in the reference genome and associated with strains or groups of strains appear to be consistent with anticipated adaptations in specific lineages. As more S. cerevisiae strain genomes are released, our analysis can be used to collate genome data and relate it to lineage-specific patterns of genome evolution. Our new tool set will enhance our understanding of genomic and functional evolution in S. cerevisiae, and will be available to the yeast genetics and molecular biology community.

View details for DOI 10.1371/journal.pone.0120671

View details for PubMedID 25781462
Ontology application and use at the ENCODE DCC. Database : the journal of biological databases and curation Malladi, V. S., Erickson, D. T., Podduturi, N. R., Rowe, L. D., Chan, E. T., Davidson, J. M., Hitz, B. C., Ho, M., Lee, B. T., Miyasato, S., Roe, G. R., Simison, M., Sloan, C. A., Strattan, J. S., Tanaka, F., Kent, W. J., Cherry, J. M., Hong, E. L. 2015; 2015

Abstract

The Encyclopedia of DNA elements (ENCODE) project is an ongoing collaborative effort to create a catalog of genomic annotations. To date, the project has generated over 4000 experiments across more than 350 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory network and transcriptional landscape of the Homo sapiens and Mus musculus genomes. All ENCODE experimental data, metadata and associated computational analyses are submitted to the ENCODE Data Coordination Center (DCC) for validation, tracking, storage and distribution to community resources and the scientific community. As the volume of data increases, the organization of experimental details becomes increasingly complicated and demands careful curation to identify related experiments. Here, we describe the ENCODE DCC's use of ontologies to standardize experimental metadata. We discuss how ontologies, when used to annotate metadata, provide improved searching capabilities and facilitate the ability to find connections within a set of experiments. Additionally, we provide examples of how ontologies are used to annotate ENCODE metadata and how the annotations can be identified via ontology-driven searches at the ENCODE portal. As genomic datasets grow larger and more interconnected, standardization of metadata becomes increasingly vital to allow for exploration and comparison of data between different scientific projects.

View details for DOI 10.1093/database/bav010

View details for PubMedID 25776021

View details for PubMedCentralID PMC4360730
Saccharomyces genome database provides new regulation data. Nucleic acids research Costanzo, M. C., Engel, S. R., Wong, E. D., Lloyd, P., Karra, K., Chan, E. T., Weng, S., Paskov, K. M., Roe, G. R., Binkley, G., Hitz, B. C., Cherry, J. M. 2014; 42 (Database issue): D717-25

Abstract

The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is the community resource for genomic, gene and protein information about the budding yeast Saccharomyces cerevisiae, containing a variety of functional information about each yeast gene and gene product. We have recently added regulatory information to SGD and present it on a new tabbed section of the Locus Summary entitled 'Regulation'. We are compiling transcriptional regulator-target gene relationships, which are curated from the literature at SGD or imported, with permission, from the YEASTRACT database. For nearly every S. cerevisiae gene, the Regulation page displays a table of annotations showing the regulators of that gene, and a graphical visualization of its regulatory network. For genes whose products act as transcription factors, the Regulation page also shows a table of their target genes, accompanied by a Gene Ontology enrichment analysis of the biological processes in which those genes participate. We additionally synthesize information from the literature for each transcription factor in a free-text Regulation Summary, and provide other information relevant to its regulatory function, such as DNA binding site motifs and protein domains. All of the regulation data are available for querying, analysis and download via YeastMine, the InterMine-based data warehouse system in use at SGD.

View details for DOI 10.1093/nar/gkt1158

View details for PubMedID 24265222

View details for PubMedCentralID PMC3965049
The Reference Genome Sequence of Saccharomyces cerevisiae: Then and Now. G3 (Bethesda, Md.) Engel, S. R., Dietrich, F. S., Fisk, D. G., Binkley, G., Balakrishnan, R., Costanzo, M. C., Dwight, S. S., Hitz, B. C., Karra, K., Nash, R. S., Weng, S., Wong, E. D., Lloyd, P., Skrzypek, M. S., Miyasato, S. R., Simison, M., Cherry, J. M. 2014; 4 (3): 389-398

Abstract

The genome of the budding yeast Saccharomyces cerevisiae was the first completely sequenced from a eukaryote. It was released in 1996 as the work of a worldwide effort of hundreds of researchers. In the time since, the yeast genome has been intensively studied by geneticists, molecular biologists, and computational scientists all over the world. Maintenance and annotation of the genome sequence have long been provided by the Saccharomyces Genome Database, one of the original model organism databases. To deepen our understanding of the eukaryotic genome, the S. cerevisiae strain S288C reference genome sequence was updated recently in its first major update since 1996. The new version, called "S288C 2010," was determined from a single yeast colony using modern sequencing technologies and serves as the anchor for further innovations in yeast genomic science.

View details for DOI 10.1534/g3.113.008995

View details for PubMedID 24374639

View details for PubMedCentralID PMC3962479
DATABASE, The Journal of Biological Databases and Curation, is now the official journal of the International Society for Biocuration. Database : the journal of biological databases and curation Gaudet, P., Munoz-Torres, M., Robinson-Rechavi, M., Attwood, T., Bateman, A., Cherry, J. M., Kania, R., O'Donovan, C., Yamasaki, C. 2013; 2013: bat077

View details for DOI 10.1093/database/bat077

View details for PubMedID 24319113

View details for PubMedCentralID PMC3855479
A guide to best practices for Gene Ontology (GO) manual annotation DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION Balakrishnan, R., Harris, M. A., Huntley, R., Van Auken, K., Cherry, J. M. 2013

Abstract

The Gene Ontology Consortium (GOC) is a community-based bioinformatics project that classifies gene product function through the use of structured controlled vocabularies. A fundamental application of the Gene Ontology (GO) is in the creation of gene product annotations, evidence-based associations between GO definitions and experimental or sequence-based analysis. Currently, the GOC disseminates 126 million annotations covering >374 000 species including all the kingdoms of life. This number includes two classes of GO annotations: those created manually by experienced biocurators reviewing the literature or by examination of biological data (1.1 million annotations covering 2226 species) and those generated computationally via automated methods. As manual annotations are often used to propagate functional predictions between related proteins within and between genomes, it is critical to provide accurate consistent manual annotations. Toward this goal, we present here the conventions defined by the GOC for the creation of manual annotation. This guide represents the best practices for manual annotation as established by the GOC project over the past 12 years. We hope this guide will encourage research communities to annotate gene products of their interest to enhance the corpus of GO annotations available to all. Database URL: http://www.geneontology.org.

View details for DOI 10.1093/database/bat054

View details for Web of Science ID 000322067500001

View details for PubMedID 23842463

View details for PubMedCentralID PMC3706743
InterMOD: integrated data and tools for the unification of model organism research. Scientific reports Sullivan, J., Karra, K., Moxon, S. A., Vallejos, A., Motenko, H., Wong, J. D., Aleksic, J., Balakrishnan, R., Binkley, G., Harris, T., Hitz, B., Jayaraman, P., Lyne, R., Neuhauser, S., Pich, C., Smith, R. N., Trinh, Q., Cherry, J. M., Richardson, J., Stein, L., Twigger, S., Westerfield, M., Worthey, E., Micklem, G. 2013; 3: 1802-?

Abstract

Model organisms are widely used for understanding basic biology, and have significantly contributed to the study of human disease. In recent years, genomic analysis has provided extensive evidence of widespread conservation of gene sequence and function amongst eukaryotes, allowing insights from model organisms to help decipher gene function in a wider range of species. The InterMOD consortium is developing an infrastructure based around the InterMine data warehouse system to integrate genomic and functional data from a number of key model organisms, leading the way to improved cross-species research. So far including budding yeast, nematode worm, fruit fly, zebrafish, rat and mouse, the project has set up data warehouses, synchronized data models, and created analysis tools and links between data from different species. The project unites a number of major model organism databases, improving both the consistency and accessibility of comparative research, to the benefit of the wider scientific community.

View details for DOI 10.1038/srep01802

View details for PubMedID 23652793

View details for PubMedCentralID PMC3647165
The new modern era of yeast genomics: community sequencing and the resulting annotation of multiple Saccharomyces cerevisiae strains at the Saccharomyces Genome Database DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION Engel, S. R., Cherry, J. M. 2013

Abstract

The first completed eukaryotic genome sequence was that of the yeast Saccharomyces cerevisiae, and the Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) is the original model organism database. SGD remains the authoritative community resource for the S. cerevisiae reference genome sequence and its annotation, and continues to provide comprehensive biological information correlated with S. cerevisiae genes and their products. A diverse set of yeast strains have been sequenced to explore commercial and laboratory applications, and a brief history of those strains is provided. The publication of these new genomes has motivated the creation of new tools, and SGD will annotate and provide comparative analyses of these sequences, correlating changes with variations in strain phenotypes and protein function. We are entering a new era at SGD, as we incorporate these new sequences and make them accessible to the scientific community, all in an effort to continue in our mission of educating researchers and facilitating discovery.

View details for DOI 10.1093/database/bat012

View details for Web of Science ID 000316172400001

View details for PubMedID 23487186

View details for PubMedCentralID PMC3595989
The YeastGenome app: the Saccharomyces Genome Database at your fingertips DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION Wong, E. D., Karra, K., Hitz, B. C., Hong, E. L., Cherry, J. M. 2013

Abstract

The Saccharomyces Genome Database (SGD) is a scientific database that provides researchers with high-quality curated data about the genes and gene products of Saccharomyces cerevisiae. To provide instant and easy access to this information on mobile devices, we have developed YeastGenome, a native application for the Apple iPhone and iPad. YeastGenome can be used to quickly find basic information about S. cerevisiae genes and chromosomal features regardless of internet connectivity. With or without network access, you can view basic information and Gene Ontology annotations about a gene of interest by searching gene names and gene descriptions or by browsing the database within the app to find the gene of interest. With internet access, the app provides more detailed information about the gene, including mutant phenotypes, references and protein and genetic interactions, and provides hyperlinks to SGD pages and genome browser views for further detail. SGD provides online help that describes basic ways to navigate the mobile version of SGD, highlights key features and answers frequently asked questions related to the app. The app is available from iTunes (http://itunes.com/apps/yeastgenome). The YeastGenome app is provided freely as a service to our community, as part of SGD's mission to provide free and open access to all its data and annotations.

View details for DOI 10.1093/database/bat004

View details for Web of Science ID 000316179800001

View details for PubMedID 23396302

View details for PubMedCentralID PMC3567487
A gene ontology inferred from molecular networks NATURE BIOTECHNOLOGY Dutkowski, J., Kramer, M., Surma, M. A., Balakrishnan, R., Cherry, J. M., Krogan, N. J., Ideker, T. 2013; 31 (1): 38-?

Abstract

Ontologies have proven very useful for capturing knowledge as a hierarchy of terms and their interrelationships. In biology a major challenge has been to construct ontologies of gene function given incomplete biological knowledge and inconsistencies in how this knowledge is manually curated. Here we show that large networks of gene and protein interactions in Saccharomyces cerevisiae can be used to infer an ontology whose coverage and power are equivalent to those of the manually curated Gene Ontology (GO). The network-extracted ontology (NeXO) contains 4,123 biological terms and 5,766 term-term relations, capturing 58% of known cellular components. We also explore robust NeXO terms and term relations that were initially not cataloged in GO, a number of which have now been added based on our analysis. Using quantitative genetic interaction profiling and chemogenomics, we find further support for many of the uncharacterized terms identified by NeXO, including multisubunit structures related to protein trafficking or mitochondrial function. This work enables a shift from using ontologies to evaluate data to using data to construct and evaluate ontologies.

View details for DOI 10.1038/nbt.2463

View details for Web of Science ID 000313563600020

View details for PubMedID 23242164

View details for PubMedCentralID PMC3654867
Gene Ontology Annotations and Resources NUCLEIC ACIDS RESEARCH Blake, J. A., Dolan, M., Drabkin, H., Hill, D. P., Ni, L., Sitnikov, D., Bridges, S., Burgess, S., Buza, T., McCarthy, F., Peddinti, D., Pillai, L., Carbon, S., Dietze, H., Ireland, A., Lewis, S. E., Mungall, C. J., Gaudet, P., Chisholm, R. L., Fey, P., Kibbe, W. A., Basu, S., Siegele, D. A., McIntosh, B. K., Renfro, D. P., Zweifel, A. E., Hu, J. C., Brown, N. H., Tweedie, S., Alam-Faruque, Y., Apweiler, R., Auchinchloss, A., Axelsen, K., Bely, B., Blatter, M., Bonilla, C., Bougueleret, L., Boutet, E., Breuza, L., Bridge, A., Chan, W. M., Chavali, G., Coudert, E., Dimmer, E., Estreicher, A., Famiglietti, L., Feuermann, M., Gos, A., Gruaz-Gumowski, N., Hieta, R., Hinz, U., Hulo, C., Huntley, R., James, J., Jungo, F., Keller, G., Laiho, K., Legge, D., Lemercier, P., Lieberherr, D., Magrane, M., Martin, M. J., Masson, P., Mutowo-Muellenet, P., O'Donovan, C., Pedruzzi, I., Pichler, K., Poggioli, D., Millan, P. P., Poux, S., Rivoire, C., Roechert, B., Sawford, T., Schneider, M., Stutz, A., Sundaram, S., Tognolli, M., Xenarios, I., Foulger, R., Lomax, J., Roncaglia, P., Khodiyar, V. K., Lovering, R. C., Talmud, P. J., Chibucos, M., Giglio, M. G., Chang, H., Hunter, S., McAnulla, C., Mitchell, A., Sangrador, A., Stephan, R., Harris, M. A., Oliver, S. G., Rutherford, K., Wood, V., Bahler, J., Lock, A., Kersey, P. J., McDowall, M. D., Staines, D. M., Dwinell, M., Shimoyama, M., Laulederkind, S., Hayman, T., Wang, S., Petri, V., Lowry, T., D'Eustachio, P., Matthews, L., Balakrishnan, R., Binkley, G., Cherry, J. M., Costanzo, M. C., Dwight, S. S., Engel, S. R., Fisk, D. G., Hitz, B. C., Hong, E. L., Karra, K., Miyasato, S. R., Nash, R. S., Park, J., Skrzypek, M. S., Weng, S., Wong, E. D., Berardini, T. Z., Li, D., Huala, E., Mi, H., Thomas, P. D., Chan, J., Kishore, R., Sternberg, P., Van Auken, K., Howe, D., Westerfield, M. 2013; 41 (D1): D530-D535

Abstract

The Gene Ontology (GO) Consortium (GOC, http://www.geneontology.org) is a community-based bioinformatics resource that classifies gene product function through the use of structured, controlled vocabularies. Over the past year, the GOC has implemented several processes to increase the quantity, quality and specificity of GO annotations. First, the number of manual, literature-based annotations has grown at an increasing rate. Second, as a result of a new 'phylogenetic annotation' process, manually reviewed, homology-based annotations are becoming available for a broad range of species. Third, the quality of GO annotations has been improved through a streamlined process for, and automated quality checks of, GO annotations deposited by different annotation groups. Fourth, the consistency and correctness of the ontology itself has increased by using automated reasoning tools. Finally, the GO has been expanded not only to cover new areas of biology through focused interaction with experts, but also to capture greater specificity in all areas of the ontology using tools for adding new combinatorial terms. The GOC works closely with other ontology developers to support integrated use of terminologies. The GOC supports its user community through the use of e-mail lists, social media and web-based resources.

View details for DOI 10.1093/nar/gks1050

View details for Web of Science ID 000312893300075

View details for PubMedCentralID PMC3531070
Annotation of functional variation in personal genomes using RegulomeDB GENOME RESEARCH Boyle, A. P., Hong, E. L., Hariharan, M., Cheng, Y., Schaub, M. A., Kasowski, M., Karczewski, K. J., Park, J., Hitz, B. C., Weng, S., Cherry, J. M., Snyder, M. 2012; 22 (9): 1790-1797

Abstract

As the sequencing of healthy and disease genomes becomes more commonplace, detailed annotation provides interpretation for individual variation responsible for normal and disease phenotypes. Current approaches focus on direct changes in protein coding genes, particularly nonsynonymous mutations that directly affect the gene product. However, most individual variation occurs outside of genes and, indeed, most markers generated from genome-wide association studies (GWAS) identify variants outside of coding segments. Identification of potential regulatory changes that perturb these sites will lead to a better localization of truly functional variants and interpretation of their effects. We have developed a novel approach and database, RegulomeDB, which guides interpretation of regulatory variants in the human genome. RegulomeDB includes high-throughput, experimental data sets from ENCODE and other sources, as well as computational predictions and manual annotations to identify putative regulatory potential and identify functional variants. These data sources are combined into a powerful tool that scores variants to help separate functional variants from a large pool and provides a small set of putative sites with testable hypotheses as to their function. We demonstrate the applicability of this tool to the annotation of noncoding variants from 69 fully sequenced genomes as well as that of a personal genome, where thousands of functionally associated variants were identified. Moreover, we demonstrate a GWAS where the database is able to quickly identify the known associated functional variant and provide a hypothesis as to its function. Overall, we expect this approach and resource to be valuable for the annotation of human genome sequences.

View details for DOI 10.1101/gr.137323.112

View details for PubMedID 22955989
In the beginning there was babble ... AUTOPHAGY Klionsky, D. J., Bruford, E. A., Cherry, J. M., Hodgkin, J., Laulederkind, S. J., Singer, A. G. 2012; 8 (8): 1165-1167

View details for DOI 10.4161/auto.20665

View details for Web of Science ID 000308505200001

View details for PubMedID 22836666

View details for PubMedCentralID PMC3625114
YeastMine-an integrated data warehouse for Saccharomyces cerevisiae data as a multipurpose tool-kit DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION Balakrishnan, R., Park, J., Karra, K., Hitz, B. C., Binkley, G., Hong, E. L., Sullivan, J., Micklem, G., Cherry, J. M. 2012

Abstract

The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) provides high-quality curated genomic, genetic, and molecular information on the genes and their products of the budding yeast Saccharomyces cerevisiae. To accommodate the increasingly complex, diverse needs of researchers for searching and comparing data, SGD has implemented InterMine (http://www.InterMine.org), an open source data warehouse system with a sophisticated querying interface, to create YeastMine (http://yeastmine.yeastgenome.org). YeastMine is a multifaceted search and retrieval environment that provides access to diverse data types. Searches can be initiated with a list of genes, a list of Gene Ontology terms, or lists of many other data types. The results from queries can be combined for further analysis and saved or downloaded in customizable file formats. Queries themselves can be customized by modifying predefined templates or by creating a new template to access a combination of specific data types. YeastMine offers multiple scenarios in which it can be used such as a powerful search interface, a discovery tool, a curation aid and also a complex database presentation format. DATABASE URL: http://yeastmine.yeastgenome.org.

View details for DOI 10.1093/database/bar062

View details for Web of Science ID 000304923700001

View details for PubMedID 22434830

View details for PubMedCentralID PMC3308152
Considerations for creating and annotating the budding yeast Genome Map at SGD: a progress report DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION Chan, E. T., Cherry, J. M. 2012

Abstract

The Saccharomyces Genome Database (SGD) is compiling and annotating a comprehensive catalogue of functional sequence elements identified in the budding yeast genome. Recent advances in deep sequencing technologies have enabled, for example, global analyses of transcription profiling and assembly of maps of transcription factor occupancy and higher order chromatin organization, at nucleotide level resolution. With this growing influx of published genome-scale data come new challenges for their storage, display, analysis and integration. Here, we describe SGD's progress in the creation of a consolidated resource for genome sequence elements in the budding yeast, the considerations taken in its design and the lessons learned thus far. The data within this collection can be accessed at http://browse.yeastgenome.org and downloaded from http://downloads.yeastgenome.org. DATABASE URL: http://www.yeastgenome.org.

View details for DOI 10.1093/database/bar057

View details for Web of Science ID 000304922200001

View details for PubMedID 22434826

View details for PubMedCentralID PMC3308148
CvManGO, a method for leveraging computational predictions to improve literature-based Gene Ontology annotations DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION Park, J., Costanzo, M. C., Balakrishnan, R., Cherry, J. M., Hong, E. L. 2012

Abstract

The set of annotations at the Saccharomyces Genome Database (SGD) that classifies the cellular function of S. cerevisiae gene products using Gene Ontology (GO) terms has become an important resource for facilitating experimental analysis. In addition to capturing and summarizing experimental results, the structured nature of GO annotations allows for functional comparison across organisms as well as propagation of functional predictions between related gene products. Due to their relevance to many areas of research, ensuring the accuracy and quality of these annotations is a priority at SGD. GO annotations are assigned either manually, by biocurators extracting experimental evidence from the scientific literature, or through automated methods that leverage computational algorithms to predict functional information. Here, we discuss the relationship between literature-based and computationally predicted GO annotations in SGD and extend a strategy whereby comparison of these two types of annotation identifies genes whose annotations need review. Our method, CvManGO (Computational versus Manual GO annotations), pairs literature-based GO annotations with computational GO predictions and evaluates the relationship of the two terms within GO, looking for instances of discrepancy. We found that this method will identify genes that require annotation updates, taking an important step towards finding ways to prioritize literature review. Additionally, we explored factors that may influence the effectiveness of CvManGO in identifying relevant gene targets to find in particular those genes that are missing literature-supported annotations, but our survey found that there are no immediately identifiable criteria by which one could enrich for these under-annotated genes. Finally, we discuss possible ways to improve this strategy, and the applicability of this method to other projects that use the GO for curation. DATABASE URL: http://www.yeastgenome.org.

View details for DOI 10.1093/database/bas001

View details for Web of Science ID 000304919800001

View details for PubMedID 22434836

View details for PubMedCentralID PMC3308158
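The comparison step described in the CvManGO abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the toy ontology, the term names, the category labels and the rule for which categories are flagged are all assumptions made for illustration. It pairs one manual term with one predicted term per gene, checks how the two terms relate within a parent hierarchy, and flags genes where the pair looks discrepant.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical toy ontology: each term maps to its direct parents.
# Term names are illustrative and are not real GO identifiers.
TOY_PARENTS: Dict[str, Set[str]] = {
    "root": set(),
    "binding": {"root"},
    "catalytic activity": {"root"},
    "kinase activity": {"catalytic activity"},
    "protein kinase activity": {"kinase activity"},
}


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return every term reachable by following parent links from `term`."""
    seen: Set[str] = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def compare(manual: str, predicted: str, parents: Dict[str, Set[str]]) -> str:
    """Classify how a predicted term relates to a manual term (assumed labels)."""
    if manual == predicted:
        return "same"
    if manual in ancestors(predicted, parents):
        return "prediction_more_specific"
    if predicted in ancestors(manual, parents):
        return "manual_more_specific"
    return "different_lineage"


def needs_review(
    pairs: Iterable[Tuple[str, str, str]], parents: Dict[str, Set[str]]
) -> List[str]:
    """Return genes whose (manual, predicted) pair is treated as a discrepancy.

    Assumption: a more specific prediction or an unrelated lineage is flagged.
    """
    flagged_categories = {"prediction_more_specific", "different_lineage"}
    return [
        gene
        for gene, manual, predicted in pairs
        if compare(manual, predicted, parents) in flagged_categories
    ]


# Hypothetical gene names and annotation pairs.
EXAMPLE_PAIRS = [
    ("GENE_A", "kinase activity", "protein kinase activity"),
    ("GENE_B", "protein kinase activity", "kinase activity"),
    ("GENE_C", "binding", "kinase activity"),
    ("GENE_D", "binding", "binding"),
]
FLAGGED = needs_review(EXAMPLE_PAIRS, TOY_PARENTS)
```

In this sketch only GENE_A and GENE_C would be queued for literature review; a real ontology has multiple relationship types and far more terms, which this toy hierarchy ignores.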
Saccharomyces Genome Database: the genomics resource of budding yeast. Nucleic acids research Cherry, J. M., Hong, E. L., Amundsen, C., Balakrishnan, R., Binkley, G., Chan, E. T., Christie, K. R., Costanzo, M. C., Dwight, S. S., Engel, S. R., Fisk, D. G., Hirschman, J. E., Hitz, B. C., Karra, K., Krieger, C. J., Miyasato, S. R., Nash, R. S., Park, J., Skrzypek, M. S., Simison, M., Weng, S., Wong, E. D. 2012; 40 (Database issue): D700-5

Abstract

The Saccharomyces Genome Database (SGD, http://www.yeastgenome.org) is the community resource for the budding yeast Saccharomyces cerevisiae. The SGD project provides the highest-quality manually curated information from peer-reviewed literature. The experimental results reported in the literature are extracted and integrated within a well-developed database. These data are combined with quality high-throughput results and provided through Locus Summary pages, a powerful query engine and rich genome browser. The acquisition, integration and retrieval of these data allow SGD to facilitate experimental design and analysis by providing an encyclopedia of the yeast genome, its chromosomal features, their functions and interactions. Public access to these data is provided to researchers and educators via web pages designed for optimal ease of use.

View details for DOI 10.1093/nar/gkr1029

View details for PubMedID 22110037

View details for PubMedCentralID PMC3245034
The Gene Ontology: enhancements for 2011 NUCLEIC ACIDS RESEARCH Blake, J. A., Dolan, M., Drabkin, H., Hill, D. P., Ni, L., Sitnikov, D., Burgess, S., Buza, T., Gresham, C., McCarthy, F., Pillai, L., Wang, H., Carbon, S., Lewis, S. E., Mungall, C. J., Gaudet, P., Chisholm, R. L., Fey, P., Kibbe, W. A., Basu, S., Siegele, D. A., McIntosh, B. K., Renfro, D. P., Zweifel, A. E., Hu, J. C., Brown, N. H., Tweedie, S., Alam-Faruque, Y., Apweiler, R., Auchinchloss, A., Axelsen, K., Argoud-Puy, G., Bely, B., Blatter, M., Bougueleret, L., Boutet, E., Branconi-Quintaje, S., Breuza, L., Bridge, A., Browne, P., Chan, W. M., Coudert, E., Cusin, I., Dimmer, E., Duek-Roggli, P., Eberhardt, R., Estreicher, A., Famiglietti, L., Ferro-Rojas, S., Feuermann, M., Gardner, M., Gos, A., Gruaz-Gumowski, N., Hinz, U., Hulo, C., Huntley, R., James, J., Jimenez, S., Jungo, F., Keller, G., Laiho, K., Legge, D., Lemercier, P., Lieberherr, D., Magrane, M., Martin, M. J., Masson, P., Moinat, M., O'Donovan, C., Pedruzzi, I., Pichler, K., Poggioli, D., Millan, P. P., Poux, S., Rivoire, C., Roechert, B., Sawford, T., Schneider, M., Sehra, H., Stanley, E., Stutz, A., Sundaram, S., Tognolli, M., Xenarios, I., Foulger, R., Lomax, J., Roncaglia, P., Camon, E., Khodiyar, V. K., Lovering, R. C., Talmud, P. J., Chibucos, M., Giglio, M. G., Dolinski, K., Heinicke, S., Livstone, M. S., Stephan, R., Harris, M. A., Oliver, S. G., Rutherford, K., Wood, V., Bahler, J., Lock, A., Kersey, P. J., McDowall, M. D., Staines, D. M., Dwinell, M., Shimoyama, M., Laulederkind, S., Hayman, T., Wang, S., Petri, V., Lowry, T., D'Eustachio, P., Matthews, L., Amundsen, C. D., Balakrishnan, R., Binkley, G., Cherry, J. M., Christie, K. R., Costanzo, M. C., Dwight, S. S., Engel, S. R., Fisk, D. G., Hirschman, J. E., Hitz, B. C., Hong, E. L., Karra, K., Krieger, C. J., Miyasato, S. R., Nash, R. S., Park, J., Skrzypek, M. S., Weng, S., Wong, E. D., Berardini, T. Z., Li, D., Huala, E., Slonim, D., Wick, H., Thomas, P., Chan, J., Kishore, R., Sternberg, P., Van Auken, K., Howe, D., Westerfield, M. 2012; 40 (D1): D559-D564

Abstract

The Gene Ontology (GO) (http://www.geneontology.org) is a community bioinformatics resource that represents gene product function through the use of structured, controlled vocabularies. The number of GO annotations of gene products has increased due to curation efforts among GO Consortium (GOC) groups, including focused literature-based annotation and ortholog-based functional inference. The GO ontologies continue to expand and improve as a result of targeted ontology development, including the introduction of computable logical definitions and development of new tools for the streamlined addition of terms to the ontology. The GOC continues to support its user community through the use of e-mail lists, social media and web-based resources.

View details for DOI 10.1093/nar/gkr1028

View details for Web of Science ID 000298601300084

View details for PubMedCentralID PMC3245151
Toward an interactive article: integrating journals and biological databases BMC BIOINFORMATICS Rangarajan, A., Schedl, T., Yook, K., Chan, J., Haenel, S., Otis, L., Faelten, S., DePellegrin-Connelly, T., Isaacson, R., Skrzypek, M. S., Marygold, S. J., Stefancsik, R., Cherry, J. M., Sternberg, P. W., Mueller, H. 2011; 12

Abstract

Journal articles and databases are two major modes of communication in the biological sciences, and thus integrating these critical resources is of urgent importance to increase the pace of discovery. Projects focused on bridging the gap between journals and databases have been on the rise over the last five years and have resulted in the development of automated tools that can recognize entities within a document and link those entities to a relevant database. Unfortunately, automated tools cannot resolve ambiguities that arise from one term being used to signify entities that are quite distinct from one another. Instead, resolving these ambiguities requires some manual oversight. Finding the right balance between the speed and portability of automation and the accuracy and flexibility of manual effort is a crucial goal to making text markup a successful venture. We have established a journal article mark-up pipeline that links GENETICS journal articles and the model organism database (MOD) WormBase. This pipeline uses a lexicon built with entities from the database as a first step. The entity markup pipeline results in links from over nine classes of objects including genes, proteins, alleles, phenotypes and anatomical terms. New entities and ambiguities are discovered and resolved by a database curator through a manual quality control (QC) step, along with help from authors via a web form that is provided to them by the journal. New entities discovered through this pipeline are immediately sent to an appropriate curator at the database. Ambiguous entities that do not automatically resolve to one link are resolved by hand ensuring an accurate link. This pipeline has been extended to other databases, namely Saccharomyces Genome Database (SGD) and FlyBase, and has been implemented in marking up a paper with links to multiple databases. Our semi-automated pipeline hyperlinks articles published in GENETICS to model organism databases such as WormBase. Our pipeline results in interactive articles that are data rich with high accuracy. The use of a manual quality control step sets this pipeline apart from other hyperlinking tools and results in benefits to authors, journals, readers and databases.

View details for DOI 10.1186/1471-2105-12-175

View details for Web of Science ID 000293000700001

View details for PubMedID 21595960
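The lexicon-first markup step with a manual fallback, as described in the abstract above, might be sketched as follows. This is an illustrative sketch only, not the pipeline used by the journal or the databases: the lexicon entries, database names and identifiers are hypothetical. Terms that resolve to exactly one database entry are linked automatically, and terms with more than one candidate are queued for a curator.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical lexicon: entity name -> candidate (database, identifier) targets.
LEXICON: Dict[str, List[Tuple[str, str]]] = {
    "geneA": [("ExampleDB", "ID0001")],
    "alleleX": [("ExampleDB", "ID0002")],
    "tag1": [("ExampleDB", "ID0003"), ("OtherDB", "ID9001")],  # ambiguous entry
}


def mark_up(
    text: str, lexicon: Dict[str, List[Tuple[str, str]]]
) -> Tuple[List[Tuple[str, int, str, str]], List[Tuple[str, int]]]:
    """Split lexicon matches into automatic links and a manual review queue.

    Returns (links, ambiguous), where each link is
    (term, start offset, database, identifier) and each ambiguous item is
    (term, start offset).
    """
    # Try longer names first so a long name wins over a shorter one
    # that is its prefix.
    names = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")

    links: List[Tuple[str, int, str, str]] = []
    ambiguous: List[Tuple[str, int]] = []
    for match in pattern.finditer(text):
        term = match.group(1)
        targets = lexicon[term]
        if len(targets) == 1:
            database, identifier = targets[0]
            links.append((term, match.start(), database, identifier))
        else:
            # More than one candidate: leave the decision to a curator.
            ambiguous.append((term, match.start()))
    return links, ambiguous


SAMPLE_TEXT = "Loss of geneA enhances alleleX but not tag1."
LINKS, AMBIGUOUS = mark_up(SAMPLE_TEXT, LEXICON)
```

Here `geneA` and `alleleX` would be linked directly, while `tag1` would go to the review queue. Discovery of new entities absent from the lexicon, which the abstract also mentions, is outside the scope of this sketch.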
Using computational predictions to improve literature-based Gene Ontology annotations: a feasibility study DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION Costanzo, M. C., Park, J., Balakrishnan, R., Cherry, J. M., Hong, E. L. 2011

Abstract

Annotation using Gene Ontology (GO) terms is one of the most important ways in which biological information about specific gene products can be expressed in a searchable, computable form that may be compared across genomes and organisms. Because literature-based GO annotations are often used to propagate functional predictions between related proteins, their accuracy is critically important. We present a strategy that employs a comparison of literature-based annotations with computational predictions to identify and prioritize genes whose annotations need review. Using this method, we show that comparison of manually assigned 'unknown' annotations in the Saccharomyces Genome Database (SGD) with InterPro-based predictions can identify annotations that need to be updated. A survey of literature-based annotations and computational predictions made by the Gene Ontology Annotation (GOA) project at the European Bioinformatics Institute (EBI) across several other databases shows that this comparison strategy could be used to maintain and improve the quality of GO annotations for other organisms besides yeast. The survey also shows that although GOA-assigned predictions are the most comprehensive source of functional information for many genomes, a large proportion of genes in a variety of different organisms entirely lack these predictions but do have manual annotations. This underscores the critical need for manually performed, literature-based curation to provide functional information about genes that are outside the scope of widely used computational methods. Thus, the combination of manual and computational methods is essential to provide the most accurate and complete functional annotation of a genome. Database URL: http://www.yeastgenome.org.

View details for DOI 10.1093/database/bar004

View details for Web of Science ID 000299630600010

View details for PubMedID 21411447

View details for PubMedCentralID PMC3067894
Towards BioDBcore: a community-defined information specification for biological databases NUCLEIC ACIDS RESEARCH Gaudet, P., Bairoch, A., Field, D., Sansone, S., Taylor, C., Attwood, T. K., Bateman, A., Blake, J. A., Bult, C. J., Cherry, J. M., Chisholm, R. L., Cochrane, G., Cook, C. E., Eppig, J. T., Galperin, M. Y., Gentleman, R., Goble, C. A., Gojobori, T., Hancock, J. M., Howe, D. G., Imanishi, T., Kelso, J., Landsman, D., Lewis, S. E., Karsch-Mizrachi, I., Orchard, S., Ouellette, B. F., Ranganathan, S., Richardson, L., Rocca-Serra, P., Schofield, P. N., Smedley, D., Southan, C., Tan, T. W., Tatusova, T., Whetzel, P. L., White, O., Yamasaki, C. 2011; 39: D7-D10

Abstract

The present article proposes the adoption of a community-defined, uniform, generic description of the core attributes of biological databases, BioDBCore. The goals of these attributes are to provide a general overview of the database landscape, to encourage consistency and interoperability between resources and to promote the use of semantic and syntactic standards. BioDBCore will make it easier for users to evaluate the scope and relevance of available resources. This new resource will increase the collective impact of the information present in biological databases.

View details for DOI 10.1093/nar/gkq1173

View details for Web of Science ID 000285831700002

View details for PubMedID 21097465
Saccharomyces Genome Database provides mutant phenotype data NUCLEIC ACIDS RESEARCH Engel, S. R., Balakrishnan, R., Binkley, G., Christie, K. R., Costanzo, M. C., Dwight, S. S., Fisk, D. G., Hirschman, J. E., Hitz, B. C., Hong, E. L., Krieger, C. J., Livstone, M. S., Miyasato, S. R., Nash, R., Oughtred, R., Park, J., Skrzypek, M. S., Weng, S., Wong, E. D., Dolinski, K., Botstein, D., Cherry, J. M. 2010; 38: D433-D436

Abstract

The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is a scientific database for the molecular biology and genetics of the yeast Saccharomyces cerevisiae, which is commonly known as baker's or budding yeast. The information in SGD includes functional annotations, mapping and sequence information, protein domains and structure, expression data, mutant phenotypes, physical and genetic interactions and the primary literature from which these data are derived. Here we describe how published phenotypes and genetic interaction data are annotated and displayed in SGD.

View details for DOI 10.1093/nar/gkp917

View details for Web of Science ID 000276399100068

View details for PubMedID 19906697

View details for PubMedCentralID PMC2808950
The Gene Ontology in 2010: extensions and refinements The Gene Ontology Consortium NUCLEIC ACIDS RESEARCH Berardini, T. Z., Li, D., Huala, E., Bridges, S., Burgess, S., McCarthy, F., Carbon, S., Lewis, S. E., Mungall, C. J., Abdulla, A., Wood, V., Feltrin, E., Valle, G., Chisholm, R. L., Fey, P., Gaudet, P., Kibbe, W., Basu, S., Bushmanova, Y., Eilbeck, K., Siegele, D. A., McIntosh, B., Renfro, D., Zweifel, A., Hu, J. C., Ashburner, M., Tweedie, S., Alam-Faruque, Y., Apweiler, R., Auchinchloss, A., Bairoch, A., Barrell, D., Binns, D., Blatter, M., Bougueleret, L., Boutet, E., Breuza, L., Bridge, A., Browne, P., Chan, W. M., Coudert, E., Daugherty, L., Dimmer, E., Eberhardt, R., Estreicher, A., Famiglietti, L., Ferro-Rojas, S., Feuermann, M., Foulger, R., Gruaz-Gumowski, N., Hinz, U., Huntley, R., Jimenez, S., Jungo, F., Keller, G., Laiho, K., Legge, D., Lemercier, P., Lieberherr, D., Magrane, M., O'Donovan, C., Pedruzzi, I., Poux, S., Rivoire, C., Roechert, B., Sawford, T., Schneider, M., Stanley, E., Stutz, A., Sundaram, S., Tognolli, M., Xenarios, I., Harris, M. A., Deegan (nee Clark), J. I., Ireland, A., Lomax, J., Jaiswal, P., Chibucos, M., Giglio, M. G., Wortman, J., Hannick, L., Madupu, R., Botstein, D., Dolinski, K., Livstone, M. S., Oughtred, R., Blake, J. A., Bult, C., Diehl, A. D., Dolan, M., Drabkin, H., Eppig, J. T., Hill, D. P., Ni, L., Ringwald, M., Sitnikov, D., Collmer, C., Torto-Alalibo, T., Laulederkind, S., Shimoyama, M., Twigger, S., D'Eustachio, P., Matthews, L., Balakrishnan, R., Binkley, G., Cherry, J. M., Christie, K. R., Costanzo, M. C., Engel, S. R., Fisk, D. G., Hirschman, J. E., Hitz, B. C., Hong, E. L., Krieger, C. J., Miyasato, S. R., Nash, R. S., Park, J., Skrzypek, M. S., Weng, S., Wong, E. D., Aslett, M., Chan, J., Kishore, R., Sternberg, P., Van Auken, K., Khodiyar, V. K., Lovering, R. C., Talmud, P. J., Howe, D., Westerfield, M. 2010; 38: D331-D335

Abstract

The Gene Ontology (GO) Consortium (http://www.geneontology.org) (GOC) continues to develop, maintain and use a set of structured, controlled vocabularies for the annotation of genes, gene products and sequences. The GO ontologies are expanding both in content and in structure. Several new relationship types have been introduced and used, along with existing relationships, to create links between and within the GO domains. These improve the representation of biology, facilitate querying, and allow GO developers to systematically check for and correct inconsistencies within the GO. Gene product annotation using GO continues to increase both in the number of total annotations and in species coverage. GO tools, such as OBO-Edit, an ontology-editing tool, and AmiGO, the GOC ontology browser, have seen major improvements in functionality, speed and ease of use.

View details for DOI 10.1093/nar/gkp1018

View details for Web of Science ID 000276399100051

View details for PubMedCentralID PMC2808930
The Gene Ontology's Reference Genome Project: A Unified Framework for Functional Annotation across Species PLOS COMPUTATIONAL BIOLOGY Gaudet, P., Chisholm, R., Berardini, T., Dimmer, E., Engel, S. R., Fey, P., Hill, D. P., Howe, D., Hu, J. C., Huntley, R., Khodiyar, V. K., Kishore, R., Li, D., Lovering, R. C., McCarthy, F., Ni, L., Petri, V., Siegele, D. A., Tweedie, S., Van Auken, K., Wood, V., Basu, S., Carbon, S., Dolan, M., Mungall, C. J., Dolinski, K., Thomas, P., Ashburner, M., Blake, J. A., Cherry, J. M., Lewis, S. E. 2009; 5 (7)

Abstract

The Gene Ontology (GO) is a collaborative effort that provides structured vocabularies for annotating the molecular function, biological role, and cellular location of gene products in a highly systematic way and in a species-neutral manner with the aim of unifying the representation of gene function across different organisms. Each contributing member of the GO Consortium independently associates GO terms to gene products from the organism(s) they are annotating. Here we introduce the Reference Genome project, which brings together those independent efforts into a unified framework based on the evolutionary relationships between genes in these different organisms. The Reference Genome project has two primary goals: to increase the depth and breadth of annotations for genes in each of the organisms in the project, and to create data sets and tools that enable other genome annotation efforts to infer GO annotations for homologous genes in their organisms. In addition, the project has several important incidental benefits, such as increasing annotation consistency across genome databases, and providing important improvements to the GO's logical structure and biological content.

View details for DOI 10.1371/journal.pcbi.1000431

View details for Web of Science ID 000269220100031

View details for PubMedID 19578431

View details for PubMedCentralID PMC2699109
Functional annotations for the Saccharomyces cerevisiae genome: the knowns and the known unknowns TRENDS IN MICROBIOLOGY Christie, K. R., Hong, E. L., Cherry, J. M. 2009; 17 (7): 286-294

Abstract

The quest to characterize each of the genes of the yeast Saccharomyces cerevisiae has propelled the development and application of novel high-throughput (HTP) experimental techniques. To handle the enormous amount of information generated by these techniques, new bioinformatics tools and resources are needed. Gene Ontology (GO) annotations curated by the Saccharomyces Genome Database (SGD) have facilitated the development of algorithms that analyze HTP data and help predict functions for poorly characterized genes in S. cerevisiae and other organisms. Here, we describe how published results are incorporated into GO annotations at SGD and why researchers can benefit from using these resources wisely to analyze their HTP data and predict gene functions.

View details for DOI 10.1016/j.tim.2009.04.005

View details for Web of Science ID 000268616600005

View details for PubMedID 19577472

View details for PubMedCentralID PMC3057094
New mutant phenotype data curation system in the Saccharomyces Genome Database. Database : the journal of biological databases and curation Costanzo, M. C., Skrzypek, M. S., Nash, R., Wong, E., Binkley, G., Engel, S. R., Hitz, B., Hong, E. L., Cherry, J. M. 2009; 2009: bap001

Abstract

The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) organizes and displays molecular and genetic information about the genes and proteins of baker's yeast, Saccharomyces cerevisiae. Mutant phenotype screens have been the starting point for a large proportion of yeast molecular biological studies, and are still used today to elucidate the functions of uncharacterized genes and discover new roles for previously studied genes. To greatly facilitate searching and comparison of mutant phenotypes across genes, we have devised a new controlled-vocabulary system for capturing phenotype information. Each phenotype annotation is represented as an 'observable', which is the entity or process that is observed, and a 'qualifier' that describes the change in that entity or process in the mutant (e.g. decreased, increased, or abnormal). Additional information about the mutant, such as strain background, allele name, conditions under which the phenotype is observed, or the identity of relevant chemicals, is captured in separate fields. For each gene, a summary of the mutant phenotype information is displayed on the Locus Summary page, and the complete information is displayed in tabular format on the Phenotype Details Page. All of the information is searchable and may also be downloaded in bulk using SGD's Batch Download Tool or Download Data Files Page. In the future, phenotypes will be integrated with other curated data to allow searching across different types of functional information, such as genetic and physical interaction data and Gene Ontology annotations. Database URL: http://www.yeastgenome.org/
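The observable/qualifier structure described in this abstract can be pictured as a small record type. The following is only a minimal sketch under stated assumptions: the class name, field names, helper function, and the placeholder gene names are all hypothetical and are not taken from SGD's schema or software.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class PhenotypeAnnotation:
    """One mutant phenotype annotation (hypothetical layout, not SGD's schema)."""
    gene: str                                # gene the mutant allele belongs to
    observable: str                          # entity or process that is observed
    qualifier: str                           # change in the mutant, e.g. "decreased"
    strain_background: Optional[str] = None  # additional details live in
    allele: Optional[str] = None             # separate optional fields
    conditions: Optional[str] = None
    chemical: Optional[str] = None

    def summary(self) -> str:
        # Short "observable: qualifier" form, as a summary page might show it.
        return f"{self.observable}: {self.qualifier}"


def genes_with_observable(annotations: Iterable[PhenotypeAnnotation],
                          observable: str) -> List[str]:
    """Return the sorted, de-duplicated genes annotated to one observable."""
    return sorted({a.gene for a in annotations if a.observable == observable})


# Placeholder records; the gene names are invented for illustration only.
EXAMPLES = [
    PhenotypeAnnotation("GENE1", "heat sensitivity", "increased", allele="gene1-1"),
    PhenotypeAnnotation("GENE2", "heat sensitivity", "decreased"),
    PhenotypeAnnotation("GENE1", "cell size", "abnormal", conditions="rich medium"),
]
```

Because every annotation draws its observable from one shared vocabulary, a cross-gene comparison reduces to an equality test on that field, which is the kind of searching the controlled vocabulary is meant to make easier.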

View details for DOI 10.1093/database/bap001

View details for PubMedID 20157474

View details for PubMedCentralID PMC2790299

View details for Web of Science ID 000208191300001
Gene Ontology annotations at SGD: new data sources and annotation methods NUCLEIC ACIDS RESEARCH Hong, E. L., Balakrishnan, R., Dong, Q., Christie, K. R., Park, J., Binkley, G., Costanzo, M. C., Dwight, S. S., Engel, S. R., Fisk, D. G., Hirschman, J. E., Hitz, B. C., Krieger, C. J., Livstone, M. S., Miyasato, S. R., Nash, R. S., Oughtred, R., Skrzypek, M. S., Weng, S., Wong, E. D., Zhu, K. K., Dolinski, K., Botstein, D., Cherry, J. M. 2008; 36: D577-D581

Abstract

The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) collects and organizes biological information about the chromosomal features and gene products of the budding yeast Saccharomyces cerevisiae. Although published data from traditional experimental methods are the primary sources of evidence supporting Gene Ontology (GO) annotations for a gene product, high-throughput experiments and computational predictions can also provide valuable insights in the absence of an extensive body of literature. Therefore, GO annotations available at SGD now include high-throughput data as well as computational predictions provided by the GO Annotation Project (GOA UniProt; http://www.ebi.ac.uk/GOA/). Because the annotation method used to assign GO annotations varies by data source, GO resources at SGD have been modified to distinguish data sources and annotation methods. In addition to providing information for genes that have not been experimentally characterized, GO annotations from independent sources can be compared to those made by SGD to help keep the literature-based GO annotations current.

View details for DOI 10.1093/nar/gkm909

View details for Web of Science ID 000252545400104

View details for PubMedID 17982175

View details for PubMedCentralID PMC2238894
Combining guilt-by-association and guilt-by-profiling to predict Saccharomyces cerevisiae gene function GENOME BIOLOGY Tian, W., Zhang, L. V., Tasan, M., Gibbons, F. D., King, O. D., Park, J., Wunderlich, Z., Cherry, J. M., Roth, F. P. 2008; 9

Abstract

Learning the function of genes is a major goal of computational genomics. Methods for inferring gene function have typically fallen into two categories: 'guilt-by-profiling', which exploits correlation between function and other gene characteristics; and 'guilt-by-association', which transfers function from one gene to another via biological relationships. We have developed a strategy ('Funckenstein') that performs guilt-by-profiling and guilt-by-association and combines the results. Using a benchmark set of functional categories and input data for protein-coding genes in Saccharomyces cerevisiae, Funckenstein was compared with a previous combined strategy. Subsequently, we applied Funckenstein to 2,455 Gene Ontology terms. In the process, we developed 2,455 guilt-by-profiling classifiers based on 8,848 gene characteristics and 12 functional linkage graphs based on 23 biological relationships. Funckenstein outperforms a previous combined strategy using a common benchmark dataset. The combination of 'guilt-by-profiling' and 'guilt-by-association' gave significant improvement over the component classifiers, showing the greatest synergy for the most specific functions. Performance was evaluated by cross-validation and by literature examination of the top-scoring novel predictions. These quantitative predictions should help prioritize experimental study of yeast gene functions.

View details for Web of Science ID 000278173500007

View details for PubMedID 18613951
The Gene Ontology project in 2008 NUCLEIC ACIDS RESEARCH Harris, M. A., Deegan, J. I., Lomax, J., Ashburner, M., Tweedie, S., Carbon, S., Lewis, S., Mungall, C., Day-Richter, J., Eilbeck, K., Blake, J. A., Bult, C., Diehl, A. D., Dolan, M., Drabkin, H., Eppig, J. T., Hill, D. P., Ni, L., Ringwald, M., Balakrishnan, R., Binkley, G., Cherry, J. M., Christie, K. R., Costanzo, M. C., Dong, Q., Engel, S. R., Fisk, D. G., Hirschman, J. E., Hitz, B. C., Hong, E. L., Krieger, C. J., Miyasato, S. R., Nash, R. S., Park, J., Skrzypek, M. S., Weng, S., Wong, E. D., Zhu, K. K., Botstein, D., Dolinski, K., Livstone, M. S., Oughtred, R., Berardini, T., Li, D., Rhee, S. Y., Apweiler, R., Barrell, D., Camon, E., Dimmer, E., Huntley, R., Mulder, N., Khodiyar, V. K., Lovering, R. C., Povey, S., Chisholm, R., Fey, P., Gaudet, P., Kibbe, W., Kishore, R., Schwarz, E. M., Sternberg, P., Van Auken, K., Giglio, M. G., Hannick, L., Wortman, J., Aslett, M., Berriman, M., Wood, V., Jacob, H., Laulederkind, S., Petri, V., Shimoyama, M., Smith, J., Twigger, S., Jaiswal, P., Seigfried, T., Howe, D., Westerfield, M., Collmer, C., Torto-Alalibo, T., Feltrin, E., Valle, G., Bromberg, S., Burgess, S., McCarthy, F. 2008; 36: D440-D444

Abstract

The Gene Ontology (GO) project (http://www.geneontology.org/) provides a set of structured, controlled vocabularies for community use in annotating genes, gene products and sequences (also see http://www.sequenceontology.org/). The ontologies have been extended and refined for several biological areas, and improvements to the structure of the ontologies have been implemented. To improve the quantity and quality of gene product annotations available from its public repository, the GO Consortium has launched a focused effort to provide comprehensive and detailed annotation of orthologous genes across a number of 'reference' genomes, including human and several key model organisms. Software developments include two releases of the ontology-editing tool OBO-Edit, and improvements to the AmiGO browser interface.

View details for DOI 10.1093/nar/gkm883

View details for Web of Science ID 000252545400079

View details for PubMedID 17984083

View details for PubMedCentralID PMC2238979
Mining experimental evidence of molecular function claims from the literature BIOINFORMATICS Crangle, C. E., Cherry, J. M., Hong, E. L., Zbyslaw, A. 2007; 23 (23): 3232-3240

Abstract

The rate at which gene-related findings appear in the scientific literature makes it difficult if not impossible for biomedical scientists to keep fully informed and up to date. The importance of these findings argues for the development of automated methods that can find, extract and summarize this information. This article reports on methods for determining the molecular function claims that are being made in a scientific article, specifically those that are backed by experimental evidence. The most significant result is that for molecular function claims based on direct assays, our methods achieved recall of 70.7% and precision of 65.7%. Furthermore, our methods correctly identified in the text 44.6% of the specific molecular function claims backed up by direct assays, but with a precision of only 0.92%, a disappointing outcome that led to an examination of the different kinds of errors. These results were based on an analysis of 1823 articles from the literature of Saccharomyces cerevisiae (budding yeast). The annotation files for S.cerevisiae are available from ftp://genome-ftp.stanford.edu/pub/yeast/data_download/literature_curation/gene_association.sgd.gz. The draft protocol vocabulary is available by request from the first author.

View details for DOI 10.1093/bioinformatics/btm495

View details for Web of Science ID 000251334800017

View details for PubMedID 17942445
The Saccharomyces Genome Database provides comprehensive information about the biology of S. cerevisiae and tools for studies in comparative genomics Experimental Biology 2007 Annual Meeting Hirschman, J. E., Engel, S., Hong, E., Balakrishnan, R., Christie, K., Costanzo, M., Dwight, S., Fisk, D., Nash, R., Park, J., Skrzypek, M., Dolinski, K., Livstone, M., Oughtred, R., Andrada, R., Binkley, G., Dong, Q., Hitz, B., Miyasoto, S., Schroeder, M., Weng, S., Wong, E., Botstein, D., Cherry, J. M. FEDERATION AMER SOC EXP BIOL. 2007: A264–A264

View details for Web of Science ID 000245708502115
Tetrahymena genome database (TGD): a resource for comparative studies with a model protist. Stover, N. A., Krieger, C. J., Binkley, G., Dong, Q., Sethuraman, A., Weng, S., Cherry, J. M. WILEY-BLACKWELL PUBLISHING, INC. 2007: 54S–54S

View details for Web of Science ID 000245312600169
Expanded protein information at SGD: new pages and proteome browser NUCLEIC ACIDS RESEARCH Nash, R., Weng, S., Hitz, B., Balakrishnan, R., Christie, K. R., Costanzo, M. C., Dwight, S. S., Engel, S. R., Fisk, D. G., Hirschman, J. E., Hong, E. L., Livstone, M. S., Oughtred, R., Park, J., Skrzypek, M., Theesfeld, C. L., Binkley, G., Dong, Q., Lane, C., Miyasato, S., Sethuraman, A., Schroeder, M., Dolinski, K., Botstein, D., Cherry, J. M. 2007; 35: D468-D471

Abstract

The recent explosion in protein data generated from both directed small-scale studies and large-scale proteomics efforts has greatly expanded the quantity of available protein information and has prompted the Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) to enhance the depth and accessibility of protein annotations. In particular, we have expanded ongoing efforts to improve the integration of experimental information and sequence-based predictions and have redesigned the protein information web pages. A key feature of this redesign is the development of a GBrowse-derived interactive Proteome Browser customized to improve the visualization of sequence-based protein information. This Proteome Browser has enabled SGD to unify the display of hidden Markov model (HMM) domains, protein family HMMs, motifs, transmembrane regions, signal peptides, hydropathy plots and profile hits using several popular prediction algorithms. In addition, a physico-chemical properties page has been introduced to provide easy access to basic protein information. Improvements to the layout of the Protein Information page and integration of the Proteome Browser will facilitate the ongoing expansion of sequence-specific experimental information captured in SGD, including post-translational modifications and other user-defined annotations. Finally, SGD continues to improve upon the availability of genetic and physical interaction data in an ongoing collaboration with BioGRID by providing direct access to more than 82,000 manually-curated interactions.

View details for DOI 10.1093/nar/gkl931

View details for Web of Science ID 000243494600095

View details for PubMedID 17142221

View details for PubMedCentralID PMC1669759
Saccharomyces cerevisiae S288C genome annotation: a working hypothesis YEAST Fisk, D. G., Ball, C. A., Dolinski, K., Engel, S. R., Hong, E. L., Issel-Tarver, L., Schwartz, K., Sethuraman, A., Botstein, D., Cherry, J. M. 2006; 23 (12): 857-865

Abstract

The S. cerevisiae genome is the most well-characterized eukaryotic genome and one of the simplest in terms of identifying open reading frames (ORFs), yet its primary annotation has been updated continually in the decade since its initial release in 1996 (Goffeau et al., 1996). The Saccharomyces Genome Database (SGD; www.yeastgenome.org) (Hirschman et al., 2006), the community-designated repository for this reference genome, strives to ensure that the S. cerevisiae annotation is as accurate and useful as possible. At SGD, the S. cerevisiae genome sequence and annotation are treated as a working hypothesis, which must be repeatedly tested and refined. In this paper, in celebration of the tenth anniversary of the completion of the S. cerevisiae genome sequence, we discuss the ways in which the S. cerevisiae sequence and annotation have changed, consider the multiple sources of experimental and comparative data on which these changes are based, and describe our methods for evaluating, incorporating and documenting these new data.

View details for DOI 10.1002/yea.1400

View details for Web of Science ID 000242009800002

View details for PubMedID 17001629

View details for PubMedCentralID PMC3040122
Macronuclear genome sequence of the ciliate Tetrahymena thermophila, a model eukaryote PLOS BIOLOGY Eisen, J. A., Coyne, R. S., Wu, M., Wu, D., Thiagarajan, M., Wortman, J. R., Badger, J. H., Ren, Q., Amedeo, P., Jones, K. M., Tallon, L. J., Delcher, A. L., Salzberg, S. L., Silva, J. C., Haas, B. J., Majoros, W. H., Farzad, M., Carlton, J. M., Smith, R. K., Garg, J., Pearlman, R. E., Karrer, K. M., Sun, L., Manning, G., Elde, N. C., Turkewitz, A. P., Asai, D. J., Wilkes, D. E., Wang, Y., Cai, H., Collins, K., Stewart, A., Lee, S. R., Wilamowska, K., Weinberg, Z., Ruzzo, W. L., Wloga, D., Gaertig, J., Frankel, J., Tsao, C., Gorovsky, M. A., Keeling, P. J., Waller, R. F., Patron, N. J., Cherry, J. M., Stover, N. A., Krieger, C. J., del Toro, C., Ryder, H. F., Williamson, S. C., Barbeau, R. A., Hamilton, E. P., Orias, E. 2006; 4 (9): 1620-1642

Abstract

The ciliate Tetrahymena thermophila is a model organism for molecular and cellular biology. Like other ciliates, this species has separate germline and soma functions that are embodied by distinct nuclei within a single cell. The germline-like micronucleus (MIC) has its genome held in reserve for sexual reproduction. The soma-like macronucleus (MAC), which possesses a genome processed from that of the MIC, is the center of gene expression and does not directly contribute DNA to sexual progeny. We report here the shotgun sequencing, assembly, and analysis of the MAC genome of T. thermophila, which is approximately 104 Mb in length and composed of approximately 225 chromosomes. Overall, the gene set is robust, with more than 27,000 predicted protein-coding genes, 15,000 of which have strong matches to genes in other organisms. The functional diversity encoded by these genes is substantial and reflects the complexity of processes required for a free-living, predatory, single-celled organism. This is highlighted by the abundance of lineage-specific duplications of genes with predicted roles in sensing and responding to environmental conditions (e.g., kinases), using diverse resources (e.g., proteases and transporters), and generating structural complexity (e.g., kinesins and dyneins). In contrast to the other lineages of alveolates (apicomplexans and dinoflagellates), no compelling evidence could be found for plastid-derived genes in the genome. UGA, the only T. thermophila stop codon, is used in some genes to encode selenocysteine, thus making this organism the first known with the potential to translate all 64 codons in nuclear genes into amino acids. We present genomic evidence supporting the hypothesis that the excision of DNA from the MIC to generate the MAC specifically targets foreign DNA as a form of genome self-defense. 
The combination of the genome sequence, the functional diversity encoded therein, and the presence of some pathways missing from other model organisms makes T. thermophila an ideal model for functional genomic studies to address biological, biomedical, and biotechnological questions of fundamental importance.

View details for DOI 10.1371/journal.pbio.0040286

View details for Web of Science ID 000240740900012

View details for PubMedID 16933976

View details for PubMedCentralID PMC1557398
The Gene Ontology (GO) project in 2006 NUCLEIC ACIDS RESEARCH Harris, M. A., Clark, J. I., Ireland, A., Lomax, J., Ashburner, M., Collins, R., Eilbeck, K., Lewis, S., Mungall, C., Richter, J., Rubin, G. M., Shu, S., Blake, J. A., Bult, C. J., Diehl, A. D., Dolan, M. E., Drabkin, H. J., Eppig, J. T., Hill, D. P., Ni, L., Ringwald, M., Balakrishnan, R., Binkley, G., Cherry, J. M., Christie, K. R., Costanzo, M. C., Dong, Q., Engel, S. R., Fisk, D. G., Hirschman, J. E., Hitz, B. C., Hong, E. L., Lane, C., Miyasato, S., Nash, R., Sethuraman, A., Skrzypek, M., Theesfeld, C. L., Weng, S., Botstein, D., Dolinski, K., Oughtred, R., Berardini, T., Mundodi, S., Rhee, S. Y., Apweiler, R., Barrell, D., Camon, E., Dimmer, E., Mulder, N., Chisholm, R., Fey, P., Gaudet, P., Kibbe, W., Pilcher, K., Bastiani, C. A., Kishore, R., Schwarz, E. M., Sternberg, P., Van Auken, K., Gwinn, M., Hannick, L., Wortman, J., Aslett, M., Berriman, M., Wood, V., Bromberg, S., Foote, C., Jacob, H., Pasko, D., Petri, V., Reilly, D., Seiler, K., Shimoyama, M., Smith, J., Twigger, S., Jaiswal, P., Seigfried, T., Collmer, C., Howe, D., Westerfield, M. 2006; 34: D322-D326

Abstract

The Gene Ontology (GO) project (http://www.geneontology.org) develops and uses a set of structured, controlled vocabularies for community use in annotating genes, gene products and sequences (also see http://song.sourceforge.net/). The GO Consortium continues to improve the vocabulary content, reflecting the impact of several novel mechanisms of incorporating community input. A growing number of model organism databases and genome annotation groups contribute annotation sets using GO terms to GO's public repository. Updates to the AmiGO browser have improved access to contributed genome annotations. As the GO project continues to grow, the use of the GO vocabularies is becoming more varied as well as more widespread. The GO project provides an ontological annotation system that enables biologists to infer knowledge from large amounts of data.

View details for DOI 10.1093/nar/gkj021

View details for Web of Science ID 000239307700070

View details for PubMedID 16381878
Tetrahymena Genome Database (TGD): a new genomic resource for Tetrahymena thermophila research NUCLEIC ACIDS RESEARCH Stover, N. A., Krieger, C. J., Binkley, G., Dong, Q., Fisk, D. G., Nash, R., Sethuraman, A., Weng, S., Cherry, J. M. 2006; 34: D500-D503

Abstract

We have developed a web-based resource (available at www.ciliate.org) for researchers studying the model ciliate organism Tetrahymena thermophila. Employing the underlying database structure and programming of the Saccharomyces Genome Database, the Tetrahymena Genome Database (TGD) integrates the wealth of knowledge generated by the Tetrahymena research community about genome structure, genes and gene products with the newly sequenced macronuclear genome determined by The Institute for Genomic Research (TIGR). TGD provides information curated from the literature about each published gene, including a standardized gene name, a link to the genomic locus in our graphical genome browser, gene product annotations utilizing the Gene Ontology, links to published literature about the gene and more. TGD also displays automatic annotations generated for the gene models predicted by TIGR. A variety of tools are available at TGD for searching the Tetrahymena genome, its literature and information about members of the research community.

View details for DOI 10.1093/nar/gkj054

View details for Web of Science ID 000239307700109

View details for PubMedID 16381920

View details for PubMedCentralID PMC1347417
Genome Snapshot: a new resource at the Saccharomyces Genome Database (SGD) presenting an overview of the Saccharomyces cerevisiae genome NUCLEIC ACIDS RESEARCH Hirschman, J. E., Balakrishnan, R., Christie, K. R., Costanzo, M. C., Dwight, S. S., Engel, S. R., Fisk, D. G., Hong, E. L., Livstone, M. S., Nash, R., Park, J., Oughtred, R., Skrzypek, M., Starr, B., Theesfeld, C. L., Williams, J., Andrada, R., Binkley, G., Dong, Q., Lane, C., Miyasato, S., Sethuraman, A., Schroeder, M., Thanawala, M. K., Weng, S., Dolinski, K., Botstein, D., Cherry, J. M. 2006; 34: D442-D445

Abstract

Sequencing and annotation of the entire Saccharomyces cerevisiae genome has made it possible to gain a genome-wide perspective on yeast genes and gene products. To make this information available on an ongoing basis, the Saccharomyces Genome Database (SGD) (http://www.yeastgenome.org/) has created the Genome Snapshot (http://db.yeastgenome.org/cgi-bin/genomeSnapShot.pl). The Genome Snapshot summarizes the current state of knowledge about the genes and chromosomal features of S.cerevisiae. The information is organized into two categories: (i) number of each type of chromosomal feature annotated in the genome and (ii) number and distribution of genes annotated to Gene Ontology terms. Detailed lists are accessible through SGD's Advanced Search tool (http://db.yeastgenome.org/cgi-bin/search/featureSearch), and all the data presented on this page are available from the SGD ftp site (ftp://ftp.yeastgenome.org/yeast/).

View details for DOI 10.1093/nar/gkj117

View details for Web of Science ID 000239307700097

View details for PubMedID 16381907

View details for PubMedCentralID PMC1347479
PatMatch: a program for finding patterns in peptide and nucleotide sequences NUCLEIC ACIDS RESEARCH Yan, T., Yoo, D., Berardini, T. Z., Mueller, L. A., Weems, D. C., Weng, S., Cherry, J. M., Rhee, S. Y. 2005; 33: W262-W266

Abstract

Here, we present PatMatch, an efficient, web-based pattern-matching program that enables searches for short nucleotide or peptide sequences such as cis-elements in nucleotide sequences or small domains and motifs in protein sequences. The program can be used to find matches to a user-specified sequence pattern that can be described using ambiguous sequence codes and a powerful and flexible pattern syntax based on regular expressions. A recent upgrade has improved performance and now supports both mismatches and wildcards in a single pattern. This enhancement has been achieved by replacing the previous searching algorithm, scan_for_matches [D'Souza et al. (1997), Trends in Genetics, 13, 497-498], with nondeterministic-reverse grep (NR-grep), a general pattern matching tool that allows for approximate string matching [Navarro (2001), Software Practice and Experience, 31, 1265-1312]. We have tailored NR-grep to be used for DNA and protein searches with PatMatch. The stand-alone version of the software can be adapted for use with any sequence dataset and is available for download at The Arabidopsis Information Resource (TAIR) at ftp://ftp.arabidopsis.org/home/tair/Software/Patmatch/. The PatMatch server is available on the web at http://www.arabidopsis.org/cgi-bin/patmatch/nph-patmatch.pl for searching Arabidopsis thaliana sequences.
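The idea of describing a pattern with ambiguous sequence codes on top of regular expressions can be illustrated with a short sketch. This is not PatMatch's pattern syntax and does not use NR-grep; it only expands the standard IUPAC nucleotide ambiguity codes into Python `re` character classes, and it does not support mismatches. The function names are hypothetical.

```python
import re
from typing import List, Tuple

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def pattern_to_regex(pattern: str) -> str:
    """Expand each ambiguity code in the pattern into a character class."""
    return "".join(IUPAC_DNA[code] for code in pattern.upper())


def find_matches(pattern: str, sequence: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched text) for every hit, overlaps included."""
    # A zero-width lookahead lets the scan report overlapping matches.
    regex = re.compile(f"(?=({pattern_to_regex(pattern)}))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence.upper())]


# 'S' stands for C or G, so this pattern matches both TGACTCA and TGAGTCA.
hits = find_matches("TGASTCA", "AATGACTCATTTGAGTCAG")
```

Here `hits` holds two matches, at positions 2 and 11. Allowing mismatches, as the upgraded PatMatch does, requires approximate matching, which plain regular expressions of this kind do not provide.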

View details for DOI 10.1093/nar/gki368

View details for Web of Science ID 000230271400050

View details for PubMedID 15980466

View details for PubMedCentralID PMC1160129
Inference of combinatorial regulation in yeast transcriptional networks: A case study of sporulation PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Wang, W., Cherry, J. M., Nochomovitz, Y., Jolly, E., Botstein, D., Li, H. 2005; 102 (6): 1998-2003

Abstract

Decomposing transcriptional regulatory networks into functional modules and determining logical relations between them is the first step toward understanding transcriptional regulation at the system level. Modules based on analysis of genome-scale data can serve as the basis for inferring combinatorial regulation and for building mathematical models to quantitatively describe the behavior of the networks. We present here an algorithm called modem to identify target genes of a transcription factor (TF) from a single expression experiment, based on a joint probabilistic model for promoter sequence and gene expression data. We show how this method can facilitate the discovery of specific instances of combinatorial regulation and illustrate this for a specific case of transcriptional networks that regulate sporulation in the yeast Saccharomyces cerevisiae. Applying this method to analyze two crucial TFs in sporulation, Ndt80p and Sum1p, we were able to delineate their overlapping binding sites. We proposed a mechanistic model for the competitive regulation by the two TFs on a defined subset of sporulation genes. We show that this model accounts for the temporal control of the "middle" sporulation genes and suggest a similar regulatory arrangement can be found in developmental programs in higher organisms.

View details for Web of Science ID 000227072900037

View details for PubMedID 15684073
Fungal BLAST and Model Organism BLASTP Best Hits: new comparison resources at the Saccharomyces Genome Database (SGD) NUCLEIC ACIDS RESEARCH Balakrishnan, R., Christie, K. R., Costanzo, M. C., Dolinski, K., Dwight, S. S., Engel, S. R., Fisk, D. G., Hirschman, J. E., Hong, E. L., Nash, R., Oughtred, R., Skrzypek, M., Theesfeld, C. L., Binkley, G., Dong, Q., Lane, C., Sethuraman, A., Weng, S., Botstein, D., Cherry, J. M. 2005; 33: D374-D377

Abstract

The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) is a scientific database of gene, protein and genomic information for the yeast Saccharomyces cerevisiae. SGD has recently developed two new resources that facilitate nucleotide and protein sequence comparisons between S.cerevisiae and other organisms. The Fungal BLAST tool provides directed searches against all fungal nucleotide and protein sequences available from GenBank, divided into categories according to organism, status of completeness and annotation, and source. The Model Organism BLASTP Best Hits resource displays, for each S.cerevisiae protein, the single most similar protein from several model organisms and presents links to the database pages of those proteins, facilitating access to curated information about potential orthologs of yeast proteins.

View details for DOI 10.1093/nar/gki023

View details for Web of Science ID 000226524300077

View details for PubMedID 15608219

View details for PubMedCentralID PMC539977
GO::TermFinder - open source software for accessing Gene Ontology information and finding significantly enriched Gene Ontology terms associated with a list of genes BIOINFORMATICS Boyle, E. I., Weng, S. A., Gollub, J., Jin, H., Botstein, D., Cherry, J. M., Sherlock, G. 2004; 20 (18): 3710-3715

Abstract

GO::TermFinder comprises a set of object-oriented Perl modules for accessing Gene Ontology (GO) information and evaluating and visualizing the collective annotation of a list of genes to GO terms. It can be used to draw conclusions from microarray and other biological data, calculating the statistical significance of each annotation. GO::TermFinder can be used on any system on which Perl can be run, either as a command line application, in single or batch mode, or as a web-based CGI script. The full source code and documentation for GO::TermFinder are freely available from http://search.cpan.org/dist/GO-TermFinder/.
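The abstract does not say which statistic is computed. One widely used choice for term enrichment is the upper tail of the hypergeometric distribution, sketched here in Python as an illustration. This is not the GO::TermFinder API (which is Perl); the function name and parameters are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def enrichment_pvalue(population: int, annotated: int,
                      sample: int, hits: int) -> float:
    """Hypergeometric upper tail: P(X >= hits).

    population: number of genes in the background set
    annotated:  background genes annotated to the GO term
    sample:     number of genes in the list under study
    hits:       genes in that list annotated to the term
    """
    most_possible = min(annotated, sample)
    total_draws = comb(population, sample)
    # Sum the probability of seeing `hits` or more annotated genes by chance.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, most_possible + 1)
    )
    return tail / total_draws


# Toy numbers: 10 background genes, 3 annotated to the term, and a list of
# 3 genes that are all annotated to it.
p = enrichment_pvalue(population=10, annotated=3, sample=3, hits=3)
```

With these toy numbers only one of the 120 possible three-gene lists consists entirely of annotated genes, so `p` is 1/120. In practice a list is tested against many terms at once, so the raw p-values would need a multiple-testing correction before interpretation.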

View details for DOI 10.1093/bioinformatics/bth456

View details for Web of Science ID 000225786600064

View details for PubMedID 15297299

View details for PubMedCentralID PMC3037731
Saccharomyces genome database: Underlying principles and organisation BRIEFINGS IN BIOINFORMATICS Dwight, S. S., Balakrishnan, R., Christie, K. R., Costanzo, M. C., Dolinski, K., Engel, S. R., Feierbach, B., Fisk, D. G., Hirschman, J., Hong, E. L., Issel-Tarver, L., Nash, R. S., Sethuraman, A., Starr, B., Theesfeld, C. L., Andrada, R., Binkley, G., Dong, Q., Lane, C., Schroeder, M., Weng, S., Botstein, D., Cherry, J. M. 2004; 5 (1): 9-22

Abstract

A scientific database can be a powerful tool for biologists in an era where large-scale genomic analysis, combined with smaller-scale scientific results, provides new insights into the roles of genes and their products in the cell. However, the collection and assimilation of data is, in itself, not enough to make a database useful. The data must be incorporated into the database and presented to the user in an intuitive and biologically significant manner. Most importantly, this presentation must be driven by the user's point of view; that is, from a biological perspective. The success of a scientific database can therefore be measured by the response of its users - statistically, by usage numbers and, in a less quantifiable way, by its relationship with the community it serves and its ability to serve as a model for similar projects. Since its inception ten years ago, the Saccharomyces Genome Database (SGD) has seen a dramatic increase in its usage, has developed and maintained a positive working relationship with the yeast research community, and has served as a template for at least one other database. The success of SGD, as measured by these criteria, is due in large part to philosophies that have guided its mission and organisation since it was established in 1993. This paper aims to detail these philosophies and how they shape the organisation and presentation of the database.

View details for Web of Science ID 000222244300002

View details for PubMedID 15153302
The Gene Ontology (GO) database and informatics resource NUCLEIC ACIDS RESEARCH Harris, M. A., Clark, J., Ireland, A., Lomax, J., Ashburner, M., Foulger, R., Eilbeck, K., Lewis, S., Marshall, B., Mungall, C., Richter, J., Rubin, G. M., Blake, J. A., Bult, C., Dolan, M., Drabkin, H., Eppig, J. T., Hill, D. P., Ni, L., RINGWALD, M., Balakrishnan, R., Cherry, J. M., Christie, K. R., Costanzo, M. C., Dwight, S. S., Engel, S., Fisk, D. G., Hirschman, J. E., Hong, E. L., Nash, R. S., Sethuraman, A., Theesfeld, C. L., Botstein, D., Dolinski, K., Feierbach, B., Berardini, T., Mundodi, S., Rhee, S. Y., Apweiler, R., Barrell, D., Camon, E., Dimmer, E., Lee, V., Chisholm, R., Gaudet, P., Kibbe, W., Kishore, R., Schwarz, E. M., Sternberg, P., Gwinn, M., Hannick, L., Wortman, J., Berriman, M., Wood, V., de la Cruz, N., Tonellato, P., Jaiswal, P., Seigfried, T., White, R. 2004; 32: D258-D261

Abstract

The Gene Ontology (GO) project (http://www.geneontology.org/) provides structured, controlled vocabularies and classifications that cover several domains of molecular and cellular biology and are freely available for community use in the annotation of genes, gene products and sequences. Many model organism databases and genome annotation groups use the GO and contribute their annotation sets to the GO resource. The GO database integrates the vocabularies and contributed annotations and provides full access to this information in several formats. Members of the GO Consortium continually work collectively, involving outside experts as needed, to expand and update the GO vocabularies. The GO Web resource also provides access to extensive documentation about the GO project and links to applications that use GO data for functional analyses.

View details for DOI 10.1093/nar/gkh036

View details for Web of Science ID 000188079000059

View details for PubMedID 14681407

View details for PubMedCentralID PMC308770
Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms NUCLEIC ACIDS RESEARCH Christie, K. R., Weng, S., Balakrishnan, R., Costanzo, M. C., Dolinski, K., Dwight, S. S., Engel, S. R., Feierbach, B., Fisk, D. G., Hirschman, J. E., Hong, E. L., Issel-Tarver, L., Nash, R., Sethuraman, A., Starr, B., Theesfeld, C. L., Andrada, R., Binkley, G., Dong, Q., Lane, C., Schroeder, M., Botstein, D., Cherry, J. M. 2004; 32: D311-D314

Abstract

The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/), a scientific database of the molecular biology and genetics of the yeast Saccharomyces cerevisiae, has recently developed several new resources that allow the comparison and integration of information on a genome-wide scale, enabling the user not only to find detailed information about individual genes, but also to make connections across groups of genes with common features and across different species. The Fungal Alignment Viewer displays alignments of sequences from multiple fungal genomes, while the Sequence Similarity Query tool displays PSI-BLAST alignments of each S.cerevisiae protein with similar proteins from any species whose sequences are contained in the non-redundant (nr) protein data set at NCBI. The Yeast Biochemical Pathways tool integrates groups of genes by their common roles in metabolism and displays the metabolic pathways in a graphical form. Finally, the Find Chromosomal Features search interface provides a versatile tool for querying multiple types of information in SGD.

View details for DOI 10.1093/nar/gkh033

View details for Web of Science ID 000188079000073

View details for PubMedID 14681421

View details for PubMedCentralID PMC308767
Defining Saccharomyces genes. 21st International Conference on Yeast Genetics and Molecular Biology Cherry, J. M., Theesfeld, C., Sethuraman, A., Fisk, D. G., Dolinski, K., Balakrishnan, R., Binkley, G., Christie, K. R., Costanzo, M., Dong, S., Dwight, S. S., Engel, S., Hirschman, J., Hong, E. L., Issel-Tarver, L., Weng, S., Botstein, D. WILEY-BLACKWELL. 2003: S280–S280

View details for Web of Science ID 000184161800667
The Community Annotation system at the Saccharomyces genome database (SGD). 21st International Conference on Yeast Genetics and Molecular Biology Theesfeld, C. L., Dong, S., Fisk, D. G., Balakrishnan, R., Christie, K. R., Costanzo, M. C., Dolinski, K., Dwight, S. S., Engel, S. R., Hirschman, J. E., Hong, E. L., Issel-Tarver, L., Sethuraman, A., Binkley, G., Weng, S., Botstein, D., Cherry, J. M. WILEY-BLACKWELL. 2003: S345–S345

View details for Web of Science ID 000184161800824
Saccharomyces Genome Database (SGD) provides biochemical and structural information for budding yeast proteins NUCLEIC ACIDS RESEARCH Weng, S., Dong, Q., Balakrishnan, R., Christie, K., Costanzo, M., Dolinski, K., Dwight, S. S., Engel, S., Fisk, D. G., Hong, E., Issel-Tarver, L., Sethuraman, A., Theesfeld, C., Andrada, R., Binkley, G., Lane, C., Schroeder, M., Botstein, D., Cherry, J. M. 2003; 31 (1): 216-218

Abstract

The Saccharomyces Genome Database (SGD: http://genome-www.stanford.edu/Saccharomyces/) has recently developed new resources to provide more complete information about proteins from the budding yeast Saccharomyces cerevisiae. The PDB Homologs page provides structural information from the Protein Data Bank (PDB) about yeast proteins and/or their homologs. SGD has also created a resource that utilizes the eMOTIF database for motif information about a given protein. A third new resource is the Protein Information page, which contains protein physical and chemical properties, such as molecular weight and hydropathicity scores, predicted from the translated ORF sequence.

View details for DOI 10.1093/nar/gkg054

View details for Web of Science ID 000181079700049

View details for PubMedID 12519985

View details for PubMedCentralID PMC165501
Gene function, metabolic pathways and comparative genomics in yeast 2nd International Computational Systems Bioinformatics Conference Dong, Q., Balakrishnan, R., Binkley, G., Christie, K. R., Costanzo, M., Dolinski, K., Dwight, S. S., Engel, S., Fisk, D. G., Hirschman, J., Hong, E. L., Nash, R., Issel-Tarver, L., Sethuraman, A., Theesfeld, C. L., Weng, S., Botstein, D., Cherry, J. M. IEEE COMPUTER SOC. 2003: 437–438

View details for Web of Science ID 000188997700075
SOURCE: a unified genomic resource of functional annotations, ontologies, and gene expression data NUCLEIC ACIDS RESEARCH Diehn, M., Sherlock, G., Binkley, G., Jin, H., Matese, J. C., Hernandez-Boussard, T., Rees, C. A., Cherry, J. M., Botstein, D., Brown, P. O., Alizadeh, A. A. 2003; 31 (1): 219-223

Abstract

The explosion in the number of functional genomic datasets generated with tools such as DNA microarrays has created a critical need for resources that facilitate the interpretation of large-scale biological data. SOURCE is a web-based database that brings together information from a broad range of resources, and provides it in a manner particularly useful for genome-scale analyses. SOURCE's GeneReports include aliases, chromosomal location, functional descriptions, GeneOntology annotations, gene expression data, and links to external databases. We curate published microarray gene expression datasets and allow users to rapidly identify sets of co-regulated genes across a variety of tissues and a large number of conditions using a simple and intuitive interface. SOURCE provides content both in gene and cDNA clone-centric pages, and thus simplifies analysis of datasets generated using cDNA microarrays. SOURCE is continuously updated and contains the most recent and accurate information available for human, mouse, and rat genes. By allowing dynamic linking to individual gene or clone reports, SOURCE facilitates browsing of large genomic datasets. Finally, SOURCE's batch interface allows rapid extraction of data for thousands of genes or clones at once and thus facilitates statistical analyses such as assessing the enrichment of functional attributes within clusters of genes. SOURCE is available at http://source.stanford.edu.

View details for DOI 10.1093/nar/gkg014

View details for Web of Science ID 000181079700050

View details for PubMedID 12519986

View details for PubMedCentralID PMC165461
A systematic approach to reconstructing transcription networks in Saccharomyces cerevisiae PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Wang, W., Cherry, J. M., Botstein, D., Li, H. 2002; 99 (26): 16893-16898

Abstract

Decomposing regulatory networks into functional modules is a first step toward deciphering the logical structure of complex networks. We propose a systematic approach to reconstructing transcription modules (defined by a transcription factor and its target genes) and identifying conditions/perturbations under which a particular transcription module is activated/deactivated. Our approach integrates information from regulatory sequences, genome-wide mRNA expression data, and functional annotation. We systematically analyzed gene expression profiling experiments in which the yeast cell was subjected to various environmental or genetic perturbations. We were able to construct transcription modules with high specificity and sensitivity for many transcription factors, and predict the activation of these modules under anticipated as well as unexpected conditions. These findings generate testable hypotheses when combined with existing knowledge on signaling pathways and protein-protein interactions. Correlating the activation of a module to a specific perturbation predicts links in the cell's regulatory networks, and examining coactivated modules suggests specific instances of crosstalk between regulatory pathways.

View details for DOI 10.1073/pnas.252638199

View details for Web of Science ID 000180101600070

View details for PubMedID 12482955

View details for PubMedCentralID PMC139240
Identification of unstable transcripts in Arabidopsis by cDNA microarray analysis: Rapid decay is associated with a group of touch- and specific clock-controlled genes PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Gutierrez, R. A., Ewing, R. M., Cherry, J. M., Green, P. J. 2002; 99 (17): 11513-11518

Abstract

mRNA degradation provides a powerful means for controlling gene expression during growth, development, and many physiological transitions in plants and other systems. Rates of decay help define the steady state levels to which transcripts accumulate in the cytoplasm and determine the speed with which these levels change in response to the appropriate signals. When fast responses are to be achieved, rapid decay of mRNAs is necessary. Accordingly, genes with unstable transcripts often encode proteins that play important regulatory roles. Although detailed studies have been carried out on individual genes with unstable transcripts, there is limited knowledge regarding their nature and associations from a genomic perspective, or the physiological significance of rapid mRNA turnover in intact organisms. To address these problems, we have applied cDNA microarray analysis to identify and characterize genes with unstable transcripts in Arabidopsis thaliana (AtGUTs). Our studies showed that at least 1% of the 11,521 clones represented on Arabidopsis Functional Genomics Consortium microarrays correspond to transcripts that are rapidly degraded, with estimated half-lives of less than 60 min. AtGUTs encode proteins that are predicted to participate in a broad range of cellular processes, with transcriptional functions being over-represented relative to the whole Arabidopsis genome annotation. Analysis of public microarray expression data for these genes argues that mRNA instability is of high significance during plant responses to mechanical stimulation and is associated with specific genes controlled by the circadian clock.

View details for DOI 10.1073/pnas.152204099

View details for Web of Science ID 000177606900100

View details for PubMedID 12167669

View details for PubMedCentralID PMC123287
Saccharomyces Genome Database (SGD) provides secondary gene annotation using the Gene Ontology (GO) NUCLEIC ACIDS RESEARCH Dwight, S. S., Harris, M. A., Dolinski, K., Ball, C. A., Binkley, G., Christie, K. R., Fisk, D. G., Issel-Tarver, L., Schroeder, M., Sherlock, G., Sethuraman, A., Weng, S., Botstein, D., Cherry, J. M. 2002; 30 (1): 69-72

Abstract

The Saccharomyces Genome Database (SGD) resources, ranging from genetic and physical maps to genome-wide analysis tools, reflect the scientific progress in identifying genes and their functions over the last decade. As emphasis shifts from identification of the genes to identification of the role of their gene products in the cell, SGD seeks to provide its users with annotations that will allow relationships to be made between gene products, both within Saccharomyces cerevisiae and across species. To this end, SGD is annotating genes to the Gene Ontology (GO), a structured representation of biological knowledge that can be shared across species. The GO consists of three separate ontologies describing molecular function, biological process and cellular component. The goal is to use published information to associate each characterized S.cerevisiae gene product with one or more GO terms from each of the three ontologies. To be useful, this must be done in a manner that allows accurate associations based on experimental evidence, modifications to GO when necessary, and careful documentation of the annotations through evidence codes for given citations. Reaching this goal is an ongoing process at SGD. For information on the current progress of GO annotations at SGD and other participating databases, as well as a description of each of the three ontologies, please visit the GO Consortium page at http://www.geneontology.org. SGD gene associations to GO can be found by visiting our site at http://genome-www.stanford.edu/Saccharomyces/.

View details for Web of Science ID 000173077100017

View details for PubMedID 11752257
Saccharomyces genome database GUIDE TO YEAST GENETICS AND MOLECULAR AND CELL BIOLOGY, PT B Issel-Tarver, L., Christie, K. R., Dolinski, K., Andrada, R., Balakrishnan, R., Ball, C. A., Binkley, G., Dong, S., Dwight, S. S., Fisk, D. G., Harris, M., Schroeder, M., Sethuraman, A., Tse, K., Weng, S., Botstein, D., Cherry, J. M. 2002; 350: 329-346

View details for Web of Science ID 000176466300019

View details for PubMedID 12073322
Microarray data quality analysis: lessons from the AFGC project. Arabidopsis Functional Genomics Consortium. Plant molecular biology Finkelstein, D., Ewing, R., Gollub, J., Sterky, F., Cherry, J. M., Somerville, S. 2002; 48 (1-2): 119-131

Abstract

Genome-wide expression profiling with DNA microarrays has provided, and will continue to provide, a great deal of data to the plant scientific community. However, reliability concerns have required the development of data quality tests for common systematic biases. Fortunately, most large-scale systematic biases are detectable and some are correctable by normalization. Technical replication experiments and statistical surveys indicate that these biases vary widely in severity and appearance. As a result, no single normalization or correction method currently available is able to address all the issues. However, careful sequence selection, array design, experimental design and experimental annotation can substantially improve the quality and biological of microarray data. In this review, we discuss these issues with reference to examples from the Arabidopsis Functional Genomics Consortium (AFGC) microarray project.

View details for PubMedID 11860205
Creating the gene ontology resource: Design and implementation GENOME RESEARCH Ashburner, M., Ball, C. A., Blake, J. A., Butler, H., Cherry, J. M., Corradi, J., Dolinski, K., Eppig, J. T., Harris, M., Hill, D. P., Lewis, S., Marshall, B., Mungall, C., Reiser, L., Rhee, S., Richardson, J. E., Richter, J., RINGWALD, M., Rubin, G. M., Sherlock, G., Yoon, J. 2001; 11 (8): 1425-1433

Abstract

The exponential growth in the volume of accessible biological information has generated a confusion of voices surrounding the annotation of molecular information about genes and their products. The Gene Ontology (GO) project seeks to provide a set of structured vocabularies for specific biological domains that can be used to describe gene products in any organism. This work includes building three extensive ontologies to describe molecular function, biological process, and cellular component, and providing a community database resource that supports the use of these ontologies. The GO Consortium was initiated by scientists associated with three model organism databases: SGD, the Saccharomyces Genome database; FlyBase, the Drosophila genome database; and MGD/GXD, the Mouse Genome Informatics databases. Additional model organism database groups are joining the project. Each of these model organism information systems is annotating genes and gene products using GO vocabulary terms and incorporating these annotations into their respective model organism databases. Each database contributes its annotation files to a shared GO data resource accessible to the public at http://www.geneontology.org/. The GO site can be used by the community both to recover the GO vocabularies and to access the annotated gene product data sets from the model organism databases. The GO Consortium supports the development of the GO database resource and provides tools enabling curators and researchers to query and manipulate the vocabularies. We believe that the shared development of this molecular annotation resource will contribute to the unification of biological information.

View details for Web of Science ID 000170263900015

View details for PubMedID 11483584
Information resources at SGD: Gene Ontology, Gene Summary Paragraphs, and the Literature Guide. Fisk, D., Christie, K., Dolinski, K., Dwight, S., Issel-Tarver, L., Sethuraman, A., Cherry, J. M., Botstein, D. WILEY-BLACKWELL. 2001: S331–S331

View details for Web of Science ID 000170442100575
Visualization of expression clusters using Sammon's non-linear mapping BIOINFORMATICS Ewing, R. M., Cherry, J. M. 2001; 17 (7): 658-659

Abstract

A method of exploratory analysis and visualization of multi-dimensional gene expression data using Sammon's Non-Linear Mapping (NLM) is presented.
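
For readers unfamiliar with the technique named above: Sammon's mapping places high-dimensional points in a low-dimensional space so as to minimize a stress value that compares pairwise distances before and after projection. The sketch below only evaluates that stress for a given layout; it does not perform the minimization. It is an illustration under stated assumptions, not the authors' implementation, and the function name `sammon_stress` is hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def sammon_stress(high, low):
    """Sammon stress between original points and their projection.

    high: list of points in the original (e.g. expression) space
    low:  list of the same points in the projected space, same order

    Assumes no two original points coincide, since the stress divides
    by each original pairwise distance.
    """
    weighted_error = 0.0
    total_distance = 0.0
    n = len(high)
    for i in range(n):
        for j in range(i + 1, n):
            d_high = dist(high[i], high[j])
            d_low = dist(low[i], low[j])
            total_distance += d_high
            # Errors on small original distances are weighted more heavily.
            weighted_error += (d_high - d_low) ** 2 / d_high
    return weighted_error / total_distance


# Hypothetical example: three expression profiles and a 2-D layout.
profiles = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 5.0)]
layout = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]
print(sammon_stress(profiles, layout))
```

A stress of zero means the layout preserves every pairwise distance exactly; larger values indicate more distortion.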

View details for Web of Science ID 000170249100012

View details for PubMedID 11448886
Computer manipulation of DNA and protein sequences. Current protocols in molecular biology / edited by Frederick M. Ausubel ... [et al.] Cherry, J. M. 2001; Chapter 7: Unit7 7-?

Abstract

This unit outlines a variety of methods by which DNA sequences can be manipulated by computers. Procedures for entering sequence data into the computer and assembling raw sequence data into a contiguous sequence are described first, followed by a description of methods of analyzing and manipulating sequences--e.g., verifying sequences, constructing restriction maps, designing oligonucleotides, identifying protein-coding regions, and predicting secondary structures. This unit also provides information on the large amount of software available for sequence analysis. The appendix to this unit lists some of the commercial software, shareware, and free software related to DNA sequence manipulation. The goal of this unit is to serve as a starting point for researchers interested in utilizing the tremendous sequencing resources available to the computer-knowledgeable molecular biology laboratory.

View details for DOI 10.1002/0471142727.mb0707s30

View details for PubMedID 18265271
Characteristics of amino acids. Current protocols in molecular biology / edited by Frederick M. Ausubel ... [et al.] Ellington, A., Cherry, J. M. 2001; Appendix 1: Appendix 1C-?

Abstract

This appendix presents useful basic information, including common abbreviations, useful measurements and data, characteristics of amino acids and nucleic acids, information on radioactivity and the safe use of radioisotopes and other hazardous chemicals, conversions for centrifuges and rotors, characteristics of common detergents, and common conversion factors.

View details for DOI 10.1002/0471142727.mba01cs33

View details for PubMedID 18265025
Genome comparisons highlight similarity and diversity within the eukaryotic kingdoms CURRENT OPINION IN CHEMICAL BIOLOGY Ball, C. A., Cherry, J. M. 2001; 5 (1): 86-89

Abstract

In 2000, the number of completely sequenced eukaryotic genomes increased to four. The addition of Drosophila and Arabidopsis into this cohort permits additional insights into the processes that have shaped evolution. Analysis and comparisons of both completed genomes and partially sequenced genomes have already shed light on mechanisms such as gene duplication and gene loss that have long been hypothesized to be major forces in speciation. Indeed, duplicate gene pairs in Saccharomyces, Arabidopsis, Caenorhabditis and Drosophila are high: 30%, 60%, 48% and 40%, respectively. Evidence of horizontal gene-transfer, thought to be a major evolutionary force in bacteria, has been found in Arabidopsis. The release of the 'first draft' of the human genome sequence in 2000 heralds a new stage of biological study. Understanding the as-yet-unannotated human genome will be largely based on conclusions, techniques and tools developed during the analysis and comparison of the genome of these four model organisms.

View details for Web of Science ID 000167051500014

View details for PubMedID 11166654
Saccharomyces Genome Database provides tools to survey gene expression and functional analysis data NUCLEIC ACIDS RESEARCH Ball, C. A., Jin, H., Sherlock, G., Weng, S., Matese, J. C., Andrada, R., Binkley, G., Dolinski, K., Dwight, S. S., Harris, M. A., Issel-Tarver, L., SCHROEDER, R., Botstein, D., Cherry, J. M. 2001; 29 (1): 80-81

Abstract

Upon the completion of the Saccharomyces cerevisiae genomic sequence in 1996 [Goffeau,A. et al. (1997) Nature, 387, 5], several creative and ambitious projects have been initiated to explore the functions of gene products or gene expression on a genome-wide scale. To help researchers take advantage of these projects, the Saccharomyces Genome Database (SGD) has created two new tools, Function Junction and Expression Connection. Together, the tools form a central resource for querying multiple large-scale analysis projects for data about individual genes. Function Junction provides information from diverse projects that shed light on the role a gene product plays in the cell, while Expression Connection delivers information produced by the ever-increasing number of microarray projects. WWW access to SGD is available at genome-www.stanford.edu/Saccharomyces/.

View details for Web of Science ID 000166360300019

View details for PubMedID 11125055
The Stanford Microarray Database NUCLEIC ACIDS RESEARCH Sherlock, G., Hernandez-Boussard, T., Kasarskis, A., Binkley, G., Matese, J. C., Dwight, S. S., Kaloper, M., Weng, S., Jin, H., Ball, C. A., Eisen, M. B., Spellman, P. T., Brown, P. O., Botstein, D., Cherry, J. M. 2001; 29 (1): 152-155

Abstract

The Stanford Microarray Database (SMD) stores raw and normalized data from microarray experiments, and provides web interfaces for researchers to retrieve, analyze and visualize their data. The two immediate goals for SMD are to serve as a storage site for microarray data from ongoing research at Stanford University, and to facilitate the public dissemination of that data once published, or released by the researcher. Of paramount importance is the connection of microarray data with the biological data that pertains to the DNA deposited on the microarray (genes, clones etc.). SMD makes use of many public resources to connect expression information to the relevant biology, including SGD [Ball,C.A., Dolinski,K., Dwight,S.S., Harris,M.A., Issel-Tarver,L., Kasarskis,A., Scafe,C.R., Sherlock,G., Binkley,G., Jin,H. et al. (2000) Nucleic Acids Res., 28, 77-80], YPD and WormPD [Costanzo,M.C., Hogan,J.D., Cusick,M.E., Davis,B.P., Fancher,A.M., Hodges,P.E., Kondu,P., Lengieza,C., Lew-Smith,J.E., Lingner,C. et al. (2000) Nucleic Acids Res., 28, 73-76], Unigene [Wheeler,D.L., Chappey,C., Lash,A.E., Leipe,D.D., Madden,T.L., Schuler,G.D., Tatusova,T.A. and Rapp,B.A. (2000) Nucleic Acids Res., 28, 10-14], dbEST [Boguski,M.S., Lowe,T.M. and Tolstoshev,C.M. (1993) Nature Genet., 4, 332-333] and SWISS-PROT [Bairoch,A. and Apweiler,R. (2000) Nucleic Acids Res., 28, 45-48] and can be accessed at http://genome-www.stanford.edu/microarray.

View details for Web of Science ID 000166360300039

View details for PubMedID 11125075
Gene Ontology: tool for the unification of biology NATURE GENETICS Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., Cherry, J. M., Davis, A. P., Dolinski, K., Dwight, S. S., Eppig, J. T., Harris, M. A., Hill, D. P., Issel-Tarver, L., Kasarskis, A., Lewis, S., Matese, J. C., Richardson, J. E., RINGWALD, M., Rubin, G. M., Sherlock, G. 2000; 25 (1): 25-29

View details for PubMedID 10802651
Comparative genomics of the eukaryotes SCIENCE Rubin, G. M., Yandell, M. D., Wortman, J. R., Miklos, G. L., Nelson, C. R., Hariharan, I. K., Fortini, M. E., Li, P. W., Apweiler, R., Fleischmann, W., Cherry, J. M., Henikoff, S., Skupski, M. P., Misra, S., Ashburner, M., Birney, E., Boguski, M. S., Brody, T., Brokstein, P., Celniker, S. E., Chervitz, S. A., Coates, D., Cravchik, A., Gabrielian, A., Galle, R. F., Gelbart, W. M., George, R. A., Goldstein, L. S., Gong, F. C., Guan, P., Harris, N. L., Hay, B. A., Hoskins, R. A., Li, J. Y., Li, Z. Y., HYNES, R. O., Jones, S. J., Kuehl, P. M., Lemaitre, B., Littleton, J. T., Morrison, D. K., Mungall, C., O'Farrell, P. H., Pickeral, O. K., Shue, C., Vosshall, L. B., Zhang, J., Zhao, Q., Zheng, X. Q., Zhong, F., Zhong, W. Y., Gibbs, R., Venter, J. C., Adams, M. D., Lewis, S. 2000; 287 (5461): 2204-2215

Abstract

A comparative analysis of the genomes of Drosophila melanogaster, Caenorhabditis elegans, and Saccharomyces cerevisiae (and the proteins they are predicted to encode) was undertaken in the context of cellular, developmental, and evolutionary processes. The nonredundant protein sets of flies and worms are similar in size and are only twice that of yeast, but different gene families are expanded in each genome, and the multidomain proteins and signaling pathways of the fly and worm are far more complex than those of yeast. The fly has orthologs to 177 of the 289 human disease genes examined and provides the foundation for rapid analysis of some of the basic processes involved in human disease.

View details for Web of Science ID 000086049100035

View details for PubMedID 10731134

View details for PubMedCentralID PMC2754258
The genome sequence of Drosophila melanogaster SCIENCE Adams, M. D., Celniker, S. E., Holt, R. A., Evans, C. A., Gocayne, J. D., Amanatides, P. G., Scherer, S. E., Li, P. W., Hoskins, R. A., Galle, R. F., George, R. A., Lewis, S. E., Richards, S., Ashburner, M., Henderson, S. N., Sutton, G. G., Wortman, J. R., Yandell, M. D., Zhang, Q., Chen, L. X., Brandon, R. C., Rogers, Y. H., Blazej, R. G., Champe, M., Pfeiffer, B. D., Wan, K. H., Doyle, C., Baxter, E. G., Helt, G., Nelson, C. R., Miklos, G. L., Abril, J. F., Agbayani, A., An, H. J., Andrews-Pfannkoch, C., Baldwin, D., Ballew, R. M., Basu, A., Baxendale, J., Bayraktaroglu, L., Beasley, E. M., Beeson, K. Y., Benos, P. V., Berman, B. P., Bhandari, D., Bolshakov, S., Borkova, D., Botchan, M. R., Bouck, J., Brokstein, P., Brottier, P., Burtis, K. C., Busam, D. A., Butler, H., Cadieu, E., Center, A., Chandra, I., Cherry, J. M., Cawley, S., Dahlke, C., Davenport, L. B., DAVIES, A., de Pablos, B., Delcher, A., Deng, Z. M., Mays, A. D., Dew, I., Dietz, S. M., Dodson, K., Doup, L. E., Downes, M., Dugan-Rocha, S., Dunkov, B. C., Dunn, P., Durbin, K. J., Evangelista, C. C., Ferraz, C., Ferriera, S., Fleischmann, W., Fosler, C., Gabrielian, A. E., Garg, N. S., Gelbart, W. M., Glasser, K., Glodek, A., Gong, F. C., Gorrell, J. H., Gu, Z. P., Guan, P., Harris, M., Harris, N. L., Harvey, D., Heiman, T. J., HERNANDEZ, J. R., Houck, J., Hostin, D., Houston, D. A., Howland, T. J., Wei, M. H., Ibegwam, C., Jalali, M., Kalush, F., Karpen, G. H., Ke, Z. X., Kennison, J. A., Ketchum, K. A., Kimmel, B. E., Kodira, C. D., Kraft, C., Kravitz, S., Kulp, D., Lai, Z. W., Lasko, P., Lei, Y. D., Levitsky, A. A., Li, J. Y., Li, Z. Y., Liang, Y., Lin, X. Y., Liu, X. J., Mattei, B., McIntosh, T. C., McLeod, M. P., McPherson, D., Merkulov, G., Milshina, N. V., Mobarry, C., Morris, J., Moshrefi, A., Mount, S. M., Moy, M., Murphy, B., Murphy, L., Muzny, D. M., Nelson, D. L., Nelson, D. R., Nelson, K. A., Nixon, K., Nusskern, D. R., Pacleb, J. M., Palazzolo, M., Pittman, G. S., Pan, S., Pollard, J., Puri, V., Reese, M. G., Reinert, K., Remington, K., Saunders, R. D., Scheeler, F., Shen, H., Shue, B. C., Siden-Kiamos, I., Simpson, M., Skupski, M. P., Smith, T., Spier, E., Spradling, A. C., Stapleton, M., Strong, R., Sun, E., Svirskas, R., Tector, C., Turner, R., Venter, E., Wang, A. H., Wang, X., Wang, Z. Y., Wassarman, D. A., Weinstock, G. M., Weissenbach, J., Williams, S. M., Woodage, T., Worley, K. C., Wu, D., Yang, S., Yao, Q. A., Ye, J., Yeh, R. F., Zaveri, J. S., Zhan, M., Zhang, G. G., Zhao, Q., Zheng, L. S., Zheng, X. Q., Zhong, F. N., Zhong, W. Y., Zhou, X. J., Zhu, S. P., Zhu, X. H., Smith, H. O., Gibbs, R. A., Myers, E. W., Rubin, G. M., Venter, J. C. 2000; 287 (5461): 2185-2195

Abstract

The fly Drosophila melanogaster is one of the most intensively studied organisms in biology and serves as a model system for the investigation of many developmental and cellular processes common to higher eukaryotes, including humans. We have determined the nucleotide sequence of nearly all of the approximately 120-megabase euchromatic portion of the Drosophila genome using a whole-genome shotgun sequencing strategy supported by extensive clone-based sequence and a high-quality bacterial artificial chromosome physical map. Efforts are under way to close the remaining gaps; however, the sequence is of sufficient accuracy and contiguity to be declared substantially complete and to support an initial analysis of genome structure and preliminary gene annotation and interpretation. The genome encodes approximately 13,600 genes, somewhat fewer than the smaller Caenorhabditis elegans genome, but with comparable functional diversity.

View details for Web of Science ID 000086049100033

View details for PubMedID 10731132
Integrating functional genomic information into the Saccharomyces genome database NUCLEIC ACIDS RESEARCH Ball, C. A., Dolinski, K., Dwight, S. S., Harris, M. A., Issel-Tarver, L., Kasarskis, A., Scafe, C. R., Sherlock, G., Binkley, G., Jin, H., Kaloper, M., Orr, S. D., Schroeder, M., Weng, S., Zhu, Y., Botstein, D., Cherry, J. M. 2000; 28 (1): 77-80

Abstract

The Saccharomyces Genome Database (SGD) stores and organizes information about the nearly 6200 genes in the yeast genome. The information is organized around the 'locus page' and directs users to the detailed information they seek. SGD is endeavoring to integrate the existing information about yeast genes with the large volume of data generated by functional analyses that are beginning to appear in the literature and on web sites. New features will include searches of systematic analyses and Gene Summary Paragraphs that succinctly review the literature for each gene. In addition to current information, such as gene product and phenotype descriptions, the new locus page will also describe a gene product's cellular process, function and localization using a controlled vocabulary developed in collaboration with two other model organism databases. We describe these developments in SGD through the newly reorganized locus page. The SGD is accessible via the WWW at http://genome-www.stanford.edu/Saccharomyces/

View details for Web of Science ID 000084896300020

View details for PubMedID 10592186
Gene Ontology: a controlled vocabulary to describe the function, biological process and cellular location of gene products in genome databases. Shaw, D. R., Ashburner, M., Blake, J. A., Baldarelli, R. M., Botstein, D., Davis, A. P., Cherry, J. M., Lewis, S., Lutz, C. M., Richardson, J. E., Eppig, J. T. CELL PRESS. 1999: A419–A419

View details for Web of Science ID 000082879802373
Unified display of Arabidopsis thaliana physical maps from AtDB, the A.thaliana database NUCLEIC ACIDS RESEARCH Rhee, S. Y., Weng, S., Bongard-Pierce, D. K., Garcia-Hernandez, M., Malekian, A., Flanders, D. J., Cherry, J. M. 1999; 27 (1): 79-84

Abstract

In the past several years, there has been a tremendous effort to construct physical maps and to sequence the genome of Arabidopsis thaliana. As a result, four of the five chromosomes are completely covered by overlapping clones except at the centromeric and nucleolus organizer regions (NOR). In addition, over 30% of the genome has been sequenced and completion is anticipated by the end of the year 2000. Despite these accomplishments, the physical maps are provided in many formats on laboratories' Web sites. These data are thus difficult to obtain in a coherent manner for researchers. To alleviate this problem, AtDB (Arabidopsis thaliana DataBase, URL: http://genome-www.stanford.edu/Arabidopsis/) has constructed a unified display of the physical maps where all publicly available physical-map data for all chromosomes are presented through the Web in a clickable, 'on-the-fly' graphic, created by CGI programs that directly consult our relational database.

View details for Web of Science ID 000077983000018

View details for PubMedID 9847147
Using the Saccharomyces Genome Database (SGD) for analysis of protein similarities and structure NUCLEIC ACIDS RESEARCH Chervitz, S. A., Hester, E. T., Ball, C. A., Dolinski, K., Dwight, S. S., Harris, M. A., Juvik, G., Malekian, A., Roberts, S., Roe, T., Scafe, C., Schroeder, M., Sherlock, G., Weng, S., Zhu, Y., Cherry, J. M., Botstein, D. 1999; 27 (1): 74-78

Abstract

The Saccharomyces Genome Database (SGD) collects and organizes information about the molecular biology and genetics of the yeast Saccharomyces cerevisiae. The latest protein structure and comparison tools available at SGD are presented here. With the completion of the yeast sequence and the Caenorhabditis elegans sequence soon to follow, comparison of proteins from complete eukaryotic proteomes will be an extremely powerful way to learn more about a particular protein's structure, its function, and its relationships with other proteins. SGD can be accessed through the World Wide Web at http://genome-www.stanford.edu/Saccharomyces/

View details for Web of Science ID 000077983000017

View details for PubMedID 9847146
Comparison of the complete protein sets of worm and yeast: Orthology and divergence SCIENCE Chervitz, S. A., Aravind, L., Sherlock, G., Ball, C. A., Koonin, E. V., Dwight, S. S., Harris, M. A., Dolinski, K., Mohr, S., Smith, T., Weng, S., Cherry, J. M., Botstein, D. 1998; 282 (5396): 2022-2028

Abstract

Comparative analysis of predicted protein sequences encoded by the genomes of Caenorhabditis elegans and Saccharomyces cerevisiae suggests that most of the core biological functions are carried out by orthologous proteins (proteins of different species that can be traced back to a common ancestor) that occur in comparable numbers. The specialized processes of signal transduction and regulatory control that are unique to the multicellular worm appear to use novel proteins, many of which re-use conserved domains. Major expansion of the number of some of these domains seen in the worm may have contributed to the advent of multicellularity. The proteins conserved in yeast and worm are likely to have orthologs throughout eukaryotes; in contrast, the proteins unique to the worm may well define metazoans.

View details for Web of Science ID 000077467100036

View details for PubMedID 9851918
Expanding yeast knowledge online YEAST Dolinski, K., Ball, C. A., Chervitz, S. A., Dwight, S. S., Harris, M. A., Roberts, S., Roe, T., Cherry, J. M., Botstein, D. 1998; 14 (16): 1453-1469

Abstract

The completion of the Saccharomyces cerevisiae genome sequencing project and the continued development of improved technology for large-scale genome analysis have led to tremendous growth in the amount of new yeast genetics and molecular biology data. Efficient organization, presentation, and dissemination of this information are essential if researchers are to exploit this knowledge. In addition, the development of tools that provide efficient analysis of this information and link it with pertinent information from other systems is becoming increasingly important at a time when the complete genome sequences of other organisms are becoming available. The aim of this review is to familiarize biologists with the type of data resources currently available on the World Wide Web (WWW).

View details for Web of Science ID 000077792400003

View details for PubMedID 9885151
Arabidopsis thaliana: A model plant for genome analysis SCIENCE Meinke, D. W., Cherry, J. M., Dean, C., Rounsley, S. D., Koornneef, M. 1998; 282 (5389): 662-?

Abstract

Arabidopsis thaliana is a small plant in the mustard family that has become the model system of choice for research in plant biology. Significant advances in understanding plant growth and development have been made by focusing on the molecular genetics of this simple angiosperm. The 120-megabase genome of Arabidopsis is organized into five chromosomes and contains an estimated 20,000 genes. More than 30 megabases of annotated genomic sequence has already been deposited in GenBank by a consortium of laboratories in Europe, Japan, and the United States. The entire genome is scheduled to be sequenced by the end of the year 2000. Reaching this milestone should enhance the value of Arabidopsis as a model for plant biology and the analysis of complex organisms in general.

View details for Web of Science ID 000076607500039

View details for PubMedID 9784120
Genome maps 9. Arabidopsis thaliana. Wall chart. Science Rhee, S. Y., Weng, S., Flanders, D., Cherry, J. M., Dean, C., Lister, C., Anderson, M., Koornneef, M., Meinke, D. W., Nickle, T., Smith, K., Rounsley, S. D. 1998; 282 (5389): 663-667

View details for PubMedID 9841422
AtDB, the Arabidopsis thaliana database, and graphical-web-display of progress by the Arabidopsis genome initiative NUCLEIC ACIDS RESEARCH Flanders, D. J., Weng, S. A., Petel, F. X., Cherry, J. M. 1998; 26 (1): 80-84

Abstract

AtDB, the Arabidopsis thaliana Database, has a primary role to provide public access to the collected genomic information for A. thaliana via the World Wide Web (URL: http://genome-www.stanford.edu/). AtDB presents interactive physical and genetics maps that are hyperlinked with detailed information about the clones and markers placed on these maps. A large literature collection on Arabidopsis, contact information on researchers worldwide, laboratory method manuals and other information useful to plant molecular biologists are also provided. This paper discusses the database-driven clickable displays that provide easy navigation within a variety of genomic maps, including those summarizing progress of the international Arabidopsis genomic sequencing effort, AGI (the Arabidopsis Genome Initiative). The interface uses client-side hyperlinked GIF-images that direct the user to detailed database-information. A new BLAST service is also described. This gives users access to the thousands of Arabidopsis BAC clone end-sequences and includes hyperlinked images summarizing the search results. The linking of genetic and physically mapped regions and their sequence into information for loci within that region is an ongoing goal for this project.

View details for Web of Science ID 000071778900017

View details for PubMedID 9399805
SGD: Saccharomyces Genome Database NUCLEIC ACIDS RESEARCH Cherry, J. M., Adler, C., Ball, C., Chervitz, S. A., Dwight, S. S., Hester, E. T., Jia, Y. K., Juvik, G., Roe, T., Schroeder, M., Weng, S. A., Botstein, D. 1998; 26 (1): 73-79

Abstract

The Saccharomyces Genome Database (SGD) provides Internet access to the complete Saccharomyces cerevisiae genomic sequence, its genes and their products, the phenotypes of its mutants, and the literature supporting these data. The amount of information and the number of features provided by SGD have increased greatly following the release of the S.cerevisiae genomic sequence, which is currently the only complete sequence of a eukaryotic genome. SGD aids researchers by providing not only basic information, but also tools such as sequence similarity searching that lead to detailed information about features of the genome and relationships between genes. SGD presents information using a variety of user-friendly, dynamically created graphical displays illustrating physical, genetic and sequence feature maps. SGD can be accessed via the World Wide Web at http://genome-www.stanford.edu/Saccharomyces/

View details for Web of Science ID 000071778900016

View details for PubMedID 9399804
Genetics - Yeast as a model organism SCIENCE Botstein, D., Chervitz, S. A., Cherry, J. M. 1997; 277 (5330): 1259-1260

View details for Web of Science ID A1997XT82700041

View details for PubMedID 9297238
Arabidopsis genomic information from AtDB. Cherry, J. M., Flanders, D. J., Petel, F. X., Weng, S. AMER SOC PLANT BIOLOGISTS. 1997: 11003–

View details for Web of Science ID A1997XL11900023
The nucleotide sequence of Saccharomyces cerevisiae chromosome XVI NATURE Bussey, H., Storms, R. K., Ahmed, A., Albermann, K., Allen, E., Ansorge, W., Araujo, R., Aparicio, A., Barrell, B., Badcock, K., Benes, V., Botstein, D., Bowman, S., Bruckner, M., Carpenter, J., Cherry, J. M., Chung, E., Churcher, C., COSTER, F., Davis, K., Davis, R. W., Dietrich, F. S., DELIUS, H., DiPaolo, T., Dubois, E., Dusterhoft, A., Duncan, M., Floeth, M., Fortin, N., Friesen, J. D., Fritz, C., Goffeau, A., Hall, J., Hebling, U., Heumann, K., Hilbert, H., Hillier, L., HunickeSmith, S., HYMAN, R., Johnston, M., Kalman, S., Kleine, K., Komp, C., Kurdi, O., Lashkari, D., Lew, H., Lin, A., LIN, D., Louis, E. J., Marathe, R., Messenguy, F., Mewes, H. W., Mirtipati, S., Moestl, D., MullerAuer, S., Namath, A., Nentwich, U., Oefner, P., Pearson, D., Petel, F. X., Pohl, T. M., Purnelle, B., Rajandream, M. A., Rechmann, S., Rieger, M., Riles, L., Roberts, D., Schafer, M., Scharfe, M., Scherens, B., Schramm, S., Schroder, M., Sdicu, A. M., Tettelin, H., Urrestarazu, L. A., Ushinsky, S., Vierendeels, F., Vissers, S., Voss, H., Walsh, S. V., Wambutt, R., Wang, Y., Wedler, E., Wedler, H., WINNETT, E., Zhong, W. W., Zollner, A., VO, D. H., Hani, J. 1997; 387 (6632): 103-105

Abstract

The nucleotide sequence of the 948,061 base pairs of chromosome XVI has been determined, completing the sequence of the yeast genome. Chromosome XVI was the last yeast chromosome identified, and some of the genes mapped early to it, such as GAL4, PEP4 and RAD1 (ref. 2) have played important roles in the development of yeast biology. The architecture of this final chromosome seems to be typical of the large yeast chromosomes, and shows large duplications with other yeast chromosomes. Chromosome XVI contains 487 potential protein-encoding genes, 17 tRNA genes and two small nuclear RNA genes; 27% of the genes have significant similarities to human gene products, and 48% are new and of unknown biological function. Systematic efforts to explore gene function have begun.

View details for Web of Science ID A1997XB54600015

View details for PubMedID 9169875
The nucleotide sequence of Saccharomyces cerevisiae chromosome IV NATURE Jacq, C., ALTMORBE, J., Andre, B., Arnold, W., Bahr, A., Ballesta, J. P., Bargues, M., Baron, L., Becker, A., Biteau, N., Blocker, H., Blugeon, C., Boskovic, J., Brandt, P., Bruckner, M., Buitrago, M. J., COSTER, F., Delaveau, T., DELREY, F., Dujon, B., Eide, L. G., GarciaCantalejo, J. M., Goffeau, A., GomezPeris, A., Granotier, C., Hanemann, V., Hankeln, T., Hoheisel, J. D., Jager, W., Jimenez, A., Jonniaux, J. L., KRAMER, C., Kuster, H., LAAMANEN, P., Legros, Y., Louis, E., MollerRieker, S., Monnet, A., Moro, M., MullerAuer, S., Nussbaumer, B., Paricio, N., Paulin, L., Perea, J., PEREZALONSO, M., PEREZORTIN, J. E., Pohl, T. M., Prydz, H., Purnelle, B., Rasmussen, S. W., Remacha, M., Revuelta, J. L., Rieger, M., Salom, D., Saluz, H. P., Saiz, J. E., Saren, A. M., Schafer, M., Scharfe, M., Schmidt, E. R., Schneider, C., Scholler, P., Schwarz, S., SolerMira, A., Urrestarazu, L. A., Verhasselt, P., Vissers, S., Voet, M., Volckaert, G., Wagner, G., Wambutt, R., Wedler, E., Wedler, H., Wolfl, S., Harris, D. E., Bowman, S., Brown, D., Churcher, C. M., Connor, R., Dedman, K., Gentles, S., Hamlin, N., Hunt, S., Jones, L., McDonald, S., Murphy, L., Niblett, D., Odell, C., Oliver, K., Rajandream, M. A., Richards, C., Shore, L., Walsh, S. V., Barrell, B. G., Dietrich, F. S., Mulligan, J., Allen, E., Araujo, R., Aviles, E., Berno, O., Carpenter, J., Chen, E., Cherry, J. M., Chung, E., Duncan, M., HunickeSmith, S., HYMAN, R., Komp, C., Lashkari, D., Lew, H., LIN, D., MOSEDALE, D., Nakahara, K., Namath, A., Oefner, P., Oh, C., Petel, F. X., Roberts, D., Schramm, S., Schroeder, M., Shogren, T., Shroff, N., Winant, A., Yelton, M., Botstein, D., Davis, R. W., Johnston, M., Hillier, L., Riles, L., Albermann, K., Hani, J., Heumann, K., Kleine, K., Mewes, H. W., Zollner, A., Zaccaria, P. 1997; 387 (6632): 75-78

Abstract

The complete DNA sequence of the yeast Saccharomyces cerevisiae chromosome IV has been determined. Apart from chromosome XII, which contains the 1-2 Mb rDNA cluster, chromosome IV is the longest S. cerevisiae chromosome. It was split into three parts, which were sequenced by a consortium from the European Community, the Sanger Centre, and groups from St Louis and Stanford in the United States. The sequence of 1,531,974 base pairs contains 796 predicted or known genes, 318 (39.9%) of which have been previously identified. Of the 478 new genes, 225 (28.3%) are homologous to previously identified genes and 253 (32%) have unknown functions or correspond to spurious open reading frames (ORFs). On average there is one gene approximately every two kilobases. Superimposed on alternating regional variations in G+C composition, there is a large central domain with a lower G+C content that contains all the yeast transposon (Ty) elements and most of the tRNA genes. Chromosome IV shares with chromosomes II, V, XII, XIII and XV some long clustered duplications which partly explain its origin.

View details for Web of Science ID A1997XB54600007

View details for PubMedID 9169867
The nucleotide sequence of Saccharomyces cerevisiae chromosome V NATURE Dietrich, F. S., Mulligan, J., Hennessy, K., Yelton, M. A., Allen, E., Araujo, R., Aviles, E., Berno, A., Brennan, T., Carpenter, J., Chen, E., Cherry, J. M., Chung, E., Duncan, M., Guzman, E., Hartzell, G., HunickeSmith, S., Hyman, R. W., Kayser, A., Komp, C., Lashkari, D., Lew, H., LIN, D., MOSEDALE, D., Nakahara, K., Namath, A., Norgren, R., Oefner, P., Oh, C., Petel, F. X., Roberts, D., Sehl, P., Schramm, S., Shogren, T., Smith, V., Taylor, P., Wei, Y., Botstein, D., Davis, R. W. 1997; 387 (6632): 78-81

Abstract

Here we report the sequence of 569,202 base pairs of Saccharomyces cerevisiae chromosome V. Analysis of the sequence revealed a centromere, two telomeres and 271 open reading frames (ORFs) plus 13 tRNAs and four small nuclear RNAs. There are two Ty1 transposable elements, each of which contains an ORF (included in the count of 271). Of the ORFs, 78 (29%) are new, 81 (30%) have potential homologues in the public databases, and 112 (41%) are previously characterized yeast genes.

View details for Web of Science ID A1997XB54600008

View details for PubMedID 9169868

View details for PubMedCentralID PMC3057095
Genetic and physical maps of Saccharomyces cerevisiae NATURE Cherry, J. M., Ball, C., Weng, S., Juvik, G., Schmidt, R., Adler, C., Dunn, B., Dwight, S., Riles, L., Mortimer, R. K., Botstein, D. 1997; 387 (6632): 67-73

Abstract

Genetic and physical maps for the 16 chromosomes of Saccharomyces cerevisiae are presented. The genetic map is the result of 40 years of genetic analysis. The physical map was produced from the results of an international systematic sequencing effort. The data for the maps are accessible electronically from the Saccharomyces Genome Database (SGD: http://genome-www.stanford.edu/Saccharomyces/).

View details for Web of Science ID A1997XB54600006

View details for PubMedID 9169866
Molecular linguistics: Extracting information from gene and protein sequences PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Botstein, D., Cherry, J. M. 1997; 94 (11): 5506-5507

View details for Web of Science ID A1997XB71100005

View details for PubMedID 9159100

View details for PubMedCentralID PMC34160
Genetic nomenclature guide. Saccharomyces cerevisiae. Trends in genetics : TIG Cherry, J. M. 1995: 11-12

View details for PubMedID 7660459
AN INTEGRATED GENETIC RFLP MAP OF THE ARABIDOPSIS-THALIANA GENOME PLANT JOURNAL HAUGE, B. M., HANLEY, S. M., Cartinhour, S., Cherry, J. M., Goodman, H. M., Koornneef, M., Stam, P., Chang, C., Kempin, S., Medrano, L., Meyerowitz, E. M. 1993; 3 (5): 745-754

View details for Web of Science ID A1993LC75800013
DETECTION OF HERPES-SIMPLEX VIRUS THYMIDINE KINASE AND LATENCY-ASSOCIATED TRANSCRIPT GENE-SEQUENCES IN HUMAN HERPETIC CORNEAS BY POLYMERASE CHAIN-REACTION AMPLIFICATION INVESTIGATIVE OPHTHALMOLOGY & VISUAL SCIENCE RONG, B. L., PAVANLANGSTON, D., Weng, Q. P., Martinez, R., Cherry, J. M., Dunkel, E. C. 1991; 32 (6): 1808-1815

Abstract

Herpes simplex virus (HSV) latency in sensory ganglion neurons is well documented, but the existence of extraneuronal corneal latency is less well defined. To investigate the possibility of extraneuronal latency during ocular HSV infection, corneal specimens from 18 patients with quiescent herpes simplex keratitis (HSK) were obtained at the time of keratoplasty. Polymerase chain reaction (PCR) amplification followed by Southern blot hybridization with a radiolabeled oligonucleotide probe was done to detect the presence of HSV-1 genome in these human corneal samples. Two pairs of oligonucleotides from the region of the HSV thymidine kinase (TK) gene and the latency-associated transcript (LAT) gene were used as primers in the PCR amplification. The DNA sequences from either the TK or the LAT gene were identified in 15 of 18 HSK corneas (83%). These results demonstrate that the HSV genome was retained, at least in part, in human corneas during quiescent HSV infection, giving further support to the concept of corneal extraneuronal latency.

View details for Web of Science ID A1991FM17900014

View details for PubMedID 1851732
CODON USAGE TABLE FOR XENOPUS-LAEVIS METHODS IN CELL BIOLOGY Cherry, J. M. 1991; 36: 675-677

View details for Web of Science ID A1991MC41400038

View details for PubMedID 1811159
SACCHAROMYCES-CEREVISIAE HOMOSERINE KINASE IS HOMOLOGOUS TO PROKARYOTIC HOMOSERINE KINASES GENE Schultes, N. P., Ellington, A. D., Cherry, J. M., Szostak, J. W. 1990; 96 (2): 177-180

Abstract

The Saccharomyces cerevisiae gene (THR1) encoding homoserine kinase (HK; EC 2.7.1.39) was cloned by complementation in yeast. Disruption of the THR1 gene results in threonine auxotrophy in yeast. Comparison of the amino acid sequences of yeast and bacterial HKs reveals substantial similarity.

View details for Web of Science ID A1990EM78200004

View details for PubMedID 2176637
MUTATIONAL ANALYSIS OF CONSERVED NUCLEOTIDES IN A SELF-SPLICING GROUP-I INTRON JOURNAL OF MOLECULAR BIOLOGY COUTURE, S., Ellington, A. D., Gerber, A. S., Cherry, J. M., Doudna, J. A., Green, R., Hanna, M., Pace, U., Rajagopal, J., Szostak, J. W. 1990; 215 (3): 345-358

Abstract

We have constructed all single base substitutions in almost all of the highly conserved residues of the Tetrahymena self-splicing intron. Mutation of highly conserved residues almost invariably leads to loss of enzymatic activity. In many cases, activity could be regained by making additional mutations that restored predicted base-pairings; these second site suppressors in general confirm the secondary structure derived from phylogenetic data. At several positions, our suppression data can be most readily explained by assuming non-Watson-Crick base-pairings. In addition to the requirements imposed by the secondary structure, the sequence of the intron is constrained by "negative interactions", the exclusion of particular nucleotide sequences that would form undesirable secondary structures. A comparison of genetic and phylogenetic data suggests sites that may be involved in tertiary structural interactions.

View details for Web of Science ID A1990ED16700004

View details for PubMedID 1700131
GENETIC DISSECTION OF AN RNA ENZYME COLD SPRING HARBOR SYMPOSIA ON QUANTITATIVE BIOLOGY Doudna, J. A., Gerber, A. S., Cherry, J. M., Szostak, J. W. 1987; 52: 173-180

View details for Web of Science ID A1987P094200021

View details for PubMedID 2456876
THE INTERNALLY LOCATED TELOMERIC SEQUENCES IN THE GERM-LINE CHROMOSOMES OF TETRAHYMENA ARE AT THE ENDS OF TRANSPOSON-LIKE ELEMENTS CELL Cherry, J. M., Blackburn, E. H. 1985; 43 (3): 747-758

Abstract

The germ-line micronuclear genome of the ciliate Tetrahymena thermophila contains approximately 10^2 chromosome-internal blocks of tandemly repeated C4A2 sequences (micC4A2). This repeated sequence is the telomeric sequence in the somatic macronucleus. Each of six cloned micC4A2 was found to be adjacent to a conserved 30 bp sequence, which we propose is the terminal inverted repeat of a family of DNA elements (the Tel-1 family). This 30 bp sequence contains a site for the infrequently cutting restriction enzyme Bst XI, which allows full-length Tel-1 elements to be cut out of the micronuclear genome. BAL 31 exonuclease digestion of Bst XI-cut micronuclear DNA showed the majority of micC4A2 blocks to be associated with the ends of the Tel-1 family. We propose that Tel-1 elements are transposable and suggest a novel mechanism to account for the origin of micC4A2, in which telomeric repeats are added to the ends of free linear forms of the transposable elements prior to reintegration.

View details for Web of Science ID A1985AWV6100022

View details for PubMedID 3000613
DNA termini in ciliate macronuclei. Cold Spring Harbor symposia on quantitative biology Blackburn, E. H., Budarf, M. L., Challoner, P. B., Cherry, J. M., Howard, E. A., Katzen, A. L., Pan, W. C., Ryan, T. 1983; 47: 1195-1207

View details for PubMedID 6407801
DNA TERMINI IN CILIATE MACRONUCLEI COLD SPRING HARBOR SYMPOSIA ON QUANTITATIVE BIOLOGY Blackburn, E. H., Budarf, M. L., Challoner, P. B., Cherry, J. M., Howard, E. A., Katzen, A. L., Pan, W. C., Ryan, T. 1982; 47: 1195-1207

View details for Web of Science ID A1982QR19100066
EVIDENCE FOR A PLASMA-MEMBRANE REDOX SYSTEM ON INTACT ASCITES TUMOR-CELLS WITH DIFFERENT METASTATIC CAPACITY BIOCHIMICA ET BIOPHYSICA ACTA Cherry, J. M., MACKELLAR, W., Morre, D. J., CRANE, F. L., Jacobsen, L. B., SCHIRRMACHER, V. 1981; 634 (1): 11-18

Abstract

A NADH-ferricyanide reductase of the external surface of intact mouse ascites tumor cells grown in culture was shown. The oxidation/reduction reaction was due to enzymatic rather than inorganic iron catalysis as demonstrated by the kinetics and specificity of the reaction. Activities of three markers for cytoplasmic contents were lacking with the intact tumor cells. The dehydrogenase activity was inhibited by p-chloromercuribenzoate, bathophenanthroline sulfonate, and the anticancer drug adriamycin. Sodium azide and potassium cyanide inhibited partially. The response to inhibitors resembled that of isolated plasma membranes rather than that of mitochondria. Concurrent with these findings, neither superoxide dismutase nor rotenone affected the redox activity. The findings provide evidence for the operation of a plasma membrane redox system at the surface of intact, living cells.

View details for Web of Science ID A1981KZ18600002

View details for PubMedID 7470494
ABSENCE OF GANGLIOSIDES IN A HIGHER PLANT EXPERIENTIA Cherry, J. M., Buckhout, T. J., Morre, D. J. 1978; 34 (11): 1433-1434

View details for Web of Science ID A1978FX18000016

Mike Cherry

Professor (Research) of Genetics, Emeritus
