John has performed project management for the Marine Metadata Interoperability Project and its Ontology Registry and Repository (http://mmisw.org/orr), for the National Science Foundation’s Ocean Observatories Initiative CyberInfrastructure project at UC San Diego, at NASA Ames Research Center (via Sterling Software/Northrop Grumman), and at the Monterey Bay Aquarium Research Institute.
John received a BA in Computer Science and Statistics from UC Berkeley, with follow-on training and certification in quality management, project management (Project Management Professional), and scrum software development (Scrum Alliance).
Current Role at Stanford
John is a Technical Program Manager at Stanford University's School of Medicine. He leads the Center for Expanded Data Annotation and Retrieval (CEDAR) and the NCBO BioPortal Repository.
John's work encompasses whatever is needed: project management, product management, systems architecture, dev ops, and administration, to name a few fun roles.
Service, Volunteer and Community Work
Project Lead, MMI, MMI ORR and ESIP COR, Marine Metadata Interoperability Project (4/1/2014 - Present)
John provides leadership for the MMI project and its Ontology Registry and Repository. MMI is a community collaboration to improve metadata practices, resources, and services in the marine and earth sciences. Visit MMI at http://marinemetadata.org.
MMI provides an ontology registry and repository called the ORR, at http://mmisw.org/orr. This software has been deployed in multiple environments, including as the Community Ontology Repository (https://cor.esipfed.org) for the ESIP Federation (https://esipfed.org). John provides guidance and coordination activities for the ESIP Community Ontology Repository.
California, United States
Professional Affiliations and Activities
Chair, EarthCube Semantic Infrastructure Working Group (2015 - 2016)
NCBO Ontology Recommender 2.0: an enhanced approach for biomedical ontology recommendation.
Journal of biomedical semantics
2017; 8 (1): 21-?
Ontologies and controlled terminologies have become increasingly important in biomedical research. Researchers use ontologies to annotate their data with ontology terms, enabling better data integration and interoperability across disparate datasets. However, the number, variety and complexity of current biomedical ontologies make it cumbersome for researchers to determine which ones to reuse for their specific needs. To overcome this problem, in 2010 the National Center for Biomedical Ontology (NCBO) released the Ontology Recommender, which is a service that receives a biomedical text corpus or a list of keywords and suggests ontologies appropriate for referencing the indicated terms. We developed a new version of the NCBO Ontology Recommender. Called Ontology Recommender 2.0, it uses a novel recommendation approach that evaluates the relevance of an ontology to biomedical text data according to four different criteria: (1) the extent to which the ontology covers the input data; (2) the acceptance of the ontology in the biomedical community; (3) the level of detail of the ontology classes that cover the input data; and (4) the specialization of the ontology to the domain of the input data. Our evaluation shows that the enhanced recommender provides higher quality suggestions than the original approach, providing better coverage of the input data, more detailed information about their concepts, increased specialization for the domain of the input data, and greater acceptance and use in the community. In addition, it provides users with more explanatory information, along with suggestions of not only individual ontologies but also groups of ontologies to use together. It also can be customized to fit the needs of different ontology recommendation scenarios. Ontology Recommender 2.0 suggests relevant ontologies for annotating biomedical text data. 
It combines the strengths of its predecessor with a range of adjustments and new features that improve its reliability and usefulness. Ontology Recommender 2.0 recommends over 500 biomedical ontologies from the NCBO BioPortal platform, where it is openly available (both via the user interface at http://bioportal.bioontology.org/recommender , and via a Web service API).
View details for DOI 10.1186/s13326-017-0128-y
View details for PubMedID 28592275
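As a rough illustration of the four-criteria evaluation described in the abstract above, the sketch below combines per-criterion scores into a single ranking score using a weighted sum. This is only a minimal sketch: the weighted-sum form, the weight values, and the names `CriteriaScores`, `aggregate_score`, and `rank_ontologies` are assumptions made for illustration and are not taken from the abstract or from the NCBO service.

```python
from dataclasses import dataclass

# Hypothetical weights, chosen only for illustration (they sum to 1.0).
WEIGHTS = {
    "coverage": 0.4,
    "acceptance": 0.2,
    "detail": 0.2,
    "specialization": 0.2,
}


@dataclass
class CriteriaScores:
    """Per-ontology scores in [0, 1] for the four criteria named in the abstract."""
    coverage: float        # how much of the input the ontology covers
    acceptance: float      # how widely the ontology is used in the community
    detail: float          # how richly the covering classes are described
    specialization: float  # how focused the ontology is on the input's domain


def aggregate_score(scores, weights=WEIGHTS):
    """Combine the four criterion scores into one value (assumed weighted sum)."""
    return sum(weight * getattr(scores, name) for name, weight in weights.items())


def rank_ontologies(candidates, weights=WEIGHTS):
    """Return ontology names ordered from highest to lowest aggregate score."""
    return sorted(
        candidates,
        key=lambda name: aggregate_score(candidates[name], weights),
        reverse=True,
    )
```

Calling `rank_ontologies` with a dictionary that maps ontology names to `CriteriaScores` returns the names in recommended order; changing the weights is one simple way to picture the customization the abstract mentions.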
The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata that Describe Scientific Experiments.
The semantic Web--ISWC ... : ... International Semantic Web Conference ... proceedings. International Semantic Web Conference
2017; 10588: 103–10
The Center for Expanded Data Annotation and Retrieval (CEDAR) aims to revolutionize the way that metadata describing scientific experiments are authored. The software we have developed, the CEDAR Workbench, is a suite of Web-based tools and REST APIs that allows users to construct metadata templates, to fill in templates to generate high-quality metadata, and to share and manage these resources. The CEDAR Workbench provides a versatile, REST-based environment for authoring metadata that are enriched with terms from ontologies. The metadata are available as JSON, JSON-LD, or RDF for easy integration in scientific applications and reusability on the Web. Users can leverage our APIs for validating and submitting metadata to external repositories. The CEDAR Workbench is freely available and open-source.
View details for DOI 10.1007/978-3-319-68204-4_10
View details for PubMedID 32219223
View details for PubMedCentralID PMC7098808
Using association rule mining and ontologies to generate metadata recommendations from multiple biomedical databases.
Database : the journal of biological databases and curation
Metadata, the machine-readable descriptions of the data, are increasingly seen as crucial for describing the vast array of biomedical datasets that are currently being deposited in public repositories. While most public repositories have firm requirements that metadata must accompany submitted datasets, the quality of those metadata is generally very poor. A key problem is that the typical metadata acquisition process is onerous and time consuming, with little interactive guidance or assistance provided to users. Secondary problems include the lack of validation and sparse use of standardized terms or ontologies when authoring metadata. There is a pressing need for improvements to the metadata acquisition process that will help users to enter metadata quickly and accurately. In this paper, we outline a recommendation system for metadata that aims to address this challenge. Our approach uses association rule mining to uncover hidden associations among metadata values and to represent them in the form of association rules. These rules are then used to present users with real-time recommendations when authoring metadata. The novelties of our method are that it is able to combine analyses of metadata from multiple repositories when generating recommendations and can enhance those recommendations by aligning them with ontology terms. We implemented our approach as a service integrated into the CEDAR Workbench metadata authoring platform, and evaluated it using metadata from two public biomedical repositories: US-based National Center for Biotechnology Information BioSample and European Bioinformatics Institute BioSamples. The results show that our approach is able to use analyses of previously entered metadata coupled with ontology-based mappings to present users with accurate recommendations when authoring metadata.
View details for DOI 10.1093/database/baz059
View details for PubMedID 31210270
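The association-rule idea in the abstract above can be pictured with a small, self-contained sketch. It mines single-antecedent rules of the form "field A = x implies field B = y" from previously entered metadata records, then uses rule confidence to rank suggested values for a field that is still empty. The function names, thresholds, and restriction to one-item antecedents are assumptions made for illustration; the published system also combines multiple repositories and aligns values with ontology terms, which this sketch does not attempt.

```python
from collections import Counter
from itertools import permutations


def mine_rules(records, min_support=0.2, min_confidence=0.6):
    """Mine rules (antecedent, consequent, support, confidence).

    Each record is a dict mapping a metadata field to its value; each
    antecedent and consequent is a single (field, value) pair.
    """
    total = len(records)
    item_counts = Counter()
    pair_counts = Counter()
    for record in records:
        items = sorted(record.items())
        item_counts.update(items)
        # Ordered pairs give both directions: A -> B and B -> A.
        pair_counts.update(permutations(items, 2))

    rules = []
    for (antecedent, consequent), count in pair_counts.items():
        support = count / total                     # share of records with both items
        confidence = count / item_counts[antecedent]  # P(consequent | antecedent)
        if support >= min_support and confidence >= min_confidence:
            rules.append((antecedent, consequent, support, confidence))
    return rules


def recommend(rules, filled, target_field):
    """Rank candidate values for target_field given the fields already filled in."""
    best = {}
    for antecedent, (field, value), _support, confidence in rules:
        if field == target_field and antecedent in filled.items():
            best[value] = max(best.get(value, 0.0), confidence)
    return sorted(best, key=best.get, reverse=True)
```

Given a handful of records with `tissue` and `disease` fields, `recommend(rules, {"tissue": "liver"}, "disease")` returns the disease values that most often accompanied `liver` in the mined records, ordered by confidence.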
CEDAR OnDemand: a browser extension to generate ontology-based scientific metadata
2018; 19: 268
Public biomedical data repositories often provide web-based interfaces to collect experimental metadata. However, these interfaces typically reflect the ad hoc metadata specification practices of the associated repositories, leading to a lack of standardization in the collected metadata. This lack of standardization limits the ability of the source datasets to be broadly discovered, reused, and integrated with other datasets. To increase reuse, discoverability, and reproducibility of the described experiments, datasets should be appropriately annotated by using agreed-upon terms, ideally from ontologies or other controlled term sources. This work presents "CEDAR OnDemand", a browser extension powered by the NCBO (National Center for Biomedical Ontology) BioPortal that enables users to seamlessly enter ontology-based metadata through existing web forms native to individual repositories. CEDAR OnDemand analyzes the web page contents to identify the text input fields and associate them with relevant ontologies which are recommended automatically based upon input fields' labels (using the NCBO ontology recommender) and a pre-defined list of ontologies. These field-specific ontologies are used for controlling metadata entry. CEDAR OnDemand works for any web form designed in the HTML format. We demonstrate how CEDAR OnDemand works through the NCBI (National Center for Biotechnology Information) BioSample web-based metadata entry. CEDAR OnDemand helps lower the barrier of incorporating ontologies into standardized metadata entry for public data repositories. CEDAR OnDemand is available freely on the Google Chrome store https://chrome.google.com/webstore/search/CEDAROnDemand.
View details for PubMedID 30012108
AgroPortal: A vocabulary and ontology repository for agronomy
Computers and Electronics in Agriculture
2018; 144: 126–43
The CAIRR Pipeline for Submitting Standards-Compliant B and T Cell Receptor Repertoire Sequencing Studies to the National Center for Biotechnology Information Repositories.
Frontiers in immunology
2018; 9: 1877
The adaptation of high-throughput sequencing to the B cell receptor and T cell receptor has made it possible to characterize the adaptive immune receptor repertoire (AIRR) at unprecedented depth. These AIRR sequencing (AIRR-seq) studies offer tremendous potential to increase the understanding of adaptive immune responses in vaccinology, infectious disease, autoimmunity, and cancer. The increasingly wide application of AIRR-seq is leading to a critical mass of studies being deposited in the public domain, offering the possibility of novel scientific insights through secondary analyses and meta-analyses. However, effective sharing of these large-scale data remains a challenge. The AIRR community has proposed minimal information about adaptive immune receptor repertoire (MiAIRR), a standard for reporting AIRR-seq studies. The MiAIRR standard has been operationalized using the National Center for Biotechnology Information (NCBI) repositories. Submissions of AIRR-seq data to the NCBI repositories typically use a combination of web-based and flat-file templates and include only a minimal amount of terminology validation. As a result, AIRR-seq studies at the NCBI are often described using inconsistent terminologies, limiting scientists' ability to access, find, interoperate, and reuse the data sets. In order to improve metadata quality and ease submission of AIRR-seq studies to the NCBI, we have leveraged the software framework developed by the Center for Expanded Data Annotation and Retrieval (CEDAR), which develops technologies involving the use of data standards and ontologies to improve metadata quality. The resulting CEDAR-AIRR (CAIRR) pipeline enables data submitters to: (i) create web-based templates whose entries are controlled by ontology terms, (ii) generate and validate metadata, and (iii) submit the ontology-linked metadata and sequence files (FASTQ) to the NCBI BioProject, BioSample, and Sequence Read Archive databases. 
Overall, CAIRR provides a web-based metadata submission interface that supports compliance with the MiAIRR standard. This pipeline is available at http://cairr.miairr.org, and will facilitate the NCBI submission process and improve the metadata quality of AIRR-seq studies.
View details for PubMedID 30166985
Fast and Accurate Metadata Authoring Using Ontology-Based Recommendations.
AMIA ... Annual Symposium proceedings. AMIA Symposium
2017; 2017: 1272–81
In biomedicine, high-quality metadata are crucial for finding experimental datasets, for understanding how experiments were performed, and for reproducing those experiments. Despite the recent focus on metadata, the quality of metadata available in public repositories continues to be extremely poor. A key difficulty is that the typical metadata acquisition process is time-consuming and error prone, with weak or nonexistent support for linking metadata to ontologies. There is a pressing need for methods and tools to speed up the metadata acquisition process and to increase the quality of metadata that are entered. In this paper, we describe a methodology and set of associated tools that we developed to address this challenge. A core component of this approach is a value recommendation framework that uses analysis of previously entered metadata and ontology-based metadata specifications to help users rapidly and accurately enter their metadata. We performed an initial evaluation of this approach using metadata from a public metadata repository.
View details for PubMedID 29854196
Issues in data management in observing systems and lessons learned
OCEANS 2006, VOLS 1-4
View details for Web of Science ID 000246002100217
Toward an ocean observing system of systems
OCEANS 2006, VOLS 1-4
View details for Web of Science ID 000246002100163
MBARI's SSDS: Operational, extensible data management for ocean observatories
3rd International Workshop on Scientific Use Submarine Cables and Related Technologies
IEEE. 2003: 288–292
View details for Web of Science ID 000186634900063