Bio
John has performed project management for the Marine Metadata Interoperability Project and its Ontology Registry and Repository (http://mmisw.org/orr) and for the National Science Foundation's Ocean Observatories Initiative CyberInfrastructure project at UC San Diego, as well as at NASA Ames Research Center (via Sterling Software/Northrop Grumman) and the Monterey Bay Aquarium Research Institute.
John received a BA in Computer Science and Statistics from UC Berkeley, with follow-on training and certification in quality management, project management (Project Management Professional), and Scrum software development (Scrum Alliance).
Current Role at Stanford
John is a Technical Program Manager at Stanford University's School of Medicine in the Stanford Center for Biomedical Informatics Research (BMIR). He helps lead the Center for Expanded Data Annotation and Retrieval (CEDAR) and the BioPortal ontology repository. He is the Technical Lead for the Stanford-led collaboration building the NIH RADx Data Hub, a repository for diverse data sets from a large number of NIH-funded COVID-19 studies.
John's work encompasses whatever is needed: project management, product management, systems architecture, DevOps, and administration, to name a few fun roles. He has also played a part in proposal development, hiring management, job category definition, diversity advocacy, and financial reporting.
Service, Volunteer and Community Work
-
Project Lead, MMI ORR and ESIP COR, ESIP (Earth Science Information Partners) (4/1/2006 - Present)
John provides leadership for the MMI Ontology Registry and Repository (ORR) and the ESIP Community Ontology Repository (COR). These are ontology hosting resources for the earth science community that have been in service since 2006 and 2018, respectively.
The MMI ORR (https://mmisw.org/orr) software base has been deployed in multiple environments, including as the Community Ontology Repository (https://cor.esipfed.org) for the ESIP Federation (https://esipfed.org). John provides guidance and coordination for the ESIP Community Ontology Repository.
Location
California, United States
-
Co-Chair, Research Data Alliance Vocabulary and Semantic Services Interest Group (VSSIG), Research Data Alliance (1/1/2018 - Present)
The Vocabulary and Semantic Services Interest Group coordinates sessions and activities at the Research Data Alliance (RDA) supporting the use of semantic services and vocabularies for research.
Location
International
Professional Affiliations and Activities
-
Chair, EarthCube Semantic Infrastructure Working Group (2015 - 2016)
All Publications
-
Modeling community standards for metadata as templates makes data FAIR.
Scientific Data
2022; 9 (1): 696
Abstract
It is challenging to determine whether datasets are findable, accessible, interoperable, and reusable (FAIR) because the FAIR Guiding Principles refer to highly idiosyncratic criteria regarding the metadata used to annotate datasets. Specifically, the FAIR principles require metadata to be "rich" and to adhere to "domain-relevant" community standards. Scientific communities should be able to define their own machine-actionable templates for metadata that encode these "rich," discipline-specific elements. We have explored this template-based approach in the context of two software systems. One system is the CEDAR Workbench, which investigators use to author new metadata. The other is the FAIRware Workbench, which evaluates the metadata of archived datasets for their adherence to community standards. Benefits accrue when templates for metadata become central elements in an ecosystem of tools to manage online datasets, both because the templates serve as a community reference for what constitutes FAIR data, and because they embody that perspective in a form that can be distributed among a variety of software applications to assist with data stewardship and data sharing.
View details for DOI 10.1038/s41597-022-01815-3
View details for PubMedID 36371407
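To make the template idea above concrete, here is a minimal, hypothetical sketch in Python of a community metadata template and a check of one metadata record against it. The field names, allowed values, and template structure are invented for illustration; they are not the CEDAR or FAIRware schema.

# Hypothetical template: field names, requirements, and allowed values are
# invented for illustration, not the CEDAR or FAIRware schema.
TEMPLATE = {
    "name": "Minimal sample template (hypothetical)",
    "fields": {
        "title":    {"required": True},
        "organism": {"required": True,
                     # In practice the allowed values would come from an ontology branch.
                     "allowed": {"Homo sapiens", "Mus musculus"}},
        "tissue":   {"required": False},
    },
}

def check_against_template(record: dict, template: dict) -> list[str]:
    """Return a list of human-readable problems found in the record."""
    problems = []
    for field, spec in template["fields"].items():
        value = record.get(field)
        if spec.get("required") and not value:
            problems.append(f"missing required field '{field}'")
        elif value and "allowed" in spec and value not in spec["allowed"]:
            problems.append(f"field '{field}' has non-standard value '{value}'")
    return problems

if __name__ == "__main__":
    record = {"title": "RNA-seq of liver tissue", "organism": "mouse"}
    print(check_against_template(record, TEMPLATE))
    # -> ["field 'organism' has non-standard value 'mouse'"]

In the real systems the templates are shared, machine-actionable documents, and allowed values typically come from ontology branches rather than hard-coded sets.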
-
The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata that Describe Scientific Experiments.
The Semantic Web - ISWC 2017 (International Semantic Web Conference proceedings)
2017; 10588: 103-110
Abstract
The Center for Expanded Data Annotation and Retrieval (CEDAR) aims to revolutionize the way that metadata describing scientific experiments are authored. The software we have developed, the CEDAR Workbench, is a suite of Web-based tools and REST APIs that allows users to construct metadata templates, to fill in templates to generate high-quality metadata, and to share and manage these resources. The CEDAR Workbench provides a versatile, REST-based environment for authoring metadata that are enriched with terms from ontologies. The metadata are available as JSON, JSON-LD, or RDF for easy integration in scientific applications and reusability on the Web. Users can leverage our APIs for validating and submitting metadata to external repositories. The CEDAR Workbench is freely available and open-source.
View details for DOI 10.1007/978-3-319-68204-4_10
View details for PubMedID 32219223
View details for PubMedCentralID PMC7098808
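As an illustration of the ontology-enriched, JSON-LD-style metadata the abstract mentions, here is a small, hypothetical record built in Python. The field names, context entries, and ontology IRIs are assumptions chosen for the example, not the actual CEDAR template model.

# Illustrative metadata record only: the context entries and ontology IRIs are
# assumptions chosen for this example, not the actual CEDAR template model.
import json

metadata = {
    "@context": {
        "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
        "title": "http://purl.org/dc/terms/title",
        "organism": "http://purl.obolibrary.org/obo/OBI_0100026",
    },
    "title": "RNA-seq of mouse liver (example record)",
    "organism": {
        "@id": "http://purl.obolibrary.org/obo/NCBITaxon_10090",  # Mus musculus
        "rdfs:label": "Mus musculus",
    },
}

print(json.dumps(metadata, indent=2))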
-
NCBO Ontology Recommender 2.0: an enhanced approach for biomedical ontology recommendation.
Journal of Biomedical Semantics
2017; 8 (1): 21-?
Abstract
Ontologies and controlled terminologies have become increasingly important in biomedical research. Researchers use ontologies to annotate their data with ontology terms, enabling better data integration and interoperability across disparate datasets. However, the number, variety and complexity of current biomedical ontologies make it cumbersome for researchers to determine which ones to reuse for their specific needs. To overcome this problem, in 2010 the National Center for Biomedical Ontology (NCBO) released the Ontology Recommender, which is a service that receives a biomedical text corpus or a list of keywords and suggests ontologies appropriate for referencing the indicated terms. We developed a new version of the NCBO Ontology Recommender. Called Ontology Recommender 2.0, it uses a novel recommendation approach that evaluates the relevance of an ontology to biomedical text data according to four different criteria: (1) the extent to which the ontology covers the input data; (2) the acceptance of the ontology in the biomedical community; (3) the level of detail of the ontology classes that cover the input data; and (4) the specialization of the ontology to the domain of the input data. Our evaluation shows that the enhanced recommender provides higher quality suggestions than the original approach, providing better coverage of the input data, more detailed information about their concepts, increased specialization for the domain of the input data, and greater acceptance and use in the community. In addition, it provides users with more explanatory information, along with suggestions of not only individual ontologies but also groups of ontologies to use together. It also can be customized to fit the needs of different ontology recommendation scenarios. Ontology Recommender 2.0 suggests relevant ontologies for annotating biomedical text data. It combines the strengths of its predecessor with a range of adjustments and new features that improve its reliability and usefulness. Ontology Recommender 2.0 recommends over 500 biomedical ontologies from the NCBO BioPortal platform, where it is openly available (both via the user interface at http://bioportal.bioontology.org/recommender, and via a Web service API).
View details for DOI 10.1186/s13326-017-0128-y
View details for PubMedID 28592275
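The abstract notes that the recommender is also available via a Web service API. The sketch below shows one plausible way to call such a service from Python; the endpoint pattern (data.bioontology.org/recommender with input and apikey query parameters) and the response field names are assumptions based on the BioPortal REST API, so verify them against the current API documentation before relying on them.

# Sketch only: the endpoint, query parameters, and response field names are
# assumptions about the BioPortal REST API; verify against the current docs.
import json
import urllib.parse
import urllib.request

API_KEY = "YOUR-BIOPORTAL-API-KEY"  # placeholder; a real key is required
TEXT = "melanoma of the skin treated with immune checkpoint inhibitors"

url = ("https://data.bioontology.org/recommender?"
       + urllib.parse.urlencode({"input": TEXT, "apikey": API_KEY}))

with urllib.request.urlopen(url) as response:
    for suggestion in json.loads(response.read()):
        acronyms = [o.get("acronym") for o in suggestion.get("ontologies", [])]
        print(acronyms, suggestion.get("evaluationScore"))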
-
A Simple Standard for Sharing Ontological Mappings (SSSOM).
Database: The Journal of Biological Databases and Curation
2022; 2022
Abstract
Despite progress in the development of standards for describing and exchanging scientific information, the lack of easy-to-use standards for mapping between different representations of the same or similar objects in different databases poses a major impediment to data integration and interoperability. Mappings often lack the metadata needed to be correctly interpreted and applied. For example, are two terms equivalent or merely related? Are they narrow or broad matches? Or are they associated in some other way? Such relationships between the mapped terms are often not documented, which leads to incorrect assumptions and makes them hard to use in scenarios that require a high degree of precision (such as diagnostics or risk prediction). Furthermore, the lack of descriptions of how mappings were done makes it hard to combine and reconcile mappings, particularly curated and automated ones. We have developed the Simple Standard for Sharing Ontological Mappings (SSSOM), which addresses these problems by: (i) Introducing a machine-readable and extensible vocabulary to describe metadata that makes imprecision, inaccuracy and incompleteness in mappings explicit. (ii) Defining an easy-to-use simple table-based format that can be integrated into existing data science pipelines without the need to parse or query ontologies, and that integrates seamlessly with Linked Data principles. (iii) Implementing open and community-driven collaborative workflows that are designed to evolve the standard continuously to address changing requirements and mapping practices. (iv) Providing reference tools and software libraries for working with the standard. In this paper, we present the SSSOM standard, describe several use cases in detail and survey some of the existing work on standardizing the exchange of mappings, with the goal of making mappings Findable, Accessible, Interoperable and Reusable (FAIR). The SSSOM specification can be found at http://w3id.org/sssom/spec. Database URL: http://w3id.org/sssom/spec.
View details for DOI 10.1093/database/baac035
View details for PubMedID 35616100
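To give a feel for the "simple table-based format" described above, here is a tiny SSSOM-style mapping table parsed with Python. Only a handful of columns are shown and the mapping rows themselves are invented; the complete metadata vocabulary is defined in the SSSOM specification at http://w3id.org/sssom/spec.

# The column subset and the mapping rows below are invented for illustration;
# see the SSSOM specification for the complete metadata vocabulary.
import csv
import io

TSV = """subject_id\tpredicate_id\tobject_id\tmapping_justification\tconfidence
HP:0001945\tskos:exactMatch\tMP:0005533\tsemapv:ManualMappingCuration\t0.99
HP:0002090\tskos:broadMatch\tMP:0001175\tsemapv:LexicalMatching\t0.80
"""

for row in csv.DictReader(io.StringIO(TSV), delimiter="\t"):
    print(f"{row['subject_id']} --{row['predicate_id']}--> "
          f"{row['object_id']} (confidence {row['confidence']})")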
-
Design of a FAIR digital data health infrastructure in Africa for COVID-19 reporting and research.
Advanced Genetics
2021; 2 (2): e10050
Abstract
The limited volume of COVID-19 data from Africa raises concerns for global genome research, which requires a diversity of genotypes for accurate disease prediction, including on the provenance of the new SARS-CoV-2 mutations. The Virus Outbreak Data Network (VODAN)-Africa studied the possibility of increasing the production of clinical data, finding concerns about data ownership, and the limited use of health data for quality treatment at point of care. To address this, VODAN Africa developed an architecture to record clinical health data and research data collected on the incidence of COVID-19, producing these as human- and machine-readable data objects in a distributed architecture of locally governed, linked, human- and machine-readable data. This architecture supports analytics at the point of care and, through data visiting across facilities, for generic analytics. An algorithm was run across FAIR Data Points to visit the distributed data and produce aggregate findings. The FAIR data architecture is deployed in Uganda, Ethiopia, Liberia, Nigeria, Kenya, Somalia, Tanzania, Zimbabwe, and Tunisia.
View details for DOI 10.1002/ggn2.10050
View details for PubMedID 34514430
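A hedged sketch of the "data visiting" pattern mentioned above: each facility's data point answers a query locally, and only aggregate counts travel to the requester. The endpoints, query shape, and counts below are hypothetical stand-ins, not the actual VODAN-Africa interfaces.

# Hypothetical interfaces: the "data points" here are plain callables standing
# in for facility-local query services; no real VODAN endpoints are used.
from typing import Callable

def visit_and_aggregate(data_points: dict[str, Callable[[str], int]],
                        query: str) -> dict[str, int]:
    """Run the query at every local data point and collect only the counts."""
    results = {site: run_query(query) for site, run_query in data_points.items()}
    results["TOTAL"] = sum(results.values())
    return results

if __name__ == "__main__":
    data_points = {
        "facility_a": lambda q: 12,  # each lambda stands in for a local query engine
        "facility_b": lambda q: 7,
    }
    print(visit_and_aggregate(data_points, "count of positive COVID-19 tests"))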
-
Using association rule mining and ontologies to generate metadata recommendations from multiple biomedical databases.
Database: The Journal of Biological Databases and Curation
2019; 2019
Abstract
Metadata, the machine-readable descriptions of the data, are increasingly seen as crucial for describing the vast array of biomedical datasets that are currently being deposited in public repositories. While most public repositories have firm requirements that metadata must accompany submitted datasets, the quality of those metadata is generally very poor. A key problem is that the typical metadata acquisition process is onerous and time consuming, with little interactive guidance or assistance provided to users. Secondary problems include the lack of validation and sparse use of standardized terms or ontologies when authoring metadata. There is a pressing need for improvements to the metadata acquisition process that will help users to enter metadata quickly and accurately. In this paper, we outline a recommendation system for metadata that aims to address this challenge. Our approach uses association rule mining to uncover hidden associations among metadata values and to represent them in the form of association rules. These rules are then used to present users with real-time recommendations when authoring metadata. The novelties of our method are that it is able to combine analyses of metadata from multiple repositories when generating recommendations and can enhance those recommendations by aligning them with ontology terms. We implemented our approach as a service integrated into the CEDAR Workbench metadata authoring platform, and evaluated it using metadata from two public biomedical repositories: US-based National Center for Biotechnology Information BioSample and European Bioinformatics Institute BioSamples. The results show that our approach is able to use analyses of previously entered metadata coupled with ontology-based mappings to present users with accurate recommendations when authoring metadata.
View details for DOI 10.1093/database/baz059
View details for PubMedID 31210270
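To illustrate the core idea of mining association rules from previously entered metadata, here is a deliberately simplified Python sketch that derives one-to-one co-occurrence rules and uses them as value recommendations. The metadata records are invented, and the real system handles multi-item rules, repository-scale data, and ontology alignment.

# Invented metadata records; the mining here is deliberately reduced to
# one-antecedent, one-consequent rules with a simple confidence threshold.
from collections import Counter
from itertools import permutations

records = [
    {"organism": "Homo sapiens", "tissue": "liver", "assay": "RNA-seq"},
    {"organism": "Homo sapiens", "tissue": "liver", "assay": "ChIP-seq"},
    {"organism": "Mus musculus", "tissue": "brain", "assay": "RNA-seq"},
]

item_counts, pair_counts = Counter(), Counter()
for record in records:
    items = list(record.items())             # (field, value) pairs
    item_counts.update(items)
    pair_counts.update(permutations(items, 2))

def recommend(field, value, min_confidence=0.6):
    """Yield (field, value) pairs that frequently co-occur with the input."""
    antecedent = (field, value)
    for (left, consequent), count in pair_counts.items():
        if left == antecedent:
            confidence = count / item_counts[antecedent]
            if confidence >= min_confidence:
                yield consequent, round(confidence, 2)

print(list(recommend("organism", "Homo sapiens")))
# -> [(('tissue', 'liver'), 1.0)]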
-
Unleashing the value of Common Data Elements through the CEDAR Workbench.
AMIA Annual Symposium Proceedings
2019; 2019: 681–90
Abstract
Developing promising treatments in biomedicine often requires aggregation and analysis of data from disparate sources across the healthcare and research spectrum. To facilitate these approaches, there is a growing focus on supporting interoperation of datasets by standardizing data-capture and reporting requirements. Common Data Elements (CDEs), precise specifications of questions and the set of allowable answers to each question, are increasingly being adopted to help meet these standardization goals. While CDEs can provide a strong conceptual foundation for interoperation, there are no widely recognized serialization or interchange formats to describe and exchange their definitions. As a result, CDEs defined in one system cannot easily be reused by other systems. An additional problem is that current CDE-based systems tend to be rather heavyweight and cannot be easily adopted and used by third parties. To address these problems, we developed extensions to a metadata management system called the CEDAR Workbench to provide a platform to simplify the creation, exchange, and use of CDEs. We show how the resulting system allows users to quickly define and share CDEs and to immediately use these CDEs to build and deploy Web-based forms to acquire conforming metadata. We also show how we incorporated a large CDE library from the National Cancer Institute's caDSR system and made these CDEs publicly available for general use.
View details for PubMedID 32308863
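As a toy illustration of the CDE concept the paper builds on (a precise question plus an enumerated set of allowable answers), here is a small Python sketch that represents one hypothetical CDE as plain JSON and validates answers against it. This is not the caDSR or CEDAR serialization.

# A hypothetical CDE represented as plain JSON: one question plus its
# enumerated permissible values; not the caDSR or CEDAR serialization.
import json

cde = {
    "question": "What is the patient's smoking status?",
    "permissible_values": ["Current smoker", "Former smoker", "Never smoker"],
}

def validate_answer(answer: str, cde: dict) -> bool:
    """Accept an answer only if it matches one of the CDE's permissible values."""
    return answer in cde["permissible_values"]

print(json.dumps(cde, indent=2))
print(validate_answer("Former smoker", cde))  # True
print(validate_answer("Occasionally", cde))   # False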
-
The CAIRR Pipeline for Submitting Standards-Compliant B and T Cell Receptor Repertoire Sequencing Studies to the National Center for Biotechnology Information Repositories
Frontiers in Immunology
2018; 9
View details for DOI 10.3389/fimmu.2018.01877
View details for Web of Science ID 000441756000001
-
The CAIRR Pipeline for Submitting Standards-Compliant B and T Cell Receptor Repertoire Sequencing Studies to the National Center for Biotechnology Information Repositories.
Frontiers in Immunology
2018; 9: 1877
Abstract
The adaptation of high-throughput sequencing to the B cell receptor and T cell receptor has made it possible to characterize the adaptive immune receptor repertoire (AIRR) at unprecedented depth. These AIRR sequencing (AIRR-seq) studies offer tremendous potential to increase the understanding of adaptive immune responses in vaccinology, infectious disease, autoimmunity, and cancer. The increasingly wide application of AIRR-seq is leading to a critical mass of studies being deposited in the public domain, offering the possibility of novel scientific insights through secondary analyses and meta-analyses. However, effective sharing of these large-scale data remains a challenge. The AIRR community has proposed minimal information about adaptive immune receptor repertoire (MiAIRR), a standard for reporting AIRR-seq studies. The MiAIRR standard has been operationalized using the National Center for Biotechnology Information (NCBI) repositories. Submissions of AIRR-seq data to the NCBI repositories typically use a combination of web-based and flat-file templates and include only a minimal amount of terminology validation. As a result, AIRR-seq studies at the NCBI are often described using inconsistent terminologies, limiting scientists' ability to access, find, interoperate, and reuse the data sets. In order to improve metadata quality and ease submission of AIRR-seq studies to the NCBI, we have leveraged the software framework developed by the Center for Expanded Data Annotation and Retrieval (CEDAR), which develops technologies involving the use of data standards and ontologies to improve metadata quality. The resulting CEDAR-AIRR (CAIRR) pipeline enables data submitters to: (i) create web-based templates whose entries are controlled by ontology terms, (ii) generate and validate metadata, and (iii) submit the ontology-linked metadata and sequence files (FASTQ) to the NCBI BioProject, BioSample, and Sequence Read Archive databases. Overall, CAIRR provides a web-based metadata submission interface that supports compliance with the MiAIRR standard. This pipeline is available at http://cairr.miairr.org, and will facilitate the NCBI submission process and improve the metadata quality of AIRR-seq studies.
View details for DOI 10.3389/fimmu.2018.01877
View details for PubMedID 30166985
View details for PubMedCentralID PMC6105692
-
CEDAR OnDemand: a browser extension to generate ontology-based scientific metadata
BMC Bioinformatics
2018; 19: 268
Abstract
Public biomedical data repositories often provide web-based interfaces to collect experimental metadata. However, these interfaces typically reflect the ad hoc metadata specification practices of the associated repositories, leading to a lack of standardization in the collected metadata. This lack of standardization limits the ability of the source datasets to be broadly discovered, reused, and integrated with other datasets. To increase reuse, discoverability, and reproducibility of the described experiments, datasets should be appropriately annotated by using agreed-upon terms, ideally from ontologies or other controlled term sources. This work presents "CEDAR OnDemand", a browser extension powered by the NCBO (National Center for Biomedical Ontology) BioPortal that enables users to seamlessly enter ontology-based metadata through existing web forms native to individual repositories. CEDAR OnDemand analyzes the web page contents to identify the text input fields and associate them with relevant ontologies which are recommended automatically based upon input fields' labels (using the NCBO ontology recommender) and a pre-defined list of ontologies. These field-specific ontologies are used for controlling metadata entry. CEDAR OnDemand works for any web form designed in the HTML format. We demonstrate how CEDAR OnDemand works through the NCBI (National Center for Biotechnology Information) BioSample web-based metadata entry. CEDAR OnDemand helps lower the barrier of incorporating ontologies into standardized metadata entry for public data repositories. CEDAR OnDemand is available freely on the Google Chrome store https://chrome.google.com/webstore/search/CEDAROnDemand.
View details for PubMedID 30012108
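The underlying pattern described above, scanning a form for its input fields and picking an ontology from each field's label, can be sketched in a few lines of Python. In the real extension this happens in browser-side JavaScript against live pages; the form markup, field names, and label-to-ontology lookup below are invented, with the lookup standing in for the NCBO recommender call.

# The form markup, field names, and label-to-ontology lookup are invented;
# the lookup dictionary stands in for a call to the NCBO recommender.
from html.parser import HTMLParser

FORM = '<form><input name="organism"><input name="tissue"><input name="notes"></form>'

SUGGESTED_ONTOLOGY = {"organism": "NCBITAXON", "tissue": "UBERON"}

class FieldCollector(HTMLParser):
    """Collect the name attribute of every <input> element in a page."""
    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            name = dict(attrs).get("name")
            if name:
                self.fields.append(name)

collector = FieldCollector()
collector.feed(FORM)
for field in collector.fields:
    print(field, "->", SUGGESTED_ONTOLOGY.get(field, "no suggestion"))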
-
AgroPortal: A vocabulary and ontology repository for agronomy
Computers and Electronics in Agriculture
2018; 144: 126–43
View details for DOI 10.1016/j.compag.2017.10.012
View details for Web of Science ID 000425072400014
-
Fast and Accurate Metadata Authoring Using Ontology-Based Recommendations.
AMIA Annual Symposium Proceedings
2017; 2017: 1272–81
Abstract
In biomedicine, high-quality metadata are crucial for finding experimental datasets, for understanding how experiments were performed, and for reproducing those experiments. Despite the recent focus on metadata, the quality of metadata available in public repositories continues to be extremely poor. A key difficulty is that the typical metadata acquisition process is time-consuming and error prone, with weak or nonexistent support for linking metadata to ontologies. There is a pressing need for methods and tools to speed up the metadata acquisition process and to increase the quality of metadata that are entered. In this paper, we describe a methodology and set of associated tools that we developed to address this challenge. A core component of this approach is a value recommendation framework that uses analysis of previously entered metadata and ontology-based metadata specifications to help users rapidly and accurately enter their metadata. We performed an initial evaluation of this approach using metadata from a public metadata repository.
View details for PubMedID 29854196
-
Issues in data management in observing systems and lessons learned
OCEANS 2006, Vols 1-4
2006: 1187-1192
View details for Web of Science ID 000246002100217
-
Toward an ocean observing system of systems
OCEANS 2006, Vols 1-4
2006: 904-?
View details for Web of Science ID 000246002100163
-
MBARI's SSDS: Operational, extensible data management for ocean observatories
3rd International Workshop on Scientific Use of Submarine Cables and Related Technologies
IEEE. 2003: 288–292
View details for Web of Science ID 000186634900063