Bio
I am a senior software engineer specializing in semantic technologies for biomedical data. With over three decades of experience in software engineering—including more than 20 years at Stanford—my work focuses on developing scalable, standards-based infrastructure for scientific metadata, ontologies, and data integration. I currently lead the engineering of the CEDAR Workbench, a system widely adopted across NIH initiatives for generating FAIR-compliant metadata.
My research portfolio spans clinical decision support, biosurveillance, and semantic data management. I developed the Chronus II and SQWRL query languages, implemented SWRL functionality in Protégé, and led the architecture of the BioSTORM outbreak detection platform. My work on metadata management has been incorporated into several NIH-funded platforms, including HuBMAP and RADx.
My recent work focuses on creating infrastructure for lifecycle management of value sets—curated collections of standardized terms critical for metadata consistency and interoperability. This effort addresses a major shortcoming in biomedical data ecosystems by enabling collaborative authoring, version control, and reuse of semantic vocabularies across research domains.
Current Role at Stanford
Research software developer at the Stanford Center for Biomedical Informatics Research (BMIR)
Education & Certifications
- M.Sc., University of Dublin, Trinity College, Computer Science (1993)
- B.A., University of Dublin, Trinity College, Computer Science (1989)
All Publications
- Ensuring Adherence to Standards in Experiment-Related Metadata Entered Via Spreadsheets.
Scientific data
2025; 12 (1): 265
Abstract
Scientists increasingly recognize the importance of providing rich, standards-adherent metadata to describe their experimental results. Despite the availability of sophisticated tools to assist in the process of data annotation, investigators generally prefer to use spreadsheets when supplying metadata, even though spreadsheets offer limited support for ensuring metadata consistency and compliance with formal specifications. In this paper, we describe an end-to-end approach that supports spreadsheet-based entry of metadata, while ensuring rigorous adherence to community-based metadata standards and providing quality control. Our methods employ several key components, including customizable templates that represent metadata standards and that can inform the spreadsheets that investigators use to author metadata, controlled terminologies and ontologies for defining metadata values that can be accessed directly from a spreadsheet, and an interactive Web-based tool that allows users to rapidly identify and fix errors in their spreadsheet-based metadata. We demonstrate how this approach is being deployed in a biomedical consortium known as HuBMAP to define and collect metadata about a wide range of biological assays.
View details for DOI 10.1038/s41597-025-04589-6
View details for PubMedID 39952970
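The template-driven quality control described in this abstract can be sketched in a few lines. This is a minimal illustration only: the field names, controlled vocabularies, and `validate_row` helper below are invented for the example and are not the actual HuBMAP template schema or the CEDAR implementation.

```python
# Hypothetical template: each field maps to a spec saying whether it is
# required and which controlled-vocabulary values (if any) are allowed.
template = {
    "assay_type": {"required": True, "allowed": {"RNA-seq", "ATAC-seq", "CODEX"}},
    "organ": {"required": True, "allowed": {"kidney", "heart", "lung"}},
    "notes": {"required": False, "allowed": None},  # free text, no vocabulary
}

def validate_row(row, template):
    """Return a list of (field, message) errors for one spreadsheet row."""
    errors = []
    for field, spec in template.items():
        value = row.get(field)
        if spec["required"] and not value:
            errors.append((field, "missing required value"))
        elif value and spec["allowed"] is not None and value not in spec["allowed"]:
            errors.append((field, f"'{value}' not in controlled vocabulary"))
    return errors

row = {"assay_type": "RNA-seq", "organ": "brain"}
print(validate_row(row, template))  # flags 'brain' as an uncontrolled term
```

In the real system this role is played by templates that also drive the spreadsheet's pull-down menus, so most errors are prevented at entry time rather than caught afterward.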
- Advances and prospects for the Human BioMolecular Atlas Program (HuBMAP).
Nature cell biology
2023
Abstract
The Human BioMolecular Atlas Program (HuBMAP) aims to create a multi-scale spatial atlas of the healthy human body at single-cell resolution by applying advanced technologies and disseminating resources to the community. As the HuBMAP moves past its first phase, creating ontologies, protocols and pipelines, this Perspective introduces the production phase: the generation of reference spatial maps of functional tissue units across many organs from diverse populations and the creation of mapping tools and infrastructure to advance biomedical research.
View details for DOI 10.1038/s41556-023-01194-w
View details for PubMedID 37468756
View details for PubMedCentralID 8238499
- Modeling community standards for metadata as templates makes data FAIR.
Scientific data
2022; 9 (1): 696
Abstract
It is challenging to determine whether datasets are findable, accessible, interoperable, and reusable (FAIR) because the FAIR Guiding Principles refer to highly idiosyncratic criteria regarding the metadata used to annotate datasets. Specifically, the FAIR principles require metadata to be "rich" and to adhere to "domain-relevant" community standards. Scientific communities should be able to define their own machine-actionable templates for metadata that encode these "rich," discipline-specific elements. We have explored this template-based approach in the context of two software systems. One system is the CEDAR Workbench, which investigators use to author new metadata. The other is the FAIRware Workbench, which evaluates the metadata of archived datasets for their adherence to community standards. Benefits accrue when templates for metadata become central elements in an ecosystem of tools to manage online datasets: both because the templates serve as a community reference for what constitutes FAIR data, and because they embody that perspective in a form that can be distributed among a variety of software applications to assist with data stewardship and data sharing.
View details for DOI 10.1038/s41597-022-01815-3
View details for PubMedID 36371407
- Using association rule mining and ontologies to generate metadata recommendations from multiple biomedical databases.
Database : the journal of biological databases and curation
2019; 2019
Abstract
Metadata, the machine-readable descriptions of the data, are increasingly seen as crucial for describing the vast array of biomedical datasets that are currently being deposited in public repositories. While most public repositories have firm requirements that metadata must accompany submitted datasets, the quality of those metadata is generally very poor. A key problem is that the typical metadata acquisition process is onerous and time consuming, with little interactive guidance or assistance provided to users. Secondary problems include the lack of validation and sparse use of standardized terms or ontologies when authoring metadata. There is a pressing need for improvements to the metadata acquisition process that will help users to enter metadata quickly and accurately. In this paper, we outline a recommendation system for metadata that aims to address this challenge. Our approach uses association rule mining to uncover hidden associations among metadata values and to represent them in the form of association rules. These rules are then used to present users with real-time recommendations when authoring metadata. The novelties of our method are that it is able to combine analyses of metadata from multiple repositories when generating recommendations and can enhance those recommendations by aligning them with ontology terms. We implemented our approach as a service integrated into the CEDAR Workbench metadata authoring platform, and evaluated it using metadata from two public biomedical repositories: US-based National Center for Biotechnology Information BioSample and European Bioinformatics Institute BioSamples. The results show that our approach is able to use analyses of previously entered metadata coupled with ontology-based mappings to present users with accurate recommendations when authoring metadata.
View details for DOI 10.1093/database/baz059
View details for PubMedID 31210270
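The core idea of rule-based value recommendation can be illustrated with a tiny hand-rolled sketch: mine co-occurrence counts of field=value pairs from past records, then rank candidate values for a target field by rule confidence given a field the user has already filled in. The toy records and the `recommend` helper below are invented for illustration; the paper mines real BioSample/BioSamples metadata at scale and additionally aligns values with ontology terms.

```python
from collections import Counter
from itertools import combinations

# Toy corpus of previously entered metadata records (illustrative only).
records = [
    {"organism": "Homo sapiens", "tissue": "liver", "disease": "hepatitis"},
    {"organism": "Homo sapiens", "tissue": "liver", "disease": "cirrhosis"},
    {"organism": "Homo sapiens", "tissue": "liver", "disease": "hepatitis"},
    {"organism": "Mus musculus", "tissue": "brain", "disease": "none"},
]

# Count single items and unordered pairs of (field, value) items.
pair_counts = Counter()
single_counts = Counter()
for rec in records:
    items = [(f, v) for f, v in sorted(rec.items())]
    single_counts.update(items)
    pair_counts.update(combinations(items, 2))

def recommend(filled_field, filled_value, target_field):
    """Rank candidate values for target_field by rule confidence
    P(target | filled), given one already-entered field."""
    antecedent = (filled_field, filled_value)
    scores = {}
    for (a, b), n in pair_counts.items():
        for x, y in ((a, b), (b, a)):  # rules work in both directions
            if x == antecedent and y[0] == target_field:
                scores[y[1]] = n / single_counts[antecedent]
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(recommend("tissue", "liver", "disease"))
# 'hepatitis' ranks above 'cirrhosis' (confidence 2/3 vs. 1/3)
```

A production system would prune rules by minimum support and confidence thresholds and merge rules mined from several repositories, as the abstract describes.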
- Unleashing the value of Common Data Elements through the CEDAR Workbench.
AMIA ... Annual Symposium proceedings. AMIA Symposium
2019; 2019: 681–90
Abstract
Developing promising treatments in biomedicine often requires aggregation and analysis of data from disparate sources across the healthcare and research spectrum. To facilitate these approaches, there is a growing focus on supporting interoperation of datasets by standardizing data-capture and reporting requirements. Common Data Elements (CDEs), precise specifications of questions and the set of allowable answers to each question, are increasingly being adopted to help meet these standardization goals. While CDEs can provide a strong conceptual foundation for interoperation, there are no widely recognized serialization or interchange formats to describe and exchange their definitions. As a result, CDEs defined in one system cannot easily be reused by other systems. An additional problem is that current CDE-based systems tend to be rather heavyweight and cannot be easily adopted and used by third parties. To address these problems, we developed extensions to a metadata management system called the CEDAR Workbench to provide a platform to simplify the creation, exchange, and use of CDEs. We show how the resulting system allows users to quickly define and share CDEs and to immediately use these CDEs to build and deploy Web-based forms to acquire conforming metadata. We also show how we incorporated a large CDE library from the National Cancer Institute's caDSR system and made these CDEs publicly available for general use.
View details for PubMedID 32308863
- The CAIRR Pipeline for Submitting Standards-Compliant B and T Cell Receptor Repertoire Sequencing Studies to the National Center for Biotechnology Information Repositories.
Frontiers in immunology
2018; 9: 1877
Abstract
The adaptation of high-throughput sequencing to the B cell receptor and T cell receptor has made it possible to characterize the adaptive immune receptor repertoire (AIRR) at unprecedented depth. These AIRR sequencing (AIRR-seq) studies offer tremendous potential to increase the understanding of adaptive immune responses in vaccinology, infectious disease, autoimmunity, and cancer. The increasingly wide application of AIRR-seq is leading to a critical mass of studies being deposited in the public domain, offering the possibility of novel scientific insights through secondary analyses and meta-analyses. However, effective sharing of these large-scale data remains a challenge. The AIRR community has proposed minimal information about adaptive immune receptor repertoire (MiAIRR), a standard for reporting AIRR-seq studies. The MiAIRR standard has been operationalized using the National Center for Biotechnology Information (NCBI) repositories. Submissions of AIRR-seq data to the NCBI repositories typically use a combination of web-based and flat-file templates and include only a minimal amount of terminology validation. As a result, AIRR-seq studies at the NCBI are often described using inconsistent terminologies, limiting scientists' ability to access, find, interoperate, and reuse the data sets. In order to improve metadata quality and ease submission of AIRR-seq studies to the NCBI, we have leveraged the software framework developed by the Center for Expanded Data Annotation and Retrieval (CEDAR), which develops technologies involving the use of data standards and ontologies to improve metadata quality. The resulting CEDAR-AIRR (CAIRR) pipeline enables data submitters to: (i) create web-based templates whose entries are controlled by ontology terms, (ii) generate and validate metadata, and (iii) submit the ontology-linked metadata and sequence files (FASTQ) to the NCBI BioProject, BioSample, and Sequence Read Archive databases. 
Overall, CAIRR provides a web-based metadata submission interface that supports compliance with the MiAIRR standard. This pipeline is available at http://cairr.miairr.org, and will facilitate the NCBI submission process and improve the metadata quality of AIRR-seq studies.
View details for DOI 10.3389/fimmu.2018.01877
View details for PubMedID 30166985
View details for PubMedCentralID PMC6105692
View details for Web of Science ID 000441756000001
- CEDAR OnDemand: a browser extension to generate ontology-based scientific metadata
BMC BIOINFORMATICS
2018; 19: 268
Abstract
Public biomedical data repositories often provide web-based interfaces to collect experimental metadata. However, these interfaces typically reflect the ad hoc metadata specification practices of the associated repositories, leading to a lack of standardization in the collected metadata. This lack of standardization limits the ability of the source datasets to be broadly discovered, reused, and integrated with other datasets. To increase reuse, discoverability, and reproducibility of the described experiments, datasets should be appropriately annotated by using agreed-upon terms, ideally from ontologies or other controlled term sources. This work presents "CEDAR OnDemand", a browser extension powered by the NCBO (National Center for Biomedical Ontology) BioPortal that enables users to seamlessly enter ontology-based metadata through existing web forms native to individual repositories. CEDAR OnDemand analyzes the web page contents to identify the text input fields and associate them with relevant ontologies, which are recommended automatically based upon input fields' labels (using the NCBO ontology recommender) and a pre-defined list of ontologies. These field-specific ontologies are used for controlling metadata entry. CEDAR OnDemand works for any web form designed in the HTML format. We demonstrate how CEDAR OnDemand works through the NCBI (National Center for Biotechnology Information) BioSample web-based metadata entry form. CEDAR OnDemand helps lower the barrier of incorporating ontologies into standardized metadata entry for public data repositories. CEDAR OnDemand is freely available on the Google Chrome store: https://chrome.google.com/webstore/search/CEDAROnDemand.
View details for PubMedID 30012108
- The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata that Describe Scientific Experiments.
The semantic Web--ISWC ... : ... International Semantic Web Conference ... proceedings. International Semantic Web Conference
2017; 10588: 103-110
Abstract
The Center for Expanded Data Annotation and Retrieval (CEDAR) aims to revolutionize the way that metadata describing scientific experiments are authored. The software we have developed, the CEDAR Workbench, is a suite of Web-based tools and REST APIs that allows users to construct metadata templates, to fill in templates to generate high-quality metadata, and to share and manage these resources. The CEDAR Workbench provides a versatile, REST-based environment for authoring metadata that are enriched with terms from ontologies. The metadata are available as JSON, JSON-LD, or RDF for easy integration in scientific applications and reusability on the Web. Users can leverage our APIs for validating and submitting metadata to external repositories. The CEDAR Workbench is freely available and open-source.
View details for DOI 10.1007/978-3-319-68204-4_10
View details for PubMedID 32219223
View details for PubMedCentralID PMC7098808
- NCBO Ontology Recommender 2.0: an enhanced approach for biomedical ontology recommendation.
Journal of biomedical semantics
2017; 8 (1): 21
Abstract
Ontologies and controlled terminologies have become increasingly important in biomedical research. Researchers use ontologies to annotate their data with ontology terms, enabling better data integration and interoperability across disparate datasets. However, the number, variety and complexity of current biomedical ontologies make it cumbersome for researchers to determine which ones to reuse for their specific needs. To overcome this problem, in 2010 the National Center for Biomedical Ontology (NCBO) released the Ontology Recommender, which is a service that receives a biomedical text corpus or a list of keywords and suggests ontologies appropriate for referencing the indicated terms. We developed a new version of the NCBO Ontology Recommender. Called Ontology Recommender 2.0, it uses a novel recommendation approach that evaluates the relevance of an ontology to biomedical text data according to four different criteria: (1) the extent to which the ontology covers the input data; (2) the acceptance of the ontology in the biomedical community; (3) the level of detail of the ontology classes that cover the input data; and (4) the specialization of the ontology to the domain of the input data. Our evaluation shows that the enhanced recommender provides higher quality suggestions than the original approach, providing better coverage of the input data, more detailed information about their concepts, increased specialization for the domain of the input data, and greater acceptance and use in the community. In addition, it provides users with more explanatory information, along with suggestions of not only individual ontologies but also groups of ontologies to use together. It also can be customized to fit the needs of different ontology recommendation scenarios. Ontology Recommender 2.0 suggests relevant ontologies for annotating biomedical text data. It combines the strengths of its predecessor with a range of adjustments and new features that improve its reliability and usefulness. Ontology Recommender 2.0 recommends over 500 biomedical ontologies from the NCBO BioPortal platform, where it is openly available (both via the user interface at http://bioportal.bioontology.org/recommender, and via a Web service API).
View details for DOI 10.1186/s13326-017-0128-y
View details for PubMedID 28592275
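The four criteria above can be combined into a single ranking score; a common way to do this is a weighted linear combination. The sketch below illustrates that aggregation pattern only: the weights, the per-criterion scores, and the candidate ontology names are invented for the example and are not the published Recommender 2.0 formula or its evaluation data.

```python
# Invented weights: coverage dominates, the other criteria contribute equally.
WEIGHTS = {"coverage": 0.55, "acceptance": 0.15, "detail": 0.15, "specialization": 0.15}

def ontology_score(criteria):
    """Weighted linear combination of per-criterion scores, each in [0, 1]."""
    return sum(WEIGHTS[name] * criteria[name] for name in WEIGHTS)

# Hypothetical per-criterion scores for two candidate ontologies.
candidates = {
    "ONTO_A": {"coverage": 0.9, "acceptance": 0.6, "detail": 0.7, "specialization": 0.8},
    "ONTO_B": {"coverage": 0.7, "acceptance": 0.9, "detail": 0.5, "specialization": 0.4},
}

ranked = sorted(candidates, key=lambda name: -ontology_score(candidates[name]))
print(ranked)  # ONTO_A ranks first: its high coverage outweighs B's acceptance
```

Making the weights explicit is also what enables the customization the abstract mentions: different recommendation scenarios can simply supply different weight profiles.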
- Fast and Accurate Metadata Authoring Using Ontology-Based Recommendations.
AMIA ... Annual Symposium proceedings. AMIA Symposium
2017; 2017: 1272–81
Abstract
In biomedicine, high-quality metadata are crucial for finding experimental datasets, for understanding how experiments were performed, and for reproducing those experiments. Despite the recent focus on metadata, the quality of metadata available in public repositories continues to be extremely poor. A key difficulty is that the typical metadata acquisition process is time-consuming and error prone, with weak or nonexistent support for linking metadata to ontologies. There is a pressing need for methods and tools to speed up the metadata acquisition process and to increase the quality of metadata that are entered. In this paper, we describe a methodology and set of associated tools that we developed to address this challenge. A core component of this approach is a value recommendation framework that uses analysis of previously entered metadata and ontology-based metadata specifications to help users rapidly and accurately enter their metadata. We performed an initial evaluation of this approach using metadata from a public metadata repository.
View details for PubMedID 29854196
- An Open Repository Model for Acquiring Knowledge About Scientific Experiments
SPRINGER INT PUBLISHING AG. 2016: 762–77
View details for DOI 10.1007/978-3-319-49004-5_49
View details for Web of Science ID 000389315800049
- The center for expanded data annotation and retrieval.
Journal of the American Medical Informatics Association
2015; 22 (6): 1148-1152
Abstract
The Center for Expanded Data Annotation and Retrieval is studying the creation of comprehensive and expressive metadata for biomedical datasets to facilitate data discovery, data interpretation, and data reuse. We take advantage of emerging community-based standard templates for describing different kinds of biomedical datasets, and we investigate the use of computational techniques to help investigators to assemble templates and to fill in their values. We are creating a repository of metadata from which we plan to identify metadata patterns that will drive predictive data entry when filling in metadata templates. The metadata repository not only will capture annotations specified when experimental datasets are initially created, but also will incorporate links to the published literature, including secondary analyses and possible refinements or retractions of experimental interpretations. By working initially with the Human Immunology Project Consortium and the developers of the ImmPort data repository, we are developing and evaluating an end-to-end solution to the problems of metadata authoring and management that will generalize to other data-management environments.
View details for DOI 10.1093/jamia/ocv048
View details for PubMedID 26112029
- Clustering rule bases using ontology-based similarity measures
JOURNAL OF WEB SEMANTICS
2014; 25: 1-8
View details for DOI 10.1016/j.websem.2014.03.001
View details for Web of Science ID 000336629000001
- Automated Tracking of Quantitative Assessments of Tumor Burden in Clinical Trials
TRANSLATIONAL ONCOLOGY
2014; 7 (1): 23-35
Abstract
There are two key challenges hindering effective use of quantitative assessment of imaging in cancer response assessment: (1) radiologists usually describe the cancer lesions in imaging studies subjectively and sometimes ambiguously, and (2) it is difficult to repurpose imaging data, because lesion measurements are not recorded in a format that permits machine interpretation and interoperability. We have developed a freely available software platform on the basis of open standards, the electronic Physician Annotation Device (ePAD), to tackle these challenges in two ways. First, ePAD facilitates the radiologist in carrying out cancer lesion measurements as part of routine clinical trial image interpretation workflow. Second, ePAD records all image measurements and annotations in a data format that permits repurposing image data for analyses of alternative imaging biomarkers of treatment response. To determine the impact of ePAD on radiologist efficiency in quantitative assessment of imaging studies, a radiologist evaluated computed tomography (CT) imaging studies from 20 subjects having one baseline and three consecutive follow-up imaging studies with and without ePAD. The radiologist made measurements of target lesions in each imaging study using Response Evaluation Criteria in Solid Tumors 1.1 criteria, initially with the aid of ePAD, and then after a 30-day washout period, the exams were reread without ePAD. The mean total time required to review the images and summarize measurements of target lesions was 15% (P < .039) shorter using ePAD than without using this tool. In addition, it was possible to rapidly reanalyze the images to explore lesion cross-sectional area as an alternative imaging biomarker to linear measure. We conclude that ePAD appears promising to potentially improve reader efficiency for quantitative assessment of CT examinations, and it may enable discovery of future novel image-based biomarkers of cancer treatment response.
View details for DOI 10.1593/tlo.13796
View details for Web of Science ID 000342684300004
View details for PubMedID 24772204
View details for PubMedCentralID PMC3998692
- A semantic-based method for extracting concept definitions from scientific publications: evaluation in the autism phenotype domain
JOURNAL OF BIOMEDICAL SEMANTICS
2013; 4
Abstract
A variety of informatics approaches have been developed that use information retrieval, NLP and text-mining techniques to identify biomedical concepts and relations within scientific publications or their sentences. These approaches have not typically addressed the challenge of extracting more complex knowledge such as biomedical definitions. In our efforts to facilitate knowledge acquisition of rule-based definitions of autism phenotypes, we have developed a novel semantic-based text-mining approach that can automatically identify such definitions within text. Using an existing knowledge base of 156 autism phenotype definitions and an annotated corpus of 26 source articles containing such definitions, we evaluated and compared the average rank of the correctly identified rule definition or corresponding rule template using both our semantic-based approach and a standard term-based approach. We examined three separate scenarios: (1) the snippet of text contained a definition already in the knowledge base; (2) the snippet contained an alternative definition for a concept in the knowledge base; and (3) the snippet contained a definition not in the knowledge base. Our semantic-based approach achieved a better average rank than the term-based approach for each of the three scenarios (scenario 1: 3.8 vs. 5.0; scenario 2: 2.8 vs. 4.9; and scenario 3: 4.5 vs. 6.2), with each comparison significant at the p-value of 0.05 using the Wilcoxon signed-rank test. Our work shows that leveraging existing domain knowledge in the information extraction of biomedical definitions significantly improves the correct identification of such knowledge within sentences. Our method can thus help researchers rapidly acquire knowledge about biomedical definitions that are specified and evolving within an ever-growing corpus of scientific publications.
View details for DOI 10.1186/2041-1480-4-14
View details for Web of Science ID 000343705400002
View details for PubMedID 23937724
View details for PubMedCentralID PMC3765483
- Adaptive System for Collaborative Online Laboratories
IEEE INTELLIGENT SYSTEMS
2012; 27 (4): 11-17
View details for DOI 10.1109/MIS.2011.1
View details for Web of Science ID 000307483700005
- Overcoming the ontology enrichment bottleneck with Quick Term Templates
APPLIED ONTOLOGY
2011; 6 (1): 13-22
View details for DOI 10.3233/AO-2011-0086
View details for Web of Science ID 000290996800002
- Evaluation of semantic-based information retrieval methods in the autism phenotype domain.
AMIA ... Annual Symposium proceedings. AMIA Symposium
2011; 2011: 569-77
Abstract
Biomedical ontologies are increasingly being used to improve information retrieval methods. In this paper, we present a novel information retrieval approach that exploits knowledge specified by the Semantic Web ontology and rule languages OWL and SWRL. We evaluate our approach using an autism ontology that has 156 SWRL rules defining 145 autism phenotypes. Our approach uses a vector space model to correlate how well these phenotypes relate to the publications used to define them. We compare a vector space phenotype representation using class hierarchies with one that extends this method to incorporate additional semantics encoded in SWRL rules. From a PubMed-extracted corpus of 75 articles, we show that the average rank of a related paper using the class hierarchy method is 4.6, whereas the average rank using the extended rule-based method is 3.3. Our results indicate that incorporating rule-based definitions in information retrieval methods can improve search for relevant publications.
View details for PubMedID 22195112
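The vector-space correlation at the heart of this approach is standard cosine-similarity ranking. The sketch below shows the mechanism with toy term vectors; the phenotype terms and document texts are invented for illustration, whereas the paper builds its vectors from ontology class hierarchies and SWRL rule semantics.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Toy phenotype representation and candidate publications (illustrative).
phenotype = Counter("language delay social impairment".split())
docs = {
    "paper1": Counter("social impairment in autism".split()),
    "paper2": Counter("mouse liver gene expression".split()),
}

ranked = sorted(docs, key=lambda d: -cosine(phenotype, docs[d]))
print(ranked)  # paper1 first: it shares 'social' and 'impairment'
```

Enriching the phenotype vector with terms implied by SWRL rule bodies, as the paper does, simply adds more dimensions to `phenotype` before the same similarity computation.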
- A Framework for the Automatic Extraction of Rules from Online Text
SPRINGER-VERLAG BERLIN. 2011: 266-280
View details for Web of Science ID 000306715000021
- A Method for Representing and Querying Temporal Information in OWL
SPRINGER-VERLAG BERLIN. 2011: 97-110
View details for Web of Science ID 000289177200008
- A Lightweight Model for Representing and Reasoning with Temporal Information in Biomedical Ontologies
INSTICC-INST SYST TECHNOLOGIES INFORMATION CONTROL & COMMUNICATION. 2010: 90-97
View details for Web of Science ID 000299185100016
- Visualizing Logical Dependencies in SWRL Rule Bases
SPRINGER-VERLAG BERLIN. 2010: 259-272
View details for Web of Science ID 000287827100022
- A Software Tool for Visualizing, Managing and Eliciting SWRL Rules
SPRINGER-VERLAG BERLIN. 2010: 381-385
View details for Web of Science ID 000279595500028
- Semantic Reasoning with XML-based Biomedical Information Models
IOS PRESS. 2010: 986-990
Abstract
The Extensible Markup Language (XML) is increasingly being used for biomedical data exchange. The parallel growth in the use of ontologies in biomedicine presents opportunities for combining the two technologies to leverage the semantic reasoning services provided by ontology-based tools. There are currently no standardized approaches for taking XML-encoded biomedical information models and representing and reasoning with them using ontologies. To address this shortcoming, we have developed a workflow and a suite of tools for transforming XML-based information models into domain ontologies encoded using OWL. In this study, we applied semantic reasoning methods to these ontologies to automatically generate domain-level inferences. We successfully used these methods to develop semantic reasoning capabilities for information models in the HIV and radiological image domains.
View details for DOI 10.3233/978-1-60750-588-4-986
View details for Web of Science ID 000392215900193
View details for PubMedID 20841831
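The first step of such a transformation, lifting XML-encoded records into subject-predicate-object statements that an OWL ontology can then reason over, can be sketched briefly. The XML snippet, element names, and predicate names below are invented for the example and are not the paper's actual transformation workflow or its HIV/radiology models.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML-encoded clinical record (illustrative only).
xml_doc = """<patient id="p1"><test name="CD4"><value>350</value></test></patient>"""
root = ET.fromstring(xml_doc)

# Lift elements and attributes into (subject, predicate, object) triples,
# which could then be typed against OWL classes and properties.
triples = []
subject = f"patient/{root.get('id')}"
for test in root.findall("test"):
    triples.append((subject, "hasTest", test.get("name")))
    triples.append((test.get("name"), "hasValue", test.find("value").text))

print(triples)
```

Once the data are in triple form, an OWL reasoner can apply class definitions (e.g. a hypothetical "low CD4 count" class) to derive the domain-level inferences the abstract describes.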
- Mapping Master: A Flexible Approach for Mapping Spreadsheets to OWL
9th International Semantic Web Conference
SPRINGER-VERLAG BERLIN. 2010: 194–208
View details for Web of Science ID 000297605500013
- Software-engineering challenges of building and deploying reusable problem solvers
AI EDAM-ARTIFICIAL INTELLIGENCE FOR ENGINEERING DESIGN ANALYSIS AND MANUFACTURING
2009; 23 (4): 339-356
Abstract
Problem solving methods (PSMs) are software components that represent and encode reusable algorithms. They can be combined with representations of domain knowledge to produce intelligent application systems. A goal of research on PSMs is to provide principled methods and tools for composing and reusing algorithms in knowledge-based systems. The ultimate objective is to produce libraries of methods that can be easily adapted for use in these systems. Despite the intuitive appeal of PSMs as conceptual building blocks, in practice, these goals are largely unmet. There are no widely available tools for building applications using PSMs and no public libraries of PSMs available for reuse. This paper analyzes some of the reasons for the lack of widespread adoption of PSM techniques and illustrates our analysis by describing our experiences developing a complex, high-throughput software system based on PSM principles. We conclude that many fundamental principles in PSM research are useful for building knowledge-based systems. In particular, the task-method decomposition process, which provides a means for structuring knowledge-based tasks, is a powerful abstraction for building systems of analytic methods. However, despite the power of PSMs in the conceptual modeling of knowledge-based systems, software engineering challenges have been seriously underestimated. The complexity of integrating control knowledge modeled by developers using PSMs with the domain knowledge that they model using ontologies creates a barrier to widespread use of PSM-based systems. Nevertheless, the surge of recent interest in ontologies has led to the production of comprehensive domain ontologies and of robust ontology-authoring tools. These developments present new opportunities to leverage the PSM approach.
View details for DOI 10.1017/S0890060409990047
View details for Web of Science ID 000271131600003
View details for PubMedCentralID PMC3615443
View details for PubMedID 23565031
-
Knowledge-data integration for temporal reasoning in a clinical trial system.
International journal of medical informatics
2009; 78: S77-85
Abstract
Managing time-stamped data is essential to clinical research activities and often requires the use of considerable domain knowledge. Adequately representing and integrating temporal data and domain knowledge is difficult with the database technologies used in most clinical research systems. There is often a disconnect between the database representation of research data and corresponding domain knowledge of clinical research concepts. In this paper, we present a set of methodologies for undertaking ontology-based specification of temporal information, and discuss their application to the verification of protocol-specific temporal constraints among clinical trial activities. Our approach allows knowledge-level temporal constraints to be evaluated against operational trial data stored in relational databases. We show how the Semantic Web ontology and rule languages OWL and SWRL, respectively, can support tools for research data management that automatically integrate low-level representations of relational data with high-level domain concepts used in study design.
View details for DOI 10.1016/j.ijmedinf.2008.07.013
View details for PubMedID 18789876
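The paper above verifies protocol temporal constraints with OWL and SWRL; as a plain-Python analogue (hypothetical visit names, dates, and windows — not the Epoch model itself), a knowledge-level rule such as "the week-4 visit must occur 28 ± 3 days after enrollment" can be checked against operational trial data like this:

```python
from datetime import date

def check_visit_window(enrollment, visit, target_days, tolerance_days):
    """Return True if `visit` falls within target_days +/- tolerance_days
    of `enrollment` -- a knowledge-level temporal constraint evaluated
    against operational data."""
    offset = (visit - enrollment).days
    return abs(offset - target_days) <= tolerance_days

# Hypothetical trial data: enrollment date and the "week 4" follow-up visit.
enrollment = date(2009, 3, 2)
ok_visit = date(2009, 3, 31)      # day 29: inside the 28 +/- 3 window
late_visit = date(2009, 4, 10)    # day 39: violates the constraint
print(check_visit_window(enrollment, ok_visit, 28, 3))    # True
print(check_visit_window(enrollment, late_visit, 28, 3))  # False
```

In the published approach, the same rule lives in the knowledge base rather than in application code, so it can be authored by trialists and evaluated against relational data through the OWL/SWRL bridge.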
-
Exploration of SWRL Rule Bases through Visualization, Paraphrasing, and Categorization of Rules
SPRINGER-VERLAG BERLIN. 2009: 246-261
View details for Web of Science ID 000279121800020
-
Semantic reasoning with image annotations for tumor assessment.
AMIA ... Annual Symposium proceedings. AMIA Symposium
2009; 2009: 359-363
Abstract
Identifying, tracking and reasoning about tumor lesions is a central task in cancer research and clinical practice that could potentially be automated. However, information about tumor lesions in imaging studies is not easily accessed by machines for automated reasoning. The Annotation and Image Markup (AIM) information model recently developed for the cancer Biomedical Informatics Grid provides a method for encoding the semantic information related to imaging findings, enabling their storage and transfer. However, it is currently not possible to apply automated reasoning methods to image information encoded in AIM. We have developed a methodology and a suite of tools for transforming AIM image annotations into OWL, and an ontology for reasoning with the resulting image annotations for tumor lesion assessment. Our methods enable automated inference of semantic information about cancer lesions in images.
View details for PubMedID 20351880
-
TrialWiz: an ontology-driven tool for authoring clinical trial protocols.
AMIA ... Annual Symposium proceedings. AMIA Symposium
2008: 1226
Abstract
There has long been great interest in the clinical research community for automated support of clinical trials management. At the core of such efforts is formal specification of protocol knowledge. Building a clinical-trial knowledge base is a complex task involving software engineers and domain experts. As part of our Epoch ontological framework for clinical trials management, we have developed TrialWiz, an authoring tool for encoding a clinical-trial knowledge base. The main goals of TrialWiz are to manage the complexity of the protocol-encoding process and to improve efficiency in knowledge acquisition. TrialWiz provides intelligent guidance through the process of acquiring clinical-trial knowledge; graphical user interfaces intuitive to clinical trialists; a repository of reusable knowledge; and facilities to export the knowledge in different formats. We have successfully used TrialWiz to encode example clinical trials at the Immune Tolerance Network (ITN). In this presentation, we will demonstrate the intuitive authoring of clinical trial protocols using TrialWiz and how the protocol knowledge can be used by different clinical trial management applications at run time.
View details for PubMedID 18999161
-
Using an integrated ontology and information model for querying and reasoning about phenotypes: The case of autism.
AMIA ... Annual Symposium proceedings. AMIA Symposium
2008: 727-31
Abstract
The Open Biomedical Ontologies (OBO) Foundry is a coordinated community-wide effort to develop ontologies that support the annotation and integration of scientific data. In work supported by the National Database of Autism Research (NDAR), we are developing an ontology of autism that extends the ontologies available in the OBO Foundry. We undertook a systematic literature review to identify domain terms and relationships relevant to autism phenotypes. To enable user queries and inferences about such phenotypes using data in the NDAR repository, we augmented the domain ontology with an information model. In this paper, we show how our approach, using a combination of description logic and rule-based reasoning, enables high-level phenotypic abstractions to be inferred from subject-specific data. Our integrated domain ontology-information model approach allows scientific data repositories to be augmented with rule-based abstractions that facilitate the ability of researchers to undertake data analysis.
View details for PubMedID 18999231
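The rule-based inference of phenotypic abstractions described above can be illustrated with a plain-Python sketch (the attribute names and cutoffs below are hypothetical, not drawn from the NDAR model): a high-level phenotype is asserted when low-level subject scores satisfy a rule.

```python
def infer_phenotype(subject):
    """Assert the (hypothetical) high-level phenotype 'impaired social
    communication' when both low-level scores cross illustrative cutoffs.
    In the published approach this rule would live in an ontology/rule base
    rather than in application code."""
    if subject["social_score"] >= 7 and subject["communication_score"] >= 4:
        return "impaired_social_communication"
    return None

subjects = [
    {"id": "S1", "social_score": 9, "communication_score": 5},
    {"id": "S2", "social_score": 3, "communication_score": 2},
]
print([(s["id"], infer_phenotype(s)) for s in subjects])
# [('S1', 'impaired_social_communication'), ('S2', None)]
```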
-
Understanding Detection Performance in Public Health Surveillance: Modeling Aberrancy-detection Algorithms
JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION
2008; 15 (6): 760-769
Abstract
Statistical aberrancy-detection algorithms play a central role in automated public health systems, analyzing large volumes of clinical and administrative data in real-time with the goal of detecting disease outbreaks rapidly and accurately. Not all algorithms perform equally well in terms of sensitivity, specificity, and timeliness in detecting disease outbreaks, and the evidence describing the relative performance of different methods is fragmented and mainly qualitative. We developed and evaluated a unified model of aberrancy-detection algorithms and a software infrastructure that uses this model to conduct studies to evaluate detection performance. We used a task-analytic methodology to identify the common features and meaningful distinctions among different algorithms and to provide an extensible framework for gathering evidence about the relative performance of these algorithms using a number of evaluation metrics. We implemented our model as part of a modular software infrastructure (Biological Space-Time Outbreak Reasoning Module, or BioSTORM) that allows configuration, deployment, and evaluation of aberrancy-detection algorithms in a systematic manner. We assessed the ability of our model to encode the commonly used EARS algorithms and the ability of the BioSTORM software to reproduce an existing evaluation study of these algorithms. Using our unified model of aberrancy-detection algorithms, we successfully encoded the EARS algorithms, deployed these algorithms using BioSTORM, and were able to reproduce and extend previously published evaluation results. The validated model of aberrancy-detection algorithms and its software implementation will enable principled comparison of algorithms, synthesis of results from evaluation studies, and identification of surveillance algorithms for use in specific public health settings.
View details for DOI 10.1197/jamia.M2799
View details for Web of Science ID 000260905500008
View details for PubMedID 18755992
View details for PubMedCentralID PMC2585528
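The EARS C1 algorithm encoded in this study is publicly documented; as an illustration of the aberrancy-detection logic (a simplified sketch with hypothetical counts, not the BioSTORM implementation), C1 flags a day whose count exceeds the mean plus three standard deviations of a seven-day baseline:

```python
from statistics import mean, pstdev

def ears_c1(counts, baseline=7, threshold=3.0):
    """Simplified EARS C1 sketch: flag day indices whose count exceeds
    mean + threshold * std of the preceding `baseline` days."""
    alarms = []
    for day in range(baseline, len(counts)):
        window = counts[day - baseline:day]
        mu, sigma = mean(window), pstdev(window)
        # C1 test statistic: (observed - baseline mean) / baseline std
        if sigma > 0 and (counts[day] - mu) / sigma > threshold:
            alarms.append(day)
    return alarms

daily_counts = [2, 3, 2, 4, 3, 2, 3, 12]  # hypothetical syndromic counts
print(ears_c1(daily_counts))  # [7] -- the final day spikes above baseline
```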
-
An ontological approach to representing and reasoning with temporal constraints in clinical trial protocols
INSTICC-INST SYST TECHNOLOGIES INFORMATION CONTROL & COMMUNICATION. 2008: 87-93
View details for Web of Science ID 000256697800018
-
Representing and Reasoning with Temporal Constraints in Clinical Trials Using Semantic Technologies
SPRINGER-VERLAG BERLIN. 2008: 520+
View details for Web of Science ID 000262981100039
-
An ontology-driven method for hierarchical mining of temporal patterns: application to HIV drug resistance research.
AMIA ... Annual Symposium proceedings. AMIA Symposium
2007: 614-9
Abstract
Many biomedical research databases contain time-oriented data resulting from longitudinal, time-series and time-dependent study designs, knowledge of which is not handled explicitly by most data-analytic methods. To make use of such knowledge about research data, we have developed an ontology-driven temporal mining method, called ChronoMiner. Most mining algorithms require data be inputted in a single table. ChronoMiner, in contrast, can search for interesting temporal patterns among multiple input tables and at different levels of hierarchical representation. In this paper, we present the application of our method to the discovery of temporal associations between newly arising mutations in the HIV genome and past drug regimens. We discuss the various components of ChronoMiner, including its user interface, and provide results of a study indicating the efficiency and potential value of ChronoMiner on an existing HIV drug resistance data repository.
View details for PubMedID 18693909
View details for PubMedCentralID PMC2655843
-
An ontology-based architecture for integration of clinical trials management applications.
AMIA ... Annual Symposium proceedings. AMIA Symposium
2007: 661-5
Abstract
Management of complex clinical trials involves coordinated use of a myriad of software applications by trial personnel. The applications typically use distinct knowledge representations and generate an enormous amount of information during the course of a trial. It is therefore vital that the applications exchange trial semantics to enable efficient management of the trials and subsequent analysis of clinical trial data. Existing model-based frameworks do not address the requirements of semantic integration of heterogeneous applications. We have built an ontology-based architecture to support interoperation of clinical trial software applications. Central to our approach is a suite of clinical trial ontologies, which we call Epoch, that define the vocabulary and semantics necessary to represent information on clinical trials. We are continuing to demonstrate and validate our approach with different clinical trials management applications and with a growing number of clinical trials.
View details for PubMedID 18693919
View details for PubMedCentralID PMC2655871
-
Efficiently querying relational databases using OWL and SWRL
1st International Conference on Web Reasoning and Rule Systems
SPRINGER-VERLAG BERLIN. 2007: 361–363
View details for Web of Science ID 000247363900031
-
Knowledge-Level Querying of Temporal Patterns in Clinical Research Systems
12th World Congress on Health (Medical) Informatics
IOS PRESS. 2007: 311–315
Abstract
Managing time-stamped data is essential to clinical research activities and often requires the use of considerable domain knowledge. Adequately representing this domain knowledge is difficult in relational database systems. As a result, there is a need for principled methods to overcome the disconnect between the database representation of time-oriented research data and corresponding knowledge of domain-relevant concepts. In this paper, we present a set of methodologies for undertaking knowledge level querying of temporal patterns, and discuss its application to the verification of temporal constraints in clinical-trial applications. Our approach allows knowledge generated from query results to be tied to the data and, if necessary, used for further inference. We show how the Semantic Web ontology and rule languages, OWL and SWRL, respectively, can support the temporal knowledge model needed to integrate low-level representations of relational data with high-level domain concepts used in research data management. We present a scalable bridge-based software architecture that uses this knowledge model to enable dynamic querying of time-oriented research data.
View details for Web of Science ID 000272064000063
View details for PubMedID 17911729
-
Using semantic web technologies for knowledge-driven querying of biomedical data
11th Conference on Artificial Intelligence in Medicine (AIME 2007)
SPRINGER-VERLAG BERLIN. 2007: 267–276
View details for Web of Science ID 000248222900036
-
Querying the semantic web with SWRL
International Symposium on Rule Interchange and Applications
SPRINGER-VERLAG BERLIN. 2007: 155–159
View details for Web of Science ID 000252602800013
-
An ontology-driven mediator for querying time-oriented biomedical data
IEEE COMPUTER SOC. 2006: 264+
View details for DOI 10.1109/CBMS.2006.41
View details for Web of Science ID 000240724000044
-
Ontology-driven mapping of temporal data in biomedical databases.
AMIA ... Annual Symposium proceedings. AMIA Symposium
2006: 1045
Abstract
Biomedical databases contain considerable amounts of time-oriented data, which are not typically in a format suitable for querying complex temporal patterns. We address this problem in implementing Synchronus, a tool for ontology-driven mapping of data from an existing relational database to a database schema with a uniform temporal representation. We discuss the design of Synchronus, which consists of a schema-mapping ontology and a data-mapping algorithm that together provide general capabilities for database transformation.
View details for PubMedID 17238664
View details for PubMedCentralID PMC1839378
-
A knowledge-based system for managing complex clinical trials
IEEE COMPUTER SOC. 2006: 270+
View details for DOI 10.1109/CBMS.2006.107
View details for Web of Science ID 000240724000045
-
Towards semantic interoperability in a clinical trials management system
SPRINGER-VERLAG BERLIN. 2006: 901-912
View details for Web of Science ID 000243131100065
-
A dynamic distributed architecture for temporal data abstraction.
AMIA ... Annual Symposium proceedings. AMIA Symposium
2006: 880
Abstract
Considerable prior work has been undertaken by researchers to address the need for temporal data deduction in biomedical applications, but relatively little research has examined how to create robust, efficient approaches for such methods using large databases. We present the design and evaluation of a distributed architecture that can be dynamically optimized to perform large-scale abstraction of temporal data.
View details for PubMedID 17238500
-
Computational method for temporal pattern discovery in biomedical genomic databases
IEEE Computational Systems Bioinformatics Conference
IEEE COMPUTER SOC. 2005: 362–365
Abstract
With the rapid growth of biomedical research databases, opportunities for scientific inquiry have expanded quickly and led to a demand for computational methods that can extract biologically relevant patterns among vast amounts of data. A significant challenge is identifying temporal relationships among genotypic and clinical (phenotypic) data. Few software tools are available for such pattern matching, and they are not interoperable with existing databases. We are developing and validating a novel software method for temporal pattern discovery in biomedical genomics. In this paper, we present an efficient and flexible query algorithm (called TEMF) to extract statistical patterns from time-oriented relational databases. We show that TEMF - as an extension to our modular temporal querying application (Chronus II) - can express a wide range of complex temporal aggregations without the need for data processing in a statistical software package. We show the expressivity of TEMF using example queries from the Stanford HIV Database.
View details for Web of Science ID 000231800100040
View details for PubMedID 16447993
-
Translating research into practice: Organizational issues in implementing automated decision support for hypertension in three medical centers
JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION
2004; 11 (5): 368-376
Abstract
Information technology can support the implementation of clinical research findings in practice settings. Technology can address the quality gap in health care by providing automated decision support to clinicians that integrates guideline knowledge with electronic patient data to present real-time, patient-specific recommendations. However, technical success in implementing decision support systems may not translate directly into system use by clinicians. Successful technology integration into clinical work settings requires explicit attention to the organizational context. We describe the application of a "sociotechnical" approach to integration of ATHENA DSS, a decision support system for the treatment of hypertension, into geographically dispersed primary care clinics. We applied an iterative technical design in response to organizational input and obtained ongoing endorsements of the project by the organization's administrative and clinical leadership. Conscious attention to organizational context at the time of development, deployment, and maintenance of the system was associated with extensive clinician use of the system.
View details for Web of Science ID 000223898000005
View details for PubMedID 15187064
View details for PubMedCentralID PMC516243
-
BioSTORM: a system for automated surveillance of diverse data sources.
AMIA ... Annual Symposium proceedings. AMIA Symposium
2003: 1071
Abstract
Heightened concerns about bioterrorism are forcing changes to the traditional biosurveillance model. Public health departments are under pressure to follow multiple, non-specific, pre-diagnostic indicators, often drawn from many data sources. As a result, there is a need for biosurveillance systems that can rapidly integrate and process multiple diverse data feeds, using a variety of problem-solving techniques to give timely analysis. To meet these requirements, we are developing a new system called BioSTORM (Biological Spatio-Temporal Outbreak Reasoning Module).
View details for PubMedID 14728574
-
The Chronus II temporal database mediator
Annual Symposium of the American-Medical-Informatics-Association
HANLEY & BELFUS INC MED PUBLISHERS. 2002: 567–571
Abstract
Clinical databases typically contain a significant amount of temporal information. This information is often crucial in medical decision-support systems. Although temporal queries are common in clinical systems, the medical informatics field has no standard means for representing or querying temporal data. Over the past decade, the temporal database community has made a significant amount of progress in temporal systems. Much of this research can be applied to clinical database systems. This paper outlines a temporal database mediator called Chronus II. Chronus II extends the standard relational model and the SQL query language to support temporal queries. It provides an expressive general-purpose temporal query language that is tuned to the querying requirements of clinical decision support systems. This paper describes how we have used Chronus II to tackle a variety of clinical problems in decision support systems developed by our group.
View details for Web of Science ID 000189418100115
View details for PubMedID 12474882
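Chronus II itself extends SQL with temporal operators, but the core operation it supports — selecting records whose valid-time intervals overlap a query interval — can be sketched in plain Python (hypothetical clinical data; this is not Chronus II syntax):

```python
from datetime import date

def overlaps(start_a, end_a, start_b, end_b):
    """Interval-overlap test on half-open [start, end) intervals."""
    return start_a < end_b and start_b < end_a

# Hypothetical valid-time records: (drug, interval start, interval end).
regimens = [
    ("AZT", date(2001, 1, 1), date(2001, 6, 1)),
    ("3TC", date(2001, 5, 1), date(2002, 1, 1)),
    ("NVP", date(2002, 3, 1), date(2002, 9, 1)),
]

# "Which regimens were active during Q2 2001?" -- the kind of temporal
# selection a Chronus II query would express over a relational database.
q_start, q_end = date(2001, 4, 1), date(2001, 7, 1)
active = [drug for drug, s, e in regimens if overlaps(s, e, q_start, q_end)]
print(active)  # ['AZT', '3TC']
```

In the mediator architecture, this selection is pushed down to the underlying relational database rather than executed in application code, which is what makes the extended query language usable by decision-support systems.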
-
Knowledge-based bioterrorism surveillance
Annual Symposium of the American-Medical-Informatics-Association
HANLEY & BELFUS INC MED PUBLISHERS. 2002: 76–80
Abstract
An epidemic resulting from an act of bioterrorism could be catastrophic. However, if an epidemic can be detected and characterized early on, prompt public health intervention may mitigate its impact. Current surveillance approaches do not perform well in terms of rapid epidemic detection or epidemic monitoring. One reason for this shortcoming is their failure to bring existing knowledge and data to bear on the problem in a coherent manner. Knowledge-based methods can integrate surveillance data and knowledge, and allow for careful evaluation of problem-solving methods. This paper presents an argument for knowledge-based surveillance, describes a prototype of BioSTORM, a system for real-time epidemic surveillance, and shows an initial evaluation of this system applied to a simulated epidemic from a bioterrorism attack.
View details for Web of Science ID 000189418100016
View details for PubMedID 12463790