Bio


Dr. Musen is Professor of Biomedical Informatics at Stanford University, where he is Director of the Stanford Center for Biomedical Informatics Research. Dr. Musen conducts research related to open science, data stewardship, intelligent systems, and biomedical decision support. His group developed Protégé, the world’s most widely used technology for building and managing terminologies and ontologies. He is principal investigator of the National Center for Biomedical Ontology, one of the original National Centers for Biomedical Computing created by the U.S. National Institutes of Health (NIH). He is principal investigator of the Center for Expanded Data Annotation and Retrieval (CEDAR). CEDAR develops new technology to ease the authoring and management of biomedical experimental data and metadata to make online datasets more findable, accessible, interoperable, and reusable. Dr. Musen chaired the Health Informatics and Modeling Topic Advisory Group for the World Health Organization’s revision of the International Classification of Diseases (ICD-11), and he currently directs the WHO Collaborating Center for Classification, Terminology, and Standards at Stanford University.

Early in his career, Dr. Musen received the Young Investigator Award for Research in Medical Knowledge Systems from the American Association for Medical Systems and Informatics and a Young Investigator Award from the National Science Foundation. In 2006, he received the Donald A. B. Lindberg Award for Innovation in Informatics from the American Medical Informatics Association. He has been elected to the American College of Medical Informatics, the Association of American Physicians, the International Academy of Health Sciences Informatics, and the National Academy of Medicine. He is founding co-editor-in-chief of the journal Applied Ontology.

Administrative Appointments


  • Principal Investigator, RADx Data Hub (2021 - Present)
  • Principal Investigator, Center for Expanded Data Annotation and Retrieval (2014 - Present)
  • Principal Investigator, National Center for Biomedical Ontology (2005 - Present)
  • Co-Editor-in-Chief, Applied Ontology: An International Journal of Ontological Analysis and Conceptual Modeling (2005 - 2016)
  • Deputy Director for Bioinformatics, Immune Tolerance Network (2005 - 2007)
  • Head, Stanford Center for Biomedical Informatics Research (1992 - Present)

Honors & Awards


  • Doctor honoris causa, University of Fribourg, Switzerland (2023)
  • Doctor honoris causa, University of A Coruña, Spain (2021)
  • Elected member, International Academy of Health Sciences Informatics (2017)
  • Elected member, National Academy of Medicine (2016)
  • "Ten Year Award" for the most influential paper presented at ISWC, ten years previously, Semantic Web Science Association (2014)
  • General Chair, Association for Computing Machinery Conference on Knowledge Capture (K-Cap '11) (2011)
  • Elected member, Association of American Physicians (2010)
  • Donald A. B. Lindberg Award for Innovation in Informatics, American Medical Informatics Association (2006)
  • General Chair, International Semantic Web Conference (2005)
  • Chair, Scientific Program Committee, American Medical Informatics Association Annual Symposium (2003)
  • Elected member, American Society for Clinical Investigation (1997)
  • NSF Young Investigator Award, National Science Foundation (1992)
  • Elected Fellow, American College of Medical Informatics (1989)
  • Young Investigator Award for Research in Medical Knowledge Systems, American Association for Medical Systems and Informatics (1989)

Boards, Advisory Committees, Professional Organizations


  • Member, U.S. National Committee for CODATA (2020 - Present)
  • Member, National Advisory Council on Biomedical Imaging and Bioengineering (2011 - 2015)
  • Chair, Health Informatics and Modeling Topic Advisory Group, ICD Revision Steering Group, World Health Organization (2008 - 2016)

Program Affiliations


  • Symbolic Systems Program

Professional Education


  • Postdoctoral fellow, Erasmus University, Rotterdam, Medical Informatics (1988)
  • Ph.D., Stanford University, Medical Information Sciences (1988)
  • Residency, Stanford University Hospital, Internal Medicine (1983)
  • M.D., Brown University, Medicine (1980)
  • Sc.B., Brown University, Biology (1977)

Current Research and Scholarly Interests


Semantic technology, which makes explicit the knowledge that drives computational systems, offers great opportunities to advance biomedical science and clinical medicine. Our laboratory studies the use of semantic technology in a range of application systems, emphasizing approaches that enhance the stewardship and dissemination of experimental datasets for open science.

The Center for Expanded Data Annotation and Retrieval (CEDAR) investigates new computational approaches that use semantic technology to ease the creation of metadata to describe scientific experiments. Metadata are data about data—machine-actionable descriptions of experimental data, of the methods used to acquire the data, of the analyses that have been performed on the data, and of the provenance of all this information. Science suffers because much of the metadata that investigators create to annotate the datasets that they archive in public repositories are incomplete and nonstandardized. CEDAR studies new methods to assist the authoring of high-quality metadata. Our long-term goal is to aid the dissemination of scientific data and knowledge in machine-processable form, and to create intelligent agents that can help scientists to track experimental results online, to integrate datasets, and to use public data repositories to make new discoveries.

We see CEDAR as the first step in the development of a new kind of technology to change the way in which scientific knowledge is communicated—not as prose journal articles but as computer-interpretable data and metadata. Automated systems one day will access such online "publications" to augment the capabilities of human scientists as they seek information about relevant studies and as they attempt to relate their own results to those of other investigators.

Our laboratory has pioneered methods for the development of intelligent systems. An important element of this work has involved the use of ontologies—formal descriptions of application areas that are created in a form that can be processed by both people and computers. CEDAR uses ontologies to ensure that scientific metadata are represented in a standardized way. Our Protégé system for ontology development—now with more than 400,000 registered users—allows us to continue to explore new methods for ontology engineering and for the construction of intelligent systems and knowledge graphs.

All Publications


  • Modeling community standards for metadata as templates makes data FAIR. Scientific data Musen, M. A., O'Connor, M. J., Schultes, E., Martinez-Romero, M., Hardi, J., Graybeal, J. 2022; 9 (1): 696

    Abstract

    It is challenging to determine whether datasets are findable, accessible, interoperable, and reusable (FAIR) because the FAIR Guiding Principles refer to highly idiosyncratic criteria regarding the metadata used to annotate datasets. Specifically, the FAIR principles require metadata to be "rich" and to adhere to "domain-relevant" community standards. Scientific communities should be able to define their own machine-actionable templates for metadata that encode these "rich," discipline-specific elements. We have explored this template-based approach in the context of two software systems. One system is the CEDAR Workbench, which investigators use to author new metadata. The other is the FAIRware Workbench, which evaluates the metadata of archived datasets for their adherence to community standards. Benefits accrue when templates for metadata become central elements in an ecosystem of tools to manage online datasets, both because the templates serve as a community reference for what constitutes FAIR data, and because they embody that perspective in a form that can be distributed among a variety of software applications to assist with data stewardship and data sharing.

    View details for DOI 10.1038/s41597-022-01815-3

    View details for PubMedID 36371407

  • Demand standards to sort FAIR data from foul NATURE Musen, M. A. 2022; 609 (7926): 222

    View details for DOI 10.1038/d41586-022-02820-7

    View details for Web of Science ID 000850078900001

    View details for PubMedID 36064801

  • The variable quality of metadata about biological samples used in biomedical experiments. Scientific data Goncalves, R. S., Musen, M. A. 2019; 6: 190021

    Abstract

    We present an analytical study of the quality of metadata about samples used in biomedical experiments. The metadata under analysis are stored in two well-known databases: BioSample, a repository managed by the National Center for Biotechnology Information (NCBI), and BioSamples, a repository managed by the European Bioinformatics Institute (EBI). We tested whether 11.4M sample metadata records in the two repositories are populated with values that fulfill the stated requirements for such values. Our study revealed multiple anomalies in the metadata. Most metadata field names and their values are not standardized or controlled. Even simple binary or numeric fields are often populated with inadequate values of different data types. By clustering metadata field names, we discovered there are often many distinct ways to represent the same aspect of a sample. Overall, the metadata we analyzed reveal that there is a lack of principled mechanisms to enforce and validate metadata requirements. The significant aberrancies that we found in the metadata are likely to impede search and secondary use of the associated datasets.

    View details for PubMedID 30778255

  • Interpretation of biological experiments changes with evolution of the Gene Ontology and its annotations SCIENTIFIC REPORTS Tomczak, A., Mortensen, J. M., Winnenburg, R., Liu, C., Alessi, D. T., Swamy, V., Vallania, F., Lofgren, S., Haynes, W., Shah, N. H., Musen, M. A., Khatri, P. 2018; 8: 5115

    Abstract

    Gene Ontology (GO) enrichment analysis is ubiquitously used for interpreting high throughput molecular data and generating hypotheses about underlying biological phenomena of experiments. However, the two building blocks of this analysis - the ontology and the annotations - evolve rapidly. We used gene signatures derived from 104 disease analyses to systematically evaluate how enrichment analysis results were affected by evolution of the GO over a decade. We found low consistency between enrichment analyses results obtained with early and more recent GO versions. Furthermore, there continues to be a strong annotation bias in the GO annotations where 58% of the annotations are for 16% of the human genes. Our analysis suggests that GO evolution may have affected the interpretation and possibly reproducibility of experiments over time. Hence, researchers must exercise caution when interpreting GO enrichment analyses and should reexamine previous analyses with the most recent GO version.

    View details for PubMedID 29572502

  • The center for expanded data annotation and retrieval. Journal of the American Medical Informatics Association Musen, M. A., Bean, C. A., Cheung, K., Dumontier, M., Durante, K. A., Gevaert, O., Gonzalez-Beltran, A., Khatri, P., Kleinstein, S. H., O'Connor, M. J., Pouliot, Y., Rocca-Serra, P., Sansone, S., Wiser, J. A. 2015; 22 (6): 1148-1152

    Abstract

    The Center for Expanded Data Annotation and Retrieval is studying the creation of comprehensive and expressive metadata for biomedical datasets to facilitate data discovery, data interpretation, and data reuse. We take advantage of emerging community-based standard templates for describing different kinds of biomedical datasets, and we investigate the use of computational techniques to help investigators to assemble templates and to fill in their values. We are creating a repository of metadata from which we plan to identify metadata patterns that will drive predictive data entry when filling in metadata templates. The metadata repository not only will capture annotations specified when experimental datasets are initially created, but also will incorporate links to the published literature, including secondary analyses and possible refinements or retractions of experimental interpretations. By working initially with the Human Immunology Project Consortium and the developers of the ImmPort data repository, we are developing and evaluating an end-to-end solution to the problems of metadata authoring and management that will generalize to other data-management environments.

    View details for DOI 10.1093/jamia/ocv048

    View details for PubMedID 26112029

  • The Protégé Project: A Look Back and a Look Forward. AI matters Musen, M. A. 2015; 1 (4): 4-12

    View details for PubMedID 27239556

  • The National Center for Biomedical Ontology JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION Musen, M. A., Noy, N. F., Shah, N. H., Whetzel, P. L., Chute, C. G., Story, M., Smith, B. 2012; 19 (2): 190-195

    Abstract

    The National Center for Biomedical Ontology is now in its seventh year. The goals of this National Center for Biomedical Computing are to: create and maintain a repository of biomedical ontologies and terminologies; build tools and web services to enable the use of ontologies and terminologies in clinical and translational research; educate their trainees and the scientific community broadly about biomedical ontology and ontology-based technology and best practices; and collaborate with a variety of groups who develop and use ontologies and terminologies in biomedicine. The centerpiece of the National Center for Biomedical Ontology is a web-based resource known as BioPortal. BioPortal makes available for research in computationally useful forms more than 270 of the world's biomedical ontologies and terminologies, and supports a wide range of web services that enable investigators to use the ontologies to annotate and retrieve data, to generate value sets and special-purpose lexicons, and to perform advanced analytics on a wide range of biomedical data.

    View details for DOI 10.1136/amiajnl-2011-000523

    View details for Web of Science ID 000300768100010

    View details for PubMedID 22081220

    View details for PubMedCentralID PMC3277625

  • Author Correction: Advances and prospects for the Human BioMolecular Atlas Program (HuBMAP). Nature cell biology Jain, S., Pei, L., Spraggins, J. M., Angelo, M., Carson, J. P., Gehlenborg, N., Ginty, F., Goncalves, J. P., Hagood, J. S., Hickey, J. W., Kelleher, N. L., Laurent, L. C., Lin, S., Lin, Y., Liu, H., Naba, A., Nakayasu, E. S., Qian, W., Radtke, A., Robson, P., Stockwell, B. R., Van de Plas, R., Vlachos, I. S., Zhou, M., HuBMAP Consortium, Borner, K., Snyder, M. P., Ahn, K. J., Allen, J., Anderson, D. M., Anderton, C. R., Curcio, C., Angelin, A., Arvanitis, C., Atta, L., Awosika-Olumo, D., Bahmani, A., Bai, H., Balderrama, K., Balzano, L., Bandyopadhyay, G., Bandyopadhyay, S., Bar-Joseph, Z., Barnhart, K., Barwinska, D., Becich, M., Becker, L., Becker, W., Bedi, K., Bendall, S., Benninger, K., Betancur, D., Bettinger, K., Billings, S., Blood, P., Bolin, D., Border, S., Bosse, M., Bramer, L., Brewer, M., Brusko, M., Bueckle, A., Burke, K., Burnum-Johnson, K., Butcher, E., Butterworth, E., Cai, L., Calandrelli, R., Caldwell, M., Campbell-Thompson, M., Cao, D., Cao-Berg, I., Caprioli, R., Caraccio, C., Caron, A., Carroll, M., Chadwick, C., Chen, A., Chen, D., Chen, F., Chen, H., Chen, J., Chen, L., Chen, L., Chiacchia, K., Cho, S., Chou, P., Choy, L., Cisar, C., Clair, G., Clarke, L., Clouthier, K. A., Colley, M. E., Conlon, K., Conroy, J., Contrepois, K., Corbett, A., Corwin, A., Cotter, D., Courtois, E., Cruz, A., Csonka, C., Czupil, K., Daiya, V., Dale, K., Davanagere, S. A., Dayao, M., de Caestecker, M. P., Decker, A., Deems, S., Degnan, D., Desai, T., Deshpande, V., Deutsch, G., Devlin, M., Diep, D., Dodd, C., Donahue, S., Dong, W., Dos Santos Peixoto, R., Duffy, M., Dufresne, M., Duong, T. E., Dutra, J., Eadon, M. T., El-Achkar, T. M., Enninful, A., Eraslan, G., Eshelman, D., Espin-Perez, A., Esplin, E. D., Esselman, A., Falo, L. D., Falo, L., Fan, J., Fan, R., Farrow, M. 
A., Farzad, N., Favaro, P., Fermin, J., Filiz, F., Filus, S., Fisch, K., Fisher, E., Fisher, S., Flowers, K., Flynn, W. F., Fogo, A. B., Fu, D. A., Fulcher, J., Fung, A., Furst, D., Gallant, M., Gao, F., Gao, Y., Gaulton, K., Gaut, J. P., Gee, J., Ghag, R. R., Ghazanfar, S., Ghose, S., Gisch, D., Gold, I., Gondalia, A., Gorman, B., Greenleaf, W., Greenwald, N., Gregory, B., Guo, R., Gupta, R., Hakimian, H., Haltom, J., Halushka, M., Han, K. S., Hanson, C., Harbury, P., Hardi, J., Harlan, L., Harris, R. C., Hartman, A., Heidari, E., Helfer, J., Helminiak, D., Hemberg, M., Henning, N., Herr, B. W., Ho, J., Holden-Wiltse, J., Hong, S., Hong, Y., Honick, B., Hood, G., Hu, P., Hu, Q., Huang, M., Huyck, H., Imtiaz, T., Isberg, O. G., Itkin, M., Jackson, D., Jacobs, M., Jain, Y., Jewell, D., Jiang, L., Jiang, Z. G., Johnston, S., Joshi, P., Ju, Y., Judd, A., Kagel, A., Kahn, A., Kalavros, N., Kalhor, K., Karagkouni, D., Karathanos, T., Karunamurthy, A., Katari, S., Kates, H., Kaushal, M., Keener, N., Keller, M., Kenney, M., Kern, C., Kharchenko, P., Kim, J., Kingsford, C., Kirwan, J., Kiselev, V., Kishi, J., Kitata, R. B., Knoten, A., Kollar, C., Krishnamoorthy, P., Kruse, A. R., Da, K., Kundaje, A., Kutschera, E., Kwon, Y., Lake, B. B., Lancaster, S., Langlieb, J., Lardenoije, R., Laronda, M., Laskin, J., Lau, K., Lee, H., Lee, M., Lee, M., Strekalova, Y. L., Li, D., Li, J., Li, J., Li, X., Li, Z., Liao, Y., Liaw, T., Lin, P., Lin, Y., Lindsay, S., Liu, C., Liu, Y., Liu, Y., Lott, M., Lotz, M., Lowery, L., Lu, P., Lu, X., Lucarelli, N., Lun, X., Luo, Z., Ma, J., Macosko, E., Mahajan, M., Maier, L., Makowski, D., Malek, M., Manthey, D., Manz, T., Margulies, K., Marioni, J., Martindale, M., Mason, C., Mathews, C., Maye, P., McCallum, C., McDonough, E., McDonough, L., Mcdowell, H., Meads, M., Medina-Serpas, M., Ferreira, R. M., Messinger, J., Metis, K., Migas, L. 
G., Miller, B., Mimar, S., Minor, B., Misra, R., Missarova, A., Mistretta, C., Moens, R., Moerth, E., Moffitt, J., Molla, G., Monroe, M., Monte, E., Morgan, M., Muraro, D., Murphy, B. R., Murray, E., Musen, M. A., Naglah, A., Nasamran, C., Neelakantan, T., Nevins, S., Nguyen, H., Nguyen, N., Nguyen, T., Nguyen, T., Nigra, D., Nofal, M., Nolan, G., Nwanne, G., O'Connor, M., Okuda, K., Olmer, M., O'Neill, K., Otaluka, N., Pang, M., Parast, M., Pasa-Tolic, L., Paten, B., Patterson, N. H., Peng, T., Phillips, G., Pichavant, M., Piehowski, P., Pilner, H., Pingry, E., Pita-Juarez, Y., Plevritis, S., Ploumakis, A., Pouch, A., Pryhuber, G., Puerto, J., Qaurooni, D., Qin, L., Quardokus, E. M., Rajbhandari, P., Rakow-Penner, R., Ramasamy, R., Read, D., Record, E. G., Reeves, D., Ricarte, A., Rodriguez-Soto, A., Ropelewski, A., Rosario, J., Roselkis, M., Rowe, D., Roy, T. K., Ruffalo, M., Ruschman, N., Sabo, A., Sachdev, N., Saka, S., Salamon, D., Sarder, P., Sasaki, H., Satija, R., Saunders, D., Sawka, R., Schey, K., Schlehlein, H., Scholten, D., Schultz, S., Schwartz, L., Schwenk, M., Scibek, R., Segre, A., Serrata, M., Shands, W., Shen, X., Shendure, J., Shephard, H., Shi, L., Shi, T., Shin, D., Shirey, B., Sibilla, M., Silber, M., Silverstein, J., Simmel, D., Simmons, A., Singhal, D., Sivajothi, S., Smits, T., Soncin, F., Song, Q., Stanley, V., Stuart, T., Su, H., Su, P., Sun, X., Surrette, C., Swahn, H., Tan, K., Teichmann, S., Tejomay, A., Tellides, G., Thomas, K., Thomas, T., Thompson, M., Tian, H., Tideman, L., Trapnell, C., Tsai, A. G., Tsai, C., Tsai, L., Tsui, E., Tsui, T., Tung, J., Turner, M., Uranic, J., Vaishnav, E. D., Varra, S. R., Vaskivskyi, V., Velickovic, D., Velickovic, M., Verheyden, J., Waldrip, J., Wallace, D., Wan, X., Wang, A., Wang, F., Wang, M., Wang, S., Wang, X., Wasserfall, C., Wayne, L., Webber, J., Weber, G. 
M., Wei, B., Wei, J., Weimer, A., Welling, J., Wen, X., Wen, Z., Williams, M., Winfree, S., Winograd, N., Woodard, A., Wright, D., Wu, F., Wu, P., Wu, Q., Wu, X., Xing, Y., Xu, T., Yang, M., Yang, M., Yap, J., Ye, D. H., Yin, P., Yuan, Z., Yun, C. J., Zahraei, A., Zemaitis, K., Zhang, B., Zhang, C., Zhang, C., Zhang, C., Zhang, K., Zhang, S., Zhang, T., Zhang, Y., Zhao, B., Zhao, W., Zheng, J. W., Zhong, S., Zhu, B., Zhu, C., Zhu, D., Zhu, Q., Zhu, Y. 2024

    View details for DOI 10.1038/s41556-024-01384-0

    View details for PubMedID 38429479

  • Advances and prospects for the Human BioMolecular Atlas Program (HuBMAP). Nature cell biology Jain, S., Pei, L., Spraggins, J. M., Angelo, M., Carson, J. P., Gehlenborg, N., Ginty, F., Gonçalves, J. P., Hagood, J. S., Hickey, J. W., Kelleher, N. L., Laurent, L. C., Lin, S., Lin, Y., Liu, H., Naba, A., Nakayasu, E. S., Qian, W. J., Radtke, A., Robson, P., Stockwell, B. R., Van de Plas, R., Vlachos, I. S., Zhou, M., Börner, K., Snyder, M. P. 2023

    Abstract

    The Human BioMolecular Atlas Program (HuBMAP) aims to create a multi-scale spatial atlas of the healthy human body at single-cell resolution by applying advanced technologies and disseminating resources to the community. As the HuBMAP moves past its first phase, creating ontologies, protocols and pipelines, this Perspective introduces the production phase: the generation of reference spatial maps of functional tissue units across many organs from diverse populations and the creation of mapping tools and infrastructure to advance biomedical research.

    View details for DOI 10.1038/s41556-023-01194-w

    View details for PubMedID 37468756

    View details for PubMedCentralID 8238499

  • Specimen, biological structure, and spatial ontologies in support of a Human Reference Atlas. Scientific data Herr, B. W., Hardi, J., Quardokus, E. M., Bueckle, A., Chen, L., Wang, F., Caron, A. R., Osumi-Sutherland, D., Musen, M. A., Börner, K. 2023; 10 (1): 171

    Abstract

    The Human Reference Atlas (HRA) is defined as a comprehensive, three-dimensional (3D) atlas of all the cells in the healthy human body. It is compiled by an international team of experts who develop standard terminologies that they link to 3D reference objects, describing anatomical structures. The third HRA release (v1.2) covers spatial reference data and ontology annotations for 26 organs. Experts access the HRA annotations via spreadsheets and view reference object models in 3D editing tools. This paper introduces the Common Coordinate Framework (CCF) Ontology v2.0.1 that interlinks specimen, biological structure, and spatial data, together with the CCF API that makes the HRA programmatically accessible and interoperable with Linked Open Data (LOD). We detail how real-world user needs and experimental data guide CCF Ontology design and implementation, present CCF Ontology classes and properties together with exemplary usage, and report on validation methods. The CCF Ontology graph database and API are used in the HuBMAP portal, HRA Organ Gallery, and other applications that support data queries across multiple, heterogeneous sources.

    View details for DOI 10.1038/s41597-023-01993-8

    View details for PubMedID 36973309

    View details for PubMedCentralID PMC10043028

  • The scalable precision medicine open knowledge engine (SPOKE): A massive knowledge graph of biomedical information. Bioinformatics (Oxford, England) Morris, J. H., Soman, K., Akbas, R. E., Zhou, X., Smith, B., Meng, E. C., Huang, C. C., Cerono, G., Schenk, G., Rizk-Jackson, A., Harroud, A., Sanders, L., Costes, S. V., Bharat, K., Chakraborty, A., Pico, A. R., Mardirossian, T., Keiser, M., Tang, A., Hardi, J., Shi, Y., Musen, M., Israni, S., Huang, S., Rose, P. W., Nelson, C. A., Baranzini, S. E. 2023

    Abstract

    MOTIVATION: Knowledge graphs (KG) are being adopted in industry, commerce, and academia. Biomedical KG present a challenge due to the complexity, size, and heterogeneity of the underlying information. RESULTS: In this work we present the Scalable Precision Medicine Open Knowledge Engine (SPOKE), a biomedical KG connecting millions of concepts via semantically meaningful relationships. SPOKE contains 27 million nodes of 21 different types and 53 million edges of 55 types downloaded from 41 databases. The graph is built on the framework of 11 ontologies that maintain its structure, enable mappings, and facilitate navigation. SPOKE is built weekly by python scripts which download each resource, check for integrity and completeness, and then create a "parent table" of nodes and edges. Graph queries are translated by a REST API and users can submit searches directly via an API or a graphical user interface. Conclusions/Significance: SPOKE enables the integration of seemingly disparate information to support precision medicine efforts. AVAILABILITY: The SPOKE neighborhood explorer is available at https://spoke.rbvi.ucsf.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

    View details for DOI 10.1093/bioinformatics/btad080

    View details for PubMedID 36759942

  • Ontology Repositories and Semantic Artefact Catalogues with the OntoPortal Technology Jonquet, C., Graybeal, J., Bouazzouni, S., Dorf, M., Fiore, N., Kechagioglou, X., Redmond, T., Rosati, I., Skrenchuk, A., Vendetti, J. L., Musen, M., OntoPortal Alliance, Payne, T. R., Presutti, Qi, G., Poveda-Villalon, M., Stoilos, G., Hollink, L., Kaoudi, Z., Cheng, G., Li, J. SPRINGER INTERNATIONAL PUBLISHING AG. 2023: 38-58
  • A biomedical open knowledge network harnesses the power of AI to understand deep human biology. AI magazine Baranzini, S. E., Börner, K., Morris, J., Nelson, C. A., Soman, K., Schleimer, E., Keiser, M., Musen, M., Pearce, R., Reza, T., Smith, B., Herr, B. W., Oskotsky, B., Rizk-Jackson, A., Rankin, K. P., Sanders, S. J., Bove, R., Rose, P. W., Israni, S., Huang, S. 2022; 43 (1): 46-58

    Abstract

    Knowledge representation and reasoning (KR&R) has been successfully implemented in many fields to enable computers to solve complex problems with AI methods. However, its application to biomedicine has been lagging in part due to the daunting complexity of molecular and cellular pathways that govern human physiology and pathology. In this article we describe concrete uses of SPOKE, an open knowledge network that connects curated information from 37 specialized and human-curated databases into a single property graph, with 3 million nodes and 15 million edges to date. Applications discussed in this article include drug discovery, COVID-19 research and chronic disease diagnosis and management.

    View details for DOI 10.1002/aaai.12037

    View details for PubMedID 36093122

    View details for PubMedCentralID PMC9456356

  • An Architecture for Attesting to the Provenance of Ontologies Using Blockchain Technologies Curty, S., Fill, H., Goncalves, R. S., Musen, M. A., Shishkov, B. SPRINGER INTERNATIONAL PUBLISHING AG. 2022: 182-199
  • Using ethnographic methods to classify the human experience in medicine: a case study of the presence ontology. Journal of the American Medical Informatics Association : JAMIA Maitra, A., Kamdar, M. R., Zulman, D. M., Haverfield, M. C., Brown-Johnson, C., Schwartz, R., Israni, S. T., Verghese, A., Musen, M. A. 2021

    Abstract

    OBJECTIVE: Although social and environmental factors are central to provider-patient interactions, the data that reflect these factors can be incomplete, vague, and subjective. We sought to create a conceptual framework to describe and classify data about presence, the domain of interpersonal connection in medicine. METHODS: Our top-down approach for ontology development based on the concept of "relationality" included the following: 1) a broad survey of the social sciences literature and a systematic literature review of >20 000 articles around interpersonal connection in medicine, 2) relational ethnography of clinical encounters (n=5 pilot, 27 full), and 3) interviews about relational work with 40 medical and nonmedical professionals. We formalized the model using the Web Ontology Language in the Protégé ontology editor. We iteratively evaluated and refined the Presence Ontology through manual expert review and automated annotation of literature. RESULTS AND DISCUSSION: The Presence Ontology facilitates the naming and classification of concepts that would otherwise be vague. Our model categorizes contributors to healthcare encounters and factors such as communication, emotions, tools, and environment. Ontology evaluation indicated that cognitive models (both patients' explanatory models and providers' caregiving approaches) influenced encounters and were subsequently incorporated. We show how ethnographic methods based in relationality can aid the representation of experiential concepts (eg, empathy, trust). Our ontology could support investigative methods to improve healthcare processes for both patients and healthcare providers, including annotation of videotaped encounters, development of clinical instruments to measure presence, or implementation of electronic health record-based reminders for providers. CONCLUSION: The Presence Ontology provides a model for using ethnographic approaches to classify interpersonal data.

    View details for DOI 10.1093/jamia/ocab091

    View details for PubMedID 34151988

  • Design of a FAIR digital data health infrastructure in Africa for COVID-19 reporting and research. Advanced genetics (Hoboken, N.J.) van Reisen, M., Oladipo, F., Stokmans, M., Mpezamihgo, M., Folorunso, S., Schultes, E., Basajja, M., Aktau, A., Amare, S. Y., Taye, G. T., Purnama Jati, P. H., Chindoza, K., Wirtz, M., Ghardallou, M., van Stam, G., Ayele, W., Nalugala, R., Abdullahi, I., Osigwe, O., Graybeal, J., Medhanyie, A. A., Kawu, A. A., Liu, F., Wolstencroft, K., Flikkenschild, E., Lin, Y., Stocker, J., Musen, M. A. 2021; 2 (2): e10050

    Abstract

    The limited volume of COVID-19 data from Africa raises concerns for global genome research, which requires a diversity of genotypes for accurate disease prediction, including on the provenance of the new SARS-CoV-2 mutations. The Virus Outbreak Data Network (VODAN)-Africa studied the possibility of increasing the production of clinical data, finding concerns about data ownership, and the limited use of health data for quality treatment at point of care. To address this, VODAN Africa developed an architecture to record clinical health data and research data collected on the incidence of COVID-19, producing these as human- and machine-readable data objects in a distributed architecture of locally governed, linked, human- and machine-readable data. This architecture supports analytics at the point of care and, through data visiting across facilities, for generic analytics. An algorithm was run across FAIR Data Points to visit the distributed data and produce aggregate findings. The FAIR data architecture is deployed in Uganda, Ethiopia, Liberia, Nigeria, Kenya, Somalia, Tanzania, Zimbabwe, and Tunisia.

    View details for DOI 10.1002/ggn2.10050

    View details for PubMedID 34514430

  • An empirical meta-analysis of the life sciences linked open data on the web. Scientific data Kamdar, M. R., Musen, M. A. 2021; 8 (1): 24

    Abstract

    While the biomedical community has published several "open data" sources in the last decade, most researchers still endure severe logistical and technical challenges to discover, query, and integrate heterogeneous data and knowledge from multiple sources. To tackle these challenges, the community has experimented with Semantic Web and linked data technologies to create the Life Sciences Linked Open Data (LSLOD) cloud. In this paper, we extract schemas from more than 80 biomedical linked open data sources into an LSLOD schema graph and conduct an empirical meta-analysis to evaluate the extent of semantic heterogeneity across the LSLOD cloud. We observe that several LSLOD sources exist as stand-alone data sources that are not inter-linked with other sources, use unpublished schemas with minimal reuse or mappings, and have elements that are not useful for data integration from a biomedical perspective. We envision that the LSLOD schema graph and the findings from this research will aid researchers who wish to query and integrate data and knowledge from multiple biomedical sources simultaneously on the Web.

    View details for DOI 10.1038/s41597-021-00797-y

    View details for PubMedID 33479214

  • FAIR Convergence Matrix: Optimizing the Reuse of Existing FAIR-Related Resources DATA INTELLIGENCE Sustkova, H., Hettne, K., Wittenburg, P., Jacobsen, A., Kuhn, T., Pergl, R., Slifka, J., McQuilton, P., Magagna, B., Sansone, S., Stocker, M., Imming, M., Lannom, L., Musen, M., Schultes, E. 2020; 2 (1-2): 158-170
  • OrderRex clinical user testing: a randomized trial of recommender system decision support on simulated cases. Journal of the American Medical Informatics Association : JAMIA Kumar, A., Aikens, R. C., Hom, J., Shieh, L., Chiang, J., Morales, D., Saini, D., Musen, M., Baiocchi, M., Altman, R., Goldstein, M. K., Asch, S., Chen, J. H. 2020

    Abstract

    OBJECTIVE: To assess usability and usefulness of a machine learning-based order recommender system applied to simulated clinical cases. MATERIALS AND METHODS: 43 physicians entered orders for 5 simulated clinical cases using a clinical order entry interface with or without access to a previously developed automated order recommender system. Cases were randomly allocated to the recommender system in a 3:2 ratio. A panel of clinicians scored whether the orders placed were clinically appropriate. Our primary outcome included the difference in clinical appropriateness scores. Secondary outcomes included total number of orders, case time, and survey responses. RESULTS: Clinical appropriateness scores per order were comparable for cases randomized to the order recommender system (mean difference -0.11 order per score, 95% CI: [-0.41, 0.20]). Physicians using the recommender placed more orders (median 16 vs 15 orders, incidence rate ratio 1.09, 95% CI: [1.01-1.17]). Case times were comparable with the recommender system. Order suggestions generated from the recommender system were more likely to match physician needs than standard manual search options. Physicians used recommender suggestions in 98% of available cases. Approximately 95% of participants agreed the system would be useful for their workflows. DISCUSSION: User testing with a simulated electronic medical record interface can assess the value of machine learning and clinical decision support tools for clinician usability and acceptance before live deployments. CONCLUSIONS: Clinicians can use and accept machine learned clinical order recommendations integrated into an electronic order entry interface in a simulated setting. The clinical appropriateness of orders entered was comparable even when supported by automated recommendations.

    View details for DOI 10.1093/jamia/ocaa190

    View details for PubMedID 33106874

  • Toward a Harmonized WHO Family of International Classifications Content Model. Studies in health technology and informatics Tu, S. W., Nyulas, C. I., Tudorache, T., Musen, M. A., Martinuzzi, A., van Gool, C., Mea, V. D., Chute, C. G., Frattura, L., Hardiker, N., Napel, H. T., Madden, R., Almborg, A., Ginige, J. A., Sykes, C., Cekik, C., Jakob, R. 2020; 270: 1409–10

    Abstract

    An overarching WHO-FIC Content Model will allow uniform modeling of classifications in the WHO Family of International Classifications (WHO-FIC) and promote their joint use. We provide an initial conceptualization of such a model.

    View details for DOI 10.3233/SHTI200466

    View details for PubMedID 32570683

  • A more decentralized vision for Linked Data SEMANTIC WEB Polleres, A., Kamdar, M., Fernandez, J., Tudorache, T., Musen, M. 2020; 11 (1): 101–13

    View details for DOI 10.3233/SW-190380

    View details for Web of Science ID 000512298900012

  • Obstacles to the reuse of study metadata in ClinicalTrials.gov. Scientific data Miron, L., Gonçalves, R. S., Musen, M. A. 2020; 7 (1): 443

    Abstract

    Metadata that are structured using principled schemas and that use terms from ontologies are essential to making biomedical data findable and reusable for downstream analyses. The largest source of metadata that describes the experimental protocol, funding, and scientific leadership of clinical studies is ClinicalTrials.gov. We evaluated whether values in 302,091 trial records adhere to expected data types and use terms from biomedical ontologies, whether records contain fields required by government regulations, and whether structured elements could replace free-text elements. Contact information, outcome measures, and study design are frequently missing or underspecified. Important fields for search, such as condition and intervention, are not restricted to ontologies, and almost half of the conditions are not denoted by MeSH terms, as recommended. Eligibility criteria are stored as semi-structured free text. Enforcing the presence of all required elements, requiring values for certain fields to be drawn from ontologies, and creating a structured eligibility criteria element would improve the reusability of data from ClinicalTrials.gov in systematic reviews, meta-analyses, and matching of eligible patients to trials.

    View details for DOI 10.1038/s41597-020-00780-z

    View details for PubMedID 33339830

  • Physician Usage and Acceptance of a Machine Learning Recommender System for Simulated Clinical Order Entry. AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science Chiang, J., Kumar, A., Morales, D., Saini, D., Hom, J., Shieh, L., Musen, M., Goldstein, M. K., Chen, J. H. 2020; 2020: 89–97

    Abstract

    Clinical decision support tools that automatically disseminate patterns of clinical orders have the potential to improve patient care by reducing errors of omission and streamlining physician workflows. However, it is unknown if physicians will accept such tools or how their behavior will be affected. In this randomized controlled study, we exposed 34 licensed physicians to a clinical order entry interface and five simulated emergency cases, with randomized availability of a previously developed clinical order recommender system. With the recommender available, physicians spent similar time per case (6.7 minutes), but placed more total orders (17.1 vs. 15.8). The recommender demonstrated superior recall (59% vs 41%) and precision (25% vs 17%) compared to manual search results, and was positively received by physicians recognizing workflow benefits. Further studies must assess the potential clinical impact towards a future where electronic health records automatically anticipate clinical needs.

    View details for PubMedID 32477627
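
    Recall and precision, as reported in the abstract above, compare a list of suggested orders against the orders a physician actually needs. A minimal sketch of the two measures follows; the order names are invented for illustration, and the function is not the study's evaluation code.

```python
def precision_recall(suggested, needed):
    """Return (precision, recall) of a suggestion list against the needed items."""
    suggested, needed = set(suggested), set(needed)
    hits = len(suggested & needed)
    # Precision: share of suggestions that were actually needed.
    precision = hits / len(suggested) if suggested else 0.0
    # Recall: share of needed items that appeared among the suggestions.
    recall = hits / len(needed) if needed else 0.0
    return precision, recall


# Hypothetical example: four suggested orders, three orders actually needed.
suggested = ["cbc", "troponin", "ecg", "chest_xray"]
needed = ["troponin", "ecg", "aspirin"]
precision, recall = precision_recall(suggested, needed)
print(f"precision={precision:.2f} recall={recall:.2f}")
```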

  • WebProtege: A Cloud-Based Ontology Editor Horridge, M., Goncalves, R. S., Nyulas, C. I., Tudorache, T., Musen, M. A. ASSOC COMPUTING MACHINERY. 2019: 686–89
  • Use of OWL and Semantic Web Technologies at Pinterest Goncalves, R. S., Horridge, M., Li, R., Liu, Y., Musen, M. A., Nyulas, C. I., Obamos, E., Shrouty, D., Temple, D., Ghidini, C., Hartig, O., Maleshkova, M., Svatek, Cruz, Hogan, A., Song, J., Lefrancois, M., Gandon, F. SPRINGER INTERNATIONAL PUBLISHING AG. 2019: 418–35
  • Enabling Web-scale data integration in biomedicine through Linked Open Data. NPJ digital medicine Kamdar, M. R., Fernandez, J. D., Polleres, A., Tudorache, T., Musen, M. A. 2019; 2: 90

    Abstract

    The biomedical data landscape is fragmented with several isolated, heterogeneous data and knowledge sources, which use varying formats, syntaxes, schemas, and entity notations, existing on the Web. Biomedical researchers face severe logistical and technical challenges to query, integrate, analyze, and visualize data from multiple diverse sources in the context of available biomedical knowledge. Semantic Web technologies and Linked Data principles may aid toward Web-scale semantic processing and data integration in biomedicine. The biomedical research community has been one of the earliest adopters of these technologies and principles to publish data and knowledge on the Web as linked graphs and ontologies, hence creating the Life Sciences Linked Open Data (LSLOD) cloud. In this paper, we provide our perspective on some opportunities proffered by the use of LSLOD to integrate biomedical data and knowledge in three domains: (1) pharmacology, (2) cancer research, and (3) infectious diseases. We will discuss some of the major challenges that hinder the wide-spread use and consumption of LSLOD by the biomedical research community. Finally, we provide a few technical solutions and insights that can address these challenges. Eventually, LSLOD can enable the development of scalable, intelligent infrastructures that support artificial intelligence methods for augmenting human intelligence to achieve better clinical outcomes for patients, to enhance the quality of biomedical research, and to improve our understanding of living systems.

    View details for DOI 10.1038/s41746-019-0162-5

    View details for PubMedID 31531395

  • Unleashing the value of Common Data Elements through the CEDAR Workbench. AMIA ... Annual Symposium proceedings. AMIA Symposium O'Connor, M. J., Warzel, D. B., Martínez-Romero, M., Hardi, J., Willrett, D., Egyedi, A. L., Eftekhari, A., Graybeal, J., Musen, M. A. 2019; 2019: 681–90

    Abstract

    Developing promising treatments in biomedicine often requires aggregation and analysis of data from disparate sources across the healthcare and research spectrum. To facilitate these approaches, there is a growing focus on supporting interoperation of datasets by standardizing data-capture and reporting requirements. Common Data Elements (CDEs), precise specifications of questions and the set of allowable answers to each question, are increasingly being adopted to help meet these standardization goals. While CDEs can provide a strong conceptual foundation for interoperation, there are no widely recognized serialization or interchange formats to describe and exchange their definitions. As a result, CDEs defined in one system cannot easily be reused by other systems. An additional problem is that current CDE-based systems tend to be rather heavyweight and cannot be easily adopted and used by third parties. To address these problems, we developed extensions to a metadata management system called the CEDAR Workbench to provide a platform to simplify the creation, exchange, and use of CDEs. We show how the resulting system allows users to quickly define and share CDEs and to immediately use these CDEs to build and deploy Web-based forms to acquire conforming metadata. We also show how we incorporated a large CDE library from the National Cancer Institute's caDSR system and made these CDEs publicly available for general use.

    View details for PubMedID 32308863

  • HopRank: How Semantic Structure Influences Teleportation in PageRank (A Case Study on BioPortal) Espin-Noboa, L., Lemmerich, F., Walk, S., Strohmaier, M., Musen, M. ASSOC COMPUTING MACHINERY. 2019: 2708–14
  • Using association rule mining and ontologies to generate metadata recommendations from multiple biomedical databases. Database : the journal of biological databases and curation Martinez-Romero, M., O'Connor, M. J., Egyedi, A. L., Willrett, D., Hardi, J., Graybeal, J., Musen, M. A. 2019; 2019

    Abstract

    Metadata, the machine-readable descriptions of the data, are increasingly seen as crucial for describing the vast array of biomedical datasets that are currently being deposited in public repositories. While most public repositories have firm requirements that metadata must accompany submitted datasets, the quality of those metadata is generally very poor. A key problem is that the typical metadata acquisition process is onerous and time consuming, with little interactive guidance or assistance provided to users. Secondary problems include the lack of validation and sparse use of standardized terms or ontologies when authoring metadata. There is a pressing need for improvements to the metadata acquisition process that will help users to enter metadata quickly and accurately. In this paper, we outline a recommendation system for metadata that aims to address this challenge. Our approach uses association rule mining to uncover hidden associations among metadata values and to represent them in the form of association rules. These rules are then used to present users with real-time recommendations when authoring metadata. The novelties of our method are that it is able to combine analyses of metadata from multiple repositories when generating recommendations and can enhance those recommendations by aligning them with ontology terms. We implemented our approach as a service integrated into the CEDAR Workbench metadata authoring platform, and evaluated it using metadata from two public biomedical repositories: US-based National Center for Biotechnology Information BioSample and European Bioinformatics Institute BioSamples. The results show that our approach is able to use analyses of previously entered metadata coupled with ontology-based mappings to present users with accurate recommendations when authoring metadata.

    View details for DOI 10.1093/database/baz059

    View details for PubMedID 31210270
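
    The idea of mining association rules from previously entered metadata and using them to suggest values can be sketched as follows. This is a simplified illustration with single-antecedent rules, invented records, and hypothetical function names and thresholds; it is not the CEDAR service and omits the ontology alignment step described in the abstract.

```python
from collections import Counter
from itertools import permutations

# Hypothetical metadata records: each is a set of (field, value) pairs.
RECORDS = [
    {("organism", "Homo sapiens"), ("tissue", "liver"), ("sex", "male")},
    {("organism", "Homo sapiens"), ("tissue", "liver"), ("sex", "female")},
    {("organism", "Homo sapiens"), ("tissue", "blood"), ("sex", "female")},
    {("organism", "Mus musculus"), ("tissue", "brain"), ("sex", "male")},
]


def mine_rules(records, min_support=0.25, min_confidence=0.5):
    """Mine rules a -> b (one item each) with their support and confidence."""
    n = len(records)
    item_counts = Counter(item for rec in records for item in rec)
    pair_counts = Counter(
        pair for rec in records for pair in permutations(sorted(rec), 2)
    )
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n                 # how often a and b occur together
        confidence = count / item_counts[a]  # how often b occurs when a does
        if support >= min_support and confidence >= min_confidence:
            rules[(a, b)] = (support, confidence)
    return rules


def recommend(rules, entered, target_field):
    """Rank candidate values for target_field by the best matching rule confidence."""
    scores = {}
    for (a, b), (_, confidence) in rules.items():
        if a in entered and b[0] == target_field:
            scores[b[1]] = max(scores.get(b[1], 0.0), confidence)
    return sorted(scores, key=lambda value: (-scores[value], value))


rules = mine_rules(RECORDS)
suggestions = recommend(rules, {("tissue", "liver")}, "organism")
print(suggestions)
```

    In this toy data, every record with tissue "liver" also has organism "Homo sapiens", so that value is the only suggestion for the organism field.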

  • The CAIRR Pipeline for Submitting Standards-Compliant B and T Cell Receptor Repertoire Sequencing Studies to the National Center for Biotechnology Information Repositories. Frontiers in immunology Bukhari, S. A., O'Connor, M. J., Martínez-Romero, M., Egyedi, A. L., Willrett, D., Graybeal, J., Musen, M. A., Rubelt, F., Cheung, K. H., Kleinstein, S. H. 2018; 9: 1877

    Abstract

    The adaptation of high-throughput sequencing to the B cell receptor and T cell receptor has made it possible to characterize the adaptive immune receptor repertoire (AIRR) at unprecedented depth. These AIRR sequencing (AIRR-seq) studies offer tremendous potential to increase the understanding of adaptive immune responses in vaccinology, infectious disease, autoimmunity, and cancer. The increasingly wide application of AIRR-seq is leading to a critical mass of studies being deposited in the public domain, offering the possibility of novel scientific insights through secondary analyses and meta-analyses. However, effective sharing of these large-scale data remains a challenge. The AIRR community has proposed minimal information about adaptive immune receptor repertoire (MiAIRR), a standard for reporting AIRR-seq studies. The MiAIRR standard has been operationalized using the National Center for Biotechnology Information (NCBI) repositories. Submissions of AIRR-seq data to the NCBI repositories typically use a combination of web-based and flat-file templates and include only a minimal amount of terminology validation. As a result, AIRR-seq studies at the NCBI are often described using inconsistent terminologies, limiting scientists' ability to access, find, interoperate, and reuse the data sets. In order to improve metadata quality and ease submission of AIRR-seq studies to the NCBI, we have leveraged the software framework developed by the Center for Expanded Data Annotation and Retrieval (CEDAR), which develops technologies involving the use of data standards and ontologies to improve metadata quality. The resulting CEDAR-AIRR (CAIRR) pipeline enables data submitters to: (i) create web-based templates whose entries are controlled by ontology terms, (ii) generate and validate metadata, and (iii) submit the ontology-linked metadata and sequence files (FASTQ) to the NCBI BioProject, BioSample, and Sequence Read Archive databases. Overall, CAIRR provides a web-based metadata submission interface that supports compliance with the MiAIRR standard. This pipeline is available at http://cairr.miairr.org, and will facilitate the NCBI submission process and improve the metadata quality of AIRR-seq studies.

    View details for DOI 10.3389/fimmu.2018.01877

    View details for PubMedID 30166985

    View details for PubMedCentralID PMC6105692

  • CEDAR OnDemand: a browser extension to generate ontology-based scientific metadata BMC BIOINFORMATICS Bukhari, S., Martinez-Romero, M., Connor, M., Egyedi, A. L., Willrett, D., Graybeal, J., Musen, M. A., Cheung, K., Kleinstein, S. H. 2018; 19: 268

    Abstract

    Public biomedical data repositories often provide web-based interfaces to collect experimental metadata. However, these interfaces typically reflect the ad hoc metadata specification practices of the associated repositories, leading to a lack of standardization in the collected metadata. This lack of standardization limits the ability of the source datasets to be broadly discovered, reused, and integrated with other datasets. To increase reuse, discoverability, and reproducibility of the described experiments, datasets should be appropriately annotated by using agreed-upon terms, ideally from ontologies or other controlled term sources. This work presents "CEDAR OnDemand", a browser extension powered by the NCBO (National Center for Biomedical Ontology) BioPortal that enables users to seamlessly enter ontology-based metadata through existing web forms native to individual repositories. CEDAR OnDemand analyzes the web page contents to identify the text input fields and associate them with relevant ontologies which are recommended automatically based upon input fields' labels (using the NCBO ontology recommender) and a pre-defined list of ontologies. These field-specific ontologies are used for controlling metadata entry. CEDAR OnDemand works for any web form designed in the HTML format. We demonstrate how CEDAR OnDemand works through the NCBI (National Center for Biotechnology Information) BioSample web-based metadata entry. CEDAR OnDemand helps lower the barrier of incorporating ontologies into standardized metadata entry for public data repositories. CEDAR OnDemand is available freely on the Google Chrome store https://chrome.google.com/webstore/search/CEDAROnDemand.

    View details for PubMedID 30012108

  • Analyzing user interactions with biomedical ontologies: A visual perspective JOURNAL OF WEB SEMANTICS Kamdar, M. R., Walk, S., Tudorache, T., Musen, M. A. 2018; 49: 16–30

    Abstract

    Biomedical ontologies are large: Several ontologies in the BioPortal repository contain thousands or even hundreds of thousands of entities. The development and maintenance of such large ontologies is difficult. To support ontology authors and repository developers in their work, it is crucial to improve our understanding of how these ontologies are explored, queried, reused, and used in downstream applications by biomedical researchers. We present an exploratory empirical analysis of user activities in the BioPortal ontology repository by analyzing BioPortal interaction logs across different access modes over several years. We investigate how users of BioPortal query and search for ontologies and their classes, how they explore the ontologies, and how they reuse classes from different ontologies. Additionally, through three real-world scenarios, we not only analyze the usage of ontologies for annotation tasks but also compare it to the browsing and querying behaviors of BioPortal users. For our investigation, we use several different visualization techniques. To inspect large amounts of interaction, reuse, and real-world usage data at a glance, we make use of and extend PolygOnto, a visualization method that has been successfully used to analyze reuse of ontologies in previous work. Our results show that exploration, query, reuse, and actual usage behaviors rarely align, suggesting that different users tend to explore, query and use different parts of an ontology. Finally, we highlight and discuss differences and commonalities among users of BioPortal.

    View details for PubMedID 29657560

  • AgroPortal: A vocabulary and ontology repository for agronomy COMPUTERS AND ELECTRONICS IN AGRICULTURE Jonquet, C., Toulet, A., Arnaud, E., Aubin, S., Yeumo, E., Emonet, V., Graybeal, J., Laporte, M., Musen, M. A., Pesce, V., Larmande, P. 2018; 144: 126–43
  • How Sustainable are Biomedical Ontologies? AMIA ... Annual Symposium proceedings. AMIA Symposium Geller, J., Keloth, V. K., Musen, M. A. 2018; 2018: 470–79

    Abstract

    BioPortal is widely regarded to be the world's most comprehensive repository of biomedical ontologies. With a coverage of many biomedical subfields by 716 ontologies (June 27, 2018), BioPortal is an extremely diverse repository. BioPortal maintains easily accessible information about the ontologies submitted by ontology curators. This includes size (concepts/classes, relationships/properties), number of projects, update history, and access history. Ontologies vary by size (from a few concepts to hundreds of thousands), by frequency of update/visit and by number of projects. Interestingly, some ontologies are rarely updated even though they contain thousands of concepts. In an informal email inquiry, we attempted to understand the reasons why ontologies that were built with a major investment of effort are apparently not sustained. Our analysis indicates that lack of funding, unavailability of human resources, and folding of ontologies into other ontologies are the most common among several other factors for discontinued maintenance of these ontologies.

    View details for PubMedID 30815087

  • BiOnIC: A Catalog of User Interactions with Biomedical Ontologies. The semantic Web--ISWC ... : ... International Semantic Web Conference ... proceedings. International Semantic Web Conference Kamdar, M. R., Walk, S., Tudorache, T., Musen, M. A. 2017; 10588: 130-138

    Abstract

    BiOnIC is a catalog of aggregated statistics of user clicks, queries, and reuse counts for access to over 200 biomedical ontologies. BiOnIC also provides anonymized sequences of classes accessed by users over a period of four years. To generate the statistics, we processed the access logs of BioPortal, a large open biomedical ontology repository. We publish the BiOnIC data using DCAT and SKOS metadata standards. The BiOnIC catalog has a wide range of applicability, which we demonstrate through its use in three different types of applications. To our knowledge, this type of interaction data stemming from a real-world, large-scale application has not been published before. We expect that the catalog will become an important resource for researchers and developers in the Semantic Web community by providing novel insights into how ontologies are explored, queried and reused. The BiOnIC catalog may ultimately assist in the more informed development of intelligent user interfaces for semantic resources through interface customization, prediction of user browsing and querying behavior, and ontology summarization. The BiOnIC catalog is available at: http://onto-apps.stanford.edu/bionic.

    View details for DOI 10.1007/978-3-319-68204-4_13

    View details for PubMedID 29637199

    View details for PubMedCentralID PMC5889949

  • The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata that Describe Scientific Experiments. The semantic Web--ISWC ... : ... International Semantic Web Conference ... proceedings. International Semantic Web Conference Gonçalves, R. S., O'Connor, M. J., Martínez-Romero, M., Egyedi, A. L., Willrett, D., Graybeal, J., Musen, M. A. 2017; 10588: 103-110

    Abstract

    The Center for Expanded Data Annotation and Retrieval (CEDAR) aims to revolutionize the way that metadata describing scientific experiments are authored. The software we have developed, the CEDAR Workbench, is a suite of Web-based tools and REST APIs that allows users to construct metadata templates, to fill in templates to generate high-quality metadata, and to share and manage these resources. The CEDAR Workbench provides a versatile, REST-based environment for authoring metadata that are enriched with terms from ontologies. The metadata are available as JSON, JSON-LD, or RDF for easy integration in scientific applications and reusability on the Web. Users can leverage our APIs for validating and submitting metadata to external repositories. The CEDAR Workbench is freely available and open-source.

    View details for DOI 10.1007/978-3-319-68204-4_10

    View details for PubMedID 32219223

    View details for PubMedCentralID PMC7098808

  • An ontology-driven tool for structured data acquisition using Web forms JOURNAL OF BIOMEDICAL SEMANTICS Goncalves, R. S., Tu, S. W., Nyulas, C. I., Tierney, M. J., Musen, M. A. 2017; 8: 26

    Abstract

    Structured data acquisition is a common task that is widely performed in biomedicine. However, current solutions for this task are far from providing a means to structure data in such a way that it can be automatically employed in decision making (e.g., in our example application domain of clinical functional assessment, for determining eligibility for disability benefits) based on conclusions derived from acquired data (e.g., assessment of impaired motor function). To use data in these settings, we need it structured in a way that can be exploited by automated reasoning systems, for instance, in the Web Ontology Language (OWL), the de facto ontology language for the Web. We tackle the problem of generating Web-based assessment forms from OWL ontologies, and aggregating input gathered through these forms as an ontology of "semantically-enriched" form data that can be queried using an RDF query language, such as SPARQL. We developed an ontology-based structured data acquisition system, which we present through its specific application to the clinical functional assessment domain. We found that data gathered through our system is highly amenable to automatic analysis using queries. We demonstrated how ontologies can be used to help structure Web-based forms and to semantically enrich the data elements of the acquired structured data. The ontologies associated with the enriched data elements enable automated inferences and provide a rich vocabulary for performing queries.

    View details for PubMedID 28764813

  • NCBO Ontology Recommender 2.0: an enhanced approach for biomedical ontology recommendation. Journal of biomedical semantics Martínez-Romero, M., Jonquet, C., O'Connor, M. J., Graybeal, J., Pazos, A., Musen, M. A. 2017; 8 (1): 21-?

    Abstract

    Ontologies and controlled terminologies have become increasingly important in biomedical research. Researchers use ontologies to annotate their data with ontology terms, enabling better data integration and interoperability across disparate datasets. However, the number, variety and complexity of current biomedical ontologies make it cumbersome for researchers to determine which ones to reuse for their specific needs. To overcome this problem, in 2010 the National Center for Biomedical Ontology (NCBO) released the Ontology Recommender, which is a service that receives a biomedical text corpus or a list of keywords and suggests ontologies appropriate for referencing the indicated terms. We developed a new version of the NCBO Ontology Recommender. Called Ontology Recommender 2.0, it uses a novel recommendation approach that evaluates the relevance of an ontology to biomedical text data according to four different criteria: (1) the extent to which the ontology covers the input data; (2) the acceptance of the ontology in the biomedical community; (3) the level of detail of the ontology classes that cover the input data; and (4) the specialization of the ontology to the domain of the input data. Our evaluation shows that the enhanced recommender provides higher quality suggestions than the original approach, providing better coverage of the input data, more detailed information about their concepts, increased specialization for the domain of the input data, and greater acceptance and use in the community. In addition, it provides users with more explanatory information, along with suggestions of not only individual ontologies but also groups of ontologies to use together. It also can be customized to fit the needs of different ontology recommendation scenarios. Ontology Recommender 2.0 suggests relevant ontologies for annotating biomedical text data. It combines the strengths of its predecessor with a range of adjustments and new features that improve its reliability and usefulness. Ontology Recommender 2.0 recommends over 500 biomedical ontologies from the NCBO BioPortal platform, where it is openly available (both via the user interface at http://bioportal.bioontology.org/recommender, and via a Web service API).

    View details for DOI 10.1186/s13326-017-0128-y

    View details for PubMedID 28592275

  • An Empirical Analysis of Ontology Reuse in BioPortal. Journal of biomedical informatics Ochs, C., Perl, Y., Geller, J., Arabandi, S., Tudorache, T., Musen, M. A. 2017

    Abstract

    Biomedical ontologies often reuse content (i.e., classes and properties) from other ontologies. Content reuse enables a consistent representation of a domain and reusing content can save an ontology author significant time and effort. Prior studies have investigated the existence of reused terms among the ontologies in the NCBO BioPortal, but as of yet there has not been a study investigating how the ontologies in BioPortal utilize reused content in the modeling of their own content. In this study we investigate how 355 ontologies hosted in the NCBO BioPortal reuse content from other ontologies for the purposes of creating new ontology content. We identified 197 ontologies that reuse content. Among these ontologies, 108 utilize reused classes in the modeling of their own classes and 116 utilize reused properties in class restrictions. Current utilization of reuse and quality issues related to reuse are discussed.

    View details for DOI 10.1016/j.jbi.2017.05.021

    View details for PubMedID 28583809

  • PhLeGrA: Graph Analytics in Pharmacology over the Web of Life Sciences Linked Open Data. Proceedings of the ... International World-Wide Web Conference. International WWW Conference Kamdar, M. R., Musen, M. A. 2017; 2017: 321-329

    Abstract

    Integrated approaches for pharmacology are required for the mechanism-based predictions of adverse drug reactions that manifest due to concomitant intake of multiple drugs. These approaches require the integration and analysis of biomedical data and knowledge from multiple, heterogeneous sources with varying schemas, entity notations, and formats. To tackle these integrative challenges, the Semantic Web community has published and linked several datasets in the Life Sciences Linked Open Data (LSLOD) cloud using established W3C standards. We present the PhLeGrA platform for Linked Graph Analytics in Pharmacology in this paper. Through query federation, we integrate four sources from the LSLOD cloud and extract a drug-reaction network, composed of distinct entities. We represent this graph as a hidden conditional random field (HCRF), a discriminative latent variable model that is used for structured output predictions. We calculate the underlying probability distributions in the drug-reaction HCRF using the datasets from the U.S. Food and Drug Administration's Adverse Event Reporting System. We predict the occurrence of 146 adverse reactions due to multiple drug intake with an AUROC statistic greater than 0.75. The PhLeGrA platform can be extended to incorporate other sources published using Semantic Web technologies, as well as to discover other types of pharmacological associations.

    View details for DOI 10.1145/3038912.3052692

    View details for PubMedID 29479581

    View details for PubMedCentralID PMC5824722
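
    The AUROC statistic cited in the abstract above can be read as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. A minimal sketch of that pairwise definition follows; the labels and scores are invented, and this is not the PhLeGrA evaluation code.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison; ties between a positive and a negative count as half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predictions for one adverse reaction (1 = reaction reported).
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.3, 0.6, 0.7, 0.8, 0.2]
value = auroc(labels, scores)
print(round(value, 3))
```

    The quadratic loop is chosen for readability; a rank-based computation gives the same value more efficiently on large datasets.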

  • Use of Ontology Structure and Bayesian Models to Aid the Crowdsourcing of ICD-11 Sanctioning Rules. Journal of biomedical informatics Lou, Y., Tu, S. W., Nyulas, C., Tudorache, T., Chalmers, R. J., Musen, M. A. 2017

    Abstract

    The International Classification of Diseases (ICD) is the de facto standard international classification for mortality reporting and for many epidemiological, clinical, and financial use cases. The next version of ICD, ICD-11, will be submitted for approval by the World Health Assembly in 2018. Unlike previous versions of ICD, where coders mostly select single codes from pre-enumerated disease and disorder codes, ICD-11 coding will allow extensive use of multiple codes to give more detailed disease descriptions. For example, "severe malignant neoplasms of left breast" may be coded using the combination of a "stem code" (e.g., code for malignant neoplasms of breast) with a variety of "extension codes" (e.g., codes for laterality and severity). The use of multiple codes (a process called post-coordination), while avoiding the pitfall of having to pre-enumerate a vast number of possible disease and qualifier combinations, risks the creation of meaningless expressions that combine stem codes with inappropriate qualifiers. To prevent that from happening, "sanctioning rules" that define legal combinations are necessary. In this work, we developed a crowdsourcing method for obtaining sanctioning rules for the post-coordination of concepts in ICD-11. Our method utilized the hierarchical structures in the domain to improve the accuracy of the sanctioning rules and to lower the crowdsourcing cost. We used Bayesian networks to model crowd workers' skills, the accuracy of their responses, and our confidence in the acquired sanctioning rules. We applied reinforcement learning to develop an agent that constantly adjusted the confidence cutoffs during the crowdsourcing process to maximize the overall quality of sanctioning rules under a fixed budget. Finally, we performed formative evaluations using a skin-disease branch of the draft ICD-11 and demonstrated that the crowd-sourced sanctioning rules replicated those defined by an expert dermatologist with high precision and recall. This work demonstrated that a crowdsourcing approach could offer a reasonably efficient method for generating a first draft of sanctioning rules that subject matter experts could verify and edit, thus relieving them of the tedium and cost of formulating the initial set of rules.

    View details for DOI 10.1016/j.jbi.2017.02.004

    View details for PubMedID 28192233

    View details for PubMedCentralID PMC5428551

  • A systematic analysis of term reuse and term overlap across biomedical ontologies SEMANTIC WEB Kamdar, M. R., Tudorache, T., Musen, M. A. 2017; 8 (6): 853–71

    Abstract

    Reusing ontologies and their terms is a principle and best practice that most ontology development methodologies strongly encourage. Reuse comes with the promise to support the semantic interoperability and to reduce engineering costs. In this paper, we present a descriptive study of the current extent of term reuse and overlap among biomedical ontologies. We use the corpus of biomedical ontologies stored in the BioPortal repository, and analyze different types of reuse and overlap constructs. While we find an approximate term overlap between 25-31%, the term reuse is only <9%, with most ontologies reusing fewer than 5% of their terms from a small set of popular ontologies. Clustering analysis shows that the terms reused by a common set of ontologies have >90% semantic similarity, hinting that ontology developers tend to reuse terms that are sibling or parent-child nodes. We validate this finding by analysing the logs generated from a Protégé plugin that enables developers to reuse terms from BioPortal. We find most reuse constructs were 2-level subtrees on the higher levels of the class hierarchy. We developed a Web application that visualizes reuse dependencies and overlap among ontologies, and that proposes similar terms from BioPortal for a term of interest. We also identified a set of error patterns that indicate that ontology developers did intend to reuse terms from other ontologies, but that they were using different and sometimes incorrect representations. Our results stipulate the need for semi-automated tools that augment term reuse in the ontology engineering process through personalized recommendations.

    View details for PubMedID 28819351

    View details for PubMedCentralID PMC5555235

  • The New HIT: Human Health Information Technology Leung, T. I., Goldstein, M. K., Musen, M. A., Cronkite, R., Chen, J. H., Gottlieb, A., Leitersdorf, E., Gundlapalli, A. V., Jaulent, M. C., Zhao, D. IOS PRESS. 2017: 768–72

    Abstract

    Humanism in medicine is defined as health care providers' attitudes and actions that demonstrate respect for patients' values and concerns in relation to their social, psychological and spiritual life domains. Specifically, humanistic clinical medicine involves showing respect for the patient, building a personal connection, and eliciting and addressing a patient's emotional response to illness. Health information technology (IT) often interferes with humanistic clinical practice, potentially disabling these core aspects of the therapeutic patient-physician relationship. Health IT has evolved rapidly in recent years - and the imperative to maintain humanism in practice has never been greater. In this vision paper, we aim to discuss why preserving humanism is imperative in the design and implementation of health IT systems.

    View details for PubMedID 29295202

  • High-Risk Drug-Drug Interactions Between Clinical Practice Guidelines for Management of Chronic Conditions. AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science Tso, G. J., Tu, S. W., Musen, M. A., Goldstein, M. K. 2017; 2017: 531–39

    Abstract

    Clinicians and clinical decision-support systems often follow pharmacotherapy recommendations for patients based on clinical practice guidelines (CPGs). In multimorbid patients, these recommendations can potentially have clinically significant drug-drug interactions (DDIs). In this study, we describe and validate a method for programmatically detecting DDIs among CPG recommendations. The system extracts pharmacotherapy intervention recommendations from narrative CPGs, normalizes the terms, creates a mapping of drugs and drug classes, and then identifies occurrences of DDIs between CPG pairs. We used this system to analyze 75 CPGs written by authoring entities in the United States that discuss outpatient management of common chronic diseases. Using a reference list of high-risk DDIs, we identified 2198 of these DDIs in 638 CPG pairs (46 unique CPGs). Only 9 high-risk DDIs were discussed by both CPGs in a pairing. In 69 of the pairings, neither CPG had a pharmacologic reference or a warning of the possibility of a DDI.

    View details for PubMedID 28815153
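
    The pairwise check described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' system: it assumes the drug recommendations have already been extracted and normalized, and the function name `find_cross_guideline_ddis`, the guideline names, and the sample drug lists are all hypothetical.

```python
from itertools import combinations


def find_cross_guideline_ddis(guideline_drugs, high_risk_pairs):
    """Map each guideline pair to the interacting drug pairs it jointly recommends.

    guideline_drugs: dict of guideline name -> set of normalized drug names
    high_risk_pairs: iterable of (drug, drug) tuples from a reference DDI list
    """
    # Store each interaction as an unordered pair so (a, b) matches (b, a).
    risky = {frozenset(pair) for pair in high_risk_pairs}
    hits = {}
    ordered = sorted(guideline_drugs.items(), key=lambda item: item[0])
    for (cpg_a, drugs_a), (cpg_b, drugs_b) in combinations(ordered, 2):
        found = {
            (a, b)
            for a in drugs_a
            for b in drugs_b
            if a != b and frozenset((a, b)) in risky
        }
        if found:
            hits[(cpg_a, cpg_b)] = found
    return hits


# Hypothetical, hand-made input for illustration only.
example_guidelines = {
    "atrial_fibrillation": {"warfarin"},
    "osteoarthritis": {"ibuprofen", "acetaminophen"},
    "hypertension": {"lisinopril"},
}
example_reference = [("warfarin", "ibuprofen")]
example_hits = find_cross_guideline_ddis(example_guidelines, example_reference)
```

    Storing interactions as unordered pairs means the reference list does not need to contain both orderings of each drug pair.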

  • Precision annotation of digital samples in NCBI's gene expression omnibus. Scientific data Hadley, D., Pan, J., El-Sayed, O., Aljabban, J., Aljabban, I., Azad, T. D., Hadied, M. O., Raza, S., Rayikanti, B. A., Chen, B., Paik, H., Aran, D., Spatz, J., Himmelstein, D., Panahiazar, M., Bhattacharya, S., Sirota, M., Musen, M. A., Butte, A. J. 2017; 4: 170125

    Abstract

    The Gene Expression Omnibus (GEO) contains more than two million digital samples from functional genomics experiments amassed over almost two decades. However, individual sample metadata remain poorly described by unstructured free-text attributes, preventing large-scale reanalysis. We introduce the Search Tag Analyze Resource for GEO as a web application (http://STARGEO.org) to curate better annotations of sample phenotypes uniformly across different studies, and to use these sample annotations to define robust genomic signatures of disease pathology by meta-analysis. In this paper, we target a small group of biomedical graduate students to show rapid crowd-curation of precise sample annotations across all phenotypes, and we demonstrate the biological validity of these crowd-curated annotations for breast cancer. STARGEO.org makes GEO data findable, accessible, interoperable and reusable (i.e., FAIR) to ultimately facilitate knowledge discovery. Our work demonstrates the utility of crowd-curation and interpretation of open 'big data' under FAIR principles as a first step towards realizing an ideal paradigm of precision medicine.

    View details for PubMedID 28925997

    View details for PubMedCentralID PMC5604135

  • Fast and Accurate Metadata Authoring Using Ontology-Based Recommendations. AMIA ... Annual Symposium proceedings. AMIA Symposium Martinez-Romero, M., O'Connor, M. J., Shankar, R. D., Panahiazar, M., Willrett, D., Egyedi, A. L., Gevaert, O., Graybeal, J., Musen, M. A. 2017; 2017: 1272–81

    Abstract

    In biomedicine, high-quality metadata are crucial for finding experimental datasets, for understanding how experiments were performed, and for reproducing those experiments. Despite the recent focus on metadata, the quality of metadata available in public repositories continues to be extremely poor. A key difficulty is that the typical metadata acquisition process is time-consuming and error prone, with weak or nonexistent support for linking metadata to ontologies. There is a pressing need for methods and tools to speed up the metadata acquisition process and to increase the quality of metadata that are entered. In this paper, we describe a methodology and set of associated tools that we developed to address this challenge. A core component of this approach is a value recommendation framework that uses analysis of previously entered metadata and ontology-based metadata specifications to help users rapidly and accurately enter their metadata. We performed an initial evaluation of this approach using metadata from a public metadata repository.

    View details for PubMedID 29854196

  • Mechanism-based Pharmacovigilance over the Life Sciences Linked Open Data Cloud. AMIA ... Annual Symposium proceedings. AMIA Symposium Kamdar, M. R., Musen, M. A. 2017; 2017: 1014–23

    Abstract

    Adverse drug reactions (ADR) result in significant morbidity and mortality in patients, and a substantial proportion of these ADRs are caused by drug-drug interactions (DDIs). Pharmacovigilance methods are used to detect unanticipated DDIs and ADRs by mining Spontaneous Reporting Systems, such as the US FDA Adverse Event Reporting System (FAERS). However, these methods do not provide mechanistic explanations for the discovered drug-ADR associations in a systematic manner. In this paper, we present a systems pharmacology-based approach to perform mechanism-based pharmacovigilance. We integrate data and knowledge from four different sources using Semantic Web Technologies and Linked Data principles to generate a systems network. We present a network-based Apriori algorithm for association mining in FAERS reports. We evaluate our method against existing pharmacovigilance methods for three different validation sets. Our method has AUROC statistics of 0.7-0.8, similar to current methods, and event-specific thresholds generate AUROC statistics greater than 0.75 for certain ADRs. Finally, we discuss the benefits of using Semantic Web technologies to attain the objectives for mechanism-based pharmacovigilance.

    View details for PubMedID 29854169

  • A unified software framework for deriving, visualizing, and exploring abstraction networks for ontologies. Journal of biomedical informatics Ochs, C., Geller, J., Perl, Y., Musen, M. A. 2016; 62: 90-105

    Abstract

    Software tools play a critical role in the development and maintenance of biomedical ontologies. One important task that is difficult without software tools is ontology quality assurance. In previous work, we have introduced different kinds of abstraction networks to provide a theoretical foundation for ontology quality assurance tools. Abstraction networks summarize the structure and content of ontologies. One kind of abstraction network that we have used repeatedly to support ontology quality assurance is the partial-area taxonomy. It summarizes structurally and semantically similar concepts within an ontology. However, the use of partial-area taxonomies was ad hoc and not generalizable. In this paper, we describe the Ontology Abstraction Framework (OAF), a unified framework and software system for deriving, visualizing, and exploring partial-area taxonomy abstraction networks. The OAF includes support for various ontology representations (e.g., OWL and SNOMED CT's relational format). A Protégé plugin for deriving "live partial-area taxonomies" is demonstrated.

    View details for DOI 10.1016/j.jbi.2016.06.008

    View details for PubMedID 27345947

    View details for PubMedCentralID PMC4987206

  • Utilizing a structural meta-ontology for family-based quality assurance of the BioPortal ontologies. Journal of biomedical informatics Ochs, C., He, Z., Zheng, L., Geller, J., Perl, Y., Hripcsak, G., Musen, M. A. 2016; 61: 63-76

    Abstract

    An Abstraction Network is a compact summary of an ontology's structure and content. In previous research, we showed that Abstraction Networks support quality assurance (QA) of biomedical ontologies. The development of an Abstraction Network and its associated QA methodologies, however, is a labor-intensive process that previously was applicable only to one ontology at a time. To improve the efficiency of the Abstraction-Network-based QA methodology, we introduced a QA framework that uses uniform Abstraction Network derivation techniques and QA methodologies that are applicable to whole families of structurally similar ontologies. For the family-based framework to be successful, it is necessary to develop a method for classifying ontologies into structurally similar families. We now describe a structural meta-ontology that classifies ontologies according to certain structural features that are commonly used in the modeling of ontologies (e.g., object properties) and that are important for Abstraction Network derivation. Each class of the structural meta-ontology represents a family of ontologies with identical structural features, indicating which types of Abstraction Networks and QA methodologies are potentially applicable to all of the ontologies in the family. We derive a collection of 81 families, corresponding to classes of the structural meta-ontology, that enable a flexible, streamlined family-based QA methodology, offering multiple choices for classifying an ontology. The structure of 373 ontologies from the NCBO BioPortal is analyzed and each ontology is classified into multiple families modeled by the structural meta-ontology.

    View details for DOI 10.1016/j.jbi.2016.03.007

    View details for PubMedID 26988001

    View details for PubMedCentralID PMC4893909

  • Utilizing a structural meta-ontology for family-based quality assurance of the BioPortal ontologies JOURNAL OF BIOMEDICAL INFORMATICS Ochs, C., He, Z., Zheng, L., Geller, J., Perl, Y., Hripcsak, G., Musen, M. A. 2016; 61: 63-76

    View details for DOI 10.1016/j.jbi.2016.03.007

    View details for Web of Science ID 000384704300008

    View details for PubMedCentralID PMC4893909

  • Is the crowd better as an assistant or a replacement in ontology engineering? An exploration through the lens of the Gene Ontology. Journal of biomedical informatics Mortensen, J. M., Telis, N., Hughey, J. J., Fan-Minogue, H., Van Auken, K., Dumontier, M., Musen, M. A. 2016; 60: 199-209

    Abstract

    Biomedical ontologies contain errors. Crowdsourcing, defined as taking a job traditionally performed by a designated agent and outsourcing it to an undefined large group of people, provides scalable access to humans. Therefore, the crowd has the potential to overcome the limited accuracy and scalability found in current ontology quality assurance approaches. Crowd-based methods have identified errors in SNOMED CT, a large, clinical ontology, with an accuracy similar to that of experts, suggesting that crowdsourcing is indeed a feasible approach for identifying ontology errors. This work uses that same crowd-based methodology, as well as a panel of experts, to verify a subset of the Gene Ontology (200 relationships). Experts identified 16 errors, generally in relationships referencing acids and metals. The crowd performed poorly in identifying those errors, with an area under the receiver operating characteristic curve ranging from 0.44 to 0.73, depending on the methods configuration. However, when the crowd verified what experts considered to be easy relationships with useful definitions, they performed reasonably well. Notably, there are significantly fewer Google search results for Gene Ontology concepts than SNOMED CT concepts. This disparity may account for the difference in performance - fewer search results indicate a more difficult task for the worker. The number of Internet search results could serve as a method to assess which tasks are appropriate for the crowd. These results suggest that the crowd fits better as an expert assistant, helping experts with their verification by completing the easy tasks and allowing experts to focus on the difficult tasks, rather than an expert replacement.

    View details for DOI 10.1016/j.jbi.2016.02.005

    View details for PubMedID 26873781

    View details for PubMedCentralID PMC4836980

  • An Open Repository Model for Acquiring Knowledge About Scientific Experiments O'Connor, M. J., Martinez-Romero, M., Egyedi, A. L., Willrett, D., Graybeal, J., Musen, M. A., Blomqvist, E., Ciancarini, P., Poggi, F., Vitali, F. SPRINGER INT PUBLISHING AG. 2016: 762–77
  • Snap-SPARQL: A Java Framework for Working with SPARQL and OWL Horridge, M., Musen, M., Tamma, Dragoni, M., Goncalves, R., Lawrynowicz, A. SPRINGER INT PUBLISHING AG. 2016: 154–65
  • How to apply Markov chains for modeling sequential edit patterns in collaborative ontology-engineering projects INTERNATIONAL JOURNAL OF HUMAN-COMPUTER STUDIES Walk, S., Singer, P., Strohmaier, M., Helic, D., Noy, N. F., Musen, M. A. 2015; 84: 51-66
  • Using ontologies to model human navigation behavior in information networks: A study based on Wikipedia. Semantic web Lamprecht, D., Strohmaier, M., Helic, D., Nyulas, C., Tudorache, T., Noy, N. F., Musen, M. A. 2015; 6 (4): 403-422

    Abstract

    The need to examine the behavior of different user groups is a fundamental requirement when building information systems. In this paper, we present Ontology-based Decentralized Search (OBDS), a novel method to model the navigation behavior of users equipped with different types of background knowledge. Ontology-based Decentralized Search combines decentralized search, an established method for navigation in social networks, and ontologies to model navigation behavior in information networks. The method uses ontologies as an explicit representation of background knowledge to inform the navigation process and guide it towards navigation targets. By using different ontologies, users equipped with different types of background knowledge can be represented. We demonstrate our method using four biomedical ontologies and their associated Wikipedia articles. We compare our simulation results with base line approaches and with results obtained from a user study. We find that our method produces click paths that have properties similar to those originating from human navigators. The results suggest that our method can be used to model human navigation behavior in systems that are based on information networks, such as Wikipedia. This paper makes the following contributions: (i) To the best of our knowledge, this is the first work to demonstrate the utility of ontologies in modeling human navigation and (ii) it yields new insights and understanding about the mechanisms of human navigation in information networks.

    View details for DOI 10.3233/SW-140143

    View details for PubMedID 26568745

    View details for PubMedCentralID PMC4643321

  • Investigating Term Reuse and Overlap in Biomedical Ontologies. CEUR workshop proceedings Kamdar, M. R., Tudorache, T., Musen, M. A. 2015; 1515

    Abstract

    We investigate the current extent of term reuse and overlap among biomedical ontologies. We use the corpus of biomedical ontologies stored in the BioPortal repository, and analyze three types of reuse constructs: (a) explicit term reuse, (b) xref reuse, and (c) Concept Unique Identifier (CUI) reuse. While there is a term label similarity of approximately 14.4% of the total terms, we observed that most ontologies reuse considerably fewer than 5% of their terms from a concise set of a few core ontologies. We developed an interactive visualization to explore reuse dependencies among biomedical ontologies. Moreover, we identified a set of patterns that indicate ontology developers did intend to reuse terms from other ontologies, but they were using different and sometimes incorrect representations. Our results suggest the value of semi-automated tools that augment term reuse in the ontology engineering process through personalized recommendations.

    View details for PubMedID 29636656
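
    One simple way to approximate the explicit term reuse discussed above is to count, for each ontology, the share of its term identifiers that also appear in at least one other ontology. The sketch below is an assumption-laden illustration and not the authors' analysis: shared identifiers are only a proxy (they do not show which ontology reused from which), and the function name `shared_term_fraction` and the sample identifiers are hypothetical.

```python
def shared_term_fraction(ontology_terms):
    """Return, per ontology, the fraction of its term IDs found in any other ontology.

    ontology_terms: dict of ontology name -> set of term identifiers (e.g. IRIs)
    """
    result = {}
    for name, terms in ontology_terms.items():
        # Union of the term IDs of every other ontology in the corpus.
        others = set().union(
            *(t for other, t in ontology_terms.items() if other != name)
        )
        result[name] = len(terms & others) / len(terms) if terms else 0.0
    return result


# Hypothetical identifiers for illustration only.
example_corpus = {
    "A": {"x:1", "x:2", "a:1", "a:2"},
    "B": {"x:1", "b:1"},
}
example_fractions = shared_term_fraction(example_corpus)
```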

  • Analysis and Prediction of User Editing Patterns in Ontology Development Projects. Journal on data semantics Wang, H., Tudorache, T., Dou, D., Noy, N. F., Musen, M. A. 2015; 4 (2): 117-132

    Abstract

    The development of real-world ontologies is a complex undertaking, commonly involving a group of domain experts with different expertise who work together in a collaborative setting. These ontologies are usually large scale and have complex structures. To assist in the authoring process, ontology tools are key to making the editing process as streamlined as possible. Being able to predict confidently what the users are likely to do next as they edit an ontology will enable us to focus and structure the user interface accordingly and to facilitate more efficient interaction and information discovery. In this paper, we use data mining, specifically association rule mining, to investigate whether we are able to predict the next editing operation that a user will make based on the change history. We simulated and evaluated continuous prediction across time using a sliding window model. We used association rule mining to generate patterns from the ontology change logs in the training window and tested these patterns on logs in the adjacent testing window. We also evaluated the impact of different training and testing window sizes on the prediction accuracies. Finally, we evaluated our prediction accuracies across different user groups and different ontologies. Our results indicate that we can indeed predict the next editing operation a user is likely to make. We will use the discovered editing patterns to develop a recommendation module for our editing tools, and to design user interface components that better fit the users' editing behaviors.

    View details for PubMedID 26052350
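
    The train-then-test scheme over adjacent windows described in the abstract above can be sketched as follows. This is a deliberately simplified stand-in, not the authors' method: it mines only first-order rules of the form "previous operation implies next operation" rather than running a full association rule miner, and the function names, the confidence threshold, and the sample log are hypothetical.

```python
from collections import Counter, defaultdict


def mine_rules(ops, min_conf=0.5):
    """Learn rules prev_op -> most frequent next_op, kept if confidence >= min_conf."""
    followers = defaultdict(Counter)
    for prev, nxt in zip(ops, ops[1:]):
        followers[prev][nxt] += 1
    rules = {}
    for prev, nexts in followers.items():
        best, count = nexts.most_common(1)[0]
        if count / sum(nexts.values()) >= min_conf:
            rules[prev] = best
    return rules


def sliding_window_accuracy(ops, train_size, test_size):
    """Train on one window, test on the adjacent window, then slide forward."""
    accuracies = []
    start = 0
    while start + train_size + test_size <= len(ops):
        train = ops[start:start + train_size]
        test = ops[start + train_size:start + train_size + test_size]
        rules = mine_rules(train)
        # Prepend the last training operation so the first test operation
        # also has a preceding operation to predict from.
        context = [train[-1]] + test
        correct = sum(
            1 for prev, actual in zip(context, context[1:])
            if rules.get(prev) == actual
        )
        accuracies.append(correct / len(test))
        start += test_size
    return accuracies


# Hypothetical, perfectly regular change log for illustration only.
example_log = ["create", "rename", "move"] * 4
example_accuracies = sliding_window_accuracy(example_log, train_size=6, test_size=3)
```

    Varying `train_size` and `test_size` in this sketch mirrors the abstract's comparison of window sizes, although real change logs would be far less regular than the sample.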

  • Using the wisdom of the crowds to find critical errors in biomedical ontologies: a study of SNOMED CT. Journal of the American Medical Informatics Association Mortensen, J. M., Minty, E. P., Januszyk, M., Sweeney, T. E., Rector, A. L., Noy, N. F., Musen, M. A. 2015; 22 (3): 640-648

    Abstract

    The verification of biomedical ontologies is an arduous process that typically involves peer review by subject-matter experts. This work evaluated the ability of crowdsourcing methods to detect errors in SNOMED CT (Systematized Nomenclature of Medicine Clinical Terms) and to address the challenges of scalable ontology verification. We developed a methodology to crowdsource ontology verification that uses micro-tasking combined with a Bayesian classifier. We then conducted a prospective study in which both the crowd and domain experts verified a subset of SNOMED CT comprising 200 taxonomic relationships. The crowd identified errors as well as any single expert at about one-quarter of the cost. The inter-rater agreement (κ) between the crowd and the experts was 0.58; the inter-rater agreement between experts themselves was 0.59, suggesting that the crowd is nearly indistinguishable from any one expert. Furthermore, the crowd identified 39 previously undiscovered, critical errors in SNOMED CT (eg, 'septic shock is a soft-tissue infection'). The results show that the crowd can indeed identify errors in SNOMED CT that experts also find, and the results suggest that our method will likely perform well on similar ontologies. The crowd may be particularly useful in situations where an expert is unavailable, budget is limited, or an ontology is too large for manual error checking. Finally, our results suggest that the online anonymous crowd could successfully complete other domain-specific tasks. We have demonstrated that the crowd can address the challenges of scalable ontology verification, completing not only intuitive, common-sense tasks, but also expert-level, knowledge-intensive tasks.

    View details for DOI 10.1136/amiajnl-2014-002901

    View details for PubMedID 25342179
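
    The abstract above reports inter-rater agreement as κ without naming the variant. Assuming the two-rater form (Cohen's kappa), it compares observed agreement with the agreement expected by chance from each rater's label frequencies. The sketch below illustrates that calculation only; the function name `cohen_kappa` and the sample ratings are hypothetical and unrelated to the study's data.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labeled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: share of items on which the raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical ratings: True means "relationship is correct".
rater_one = [True, True, False, False]
rater_two = [True, False, False, False]
example_kappa = cohen_kappa(rater_one, rater_two)
```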

  • Data breaches of protected health information in the United States. JAMA Liu, V., Musen, M. A., Chou, T. 2015; 313 (14): 1471-1473

    View details for DOI 10.1001/jama.2015.2252

    View details for PubMedID 25871675

  • Using ontologies to model human navigation behavior in information networks: A study based on Wikipedia SEMANTIC WEB Lamprecht, D., Strohmaier, M., Helic, D., Nyulas, C., Tudorache, T., Noy, N. F., Musen, M. A. 2015; 6 (4): 403-422

    View details for DOI 10.3233/SW-140143

    View details for Web of Science ID 000357876600011

    View details for PubMedCentralID PMC4643321

  • Applied ontology: The next decade begins APPLIED ONTOLOGY Guarino, N., Musen, M. 2015; 10 (1): 1–4

    View details for DOI 10.3233/AO-150143

    View details for Web of Science ID 000354872000001

  • Understanding How Users Edit Ontologies: Comparing Hypotheses About Four Real-World Projects Walk, S., Singer, P., Noboa, L., Tudorache, T., Musen, M. A., Strohmaier, M., Arenas, M., Corcho, O., Simperl, E., Strohmaier, M., DAquin, M., Srinivas, K., Groth, P., Dumontier, M., Heflin, J., Thirunarayan, K., Staab, S. SPRINGER INT PUBLISHING AG. 2015: 551–68
  • User Extensible System to Identify Problems in OWL Ontologies and SWRL Rules Orlando, J., Musen, M. A., Moreira, D. A., Bassiliades, N., Gottlob, G., Sadri, F., Paschke, A., Roman, D. SPRINGER-VERLAG BERLIN. 2015: 112–26
  • Using Aggregate Taxonomies to Summarize SNOMED CT Evolution Ochs, C., Perl, Y., Geller, J., Musen, M., Huan, J., Miyano, S., Shehu, A., Hu, Ma, B., Rajasekaran, S., Gombar, V. K., Schapranow, I. M., Yoo, I. H., Zhou, J. Y., Chen, B., Pai, Pierce, B. IEEE. 2015: 1008–15
  • Ten years of Applied Ontology APPLIED ONTOLOGY Guarino, N., Musen, M. A. 2015; 10 (3-4): 169–70

    View details for DOI 10.3233/AO-150160

    View details for Web of Science ID 000368916800001

  • Helping Users Bootstrap Ontologies: An Empirical Investigation Zhang, Y., Tudorache, T., Horridge, M., Musen, M. A., Assoc Comp Machinery ASSOC COMPUTING MACHINERY. 2015: 3395–98
  • Toward a science of learning systems: a research agenda for the high-functioning Learning Health System JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION Friedman, C., Rubin, J., Brown, J., Buntin, M., Corn, M., Etheredge, L., Gunter, C., Musen, M., Platt, R., Stead, W., Sullivan, K., Van Houweling, D. 2015; 22 (1): 43-50

    Abstract

    The capability to share data, and harness its potential to generate knowledge rapidly and inform decisions, can have transformative effects that improve health. The infrastructure to achieve this goal at scale--marrying technology, process, and policy--is commonly referred to as the Learning Health System (LHS). Achieving an LHS raises numerous scientific challenges. The National Science Foundation convened an invitational workshop to identify the fundamental scientific and engineering research challenges to achieving a national-scale LHS. The workshop was planned by a 12-member committee and ultimately engaged 45 prominent researchers spanning multiple disciplines over 2 days in Washington, DC on 11-12 April 2013. The workshop participants collectively identified 106 research questions organized around four system-level requirements that a high-functioning LHS must satisfy. The workshop participants also identified a new cross-disciplinary integrative science of cyber-social ecosystems that will be required to address these challenges. The intellectual merit and potential broad impacts of the innovations that will be driven by investments in an LHS are of great potential significance. The specific research questions that emerged from the workshop, alongside the potential for diverse communities to assemble to address them through a 'new science of learning systems', create an important agenda for informatics and related disciplines.

    View details for DOI 10.1136/amiajnl-2014-002977

    View details for Web of Science ID 000352771100007

    View details for PubMedID 25342177

    View details for PubMedCentralID PMC4433378

  • A Method to Compare ICF and SNOMED CT for Coverage of U.S. Social Security Administration's Disability Listing Criteria. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Tu, S. W., Nyulas, C. I., Tudorache, T., Musen, M. A. 2015; 2015: 1224-1233

    Abstract

    We developed a method to evaluate the extent to which the International Classification of Functioning, Disability, and Health (ICF) and SNOMED CT cover concepts used in the disability listing criteria of the U.S. Social Security Administration's "Blue Book." First we decomposed the criteria into their constituent concepts and relationships. We defined different types of mappings and manually mapped the recognized concepts and relationships to either ICF or SNOMED CT. We defined various metrics for measuring the coverage of each terminology, taking into account the effects of inexact matches and frequency of occurrence. We validated our method by mapping the terms in the disability criteria of Adult Listings, Chapter 12 (Mental Disorders). SNOMED CT dominates ICF in almost all the metrics that we have computed. The method is applicable for determining any terminology's coverage of eligibility criteria.

    View details for PubMedID 26958262

  • Automating Identification of Multiple Chronic Conditions in Clinical Practice Guidelines. AMIA Joint Summits on Translational Science proceedings AMIA Summit on Translational Science Leung, T. I., Jalal, H., Zulman, D. M., Dumontier, M., Owens, D. K., Musen, M. A., Goldstein, M. K. 2015; 2015: 456-460

    Abstract

    Many clinical practice guidelines (CPGs) are intended to provide evidence-based guidance to clinicians on a single disease, and are frequently considered inadequate when caring for patients with multiple chronic conditions (MCC), or two or more chronic conditions. It is unclear to what degree disease-specific CPGs provide guidance about MCC. In this study, we develop a method for extracting knowledge from single-disease chronic condition CPGs to determine how frequently they mention commonly co-occurring chronic diseases. We focus on 15 highly prevalent chronic conditions. We use publicly available resources, including a repository of guideline summaries from the National Guideline Clearinghouse to build a text corpus, a data dictionary of ICD-9 codes from the Medicare Chronic Conditions Data Warehouse (CCW) to construct an initial list of disease terms, and disease synonyms from the National Center for Biomedical Ontology to enhance the list of disease terms. First, for each disease guideline, we determined the frequency of comorbid condition mentions (a disease-comorbidity pair) by exactly matching disease synonyms in the text corpus. Then, we developed an annotated reference standard using a sample subset of guidelines. We used this reference standard to evaluate our approach. Then, we compared the co-prevalence of common pairs of chronic conditions from Medicare CCW data to the frequency of disease-comorbidity pairs in CPGs. Our results show that some disease-comorbidity pairs occur more frequently in CPGs than others. Sixty-one (29.0%) of 210 possible disease-comorbidity pairs occurred zero times; for example, no guideline on chronic kidney disease mentioned depression, while heart failure guidelines mentioned ischemic heart disease the most frequently. Our method adequately identifies comorbid chronic conditions in CPG recommendations with precision 0.82, recall 0.75, and F-measure 0.78. Our work identifies knowledge currently embedded in the free text of clinical practice guideline recommendations and provides an initial view of the extent to which CPGs mention common comorbid conditions. Knowledge extracted from CPG text in this way may be useful to inform gaps in guideline recommendations regarding MCC and therefore identify potential opportunities for guideline improvement.

    View details for PubMedID 26306285
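
    The figures reported in the abstract above can be checked with a few lines of arithmetic. The sketch below assumes the standard balanced F-measure (the harmonic mean of precision and recall), and it assumes that the 210 possible pairs are the ordered disease-comorbidity pairs among the 15 conditions (15 x 14); the abstract states neither explicitly. The function name `f_measure` is illustrative only.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision <= 0.0 or recall <= 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Values quoted in the abstract.
f1 = f_measure(0.82, 0.75)

# Assumption: ordered pairs among 15 conditions, excluding self-pairs.
possible_pairs = 15 * 14
share_never_mentioned = 61 / possible_pairs

print(round(f1, 2))                          # F-measure, two decimals
print(round(100 * share_never_mentioned, 1)) # percent of pairs with no mention
```

    Under these assumptions the computed values agree with the reported F-measure of 0.78 and the reported 29.0% of pairs that were never mentioned.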

  • Discovering Beaten Paths in Collaborative Ontology-Engineering Projects using Markov Chains JOURNAL OF BIOMEDICAL INFORMATICS Walk, S., Singer, P., Strohmaier, M., Tudorache, T., Musen, M. A., Noy, N. F. 2014; 51: 254-271

    Abstract

    Biomedical taxonomies, thesauri and ontologies in the form of the International Classification of Diseases as a taxonomy or the National Cancer Institute Thesaurus as an OWL-based ontology, play a critical role in acquiring, representing and processing information about human health. With increasing adoption and relevance, biomedical ontologies have also significantly increased in size. For example, the 11th revision of the International Classification of Diseases, which is currently under active development by the World Health Organization, contains nearly 50,000 classes representing a vast variety of different diseases and causes of death. This evolution in terms of size was accompanied by an evolution in the way ontologies are engineered. Because no single individual has the expertise to develop such large-scale ontologies, ontology-engineering projects have evolved from small-scale efforts involving just a few domain experts to large-scale projects that require effective collaboration between dozens or even hundreds of experts, practitioners and other stakeholders. Understanding the way these different stakeholders collaborate will enable us to improve editing environments that support such collaborations. In this paper, we uncover how large ontology-engineering projects, such as the International Classification of Diseases in its 11th revision, unfold by analyzing usage logs of five different biomedical ontology-engineering projects of varying sizes and scopes using Markov chains. We discover intriguing interaction patterns (e.g., which properties users frequently change after specific given ones) that suggest that large collaborative ontology-engineering projects are governed by a few general principles that determine and drive development. From our analysis, we identify commonalities and differences between different projects that have implications for project managers, ontology editors, developers and contributors working on collaborative ontology-engineering projects and tools in the biomedical domain.

    View details for DOI 10.1016/j.jbi.2014.06.004

    View details for Web of Science ID 000343362800025

    View details for PubMedCentralID PMC4194274
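
    To illustrate the kind of analysis the abstract above describes, here is a minimal sketch of estimating a first-order Markov chain from edit logs. It is not the authors' implementation: the restriction to first order, the session format, and the property names are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def transition_probabilities(sessions):
    """Estimate P(next property | current property) from edit sequences.

    Each session is a list of property names in the order a user changed them.
    """
    counts = defaultdict(Counter)
    for session in sessions:
        # Count every consecutive (current, next) pair in the session.
        for current, nxt in zip(session, session[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for state, followers in counts.items()
    }


# Hypothetical change-log excerpts; real logs would come from the editing tool.
sessions = [
    ["title", "definition", "synonym"],
    ["title", "definition", "definition"],
    ["title", "synonym"],
]
probs = transition_probabilities(sessions)
for state, followers in probs.items():
    print(state, followers)
```

    A row with one dominant entry in the resulting table corresponds to a frequently taken path, for example a property that users usually edit right after another one.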

  • WebProtégé: a collaborative Web-based platform for editing biomedical ontologies. Bioinformatics Horridge, M., Tudorache, T., Nyulas, C., Vendetti, J., Noy, N. F., Musen, M. A. 2014; 30 (16): 2384-2385

    Abstract

    WebProtégé is an open-source Web application for editing OWL 2 ontologies. It contains several features to aid collaboration, including support for the discussion of issues, change notification and revision-based change tracking. WebProtégé also features a simple user interface, which is geared towards editing the kinds of class descriptions and annotations that are prevalent throughout biomedical ontologies. Moreover, it is possible to configure the user interface using views that are optimized for editing Open Biomedical Ontology (OBO) class descriptions and metadata. Some of these views are shown in the Supplementary Material and can be seen in WebProtégé itself by configuring the project as an OBO project. WebProtégé is freely available for use on the Web at http://webprotege.stanford.edu. It is implemented in Java and JavaScript using the OWL API and the Google Web Toolkit. All major browsers are supported. For users who do not wish to host their ontologies on the Stanford servers, WebProtégé is available as a Web app that can be run locally using a Servlet container such as Tomcat. Binaries, source code and documentation are available under an open-source license at http://protegewiki.stanford.edu/wiki/WebProtege. Contact: matthew.horridge@stanford.edu. Supplementary data are available at Bioinformatics online.

    View details for DOI 10.1093/bioinformatics/btu256

    View details for PubMedID 24771560

  • Cross-domain targeted ontology subsets for annotation: The case of SNOMED CORE and RxNorm. Journal of biomedical informatics López-García, P., LePendu, P., Musen, M., Illarramendi, A. 2014; 47: 105-111

    Abstract

    The benefits of using ontology subsets versus full ontologies are well-documented for many applications. In this study, we propose an efficient subset extraction approach for a domain using a biomedical ontology repository with mappings, a cross-ontology, and a source subset from a related domain. As a case study, we extracted a subset of drugs from RxNorm using the UMLS Metathesaurus, the NDF-RT cross-ontology, and the CORE problem list subset of SNOMED CT. The extracted subset, which we termed RxNorm/CORE, was 4% the size of the full RxNorm (0.4% when considering ingredients only). For evaluation, we used CORE and RxNorm/CORE as thesauri for the annotation of clinical documents and compared their performance to that of their respective full ontologies (i.e., SNOMED CT and RxNorm). The wide range in recall of both CORE (29-69%) and RxNorm/CORE (21-35%) suggests that more quantitative research is needed to assess the benefits of using ontology subsets as thesauri in annotation applications. Our approach to subset extraction, however, opens a door to help create other types of clinically useful domain specific subsets and acts as an alternative in scenarios where well-established subset extraction techniques might suffer from difficulties or cannot be applied.

    View details for DOI 10.1016/j.jbi.2013.09.011

    View details for PubMedID 24095962

    View details for PubMedCentralID PMC3951555

  • Investigating Collaboration Dynamics in Different Ontology Development Environments 7th International Conference on Knowledge Science, Engineering and Management (KSEM) Rospocher, M., Tudorache, T., Musen, M. A. SPRINGER-VERLAG BERLIN. 2014: 302–313
  • Knowledge Representation METHODS IN BIOMEDICAL INFORMATICS: A PRAGMATIC APPROACH Musen, M. A., Sarkar, I. N. 2014: 49–79
  • Organizational factors affecting implementation of the ATHENA-Hypertension clinical decision support system during the VA’s nation-wide information technology restructuring: a case study Health Systems Shluzas, L. M., Cronkite, R. C., Hoffman, B. B., Breeling, J., Musen, M. A., Owens, D. K., Goldstein, M. K. 2014

    View details for DOI 10.1057/hs.2014.5

  • An empirically derived taxonomy of errors in SNOMED CT. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Mortensen, J. M., Musen, M. A., Noy, N. F. 2014; 2014: 899-906

    Abstract

    Ontologies underpin methods throughout biomedicine and biomedical informatics. However, as ontologies increase in size and complexity, so does the likelihood that they contain errors. Effective methods that identify errors are typically manual and expert-driven; however, automated methods are essential for the size of modern biomedical ontologies. The effect of ontology errors on their application is unclear, creating a challenge in differentiating salient, relevant errors from those that have no discernible effect. As a first step in understanding the challenge of identifying salient, common errors at a large scale, we asked 5 experts to verify a random subset of complex relations in the SNOMED CT CORE Problem List Subset. The experts found 39 errors that followed several common patterns. Initially, the experts disagreed about errors almost entirely, indicating that ontology verification is very difficult and requires many eyes on the task. It is clear that additional empirically-based, application-focused ontology verification method development is necessary. Toward that end, we developed a taxonomy that can serve as a checklist to consult during ontology quality assurance.

    View details for PubMedID 25954397

  • A Study on the Atomic Decomposition of Ontologies 13th International Semantic Web Conference (ISWC) Horridge, M., Mortensen, J. M., Parsia, B., Sattler, U., Musen, M. A. SPRINGER INT PUBLISHING AG. 2014: 65–80
  • How ontologies are made: Studying the hidden social dynamics behind collaborative ontology engineering projects JOURNAL OF WEB SEMANTICS Strohmaier, M., Walk, S., Poeschko, J., Lamprecht, D., Tudorache, T., Nyulas, C., Musen, M. A., Noy, N. F. 2013; 20: 18-34

    Abstract

    Traditionally, evaluation methods in the field of semantic technologies have focused on the end result of ontology engineering efforts, mainly, on evaluating ontologies and their corresponding qualities and characteristics. This focus has led to the development of a whole arsenal of ontology-evaluation techniques that investigate the quality of ontologies as a product. In this paper, we aim to shed light on the process of ontology engineering construction by introducing and applying a set of measures to analyze hidden social dynamics. We argue that especially for ontologies which are constructed collaboratively, understanding the social processes that have led to their construction is critical not only in understanding but consequently also in evaluating the ontology. With the work presented in this paper, we aim to expose the texture of collaborative ontology engineering processes that is otherwise left invisible. Using historical change-log data, we unveil qualitative differences and commonalities between different collaborative ontology engineering projects. Explaining and understanding these differences will help us to better comprehend the role and importance of social factors in collaborative ontology engineering projects. We hope that our analysis will spur a new line of evaluation techniques that view ontologies not as the static result of deliberations among domain experts, but as a dynamic, collaborative and iterative process that needs to be understood, evaluated and managed in itself. We believe that advances in this direction would help our community to expand the existing arsenal of ontology evaluation techniques towards more holistic approaches.

    View details for DOI 10.1016/j.websem.2013.04.001

    View details for Web of Science ID 000324302000002

    View details for PubMedCentralID PMC3845806

  • How Ontologies are Made: Studying the Hidden Social Dynamics Behind Collaborative Ontology Engineering Projects. Web semantics (Online) Strohmaier, M., Walk, S., Pöschko, J., Lamprecht, D., Tudorache, T., Nyulas, C., Musen, M. A., Noy, N. F. 2013; 20

    View details for DOI 10.1016/j.websem.2013.04.001

    View details for PubMedID 24311994

    View details for PubMedCentralID PMC3845806

  • The knowledge acquisition workshops: A remarkable convergence of ideas INTERNATIONAL JOURNAL OF HUMAN-COMPUTER STUDIES Musen, M. A. 2013; 71 (2): 195-199
  • PragmatiX: An Interactive Tool for Visualizing the Creation Process Behind Collaboratively Engineered Ontologies. International journal on Semantic Web and information systems Walk, S., Pöschko, J., Strohmaier, M., Andrews, K., Tudorache, T., Noy, N. F., Nyulas, C., Musen, M. A. 2013; 9 (1): 45-78

    Abstract

    With the emergence of tools for collaborative ontology engineering, more and more data about the creation process behind collaborative construction of ontologies is becoming available. Today, collaborative ontology engineering tools such as Collaborative Protégé offer rich and structured logs of changes, thereby opening up new challenges and opportunities to study and analyze the creation of collaboratively constructed ontologies. While there exists a plethora of visualization tools for ontologies, they have primarily been built to visualize aspects of the final product (the ontology) and not the collaborative processes behind construction (e.g. the changes made by contributors over time). To the best of our knowledge, there exists no ontology visualization tool today that focuses primarily on visualizing the history behind collaboratively constructed ontologies. Since the ontology engineering processes can influence the quality of the final ontology, we believe that visualizing process data represents an important stepping-stone towards better understanding of managing the collaborative construction of ontologies in the future. In this application paper, we present a tool, PragmatiX, which taps into structured change logs provided by tools such as Collaborative Protégé to visualize various pragmatic aspects of collaborative ontology engineering. The tool is aimed at managers and leaders of collaborative ontology engineering projects to help them in monitoring progress, in exploring issues and problems, and in tracking quality-related issues such as overrides and coordination among contributors. The paper makes the following contributions: (i) we present PragmatiX, a tool for visualizing the creation process behind collaboratively constructed ontologies, (ii) we illustrate the functionality and generality of the tool by applying it to structured logs of changes of two large collaborative ontology-engineering projects, and (iii) we conduct a heuristic evaluation of the tool with domain experts to uncover early design challenges and opportunities for improvement. Finally, we hope that this work sparks a new line of research on visualization tools for collaborative ontology engineering projects.

    View details for DOI 10.4018/jswis.2013010103

    View details for PubMedID 24465189

    View details for PubMedCentralID PMC3901413

  • WebProtege: A collaborative ontology editor and knowledge acquisition tool for the Web SEMANTIC WEB Tudorache, T., Nyulas, C., Noy, N. F., Musen, M. A. 2013; 4 (1): 89-99

    Abstract

    In this paper, we present WebProtégé, a lightweight ontology editor and knowledge acquisition tool for the Web. With the wide adoption of Web 2.0 platforms and the gradual adoption of ontologies and Semantic Web technologies in the real world, we need ontology-development tools that are better suited for the novel ways of interacting, constructing and consuming knowledge. Users today take Web-based content creation and online collaboration for granted. WebProtégé integrates these features as part of the ontology development process itself. We tried to lower the entry barrier to ontology development by providing a tool that is accessible from any Web browser, has extensive support for collaboration, and a highly customizable and pluggable user interface that can be adapted to any level of user expertise. The declarative user interface enabled us to create custom knowledge-acquisition forms tailored for domain experts. We built WebProtégé using the existing Protégé infrastructure, which supports collaboration on the back end side, and the Google Web Toolkit for the front end. The generic and extensible infrastructure allowed us to easily deploy WebProtégé in production settings for several projects. We present the main features of WebProtégé and its architecture and describe briefly some of its uses for real-world projects. WebProtégé is free and open source. An online demo is available at http://webprotege.stanford.edu.

    View details for DOI 10.3233/SW-2012-0057

    View details for Web of Science ID 000209437000007

    View details for PubMedCentralID PMC3691821

  • BioPortal as a Dataset of Linked Biomedical Ontologies and Terminologies in RDF. Semantic web Salvadores, M., Alexander, P. R., Musen, M. A., Noy, N. F. 2013; 4 (3): 277-284

    Abstract

    BioPortal is a repository of biomedical ontologies, the largest such repository, with more than 300 ontologies to date. This set includes ontologies that were developed in OWL, OBO and other formats, as well as a large number of medical terminologies that the US National Library of Medicine distributes in its own proprietary format. We have published the RDF version of all these ontologies at http://sparql.bioontology.org. This dataset contains 190M triples, representing both metadata and content for the 300 ontologies. We use the metadata that the ontology authors provide and simple RDFS reasoning in order to provide dataset users with uniform access to key properties of the ontologies, such as lexical properties for the class names and provenance data. The dataset also contains 9.8M cross-ontology mappings of different types, generated both manually and automatically, which come with their own metadata.

    View details for PubMedID 25214827

    View details for PubMedCentralID PMC4159173

  • WebProtégé: A Collaborative Ontology Editor and Knowledge Acquisition Tool for the Web. Semantic web Tudorache, T., Nyulas, C., Noy, N. F., Musen, M. A. 2013; 4 (1): 89-99

    View details for DOI 10.3233/SW-2012-0057

    View details for PubMedID 23807872

    View details for PubMedCentralID PMC3691821

  • BioPortal as a dataset of linked biomedical ontologies and terminologies in RDF SEMANTIC WEB Salvadores, M., Alexander, P. R., Musen, M. A., Noy, N. F. 2013; 4 (3): 277-284

    View details for DOI 10.3233/SW-2012-0086

    View details for Web of Science ID 000209437200008

    View details for PubMedCentralID PMC4159173

  • Analysis of User Editing Patterns in Ontology Development Projects On The Move (OTM) Federated International Conference Wang, H., Tudorache, T., Dou, D., Noy, N. F., Musen, M. A. SPRINGER-VERLAG BERLIN. 2013: 470–487
  • Getting Lucky in Ontology Search: A Data-Driven Evaluation Framework for Ontology Ranking 12th International Semantic Web Conference (ISWC) Noy, N. F., Alexander, P. R., Harpaz, R., Whetzel, P. L., Fergerson, R. W., Musen, M. A. SPRINGER-VERLAG BERLIN. 2013: 444–459
  • Using Semantic Web in ICD-11: Three Years Down the Road 12th International Semantic Web Conference (ISWC) Tudorache, T., Nyulas, C. I., Noy, N. F., Musen, M. A. SPRINGER-VERLAG BERLIN. 2013: 195–211
  • Simplified OWL Ontology Editing for the Web: Is WebProtege Enough? 12th International Semantic Web Conference (ISWC) Horridge, M., Tudorache, T., Vendetti, J., Nyulas, C. I., Musen, M. A., Noy, N. F. SPRINGER-VERLAG BERLIN. 2013: 200–215
  • Crowdsourcing the verification of relationships in biomedical ontologies. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Mortensen, J. M., Musen, M. A., Noy, N. F. 2013; 2013: 1020-1029

    Abstract

    Biomedical ontologies are often large and complex, making ontology development and maintenance a challenge. To address this challenge, scientists use automated techniques to alleviate the difficulty of ontology development. However, for many ontology-engineering tasks, human judgment is still necessary. Microtask crowdsourcing, wherein human workers receive remuneration to complete simple, short tasks, is one method to obtain contributions by humans at a large scale. Previously, we developed and refined an effective method to verify ontology hierarchy using microtask crowdsourcing. In this work, we report on applying this method to find errors in the SNOMED CT CORE subset. By using crowdsourcing via Amazon Mechanical Turk with a Bayesian inference model, we correctly verified 86% of the relations from the CORE subset of SNOMED CT in which Rector and colleagues previously identified errors via manual inspection. Our results demonstrate that an ontology developer could deploy this method in order to audit large-scale ontologies quickly and relatively cheaply.

    View details for PubMedID 24551391

  • PragmatiX: An Interactive Tool for Visualizing the Creation Process Behind Collaboratively Engineered Ontologies INTERNATIONAL JOURNAL ON SEMANTIC WEB AND INFORMATION SYSTEMS Walk, S., Poeschko, J., Strohmaier, M., Andrews, K., Tudorache, T., Noy, N. F., Nyulas, C., Musen, M. A. 2013; 9 (1): 45-78

    View details for DOI 10.4018/jswis.2013010103

    View details for Web of Science ID 000323380800003

    View details for PubMedCentralID PMC3901413

  • Chapter 9: Analyses Using Disease Ontologies PLOS COMPUTATIONAL BIOLOGY Shah, N. H., Cole, T., Musen, M. A. 2012; 8 (12)

    Abstract

    Advanced statistical methods used to analyze high-throughput data such as gene-expression assays result in long lists of "significant genes." One way to gain insight into the significance of altered expression levels is to determine whether Gene Ontology (GO) terms associated with a particular biological process, molecular function, or cellular component are over- or under-represented in the set of genes deemed significant. This process, referred to as enrichment analysis, profiles a gene-set, and is widely used to make sense of the results of high-throughput experiments. The canonical example of enrichment analysis is when the output dataset is a list of genes differentially expressed in some condition. To determine the biological relevance of a lengthy gene list, the usual solution is to perform enrichment analysis with the GO. We can aggregate the annotating GO concepts for each gene in this list, and arrive at a profile of the biological processes or mechanisms affected by the condition under study. While GO has been the principal target for enrichment analysis, the methods of enrichment analysis are generalizable. We can conduct the same sort of profiling along other ontologies of interest. Just as scientists can ask "Which biological process is over-represented in my set of interesting genes or proteins?" we can also ask "Which disease (or class of diseases) is over-represented in my set of interesting genes or proteins?" For example, by annotating known protein mutations with disease terms from the ontologies in BioPortal, Mort et al. recently identified a class of diseases (blood coagulation disorders) that were associated with a 14-fold depletion in substitutions at O-linked glycosylation sites. With the availability of tools for automatic annotation of datasets with terms from disease ontologies, there is no reason to restrict enrichment analyses to the GO. In this chapter, we will discuss methods to perform enrichment analysis using any ontology available in the biomedical domain. We will review the general methodology of enrichment analysis, the associated challenges, and discuss the novel translational analyses enabled by the existence of public, national computational infrastructure and by the use of disease ontologies in such analyses.

    View details for DOI 10.1371/journal.pcbi.1002827

    View details for Web of Science ID 000312901500032

    View details for PubMedID 23300417

    View details for PubMedCentralID PMC3531278

  • AMIA Board white paper: definition of biomedical informatics and specification of core competencies for graduate education in the discipline JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION Kulikowski, C. A., Shortliffe, E. H., Currie, L. M., Elkin, P. L., Hunter, L. E., Johnson, T. R., Kalet, I. J., Lenert, L. A., Musen, M. A., Ozbolt, J. G., Smith, J. W., Tarczy-Hornoch, P. Z., Williamson, J. J. 2012; 19 (6): 931-938

    Abstract

    The AMIA biomedical informatics (BMI) core competencies have been designed to support and guide graduate education in BMI, the core scientific discipline underlying the breadth of the field's research, practice, and education. The core definition of BMI adopted by AMIA specifies that BMI is 'the interdisciplinary field that studies and pursues the effective uses of biomedical data, information, and knowledge for scientific inquiry, problem solving and decision making, motivated by efforts to improve human health.' Application areas range from bioinformatics to clinical and public health informatics and span the spectrum from the molecular to population levels of health and biomedicine. The shared core informatics competencies of BMI draw on the practical experience of many specific informatics sub-disciplines. The AMIA BMI analysis highlights the central shared set of competencies that should guide curriculum design and that graduate students should be expected to master.

    View details for DOI 10.1136/amiajnl-2012-001053

    View details for Web of Science ID 000310408500002

    View details for PubMedID 22683918

    View details for PubMedCentralID PMC3534470

  • Unified Medical Language System term occurrences in clinical notes: a large-scale corpus analysis JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION Wu, S. T., Liu, H., Li, D., Tao, C., Musen, M. A., Chute, C. G., Shah, N. H. 2012; 19 (E1): E149-E156

    Abstract

    To characterise empirical instances of Unified Medical Language System (UMLS) Metathesaurus term strings in a large clinical corpus, and to illustrate what types of term characteristics are generalisable across data sources. Based on the occurrences of UMLS terms in a 51 million document corpus of Mayo Clinic clinical notes, this study computes statistics about the terms' string attributes, source terminologies, semantic types and syntactic categories. Term occurrences in 2010 i2b2/VA text were also mapped; eight example filters were designed from the Mayo-based statistics and applied to i2b2/VA data. For the corpus analysis, negligible numbers of mapped terms in the Mayo corpus had over six words or 55 characters. Of source terminologies in the UMLS, the Consumer Health Vocabulary and Systematized Nomenclature of Medicine-Clinical Terms (SNOMED-CT) had the best coverage in Mayo clinical notes at 106426 and 94788 unique terms, respectively. Of 15 semantic groups in the UMLS, seven groups accounted for 92.08% of term occurrences in Mayo data. Syntactically, over 90% of matched terms were in noun phrases. For the cross-institutional analysis, using five example filters on i2b2/VA data reduces the actual lexicon to 19.13% of the size of the UMLS and only sees a 2% reduction in matched terms. The corpus statistics presented here are instructive for building lexicons from the UMLS. Features intrinsic to Metathesaurus terms (well formedness, length and language) generalise easily across clinical institutions, but term frequencies should be adapted with caution. The semantic groups of mapped terms may differ slightly from institution to institution, but they differ greatly when moving to the biomedical literature domain.

    View details for DOI 10.1136/amiajnl-2011-000744

    View details for Web of Science ID 000314151400025

    View details for PubMedID 22493050

    View details for PubMedCentralID PMC3392861

  • Applications of ontology design patterns in biomedical ontologies. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Mortensen, J. M., Horridge, M., Musen, M. A., Noy, N. F. 2012; 2012: 643-652

    Abstract

    Ontology design patterns (ODPs) are a proposed solution to facilitate ontology development, and to help users avoid some of the most frequent modeling mistakes. ODPs originate from similar approaches in software engineering, where software design patterns have become a critical aspect of software development. There is little empirical evidence for ODP prevalence or effectiveness thus far. In this work, we determine the use and applicability of ODPs in a case study of biomedical ontologies. We encoded ontology design patterns from two ODP catalogs. We then searched for these patterns in a set of eight ontologies. We found five patterns of the 69 patterns. Two of the eight ontologies contained these patterns. While ontology design patterns provide a vehicle for capturing formally reoccurring models and best practices in ontology design, we show that today their use in a case study of widely used biomedical ontologies is limited.

    View details for PubMedID 23304337

  • Deriving an abstraction network to support quality assurance in OCRe. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Ochs, C., Agrawal, A., Perl, Y., Halper, M., Tu, S. W., Carini, S., Sim, I., Noy, N., Musen, M., Geller, J. 2012; 2012: 681-689

    Abstract

    An abstraction network is an auxiliary network of nodes and links that provides a compact, high-level view of an ontology. Such a view lends support to ontology orientation, comprehension, and quality-assurance efforts. A methodology is presented for deriving a kind of abstraction network, called a partial-area taxonomy, for the Ontology of Clinical Research (OCRe). OCRe was selected as a representative of ontologies implemented using the Web Ontology Language (OWL) based on shared domains. The derivation of the partial-area taxonomy for the Entity hierarchy of OCRe is described. Utilizing the visualization of the content and structure of the hierarchy provided by the taxonomy, the Entity hierarchy is audited, and several errors and inconsistencies in OCRe's modeling of its domain are exposed. After appropriate corrections are made to OCRe, a new partial-area taxonomy is derived. The generalizability of the paradigm of the derivation methodology to various families of biomedical ontologies is discussed.

    View details for PubMedID 23304341

  • Enabling enrichment analysis with the Human Disease Ontology. Journal of biomedical informatics LePendu, P., Musen, M. A., Shah, N. H. 2011; 44: S31-8

    Abstract

    Advanced statistical methods used to analyze high-throughput data such as gene-expression assays result in long lists of "significant genes." One way to gain insight into the significance of altered expression levels is to determine whether Gene Ontology (GO) terms associated with a particular biological process, molecular function, or cellular component are over- or under-represented in the set of genes deemed significant. This process, referred to as enrichment analysis, profiles a gene set, and is widely used to make sense of the results of high-throughput experiments. Our goal is to develop and apply general enrichment analysis methods to profile other sets of interest, such as patient cohorts from the electronic medical record, using a variety of ontologies including SNOMED CT, MedDRA, RxNorm, and others. Although it is possible to perform enrichment analysis using ontologies other than the GO, a key pre-requisite is the availability of a background set of annotations to enable the enrichment calculation. In the case of the GO, this background set is provided by the Gene Ontology Annotations. In the current work, we describe: (i) a general method that uses hand-curated GO annotations as a starting point for creating background datasets for enrichment analysis using other ontologies; and (ii) a gene-disease background annotation set - that enables disease-based enrichment - to demonstrate feasibility of our method.

    View details for DOI 10.1016/j.jbi.2011.04.007

    View details for PubMedID 21550421

    View details for PubMedCentralID PMC3392036

  • Empowering industrial research with shared biomedical vocabularies DRUG DISCOVERY TODAY Harland, L., Larminie, C., Sansone, S., Popa, S., Marshall, M. S., Braxenthaler, M., Cantor, M., Filsell, W., Forster, M. J., Huang, E., Matern, A., Musen, M., Saric, J., Slater, T., Wilson, J., Lynch, N., Wise, J., Dix, I. 2011; 16 (21-22): 940-947

    Abstract

    The life science industries (including pharmaceuticals, agrochemicals and consumer goods) are exploring new business models for research and development that focus on external partnerships. In parallel, there is a desire to make better use of data obtained from sources such as human clinical samples to inform and support early research programmes. Success in both areas depends upon the successful integration of heterogeneous data from multiple providers and scientific domains, something that is already a major challenge within the industry. This issue is exacerbated by the absence of agreed standards that unambiguously identify the entities, processes and observations within experimental results. In this article we highlight the risks to future productivity that are associated with incomplete biological and chemical vocabularies and suggest a new model to address this long-standing issue.

    View details for DOI 10.1016/j.drudis.2011.09.013

    View details for Web of Science ID 000297400300005

    View details for PubMedID 21963522

  • NCBO Resource Index: Ontology-based search and mining of biomedical resources JOURNAL OF WEB SEMANTICS Jonquet, C., LePendu, P., Falconer, S., Coulet, A., Noy, N. F., Musen, M. A., Shah, N. H. 2011; 9 (3): 316-324

    Abstract

    The volume of publicly available data in biomedicine is constantly increasing. However, these data are stored in different formats and on different platforms. Integrating these data will enable us to facilitate the pace of medical discoveries by providing scientists with a unified view of this diverse information. Under the auspices of the National Center for Biomedical Ontology (NCBO), we have developed the Resource Index, a growing, large-scale ontology-based index of more than twenty heterogeneous biomedical resources. The resources come from a variety of repositories maintained by organizations from around the world. We use a set of over 200 publicly available ontologies contributed by researchers in various domains to annotate the elements in these resources. We use the semantics that the ontologies encode, such as different properties of classes, the class hierarchies, and the mappings between ontologies, in order to improve the search experience for the Resource Index user. Our user interface enables scientists to search the multiple resources quickly and efficiently using domain terms, without even being aware that there is semantics "under the hood."

    View details for DOI 10.1016/j.websem.2011.06.005

    View details for Web of Science ID 000300169800007

    View details for PubMedCentralID PMC3170774

  • NCBO Resource Index: Ontology-Based Search and Mining of Biomedical Resources. Web semantics (Online) Jonquet, C., Lependu, P., Falconer, S., Coulet, A., Noy, N. F., Musen, M. A., Shah, N. H. 2011; 9 (3): 316-324

    Abstract

    The volume of publicly available data in biomedicine is constantly increasing. However, these data are stored in different formats and on different platforms. Integrating these data will enable us to facilitate the pace of medical discoveries by providing scientists with a unified view of this diverse information. Under the auspices of the National Center for Biomedical Ontology (NCBO), we have developed the Resource Index, a growing, large-scale ontology-based index of more than twenty heterogeneous biomedical resources. The resources come from a variety of repositories maintained by organizations from around the world. We use a set of over 200 publicly available ontologies contributed by researchers in various domains to annotate the elements in these resources. We use the semantics that the ontologies encode, such as different properties of classes, the class hierarchies, and the mappings between ontologies, in order to improve the search experience for the Resource Index user. Our user interface enables scientists to search the multiple resources quickly and efficiently using domain terms, without even being aware that there is semantics "under the hood."

    View details for DOI 10.1016/j.websem.2011.06.005

    View details for PubMedID 21918645

    View details for PubMedCentralID PMC3170774

  • BioPortal: enhanced functionality via new Web services from the National Center for Biomedical Ontology to access and use ontologies in software applications NUCLEIC ACIDS RESEARCH Whetzel, P. L., Noy, N. F., Shah, N. H., Alexander, P. R., Nyulas, C., Tudorache, T., Musen, M. A. 2011; 39: W541-W545

    Abstract

    The National Center for Biomedical Ontology (NCBO) is one of the National Centers for Biomedical Computing funded under the NIH Roadmap Initiative. Contributing to the national computing infrastructure, NCBO has developed BioPortal, a web portal that provides access to a library of biomedical ontologies and terminologies (http://bioportal.bioontology.org) via the NCBO Web services. BioPortal enables community participation in the evaluation and evolution of ontology content by providing features to add mappings between terms, to add comments linked to specific ontology terms and to provide ontology reviews. The NCBO Web services (http://www.bioontology.org/wiki/index.php/NCBO_REST_services) enable this functionality and provide a uniform mechanism to access ontologies from a variety of knowledge representation formats, such as Web Ontology Language (OWL) and Open Biological and Biomedical Ontologies (OBO) format. The Web services provide multi-layered access to the ontology content, from getting all terms in an ontology to retrieving metadata about a term. Users can easily incorporate the NCBO Web services into software applications to generate semantically aware applications and to facilitate structured data collection.

    View details for DOI 10.1093/nar/gkr469

    View details for Web of Science ID 000292325300088

    View details for PubMedID 21672956

    View details for PubMedCentralID PMC3125807

  • The Biomedical Resource Ontology (BRO) to enable resource discovery in clinical and translational research JOURNAL OF BIOMEDICAL INFORMATICS Tenenbaum, J. D., Whetzel, P. L., Anderson, K., Borromeo, C. D., Dinov, I. D., Gabriel, D., Kirschner, B., Mirel, B., Morris, T., Noy, N., Nyulas, C., Rubenson, D., Saxman, P. R., Singh, H., Whelan, N., Wright, Z., Athey, B. D., Becich, M. J., Ginsburg, G. S., Musen, M. A., Smith, K. A., Tarantal, A. F., Rubin, D. L., Lyster, P. 2011; 44 (1): 137-145

    Abstract

    The biomedical research community relies on a diverse set of resources, both within their own institutions and at other research centers. In addition, an increasing number of shared electronic resources have been developed. Without effective means to locate and query these resources, it is challenging, if not impossible, for investigators to be aware of the myriad resources available, or to effectively perform resource discovery when the need arises. In this paper, we describe the development and use of the Biomedical Resource Ontology (BRO) to enable semantic annotation and discovery of biomedical resources. We also describe the Resource Discovery System (RDS) which is a federated, inter-institutional pilot project that uses the BRO to facilitate resource discovery on the Internet. Through the RDS framework and its associated Biositemaps infrastructure, the BRO facilitates semantic search and discovery of biomedical resources, breaking down barriers and streamlining scientific research that will improve human health.

    View details for DOI 10.1016/j.jbi.2010.10.003

    View details for Web of Science ID 000288289900015

    View details for PubMedID 20955817

    View details for PubMedCentralID PMC3050430

  • How orthogonal are the OBO Foundry ontologies? Journal of biomedical semantics Ghazvinian, A., Noy, N. F., Musen, M. A. 2011; 2: S2-?

    Abstract

    Ontologies in biomedicine facilitate information integration, data exchange, search and query of biomedical data, and other critical knowledge-intensive tasks. The OBO Foundry is a collaborative effort to establish a set of principles for ontology development with the eventual goal of creating a set of interoperable reference ontologies in the domain of biomedicine. One of the key requirements to achieve this goal is to ensure that ontology developers reuse term definitions that others have already created rather than create their own definitions, thereby making the ontologies orthogonal. We used a simple lexical algorithm to analyze the extent to which the set of OBO Foundry candidate ontologies identified from September 2009 to September 2010 conforms to this vision. Specifically, we analyzed (1) the level of explicit term reuse in this set of ontologies, (2) the level of overlap, where two ontologies define similar terms independently, and (3) how the levels of reuse and overlap changed during the course of this year. We found that 30% of the ontologies reuse terms from other Foundry candidates and 96% of the candidate ontologies contain terms that overlap with terms from the other ontologies. We found that while term reuse increased among the ontologies between September 2009 and September 2010, the level of overlap among the ontologies remained relatively constant. Additionally, we analyzed the six ontologies announced as OBO Foundry members on March 5, 2010, and identified that the level of overlap was extremely low, but, notably, so was the level of term reuse. We have created a prototype web application that allows OBO Foundry ontology developers to see which classes from their ontologies overlap with classes from other ontologies in the OBO Foundry (http://obomap.bioontology.org). From our analysis, we conclude that while the OBO Foundry has made significant progress toward orthogonality during the period of this study through increased adoption of explicit term reuse, a large amount of overlap remains among these ontologies. Furthermore, the characteristics of the identified overlap, such as the terms it comprises and its distribution among the ontologies, indicate that achieving orthogonality will be exceptionally difficult, if not impossible.

    View details for DOI 10.1186/2041-1480-2-S2-S2

    View details for PubMedID 21624157

    View details for PubMedCentralID PMC3102891

  • ARGOS Policy Brief on Semantic Interoperability TRANSATLANTIC COOPERATION SURROUNDING HEALTH RELATED INFORMATION AND COMMUNICATION TECHNOLOGY Kalra, D., Musen, M., Smith, B., Ceusters, W., De Moor, G., DeMoor, G. J. 2011; 170: 1–15

    Abstract

    Semantic interoperability is one of the priority themes of the ARGOS Trans-Atlantic Observatory. This topic represents a globally recognised challenge that must be addressed if electronic health records are to be shared among heterogeneous systems, and the information in them exploited to the maximum benefit of patients, professionals, health services, research, and industry. Progress in this multi-faceted challenge has been piecemeal, and valuable lessons have been learned, and approaches discovered, in Europe and in the US that can be shared and combined. Experts from both continents have met at three ARGOS workshops during 2010 and 2011 to share understanding of these issues and how they might be tackled collectively from both sides of the Atlantic. This policy brief summarises the problems and the reasons why they are important to tackle, and also why they are so difficult. It outlines the major areas of semantic innovation that exist and that are available to help address this challenge. It proposes a series of next steps that need to be championed on both sides of the Atlantic if further progress is to be made in sharing and analysing electronic health records meaningfully. Semantic interoperability requires the use of standards, not only for EHR data to be transferred and structurally mapped into a receiving repository, but also for the clinical content of the EHR to be interpreted in conformity with the original meanings intended by its authors. Wide-scale engagement with professional bodies, globally, is needed to develop these clinical information standards. Accurate and complete clinical documentation, faithful to the patient's situation, and interoperability between systems, require widespread and dependable access to published and maintained collections of coherent and quality-assured semantic resources, including models such as archetypes and templates that would (1) provide clinical context, (2) be mapped to interoperability standards for EHR data, (3) be linked to well specified multi-lingual terminology value sets, and (4) be derived from high quality ontologies. There is need to gain greater experience in how semantic resources should be defined, validated, and disseminated, how users (who increasingly will include patients) should be educated to improve the quality and consistency of EHR documentation and to make full use of it. There are urgent needs to scale up the authorship, acceptance, and adoption of clinical information standards, to leverage and harmonise the islands of standardisation optimally, to assure the quality of the artefacts produced, and to organise end-to-end governance of the development and adoption of solutions.

    View details for DOI 10.3233/978-1-60750-810-6-1

    View details for Web of Science ID 000328281200003

    View details for PubMedID 21893897

    View details for PubMedCentralID PMC4896070

  • Integration and publication of heterogeneous text-mined relationships on the Semantic Web. Journal of biomedical semantics Coulet, A., Garten, Y., Dumontier, M., Altman, R. B., Musen, M. A., Shah, N. H. 2011; 2: S10-?

    Abstract

    Advances in Natural Language Processing (NLP) techniques enable the extraction of fine-grained relationships mentioned in biomedical text. The variability and the complexity of natural language in expressing similar relationships causes the extracted relationships to be highly heterogeneous, which makes the construction of knowledge bases difficult and poses a challenge in using these for data mining or question answering. We report on the semi-automatic construction of the PHARE relationship ontology (the PHArmacogenomic RElationships Ontology) consisting of 200 curated relations from over 40,000 heterogeneous relationships extracted via text-mining. These heterogeneous relations are then mapped to the PHARE ontology using synonyms, entity descriptions and hierarchies of entities and roles. Once mapped, relationships can be normalized and compared using the structure of the ontology to identify relationships that have similar semantics but different syntax. We compare and contrast the manual procedure with a fully automated approach using WordNet to quantify the degree of integration enabled by iterative curation and refinement of the PHARE ontology. The result of such integration is a repository of normalized biomedical relationships, named PHARE-KB, which can be queried using Semantic Web technologies such as SPARQL and can be visualized in the form of a biological network. The PHARE ontology serves as a common semantic framework to integrate more than 40,000 relationships pertinent to pharmacogenomics. The PHARE ontology forms the foundation of a knowledge base named PHARE-KB. Once populated with relationships, PHARE-KB (i) can be visualized in the form of a biological network to guide human tasks such as database curation and (ii) can be queried programmatically to guide bioinformatics applications such as the prediction of molecular interactions. PHARE is available at http://purl.bioontology.org/ontology/PHARE.

    View details for DOI 10.1186/2041-1480-2-S2-S10

    View details for PubMedID 21624156

    View details for PubMedCentralID PMC3102890

  • Using text to build semantic networks for pharmacogenomics JOURNAL OF BIOMEDICAL INFORMATICS Coulet, A., Shah, N. H., Garten, Y., Musen, M., Altman, R. B. 2010; 43 (6): 1009-1019

    Abstract

    Most pharmacogenomics knowledge is contained in the text of published studies, and is thus not available for automated computation. Natural Language Processing (NLP) techniques for extracting relationships in specific domains often rely on hand-built rules and domain-specific ontologies to achieve good performance. In a new and evolving field such as pharmacogenomics (PGx), rules and ontologies may not be available. Recent progress in syntactic NLP parsing in the context of a large corpus of pharmacogenomics text provides new opportunities for automated relationship extraction. We describe an ontology of PGx relationships built starting from a lexicon of key pharmacogenomic entities and a syntactic parse of more than 87 million sentences from 17 million MEDLINE abstracts. We used the syntactic structure of PGx statements to systematically extract commonly occurring relationships and to map them to a common schema. Our extracted relationships have a 70-87.7% precision and involve not only key PGx entities such as genes, drugs, and phenotypes (e.g., VKORC1, warfarin, clotting disorder), but also critical entities that are frequently modified by these key entities (e.g., VKORC1 polymorphism, warfarin response, clotting disorder treatment). The result of our analysis is a network of 40,000 relationships between more than 200 entity types with clear semantics. This network is used to guide the curation of PGx knowledge and provide a computable resource for knowledge discovery.

    View details for DOI 10.1016/j.jbi.2010.08.005

    View details for Web of Science ID 000285036700017

    View details for PubMedID 20723615

    View details for PubMedCentralID PMC2991587

  • Building a biomedical ontology recommender web service. Journal of biomedical semantics Jonquet, C., Musen, M. A., Shah, N. H. 2010; 1: S1-?

    Abstract

    Researchers in biomedical informatics use ontologies and terminologies to annotate their data in order to facilitate data integration and translational discoveries. As the use of ontologies for annotation of biomedical datasets has risen, a common challenge is to identify ontologies that are best suited to annotating specific datasets. The number and variety of biomedical ontologies is large, and it is cumbersome for a researcher to figure out which ontology to use. We present the Biomedical Ontology Recommender web service. The system uses textual metadata or a set of keywords describing a domain of interest and suggests appropriate ontologies for annotating or representing the data. The service makes a decision based on three criteria. The first one is coverage, or the ontologies that provide most terms covering the input text. The second is connectivity, or the ontologies that are most often mapped to by other ontologies. The final criterion is size, or the number of concepts in the ontologies. The service scores the ontologies as a function of scores of the annotations created using the National Center for Biomedical Ontology (NCBO) Annotator web service. We used all the ontologies from the UMLS Metathesaurus and the NCBO BioPortal. We compare and contrast our Recommender by an exhaustive functional comparison to previously published efforts. We evaluate and discuss the results of several recommendation heuristics in the context of three real world use cases. The best recommendation heuristics, rated 'very relevant' by expert evaluators, are the ones based on coverage and connectivity criteria. The Recommender service (alpha version) is available to the community and is embedded into BioPortal.

    View details for DOI 10.1186/2041-1480-1-S1-S1

    View details for PubMedID 20626921

    View details for PubMedCentralID PMC2903720

  • Mapping Master: A Flexible Approach for Mapping Spreadsheets to OWL 9th International Semantic Web Conference O'Connor, M. J., Halaschek-Wiener, C., Musen, M. A. SPRINGER-VERLAG BERLIN. 2010: 194–208
  • Ontology Development for the Masses: Creating ICD-11 in WebProtege 17th International Conference on Knowledge Engineering and Management by the Masses (EKAW) Tudorache, T., Falconer, S., Noy, N. F., Nyulas, C., Uestuen, T. B., Storey, M., Musen, M. A. SPRINGER-VERLAG BERLIN. 2010: 74–89
  • Optimize First, Buy Later: Analyzing Metrics to Ramp-Up Very Large Knowledge Bases 9th International Semantic Web Conference LePendu, P., Noy, N. F., Jonquet, C., Alexander, P. R., Shah, N. H., Musen, M. A. SPRINGER-VERLAG BERLIN. 2010: 486–501
  • A Typology for Modeling Processes in Clinical Guidelines and Protocols 1st International Conference on Security-Enriched Urban Computing and Smart Grid Tu, S. W., Musen, M. A. SPRINGER-VERLAG BERLIN. 2010: 545–553
  • Will Semantic Web Technologies Work for the Development of ICD-11? 9th International Semantic Web Conference Tudorache, T., Falconer, S., Nyulas, C., Noy, N. F., Musen, M. A. SPRINGER-VERLAG BERLIN. 2010: 257–272
  • The Lexicon Builder Web service: Building Custom Lexicons from two hundred Biomedical Ontologies. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Parai, G. K., Jonquet, C., Xu, R., Musen, M. A., Shah, N. H. 2010; 2010: 587-591

    Abstract

    Domain specific biomedical lexicons are extensively used by researchers for natural language processing tasks. Currently these lexicons are created manually by expert curators and there is a pressing need for automated methods to compile such lexicons. The Lexicon Builder Web service addresses this need and reduces the investment of time and effort involved in lexicon maintenance. The service has three components: Inclusion - selects one or several ontologies (or their branches) and includes preferred names and synonym terms; Exclusion - filters terms based on the term's Medline frequency, syntactic type, UMLS semantic type and match with stopwords; Output - aggregates information, handles compression and output formats. Evaluation demonstrates that the service has high accuracy and runtime performance. It is currently being evaluated for several use cases to establish its utility in biomedical information processing tasks. The Lexicon Builder promotes collaboration, sharing and standardization of lexicons amongst researchers by automating the creation, maintenance and cross referencing of custom lexicons.

    View details for PubMedID 21347046

  • A Comprehensive Analysis of Five Million UMLS Metathesaurus Terms Using Eighteen Million MEDLINE Citations. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Xu, R., Musen, M. A., Shah, N. H. 2010; 2010: 907-911

    Abstract

    The Unified Medical Language System (UMLS) Metathesaurus is widely used for biomedical natural language processing (NLP) tasks. In this study, we systematically analyzed UMLS Metathesaurus terms by analyzing their occurrences in over 18 million MEDLINE abstracts. Our goals were: 1. analyze the frequency and syntactic distribution of Metathesaurus terms in MEDLINE; 2. create a filtered UMLS Metathesaurus based on the MEDLINE analysis; 3. augment the UMLS Metathesaurus where each term is associated with metadata on its MEDLINE frequency and syntactic distribution statistics. After MEDLINE frequency-based filtering, the augmented UMLS Metathesaurus contains 518,835 terms and is roughly 13% of its original size. We have shown that the syntactic and frequency information is useful to identify errors in the Metathesaurus. This filtered and augmented UMLS Metathesaurus can potentially be used to improve efficiency and precision of UMLS-based information retrieval and NLP tasks.

    View details for PubMedID 21347110

  • The ontology life cycle: Integrated tools for editing, publishing, peer review, and evolution of ontologies. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Noy, N., Tudorache, T., Nyulas, C., Musen, M. 2010; 2010: 552-556

    Abstract

    Ontologies have become a critical component of many applications in biomedical informatics. However, the landscape of the ontology tools today is largely fragmented, with independent tools for ontology editing, publishing, and peer review: users develop an ontology in an ontology editor, such as Protégé; and publish it on a Web server or in an ontology library, such as BioPortal, in order to share it with the community; they use the tools provided by the library or mailing lists and bug trackers to collect feedback from users. In this paper, we present a set of tools that bring the ontology editing and publishing closer together, in an integrated platform for the entire ontology lifecycle. This integration streamlines the workflow for collaborative development and increases integration between the ontologies themselves through the reuse of terms.

    View details for PubMedID 21347039

  • Supporting the Collaborative Authoring of ICD-11 with WebProtégé. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Tudorache, T., Falconer, S., Nyulas, C., Storey, M., Ustün, T. B., Musen, M. A. 2010; 2010: 802-806

    Abstract

    The World Health Organization (WHO) is well under way with the new revision of the International Classification of Diseases (ICD-11). The current revision process is significantly different from past ones: the ICD-11 authoring is now open to a large international community of medical experts, who perform the authoring in a web-based collaborative platform. The classification is also embracing a more formal representation that is suitable for electronic health records. We present the ICD Collaborative Authoring Tool (iCAT), a customization of the WebProtégé editor that supports the community based authoring of ICD-11 on the Web and provides features such as discussion threads integrated in the authoring process, change tracking, content reviewing, and so on. The WHO editors evaluated the initial version of iCAT and found the tool intuitive and easy to learn. They also identified improvement potentials and new requirements for large-scale collaboration support. A demo version of the tool is available at: http://icatdemo.stanford.edu.

    View details for PubMedID 21347089

  • An ontology-neutral framework for enrichment analysis. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Tirrell, R., Evani, U., Berman, A. E., Mooney, S. D., Musen, M. A., Shah, N. H. 2010; 2010: 797-801

    Abstract

    Advanced statistical methods used to analyze high-throughput data (e.g. gene-expression assays) result in long lists of "significant genes." One way to gain insight into the significance of altered expression levels is to determine whether Gene Ontology (GO) terms associated with a particular biological process, molecular function, or cellular component are over- or under-represented in the set of genes deemed significant. This process, referred to as enrichment analysis, profiles a gene-set, and is relevant for and extensible to data analysis with other high-throughput measurement modalities such as proteomics, metabolomics, and tissue-microarray assays. With the availability of tools for automatic ontology-based annotation of datasets with terms from biomedical ontologies besides the GO, we need not restrict enrichment analysis to the GO. We describe RANSUM (Rich Annotation Summarizer), which performs enrichment analysis using any ontology in the National Center for Biomedical Ontology's (NCBO) BioPortal. We outline the methodology of enrichment analysis, the associated challenges, and discuss novel analyses enabled by RANSUM.

    View details for PubMedID 21347088
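
    As a rough illustration of the over-representation question described in the abstract above, the sketch below computes a one-sided hypergeometric p-value for a single ontology term. The abstract does not state which statistic RANSUM uses; the hypergeometric test is a common choice for enrichment analysis and is assumed here. The function and parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(population_size, annotated_in_population,
                       sample_size, annotated_in_sample):
    """Probability of seeing at least `annotated_in_sample` annotated items
    when drawing `sample_size` items without replacement from a population
    in which `annotated_in_population` items carry the term."""
    upper = min(annotated_in_population, sample_size)
    not_annotated = population_size - annotated_in_population
    # Sum the hypergeometric probability mass over the upper tail.
    tail = sum(
        comb(annotated_in_population, k) * comb(not_annotated, sample_size - k)
        for k in range(annotated_in_sample, upper + 1)
    )
    return tail / comb(population_size, sample_size)
```

    A small p-value suggests the term is over-represented in the sample. In practice one such test is run per ontology term, so a multiple-testing correction would normally follow.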

  • Software-engineering challenges of building and deploying reusable problem solvers AI EDAM-ARTIFICIAL INTELLIGENCE FOR ENGINEERING DESIGN ANALYSIS AND MANUFACTURING O'Connor, M. J., Nyulas, C., Tu, S., Buckeridge, D. L., Okhmatovskaia, A., Musen, M. A. 2009; 23 (4): 339-356

    Abstract

    Problem solving methods (PSMs) are software components that represent and encode reusable algorithms. They can be combined with representations of domain knowledge to produce intelligent application systems. A goal of research on PSMs is to provide principled methods and tools for composing and reusing algorithms in knowledge-based systems. The ultimate objective is to produce libraries of methods that can be easily adapted for use in these systems. Despite the intuitive appeal of PSMs as conceptual building blocks, in practice, these goals are largely unmet. There are no widely available tools for building applications using PSMs and no public libraries of PSMs available for reuse. This paper analyzes some of the reasons for the lack of widespread adoption of PSM techniques and illustrates our analysis by describing our experiences developing a complex, high-throughput software system based on PSM principles. We conclude that many fundamental principles in PSM research are useful for building knowledge-based systems. In particular, the task-method decomposition process, which provides a means for structuring knowledge-based tasks, is a powerful abstraction for building systems of analytic methods. However, despite the power of PSMs in the conceptual modeling of knowledge-based systems, software engineering challenges have been seriously underestimated. The complexity of integrating control knowledge modeled by developers using PSMs with the domain knowledge that they model using ontologies creates a barrier to widespread use of PSM-based systems. Nevertheless, the surge of recent interest in ontologies has led to the production of comprehensive domain ontologies and of robust ontology-authoring tools. These developments present new opportunities to leverage the PSM approach.

    View details for DOI 10.1017/S0890060409990047

    View details for Web of Science ID 000271131600003

    View details for PubMedID 23565031

    View details for PubMedCentralID PMC3615443

  • Development of Large-Scale Functional Brain Networks in Children PLOS BIOLOGY Supekar, K., Musen, M., Menon, V. 2009; 7 (7)

    Abstract

    The ontogeny of large-scale functional organization of the human brain is not well understood. Here we use network analysis of intrinsic functional connectivity to characterize the organization of brain networks in 23 children (ages 7-9 y) and 22 young-adults (ages 19-22 y). Comparison of network properties, including path-length, clustering-coefficient, hierarchy, and regional connectivity, revealed that although children and young-adults' brains have similar "small-world" organization at the global level, they differ significantly in hierarchical organization and interregional connectivity. We found that subcortical areas were more strongly connected with primary sensory, association, and paralimbic areas in children, whereas young-adults showed stronger cortico-cortical connectivity between paralimbic, limbic, and association areas. Further, combined analysis of functional connectivity with wiring distance measures derived from white-matter fiber tracking revealed that the development of large-scale brain networks is characterized by weakening of short-range functional connectivity and strengthening of long-range functional connectivity. Importantly, our findings show that the dynamic process of over-connectivity followed by pruning, which rewires connectivity at the neuronal level, also operates at the systems level, helping to reconfigure and rebalance subcortical and paralimbic connectivity in the developing brain. Our study demonstrates the usefulness of network analysis of brain connectivity to elucidate key principles underlying functional brain maturation, paving the way for novel studies of disrupted brain connectivity in neurodevelopmental disorders such as autism.

    View details for DOI 10.1371/journal.pbio.1000157

    View details for Web of Science ID 000268405700010

    View details for PubMedID 19621066

    View details for PubMedCentralID PMC2705656

  • BioPortal: ontologies and integrated data resources at the click of a mouse NUCLEIC ACIDS RESEARCH Noy, N. F., Shah, N. H., Whetzel, P. L., Dai, B., Dorf, M., Griffith, N., Jonquet, C., Rubin, D. L., Storey, M., Chute, C. G., Musen, M. A. 2009; 37: W170-W173

    Abstract

    Biomedical ontologies provide essential domain knowledge to drive data integration, information retrieval, data annotation, natural-language processing and decision support. BioPortal (http://bioportal.bioontology.org) is an open repository of biomedical ontologies that provides access via Web services and Web browsers to ontologies developed in OWL, RDF, OBO format and Protégé frames. BioPortal functionality includes the ability to browse, search and visualize ontologies. The Web interface also facilitates community-based participation in the evaluation and evolution of ontology content by providing features to add notes to ontology terms, mappings between terms and ontology reviews based on criteria such as usability, domain coverage, quality of content, and documentation and support. BioPortal also enables integrated search of biomedical data resources such as the Gene Expression Omnibus (GEO), ClinicalTrials.gov, and ArrayExpress, through the annotation and indexing of these resources with ontologies in BioPortal. Thus, BioPortal not only provides investigators, clinicians, and developers 'one-stop shopping' to programmatically access biomedical ontologies, but also provides support to integrate data from a variety of biomedical resources.

    View details for DOI 10.1093/nar/gkp440

    View details for Web of Science ID 000267889100031

    View details for PubMedID 19483092

    View details for PubMedCentralID PMC2703982

  • Computational neuroanatomy: ontology-based representation of neural components and connectivity 1st Summit on Translational Bioinformatics Rubin, D. L., Talos, I., Halle, M., Musen, M. A., Kikinis, R. BIOMED CENTRAL LTD. 2009

    Abstract

    A critical challenge in neuroscience is organizing, managing, and accessing the explosion in neuroscientific knowledge, particularly anatomic knowledge. We believe that explicit knowledge-based approaches to make neuroscientific knowledge computationally accessible will be helpful in tackling this challenge and will enable a variety of applications exploiting this knowledge, such as surgical planning. We developed ontology-based models of neuroanatomy to enable symbolic lookup, logical inference and mathematical modeling of neural systems. We built a prototype model of the motor system that integrates descriptive anatomic and qualitative functional neuroanatomical knowledge. In addition to modeling normal neuroanatomy, our approach provides an explicit representation of abnormal neural connectivity in disease states, such as common movement disorders. The ontology-based representation encodes both structural and functional aspects of neuroanatomy. The ontology-based models can be evaluated computationally, enabling development of automated computer reasoning applications. Neuroanatomical knowledge can be represented in machine-accessible format using ontologies. Computational neuroanatomical approaches such as described in this work could become a key tool in translational informatics, leading to decision support applications that inform and guide surgical planning and personalized care for neurological disease in the future.

    View details for Web of Science ID 000265602500004

    View details for PubMedID 19208191

    View details for PubMedCentralID PMC2646240

  • Ontology-driven indexing of public datasets for translational bioinformatics 1st Summit on Translational Bioinformatics Shah, N. H., Jonquet, C., Chiang, A. P., Butte, A. J., Chen, R., Musen, M. A. BIOMED CENTRAL LTD. 2009

    Abstract

    The volume of publicly available genomic scale data is increasing. Genomic datasets in public repositories are annotated with free-text fields describing the pathological state of the studied sample. These annotations are not mapped to concepts in any ontology, making it difficult to integrate these datasets across repositories. We have previously developed methods to map text-annotations of tissue microarrays to concepts in the NCI thesaurus and SNOMED-CT. In this work we generalize our methods to map text annotations of gene expression datasets to concepts in the UMLS. We demonstrate the utility of our methods by processing annotations of datasets in the Gene Expression Omnibus. We demonstrate that we enable ontology-based querying and integration of tissue and gene expression microarray data. We enable identification of datasets on specific diseases across both repositories. Our approach provides the basis for ontology-driven data integration for translational research on gene and protein expression data. Based on this work we have built a prototype system for ontology based annotation and indexing of biomedical data. The system processes the text metadata of diverse resource elements such as gene expression data sets, descriptions of radiology images, clinical-trial reports, and PubMed article abstracts to annotate and index them with concepts from appropriate ontologies. The key functionality of this system is to enable users to locate biomedical data resources related to particular ontology concepts.

    View details for Web of Science ID 000265602500002

    View details for PubMedID 19208184

    View details for PubMedCentralID PMC2646250

  • Creating mappings for ontologies in biomedicine: simple methods work. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Ghazvinian, A., Noy, N. F., Musen, M. A. 2009; 2009: 198-202

    Abstract

    Creating mappings between concepts in different ontologies is a critical step in facilitating data integration. In recent years, researchers have developed many elaborate algorithms that use graph structure, background knowledge, machine learning and other techniques to generate mappings between ontologies. We compared the performance of these advanced algorithms on creating mappings for biomedical ontologies with the performance of a simple mapping algorithm that relies on lexical matching. Our evaluation has shown that (1) most of the advanced algorithms are either not publicly available or do not scale to the size of biomedical ontologies today, and (2) for many biomedical ontologies, simple lexical matching methods outperform most of the advanced algorithms in both precision and recall. Our results have practical implications for biomedical researchers who need to create alignments for their ontologies.

    View details for PubMedID 20351849
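
    The kind of simple lexical matching referred to in the abstract above can be sketched as follows. This is an illustrative approximation, not the algorithm evaluated in the paper: the normalization rule (lowercasing and collapsing punctuation to spaces) and the names `normalize` and `lexical_mappings` are assumptions.

```python
import re
from collections import defaultdict


def normalize(label):
    """Lowercase a label and collapse non-alphanumeric runs to one space."""
    return re.sub(r"[^a-z0-9]+", " ", label.lower()).strip()


def lexical_mappings(source, target):
    """Map concepts that share a normalized label.

    `source` and `target` are dicts from a concept id to a list of labels
    (preferred name plus synonyms). Returns sorted (source_id, target_id)
    pairs.
    """
    index = defaultdict(set)
    for target_id, labels in target.items():
        for label in labels:
            key = normalize(label)
            if key:
                index[key].add(target_id)

    pairs = set()
    for source_id, labels in source.items():
        for label in labels:
            for target_id in index.get(normalize(label), ()):
                pairs.add((source_id, target_id))
    return sorted(pairs)
```

    Indexing the target labels once keeps the lookup roughly linear in the total number of labels, which is what lets this style of matching scale to large ontologies.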

  • Traversing Ontologies to Extract Views MODULAR ONTOLOGIES: CONCEPTS, THEORIES AND TECHNIQUES FOR KNOWLEDGE MODULARIZATION Noy, N. F., Musen, M. A., Stuckenschmidt, H., Parent, C., Spaccapietra, S. 2009; 5445: 245–60
  • Semantic Wiki Search 6th European Semantic Web Conference Haase, P., Herzig, D., Musen, M., Tran, T. SPRINGER-VERLAG BERLIN. 2009: 445–460
  • What Four Million Mappings Can Tell You about Two Hundred Ontologies 8th International Semantic Web Conference Ghazvinian, A., Noy, N. F., Jonquet, C., Shah, N., Musen, M. A. SPRINGER-VERLAG BERLIN. 2009: 229–242
  • A Bayesian network model for analysis of detection performance in surveillance systems. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Izadi, M., Buckeridge, D., Okhmatovskaia, A., Tu, S. W., O'Connor, M. J., Nyulas, C., Musen, M. A. 2009; 2009: 276-280

    Abstract

    Worldwide developments concerning infectious diseases and bioterrorism are driving forces for improving aberrancy detection in public health surveillance. The performance of an aberrancy detection algorithm can be measured in terms of sensitivity, specificity and timeliness. However, these metrics are probabilistically dependent variables and there is always a trade-off between them. This situation raises the question of how to quantify this tradeoff. The answer to this question depends on the characteristics of the specific disease under surveillance, the characteristics of data used for surveillance, and the algorithmic properties of detection methods. In practice, the evidence describing the relative performance of different algorithms remains fragmented and mainly qualitative. In this paper, we consider the development and evaluation of a Bayesian network framework for analysis of performance measures of aberrancy detection algorithms. This framework enables principled comparison of algorithms and identification of suitable algorithms for use in specific public health surveillance settings.

    View details for PubMedID 20351864

  • Comparison of concept recognizers for building the Open Biomedical Annotator 2nd Summit on Translational Bioinformatics Shah, N. H., Bhatia, N., Jonquet, C., Rubin, D., Chiang, A. P., Musen, M. A. BIOMED CENTRAL LTD. 2009

    Abstract

    The National Center for Biomedical Ontology (NCBO) is developing a system for automated, ontology-based access to online biomedical resources (Shah NH, et al.: Ontology-driven indexing of public datasets for translational bioinformatics. BMC Bioinformatics 2009, 10(Suppl 2):S1). The system's indexing workflow processes the text metadata of diverse resources such as datasets from GEO and ArrayExpress to annotate and index them with concepts from appropriate ontologies. This indexing requires the use of a concept-recognition tool to identify ontology concepts in the resource's textual metadata. In this paper, we present a comparison of two concept recognizers - NLM's MetaMap and the University of Michigan's Mgrep. We utilize a number of data sources and dictionaries to evaluate the concept recognizers in terms of precision, recall, speed of execution, scalability and customizability. Our evaluations demonstrate that Mgrep has a clear edge over MetaMap for large-scale service oriented applications. Based on our analysis we also suggest areas of potential improvements for Mgrep. We have subsequently used Mgrep to build the Open Biomedical Annotator service. The Annotator service has access to a large dictionary of biomedical terms derived from the Unified Medical Language System (UMLS) and NCBO ontologies. The Annotator also leverages the hierarchical structure of the ontologies and their mappings to expand annotations. The Annotator service is available to the community as a REST Web service for creating ontology-based annotations of their data.

    View details for Web of Science ID 000270371700015

    View details for PubMedID 19761568

    View details for PubMedCentralID PMC2745685

  • The open biomedical annotator. Summit on translational bioinformatics Jonquet, C., Shah, N. H., Musen, M. A. 2009; 2009: 56-60

    Abstract

    The range of publicly available biomedical data is enormous and is expanding fast. This expansion means that researchers now face a hurdle to extracting the data they need from the large numbers of data that are available. Biomedical researchers have turned to ontologies and terminologies to structure and annotate their data with ontology concepts for better search and retrieval. However, this annotation process cannot be easily automated and often requires expert curators. Plus, there is a lack of easy-to-use systems that facilitate the use of ontologies for annotation. This paper presents the Open Biomedical Annotator (OBA), an ontology-based Web service that annotates public datasets with biomedical ontology concepts based on their textual metadata (www.bioontology.org). The biomedical community can use the annotator service to tag datasets automatically with ontology terms (from UMLS and NCBO BioPortal ontologies). Such annotations facilitate translational discoveries by integrating annotated data.

    View details for PubMedID 21347171

  • Understanding Detection Performance in Public Health Surveillance: Modeling Aberrancy-detection Algorithms JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION Buckeridge, D. L., Okhmatovskaia, A., Tu, S., O'Connor, M., Nyulas, C., Musen, M. A. 2008; 15 (6): 760-769

    Abstract

    Statistical aberrancy-detection algorithms play a central role in automated public health systems, analyzing large volumes of clinical and administrative data in real-time with the goal of detecting disease outbreaks rapidly and accurately. Not all algorithms perform equally well in terms of sensitivity, specificity, and timeliness in detecting disease outbreaks and the evidence describing the relative performance of different methods is fragmented and mainly qualitative. We developed and evaluated a unified model of aberrancy-detection algorithms and a software infrastructure that uses this model to conduct studies to evaluate detection performance. We used a task-analytic methodology to identify the common features and meaningful distinctions among different algorithms and to provide an extensible framework for gathering evidence about the relative performance of these algorithms using a number of evaluation metrics. We implemented our model as part of a modular software infrastructure (Biological Space-Time Outbreak Reasoning Module, or BioSTORM) that allows configuration, deployment, and evaluation of aberrancy-detection algorithms in a systematic manner. We assessed the ability of our model to encode the commonly used EARS algorithms and the ability of the BioSTORM software to reproduce an existing evaluation study of these algorithms. Using our unified model of aberrancy-detection algorithms, we successfully encoded the EARS algorithms, deployed these algorithms using BioSTORM, and were able to reproduce and extend previously published evaluation results. The validated model of aberrancy-detection algorithms and its software implementation will enable principled comparison of algorithms, synthesis of results from evaluation studies, and identification of surveillance algorithms for use in specific public health settings.

    View details for DOI 10.1197/jamia.M2799

    View details for Web of Science ID 000260905500008

    View details for PubMedID 18755992

    View details for PubMedCentralID PMC2585528

  • Reports of the AAAI 2008 Spring Symposia AI MAGAZINE Balduccini, M., Baral, C., Brodaric, B., Colton, S., Fox, P., Gutelius, D., Hinkelmann, K., Horswill, I., Huberman, B., Hudlicka, E., Lerman, K., Lisetti, C., McGuinness, D., Maher, M. L., Musen, M. A., Sahami, M., Sleeman, D., Thoenssen, B., Velasquez, J., Ventura, D. 2008; 29 (3): 107-115
  • Network analysis of intrinsic functional brain connectivity in Alzheimer's disease PLOS COMPUTATIONAL BIOLOGY Supekar, K., Menon, V., Rubin, D., Musen, M., Greicius, M. D. 2008; 4 (6)

    Abstract

    Functional brain networks detected in task-free ("resting-state") functional magnetic resonance imaging (fMRI) have a small-world architecture that reflects a robust functional organization of the brain. Here, we examined whether this functional organization is disrupted in Alzheimer's disease (AD). Task-free fMRI data from 21 AD subjects and 18 age-matched controls were obtained. Wavelet analysis was applied to the fMRI data to compute frequency-dependent correlation matrices. Correlation matrices were thresholded to create 90-node undirected-graphs of functional brain networks. Small-world metrics (characteristic path length and clustering coefficient) were computed using graph analytical methods. In the low frequency interval 0.01 to 0.05 Hz, functional brain networks in controls showed small-world organization of brain activity, characterized by a high clustering coefficient and a low characteristic path length. In contrast, functional brain networks in AD showed loss of small-world properties, characterized by a significantly lower clustering coefficient (p<0.01), indicative of disrupted local connectivity. Clustering coefficients for the left and right hippocampus were significantly lower (p<0.01) in the AD group compared to the control group. Furthermore, the clustering coefficient distinguished AD participants from the controls with a sensitivity of 72% and specificity of 78%. Our study provides new evidence that there is disrupted organization of functional brain networks in AD. Small-world metrics can characterize the functional organization of the brain in AD, and our findings further suggest that these network measures may be useful as an imaging-based biomarker to distinguish AD from healthy aging.

    View details for DOI 10.1371/journal.pcbi.1000100

    View details for Web of Science ID 000259786700013

    View details for PubMedID 18584043

    View details for PubMedCentralID PMC2435273
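
    The two small-world metrics named in the abstract above, clustering coefficient and characteristic path length, can be computed from a thresholded correlation matrix as in the sketch below. It is a simplified illustration, not the study's pipeline: it assumes a symmetric matrix, keeps an edge when the correlation is at or above a single threshold, and averages path length over reachable node pairs only. All function names are hypothetical.

```python
from collections import deque


def threshold_graph(corr, threshold):
    """Build an undirected graph (node -> neighbor set) from a symmetric
    correlation matrix by keeping entries at or above `threshold`."""
    n = len(corr)
    return {
        i: {j for j in range(n) if j != i and corr[i][j] >= threshold}
        for i in range(n)
    }


def clustering_coefficient(graph):
    """Mean over nodes of the fraction of neighbor pairs that are linked."""
    coefficients = []
    for neighbors in graph.values():
        k = len(neighbors)
        if k < 2:
            coefficients.append(0.0)
            continue
        links = sum(1 for u in neighbors for v in neighbors
                    if u < v and v in graph[u])
        coefficients.append(2.0 * links / (k * (k - 1)))
    return sum(coefficients) / len(coefficients)


def characteristic_path_length(graph):
    """Mean shortest-path length over all reachable ordered node pairs."""
    total, pairs = 0, 0
    for start in graph:
        distance = {start: 0}
        queue = deque([start])
        while queue:  # breadth-first search from `start`
            u = queue.popleft()
            for v in graph[u]:
                if v not in distance:
                    distance[v] = distance[u] + 1
                    queue.append(v)
        total += sum(distance.values())
        pairs += len(distance) - 1
    return total / pairs if pairs else float("inf")
```

    A small-world network combines a high clustering coefficient with a short characteristic path length, typically judged relative to random graphs with the same number of nodes and edges.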

  • iTools: A Framework for Classification, Categorization and Integration of Computational Biology Resources PLOS ONE Dinov, I. D., Rubin, D., Lorensen, W., Dugan, J., Ma, J., Murphy, S., Kirschner, B., Bug, W., Sherman, M., Floratos, A., Kennedy, D., Jagadish, H. V., Schmidt, J., Athey, B., Califano, A., Musen, M., Altman, R., Kikinis, R., Kohane, I., Delp, S., Parker, D. S., Toga, A. W. 2008; 3 (5)

    Abstract

    The advancement of the computational biology field hinges on progress in three fundamental directions--the development of new computational algorithms, the availability of informatics resource management infrastructures and the capability of tools to interoperate and synergize. There is an explosion in algorithms and tools for computational biology, which makes it difficult for biologists to find, compare and integrate such resources. We describe a new infrastructure, iTools, for managing the query, traversal and comparison of diverse computational biology resources. Specifically, iTools stores information about three types of resources--data, software tools and web-services. The iTools design, implementation and resource meta-data content reflect the broad research, computational, applied and scientific expertise available at the seven National Centers for Biomedical Computing. iTools provides a system for classification, categorization and integration of different computational biology resources across space-and-time scales, biomedical problems, computational infrastructures and mathematical foundations. A large number of resources are already iTools-accessible to the community and this infrastructure is rapidly growing. iTools includes human and machine interfaces to its resource meta-data repository. Investigators or computer programs may utilize these interfaces to search, compare, expand, revise and mine meta-data descriptions of existent computational biology resources. We propose two ways to browse and display the iTools dynamic collection of resources. The first one is based on an ontology of computational biology resources, and the second one is derived from hyperbolic projections of manifolds or complex structures onto planar discs. iTools is an open source project both in terms of the source code development as well as its meta-data content. iTools employs a decentralized, portable, scalable and lightweight framework for long-term resource management. We demonstrate several applications of iTools as a framework for integrated bioinformatics. iTools and the complete details about its specifications, usage and interfaces are available at the iTools web page http://iTools.ccb.ucla.edu.

    View details for DOI 10.1371/journal.pone.0002265

    View details for Web of Science ID 000262268500012

    View details for PubMedID 18509477

    View details for PubMedCentralID PMC2386255

  • A prototype symbolic model of canonical functional neuroanatomy of the motor system JOURNAL OF BIOMEDICAL INFORMATICS Talos, I., Rubin, D. L., Halle, M., Musen, M., Kikinis, R. 2008; 41 (2): 251-263

    Abstract

    Recent advances in bioinformatics have opened entirely new avenues for organizing, integrating and retrieving neuroscientific data, in a digital, machine-processable format, which can be at the same time understood by humans, using ontological, symbolic data representations. Declarative information stored in ontological format can be perused and maintained by domain experts, interpreted by machines, and serve as basis for a multitude of decision support, computerized simulation, data mining, and teaching applications. We have developed a prototype symbolic model of canonical neuroanatomy of the motor system. Our symbolic model is intended to support symbolic look up, logical inference and mathematical modeling by integrating descriptive, qualitative and quantitative functional neuroanatomical knowledge. Furthermore, we show how our approach can be extended to modeling impaired brain connectivity in disease states, such as common movement disorders. In developing our ontology, we adopted a disciplined modeling approach, relying on a set of declared principles, a high-level schema, Aristotelian definitions, and a frame-based authoring system. These features, along with the use of the Unified Medical Language System (UMLS) vocabulary, enable the alignment of our functional ontology with an existing comprehensive ontology of human anatomy, and thus allow for combining the structural and functional views of neuroanatomy for clinical decision support and neuroanatomy teaching applications. Although the scope of our current prototype ontology is limited to a particular functional system in the brain, it may be possible to adapt this approach for modeling other brain functional systems as well.

    View details for DOI 10.1016/j.jbi.2007.11.003

    View details for Web of Science ID 000255360000005

    View details for PubMedID 18164666

    View details for PubMedCentralID PMC2376098

  • Developing biomedical ontologies collaboratively. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Noy, N. F., Tudorache, T., de Coronado, S., Musen, M. A. 2008: 520-524

    Abstract

    The development of ontologies that define entities and relationships among them has become essential for modern work in biomedicine. Ontologies are becoming so large in their coverage that no single centralized group of people can develop them effectively and ontology development becomes a community-based enterprise. In this paper we present Collaborative Protégé, a prototype tool that supports many aspects of community-based development, such as discussions integrated with the ontology-editing process, chats, and annotation of changes. We have evaluated Collaborative Protégé in the context of the NCI Thesaurus development. Users have found the tool effective for carrying out discussions and recording design rationale.

    View details for PubMedID 18998901

  • Collecting Community-Based Mappings in an Ontology Repository 7th International Semantic Web Conference (ISWC 2008) Noy, N. F., Griffith, N., Musen, M. A. SPRINGER-VERLAG BERLIN. 2008: 371–386
  • Supporting Collaborative Ontology Development in Protege 7th International Semantic Web Conference (ISWC 2008) Tudorache, T., Noy, N. F., Tu, S., Musen, M. A. SPRINGER-VERLAG BERLIN. 2008: 17–32
  • A system for ontology-based annotation of biomedical data 5th International Workshop on Data Integration in the Life Sciences Jonquet, C., Musen, M. A., Shah, N. SPRINGER-VERLAG BERLIN. 2008: 144–152
  • A Generic Ontology for Collaborative Ontology-Development Workflows 16th International Conference on Knowledge Engineering - Practice and Patterns Sebastian, A., Noy, N. F., Tudorache, T., Musen, M. A. SPRINGER-VERLAG BERLIN. 2008: 318–328
  • Calling on a million minds for community annotation in WikiProteins GENOME BIOLOGY Mons, B., Ashburner, M., Chichester, C., van Mulligen, E., Weeber, M., den Dunnen, J., van Ommen, G., Musen, M., Cockerill, M., Hermjakob, H., Mons, A., Packer, A., Pacheco, R., Lewis, S., Berkeley, A., Melton, W., Barris, N., Wales, J., Meijssen, G., Moeller, E., Roes, P. J., Borner, K., Bairoch, A. 2008; 9 (5)

    Abstract

    WikiProteins enables community annotation in a Wiki-based system. Extracts of major data sources have been fused into an editable environment that links out to the original sources. Data from community edits create automatic copies of the original data. Semantic technology captures concepts co-occurring in one sentence and thus potential factual statements. In addition, indirect associations between concepts have been calculated. We call on a 'million minds' to annotate a 'million concepts' and to collect facts from the literature with the reward of collaborative knowledge discovery. The system is available for beta testing at http://www.wikiprofessional.org.

    View details for DOI 10.1186/gb-2008-9-5-r89

    View details for Web of Science ID 000257564800019

    View details for PubMedID 18507872

    View details for PubMedCentralID PMC2441475

  • Predicting outbreak detection in public health surveillance: quantitative analysis to enable evidence-based method selection. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Buckeridge, D. L., Okhmatovskaia, A., Tu, S., O'Connor, M., Nyulas, C., Musen, M. A. 2008: 76-80

    Abstract

    Public health surveillance is critical for accurate and timely outbreak detection and effective epidemic control. A wide range of statistical algorithms is used for surveillance, and important differences have been noted in the ability of these algorithms to detect outbreaks. The evidence about the relative performance of these algorithms, however, remains limited and mainly qualitative. Using simulated outbreak data, we developed and validated quantitative models for predicting the ability of commonly used surveillance algorithms to detect different types of outbreaks. The developed models accurately predict the ability of different algorithms to detect different types of outbreaks. These models enable evidence-based algorithm selection and can guide research into algorithm development.

    View details for PubMedID 18999264

  • BioPortal: ontologies and data resources with the click of a mouse. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Musen, M. A., Shah, N. H., Noy, N. F., Dai, B. Y., Dorf, M., Griffith, N., Buntrok, J., Jonquet, C., Montegut, M. J., Rubin, D. L. 2008: 1223-1224

    View details for PubMedID 18999306

  • Representing the NCI Thesaurus in OWL DL: Modeling tools help modeling languages. Applied ontology Noy, N. F., de Coronado, S., Solbrig, H., Fragoso, G., Hartel, F. W., Musen, M. A. 2008; 3 (3): 173-190

    Abstract

    The National Cancer Institute's (NCI) Thesaurus is a biomedical reference ontology. The NCI Thesaurus is represented using Description Logic, more specifically Ontylog, a description logic implemented by Apelon, Inc. We are exploring the use of the DL species of the Web Ontology Language (OWL DL), a W3C recommended standard for ontology representation, instead of Ontylog for representing the NCI Thesaurus. We have studied the requirements for knowledge representation of the NCI Thesaurus, and considered how OWL DL (and its implementation in Protégé-OWL) satisfies these requirements. In this paper, we discuss the areas where OWL DL was sufficient for representing required components, where tool support that would hide some of the complexity and extra levels of indirection would be required, and where language expressiveness is not sufficient given the representation requirements. Because many of the knowledge-representation issues that we encountered are very similar to the issues in representing other biomedical terminologies and ontologies in general, we believe that the lessons that we learned and the approaches that we developed will prove useful and informative for other researchers.

    View details for PubMedID 19789731

  • Comparison of ontology-based semantic-similarity measures. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Lee, W., Shah, N., Sundlass, K., Musen, M. 2008: 384-388

    Abstract

    Semantic-similarity measures quantify concept similarities in a given ontology. Potential applications for these measures include search, data mining, and knowledge discovery in database or decision-support systems that utilize ontologies. To date, there have not been comparisons of the different semantic-similarity approaches on a single ontology. Such a comparison can offer insight on the validity of different approaches. We compared 3 approaches to semantic similarity (metrics that rely on expert opinion, ontologies only, and information content) with 4 metrics applied to SNOMED-CT. We found that there was poor agreement between the metrics based on information content and the ontology-only metric. The metric based only on the ontology structure correlated most with expert opinion. Our results suggest that metrics based on the ontology only may be preferable to information-content-based metrics, and point to the need for more research on validating the different approaches.

    View details for PubMedID 18999312

  • UMLS-Query: a perl module for querying the UMLS. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Shah, N. H., Musen, M. A. 2008: 652-656

    Abstract

    The Metathesaurus from the Unified Medical Language System (UMLS) is a widely used ontology resource, which is mostly used in a relational database form for terminology research, mapping and information indexing. A significant section of UMLS users use a MySQL installation of the metathesaurus and Perl programming language as their access mechanism. We describe UMLS-Query, a Perl module that provides functions for retrieving concept identifiers, mapping text-phrases to Metathesaurus concepts and graph traversal in the Metathesaurus stored in a MySQL database. UMLS-Query can be used to build applications for semi-automated sample annotation, terminology based browsers for tissue sample databases and for terminology research. We describe the results of such uses of UMLS-Query and present the module for others to use.

    View details for PubMedID 18998805

  • Protege: A tool for managing and using terminology in radiology applications JOURNAL OF DIGITAL IMAGING Rubin, D. L., Noy, N. F., Musen, M. A. 2007; 20: 34-46

    Abstract

    The development of standard terminologies such as RadLex is becoming important in radiology applications, such as structured reporting, teaching file authoring, report indexing, and text mining. The development and maintenance of these terminologies are challenging, however, because there are few specialized tools to help developers to browse, visualize, and edit large taxonomies. Protégé (http://protege.stanford.edu) is an open-source tool that allows developers to create and to manage terminologies and ontologies. It is more than a terminology-editing tool, as it also provides a platform for developers to use the terminologies in end-user applications. There are more than 70,000 registered users of Protégé who are using the system to manage terminologies and ontologies in many different domains. The RadLex project has recently adopted Protégé for managing its radiology terminology. Protégé provides several features particularly useful to managing radiology terminologies: an intuitive graphical user interface for navigating large taxonomies, visualization components for viewing complex term relationships, and a programming interface so developers can create terminology-driven radiology applications. In addition, Protégé has an extensible plug-in architecture, and its large user community has contributed a rich library of components and extensions that provide much additional useful functionality. In this report, we describe Protégé's features and its particular advantages in the radiology domain in the creation, maintenance, and use of radiology terminology.

    View details for DOI 10.1007/s10278-007-9065-0

    View details for Web of Science ID 000250825300004

    View details for PubMedID 17687607

    View details for PubMedCentralID PMC2039856

  • The SAGE guideline model: Achievements and overview JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION Tu, S. W., Campbell, J. R., Glasgow, J., Nyman, M. A., McClure, R., McClay, J., Parker, C., Hrabak, K. M., Berg, D., Weida, T., Mansfield, J. G., Musen, M. A., Abarbanel, R. M. 2007; 14 (5): 589-598

    Abstract

    The SAGE (Standards-Based Active Guideline Environment) project was formed to create a methodology and infrastructure required to demonstrate integration of decision-support technology for guideline-based care in commercial clinical information systems. This paper describes the development and innovative features of the SAGE Guideline Model and reports our experience encoding four guidelines. Innovations include methods for integrating guideline-based decision support with clinical workflow and employment of enterprise order sets. Using SAGE, a clinician informatician can encode computable guideline content as recommendation sets using only standard terminologies and standards-based patient information models. The SAGE Model supports encoding large portions of guideline knowledge as re-usable declarative evidence statements and supports querying external knowledge sources.

    View details for DOI 10.1197/jamia.M2399

    View details for Web of Science ID 000249769700007

    View details for PubMedID 17600098

    View details for PubMedCentralID PMC1975799

  • Annotation and query of tissue microarray data using the NCI Thesaurus BMC BIOINFORMATICS Shah, N. H., Rubin, D. L., Espinosa, I., Montgomery, K., Musen, M. A. 2007; 8

    Abstract

    The Stanford Tissue Microarray Database (TMAD) is a repository of data serving a consortium of pathologists and biomedical researchers. The tissue samples in TMAD are annotated with multiple free-text fields, specifying the pathological diagnoses for each sample. These text annotations are not structured according to any ontology, making future integration of this resource with other biological and clinical data difficult. We developed methods to map these annotations to the NCI thesaurus. Using the NCI-T we can effectively represent annotations for about 86% of the samples. We demonstrate how this mapping enables ontology driven integration and querying of tissue microarray data. We have deployed the mapping and ontology driven querying tools at the TMAD site for general use. We have demonstrated that we can effectively map the diagnosis-related terms describing a sample in TMAD to the NCI-T. The NCI thesaurus terms have a wide coverage and provide terms for about 86% of the samples. In our opinion the NCI thesaurus can facilitate integration of this resource with other biological data.

    View details for DOI 10.1186/1471-2105-8-296

    View details for Web of Science ID 000249734300001

    View details for PubMedID 17686183

    View details for PubMedCentralID PMC1988837

  • OBO to OWL: a protege OWL tab to read/save OBO ontologies BIOINFORMATICS Moreira, D. A., Musen, M. A. 2007; 23 (14): 1868-1870

    Abstract

    The Open Biomedical Ontologies (OBO) format from the GO consortium is a very successful format for biomedical ontologies, including the Gene Ontology. But it lacks formal computational definitions for its constructs and tools, like DL reasoners, to facilitate ontology development/maintenance. We describe the OBO Converter, a Java tool to convert files from OBO format to Web Ontology Language (OWL) (and vice versa) that can also be used as a Protégé Tab plug-in. It uses the OBO to OWL mapping provided by the National Center for Biomedical Ontology (NCBO) (a joint effort of OBO developers and OWL experts) and offers options to ease the task of saving/reading files in both formats. Available at bioontology.org/tools/oboinowl/obo_converter.html. Supplementary data are available at Bioinformatics online.

    View details for DOI 10.1093/bioinformatics/btm258

    View details for Web of Science ID 000249248300030

    View details for PubMedID 17496317

  • Using semantic dependencies for consistency management of an ontology of brain-cortex anatomy 1st International Workshop on Formal Biomedical Knowledge Representation (KR-MED 2004) Dameron, O., Musen, M. A., Gibaud, B. ELSEVIER SCIENCE BV. 2007: 217–25

    Abstract

    In the context of the Semantic Web, ontologies have to be usable by software agents as well as by humans. Therefore, they must meet explicit representation and consistency requirements. This article describes a method for managing the semantic consistency of an ontology of brain-cortex anatomy. The methodology relies on the explicit identification of the relationship properties and of the dependencies that might exist among concepts or relationships. These dependencies have to be respected for ensuring the semantic consistency of the model. We propose a method for automatically generating all the dependent items. As a consequence, knowledge base updates are easier and safer. Our approach is composed of three main steps: (1) providing a realistic representation, (2) ensuring the intrinsic consistency of the model and (3) checking its incremental consistency. The cornerstone of ontological modeling lies in the expressiveness of the model and in the sound principles that structure it. This part defines the ideal possibilities of the ontology and is called realism of representation. Regardless of how well a model represents reality, the intrinsic consistency of a model corresponds to its lack of contradiction. This step is particularly important as soon as dependencies between relationships or concepts have to be fulfilled. Eventually, the incremental consistency encompasses the respect of the two previous criteria during the successive updates of the ontology. The explicit representation of dependencies among concepts and relationships in an ontology can be helpfully used to assist in the management of the knowledge base and to ensure the model's semantic consistency.

    View details for DOI 10.1016/j.artmed.2006.09.004

    View details for Web of Science ID 000246657200004

    View details for PubMedID 17254759

  • Technology for Building Intelligent Systems: From Psychology to Engineering 52nd Nebraska Symposium on Motivation Musen, M. A. UNIV NEBRASKA PRESS. 2007: 145–184

    View details for Web of Science ID 000248483200006

    View details for PubMedID 17682334

  • Searching Ontologies Based on Content: Experiments in the Biomedical Domain 4th International Conference on Knowledge Capture Alani, H., Noy, N. F., Shah, N., Shadbolt, N., Musen, M. A. ASSOC COMPUTING MACHINERY. 2007: 55–62
  • Document-oriented views of guideline knowledge bases 11th Conference on Artificial Intelligence in Medicine (AIME 2007) Tu, S. W., Condamoor, S., Mather, T., Hall, R., Jones, N., Musen, M. A. SPRINGER-VERLAG BERLIN. 2007: 431–440
  • Using semantic web technologies for knowledge-driven querying of biomedical data 11th Conference on Artificial Intelligence in Medicine (AIME 2007) O'Connor, M., Shankar, R., Tu, S., Nyulas, C., Parrish, D., Musen, M., Das, A. SPRINGER-VERLAG BERLIN. 2007: 267–276
  • Querying the semantic web with SWRL International Symposium on Rule Interchange and Applications O'Connor, M., Tu, S., Nyulas, C., Das, A., Musen, M. SPRINGER-VERLAG BERLIN. 2007: 155–159
  • Efficiently querying relational databases using OWL and SWRL 1st International Conference on Web Reasoning and Rule Systems O'Connor, M., Shankar, R., Tu, S., Nyulas, C., Das, A., Musen, M. SPRINGER-VERLAG BERLIN. 2007: 361–363
  • Knowledge Zone: A Public Repository of Peer-Reviewed Biomedical Ontologies 12th World Congress on Health (Medical) Informatics Supekar, K., Rubin, D., Noy, N., Musen, M. I O S PRESS. 2007: 812–816

    Abstract

    Reuse of ontologies is important for achieving better interoperability among health systems and relieving knowledge engineers from the burden of developing ontologies from scratch. Most of the work that aims to facilitate ontology reuse has focused on building ontology libraries that are simple repositories of ontologies or has led to keyword-based search tools that search among ontologies. To our knowledge, there are no operational methodologies that allow users to evaluate ontologies and to compare them in order to choose the most appropriate ontology for their task. In this paper, we present Knowledge Zone, a Web-based portal that allows users to submit their ontologies, to associate metadata with their ontologies, to search for existing ontologies, to find ontology rankings based on user reviews, to post their own reviews, and to rate reviews.

    View details for Web of Science ID 000272064000163

    View details for PubMedID 17911829

  • Interpretation errors related to the GO annotation file format. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Moreira, D. A., Shah, N. H., Musen, M. A. 2007: 538-542

    Abstract

    The Gene Ontology (GO) is the most widely used ontology for creating biomedical annotations. GO annotations are statements associating a biological entity with a GO term. These statements comprise a large dataset of biological knowledge that is used widely in biomedical research. GO Annotations are available as "gene association files" from the GO website in a tab-delimited file format (GO Annotation File Format) composed of rows of 15 tab-delimited fields. This simple format lacks the knowledge representation (KR) capabilities to represent unambiguously semantic relationships between each field. This paper demonstrates that this KR shortcoming leads users to interpret the files in ways that can be erroneous. We propose a complementary format to represent GO annotation files as knowledge bases using the W3C recommended Web Ontology Language (OWL).

    View details for PubMedID 18693894

  • Evaluating detection of an inhalational anthrax outbreak EMERGING INFECTIOUS DISEASES Buckeridge, D. L., Owens, D. K., Switzer, P., Frank, J., Musen, M. A. 2006; 12 (12): 1942-1949

    Abstract

    Timely detection of an inhalational anthrax outbreak is critical for clinical and public health management. Syndromic surveillance has received considerable investment, but little is known about how it will perform relative to routine clinical case finding for detection of an inhalational anthrax outbreak. We conducted a simulation study to compare clinical case finding with syndromic surveillance for detection of an outbreak of inhalational anthrax. After simulated release of 1 kg of anthrax spores, the proportion of outbreaks detected first by syndromic surveillance was 0.59 at a specificity of 0.9 and 0.28 at a specificity of 0.975. The mean detection benefit of syndromic surveillance was 1.0 day at a specificity of 0.9 and 0.32 days at a specificity of 0.975. When syndromic surveillance was sufficiently sensitive to detect a substantial proportion of outbreaks before clinical case finding, it generated frequent false alarms.

    View details for Web of Science ID 000242301900022

    View details for PubMedID 17326949

    View details for PubMedCentralID PMC3291344

  • Using ontologies linked with geometric models to reason about penetrating injuries ARTIFICIAL INTELLIGENCE IN MEDICINE Rubin, D. L., Dameron, O., Bashir, Y., Grossman, D., Dev, P., Musen, M. A. 2006; 37 (3): 167-176

    Abstract

    Medical assessment of penetrating injuries is a difficult and knowledge-intensive task, and rapid determination of the extent of internal injuries is vital for triage and for determining the appropriate treatment. Physical examination and computed tomographic (CT) imaging data must be combined with detailed anatomic, physiologic, and biomechanical knowledge to assess the injured subject. We are developing a methodology to automate reasoning about penetrating injuries using canonical knowledge combined with specific subject image data. In our approach, we build a three-dimensional geometric model of a subject from segmented images. We link regions in this model to entities in two knowledge sources: (1) a comprehensive ontology of anatomy containing organ identities, adjacencies, and other information useful for anatomic reasoning and (2) an ontology of regional perfusion containing formal definitions of arterial anatomy and corresponding regions of perfusion. We created computer reasoning services ("problem solvers") that use the ontologies to evaluate the geometric model of the subject and deduce the consequences of penetrating injuries. We developed and tested our methods using data from the Visible Human. Our problem solvers can determine the organs that are injured given particular trajectories of projectiles, whether vital structures (such as a coronary artery) are injured, and they can predict the propagation of injury ensuing after vital structures are injured. We have demonstrated the capability of using ontologies with medical images to support computer reasoning about injury based on those images. Our methodology demonstrates an approach to creating intelligent computer applications that reason with image data, and it may have value in helping practitioners in the assessment of penetrating injury.

    View details for DOI 10.1016/j.artmed.2006.03.006

    View details for Web of Science ID 000238992500002

    View details for PubMedID 16730959

  • National Center for Biomedical Ontology: Advancing biomedicine through structured organization of scientific knowledge OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY Rubin, D. L., Lewis, S. E., Mungall, C. J., Misra, S., Westerfield, M., Ashburner, M., Sim, I., Chute, C. G., Solbrig, H., Storey, M., Smith, B., Day-Richter, J., Noy, N. F., Musen, M. A. 2006; 10 (2): 185-198

    Abstract

    The National Center for Biomedical Ontology is a consortium that comprises leading informaticians, biologists, clinicians, and ontologists, funded by the National Institutes of Health (NIH) Roadmap, to develop innovative technology and methods that allow scientists to record, manage, and disseminate biomedical information and knowledge in machine-processable form. The goals of the Center are (1) to help unify the divergent and isolated efforts in ontology development by promoting high quality open-source, standards-based tools to create, manage, and use ontologies, (2) to create new software tools so that scientists can use ontologies to annotate and analyze biomedical data, (3) to provide a national resource for the ongoing evaluation, integration, and evolution of biomedical ontologies and associated tools and theories in the context of driving biomedical projects (DBPs), and (4) to disseminate the tools and resources of the Center and to identify, evaluate, and communicate best practices of ontology development to the biomedical community. Through the research activities within the Center, collaborations with the DBPs, and interactions with the biomedical community, our goal is to help scientists to work more effectively in the e-science paradigm, enhancing experiment design, experiment execution, data analysis, information synthesis, hypothesis generation and testing, and the understanding of human disease.

    View details for Web of Science ID 000240210900015

    View details for PubMedID 16901225

  • Wrestling with SUMO and bio-ontologies. Nature biotechnology Stoeckert, C., Ball, C., Brazma, A., Brinkman, R., Causton, H., Fan, L., Fostel, J., Fragoso, G., Heiskanen, M., Holstege, F., Morrison, N., Parkinson, H., Quackenbush, J., Rocca-Serra, P., Sansone, S. A., Sarkans, U., Sherlock, G., Stevens, R., Taylor, C., Taylor, R., Whetzel, P., White, J. 2006; 24 (1): 21-2; author reply 23

    View details for DOI 10.1038/nbt0106-21a

    View details for PubMedID 16404382

  • To the editor NATURE BIOTECHNOLOGY Stoeckert, C., Ball, C., Brazma, A., Brinkman, R., Causton, H., Fan, L. J., Fostel, J., Fragoso, G., Heiskanen, M., Holstege, F., Morrison, N., Parkinson, H., Quackenbush, J., Rocca-Serra, P., Sansone, S. A., Sarkans, U., Sherlock, G., Stevens, R., Taylor, C., Taylor, R., Whetzel, P., White, J. 2006; 24 (1): 21-22
  • A framework for ontology evolution in collaborative environments 5th International Semantic Web Conference (ISWC 2006) Noy, N. F., Chugh, A., Liu, W., Musen, M. A. SPRINGER-VERLAG BERLIN. 2006: 544–558
  • Wrestling with SUMO and bio-ontologies NATURE BIOTECHNOLOGY Musen, M. A., Lewis, S., Smith, B. 2006; 24 (1): 21-21

    View details for Web of Science ID 000234555800010

    View details for PubMedID 16404381

  • Identifying barriers to hypertension guideline adherence using clinician feedback at the point of care. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Lin, N. D., Martins, S. B., Chan, A. S., Coleman, R. W., Bosworth, H. B., Oddone, E. Z., Shankar, R. D., Musen, M. A., Hoffman, B. B., Goldstein, M. K. 2006: 494-498

    Abstract

    Factors contributing to low adherence to clinical guidelines by clinicians are not well understood. The user interface of ATHENA-HTN, a guideline-based decision support system (DSS) for hypertension, presents a novel opportunity to collect clinician feedback on recommendations displayed at the point of care. We analyzed feedback from 46 clinicians who received ATHENA advisories as part of a 15-month randomized trial to identify potential reasons clinicians may not intensify hypertension therapy when it is recommended. Among the 368 visits for which feedback was provided, clinicians commonly reported they did not follow recommendations because: recorded blood pressure was not representative of the patient's typical blood pressure; hypertension was not a clinical priority for the visit; or patients were nonadherent to medications. For many visits, current quality-assurance algorithms may incorrectly identify clinically appropriate decisions as guideline nonadherent due to incomplete capture of relevant information. We present recommendations for how automated DSSs may help identify "apparent" barriers and better target decision support.

    View details for PubMedID 17238390

  • Ontology-based annotation and query of tissue microarray data. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Shah, N. H., Rubin, D. L., Supekar, K. S., Musen, M. A. 2006: 709-713

    Abstract

    The Stanford Tissue Microarray Database (TMAD) is a repository of data amassed by a consortium of pathologists and biomedical researchers. The TMAD data are annotated with multiple free-text fields, specifying the pathological diagnoses for each tissue sample. These annotations are spread out over multiple text fields and are not structured according to any ontology, making it difficult to integrate this resource with other biological and clinical data. We developed methods to map these annotations to the NCI thesaurus and the SNOMED-CT ontologies. Using these two ontologies we can effectively represent about 80% of the annotations in a structured manner. This mapping offers the ability to perform ontology driven querying of the TMAD data. We also found that 40% of annotations can be mapped to terms from both ontologies, providing the potential to align the two ontologies based on experimental data. Our approach provides the basis for a data-driven ontology alignment by mapping annotations of experimental data.

    View details for PubMedID 17238433

  • Use of declarative statements in creating and maintaining computer-interpretable knowledge bases for guideline-based care. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Tu, S. W., Hrabak, K. M., Campbell, J. R., Glasgow, J., Nyman, M. A., McClure, R., McClay, J., Abarbanel, R., Mansfield, J. G., Martins, S. M., Goldstein, M. K., Musen, M. A. 2006: 784-788

    Abstract

    Developing computer-interpretable clinical practice guidelines (CPGs) to provide decision support for guideline-based care is an extremely labor-intensive task. In the EON/ATHENA and SAGE projects, we formulated substantial portions of CPGs as computable statements that express declarative relationships between patient conditions and possible interventions. We developed query and expression languages that allow a decision-support system (DSS) to evaluate these statements in specific patient situations. A DSS can use these guideline statements in multiple ways, including: (1) as inputs for determining preferred alternatives in decision-making, and (2) as a way to provide targeted commentaries in the clinical information system. The use of these declarative statements significantly reduces the modeling expertise and effort required to create and maintain computer-interpretable knowledge bases for decision-support purposes. We discuss possible implications for sharing of such knowledge bases.

    View details for PubMedID 17238448

  • Ontology-based representation of simulation models of physiology. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Rubin, D. L., Grossman, D., Neal, M., Cook, D. L., Bassingthwaighte, J. B., Musen, M. A. 2006: 664-668

    Abstract

    Dynamic simulation models of physiology are often represented as a set of mathematical equations. Such models are very useful for studying and understanding the dynamic behavior of physiological variables. However, the sheer number of equations and variables can make these models unwieldy, difficult to understand, and challenging to maintain. We describe a symbolic, ontologically-guided methodology for representing a physiological model of the circulation. We created an ontology describing the types of equations in the model as well as the anatomic components and how they are connected to form a circulatory loop. The ontology provided an explicit representation of the model, both its mathematical and anatomic content, abstracting and hiding much of the mathematical complexity. The ontology also provided a framework to construct a graphical representation of the model, providing a simpler visualization than the large set of mathematical equations. Our approach may help model builders to maintain, debug, and extend simulation models.

    View details for PubMedID 17238424

  • Ontology-centered syndromic surveillance for bioterrorism AAAI Spring Symposium on AI Technologies for Homeland Security Crubezy, M., O'Connor, M., Pincus, Z., Musen, M. A., Buckeridge, D. L. IEEE COMPUTER SOC. 2005: 26–35
  • An evaluation model for syndromic surveillance: assessing the performance of a temporal algorithm. MMWR. Morbidity and mortality weekly report Buckeridge, D. L., Switzer, P., Owens, D., Siegrist, D., Pavlin, J., Musen, M. 2005; 54: 109-115

    Abstract

    Syndromic surveillance offers the potential to rapidly detect outbreaks resulting from terrorism. Despite considerable experience with implementing syndromic surveillance, limited evidence exists to describe the performance of syndromic surveillance systems in detecting outbreaks. To describe a model for simulating cases that might result from exposure to inhalational anthrax and then use the model to evaluate the ability of syndromic surveillance to detect an outbreak of inhalational anthrax after an aerosol release. Disease progression and health-care use were simulated for persons infected with anthrax. Simulated cases were then superimposed on authentic surveillance data to create test data sets. A temporal outbreak detection algorithm was applied to each test data set, and sensitivity and timeliness of outbreak detection were calculated by using syndromic surveillance. The earliest detection using a temporal algorithm was 2 days after a release. Earlier detection tended to occur when more persons were infected, and performance worsened as the proportion of persons seeking care in the prodromal disease state declined. A shorter median incubation state led to earlier detection, as soon as 1 day after release when the incubation state was ≤5 days. Syndromic surveillance of a respiratory syndrome using a temporal detection algorithm tended to detect an anthrax attack within 3-4 days after exposure if >10,000 persons were infected. The performance of surveillance (i.e., timeliness and sensitivity) worsened as the number of persons infected decreased.

    View details for PubMedID 16177701

  • EZPAL: Environment for composing constraint axioms by instantiating templates 6th International Protege Users Conference Hou, C. S., Musen, M. A., Noy, N. F. ACADEMIC PRESS LTD- ELSEVIER SCIENCE LTD. 2005: 578–96
  • Challenges in converting frame-based ontology into OWL: the Foundational Model of Anatomy case-study. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Dameron, O., Rubin, D. L., Musen, M. A. 2005: 181-185

    Abstract

    A description logics representation of the Foundational Model of Anatomy (FMA) in the Web Ontology Language (OWL-DL) would allow developers to combine it with other OWL ontologies, and would provide the benefit of being able to access generic reasoning tools. However, the FMA is currently represented in a frame language. The differences between description logics and frames are not only syntactic, but also semantic. We analyze some theoretical and computational limitations of converting the FMA into OWL-DL. Namely, some of the constructs used in the FMA do not have a direct equivalent in description logics, and a complete conversion of the FMA in description logics is too large to support reasoning. Therefore, an OWL-DL representation of the FMA would have to be optimized for each application. We propose a solution based on OWL-Full, a superlanguage of OWL-DL, that meets the expressiveness requirements and remains application-independent. Specific simplified OWL-DL representations can then be generated from the OWL-Full model by applications. We argue that this solution is easier to implement and closer to the application needs than an integral translation, and that the latter approach would only make the FMA maintenance more difficult.

    View details for PubMedID 16779026

  • Supporting rule system interoperability on the semantic web with SWRL 4th International Semantic Web Conference (ISWC 2005) O'Connor, M. T., Knublauch, H., Tu, S., Grosof, B., Dean, M., Grosso, W., Musen, M. SPRINGER-VERLAG BERLIN. 2005: 974–986
  • Semantic clinical guideline documents. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Eriksson, H., Tu, S. W., Musen, M. 2005: 236-240

    Abstract

    Decision-support systems based on clinical practice guidelines can support physicians and other healthcare personnel in the process of following best practice consistently. A knowledge-based approach to represent guidelines makes it possible to encode computer-interpretable guidelines in a formal manner, perform consistency checks, and use the guidelines directly in decision-support systems. Decision-support authors and guideline users require guidelines in human-readable formats in addition to computer-interpretable ones (e.g., for guideline review and quality assurance). We propose a new document-oriented information architecture that combines knowledge-representation models with electronic and paper documents. The approach integrates decision-support modes with standard document formats to create a combined clinical-guideline model that supports on-line viewing, printing, and decision support.

    View details for PubMedID 16779037

  • Use of description logic classification to reason about consequences of penetrating injuries. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Rubin, D. L., Dameron, O., Musen, M. A. 2005: 649-653

    Abstract

    The consequences of penetrating injuries can be complex, including abnormal blood flow through the injury channel and functional impairment of organs if arteries supplying them have been severed. Determining the consequences of such injuries can be posed as a classification problem, requiring a priori symbolic knowledge of anatomy. We hypothesize that such symbolic knowledge can be modeled using ontologies, and that the reasoning task can be accomplished using knowledge representation in description logics (DL) and automatic classification. We demonstrate the capabilities of automated classification using the Web Ontology Language (OWL) to reason about the consequences of penetrating injuries. We created in OWL a knowledge model of chest and heart anatomy describing the heart structure and the surrounding anatomic compartments, as well as the perfusion of regions of the heart by branches of the coronary arteries. We then used a domain-independent classifier to infer ischemic regions of the heart as well as anatomic spaces containing ectopic blood secondary to the injuries. Our results highlight the advantages of posing reasoning problems as a classification task, and leveraging the automatic classification capabilities of DL to create intelligent applications.

    View details for PubMedID 16779120

  • Using an Ontology of Human Anatomy to Inform Reasoning with Geometric Models 13th Conference on Medicine Meets Virtual Reality Rubin, D. L., Bashir, Y., Grossman, D., Dev, P., Musen, M. A. I O S PRESS. 2005: 429–435

    Abstract

    The Virtual Soldier project is a large effort on the part of the U.S. Defense Advanced Research Projects Agency to explore using both general anatomical knowledge and specific computed tomographic (CT) images of individual soldiers to aid the rapid diagnosis and treatment of penetrating injuries. Our goal is to develop intelligent computer applications that use this knowledge to reason about the anatomic structures that are directly injured and to predict propagation of injuries secondary to primary organ damage. To accomplish this, we needed to develop an architecture to combine geometric data with anatomic knowledge and reasoning services that use this information to predict the consequences of injuries.

    View details for Web of Science ID 000273828700086

    View details for PubMedID 15718773

  • Ontology metadata to support the building of a library of biomedical ontologies. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Supekar, K., Musen, M. 2005: 1126-?

    View details for PubMedID 16779413

  • Translating research into practice: Organizational issues in implementing automated decision support for hypertension in three medical centers JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION Goldstein, M. K., Coleman, R. W., Tu, S. W., Shankar, R. D., O'Connor, M. J., Musen, M. A., Martins, S. B., Lavori, P. W., Shlipak, M. G., Oddone, E., Advani, A. A., Gholami, P., Hoffman, B. B. 2004; 11 (5): 368-376

    Abstract

    Information technology can support the implementation of clinical research findings in practice settings. Technology can address the quality gap in health care by providing automated decision support to clinicians that integrates guideline knowledge with electronic patient data to present real-time, patient-specific recommendations. However, technical success in implementing decision support systems may not translate directly into system use by clinicians. Successful technology integration into clinical work settings requires explicit attention to the organizational context. We describe the application of a "sociotechnical" approach to integration of ATHENA DSS, a decision support system for the treatment of hypertension, into geographically dispersed primary care clinics. We applied an iterative technical design in response to organizational input and obtained ongoing endorsements of the project by the organization's administrative and clinical leadership. Conscious attention to organizational context at the time of development, deployment, and maintenance of the system was associated with extensive clinician use of the system.

    View details for Web of Science ID 000223898000005

    View details for PubMedID 15187064

    View details for PubMedCentralID PMC516243

  • Ontology versioning in an ontology management framework IEEE INTELLIGENT SYSTEMS Noy, N. F., Musen, M. A. 2004; 19 (4): 6-13
  • Pushing the envelope: challenges in a frame-based representation of human anatomy DATA & KNOWLEDGE ENGINEERING Noy, N. F., Musen, M. A., Mejino, J. L., Rosse, C. 2004; 48 (3): 335-359
  • Linking ontologies with three-dimensional models of anatomy to predict the effects of penetrating injuries. Conference proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Conference Rubin, D. L., Bashir, Y., Grossman, D., Dev, P., Musen, M. A. 2004; 5: 3128-3131

    Abstract

    Rapid diagnosis of penetrating injuries is essential to increased chance of survival. Geometric models representing anatomic structures could be useful, but such models generally contain only information about the relationships of points in space as well as display properties. We describe an approach to predicting the anatomic consequences of penetrating injury by creating a geometric model of anatomy that integrates biomechanical and anatomic knowledge. We created a geometric model of the heart from the Visible Human image data set. We linked this geometric model of anatomy with an ontology of descriptive anatomic knowledge. A hierarchy of abstract geometric objects was created that represents organs and organ parts. These geometric objects contain information about organ identity, composition, adjacency, and tissue biomechanical properties. This integrated model can support anatomic reasoning. Given a bullet trajectory and a parametric representation of a cone of tissue damage, we can use our model to predict the organs and organ parts that are injured. Our model is extensible, being able to incorporate future information, such as physiological implications of organ injuries.

    View details for PubMedID 17270942

  • Linking ontologies with three-dimensional models of anatomy to predict the effects of penetrating injuries 26th Annual International Conference of the IEEE-Engineering-in-Medicine-and-Biology-Society Rubin, D. L., Bashir, Y., Grossman, D., Dev, P., Musen, M. A. IEEE. 2004: 3128–31

    View details for Web of Science ID 000225461800809

  • A knowledge-based framework for deploying surveillance problem solvers International Conference on Information and Knowledge Engineering (IKE 04) Buckeridge, D. L., O'Connor, M. J., Xu, H. B., Musen, M. A. C S R E A PRESS. 2004: 28–32
  • Specifying ontology views by traversal 3rd International Semantic Web Conference Noy, N. F., Musen, M. A. SPRINGER-VERLAG BERLIN. 2004: 713–725
  • Tracking changes during ontology evolution 3rd International Semantic Web Conference Noy, N. F., Kunnatur, S., Klein, M., Musen, M. A. SPRINGER-VERLAG BERLIN. 2004: 259–273
  • The Protege OWL Plugin: An open development environment for Semantic Web applications 3rd International Semantic Web Conference Knublauch, H., Fergerson, R. W., Noy, N. F., Musen, M. A. SPRINGER-VERLAG BERLIN. 2004: 229–243
  • Evaluating provider adherence in a trial of a guideline-based decision support system for hypertension 11th World Congress on Medical Informatics Chan, A. S., Coleman, R. W., Martins, S. B., Advani, A., Musen, M. A., Bosworth, H. B., Oddone, E. Z., Shlipak, M. G., Hoffman, B. B., Goldstein, M. K. I O S PRESS. 2004: 125–129

    Abstract

    Measurement of provider adherence to a guideline-based decision support system (DSS) presents a number of important challenges. Establishing a causal relationship between the DSS and change in concordance requires consideration of both the primary intention of the guideline and different ways providers attempt to satisfy the guideline. During our work with a guideline-based decision support system for hypertension, ATHENA DSS, we document a number of subtle deviations from the strict hypertension guideline recommendations that ultimately demonstrate provider adherence. We believe that understanding these complexities is crucial to any valid evaluation of provider adherence. We also describe the development of an advisory evaluation engine that automates the interpretation of clinician adherence with the DSS on multiple levels, facilitating the high volume of complex data analysis that is created in a clinical trial of a guideline-based DSS.

    View details for Web of Science ID 000226723300026

    View details for PubMedID 15360788

  • An intelligent case-adjustment algorithm for the automated design of population-based quality auditing protocols 11th World Congress on Medical Informatics Advani, A., Jones, N., Shahar, Y., Goldstein, M., Musen, M. A. I O S PRESS. 2004: 1003–1007

    Abstract

    We develop a method and algorithm for deciding the optimal approach to creating quality-auditing protocols for guideline-based clinical performance measures. An important element of the audit protocol design problem is deciding which guideline elements to audit. Specifically, the problem is how and when to aggregate individual patient case-specific guideline elements into population-based quality measures. The key statistical issue involved is the trade-off between increased reliability with more general population-based quality measures versus increased validity from individually case-adjusted but more restricted measures done at a greater audit cost. Our intelligent algorithm for auditing protocol design is based on hierarchically modeling incrementally case-adjusted quality constraints. We select quality constraints to measure using an optimization criterion based on statistical generalizability coefficients. We present results of the approach from a deployed decision support system for a hypertension guideline.

    View details for Web of Science ID 000226723300202

    View details for PubMedID 15360963

  • Modeling guidelines for integration into clinical workflow 11th World Congress on Medical Informatics Tu, S. W., Musen, M. A., Shankar, R., Campbell, J., Hrabak, K., McClay, J., Huff, S. M., McClure, R., Parker, C., Rocha, R., Abarbanel, R., Beard, N., Glasgow, J., Mansfield, G., Ram, P., Ye, Q., Mays, E., Weida, T., Chute, C. G., McDonald, K., Mohr, D., Nyman, M. A., Scheitel, S., Solbrig, H., Zill, D. A., Goldstein, M. K. I O S PRESS. 2004: 174–178

    Abstract

    The success of clinical decision-support systems requires that they are seamlessly integrated into clinical workflow. In the SAGE project, which aims to create the technological infrastructure for implementing computable clinical practice guidelines in enterprise settings, we created a deployment-driven methodology for developing guideline knowledge bases. It involves (1) identification of usage scenarios of guideline-based care in clinical workflow, (2) distillation and disambiguation of guideline knowledge relevant to these usage scenarios, (3) formalization of data elements and vocabulary used in the guideline, and (4) encoding of usage scenarios and guideline knowledge using an executable guideline model. This methodology makes explicit the points in the care process where guideline-based decision aids are appropriate and the roles of clinicians for whom the guideline-based assistance is intended. We have evaluated the methodology by simulating the deployment of an immunization guideline in a real clinical information system and by reconstructing the workflow context of a deployed decision-support system for guideline-based care. We discuss the implication of deployment-driven guideline encoding for sharability of executable guidelines.

    View details for PubMedID 15360798

  • The SAGE guideline modeling: Motivation and methodology Symposium on Computerized Guidelines and Protocols Tu, S. W., Campbell, J., Musen, M. A. I O S PRESS. 2004: 167–171

    Abstract

    The SAGE (Standards-Based Sharable Active Guideline Environment) project is a collaboration among research groups at six institutions in the US. The ultimate goal of the project is to create an infrastructure that will allow execution of standards-based clinical practice guidelines across heterogeneous clinical information systems. This paper describes the design goals of the SAGE guideline model in the context of the technological infrastructure and guideline modeling methodology that the project is developing.

    View details for PubMedID 15537222

  • The PROMPT suite: interactive tools for ontology merging and mapping 17th National Conference on Artificial Intelligence Noy, N. F., Musen, M. A. ACADEMIC PRESS LTD- ELSEVIER SCIENCE LTD. 2003: 983–1024
  • Configuring online problem-solving resources with the internet reasoning service IEEE INTELLIGENT SYSTEMS Crubezy, M., Musen, M. A., Motta, E., Lu, W. J. 2003; 18 (2): 34-42
  • The structure of guideline recommendations: a synthesis. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Tu, S. W., Campbell, J., Musen, M. A. 2003: 679-683

    Abstract

    We propose that recommendations in a clinical guideline can be structured either as collections of decisions that are to be applied in specific situations or as processes that specify activities that take place over time. We formalize them as "recommendation sets" consisting of either Activity Graphs that represent guideline-directed processes or Decision Maps that represent atemporal recommendations or recommendations involving decisions made at one time point. We model guideline processes as specializations of workflow processes and provide possible computational models for decision maps. We evaluate the proposed formalism by showing how various guideline-modeling methodologies, including GLIF, EON, PRODIGY3, and Medical Logic Modules can be mapped into the proposed structures. The generality of the formalism makes it a candidate for standardizing the structure of recommendations for computer-interpretable guidelines.

    View details for PubMedID 14728259

  • Challenges in Medical Informatics. A Discipline Coming of Age. Yearbook of medical informatics Musen, M. A., van Bemmel, J. H. 2003: 209-210

    View details for PubMedID 27706339

  • UPML: The language and tool support for making the Semantic Web alive Seminar on Semantics for the Web Omelayenko, B., Crubezy, M., Fensel, D., Benjamins, R., Wielinga, B., Motta, E., Musen, M., Ding, Y. M I T PRESS. 2003: 141–170
  • BioSTORM: a system for automated surveillance of diverse data sources. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium O'Connor, M. J., Buckeridge, D. L., Choy, M., Crubezy, M., Pincus, Z., Musen, M. A. 2003: 1071-?

    Abstract

    Heightened concerns about bioterrorism are forcing changes to the traditional biosurveillance model. Public health departments are under pressure to follow multiple, non-specific, pre-diagnostic indicators, often drawn from many data sources. As a result, there is a need for biosurveillance systems that can rapidly integrate and process multiple diverse data feeds, using a variety of analysis and problem-solving techniques, to give timely analysis. To meet these requirements, we are developing a new system called BioSTORM (Biological Spatio-Temporal Outbreak Reasoning Module).

    View details for PubMedID 14728574

  • A knowledge-acquisition wizard to encode guidelines. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Shankar, R. D., Tu, S. W., Musen, M. A. 2003: 1007-?

    Abstract

    An important step in building guideline-based clinical care systems is encoding guidelines. Protégé-2000, developed in our laboratory, is a general-purpose knowledge-acquisition tool that helps domain experts and developers record, browse, and maintain domain knowledge in knowledge bases. In this poster we illustrate a knowledge-acquisition wizard that we built around Protégé-2000. The wizard provides an environment in which domain specialists can enter knowledge more intuitively, and in which domain specialists and practitioners can review the knowledge entered.

    View details for PubMedID 14728510

  • Protégé-2000: an open-source ontology-development and knowledge-acquisition environment. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Noy, N. F., Crubezy, M., Fergerson, R. W., Knublauch, H., Tu, S. W., Vendetti, J., Musen, M. A. 2003: 953-?

    Abstract

    Protégé-2000 is an open-source tool that assists users in the construction of large electronic knowledge bases. It has an intuitive user interface that enables developers to create and edit domain ontologies. Numerous plugins provide alternative visualization mechanisms, enable management of multiple ontologies, allow the use of inference engines and problem solvers with Protégé ontologies, and provide other functionality. The Protégé user community has more than 7000 members.

    View details for PubMedID 14728458

  • Contextualizing heterogeneous data for integration and inference. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Pincus, Z., Musen, M. A. 2003: 514-518

    Abstract

    Systems that attempt to integrate and analyze data from multiple data sources are greatly aided by the addition of specific semantic and metadata "context" that explicitly describes what a data value means. In this paper, we describe a systematic approach to constructing models of data and their context. Our approach provides a generic "template" for constructing such models. For each data source, a developer creates a customized model by filling in the template with predefined attributes and values. This approach facilitates model construction and provides consistent syntax and semantics among models created with the template. Systems that can process the template structure and attribute values can reason about any model so described. We used the template to create a detailed knowledge base for syndromic surveillance data integration and analysis. The knowledge base provided support for data integration, translation, and analysis methods.

    View details for PubMedID 14728226

  • Developing quality indicators and auditing protocols from formal guideline models: knowledge representation and transformations. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Advani, A., Goldstein, M., Shahar, Y., Musen, M. A. 2003: 11-15

    Abstract

    Automated quality assessment of clinician actions and patient outcomes is a central problem in guideline- or standards-based medical care. In this paper we describe a model representation and algorithm for deriving structured quality indicators and auditing protocols from formalized specifications of guidelines used in decision support systems. We apply the model and algorithm to the assessment of physician concordance with a guideline knowledge model for hypertension used in a decision-support system. The properties of our solution include the ability to derive automatically context-specific and case-mix-adjusted quality indicators that can model global or local levels of detail about the guideline parameterized by defining the reliability of each indicator or element of the guideline.

    View details for PubMedID 14728124

  • An analytic framework for space-time aberrancy detection in public health surveillance data. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Buckeridge, D. L., Musen, M. A., Switzer, P., Crubézy, M. 2003: 120-124

    Abstract

    Public health surveillance is changing in response to concerns about bioterrorism, which have increased the pressure for early detection of epidemics. Rapid detection necessitates following multiple non-specific indicators and accounting for spatial structure. No single analytic method can meet all of these requirements for all data sources and all surveillance goals. Analytic methods must be selected and configured to meet a surveillance goal, but there are no uniform criteria to guide the selection and configuration process. In this paper, we describe work towards the development of an analytic framework for space-time aberrancy detection in public health surveillance data. The framework decomposes surveillance analysis into sub-tasks and identifies knowledge that can facilitate selection of methods to accomplish sub-tasks.

    View details for PubMedID 14728146

  • The evolution of Protege: an environment for knowledge-based systems development INTERNATIONAL JOURNAL OF HUMAN-COMPUTER STUDIES Gennari, J. H., Musen, M. A., Fergerson, R. W., Grosso, W. E., Crubezy, M., Eriksson, H., Noy, N. F., Tu, S. W. 2003; 58 (1): 89-123
  • Patient safety in guideline-based decision support for hypertension management: ATHENA DSS Annual Meeting of the American-Medical-Informatics-Association Goldstein, M. K., Hoffman, B. B., Coleman, R. W., Tu, S. W., Shankar, R. D., O'Connor, M., Martins, S., Advani, A., Musen, M. A. B M J PUBLISHING GROUP. 2002: S11–S16
  • Medical quality assessment by scoring adherence to guideline intentions Annual Meeting of the American-Medical-Informatics-Association Advani, A., Shahar, Y., Musen, M. A. B M J PUBLISHING GROUP. 2002: S92–S97
  • Bioterrorism preparedness and response: use of information technologies and decision support systems. Evidence report/technology assessment (Summary) Bravata, D. M., McDonald, K., Owens, D. K., Buckeridge, D., Haberland, C., Rydzak, C., Schleinitz, M., Smith, W. M., Szeto, H., Wilkening, D., Musen, M., Duncan, B. W., Nouri, B., Dangiolo, M. B., Liu, H., Shofer, S., Graham, J., Davies, S. 2002: 1-8

    View details for PubMedID 12154489

  • Breast cancer on the world wide web: cross sectional survey of quality of information and popularity of websites BRITISH MEDICAL JOURNAL Meric, F., Bernstam, E. V., Mirza, N. Q., Hunt, K. K., Ames, F. C., Ross, M. I., Kuerer, H. M., Pollock, R. E., Musen, M. A., Singletary, S. E. 2002; 324 (7337): 577-581

    Abstract

    To determine the characteristics of popular breast cancer related websites and whether more popular sites are of higher quality. The search engine Google was used to generate a list of websites about breast cancer. Google ranks search results by measures of link popularity: the number of links to a site from other sites. The top 200 sites returned in response to the query "breast cancer" were divided into "more popular" and "less popular" subgroups by three different measures of link popularity: Google rank and number of links reported independently by Google and by AltaVista (another search engine). Type and quality of content. More popular sites according to Google rank were more likely than less popular ones to contain information on ongoing clinical trials (27% v 12%, P=0.01), results of trials (12% v 3%, P=0.02), and opportunities for psychosocial adjustment (48% v 23%, P<0.01). These characteristics were also associated with higher number of links as reported by Google and AltaVista. More popular sites by number of linking sites were also more likely to provide updates on other breast cancer research, information on legislation and advocacy, and a message board service. Measures of quality such as display of authorship, attribution or references, currency of information, and disclosure did not differ between groups. Popularity of websites is associated with type rather than quality of content. Sites that include content correlated with popularity may best meet the public's desire for information about breast cancer.

    View details for Web of Science ID 000174384000021

    View details for PubMedID 11884322

    View details for PubMedCentralID PMC78995

  • SYNCHRONUS: A reusable software module for temporal integration Annual Symposium of the American-Medical-Informatics-Association Das, A. K., Musen, M. A. HANLEY & BELFUS INC MED PUBLISHERS. 2002: 195–199

    Abstract

    Querying time-stamped data in clinical databases is an essential step in the actuation of many decision-support rules. Since previous methods of temporal data management are not readily transferable among legacy databases, developers must create de novo querying methods that allow temporal integration of a decision-support program and existing database. In this paper, we outline four software-engineering principles that support a general, reusable approach to temporal integration. We then describe the design and implementation of SYNCHRONUS, a software module that advances our prior work on temporal querying. We show how this module satisfies the four principles for the task of temporal integration. SYNCHRONUS can help developers to overcome the software-engineering burden of temporal model heterogeneity within decision-support architectures.

    View details for Web of Science ID 000189418100040

    View details for PubMedID 12463814

  • Challenges for Medical Informatics as an Academic Discipline: Workshop Report. Yearbook of medical informatics Musen, M. A., van Bemmel, J. H. 2002: 194-197

    View details for PubMedID 27706365

  • A typology for modeling processes in clinical guidelines and protocols Annual Symposium of the American-Medical-Informatics-Association Tu, S. W., Johnson, P. D., Musen, M. A. HANLEY & BELFUS INC MED PUBLISHERS. 2002: 1181–1181
  • Standards-based sharable active guideline environment (SAGE): A project to develop a universal framework for encoding and disseminating electronic clinical practice guidelines Annual Symposium of the American-Medical-Informatics-Association Beard, N., Campbell, J. R., Huff, S. M., Leon, M., Mansfield, J. G., Mays, E., McClay, J., Mohr, D. N., Musen, M. A., O'Brien, D., Rocha, R. A., Saulovich, A., Scheitel, S. M., Tu, S. W. HANLEY & BELFUS INC MED PUBLISHERS. 2002: 973–973
  • Conceptual heterogeneity complicates automated syndromic surveillance for bioterrorism Annual Symposium of the American-Medical-Informatics-Association Graham, J., Buckeridge, D., Choy, M., Musen, M. HANLEY & BELFUS INC MED PUBLISHERS. 2002: 1030–1030
  • Use of Protege-2000 to encode clinical guidelines Annual Symposium of the American-Medical-Informatics-Association Shankar, R. D., Tu, S. W., Musen, M. A. HANLEY & BELFUS INC MED PUBLISHERS. 2002: 1164–1164
  • Protocol design patterns: Domain-oriented abstractions to support the authoring of computer-executable clinical trials Annual Symposium of the American-Medical-Informatics-Association Nguyen, J. H., Kahn, M. G., Broverman, C. A., Musen, M. A. HANLEY & BELFUS INC MED PUBLISHERS. 2002: 1114–1114
  • PROMPTDIFF: A fixed-point algorithm for comparing ontology versions 18th National Conference on Artificial Intelligence/14th Conference on Innovative Applications of Artificial Intelligence Noy, N. F., Musen, M. A. M I T PRESS. 2002: 744–750
  • Knowledge-based bioterrorism surveillance Annual Symposium of the American-Medical-Informatics-Association Buckeridge, D. L., Graham, J., O'Connor, M. J., Choy, M. K., Tu, S. W., Musen, M. A. HANLEY & BELFUS INC MED PUBLISHERS. 2002: 76–80

    Abstract

    An epidemic resulting from an act of bioterrorism could be catastrophic. However, if an epidemic can be detected and characterized early on, prompt public health intervention may mitigate its impact. Current surveillance approaches do not perform well in terms of rapid epidemic detection or epidemic monitoring. One reason for this shortcoming is their failure to bring existing knowledge and data to bear on the problem in a coherent manner. Knowledge-based methods can integrate surveillance data and knowledge, and allow for careful evaluation of problem-solving methods. This paper presents an argument for knowledge-based surveillance, describes a prototype of BioSTORM, a system for real-time epidemic surveillance, and shows an initial evaluation of this system applied to a simulated epidemic from a bioterrorism attack.

    View details for Web of Science ID 000189418100016

    View details for PubMedID 12463790

  • Medical informatics: Searching for underlying components METHODS OF INFORMATION IN MEDICINE Musen, M. A. 2002; 41 (1): 12-19

    Abstract

    To discuss unifying principles that can provide a theory for the diverse aspects of work in medical informatics. If medical informatics is to have academic credibility, it must articulate a clear theory that is distinct from that of computer science or of other related areas of study. The notions of reusable domain ontologies and problem-solving methods provide the foundation for current work on second-generation knowledge-based systems. These abstractions are also attractive for defining the core contributions of basic research in informatics. We can understand many central activities within informatics in terms of defining, refining, applying, and evaluating domain ontologies and problem-solving methods. Construing work in medical informatics in terms of actions involving ontologies and problem-solving methods may move us closer to a theoretical basis for our field.

    View details for Web of Science ID 000174503800004

    View details for PubMedID 11933757

  • The Chronus II temporal database mediator Annual Symposium of the American-Medical-Informatics-Association O'Connor, M. J., Tu, S. W., Musen, M. A. HANLEY & BELFUS INC MED PUBLISHERS. 2002: 567–571

    Abstract

    Clinical databases typically contain a significant amount of temporal information. This information is often crucial in medical decision-support systems. Although temporal queries are common in clinical systems, the medical informatics field has no standard means for representing or querying temporal data. Over the past decade, the temporal database community has made a significant amount of progress in temporal systems. Much of this research can be applied to clinical database systems. This paper outlines a temporal database mediator called Chronus II. Chronus II extends the standard relational model and the SQL query language to support temporal queries. It provides an expressive general-purpose temporal query language that is tuned to the querying requirements of clinical decision support systems. This paper describes how we have used Chronus II to tackle a variety of clinical problems in decision support systems developed by our group.

    View details for Web of Science ID 000189418100115

    View details for PubMedID 12474882

  • A framework for evidence-adaptive quality assessment that unifies guideline-based and performance-indicator approaches Annual Symposium of the American-Medical-Informatics-Association Advani, A., Goldstein, M., Musen, M. A. HANLEY & BELFUS INC MED PUBLISHERS. 2002: 2–6

    Abstract

    Automated quality assessment of clinician actions and patient outcomes is a central problem in guideline- or standards-based medical care. In this paper we describe a unified model representation and algorithm for evidence-adaptive quality assessment scoring that can: (1) use both complex case-specific guidelines and single-step population-wide performance-indicators as quality measures; (2) score adherence consistently with quantitative population-based medical utilities of the quality measures where available; and (3) give worst-case and best-case scores for variations based on (a) uncertain knowledge of the best practice, (b) guideline customization to an individual patient or particular population, (c) physician practice style variation, or (d) imperfect reliability of the quality measure. Our solution uses fuzzy measure-theoretic scoring to handle the uncertain knowledge about best-practices and the ambiguity from practice variation. We show results of applying our method to retrospective data from a guideline project to improve the quality of hypertension care.

    View details for Web of Science ID 000189418100001

    View details for PubMedID 12463775

    View details for PubMedCentralID PMC2244239

  • Creating Semantic Web contents with Protege-2000 IEEE INTELLIGENT SYSTEMS & THEIR APPLICATIONS Noy, N. F., Sintek, M., Decker, S., Crubezy, M., Fergerson, R. W., Musen, M. A. 2001; 16 (2): 60-71
  • Building an explanation function for a hypertension decision-support system 10th World Congress on Medical Informatics (MEDINFO 2001) Shankar, R. D., Martins, S. B., Tu, S. W., Goldstein, M. K., Musen, M. A. I O S PRESS. 2001: 538–542

    Abstract

    ATHENA DSS is a decision-support system that provides recommendations for managing hypertension in primary care. ATHENA DSS is built on a component-based architecture called EON. User acceptance of a system like this one depends partly on how well the system explains its reasoning and justifies its conclusions. We addressed this issue by adapting WOZ, a declarative explanation framework, to build an explanation function for ATHENA DSS. The explanation function obtains its information by tapping into EON's components, as well as into other relevant sources such as the guideline document and medical literature. It uses an argument model to identify the pieces of information that constitute an explanation, and employs a set of visual clients to display that explanation. By incorporating varied information sources, by mirroring naturally occurring medical arguments and by utilizing graphic visualizations, ATHENA DSS's explanation function generates rich, evidence-based explanations.

    View details for Web of Science ID 000172901700127

    View details for PubMedID 11604798

  • Patient safety in guideline-based decision support for hypertension management: ATHENA DSS Annual Symposium of the American-Medical-Informatics-Association (AMIA 2001) Goldstein, M. K., Hoffman, B. B., Coleman, R. W., Tu, S. W., Shankar, R. D., O'Connor, M., Martins, S., Advani, A., Musen, M. A. BMJ PUBLISHING GROUP. 2001: 214–218

    Abstract

    The Institute of Medicine recently issued a landmark report on medical error [1]. In the penumbra of this report, every aspect of health care is subject to new scrutiny regarding patient safety. Informatics technology can support patient safety by correcting problems inherent in older technology; however, new information technology can also contribute to new sources of error. We report here a categorization of possible errors that may arise in deploying a system designed to give guideline-based advice on prescribing drugs, an approach to anticipating these errors in an automated guideline system, and design features to minimize errors and thereby maximize patient safety. Our guideline implementation system, based on the EON architecture, provides a framework for a knowledge base that is sufficiently comprehensive to incorporate safety information, and that is easily reviewed and updated by clinician-experts.

    View details for Web of Science ID 000172263400045

    View details for PubMedID 11825183

    View details for PubMedCentralID PMC2243380

  • A virtual medical record for guideline-based decision support Annual Symposium of the American-Medical-Informatics-Association (AMIA 2001) Johnson, P. D., Tu, S. W., Musen, M. A., Purves, I. BMJ PUBLISHING GROUP. 2001: 294–298

    Abstract

    A major obstacle in deploying computer-based clinical guidelines at the point of care is the variability of electronic medical records and the consequent need to adapt guideline modeling languages, guideline knowledge bases, and execution engines to idiosyncratic data models in the deployment environment. This paper reports an approach, developed jointly by researchers at Newcastle and Stanford, where guideline models are encoded assuming a uniform virtual electronic medical record and guideline-specific concept ontologies. For implementing a guideline-based decision-support system in multiple deployment environments, we created mapping knowledge bases to link terms in the concept ontology with the terminology used in the deployment systems. Mediation components use these mapping knowledge bases to map data in locally deployed medical record architectures to the virtual medical record. We discuss the possibility of using the HL7 Reference Information Model (RIM) as the basis for a standardized virtual medical record, showing how this approach also complies with the European pre-standard ENV13606 for electronic healthcare record communication.

    View details for Web of Science ID 000172263400061

    View details for PubMedID 11825198

  • Representation of structural relationships in the Foundational Model of anatomy Mejino, J. L., Noy, N. F., Musen, M. A., Brinkley, J. F., Rosse, C. BMJ PUBLISHING GROUP. 2001: 973–973
  • A client-server framework for deploying a decision-support system in a resource-constrained environment O'Connor, M. J., Shankar, R. D., Tu, S. W., Advani, A., Goldstein, M. K., Coleman, R. W., Musen, M. A. BMJ PUBLISHING GROUP. 2001: 986–986
  • Modeling data and knowledge in the EON guideline architecture 10th World Congress on Medical Informatics (MEDINFO 2001) Tu, S. W., Musen, M. A. I O S PRESS. 2001: 280–284

    Abstract

    Compared to guideline representation formalisms, data and knowledge modeling for clinical guidelines is a relatively neglected area. Yet it has enormous impact on the format and expressiveness of decision criteria that can be written, on the inferences that can be made from patient data, on the ease with which guidelines can be formalized, and on the method of integrating guideline-based decision-support services into implementation sites' information systems. We clarify the respective roles that data and knowledge modeling play in providing patient-specific decision support based on clinical guidelines. We show, in the context of the EON guideline architecture, how we use the Protégé-2000 knowledge-engineering environment to build (1) a patient-data information model, (2) a medical-specialty model, and (3) a guideline model that formalizes the knowledge needed to generate recommendations regarding clinical decisions and actions. We show how the use of such models allows development of alternative decision-criteria languages and allows systematic mapping of the data required for guideline execution from patient data contained in electronic medical record systems.

    View details for Web of Science ID 000172901700062

    View details for PubMedID 11604749

  • A formal method to resolve temporal mismatches in clinical databases Annual Symposium of the American-Medical-Informatics-Association (AMIA 2001) Das, A. K., Musen, M. A. BMJ PUBLISHING GROUP. 2001: 130–134

    Abstract

    Overcoming data heterogeneity is essential to the transfer of decision-support programs to legacy databases and to the integration of data in clinical repositories. Prior methods have focused primarily on problems of differences in terminology and patient identifiers, and have not addressed formally the problem of temporal data heterogeneity, even though time is a necessary element in storing, manipulating, and reasoning about clinical data. In this paper, we present a method to resolve temporal mismatches present in clinical databases. This method is based on a foundational model of time that can formalize various temporal representations. We use this temporal model to define a novel set of twelve operators that can map heterogeneous time-stamped data into a uniform temporal scheme. We present an algorithm that uses these mapping operators, and we discuss our implementation and evaluation of the method as a software program called Synchronus.

    View details for Web of Science ID 000172263400028

    View details for PubMedID 11825168

    View details for PubMedCentralID PMC2243601

  • Medical quality assessment by scoring adherence to guideline intentions Annual Symposium of the American-Medical-Informatics-Association (AMIA 2001) Advani, A., Shahar, Y., Musen, M. A. BMJ PUBLISHING GROUP. 2001: 2–6

    Abstract

    Quality assessment of clinician actions and patient outcomes is a central problem in guideline- or standards-based medical care. In this paper we describe an approach for evaluating and consistently scoring clinician adherence to medical guidelines using the intentions of guideline authors. We present the Quality Indicator Language (QUIL) that may be used to formally specify quality constraints on physician behavior and patient outcomes derived from medical guidelines. We present a modeling and scoring methodology for consistently evaluating multi-step and multi-choice guideline plans based on guideline intentions and their revisions.

    View details for Web of Science ID 000172263400002

    View details for PubMedID 11825146

  • Integration of textual guideline documents with formal guideline knowledge bases Annual Symposium of the American-Medical-Informatics-Association (AMIA 2001) Shankar, R. D., Tu, S. W., Martins, S. B., Fagan, L. M., Goldstein, M. K., Musen, M. A. BMJ PUBLISHING GROUP. 2001: 617–621

    Abstract

    Numerous approaches have been proposed to integrate the text of guideline documents with guideline-based care systems. Current approaches range from serving marked up guideline text documents to generating advisories using complex guideline knowledge bases. These approaches have integration problems mainly because they tend to rigidly link the knowledge base with text. We are developing a bridge approach that uses an information retrieval technology. The new approach facilitates a versatile decision-support system by using flexible links between the formal structures of the knowledge base and the natural language style of the guideline text.

    View details for Web of Science ID 000172263400126

    View details for PubMedID 11825260

  • RASTA: A distributed temporal abstraction system to facilitate knowledge-driven monitoring of clinical databases 10th World Congress on Medical Informatics (MEDINFO 2001) O'Connor, M. J., Grosso, W. E., Tu, S. W., Musen, M. A. I O S PRESS. 2001: 508–512

    Abstract

    The time dimension is very important for applications that reason with clinical data. Unfortunately, this task is inherently computationally expensive. As clinical decision support systems tackle increasingly varied problems, they will increase the demands on the temporal reasoning component, which may lead to slow response times. This paper addresses this problem. It describes a temporal reasoning system called RASTA that uses a distributed algorithm that enables it to deal with large data sets. The algorithm also supports a variety of configuration options, enabling RASTA to deal with a range of application requirements.

    View details for Web of Science ID 000172901700121

    View details for PubMedID 11604792

  • Integration and beyond: Linking information from disparate sources and into workflow JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION Stead, W. W., Miller, R. A., Musen, M. A., Hersh, W. R. 2000; 7 (2): 135-145

    Abstract

    The vision of integrating information (from a variety of sources, into the way people work, to improve decisions and process) is one of the cornerstones of biomedical informatics. Thoughts on how this vision might be realized have evolved as improvements in information and communication technologies, together with discoveries in biomedical informatics, have changed the art of the possible. This review identified three distinct generations of "integration" projects. First-generation projects create a database and use it for multiple purposes. Second-generation projects integrate by bringing information from various sources together through enterprise information architecture. Third-generation projects inter-relate disparate but accessible information sources to provide the appearance of integration. The review suggests that the ideas developed in the earlier generations have not been supplanted by ideas from subsequent generations. Instead, the ideas represent a continuum of progress along the three dimensions of workflow, structure, and extraction.

    View details for Web of Science ID 000085723800004

    View details for PubMedID 10730596

  • The impact of displayed awards on the credibility and retention of Web site information Annual Symposium of the American-Medical-Informatics-Association Shon, J., Marshall, J., Musen, M. A. HANLEY & BELFUS INC. 2000: 794–798

    Abstract

    Ratings systems and awards for medical Web sites have proliferated, but the validity and utility of the systems has not been well established. This study examined the effect of awards on the perceived credibility and retention of health information on a Web page. We recruited study participants from Internet newsgroups and presented them with information on the claimed health benefits of shark cartilage. Participants were randomized to receive health information with and without a medical award present on the page. We subsequently asked them to evaluate the credibility of the Web page and posed multiple-choice questions regarding the content of the pages. 137 completed responses were included for analysis. Our results show that the presentation of awards has no significant effect on the credibility or retention of health information on a Web page. Significantly, the highly educated participants in our study found inaccurate and misleading information on shark cartilage to be slightly believable.

    View details for Web of Science ID 000170207500162

    View details for PubMedID 11079993

    View details for PubMedCentralID PMC2243820

  • PROMPT: Algorithm and tool for automated ontology merging and alignment 17th National Conference on Artificial Intelligence (AAAI-2000)/12th Conference on Innovative Applications of Artificial Intelligence (IAAI-2000) Noy, N. F., Musen, M. A. M I T PRESS. 2000: 450–455
  • The knowledge model of Protege-2000: Combining interoperability and flexibility 12th International Conference on Knowledge Engineering and Knowledge Management Noy, N. F., Fergerson, R. W., Musen, M. A. SPRINGER-VERLAG BERLIN. 2000: 17–32
  • Knowledge representation and tool support for critiquing clinical trial protocols Annual Symposium of the American-Medical-Informatics-Association Rubin, D. L., Gennari, J., Musen, M. A. HANLEY & BELFUS INC. 2000: 724–728

    Abstract

    The increasing complexities of clinical trials have led to increasing costs for investigators and organizations that author and administer those trials. The process of authoring a clinical trial protocol, the document that specifies the details of the study, is usually a manual task, and thus authors may introduce subtle errors in medical and procedural content. We have created a protocol inspection and critiquing tool (PICASSO) that evaluates the procedural aspects of a clinical trial protocol. To implement this tool, we developed a knowledge base for clinical trials that contains knowledge of the medical domain (diseases, drugs, lab tests, etc.) and of specific requirements for clinical trial protocols (eligibility criteria, patient treatments, and monitoring activities). We also developed a set of constraints, expressed in a formal language, that describe appropriate practices for authoring clinical trials. If a clinical trial designed with PICASSO violates any of these constraints, PICASSO generates a message to the user and a list of inconsistencies for each violated constraint. To test our methodology, we encoded portions of a hypothetical protocol and implemented designs consistent and inconsistent with known clinical trial practice. Our hope is that this methodology will be useful for standardizing new protocols and improving their quality.

    View details for Web of Science ID 000170207500148

    View details for PubMedID 11079979

  • A case study in using Protege-2000 as a tool for CommonKADS 12th International Conference on Knowledge Engineering and Knowledge Management Schreiber, G., Crubezy, M., Musen, M. SPRINGER-VERLAG BERLIN. 2000: 33–48
  • Explanations for a hypertension decision-support system Shankar, R. D., Tu, S. W., Goldstein, M. K., Musen, M. A. HANLEY & BELFUS INC. 2000: 1136–1136
  • Representation of temporal indeterminacy in clinical databases Annual Symposium of the American-Medical-Informatics-Association O'Connor, M. J., Tu, S. W., Musen, M. A. HANLEY & BELFUS INC. 2000: 615–619

    Abstract

    Temporal indeterminacy is common in clinical medicine because the time of many clinical events is frequently not precisely known. Decision support systems that reason with clinical data may need to deal with this indeterminacy. This indeterminacy support must have a sound foundational model so that other system components may take advantage of it. In particular, it should operate in concert with temporal abstraction, a feature that is crucial in several clinical decision support systems that our group has developed. We have implemented a temporal query system called Tzolkin that provides extensive support for the temporal indeterminacies found in clinical medicine, and have integrated this support with our temporal abstraction mechanism. The resulting system provides a simple, yet powerful approach for dealing with temporal indeterminacy and temporal abstraction.

    View details for Web of Science ID 000170207500126

    View details for PubMedID 11079957

  • From guideline modeling to guideline execution: Defining guideline-based decision-support services Annual Symposium of the American-Medical-Informatics-Association Tu, S. W., Musen, M. A. HANLEY & BELFUS INC. 2000: 863–867

    Abstract

    We describe our task-based approach to defining the guideline-based decision-support services that the EON system provides. We categorize uses of guidelines in patient-specific decision support into a set of generic tasks--making of decisions, specification of work to be performed, interpretation of data, setting of goals, and issuance of alerts and reminders--that can be solved using various techniques. Our model includes constructs required for representing the knowledge used by these techniques. These constructs form a toolkit from which developers can select modeling solutions for guideline tasks. Based on the tasks and the guideline model, we define a guideline-execution architecture and a model of interactions between a decision-support server and clients that invoke services provided by the server. These services use generic interfaces derived from guideline tasks and their associated modeling constructs. We describe two implementations of these decision-support services and discuss how this work can be generalized. We argue that a well-defined specification of guideline-based decision-support services will facilitate sharing of tools that implement computable clinical guidelines.

    View details for Web of Science ID 000170207500176

    View details for PubMedID 11080007

  • Implementing clinical practice guidelines while taking account of changing evidence: ATHENA DSS, an easily modifiable decision-support system for managing hypertension in primary care Annual Symposium of the American-Medical-Informatics-Association Goldstein, M. K., Hoffman, B. B., Coleman, R. W., Musen, M. A., Tu, S. W., Advani, A., Shankar, R., O'Connor, M. HANLEY & BELFUS INC. 2000: 300–304

    Abstract

    This paper describes the ATHENA Decision Support System (DSS), which operationalizes guidelines for hypertension using the EON architecture. ATHENA DSS encourages blood pressure control and recommends guideline-concordant choice of drug therapy in relation to comorbid diseases. ATHENA DSS has an easily modifiable knowledge base that specifies eligibility criteria, risk stratification, blood pressure targets, relevant comorbid diseases, guideline-recommended drug classes for patients with comorbid disease, preferred drugs within each drug class, and clinical messages. Because evidence for best management of hypertension evolves continually, ATHENA DSS is designed to allow clinical experts to customize the knowledge base to incorporate new evidence or to reflect local interpretations of guideline ambiguities. Together with its database mediator Athenaeum, ATHENA DSS has physical and logical data independence from the legacy Computerized Patient Record System (CPRS) supplying the patient data, so it can be integrated into a variety of electronic medical record systems.

    View details for Web of Science ID 000170207500062

    View details for PubMedID 11079893

  • Ontology acquisition from on-line knowledge sources Annual Symposium of the American-Medical-Informatics-Association Li, Q., Shilane, P., Noy, N. F., Musen, M. A. HANLEY & BELFUS INC. 2000: 497–501

    Abstract

    Electronic knowledge representation is becoming more and more pervasive both in the form of formal ontologies and less formal reference vocabularies, such as UMLS. The developers of clinical knowledge bases need to reuse these resources. Such reuse requires a new generation of tools for ontology development and management. Medical experts with little or no computer science experience need tools that will enable them to develop knowledge bases and provide capabilities for directly importing knowledge not only from formal knowledge bases but also from reference terminologies. The portions of knowledge bases that are imported from disparate resources then need to be merged or aligned to one another in order to link corresponding terms, to remove redundancies, and to resolve logical conflicts. We discuss the requirements for ontology-management tools that will enable interoperability of disparate knowledge sources. Our group is developing a suite of tools for knowledge-base management based on the Protégé-2000 environment for ontology development and knowledge acquisition. We describe one such tool in detail here: an application for incorporating information from remote knowledge sources such as UMLS into a Protégé knowledge base.

    View details for Web of Science ID 000170207500102

    View details for PubMedID 11079933

  • Design and use of clinical ontologies: Curricular goals for the education of health-telematics professionals 2nd European Workshop on Acceptance of Telematics Applications by Healthcare Professionals Musen, M. A. I O S PRESS. 2000: 40–47

    Abstract

    In computer science, the notion of a domain ontology--a formal specification of the concepts and of the relationships among concepts that characterize an application area--has received considerable attention. In human-computer interaction, ontologies play a key role in defining the terms with which users and computer systems communicate. Such ontologies either implicitly or explicitly drive all dialogs between the computer and the user. In the construction of health-telematics applications, professionals need to understand how to design and apply domain ontologies to ensure effective communication with end-users. We currently are revising our training program in Medical Information Sciences at Stanford University to teach professional students in health telematics how to develop effective domain ontologies. Instruction concerning the construction and application of clinical domain ontologies should become an integral component of all health telematics curricula.

    View details for Web of Science ID 000086539700007

    View details for PubMedID 11010333

  • Scalable software architectures for decision support Grand Challenges Conference on Health Informatics - Challenges to Progress Musen, M. A. SCHATTAUER GMBH-VERLAG MEDIZIN NATURWISSENSCHAFTEN. 1999: 229–238

    Abstract

    Interest in decision-support programs for clinical medicine soared in the 1970s. Since that time, workers in medical informatics have been particularly attracted to rule-based systems as a means of providing clinical decision support. Although developers have built many successful applications using production rules, they also have discovered that creation and maintenance of large rule bases is quite problematic. In the 1980s, several groups of investigators began to explore alternative programming abstractions that can be used to build decision-support systems. As a result, the notions of "generic tasks" and of reusable problem-solving methods became extremely influential. By the 1990s, academic centers were experimenting with architectures for intelligent systems based on two classes of reusable components: (1) problem-solving methods--domain-independent algorithms for automating stereotypical tasks--and (2) domain ontologies that captured the essential concepts (and relationships among those concepts) in particular application areas. This paper highlights how developers can construct large, maintainable decision-support systems using these kinds of building blocks. The creation of domain ontologies and problem-solving methods is the fundamental end product of basic research in medical informatics. Consequently, these concepts need more attention by our scientific community.

    View details for Web of Science ID 000084637800002

    View details for PubMedID 10805007

  • Semi-automated entry of clinical temporal-abstraction knowledge JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION Shahar, Y., Chen, H., Stites, D. P., Basso, L. V., Kaizer, H., Wilson, D. M., Musen, M. A. 1999; 6 (6): 494-511

    Abstract

    The authors discuss the usability of an automated tool that supports entry, by clinical experts, of the knowledge necessary for forming high-level concepts and patterns from raw time-oriented clinical data. Based on their previous work on the RESUME system for forming high-level concepts from raw time-oriented clinical data, the authors designed a graphical knowledge acquisition (KA) tool that acquires the knowledge required by RESUME. This tool was designed using Protégé, a general framework and set of tools for the construction of knowledge-based systems. The usability of the KA tool was evaluated by three expert physicians and three knowledge engineers in three domains--the monitoring of children's growth, the care of patients with diabetes, and protocol-based care in oncology and in experimental therapy for AIDS. The study evaluated the usability of the KA tool for the entry of previously elicited knowledge. The authors recorded the time required to understand the methodology and the KA tool and to enter the knowledge; they examined the subjects' qualitative comments; and they compared the output abstractions with benchmark abstractions computed from the same data and a version of the same knowledge entered manually by RESUME experts. Understanding RESUME required 6 to 20 hours (median, 15 to 20 hours); learning to use the KA tool required 2 to 6 hours (median, 3 to 4 hours). Entry times for physicians varied by domain--2 to 20 hours for growth monitoring (median, 3 hours), 6 and 12 hours for diabetes care, and 5 to 60 hours for protocol-based care (median, 10 hours). An increase in speed of up to 25 times (median, 3 times) was demonstrated for all participants when the KA process was repeated. On their first attempt at using the tool to enter the knowledge, the knowledge engineers recorded entry times similar to those of the expert physicians' second attempt at entering the same knowledge. In all cases RESUME, using knowledge entered by means of the KA tool, generated abstractions that were almost identical to those generated using the same knowledge entered manually. The authors demonstrate that the KA tool is usable and effective for expert physicians and knowledge engineers to enter clinical temporal-abstraction knowledge and that the resulting knowledge bases are as valid as those produced by manual entry.

    View details for PubMedID 10579607

  • Use of a domain model to drive an interactive knowledge-editing tool INTERNATIONAL JOURNAL OF HUMAN-COMPUTER STUDIES Musen, M. A., Fagan, L. M., Combs, D. M., Shortliffe, E. H. 1999; 51 (2): 479-495
  • Integration of temporal reasoning and temporal-data maintenance into a reusable database mediator to answer abstract, time-oriented queries: The Tzolkin system JOURNAL OF INTELLIGENT INFORMATION SYSTEMS Nguyen, J. H., Shahar, Y., Tu, S. W., Das, A. K., Musen, M. A. 1999; 13 (1-2): 121-145
  • Justification of automated decision-making: Medical explanations as medical arguments Annual Symposium of the American-Medical-Informatics-Association Shankar, R. D., Musen, M. A. BMJ PUBLISHING GROUP. 1999: 395–399

    Abstract

    People use arguments to justify their claims. Computer systems use explanations to justify their conclusions. We are developing WOZ, an explanation framework that justifies the conclusions of a clinical decision-support system. WOZ's central component is the explanation strategy that decides what information justifies a claim. The strategy uses Toulmin's argument structure to define pieces of information and to orchestrate their presentation. WOZ uses explicit models that abstract the core aspects of the framework such as the explanation strategy. In this paper, we present the use of arguments, the modeling of explanations, and the explanation process used in WOZ. WOZ exploits the wealth of naturally occurring arguments, and thus can generate convincing medical explanations.

    View details for Web of Science ID 000170207300082

    View details for PubMedID 10566388

  • Tool support for authoring eligibility criteria for cancer trials Annual Symposium of the American-Medical-Informatics-Association Rubin, D. L., Gennari, J. H., Srinivas, S., Yuen, A., Kaizer, H., Musen, M. A., Silva, J. S. BMJ PUBLISHING GROUP. 1999: 369–373

    Abstract

    A critical component of authoring new clinical trial protocols is assembling a set of eligibility criteria for patient enrollment. We found that clinical protocols in three different cancer domains can be categorized according to a set of clinical states that describe various clinical scenarios for that domain. Classifying protocols in this manner revealed similarities among the eligibility criteria and permitted some standardization of criteria based on clinical state. We have developed an eligibility criteria authoring tool which uses a standard set of eligibility criteria and a diagram of the clinical states to present the relevant eligibility criteria to the protocol author. We demonstrate our ideas with phase-3 protocols from breast cancer, prostate cancer, and non-small cell lung cancer. Based on measurements of redundancy and percentage coverage of criteria included in our tool, we conclude that our model reduces redundancy in the number of criteria needed to author multiple protocols, and it allows some eligibility criteria to be authored automatically based on the clinical state of interest for a protocol.

    View details for Web of Science ID 000170207300077

    View details for PubMedID 10566383

  • EON 2.0: Enhanced middleware for automation of protocol-directed therapy Musen, M. A., Tu, S. W., Shankar, R. D., O'Connor, M. J., Advani, A. BMJ PUBLISHING GROUP. 1999: 1215–1215
  • Representing the digital anatomist foundational model as a protege ontology Hahn, J. S., Burnside, E., Brinkley, J. F., Rosse, C., Musen, M. A. BMJ PUBLISHING GROUP. 1999: 1070–1070
  • Applying temporal joins to clinical databases Annual Symposium of the American-Medical-Informatics-Association O'Connor, M. J., Tu, S. W., Musen, M. A. BMJ PUBLISHING GROUP. 1999: 335–339

    Abstract

    Clinical databases typically contain a significant amount of temporal information, information that is often crucial in medical decision-support systems. Most recent clinical information systems use the relational model when working with this information. Although these systems have reasonably well-defined semantics for temporal queries on a single relational table, many do not fully address the complex semantics of operations involving multiple temporal tables. Such operations can arise frequently in queries on clinical databases. This paper describes the issues encountered when joining a set of temporal tables, and outlines how such joins are far more complex than non-temporal ones. We describe the semantics of temporal joins in a query management system called Chronus II, a system we have developed to assist in evaluating patients for clinical trials.

    View details for Web of Science ID 000170207300070

    View details for PubMedID 10566376

  • Representation of change in controlled medical terminologies ARTIFICIAL INTELLIGENCE IN MEDICINE Oliver, D. E., Shahar, Y., Shortliffe, E. H., Musen, M. A. 1999; 15 (1): 53-76

    Abstract

    Computer-based systems that support health care require large controlled terminologies to manage names and meanings of data elements. These terminologies are not static, because change in health care is inevitable. To share data and applications in health care, we need standards not only for terminologies and concept representation, but also for representing change. To develop a principled approach to managing change, we analyze the requirements of controlled medical terminologies and consider features that frame knowledge-representation systems have to offer. Based on our analysis, we present a concept model, a set of change operations, and a change-documentation model that may be appropriate for controlled terminologies in health care. We are currently implementing our modeling approach within a computational architecture.

    View details for Web of Science ID 000078040100004

    View details for PubMedID 9930616

  • Integrating a modern knowledge-based system architecture with a legacy VA database: The ATHENA and EON projects at Stanford Annual Symposium of the American-Medical-Informatics-Association Advani, A., Tu, S., O'Connor, M., Coleman, R., Goldstein, M. K., Musen, M. BMJ PUBLISHING GROUP. 1999: 653–657

    Abstract

    We present a methodology and database mediator tool for integrating modern knowledge-based systems, such as the Stanford EON architecture for automated guideline-based decision-support, with legacy databases, such as the Veterans Health Information Systems & Technology Architecture (VISTA) systems, which are used nation-wide. Specifically, we discuss designs for database integration in ATHENA, a system for hypertension care based on EON, at the VA Palo Alto Health Care System. We describe a new database mediator that affords the EON system both physical and logical data independence from the legacy VA database. We found that to achieve our design goals, the mediator requires two separate mapping levels and must itself involve a knowledge-based component.

    View details for Web of Science ID 000170207300134

    View details for PubMedID 10566440

  • The low availability of metadata elements for evaluating the quality of medical information on the World Wide Web Annual Symposium of the American-Medical-Informatics-Association Shon, J., Musen, M. A. BMJ PUBLISHING GROUP. 1999: 945–949

    Abstract

    A great barrier to the use of Internet resources for patient education is the concern over the quality of information available. We conducted a study to determine what information was available in Web pages, both within text and metadata source code, that could be used in the assessment of information quality. Analysis of pages retrieved from 97 unique sites using a simple keyword search for "breast cancer treatment" on a generic and a health-specific search engine revealed that basic publishing elements were present in low frequency: authorship (20%), attribution/references (32%), disclosure (41%), and currency (35%). Only one page retrieved contained all four elements. Automated extraction of metadata elements from the source code of 822 pages retrieved from five popular generic search engines revealed even less information. We discuss the design of a metadata-based system for the evaluation of quality of medical content on the World Wide Web that addresses current limitations in ensuring quality.

    View details for Web of Science ID 000170207300194

    View details for PubMedID 10566500

    View details for PubMedCentralID PMC2232512

  • A flexible approach to guideline modeling Annual Symposium of the American-Medical-Informatics-Association Tu, S. W., Musen, M. A. BMJ PUBLISHING GROUP. 1999: 420–424

    Abstract

    We describe a task-oriented approach to guideline modeling that we have been developing in the EON project. We argue that guidelines seek to change behaviors by making statements involving some or all of the following tasks: (1) setting of goals or constraints, (2) making decisions among alternatives, (3) sequencing and synchronization of actions, and (4) interpreting data. Statements about these tasks make assumptions about models of time and of data abstractions, and about degree of uncertainty, points of view, and exception handling. Because of this variability in guideline tasks and assumptions, monolithic models cannot be custom tailored to the requirements of different classes of guidelines. Instead, we have created a core model that defines a set of basic concepts and relations and that uses different submodels to account for differing knowledge requirements. We describe the conceptualization of the guideline domain that underlies our approach, discuss components of the core model and possible submodels, and give three examples of specialized guideline models to illustrate how task-specific guideline models can be specialized and assembled to better match modeling requirements of different guidelines.

    View details for Web of Science ID 000170207300087

    View details for PubMedID 10566393

  • Domain ontologies in software engineering: Use of protege with the EON architecture METHODS OF INFORMATION IN MEDICINE Musen, M. A. 1998; 37 (4-5): 540-550

    Abstract

    Domain ontologies are formal descriptions of the classes of concepts and the relationships among those concepts that describe an application area. The Protégé software-engineering methodology provides a clear division between domain ontologies and domain-independent problem-solvers that, when mapped to domain ontologies, can solve application tasks. The Protégé approach allows domain ontologies to inform the total software-engineering process, and for ontologies to be shared among a variety of problem-solving components. We illustrate the approach by describing the development of EON, a set of middleware components that automate various aspects of protocol-directed therapy. Our work illustrates the organizing effect that domain ontologies can have on the software-development process. Ontologies, like all formal representations, have limitations in their ability to capture the semantics of application areas. Nevertheless, the capability of ontologies to encode clinical distinctions not usually captured by controlled medical terminologies provides significant advantages for developers and maintainers of clinical software applications.

    View details for Web of Science ID 000077676800026

    View details for PubMedID 9865052

  • Reuse, CORBA, and knowledge-based systems INTERNATIONAL JOURNAL OF HUMAN-COMPUTER STUDIES Gennari, J. H., Cheng, H. N., Altman, R. B., Musen, M. A. 1998; 49 (4): 523-546
  • Episodic refinement of episodic skeletal-plan refinement INTERNATIONAL JOURNAL OF HUMAN-COMPUTER STUDIES Tu, S. W., Musen, M. A. 1998; 48 (4): 475-497
  • A declarative explanation framework that uses a collection of visualization agents JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION Shankar, R. D., Tu, S. W., Musen, M. A. 1998: 602-606

    Abstract

    User acceptance of a knowledge-based system depends partly on how effective the system is in explaining its reasoning and justifying its conclusions. The WOZ framework provides effective explanations for component-based decision-support systems. It represents explanation using explicit models, and employs a collection of visualization agents. It blends the strong features of existing explanation strategies, component-based systems, graphical visualizations, and explicit models. We illustrate the features of WOZ with the help of a component-based medical therapy system. We describe the explanation strategy, the roles of the visualization agents and components, and the communication structure. The integration of existing and new visualization applications, the domain-independent framework, and the incorporation of varied knowledge sources for explanation can result in a flexible explanation facility.

    View details for Web of Science ID 000171768600117

    View details for PubMedID 9929290

  • VM-in-Protege: A study of software reuse 9th World Congress on Medical Informatics: Global Health Networking - A Vision for the Next Millennium (MEDINFO 98) Park, J. Y., Musen, M. A. I O S PRESS. 1998: 644–648

    Abstract

    Protégé is a system that encompasses a suite of graphical tools and a methodology for applying them to the task of creating and maintaining knowledge-based systems. One of our key goals for Protégé is to facilitate reuse on new problems of components of previously developed solutions. We investigated this reusability by applying preexisting library components in the Protégé system to a reconstruction of VM, a well-known rule-based system for ventilator management. The formal steps of the Protégé methodology--ontology creation, problem-solving method selection, knowledge engineering, and mapping-relation instantiation--were followed, and a working system with much of the reasoning capability of the original VM was created. The work illuminated important lessons regarding aspects of component reusability.

    View details for Web of Science ID 000077613500128

    View details for PubMedID 10384534

  • Therapy planning as constraint satisfaction: A computer-based antiretroviral therapy advisor for the management of HIV JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION Smith, D. S., Park, J. Y., Musen, M. A. 1998: 627-631

    Abstract

    We applied the Protégé methodology for building knowledge-based systems to the domain of antiretroviral therapy. We modeled the task of prescribing drug therapy for HIV, abstracting the essential characteristics of the problem solving. We mapped our model of the antiretroviral-therapy domain to the class of constraint-satisfaction problems, and reused the propose-and-revise problem-solving method, from the Protégé library of methods, to build an antiretroviral therapy advisor, ART Critic. Careful modeling and using Protégé allowed us to build a useful and extensible knowledge-based application rapidly.

    View details for Web of Science ID 000171768600122

    View details for PubMedID 9929295

  • Modern architectures for intelligent systems: Reusable ontologies and problem-solving methods JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION Musen, M. A. 1998: 46-52

    Abstract

    When interest in intelligent systems for clinical medicine soared in the 1970s, workers in medical informatics became particularly attracted to rule-based systems. Although many successful rule-based applications were constructed, development and maintenance of large rule bases remained quite problematic. In the 1980s, an entire industry dedicated to the marketing of tools for creating rule-based systems rose and fell, as workers in medical informatics began to appreciate deeply why knowledge acquisition and maintenance for such systems are difficult problems. During this time period, investigators began to explore alternative programming abstractions that could be used to develop intelligent systems. The notions of "generic tasks" and of reusable problem-solving methods became extremely influential. By the 1990s, academic centers were experimenting with architectures for intelligent systems based on two classes of reusable components: (1) domain-independent problem-solving methods--standard algorithms for automating stereotypical tasks--and (2) domain ontologies that captured the essential concepts (and relationships among those concepts) in particular application areas. This paper will highlight how intelligent systems for diverse tasks can be efficiently automated using these kinds of building blocks. The creation of domain ontologies and problem-solving methods is the fundamental end product of basic research in medical informatics. Consequently, these concepts need more attention by our scientific community.

    View details for Web of Science ID 000171768600008

    View details for PubMedID 9929181

  • Sequential versus standard neural networks for pattern recognition: An example using the domain of coronary heart disease COMPUTERS IN BIOLOGY AND MEDICINE Ohno-Machado, L., Musen, M. A. 1997; 27 (4): 267-281

    Abstract

    The goal of this study was to compare standard and sequential neural network models for recognition of patterns of disease progression. Medical researchers who perform prognostic modeling usually oversimplify the problem by choosing a single point in time to predict outcomes (e.g. death in 5 years). This approach not only fails to differentiate patterns of disease progression, but also wastes important information that is usually available in time-oriented research data bases. The adequate use of sequential neural networks can improve the performance of prognostic systems if the interdependencies among prognoses at different intervals of time are explicitly modeled. In such models, predictions for a certain interval of time (e.g. death within 1 year) are influenced by predictions made for other intervals, and prognostic survival curves that provide consistent estimates for several points in time can be produced. We developed a system of neural network models that makes use of time-oriented data to predict development of coronary heart disease (CHD), using a set of 2594 patients. The output of the neural network system was a prognostic curve representing survival without CHD, and the inputs were the values of demographic, clinical, and laboratory variables. The system of neural networks was trained by backpropagation and its results were evaluated in test sets of previously unseen cases. We showed that, by explicitly modeling time in the neural network architecture, the performance of the prognostic index, measured by the area under the receiver operating characteristic (ROC) curve, was significantly improved (p < 0.05).

    View details for Web of Science ID A1997XV85800002

    View details for PubMedID 9303265

  • A foundational model of time for heterogeneous clinical databases JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION Das, A. K., Musen, M. A. 1997: 106-110

    Abstract

    Differences among the database representations of clinical data are a major barrier to the integration of databases and to the sharing of decision-support applications across databases. Prior research on resolving data heterogeneity has not addressed specifically the types of mismatches found in various timestamping approaches for clinical data. Such temporal mismatches, which include time-unit differences among timestamps, must be overcome before many applications can use these data to reason about diagnosis, therapy, or prognosis. In this paper, we present an analysis of the types of temporal mismatches that exist in databases. To formalize these various approaches to timestamping, we provide a foundational model of time. This model gives us the semantics necessary to encode the temporal dimensions of clinical data in legacy databases and to transform such heterogeneous data into a uniform temporal representation suitable for decision support. We have implemented this foundational model as an extension to our Chronus system, which provides clinical decision-support applications the ability to match temporal patterns in clinical databases. We discuss the uniqueness of our approach in comparison with other research on representing and querying clinical data with varying timestamp representations.

    View details for Web of Science ID 000171774300023

    View details for PubMedID 9357598

    View details for PubMedCentralID PMC2233542

  • EON: CORBA-based middleware for automation of protocol-directed therapy Musen, M. A., Tu, S. W., Advani, A., Das, A. K., Hasan, Z., Nguyen, J., Shahar, Y. BMJ PUBLISHING GROUP. 1997: 1025–1025
  • A temporal database mediator for protocol-based decision support JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION Nguyen, J. H., Shahar, Y., Tu, S. S., Das, A. K., Musen, M. A. 1997: 298-302

    Abstract

    To meet the data-processing requirements for protocol-based decision support, a clinical data-management system must be capable of creating high-level summaries of time-oriented patient data, and of retrieving those summaries in a temporally meaningful fashion. We previously described a temporal-abstraction module (RESUME) and a temporal-querying module (Chronus) that can be used together to perform these tasks. These modules had to be coordinated by individual applications, however, to resolve the temporal queries of protocol planners. In this paper, we present a new module that integrates the previous two modules and that provides for their coordination automatically. The new module can be used as a standalone system for retrieving both primitive and abstracted time-oriented data, or can be embedded in a larger computational framework for protocol-based reasoning.

    View details for Web of Science ID 000171774300061

    View details for PubMedID 9357636

  • Knowledge-based temporal abstraction in clinical domains ARTIFICIAL INTELLIGENCE IN MEDICINE Shahar, Y., Musen, M. A. 1996; 8 (3): 267-298

    Abstract

    We have defined a knowledge-based framework for the creation of abstract, interval-based concepts from time-stamped clinical data, the knowledge-based temporal-abstraction (KBTA) method. The KBTA method decomposes its task into five subtasks; for each subtask we propose a formal solving mechanism. Our framework emphasizes explicit representation of knowledge required for abstraction of time-oriented clinical data, and facilitates its acquisition, maintenance, reuse and sharing. The RESUME system implements the KBTA method. We tested RESUME in several clinical-monitoring domains, including the domain of monitoring patients who have insulin-dependent diabetes. We acquired diabetes-therapy temporal-abstraction knowledge from a diabetes-therapy expert. Two diabetes-therapy experts (including the first one) created temporal abstractions from about 800 points of diabetic-patients' data. RESUME generated about 80% of the abstractions agreed on by both experts; about 97% of the generated abstractions were valid. We discuss the advantages and limitations of the current architecture.

    View details for Web of Science ID A1996UW89100005

    View details for PubMedID 8830925

  • Reusable ontologies, knowledge-acquisition tools, and performance systems: PROTEGE-II solutions to Sisyphus-2 Knowledge Acquisition Workshop Rothenfluh, T. E., Gennari, J. H., Eriksson, H., Puerta, A. R., Tu, S. W., Musen, M. A. ACADEMIC PRESS LTD ELSEVIER SCIENCE LTD. 1996: 303–32
  • Knowledge acquisition for temporal abstraction. Proceedings : a conference of the American Medical Informatics Association / ... AMIA Annual Fall Symposium. AMIA Fall Symposium Stein, A., Musen, M. A., Shahar, Y. 1996: 204-208

    Abstract

    Temporal abstraction is the task of detecting relevant patterns in data over time. The knowledge-based temporal-abstraction method uses knowledge about a clinical domain's contexts, external events, and parameters to create meaningful interval-based abstractions from raw time-stamped clinical data. In this paper, we describe the acquisition and maintenance of domain-specific temporal-abstraction knowledge. Using the PROTEGE-II framework, we have designed a graphical tool for acquiring temporal knowledge directly from expert physicians, maintaining the knowledge in a sharable form, and converting the knowledge into a suitable format for use by an appropriate problem-solving method. In initial tests, the tool offered significant gains in our ability to rapidly acquire temporal knowledge and to use that knowledge to perform automated temporal reasoning.

    View details for PubMedID 8947657

  • Toward reusable software components at the point of care. Proceedings : a conference of the American Medical Informatics Association / ... AMIA Annual Fall Symposium. AMIA Fall Symposium Tuttle, M. S., Sherertz, D. D., Olson, N. E., Nelson, S. J., Erlbaum, M. S., Keck, K. D., Davis, A. N., Suarez-Munist, O. N., Lipow, S. S., Cole, W. G., Fagan, L. M., Acuff, R. D., Crangle, C. E., Musen, M. A., Tu, S. W., Wiederhold, G. C., Carlson, R. W. 1996: 150-154

    Abstract

    An architecture built from five software components--a Router, Parser, Matcher, Mapper, and Server--fulfills key requirements common to several point-of-care information and knowledge processing tasks. The requirements include problem-list creation, exploiting the contents of the Electronic Medical Record for the patient at hand, knowledge access, and support for semantic visualization and software agents. The components use the National Library of Medicine Unified Medical Language System to create and exploit lexical closure--a state in which terms, text and reference models are represented explicitly and consistently. Preliminary versions of the components are in use in an oncology knowledge server.

    View details for PubMedID 8947646

  • Conceptual and formal specifications of problem-solving methods INTERNATIONAL JOURNAL OF EXPERT SYSTEMS Fensel, D., Eriksson, H., Musen, M. A., Studer, R. 1996; 9 (4): 507-532
  • Making generic guidelines site-specific. Proceedings : a conference of the American Medical Informatics Association / ... AMIA Annual Fall Symposium. AMIA Fall Symposium Fridsma, D. B., Gennari, J. H., Musen, M. A. 1996: 597-601

    Abstract

    Health care providers are more likely to follow a clinical guideline if the guideline's recommendations are consistent with the way in which their organization does its work. Unfortunately, developing guidelines that are specific to an organization is expensive, and limits the ability to share guidelines among different institutions. We describe a methodology that separates the site-independent information of guidelines from site-specific information, and that facilitates the development of site-specific guidelines from generic guidelines. We have used this methodology in a prototype system that assists developers in creating generic guidelines that are sharable across different sites. When combined with site information, generic guidelines can be used to generate site-specific guidelines that are responsive to organizational change and that can be implemented at a level of detail that makes site-specific computer-based workflow management and simulation possible.

    View details for PubMedID 8947736

  • The EON model of intervention protocols and guidelines. Proceedings : a conference of the American Medical Informatics Association / ... AMIA Annual Fall Symposium. AMIA Fall Symposium Tu, S. W., Musen, M. A. 1996: 587-591

    Abstract

    We present a computational model of treatment protocols abstracted from implemented systems that we have developed previously. In our framework, a protocol is modeled as a hierarchical plan where high-level protocol steps are decomposed into descriptions of more specific actions. The clinical algorithms embodied in a protocol are represented by procedures that encode the sequencing, looping, and synchronization of protocol steps. The representation allows concurrent and optional protocol steps. We define the semantics of a procedure in terms of an execution model that specifies how the procedure should be interpreted. We show that the model can be applied to an asthma guideline different from the protocols for which the model was originally constructed.

    View details for PubMedID 8947734

  • Task modeling with reusable problem-solving methods ARTIFICIAL INTELLIGENCE Eriksson, H., Shahar, Y., Tu, S. W., Puerta, A. R., Musen, M. A. 1995; 79 (2): 293-326
  • Computer-based screening of patients with HIV/AIDS for clinical-trial eligibility. The Online journal of current clinical trials Carlson, R. W., Tu, S. W., Lane, N. M., Lai, T. L., Kemper, C. A., Musen, M. A., Shortliffe, E. H. 1995; Doc No 179: [3347 words, 32 paragraphs]

    Abstract

    To assess the potential effect of a computer-based system on accrual to clinical trials, we have developed methodology to identify retrospectively and prospectively patients who are eligible or potentially eligible for protocols. Retrospective chart abstraction with computer screening of data for potential protocol eligibility. A county-operated clinic serving human immunodeficiency virus (HIV) positive patients with or without acquired immune deficiency syndrome (AIDS). A randomly selected group of 60 patients who were HIV-infected, 30 of whom had an AIDS-defining diagnosis. Using a computer-based eligibility screening system, for each clinic visit and hospitalization, patients were categorized as eligible, potentially eligible, or ineligible for each of the 17 protocols active during the 7-month study period. Reasons for ineligibility were categorized. None of the patients was enrolled on a clinical trial during the 7-month period. Thirteen patients were identified as eligible for protocol; three patients were eligible for two different protocols; and one patient was eligible for the same protocol during two different time intervals. Fifty-four patients were identified as potentially eligible for a total of 165 accrual opportunities, but important information, such as the result of a required laboratory test, was missing, so that eligibility could not be determined unequivocally. Ineligibility for protocol was determined in 414 (35%) potential opportunities based only on conditions that were amenable to modification, such as the use of concurrent medications; 194 (17%) failed only laboratory tests or subjective determinations not routinely performed; and 346 (29%) failed only routine laboratory tests. There are substantial numbers of eligible and potentially eligible patients who are not enrolled or evaluated for enrollment in prospective clinical trials. Computer-based eligibility screening when coupled with a computer-based medical record offers the potential to identify patients eligible or potentially eligible for clinical trial, to assist in the selection of protocol eligibility criteria, and to make accrual estimates.
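
    The three-way categorization described in this abstract (eligible, potentially eligible, ineligible) can be illustrated with a minimal sketch. It is not the screening system used in the study: the `screen` function, the record layout, and the example criteria are hypothetical, and the only rule assumed is the one the abstract states, that a missing required value leaves eligibility undetermined.

```python
from enum import Enum


class Status(Enum):
    ELIGIBLE = "eligible"
    POTENTIALLY_ELIGIBLE = "potentially eligible"
    INELIGIBLE = "ineligible"


def screen(patient, criteria):
    """Classify one patient record against one protocol (hypothetical layout).

    `patient` maps field names to values; a missing or None value means the
    datum was never recorded. `criteria` is a list of (field, predicate) pairs.
    Returns the status plus the fields that explain it.
    """
    missing, failed = [], []
    for field, predicate in criteria:
        value = patient.get(field)
        if value is None:
            missing.append(field)   # cannot decide without this datum
        elif not predicate(value):
            failed.append(field)    # a recorded value violates the criterion
    if failed:
        return Status.INELIGIBLE, failed
    if missing:
        return Status.POTENTIALLY_ELIGIBLE, missing
    return Status.ELIGIBLE, []


# Illustrative criteria only; they are not taken from any real protocol.
example_criteria = [
    ("cd4_count", lambda v: v < 200),
    ("age_years", lambda v: v >= 18),
]
```

    Returning the failed or missing fields alongside the status mirrors the abstract's interest in why an accrual opportunity was lost, for example a single laboratory test that was never ordered.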

    View details for PubMedID 7719564

  • THE SEPARATION OF REVIEWING KNOWLEDGE FROM MEDICAL KNOWLEDGE METHODS OF INFORMATION IN MEDICINE VANDERLEI, J., Musen, M. A. 1995; 34 (1-2): 131-139

    Abstract

    The developers of reviewing systems that rely on computer-based patient-record systems as a source of data need to model reviewing knowledge and medical knowledge. We simulate how the same medical knowledge could be entered in four different systems: CARE, the Arden syntax, Essential-attending and HyperCritic. We subsequently analyze how the original knowledge is represented in the symbols or syntax used by these systems. We conclude that these systems provide different alternatives in dealing with the vocabulary provided by the computer-based patient records. In addition, the use of computer-based patient records for review poses new challenges for the content of that record: to facilitate review, the reasoning of the physician needs to be captured in addition to the actions of the physician.

    View details for Web of Science ID A1995QT06800016

    View details for PubMedID 9082122

  • The development of a controlled medical terminology: identification, collaboration, and customization. Medinfo. MEDINFO Miller, E. T., WIECKERT, K. E., Fagan, L. M., Musen, M. A. 1995; 8: 148-152

    Abstract

    An increasing focus in health care is the development and use of electronic medical record systems to capture and store patient information. T-HELPER is an electronic medical record system that health care providers use to record ambulatory-care patient progress notes. These data are stored in an on-line database and analyzed by T-HELPER to provide users with decision support regarding patient eligibility for clinical trial protocols and assistance with continued protocol-based care. Our goal is to provide a system that enhances the process of identifying patients who are potentially eligible for clinical trials of experimental therapies in a clinic that is limited by the existence of a singular clinical trial coordinator. Effective implementation of such a system requires the development of a meaningful controlled medical terminology that satisfies the needs of a diverse community of providers, all of whom contribute to the health care process. The development of a controlled medical terminology is a process of identification, collaboration, and customization. We enlisted the help of collaborators familiar with the proposed work environment to identify user needs, to collaborate with our development team to construct the preliminary terminology, and to customize the controlled medical terminology to make it meaningful and acceptable to the clinic users.

    View details for PubMedID 8591141

  • A component-based architecture for automation of protocol-directed therapy 5th Conference on Artificial Intelligence in Medicine Europe (AIME 95) Musen, M. A., Tu, S. W., Das, A. K., Shahar, Y. SPRINGER-VERLAG BERLIN. 1995: 3–13
  • CALIPER: individualized-growth curves for the monitoring of children's growth. Medinfo. MEDINFO Kuilboer, M. M., Wilson, D. M., Musen, M. A., Wit, J. M. 1995; 8: 1686-?

    Abstract

    Monitoring children's growth is a fundamental part of pediatric care. Deviation from the expected growth pattern can be an important sign of disease and often results in parental anxiety. Most preprinted growth curves are based on cross-sectional data derived from population-based studies of normal children. Since the age of the pubertal growth spurt varies substantially among the normal curves, these curves don't adequately reflect the expected growth pattern of an individual child. In addition, any preprinted growth curve based on the general population becomes less useful when the maturation of a child and the heights of its parents differ substantially from the average. Established methods exist to adjust the general reference-growth curves for parental height. However, these methods generally are too time consuming to be used in clinic. Only heuristic methods are known to us to adjust the general-reference curves for maturation. We have developed the decision-support system CALIPER, which enables and standardizes the generation of individualized reference-growth curves. CALIPER consists of a graphical interface for data entry, a progress-report generator, and a module for the interactive, dynamic display of general-reference curves and individualized-reference curves. Preference settings such as ethnic background and gender determine the required population curves and individualization method. Individualization can be based on parental height and/or maturation. Maturation is based on an assessment of a child's bone age and/or pubertal stage. The bone age can be assessed by different methods. We have performed an evaluation of CALIPER's methodology by assessing the effect of individualization on the reference growth curves for 466 normally growing children. The individualized-reference curves reflect the growth pattern of children significantly better than the general-reference curves. CALIPER can be used on a case-by-case basis as an aid in clinic (assessment of children's growth and communication with patient and parents) or as a tool to investigate current clinical questions concerning the relation of bone age, pubertal stage, and growth pattern for any part of the population. Besides providing for decision support by the interactive graphical representation of individualized-reference curves and growth data, CALIPER will be linked to a module that can provide automatic interpretation of the data (Kuilboer et al, SCAMC-93). CALIPER runs on a Macintosh, and requires 600K of memory. A color monitor is preferable, but not required. We will demonstrate several cases that will illustrate the clinical problem and CALIPER's potential.

    View details for PubMedID 8591546

  • PROTEGE-II: computer support for development of intelligent systems from libraries of components. Medinfo. MEDINFO Musen, M. A., Gennari, J. H., Eriksson, H., Tu, S. W., Puerta, A. R. 1995; 8: 766-770

    Abstract

    PROTEGE-II is a suite of tools that facilitates the development of intelligent systems. A tool called MAiTRE allows system builders to create and refine abstract models (ontologies) of application domains. A tool called DASH takes as input a modified domain ontology and generates automatically a knowledge-acquisition tool that application specialists can use to enter the detailed content knowledge required to define particular applications. The domain-dependent knowledge entered into the knowledge-acquisition tool is used by assemblies of domain-independent problem-solving methods that provide the computational strategies required to solve particular application tasks. The result is an architecture that offers a divide-and-conquer approach that separates system-building tasks that require skill in domain analysis and modeling from those that require simple entry of content knowledge. At the same time, applications can be constructed from libraries of components--of both domain ontologies and domain-independent problem-solving methods--allowing the reuse of knowledge and facilitating ongoing system maintenance. We have used PROTEGE-II to construct a number of knowledge-based systems, including the reasoning components of T-Helper, which assists physicians in the protocol-based care of patients who have HIV infection.

    View details for PubMedID 8591322

  • Knowledge-based temporal abstraction in diabetes therapy. Medinfo. MEDINFO Shahar, Y., Das, A. K., Tu, S. W., Kraemer, F. B., Basso, L. V., Musen, M. A. 1995; 8: 852-856

    Abstract

    We suggest a general framework for solving the task of creating abstract, interval-based concepts from time-stamped clinical data. We refer to this problem-solving framework as the knowledge-based temporal-abstraction (KBTA) method. The KBTA method emphasizes explicit representation, acquisition, maintenance, reuse, and the sharing of knowledge required for abstraction of time-oriented clinical data. We describe the subtasks into which the KBTA method decomposes its task, the problem-solving mechanisms that solve these subtasks, and the knowledge necessary for instantiating these mechanisms in a particular clinical domain. We have implemented the KBTA method in the RESUME system and have applied it to the task of monitoring the care of insulin-dependent diabetics.

    View details for PubMedID 8591345

  • A comparison of the temporal expressiveness of three database query methods. Proceedings / the ... Annual Symposium on Computer Application [sic] in Medical Care. Symposium on Computer Applications in Medical Care Das, A. K., Musen, M. A. 1995: 331-337

    Abstract

    Time is a multifaceted phenomenon that developers of clinical decision-support systems can model at various levels of complexity. An unresolved issue for the design of clinical databases is whether the underlying data model should support interval semantics. In this paper, we examine whether interval-based operations are required for querying protocol-based conditions. We report on an analysis of a set of 256 eligibility criteria that the T-HELPER system uses to screen patients for enrollment in eight clinical-trial protocols for HIV disease. We consider three data-manipulation methods for temporal querying: the consensus query representation Arden Syntax, the commercial standard query language SQL, and the temporal query language TimeLineSQL (TLSQL). We compare the ability of these three query methods to express the eligibility criteria. Seventy-nine percent of the 256 criteria require operations on time stamps. These temporal conditions comprise four distinct patterns, two of which use interval-based data. Our analysis indicates that the Arden Syntax can query the two non-interval patterns, which represent 54% of the temporal conditions. Timepoint comparisons formulated in SQL can instantiate the two non-interval patterns and one interval pattern, which encompass 96% of the temporal conditions. TLSQL, which supports an interval-based model of time, can express all four types of temporal patterns. Our results demonstrate that the T-HELPER system requires simple temporal operations for most protocol-based queries. Of the three approaches tested, TLSQL is the only query method that is sufficiently expressive for the temporal conditions in this system.
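
    As an illustration of how an interval condition can sometimes be reduced to timepoint comparisons, the sketch below tests whether a medication interval overlaps a look-back window before a visit by comparing only the endpoints. It is written in Python rather than in SQL, the Arden Syntax, or TLSQL, and the `Interval` type, the `on_drug_within` helper, and the example criterion are assumptions made for illustration; the abstract does not say which of its four patterns corresponds to this case.

```python
from datetime import date, timedelta
from typing import NamedTuple


class Interval(NamedTuple):
    start: date
    end: date  # inclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """Two closed intervals overlap when each starts no later than the other ends."""
    return a.start <= b.end and b.start <= a.end


def on_drug_within(medication: Interval, visit: date, days: int) -> bool:
    """Hypothetical criterion: was the drug taken at any time in the
    `days` days up to and including the visit date?"""
    window = Interval(visit - timedelta(days=days), visit)
    return overlaps(medication, window)
```

    Each test uses only two endpoint comparisons, which is the kind of condition a conventional query language can state directly on stored start and end time stamps.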

    View details for PubMedID 8563296

  • Hierarchical neural networks for survival analysis. Medinfo. MEDINFO Ohno-Machado, L., Walker, M. G., Musen, M. A. 1995; 8: 828-832

    Abstract

    Neural networks offer the potential of providing more accurate predictions of survival time than do traditional methods. Their use in medical applications has, however, been limited, especially when some data is censored or the frequency of events is low. To reduce the effect of these problems, we have developed a hierarchical architecture of neural networks that predicts survival in a stepwise manner. Predictions are made for the first time interval, then for the second, and so on. The system produces a survival estimate for patients at each interval, given relevant covariates, and is able to handle continuous and discrete variables, as well as censored data. We compared the hierarchical system of neural networks with a nonhierarchical system for a data set of 428 AIDS patients. The hierarchical model predicted survival more accurately than did the nonhierarchical (although both had low sensitivity). The hierarchical model could also learn the same patterns in less than half the time required by the nonhierarchical model. These results suggest that the use of hierarchical systems is advantageous when censored data is present, the number of events is small, and time-dependent variables are necessary.

    View details for PubMedID 8591339

  • A web-based architecture for a medical vocabulary server. Proceedings / the ... Annual Symposium on Computer Application [sic] in Medical Care. Symposium on Computer Applications in Medical Care Gennari, J. H., Oliver, D. E., Pratt, W., Rice, J., Musen, M. A. 1995: 275-279

    Abstract

    For health care providers to share computing resources and medical application programs across different sites, those applications must share a common medical vocabulary. To construct a common vocabulary, researchers must have an architecture that supports collaborative, networked development. In this paper, we present a web-based server architecture for the collaborative development of a medical vocabulary: a system that provides network services in support of medical applications that need a common, controlled medical terminology. The server supports vocabulary browsing and editing and can respond to direct programmatic queries about vocabulary terms. We have tested the programmatic query-response capability of the vocabulary server with a medical application that determines when patients who have HIV infection may be eligible for certain clinical trials. Our emphasis in this paper is not on the content of the vocabulary, but rather on the communication protocol and the tools that enable collaborative improvement of the vocabulary by any network-connected user.

    View details for PubMedID 8563284

    View details for PubMedCentralID PMC2579098

  • A comparison of two computer-based prognostic systems for AIDS. Proceedings / the ... Annual Symposium on Computer Application [sic] in Medical Care. Symposium on Computer Applications in Medical Care Ohno-Machado, L., Musen, M. A. 1995: 737-741

    Abstract

    We compare the performances of a Cox model and a neural network model that are used as prognostic tools for a cohort of people living with AIDS. We modeled disease progression for patients who had AIDS (according to the 1993 CDC definition) in a cohort of 588 patients in California, using data from the ATHOS project. We divided the study population into 10 training and 10 test sets and evaluated the prognostic accuracy of a Cox proportional hazards model and of a neural network model by determining the number of predicted deaths, the sensitivities, specificities, positive predictive values, and negative predictive values for intervals of one year following the diagnosis of AIDS. For the Cox model, we further tested the agreement between a series of binary observations, representing death in one, two, and three years, and a set of estimates which define the probability of survival for those intervals. Both models were able to provide accurate numbers on how many patients were likely to die at each interval, and reasonable individualized estimates for the two- and three-year survival of a given patient, but failed to provide reliable predictions for the first year after diagnosis. There was no evidence that the Cox model performed better than did the neural network model or vice-versa, but the former method had the advantage of providing some insight on which variables were most influential for prognosis. Nevertheless, it is likely that the assumptions required by the Cox model may not be satisfied in all data sets, justifying the use of neural networks in certain cases.
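
    The evaluation measures named in this abstract (sensitivity, specificity, positive predictive value, and negative predictive value) follow from the counts of a two-by-two table. The sketch below uses the standard definitions; the function name and the boolean encoding (True means death within the interval) are assumptions, and no data from the study are reproduced.

```python
def confusion_metrics(predicted, observed):
    """Compute standard two-by-two accuracy measures.

    `predicted` and `observed` are equal-length sequences of booleans, where
    True means the event (here, death within the interval) was predicted or
    observed. A measure with a zero denominator is returned as NaN.
    """
    pairs = list(zip(predicted, observed))
    tp = sum(1 for p, o in pairs if p and o)          # true positives
    fp = sum(1 for p, o in pairs if p and not o)      # false positives
    tn = sum(1 for p, o in pairs if not p and not o)  # true negatives
    fn = sum(1 for p, o in pairs if not p and o)      # false negatives

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }
```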

    View details for PubMedID 8563387

  • A rational reconstruction of INTERNIST-I using PROTEGE-II. Proceedings / the ... Annual Symposium on Computer Application [sic] in Medical Care. Symposium on Computer Applications in Medical Care Musen, M. A., Gennari, J. H., Wong, W. W. 1995: 289-293

    Abstract

    PROTEGE-II is a methodology and a suite of tools that allow developers to build and maintain knowledge-based systems in a principled manner. We used PROTEGE-II to reconstruct the well-known INTERNIST-I system, demonstrating the role of a domain ontology (a framework for specification of a model of an application area), a reusable problem-solving method, and declarative mapping relations in creating a new, working program. PROTEGE-II generates automatically a domain-specific knowledge-acquisition tool, which, in the case of the INTERNIST-I reconstruction, has much of the functionality of the QMR-KAT knowledge-acquisition tool. This study provides a means to understand better both the PROTEGE-II methodology and the models that underlie INTERNIST-I.

    View details for PubMedID 8563287

  • GRAPH-GRAMMAR ASSISTANCE FOR AUTOMATED GENERATION OF INFLUENCE DIAGRAMS IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS EGAR, J. W., Musen, M. A. 1994; 24 (11): 1625-1642
  • A TEMPORAL QUERY SYSTEM FOR PROTOCOL-DIRECTED DECISION-SUPPORT METHODS OF INFORMATION IN MEDICINE Das, A. K., Musen, M. A. 1994; 33 (4): 358-370

    Abstract

    Chronus is a query system that supports temporal extensions to the Structured Query Language (SQL) for relational databases. Although the relational data model can store time-stamped data and can permit simple temporal-comparison operations, it does not provide either a closed or a sufficient algebra for manipulating temporal data. In this paper, we outline an algebra that maintains a consistent relational representation of temporal data and that allows the type of temporal queries needed for protocol-directed decision support. We also discuss how Chronus can translate between our temporal algebra and the relational algebra used for SQL queries. We have applied our system to the task of screening patients for clinical trials. Our results demonstrate that Chronus can express sufficiently all required temporal queries, and that the search time of such queries is similar to that of standard SQL.

    View details for Web of Science ID A1994PN89900007

    View details for PubMedID 7799812

  • GENERATION OF KNOWLEDGE-ACQUISITION TOOLS FROM DOMAIN ONTOLOGIES INTERNATIONAL JOURNAL OF HUMAN-COMPUTER STUDIES Eriksson, H., Puerta, A. R., Musen, M. A. 1994; 41 (3): 425-453
  • MAPPING DOMAINS TO METHODS IN SUPPORT OF REUSE INTERNATIONAL JOURNAL OF HUMAN-COMPUTER STUDIES Gennari, J. H., Tu, S. W., Rothenfluh, T. E., Musen, M. A. 1994; 41 (3): 399-424
  • A LOGICAL FOUNDATION FOR REPRESENTATION OF CLINICAL-DATA 16th Annual Symposium on Computer Applications in Medical Care Campbell, K. E., Das, A. K., Musen, M. A. HANLEY & BELFUS INC. 1994: 218–32

    Abstract

    A general framework for representation of clinical data that provides a declarative semantics of terms and that allows developers to define explicitly the relationships among both terms and combinations of terms. Use of conceptual graphs as a standard representation of logic and of an existing standardized vocabulary, the Systematized Nomenclature of Medicine (SNOMED International), for lexical elements. Concepts such as time, anatomy, and uncertainty must be modeled explicitly in a way that allows relation of these foundational concepts to surface-level clinical descriptions in a uniform manner. The proposed framework was used to model a simple radiology report, which included temporal references. Formal logic provides a framework for formalizing the representation of medical concepts. Actual implementations will be required to evaluate the practicality of this approach.

    View details for Web of Science ID A1994QE63900002

    View details for PubMedID 7719805

    View details for PubMedCentralID PMC116201

  • PROTEGE-II - A SUITE FOR TOOLS FOR DEVELOPMENT OF INTELLIGENT SYSTEMS FROM REUSABLE COMPONENTS 18th Annual Symposium on Computer Applications in Medical Care - Transforming Information, Changing Health Care Musen, M. A., Eriksson, H., Gennari, J. H., Tu, S. W., Puerta, A. R. BMJ PUBLISHING GROUP. 1994: 1065–1065

    View details for Web of Science ID A1994QF21600265

    View details for PubMedID 7949900

  • MODEL-BASED AUTOMATED GENERATION OF USER INTERFACES 12th National Conference on Artificial Intelligence Puerta, A. R., Eriksson, H., Gennari, J. H., Musen, M. A. MIT PRESS. 1994: 471–477
  • HIERARCHICAL NEURAL NETWORKS FOR PARTIAL DIAGNOSIS IN MEDICINE 1994 International-Neural-Network-Society Annual Meeting - World Congress on Neural Networks-San Diego OHNOMACHADO, L., Musen, M. A. LAWRENCE ERLBAUM ASSOC PUBL. 1994: A291–A296
  • BEYOND DATA MODELS FOR AUTOMATED USER INTERFACE GENERATION 1994 Conference of the British HCI Group - People and Computers IX (HCI 94) Puerta, A. R., Eriksson, H., Gennari, J. H., Musen, M. A. CAMBRIDGE UNIV PRESS. 1994: 353–366
  • A computer-based approach to quality improvement for telephone triage in a community AIDS clinic. Nursing administration quarterly Henry, S. B., Borchelt, D., SCHREINER, J. G., Musen, M. A. 1994; 18 (2): 65-73

    Abstract

    Observation of the current procedure for telephone triage in a community-based acquired immunodeficiency syndrome (AIDS) clinic and a retrospective chart audit identified opportunities for improvement in the process for the management of telephone triage encounters. Specifically, it pointed out that the nurses faced difficulties in accessing relevant clinical data, and that a large number of data were missing in the documentation for the encounter. Five design goals for a computer-based system to improve the management of the telephone triage encounter were generated by an interdisciplinary project team. A computer-based approach to management of the telephone triage encounter complemented by the development of performance standards and guidelines has the potential to improve both the process of telephone triage and the documentation of the triage encounter.

    View details for PubMedID 8159333

  • PATIENT-CARE APPLICATIONS ON INTERNET 18th Annual Symposium on Computer Applications in Medical Care - Transforming Information, Changing Health Care Barnett, O., SHORTLIFFE, E., Chueh, H., PIGGINS, J., Greenes, R., Cimino, J., Musen, M., Clayton, P., Humphreys, B., KINGSLAND, L. BMJ PUBLISHING GROUP. 1994: 1060–1060

    View details for Web of Science ID A1994QF21600260

    View details for PubMedID 7949895

  • KNOWLEDGE-BASED TEMPORAL ABSTRACTION FOR DIABETIC MONITORING 18th Annual Symposium on Computer Applications in Medical Care - Transforming Information, Changing Health Care Shahar, Y., Das, A. K., Tu, S. W., Kraemer, F. B., Musen, M. A. BMJ PUBLISHING GROUP. 1994: 697–701

    Abstract

    We have developed a general method that solves the task of creating abstract, interval-based concepts from time-stamped clinical data. We refer to this method as knowledge-based temporal-abstraction (KBTA). In this paper, we focus on the knowledge representation, acquisition, maintenance, reuse and sharing aspects of the KBTA method. We describe five problem-solving mechanisms that solve the five subtasks into which the KBTA method decomposes its task, and four types of knowledge necessary for instantiating these mechanisms in a particular domain. We present an example of instantiating the KBTA method in the clinical area of monitoring insulin-dependent-diabetes patients.

    View details for Web of Science ID A1994QF21600123

    View details for PubMedID 7950015

  • A TEMPORAL-ABSTRACTION MEDIATOR FOR PROTOCOL-BASED DECISION-SUPPORT SYSTEMS 18th Annual Symposium on Computer Applications in Medical Care - Transforming Information, Changing Health Care Das, A. K., Shahar, Y., Tu, S. W., Musen, M. A. BMJ PUBLISHING GROUP. 1994: 320–324

    Abstract

    The inability of many clinical decision-support applications to integrate with existing databases limits the wide-scale deployment of such systems. To overcome this obstacle, we have designed a data-interpretation module that can be embedded in a general architecture for protocol-based reasoning and that can support the fundamental task of detecting temporal abstractions. We have developed this software module by coupling two existing systems--RESUME and Chronus--that provide complementary temporal-abstraction techniques at the application and the database levels, respectively. Their encapsulation into a single module thus can resolve the temporal queries of protocol planners with the domain-specific knowledge needed for the temporal-abstraction task and with primary time-stamped data stored in autonomous clinical databases. We show that other computer methods for the detection of temporal abstractions do not scale up to the data- and knowledge-intensive environments of protocol-based decision-support systems.

    View details for Web of Science ID A1994QF21600058

    View details for PubMedID 7949943

  • A METHODOLOGY FOR DETERMINING PATIENTS ELIGIBILITY FOR CLINICAL-TRIALS METHODS OF INFORMATION IN MEDICINE Tu, S. W., Kemper, C. A., Lane, N. M., Carlson, R. W., Musen, M. A. 1993; 32 (4): 317-325

    Abstract

    The task of determining patients' eligibility for clinical trials is knowledge and data intensive. In this paper, we present a model for the task of eligibility determination, and describe how a computer system can assist clinical researchers in performing that task. Qualitative and probabilistic approaches to computing and summarizing the eligibility status of potentially eligible patients are described. The two approaches are compared, and a synthesis that draws on the strengths of each approach is proposed. The result of applying these techniques to a database of HIV-positive patient cases suggests that computer programs such as the one described can increase the accrual rate of eligible patients into clinical trials. These methods may also be applied to the task of determining from electronic patient records whether practice guidelines apply in particular clinical situations.

    View details for Web of Science ID A1993LQ71400012

    View details for PubMedID 8412828

  • RESUME - A TEMPORAL-ABSTRACTION SYSTEM FOR PATIENT MONITORING 16TH SYMP ON COMPUTER APPLICATIONS IN MEDICAL CARE Shahar, Y., Musen, M. A. ACADEMIC PRESS INC ELSEVIER SCIENCE. 1993: 255–73

    Abstract

    RESUME is a system that performs temporal abstraction of time-stamped data. The temporal-abstraction task is crucial for planning treatment, for executing treatment plans, for identifying clinical problems, and for revising treatment plans. The RESUME system is based on a model of three basic temporal-abstraction mechanisms: point temporal abstraction, a mechanism for abstracting the values of several parameters into a value of another parameter; temporal inference, a mechanism for inferring sound logical conclusions over a single interval or two meeting intervals; and temporal interpolation, a mechanism for bridging nonmeeting temporal intervals. Making explicit the knowledge required for temporal abstraction supports the acquisition and the sharing of that knowledge. We have implemented the RESUME system using the CLIPS knowledge-representation shell. The RESUME system emphasizes the need for explicit representation of temporal-abstraction knowledge, and the advantages of modular, task-specific but domain-independent architectures for building medical knowledge-based systems.
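
    Of the three mechanisms listed in this abstract, temporal interpolation (bridging nonmeeting intervals) is the easiest to sketch. The example below is not the RESUME implementation, which the abstract says was built with the CLIPS shell; it is a minimal Python illustration in which the `Abstraction` record and the `max_gap` threshold are assumptions, and two intervals with the same abstracted value are joined when the gap between them is small enough.

```python
from typing import List, NamedTuple


class Abstraction(NamedTuple):
    value: str  # abstracted parameter value, e.g. "low"
    start: int  # first day of the interval (hypothetical day numbering)
    end: int    # last day of the interval


def interpolate(intervals: List[Abstraction], max_gap: int) -> List[Abstraction]:
    """Bridge neighbouring intervals that carry the same value and are
    separated by at most `max_gap` days (an assumed, fixed threshold)."""
    merged: List[Abstraction] = []
    for current in sorted(intervals, key=lambda a: a.start):
        previous = merged[-1] if merged else None
        if (
            previous is not None
            and previous.value == current.value
            and current.start - previous.end <= max_gap
        ):
            # Extend the previous interval across the gap.
            merged[-1] = Abstraction(
                previous.value, previous.start, max(previous.end, current.end)
            )
        else:
            merged.append(current)
    return merged
```

    A single fixed threshold is a simplification made for this sketch; how large a gap may be bridged is the kind of domain knowledge the abstract argues should be represented explicitly.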

    View details for Web of Science ID A1993LG73100006

    View details for PubMedID 8325005

  • METATOOLS FOR KNOWLEDGE ACQUISITION IEEE SOFTWARE Eriksson, H., Musen, M. 1993; 10 (3): 23-29
  • RESPONSE OF GENERAL-PRACTITIONERS TO COMPUTER-GENERATED CRITIQUES OF HYPERTENSION THERAPY METHODS OF INFORMATION IN MEDICINE VANDERLEI, J., VANDERDOES, E., INTVELD, A. J., Musen, M. A., VANBEMMEL, J. H. 1993; 32 (2): 146-153

    Abstract

    We recently have shown that a computer system, known as HyperCritic, can successfully audit general practitioners' treatment of hypertension by analyzing computer-based patient records. HyperCritic reviews the electronic medical records and offers unsolicited advice. To determine which unsolicited advice might be perceived as inappropriate, builders of programs such as HyperCritic need insight into providers' responses to computer-generated critique of their patient care. Twenty medical charts, describing in total 243 visits of patients with hypertension, were audited by 8 human reviewers and by the critiquing-system HyperCritic. A panel of 14 general practitioners subsequently judged the relevance of those critiques on a five-point scale ranging from relevant critique to erroneous or harmful critique. The panel judged reviewers' comments to be either relevant or somewhat relevant in 61 to 68% of cases, and either erroneous or possibly erroneous in 15 to 18%; the panel judged HyperCritic's comments to be either relevant or somewhat relevant in 65% of cases, and either erroneous or possibly erroneous in 16%. Comparison of individual members of the panel showed large differences; for example, the portion of HyperCritic's comments judged relevant ranged from 0 to 82%. We conclude that, from the perspective of general practitioners, critiques generated by the critiquing system HyperCritic are perceived to be as beneficial as critiques generated by human reviewers. Different general practitioners, however, judge the critiques differently. Before auditing systems based on computer-based patient records that are acceptable to practitioners can be introduced, additional studies are needed to evaluate the reasons a physician may have for judging critiques to be irrelevant, and to evaluate the effect of critiques on physician behavior.

    View details for Web of Science ID A1993LA06800009

    View details for PubMedID 8321133

  • AIDS2: a decision-support tool for decreasing physicians' uncertainty regarding patient eligibility for HIV treatment protocols. Proceedings / the ... Annual Symposium on Computer Application [sic] in Medical Care. Symposium on Computer Applications in Medical Care Ohno-Machado, L., Parra, E., Henry, S. B., Tu, S. W., Musen, M. A. 1993: 429-433

    Abstract

    We have developed a decision-support tool, the AIDS Intervention Decision-Support System (AIDS2), to assist in the task of matching patients to therapy-related research protocols. The purposes of AIDS2 are to determine the initial eligibility status of HIV-infected patients for therapy-related research protocols, and to suggest additional data-gathering activities that will decrease uncertainty related to the eligibility status. AIDS2 operates in either a patient-driven or protocol-driven mode. We represent the system knowledge in three combined levels: a classification level, where deterministic knowledge is represented; a belief-network level, where probabilistic knowledge is represented; and a control level, where knowledge about the system's operation is stored. To determine whether the design specifications were met, we presented a series of 10 clinical cases based on actual patients to the system. AIDS2 provided meaningful advice in all cases.

    View details for PubMedID 8130510

  • A NEEDS ANALYSIS FOR COMPUTER-BASED TELEPHONE TRIAGE IN A COMMUNITY AIDS CLINIC 16th Annual Symposium on Computer Applications in Medical Care Henry, S. B., SCHREINER, J. G., Borchelt, D., Musen, M. A. MCGRAW-HILL BOOK CO. 1993: 59–63
  • GRAPH-GRAMMAR PRODUCTIONS FOR THE MODELING OF MEDICAL DILEMMAS 16th Annual Symposium on Computer Applications in Medical Care EGAR, J. W., Musen, M. A. MCGRAW-HILL BOOK CO. 1993: 349–353
  • T-HELPER - AUTOMATED SUPPORT FOR COMMUNITY-BASED CLINICAL RESEARCH 16th Annual Symposium on Computer Applications in Medical Care Musen, M. A., Carlson, R. W., Fagan, L. M., Deresinski, S. C., Shortliffe, E. H. MCGRAW-HILL BOOK CO. 1993: 719–723
  • PROBLEM-SOLVING MODELS FOR GENERATION OF TASK-SPECIFIC KNOWLEDGE-ACQUISITION TOOLS IFIP TC12 Workshop on Artificial Intelligence from the Information Processing Perspective (AIFIPP 92) Musen, M. A., Tu, S. W. ELSEVIER SCIENCE PUBL B V. 1993: 23–49
  • MODELING TASKS WITH MECHANISMS INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS Puerta, A. R., Tu, S. W., Musen, M. A. 1993; 8 (1): 129-152
  • Knowledge reuse: temporal-abstraction mechanisms for the assessment of children's growth. Proceedings / the ... Annual Symposium on Computer Application [sic] in Medical Care. Symposium on Computer Applications in Medical Care Kuilboer, M. M., Shahar, Y., Wilson, D. M., Musen, M. A. 1993: 449-453

    Abstract

    Currently, many workers in the field of medical informatics realize the importance of knowledge reuse. The PROTEGE-II project seeks to develop and implement a domain-independent framework that allows system builders to create custom-tailored role-limiting methods from generic reusable components. These new role-limiting methods are used to create domain- and task-specific knowledge-acquisition tools with which an application expert can generate domain- and task-specific decision-support systems. One required set of reusable components embodies the problem-solving knowledge to generate temporal abstractions. Previously, members of the PROTEGE-II project have used these temporal-abstraction mechanisms to infer the presence of myelotoxicity in patients with AIDS. In this paper, we show that these mechanisms are reusable in the domain of assessment of children's growth.

    View details for PubMedID 8130514

  • Automated modeling of medical decisions. Proceedings / the ... Annual Symposium on Computer Application [sic] in Medical Care. Symposium on Computer Applications in Medical Care EGAR, J. W., Musen, M. A. 1993: 424-428

    Abstract

    We have developed a graph grammar and a graph-grammar derivation system that, together, generate decision-theoretic models from unordered lists of medical terms. The medical terms represent considerations in a dilemma that confronts the patient and the health-care provider. Our current grammar ensures that several desirable structural properties are maintained in all derived decision models.

    View details for PubMedID 8130509

  • A computer-based tool for generation of progress notes. Proceedings / the ... Annual Symposium on Computer Application [sic] in Medical Care. Symposium on Computer Applications in Medical Care Campbell, K. E., WIECKERT, K., Fagan, L. M., Musen, M. A. 1993: 284-288

    Abstract

    IVORY, a computer-based tool that uses clinical findings as the basic unit for composing progress notes, generates progress notes more efficiently than does a character-based word processor. IVORY's clinical findings are contained within a structured vocabulary that we developed to support generation of both prose progress notes and SNOMED III codes. Observational studies of physician participation in the development of IVORY's structured vocabulary have helped us to identify areas where changes are required before IVORY will be acceptable for routine clinical use.

    View details for PubMedID 8130479

  • An extended SQL for temporal data management in clinical decision-support systems. Proceedings / the ... Annual Symposium on Computer Application [sic] in Medical Care. Symposium on Computer Applications in Medical Care Das, A. K., Tu, S. W., Purcell, G. P., Musen, M. A. 1992: 128-132

    Abstract

    We are developing a database implementation to support temporal data management for the T-HELPER physician workstation, an advice system for protocol-based care of patients who have HIV disease. To understand the requirements for the temporal database, we have analyzed the types of temporal predicates found in clinical-trial protocols. We extend the standard relational data model in three ways to support these querying requirements. First, we incorporate timestamps into the two-dimensional relational table to store the temporal dimension of both instant- and interval-based data. Second, we develop a set of operations on timepoints and intervals to manipulate timestamped data. Third, we modify the relational query language SQL so that its underlying algebra supports the specified operations on timestamps in relational tables. We show that our temporal extension to SQL meets the temporal data-management needs of protocol-directed decision support.

    View details for PubMedID 1482853
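
    The timestamping idea in the abstract above can be illustrated with a minimal sketch. It is not the paper's SQL extension or its syntax: the `Row` type, the `overlaps` predicate, and the `select_overlapping` helper are hypothetical names chosen for illustration. The sketch assumes that instant-based data can be stored as an interval whose start and end coincide.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Row:
    """A timestamped tuple; the field names are illustrative assumptions."""
    patient_id: str
    event: str
    start: date
    end: date  # equal to start for instant-based data


def overlaps(a_start: date, a_end: date, b_start: date, b_end: date) -> bool:
    """True when two closed intervals share at least one timepoint."""
    return a_start <= b_end and b_start <= a_end


def select_overlapping(rows, window_start: date, window_end: date):
    """Return the rows whose timestamp interval overlaps the query window."""
    return [r for r in rows if overlaps(r.start, r.end, window_start, window_end)]


rows = [
    Row("p1", "drug course", date(1992, 1, 1), date(1992, 3, 1)),   # interval
    Row("p1", "lab result", date(1992, 2, 15), date(1992, 2, 15)),  # instant
    Row("p1", "clinic visit", date(1992, 6, 1), date(1992, 6, 1)),  # instant
]
in_february = select_overlapping(rows, date(1992, 2, 1), date(1992, 2, 29))
```

    Treating an instant as a zero-length interval lets one predicate serve both kinds of data, which is one simple way to read the abstract's "instant- and interval-based" requirement.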

  • CONCEPTUAL MODELS FOR AUTOMATIC-GENERATION OF KNOWLEDGE-ACQUISITION TOOLS LECTURE NOTES IN ARTIFICIAL INTELLIGENCE Eriksson, H., Musen, M. A. 1992; 599: 14-36
  • CONCEPTUAL MODELS FOR AUTOMATIC-GENERATION OF KNOWLEDGE-ACQUISITION TOOLS 6TH EUROPEAN KNOWLEDGE ACQUISITION WORKSHOP : CURRENT DEVELOPMENT IN KNOWLEDGE ACQUISITION ( EKAW 92 ) Eriksson, H., Musen, M. A. SPRINGER-VERLAG BERLIN. 1992: 14–36
  • Graph-grammar productions for the modeling of medical dilemmas. Proceedings / the ... Annual Symposium on Computer Application [sic] in Medical Care. Symposium on Computer Applications in Medical Care EGAR, J. W., Musen, M. A. 1992: 349-353

    Abstract

    We introduce graph-grammar production rules, which can guide physicians to construct models for normative decision making. A physician describes a medical decision problem using standard terminology, and the graph-grammar system matches a graph-manipulation rule to each of the standard terms. With minimal help from the physician, these graph-manipulation rules can construct an appropriate Bayesian probabilistic network. The physician can then assess the necessary probabilities and utilities to arrive at a rational decision. The grammar relies on prototypical forms that we have observed in models of medical dilemmas. We have found graph grammars to be a concise and expressive formalism for describing prototypical forms, and we believe such grammars can greatly facilitate the modeling of medical dilemmas and medical plans.

    View details for PubMedID 1482895

  • Representation of clinical data using SNOMED III and conceptual graphs. Proceedings / the ... Annual Symposium on Computer Application [sic] in Medical Care. Symposium on Computer Applications in Medical Care Campbell, K. E., Musen, M. A. 1992: 354-358

    Abstract

    None of the coding schemes currently contained within the Unified Medical Language System (UMLS) is sufficiently expressive to represent medical progress notes adequately. Some coding schemes suffer from domain incompleteness, others suffer from the inability to represent modifiers and time references, and some suffer from both problems. The recently released version of the Systematized Nomenclature of Medicine (SNOMED III) is a potential solution to the data-representation problem because it is relatively domain complete, and because it uses a generative coding scheme that will allow the construction of codes that contain modifiers and time references. SNOMED III does have an important weakness, however. SNOMED III lacks a formalized system for using its codes; thus, it fails to ensure consistency in its use across different institutions. Application of conceptual-graph formalisms to SNOMED III can ensure such consistency of use. Conceptual-graph formalisms will also allow mapping of the resulting SNOMED III codes onto relational data models and onto other formal systems, such as first-order predicate calculus.

    View details for PubMedID 1482897

  • A temporal-abstraction system for patient monitoring. Proceedings / the ... Annual Symposium on Computer Application [sic] in Medical Care. Symposium on Computer Applications in Medical Care Shahar, Y., Musen, M. A. 1992: 121-127

    Abstract

    RESUME is a system that performs temporal abstraction of time-stamped data. RESUME is based on a model of three temporal-abstraction mechanisms: point temporal abstraction (a mechanism for abstracting values of several parameters into a value of another parameter); temporal inference (a mechanism for inferring sound logical conclusions over a single interval or two meeting intervals); and temporal interpolation (a mechanism for bridging nonmeeting temporal intervals). Making explicit the knowledge required for temporal abstraction supports the acquisition of that knowledge.

    View details for PubMedID 1482852
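
    As a rough illustration of two of the mechanisms named above, the sketch below joins same-valued intervals that meet (a simple case of temporal inference) and bridges non-meeting intervals whose gap is small enough (temporal interpolation). It is not the RESUME implementation: the `Interval` type, the `join_intervals` function, and the single `max_gap` threshold are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """An abstracted value holding over [start, end]; names are illustrative."""
    start: int
    end: int
    value: str


def join_intervals(intervals, max_gap):
    """Join consecutive same-valued intervals separated by at most max_gap.

    A gap of 0 means the intervals meet; a positive gap within max_gap is
    bridged. A differently valued interval in between blocks the join.
    """
    merged = []
    for iv in sorted(intervals, key=lambda i: i.start):
        last = merged[-1] if merged else None
        if last and last.value == iv.value and iv.start - last.end <= max_gap:
            merged[-1] = Interval(last.start, max(last.end, iv.end), last.value)
        else:
            merged.append(iv)
    return merged


observed = [
    Interval(0, 2, "low"),
    Interval(2, 5, "low"),   # meets the previous interval
    Interval(8, 10, "low"),  # separated from the previous one by a gap of 3
]
bridged = join_intervals(observed, max_gap=3)
meeting_only = join_intervals(observed, max_gap=0)
```

    In a real system the permitted gap would presumably depend on the clinical parameter and its value, which is the kind of knowledge the abstract says must be made explicit.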

  • A needs analysis for computer-based telephone triage in a community AIDS clinic. Proceedings / the ... Annual Symposium on Computer Application [sic] in Medical Care. Symposium on Computer Applications in Medical Care Henry, S. B., SCHREINER, J. G., Borchelt, D., Musen, M. A. 1992: 59-63

    Abstract

    This study describes the complexity of the telephone-triage task in a community-based AIDS clinic. We identify deficiencies related to the data management for and documentation of the telephone-triage encounter, including inaccessibility of the medical record and failure to document required data elements. Our needs analysis suggests five design criteria for a computer-based system that assists nurses with the telephone-triage task: (1) online accessibility of the medical record, (2) ability to move among modules of the medical record and the triage-encounter module, (3) ease of data entry, (4) compliance with standards for documentation, and (5) notification of the primary-care physician in an appropriate and timely manner.

    View details for PubMedID 1482941

  • T-HELPER: automated support for community-based clinical research. Proceedings / the ... Annual Symposium on Computer Application [sic] in Medical Care. Symposium on Computer Applications in Medical Care Musen, M. A., Carlson, R. W., Fagan, L. M., Deresinski, S. C., Shortliffe, E. H. 1992: 719-723

    Abstract

    There are increasing expectations that community-based physicians who care for people with HIV infection will offer their patients opportunities to enroll in clinical trials. The information-management requirements of clinical investigation, however, make it unrealistic for most providers who do not practice in academic centers to participate in clinical research. Our T-HELPER computer system offers community-based physicians the possibility of enrolling patients in clinical trials as a component of primary care. T-HELPER facilitates data management for patients with HIV disease, and can offer patient-specific and situation-specific advice concerning new protocols for which patients may be eligible and the treatment required by those protocols in which patients currently are enrolled. We are installing T-HELPER at three county-operated AIDS clinics in the San Francisco Bay Area, and plan a comprehensive evaluation of the system and its influence on clinical research.

    View details for PubMedID 1482965

  • COMPARISON OF COMPUTER-AIDED AND HUMAN REVIEW OF GENERAL-PRACTITIONERS MANAGEMENT OF HYPERTENSION LANCET VANDERLEI, J., Musen, M. A., VANDERDOES, E., INTVELD, A. J., VANBEMMEL, J. H. 1991; 338 (8781): 1504-1508

    Abstract

    Computer programs that automatically review decisions can help physicians provide better patient care. In the Netherlands, the ELIAS computer information system has replaced paper medical records in some general practices. We have written a computer program called 'HyperCritic' that audits general practitioners' management of patients with essential hypertension by taking patient-specific data from the ELIAS system. We investigated whether the computer-based medical records contain sufficient information to generate critiques, and compared the limitations of audit by HyperCritic with those of review by a panel of eight physicians. HyperCritic and the physicians independently reviewed the medical records of 20 randomly selected patients with hypertension and commented on the decisions made at each of 243 patient visits. Of 468 comments on patient management, 260 were judged correct by six or more of the physicians; HyperCritic also made 118 of these 260 comments. The main reasons why the program did not produce the other 142 comments were: insufficient data in the computer-based medical record; absence of sufficient medical consensus; and omissions in the database of HyperCritic. Calculation of an "index of merit" ([sensitivity + specificity] - 1) for individual reviewers showed that HyperCritic performed better (index of merit 0.62) in its limited domain than did physician reviewers (0.3-0.56). At least in hypertension management, automated review of computer-based medical records compares favourably with review by physicians. Further development of computer-aided clinical audit requires the introduction of computer-based medical records that capture the reasoning of physicians, and of widely accepted practice guidelines.

    View details for Web of Science ID A1991GV07700015

    View details for PubMedID 1683929
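
    The "index of merit" quoted in the abstract above, (sensitivity + specificity) - 1, can be computed from a reviewer's confusion-matrix counts, as in the minimal sketch below. The function name and the example counts are hypothetical; they are not data from the study.

```python
def index_of_merit(tp: int, fn: int, tn: int, fp: int) -> float:
    """Return (sensitivity + specificity) - 1 from confusion-matrix counts.

    tp/fn: correct comments the reviewer made / missed.
    tn/fp: incorrect comments the reviewer withheld / made.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0


# Hypothetical counts: 3 of 4 correct comments made, 3 of 4 incorrect ones withheld.
example = index_of_merit(tp=3, fn=1, tn=3, fp=1)
```

    The index is 1 for a reviewer who makes every correct comment and no incorrect ones, and 0 for a reviewer whose sensitivity and specificity sum to 1, as they would when commenting at random.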

  • A MODEL FOR CRITIQUING BASED ON AUTOMATED MEDICAL RECORDS COMPUTERS AND BIOMEDICAL RESEARCH VANDERLEI, J., Musen, M. A. 1991; 24 (4): 344-378

    Abstract

    We describe the design of a critiquing system, HyperCritic, that relies on automated medical records for its data input. The purpose of the system is to advise general practitioners who are treating patients who have hypertension. HyperCritic has access to the data stored in a primary-care information system that supports a fully automated medical record. HyperCritic relies on data in the automated medical record to critique the management of hypertensive patients, avoiding a consultation-style interaction with the user. The first step in the critiquing process involves the interpretation of the medical record in an attempt to discover the physician's actions and decisions. After detecting the relevant events in the medical record, HyperCritic views the task of critiquing as the assignment of critiquing statements to these patient-specific events. Critiquing statements are defined as recommendations involving one or more suggestions for possible modifications in the actions of the physician. The core of the model underlying HyperCritic is that the process of generating the critiquing statements is viewed as the application of a limited set of abstract critiquing tasks. We distinguish four categories of critiquing tasks: preparation tasks, selection tasks, monitoring tasks, and responding tasks. The execution of these critiquing tasks requires specific medical factual knowledge. This factual knowledge is separated from the critiquing tasks and is stored in a medical fact base. The principal advantage demonstrated by HyperCritic is the adaption of a domain-independent critiquing structure. We show how this domain-independent critiquing structure can be used to facilitate knowledge acquisition and maintenance of the system.

    View details for Web of Science ID A1991FX17100004

    View details for PubMedID 1889202

  • Temporal-abstraction mechanisms in management of clinical protocols. Proceedings / the ... Annual Symposium on Computer Application [sic] in Medical Care. Symposium on Computer Applications in Medical Care Shahar, Y., Tu, S. W., Musen, M. A. 1991: 629-633

    Abstract

    We have identified several general temporal-abstraction mechanisms needed for reasoning about time-stamped data, such as are needed in management of patients being treated on clinical protocols: simple temporal abstraction (a mechanism for abstracting several parameter values into one class), temporal inference (a mechanism for inferring sound logical conclusions over a single interval or two meeting intervals), and temporal interpolation (a mechanism for bridging non-meeting temporal intervals). Making explicit the knowledge required for temporal abstractions supports the acquisition of planning knowledge, the identification of clinical problems, and the formulation of clinical-management-plan revisions.

    View details for PubMedID 1807678

  • EPISODIC SKELETAL-PLAN REFINEMENT BASED ON TEMPORAL DATA COMMUNICATIONS OF THE ACM Tu, S. W., Kahn, M. G., Musen, M. A., Ferguson, J. C., Shortliffe, E. H., Fagan, L. M. 1989; 32 (12): 1439-1455
  • AN EDITOR FOR THE CONCEPTUAL MODELS OF INTERACTIVE KNOWLEDGE-ACQUISITION TOOLS INTERNATIONAL JOURNAL OF MAN-MACHINE STUDIES Musen, M. A. 1989; 31 (6): 673-698
  • KNOWLEDGE ENGINEERING FOR CLINICAL CONSULTATION PROGRAMS - MODELING THE APPLICATION AREA METHODS OF INFORMATION IN MEDICINE Musen, M. A., VANDERLEI, J. 1989; 28 (1): 28-35

    Abstract

    Developers of computer-based decision-support tools frequently adopt either pattern recognition or artificial intelligence techniques as the basis for their programs. Because these developers often choose to accentuate the differences between these alternative approaches, the more fundamental similarities are frequently overlooked. The principal challenge in the creation of any clinical consultation program - regardless of the methodology that is used - lies in creating a computational model of the application domain. The difficulty in generating such a model manifests itself in symptoms that workers in the expert systems community have labeled "the knowledge-acquisition bottleneck" and "the problem of brittleness". This paper explores these two symptoms and shows how the development of consultation programs based on pattern-recognition techniques is subject to analogous difficulties. The expert systems and pattern recognition communities must recognize that they face similar challenges, and must unite to develop methods that assist with the process of building models of complex application tasks.

    View details for Web of Science ID A1989T523500006

    View details for PubMedID 2649771

  • GRAPHICAL ACCESS TO MEDICAL EXPERT SYSTEMS .3. DESIGN OF A KNOWLEDGE ACQUISITION ENVIRONMENT METHODS OF INFORMATION IN MEDICINE Walton, J. D., Musen, M. A., Combs, D. M., Lane, C. D., Shortliffe, E. H., Fagan, L. M. 1987; 26 (3): 78-88

    View details for Web of Science ID A1987J707100003

    View details for PubMedID 3670103

  • KNOWLEDGE ENGINEERING FOR A CLINICAL-TRIAL ADVICE SYSTEM - UNCOVERING ERRORS IN PROTOCOL SPECIFICATION (REPRINTED FROM PROCEEDING, OF THE AAAMSI CONGRESS 86, MAY PG 24-27, 1986) BULLETIN DU CANCER Musen, M. A., ROHN, J. A., Fagan, L. M., Shortliffe, E. H. 1987; 74 (3): 291-296

    Abstract

    ONCOCIN is an expert system that provides advice to physicians who are treating cancer patients enrolled in clinical trials. The process of encoding oncology protocol knowledge for the system has revealed serious omissions and unintentional ambiguities in the protocol documents. We have also discovered that many protocols allow for significant latitude in treating patients and that even when protocol guidelines are explicit, physicians often choose to apply their own judgment on the assumption that the specifications are incomplete. Computer-based tools offer the possibility of ensuring completeness and reproducibility in the definition of new protocols. One goal of our automated protocol authoring environment, called OPAL, is to help physicians develop protocols that are free of ambiguity and thus to assure better compliance and standardization of care.

    View details for Web of Science ID A1987J346100007

    View details for PubMedID 3620734

  • USE OF A DOMAIN MODEL TO DRIVE AN INTERACTIVE KNOWLEDGE-EDITING TOOL INTERNATIONAL JOURNAL OF MAN-MACHINE STUDIES Musen, M. A., Fagan, L. M., Combs, D. M., Shortliffe, E. H. 1987; 26 (1): 105-121
  • PhLeGrA: Graph Analytics in Pharmacology over the Web of Life Sciences Linked Open Data International Conference on World Wide Web Kamdar, M. R., Musen, M. 2017: 321–29

    View details for DOI 10.1145/3038912.3052692

  • BiOnIC: A Catalog of User Interactions with Biomedical Ontologies International Semantic Web Conference Kamdar, M. R., Walk, S., Tudorache, T., Musen, M. A. 2017: 130–38