Alison Callahan is a research scientist in the Center for Biomedical Informatics and a member of the Shah Lab. Her work involves research and development of informatics methods for the analysis of biomedical and clinical data, to derive insights and inform medical decision making.

Alison completed her PhD in the Department of Biology at Carleton University in Ottawa, Canada. Her doctoral research focused on developing HyQue, a framework for representing and evaluating scientific hypotheses, and applying this framework to discover genes related to aging. She was also a developer for Bio2RDF, an open-source project to build and provide the largest network of Linked Data for the life sciences.

Education & Certifications

  • Doctor of Philosophy, Biology, Carleton University, Ottawa, Canada (2014)
  • Master of Information Studies, University of Toronto, Toronto, Canada (2009)
  • Bachelor of Science, Carleton University, Ottawa, Canada (2007)

Professional Affiliations and Activities

  • Member, American Medical Informatics Association (2014 - Present)

All Publications

  • Developing a data sharing community for spinal cord injury research. Experimental Neurology. Callahan, A., Anderson, K. D., Beattie, M. S., Bixby, J. L., Ferguson, A. R., Fouad, K., Jakeman, L. B., Nielson, J. L., Popovich, P. G., Schwab, J. M., Lemmon, V. P. 2017


    The rapid growth in data sharing presents new opportunities across the spectrum of biomedical research. Global efforts are underway to develop practical guidance for implementing data sharing and open data resources. These include the recent recommendation of the 'FAIR Data Principles', which assert that if data are to have broad scientific value, then digital representations of those data should be Findable, Accessible, Interoperable and Reusable (FAIR). The spinal cord injury (SCI) research field has a long history of collaborative initiatives that include sharing of preclinical research models and outcome measures. In addition, the SCI research community is developing new tools and resources to enhance opportunities for data sharing and access. With this in mind, the National Institute of Neurological Disorders and Stroke (NINDS) at the National Institutes of Health (NIH), in collaboration with the Open Data Commons for Spinal Cord Injury (ODC-SCI), hosted a workshop titled "Preclinical SCI Data: Creating a FAIR Share Community" on October 5-6, 2016 in Bethesda, MD. Workshop invitees were nominated by the workshop steering committee (co-chairs: ARF and VPL; members: AC, KDA, MSB, KF, LBJ, PGP, JMS) to bring together junior- and senior-level experts, including preclinical and basic SCI researchers from academia and industry, data science and bioinformatics experts, investigators with expertise in other neurological disease fields, clinical researchers, members of the SCI community, and program staff representing federal and private funding agencies. The workshop and ODC-SCI efforts were sponsored by the International Spinal Research Trust (ISRT), the Rick Hansen Institute, Wings for Life, the Craig H. Neilsen Foundation and NINDS. The number of attendees was limited to ensure active participation and feedback in small groups. The goals were to examine the current landscape for data sharing in SCI research and to chart a path for its future.
    Below are highlights from the workshop, including perspectives on the value of data sharing in SCI research, workshop participants' perspectives and concerns, descriptions of existing resources, and actionable directions for further engaging the SCI research community in a model that may be applicable to many other areas of neuroscience. This manuscript is intended to share these initial findings with the broader research community and to provide talking points for continued feedback from the SCI field as it moves forward in the age of data sharing.

    View details for DOI 10.1016/j.expneurol.2017.05.012

    View details for PubMedID 28576567

  • RegenBase: a knowledge base of spinal cord injury biology for translational research. Database: The Journal of Biological Databases and Curation. Callahan, A., Abeyruwan, S. W., Al-Ali, H., Sakurai, K., Ferguson, A. R., Popovich, P. G., Shah, N. H., Visser, U., Bixby, J. L., Lemmon, V. P. 2016


    Spinal cord injury (SCI) research is a data-rich field that aims to identify the biological mechanisms resulting in loss of function and mobility after SCI, as well as to develop therapies that promote recovery after injury. SCI experimental methods, data and domain knowledge are locked in the largely unstructured text of scientific publications, making large-scale integration with existing bioinformatics resources and subsequent analysis infeasible. The lack of standard reporting for experiment variables and results also makes experiment replicability a significant challenge. To address these challenges, we have developed RegenBase, a knowledge base of SCI biology. RegenBase integrates curated literature-sourced facts and experimental details, raw assay data profiling the effect of compounds on enzyme activity and cell growth, and structured SCI domain knowledge in the form of the first ontology for SCI, using Semantic Web representation languages and frameworks. RegenBase uses consistent identifier schemes and data representations that enable automated linking among RegenBase statements and to other biological databases and electronic resources. By querying RegenBase, we have identified novel biological hypotheses linking the effects of perturbagens to observed behavioral outcomes after SCI. RegenBase is publicly available for browsing, querying and download. Database URL:

    View details for DOI 10.1093/database/baw040

    View details for Web of Science ID 000374094100001

    View details for PubMedID 27055827

    View details for PubMedCentralID PMC4823819

  • Feasibility of Prioritizing Drug-Drug-Event Associations Found in Electronic Health Records. Drug Safety. Banda, J. M., Callahan, A., Winnenburg, R., Strasberg, H. R., Cami, A., Reis, B. Y., Vilar, S., Hripcsak, G., Dumontier, M., Shah, N. H. 2016; 39 (1): 45-57


    Several studies have demonstrated the ability to detect adverse events potentially related to multiple drug exposure via data mining. However, the number of putative associations produced by such computational approaches is typically large, making experimental validation difficult. We theorized that those potential associations for which there is evidence from multiple complementary sources are more likely to be true, and explored this idea using a published database of drug-drug-adverse event associations derived from electronic health records (EHRs). We prioritized drug-drug-event associations derived from EHRs using four sources of information: (1) public databases, (2) sources of spontaneous reports, (3) literature, and (4) non-EHR drug-drug interaction (DDI) prediction methods. After pre-filtering the associations by removing those found in public databases, we devised a ranking for associations based on the support from the remaining sources, and evaluated the results of this rank-based prioritization. We collected information for 5983 putative EHR-derived drug-drug-event associations involving 345 drugs and ten adverse events from four data sources and four prediction methods. Only seven drug-drug-event associations (<0.5%) had support from the majority of evidence sources, and about one third (1777) had support from at least one of the evidence sources. Our proof-of-concept method for scoring putative drug-drug-event associations from EHRs offers a systematic and reproducible way of prioritizing associations for further study. Our findings also quantify the agreement (or lack thereof) among complementary sources of evidence for drug-drug-event associations and highlight the challenges of developing a robust approach for prioritizing signals of these associations.

    View details for DOI 10.1007/s40264-015-0352-2

    View details for PubMedID 26446143

  • The health care and life sciences community profile for dataset descriptions. PeerJ. Dumontier, M., Gray, A. J., Marshall, M. S., Alexiev, V., Ansell, P., Bader, G., Baran, J., Bolleman, J. T., Callahan, A., Cruz-Toledo, J., Gaudet, P., Gombocz, E. A., Gonzalez-Beltran, A. N., Groth, P., Haendel, M., Ito, M., Jupp, S., Juty, N., Katayama, T., Kobayashi, N., Krishnaswami, K., Laibe, C., Le Novère, N., Lin, S., Malone, J., Miller, M., Mungall, C. J., Rietveld, L., Wimalaratne, S. M., Yamaguchi, A. 2016; 4


    Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. To provide practical guidance for producing high-quality descriptions of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. This guideline reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine-readable descriptions of versioned datasets.

    View details for DOI 10.7717/peerj.2331

    View details for PubMedID 27602295

    View details for PubMedCentralID PMC4991880

  • An evidence-based approach to identify aging-related genes in Caenorhabditis elegans. BMC Bioinformatics. Callahan, A., Cifuentes, J. J., Dumontier, M. 2015; 16

  • Analyzing search behavior of healthcare professionals for drug safety surveillance. Pacific Symposium on Biocomputing. Odgers, D. J., Harpaz, R., Callahan, A., Stiglic, G., Shah, N. H. 2015; 20: 306-317


    Post-market drug safety surveillance is critically important and remains a significant challenge despite the existence of adverse event (AE) reporting systems. Here we describe a preliminary analysis of search logs from healthcare professionals as a source for detecting adverse drug events. We annotate search log query terms with biomedical terminologies for drugs and events, and then perform a statistical analysis to identify associations among drugs and events within search sessions. We evaluate our approach using two different types of reference standards consisting of known adverse drug events (ADEs) and negative controls. Our approach achieves a discrimination accuracy of 0.85 in terms of the area under the receiver operating characteristic curve (AUC) for the reference set of well-established ADEs and an AUC of 0.68 for the reference set of recently labeled ADEs. We also find that the majority of associations in the reference sets have support in the search log data. Despite these promising results, additional research is required to better understand users' search behavior, biasing factors, and the overall utility of analyzing healthcare professional search logs for drug safety surveillance.

    View details for PubMedID 25592591

  • Analyzing Information Seeking and Drug-Safety Alert Response by Health Care Professionals as New Methods for Surveillance. Journal of Medical Internet Research. Callahan, A., Pernek, I., Stiglic, G., Leskovec, J., Strasberg, H. R., Shah, N. H. 2015; 17 (8)


    Patterns in general consumer online search logs have been used to monitor health conditions and to predict health-related activities, but the multiple contexts within which consumers perform online searches make significant associations difficult to interpret. Physician information-seeking behavior has typically been analyzed through survey-based approaches and literature reviews. Activity logs from health care professionals using online medical information resources are thus a valuable yet relatively untapped resource for large-scale medical surveillance. To analyze health care professionals' information-seeking behavior and assess the feasibility of measuring drug-safety alert response from the usage logs of an online medical information resource. Using two years (2011-2012) of usage logs from UpToDate, we measured the volume of searches related to medical conditions with significant burden in the United States, as well as the seasonal distribution of those searches. We quantified the relationship between searches and resulting page views. Using a large collection of online mainstream media articles and Web log posts, we also characterized the uptake of a Food and Drug Administration (FDA) alert via changes in UpToDate search activity compared with general online media activity related to the subject of the alert. Diseases and symptoms dominate UpToDate searches. Some searches result in page views of only short duration, while others consistently result in longer-than-average page views. The response to an FDA alert for Celexa, characterized by a change in UpToDate search activity, differed considerably from general online media activity. Changes in search activity appeared later and persisted longer in UpToDate logs. The volume of searches and page view durations related to Celexa before the alert also differed from those after the alert. Understanding the information-seeking behavior associated with online evidence sources can offer insight into the information needs of health professionals and enable large-scale medical surveillance. Our Web log mining approach has the potential to monitor responses to FDA alerts at a national level. Our findings can also inform the design and content of evidence-based medical information resources such as UpToDate.

    View details for DOI 10.2196/jmir.4427

    View details for PubMedID 26293444

  • Text Mining for Adverse Drug Events: the Promise, Challenges, and State of the Art. Drug Safety. Harpaz, R., Callahan, A., Tamang, S., Low, Y., Odgers, D., Finlayson, S., Jung, K., LePendu, P., Shah, N. H. 2014; 37 (10): 777-790


    Text mining is the computational process of extracting meaningful information from large amounts of unstructured text. It is emerging as a tool to leverage underutilized data sources that can improve pharmacovigilance, including the objective of adverse drug event (ADE) detection and assessment. This article provides an overview of recent advances in pharmacovigilance driven by the application of text mining, and discusses several data sources (such as biomedical literature, clinical narratives, product labeling, social media, and Web search logs) that are amenable to text mining for pharmacovigilance. Given the state of the art, it appears text mining can be applied to extract useful ADE-related information from multiple textual sources. Nonetheless, further research is required to address remaining technical challenges associated with the text mining methodologies, and to conclusively determine the relative contribution of each textual source to improving pharmacovigilance.

    View details for DOI 10.1007/s40264-014-0218-z

    View details for Web of Science ID 000344615300005

    View details for PubMedCentralID PMC4217510

  • Minimum Information about a Spinal Cord Injury Experiment: A Proposed Reporting Standard for Spinal Cord Injury Experiments. Journal of Neurotrauma. Lemmon, V. P., Ferguson, A. R., Popovich, P. G., Xu, X., Snow, D. M., Igarashi, M., Beattie, C. E., Bixby, J. L. 2014; 31 (15): 1354-1361


    The lack of reproducibility in many areas of experimental science has a number of causes, including a lack of transparency and precision in the description of experimental approaches. This has far-reaching consequences, including wasted resources and slowing of progress. Additionally, the large number of laboratories around the world publishing articles on a given topic makes it difficult, if not impossible, for individual researchers to read all of the relevant literature. Consequently, centralized databases are needed to facilitate the generation of new hypotheses for testing. One strategy to improve transparency in experimental description, and to allow the development of frameworks for computer-readable knowledge repositories, is the adoption of uniform reporting standards, such as common data elements (data elements used in multiple clinical studies) and minimum information standards. This article describes a minimum information standard for spinal cord injury (SCI) experiments, its major elements, and the approaches used to develop it. Transparent reporting standards for experiments using animal models of human SCI aim to reduce inherent bias and increase experimental value.

    View details for DOI 10.1089/neu.2014.3400

    View details for Web of Science ID 000340535500004

    View details for PubMedID 24870067

  • Automatically exposing OpenLifeData via SADI semantic Web Services. Journal of Biomedical Semantics. Rodriguez Gonzalez, A., Callahan, A., Cruz-Toledo, J., Garcia, A., Egana Aranguren, M., Dumontier, M., Wilkinson, M. D. 2014; 5

    View details for DOI 10.1186/2041-1480-5-46

  • The Semanticscience Integrated Ontology (SIO) for biomedical research and knowledge discovery. Journal of Biomedical Semantics. Dumontier, M., Baker, C. J., Baran, J., Callahan, A., Chepelev, L., Cruz-Toledo, J., Del Rio, N. R., Duck, G., Furlong, L. I., Keath, N., Klassen, D., McCusker, J. P., Queralt-Rosinach, N., Samwald, M., Villanueva-Rosales, N., Wilkinson, M. D., Hoehndorf, R. 2014; 5 (1): 14


    The Semanticscience Integrated Ontology (SIO) is an ontology to facilitate biomedical knowledge discovery. SIO features a simple upper level composed of essential types and relations for the rich description of arbitrary (real, hypothesized, virtual, fictional) objects, processes and their attributes. SIO specifies simple design patterns to describe and associate qualities, capabilities, functions, quantities, and informational entities including textual, geometrical, and mathematical entities, and provides specific extensions in the domains of chemistry, biology, biochemistry, and bioinformatics. SIO provides an ontological foundation for the Bio2RDF linked data for the life sciences project and is used for semantic integration and discovery for SADI-based semantic web services. SIO is freely available to all users under a Creative Commons Attribution license. See website for further information:

    View details for DOI 10.1186/2041-1480-5-14

    View details for PubMedID 24602174

  • Ontology-Based Querying with Bio2RDF's Linked Open Data. Journal of Biomedical Semantics. Callahan, A., Cruz-Toledo, J., Dumontier, M. 2013; 4: S1


    A key activity for life scientists in this post-"omics" age involves searching for and integrating biological data from a multitude of independent databases. However, our ability to find relevant data is hampered by non-standard web and database interfaces backed by an enormous variety of data formats. This heterogeneity presents an overwhelming barrier to the discovery and reuse of resources which have been developed at great public expense. To address this issue, the open-source Bio2RDF project promotes a simple convention to integrate diverse biological data using Semantic Web technologies. However, querying Bio2RDF remains difficult due to the lack of uniformity in the representation of Bio2RDF datasets. We describe an update to Bio2RDF that includes tighter integration across 19 new and updated RDF datasets. All available open-source scripts were first consolidated to a single GitHub repository and then redeveloped using a common API that generates normalized IRIs using a centralized dataset registry. We then mapped dataset-specific types and relations to the Semanticscience Integrated Ontology (SIO) and demonstrated simplified federated queries across multiple Bio2RDF endpoints. This coordinated release marks an important milestone for the Bio2RDF open source linked data framework. Principally, it improves the quality of linked data in the Bio2RDF network and makes it easier to access or recreate the linked data locally. We hope to continue improving the Bio2RDF network of linked data by identifying priority databases and increasing the vocabulary coverage to additional dataset vocabularies beyond SIO.

    View details for DOI 10.1186/2041-1480-4-S1-S1

    View details for PubMedID 23735196

  • Evaluating Scientific Hypotheses Using the SPARQL Inferencing Notation. The Semantic Web: Research and Applications. Callahan, A., Dumontier, M. Springer Berlin Heidelberg. 2012: 647-658

  • HyQue: evaluating hypotheses using Semantic Web technologies. Journal of Biomedical Semantics. Callahan, A., Dumontier, M., Shah, N. H. 2011; 2: S3


    Key to the success of e-Science is the ability to computationally evaluate expert-composed hypotheses for validity against experimental data. Researchers face the challenge of collecting, evaluating and integrating large amounts of diverse information to compose and evaluate a hypothesis. Confronted with rapidly accumulating data, researchers currently do not have the software tools to undertake the required information integration tasks. We present HyQue, a Semantic Web tool for querying scientific knowledge bases with the purpose of evaluating user-submitted hypotheses. HyQue features a knowledge model to accommodate diverse hypotheses structured as events and represented using Semantic Web languages (RDF/OWL). Hypothesis validity is evaluated against experimental and literature-sourced evidence through a combination of SPARQL queries and evaluation rules. Inference over OWL ontologies (for type specifications, subclass assertions and parthood relations) and retrieval of facts stored as Bio2RDF linked data provide support for a given hypothesis. We evaluate hypotheses of varying levels of detail about the genetic network controlling galactose metabolism in Saccharomyces cerevisiae to demonstrate the feasibility of deploying such semantic computing tools over a growing body of structured knowledge in Bio2RDF. HyQue is a query-based hypothesis evaluation system that can currently evaluate hypotheses about galactose metabolism in S. cerevisiae. Hypotheses as well as the supporting or refuting data are represented in RDF and directly linked to one another, allowing scientists to browse from data to hypothesis and vice versa. HyQue hypotheses and data are available at

    View details for DOI 10.1186/2041-1480-2-S2-S3

    View details for PubMedID 21624158

  • Contextual Cocitation: Augmenting Cocitation Analysis and its Applications. Journal of the American Society for Information Science and Technology. Callahan, A., Hockema, S., Eysenbach, G. 2010; 61 (6): 1130-1143

    View details for DOI 10.1002/asi.21313

    View details for Web of Science ID 000277892600005

  • Behaviourally mediated crypsis in two nocturnal moths with contrasting appearance. Philosophical Transactions of the Royal Society B: Biological Sciences. Webster, R. J., Callahan, A., Godin, J. J., Sherratt, T. N. 2009; 364 (1516): 503-510


    The natural resting orientations of several species of nocturnal moth on tree trunks were recorded over a three-month period in eastern Ontario, Canada. Moths from certain genera exhibited resting orientation distributions that differed significantly from random, whereas others did not. In particular, Catocala spp. collectively tended to orient vertically, whereas subfamily Larentiinae representatives showed a variety of orientations that did not differ significantly from random. To understand why different moth species adopted different orientations, we presented human subjects with a computer-based detection task of finding and 'attacking' Catocala cerogama and Euphyia intermediata target images at different orientations when superimposed on images of sugar maple (Acer saccharum) trees. For both C. cerogama and E. intermediata, orientation had a significant effect on survivorship, although the effect was more pronounced in C. cerogama. When the tree background images were flipped horizontally the optimal orientation changed accordingly, indicating that the detection rates were dependent on the interaction between certain directional appearance features of the moth and its background. Collectively, our results suggest that the contrasting wing patterns of the moths are involved in background matching, and that the moths are able to improve their crypsis through appropriate behavioural orientation.

    View details for DOI 10.1098/rstb.2008.0215

    View details for Web of Science ID 000262353500010

    View details for PubMedID 19000977

  • Empirical tests of the role of disruptive coloration in reducing detectability. Proceedings of the Royal Society B: Biological Sciences. Fraser, S., Callahan, A., Klassen, D., Sherratt, T. N. 2007; 274 (1615): 1325-1331


    Disruptive patterning is a potentially universal camouflage technique that is thought to enhance concealment by rendering the detection of body shapes more difficult. In a recent series of field experiments, artificial moths with markings that extended to the edges of their 'wings' survived at higher rates than moths with the same edge patterns inwardly displaced. While this result seemingly indicates a benefit to obscuring edges, it is possible that the higher density markings of the inwardly displaced patterns concomitantly reduced their extent of background matching. Likewise, it has been suggested that the mealworm baits placed on the artificial moths could have created differential contrasts with different moth patterns. To address these concerns, we conducted controlled trials in which human subjects searched for computer-generated moth images presented against images of oak trees. Moths with edge-extended disruptive markings survived at higher rates, and took longer to find, than all other moth types, whether presented sequentially or simultaneously. However, moths with no edge markings and reduced interior pattern density survived better than their high-density counterparts, indicating that background matching may have played a so-far unrecognized role in the earlier experiments. Our disruptively patterned non-background-matching moths also had the lowest overall survivorship, indicating that disruptive coloration alone may not provide significant protection from predators. Collectively, our results provide independent support for the survival value of disruptive markings and demonstrate that there are common features in human and avian perception of camouflage.

    View details for DOI 10.1098/rspb.2007.0153

    View details for Web of Science ID 000245301900012

    View details for PubMedID 17360282