Nigam H. Shah, MBBS, PhD

Professor of Medicine (Biomedical Informatics) and of Biomedical Data Science

Medicine - Biomedical Informatics Research

Web page: http://web.stanford.edu/~nigam

Bio

Dr. Nigam Shah is Professor of Medicine at Stanford University and Chief Data Scientist for Stanford Health Care. His research group analyzes multiple types of health data (EHR, claims, wearables, weblogs, and patient blogs) to answer clinical questions, generate insights, and build predictive models for the learning health system. At Stanford Health Care, he leads artificial intelligence and data science efforts for advancing the scientific understanding of disease, improving the practice of clinical medicine, and orchestrating the delivery of health care.

Dr. Shah is an inventor on eight patents and patent applications, has authored over 200 scientific publications, and has co-founded three companies. Dr. Shah was elected to the American College of Medical Informatics (ACMI) in 2015 and was inducted into the American Society for Clinical Investigation (ASCI) in 2016. He holds an MBBS from Baroda Medical College, India, and a PhD from Penn State University, and completed postdoctoral training at Stanford University.

Academic Appointments

Professor, Medicine - Biomedical Informatics Research
Professor, Department of Biomedical Data Science
Member, Bio-X
Member, Cardiovascular Institute
Faculty Affiliate, Institute for Human-Centered Artificial Intelligence (HAI)
Member, Wu Tsai Human Performance Alliance
Member, Maternal & Child Health Research Institute (MCHRI)
Member, Stanford Cancer Institute
Member, Wu Tsai Neurosciences Institute

Administrative Appointments

Chief Data Scientist, Stanford Healthcare (2022 - Present)
Co-director, Center for Artificial Intelligence in Medicine & Imaging (AIMI) (2020 - Present)
Associate Dean for Research, School of Medicine (2019 - Present)
Associate Director, Stanford Center for Biomedical Informatics Research (BMIR) (2013 - Present)
Director, Informatics Core, Stanford Center for Clinical and Translational Research and Education (Spectrum) (2017 - 2022)
Associate CIO, Data Science, Stanford Healthcare (2018 - 2022)
Executive Committee Member, Biomedical Informatics Graduate Program (2011 - 2021)
Member, Cancer Institute Informatics Steering Committee (2011 - 2015)
Scientific Program Chair, AMIA Summit on Translational Bioinformatics (2011 - 2012)
Advisory Committee Member, Stanford Center for Clinical Informatics (2011 - 2012)

Honors & Awards

Fellow, American College of Medical Informatics (11/2015)
New Investigator Award, American Medical Informatics Association (AMIA) (11/2013)
Biosciences Faculty Award recognizing outstanding teaching contributions, Stanford School of Medicine (06/2012)
Fellow, American Society for Clinical Investigation (04/2016)
Ramoni Best paper award, AMIA Summit on Translational Bioinformatics (03/2013)
Distinguished paper award, AMIA Summit on Translational Bioinformatics (03/2011)
Outstanding paper award, AMIA Summit on Translational Bioinformatics (03/2009)
Outstanding paper award, Summit on Translational Bioinformatics (03/2008)

Professional Education

Postdoctoral, Stanford University, Biomedical Informatics (2007)
PhD, The Pennsylvania State University, Molecular Medicine (2005)
MBBS, Baroda Medical College, Medicine (1999)

Contact

Academic
nigam@stanford.edu
University - Faculty Department: Med/BMIR Position: Professor
- 453 Quarry Rd # 115B
- Palo Alto, California 94304-1419
- (650) 725-6236 (office)
(650) 725-7944 (fax)

Additional Info

Mail Code: 5479
Other Names:
Nigam Shah
ORCID:
https://orcid.org/0000-0001-9385-7158

Current Research and Scholarly Interests

In the past, we have developed methods to analyze multiple data types for generating insights, such as detecting skin adverse reactions by analyzing content in a health social network; enabling medical device surveillance; discovering drug adverse events and drug-drug interactions from clinical notes using novel methods for processing textual documents; inferring physical function from wearables data; predicting healthcare utilization from Web search logs; and understanding the information-seeking behavior of health professionals.

Our current research is focused on bringing AI into clinical use safely, ethically, and cost-effectively. Research on Responsible AI (https://rail.stanford.edu/) is translated into practice by the Data Science team at Stanford Health Care. This work is organized into two broad workstreams.

(1) Creation and adoption of foundation models in medicine: Given the high interest in using large language models (LLMs) in medicine, the creation and use of LLMs in medicine needs to be actively shaped by provisioning relevant training data, specifying the desired benefits, and evaluating the benefits via testing in real-world deployments.

(2) Making machine learning models clinically useful: Whether a classifier or prediction model is useful in guiding care depends on the interplay between the model's output, the intervention it triggers, and the intervention's benefits and harms. Our work stems from an effort to improve palliative care using machine learning. Blog posts at HAI summarize our work in an easily accessible manner.

2025-26 Courses

Data Driven Medicine
CIM 213 (Spr)
Data Science for Medicine
BMDS 215 (Aut)
Healthcare Technology Operations Management
BMDS 284 (Aut)
Independent Studies (9)
- Bioengineering Problems and Experimental Investigation
  BIOE 191 (Win)
- Biomedical Informatics Teaching Methods
  BMDS 295 (Aut, Win, Spr)
- Directed Reading
  BMDS 299 (Aut, Win, Spr)
- Directed Reading in Medicine
  MED 299 (Aut, Win, Spr, Sum)
- Early Clinical Experience in Medicine
  MED 280 (Aut, Win, Spr, Sum)
- Graduate Research
  MED 399 (Aut, Win, Spr, Sum)
- Medical Scholars Research
  BMDS 370 (Aut, Win, Spr)
- Medical Scholars Research
  MED 370 (Aut, Win, Spr, Sum)
- Undergraduate Research
  MED 199 (Aut, Win, Spr, Sum)
Prior Year Courses
2024-25 Courses
- Data Driven Medicine
  CIM 213 (Spr)
- Data Science for Medicine
  BIOMEDIN 215 (Aut)
2023-24 Courses
- Data Driven Medicine
  BIOMEDIN 225 (Spr)
- Data Science for Medicine
  BIOMEDIN 215 (Aut)
2022-23 Courses
- Data Driven Medicine
  BIOMEDIN 225 (Win)
- Data Science for Medicine
  BIOMEDIN 215 (Aut)

Stanford Advisees

Doctoral Dissertation Reader (AC)
Oana Enache
Postdoctoral Faculty Sponsor
Francois Grolleau, Brenna Li
Doctoral Dissertation Advisor (AC)
Suhana Bedi, Akshay Swaminathan
Doctoral (Program)
Alyssa Unell

Graduate and Fellowship Programs

Biomedical Data Science (PhD Program)
Biomedical Data Science (Masters Program)

All Publications

AI, Health, and Health Care Today and Tomorrow: The JAMA Summit Report on Artificial Intelligence. JAMA Angus, D. C., Khera, R., Lieu, T., Liu, V., Ahmad, F. S., Anderson, B., Bhavani, S. V., Bindman, A., Brennan, T., Celi, L. A., Chen, F., Cohen, I. G., Denniston, A., Desai, S., Embí, P., Faisal, A., Ferryman, K., Gerhart, J., Gross, M., Hernandez-Boussard, T., Howell, M., Johnson, K., Lee, K., Liu, X., Lomis, K., London, A. J., Longhurst, C. A., Mandl, K., McGlynn, E., Mello, M. M., Munoz, F., Ohno-Machado, L., Ouyang, D., Perlis, R., Phillips, A., Rhew, D., Ross, J. S., Saria, S., Schwamm, L., Seymour, C. W., Shah, N. H., Shah, R., Singh, K., Solomon, M., Spates, K., Spector-Bagdady, K., Wang, T., Gichoya, J. W., Weinstein, J., Wiens, J., Bibbins-Domingo, K. 2025

Abstract

Artificial intelligence (AI) is changing health and health care on an unprecedented scale. Though the potential benefits are massive, so are the risks. The JAMA Summit on AI discussed how health and health care AI should be developed, evaluated, regulated, disseminated, and monitored. Health and health care AI is wide-ranging, including clinical tools (eg, sepsis alerts or diabetic retinopathy screening software), technologies used by individuals with health concerns (eg, mobile health apps), tools used by health care systems to improve business operations (eg, revenue cycle management or scheduling), and hybrid tools supporting both business operations (eg, documentation and billing) and clinical activities (eg, suggesting diagnoses or treatment plans). Many AI tools are already widely adopted, especially for medical imaging, mobile health, health care business operations, and hybrid functions like scribing outpatient visits. All these tools can have important health effects (good or bad), but these effects are often not quantified because evaluations are extremely challenging or not required, in part because many are outside the US Food and Drug Administration's regulatory oversight. A major challenge in evaluation is that a tool's effects are highly dependent on the human-computer interface, user training, and setting in which the tool is used. Numerous efforts lay out standards for the responsible use of AI, but most focus on monitoring for safety (eg, detection of model hallucinations) or institutional compliance with various process measures, and do not address effectiveness (ie, demonstration of improved outcomes). Ensuring AI is deployed equitably and in a manner that improves health outcomes or, if improving efficiency of health care delivery, does so safely, requires progress in 4 areas. First, multistakeholder engagement throughout the total product life cycle is needed. This effort would include greater partnership of end users with developers in initial tool creation and greater partnership of developers, regulators, and health care systems in the evaluation of tools as they are deployed. Second, measurement tools for evaluation and monitoring should be developed and disseminated. Beyond proposed monitoring and certification initiatives, this will require new methods and expertise to allow health care systems to conduct or participate in rapid, efficient, and robust evaluations of effectiveness. The third priority is creation of a nationally representative data infrastructure and learning environment to support the generation of generalizable knowledge about health effects of AI tools across different settings. Fourth, an incentive structure should be promoted, using market forces and policy levers, to drive these changes. AI will disrupt every part of health and health care delivery in the coming years. Given the many long-standing problems in health care, this disruption represents an incredible opportunity. However, the odds that this disruption will improve health for all will depend heavily on the creation of an ecosystem capable of rapid, efficient, robust, and generalizable knowledge about the consequences of these tools on health.

View details for DOI 10.1001/jama.2025.18490

View details for PubMedID 41082366
Testing and Evaluation of Health Care Applications of Large Language Models: A Systematic Review. JAMA Bedi, S., Liu, Y., Orr-Ewing, L., Dash, D., Koyejo, S., Callahan, A., Fries, J. A., Wornow, M., Swaminathan, A., Lehmann, L. S., Hong, H. J., Kashyap, M., Chaurasia, A. R., Shah, N. R., Singh, K., Tazbaz, T., Milstein, A., Pfeffer, M. A., Shah, N. H. 2024

Abstract

Large language models (LLMs) can assist in various health care activities, but current evaluation approaches may not adequately identify the most useful application areas. To summarize existing evaluations of LLMs in health care in terms of 5 components: (1) evaluation data type, (2) health care task, (3) natural language processing (NLP) and natural language understanding (NLU) tasks, (4) dimension of evaluation, and (5) medical specialty. A systematic search of PubMed and Web of Science was performed for studies published between January 1, 2022, and February 19, 2024. Studies evaluating 1 or more LLMs in health care. Three independent reviewers categorized studies via keyword searches based on the data used, the health care tasks, the NLP and NLU tasks, the dimensions of evaluation, and the medical specialty. Of 519 studies reviewed, published between January 1, 2022, and February 19, 2024, only 5% used real patient care data for LLM evaluation. The most common health care tasks were assessing medical knowledge such as answering medical licensing examination questions (44.5%) and making diagnoses (19.5%). Administrative tasks such as assigning billing codes (0.2%) and writing prescriptions (0.2%) were less studied. For NLP and NLU tasks, most studies focused on question answering (84.2%), while tasks such as summarization (8.9%) and conversational dialogue (3.3%) were infrequent. Almost all studies (95.4%) used accuracy as the primary dimension of evaluation; fairness, bias, and toxicity (15.8%), deployment considerations (4.6%), and calibration and uncertainty (1.2%) were infrequently measured. Finally, in terms of medical specialty area, most studies were in generic health care applications (25.6%), internal medicine (16.4%), surgery (11.4%), and ophthalmology (6.9%), with nuclear medicine (0.6%), physical medicine (0.4%), and medical genetics (0.2%) being the least represented. Existing evaluations of LLMs mostly focus on accuracy of question answering for medical examinations, without consideration of real patient care data. Dimensions such as fairness, bias, and toxicity and deployment considerations received limited attention. Future evaluations should adopt standardized applications and metrics, use clinical data, and broaden focus to include a wider range of tasks and specialties.

View details for DOI 10.1001/jama.2024.21700

View details for PubMedID 39405325

View details for PubMedCentralID PMC11480901
External validation of AI models in health should be replaced with recurring local validation. Nature medicine Youssef, A., Pencina, M., Thakur, A., Zhu, T., Clifton, D., Shah, N. H. 2023

View details for DOI 10.1038/s41591-023-02540-z

View details for PubMedID 37853136

View details for PubMedCentralID 9931319
The Stanford Medicine data science ecosystem for clinical and translational research. JAMIA open Callahan, A., Ashley, E., Datta, S., Desai, P., Ferris, T. A., Fries, J. A., Halaas, M., Langlotz, C. P., Mackey, S., Posada, J. D., Pfeffer, M. A., Shah, N. H. 2023; 6 (3): ooad054

Abstract

To describe the infrastructure, tools, and services developed at Stanford Medicine to maintain its data science ecosystem and research patient data repository for clinical and translational research. The data science ecosystem, dubbed the Stanford Data Science Resources (SDSR), includes infrastructure and tools to create, search, retrieve, and analyze patient data, as well as services for data deidentification, linkage, and processing to extract high-value information from healthcare IT systems. Data are made available via self-service and concierge access, on HIPAA compliant secure computing infrastructure supported by in-depth user training. The Stanford Medicine Research Data Repository (STARR) functions as the SDSR data integration point, and includes electronic medical records, clinical images, text, bedside monitoring data and HL7 messages. SDSR tools include tools for electronic phenotyping, cohort building, and a search engine for patient timelines. The SDSR supports patient data collection, reproducible research, and teaching using healthcare data, and facilitates industry collaborations and large-scale observational studies. Research patient data repositories and their underlying data science infrastructure are essential to realizing a learning health system and advancing the mission of academic medical centers. Challenges to maintaining the SDSR include ensuring sufficient financial support while providing researchers and clinicians with maximal access to data and digital infrastructure, balancing tool development with user training, and supporting the diverse needs of users. Our experience maintaining the SDSR offers a case study for academic medical centers developing data science and research informatics infrastructure.

View details for DOI 10.1093/jamiaopen/ooad054

View details for PubMedID 37545984

View details for PubMedCentralID PMC10397535
Creation and Adoption of Large Language Models in Medicine. JAMA Shah, N. H., Entwistle, D., Pfeffer, M. A. 2023

Abstract

Importance: There is increased interest in and potential benefits from using large language models (LLMs) in medicine. However, by simply wondering how the LLMs and the applications powered by them will reshape medicine instead of getting actively involved, the agency in shaping how these tools can be used in medicine is lost. Observations: Applications powered by LLMs are increasingly used to perform medical tasks without the underlying language model being trained on medical records and without verifying their purported benefit in performing those tasks. Conclusions and Relevance: The creation and use of LLMs in medicine need to be actively shaped by provisioning relevant training data, specifying the desired benefits, and evaluating the benefits via testing in real-world deployments.

View details for DOI 10.1001/jama.2023.14217

View details for PubMedID 37548965
The shaky foundations of large language models and foundation models for electronic health records. NPJ digital medicine Wornow, M., Xu, Y., Thapa, R., Patel, B., Steinberg, E., Fleming, S., Pfeffer, M. A., Fries, J., Shah, N. H. 2023; 6 (1): 135

Abstract

The success of foundation models such as ChatGPT and AlphaFold has spurred significant interest in building similar models for electronic medical records (EMRs) to improve patient care and hospital operations. However, recent hype has obscured critical gaps in our understanding of these models' capabilities. In this narrative review, we examine 84 foundation models trained on non-imaging EMR data (i.e., clinical text and/or structured data) and create a taxonomy delineating their architectures, training data, and potential use cases. We find that most models are trained on small, narrowly-scoped clinical datasets (e.g., MIMIC-III) or broad, public biomedical corpora (e.g., PubMed) and are evaluated on tasks that do not provide meaningful insights on their usefulness to health systems. Considering these findings, we propose an improved evaluation framework for measuring the benefits of clinical foundation models that is more closely grounded to metrics that matter in healthcare.

View details for DOI 10.1038/s41746-023-00879-8

View details for PubMedID 37516790

View details for PubMedCentralID 8371605
Discrepancies Between Clearance Summaries and Marketing Materials of Software-Enabled Medical Devices Cleared by the US Food and Drug Administration. JAMA network open Shah, N. H., Mello, M. M. 2023; 6 (7): e2321753

View details for DOI 10.1001/jamanetworkopen.2023.21753

View details for PubMedID 37405777
DEPLOYR: a technical framework for deploying custom real-time machine learning models into the electronic medical record. Journal of the American Medical Informatics Association : JAMIA Corbin, C. K., Maclay, R., Acharya, A., Mony, S., Punnathanam, S., Thapa, R., Kotecha, N., Shah, N. H., Chen, J. H. 2023

Abstract

Healthcare institutions are establishing frameworks to govern and promote the implementation of accurate, actionable, and reliable machine learning models that integrate with clinical workflow. Such governance frameworks require an accompanying technical framework to deploy models in a resource efficient, safe and high-quality manner. Here we present DEPLOYR, a technical framework for enabling real-time deployment and monitoring of researcher-created models into a widely used electronic medical record system. We discuss core functionality and design decisions, including mechanisms to trigger inference based on actions within electronic medical record software, modules that collect real-time data to make inferences, mechanisms that close-the-loop by displaying inferences back to end-users within their workflow, monitoring modules that track performance of deployed models over time, silent deployment capabilities, and mechanisms to prospectively evaluate a deployed model's impact. We demonstrate the use of DEPLOYR by silently deploying and prospectively evaluating 12 machine learning models trained using electronic medical record data that predict laboratory diagnostic results, triggered by clinician button-clicks in Stanford Health Care's electronic medical record. Our study highlights the need and feasibility for such silent deployment, because prospectively measured performance varies from retrospective estimates. When possible, we recommend using prospectively estimated performance measures during silent trials to make final go decisions for model deployment. Machine learning applications in healthcare are extensively researched, but successful translations to the bedside are rare. By describing DEPLOYR, we aim to inform machine learning deployment best practices and help bridge the model implementation gap.

View details for DOI 10.1093/jamia/ocad114

View details for PubMedID 37369008
EHR foundation models improve robustness in the presence of temporal distribution shift. Scientific reports Guo, L. L., Steinberg, E., Fleming, S. L., Posada, J., Lemmon, J., Pfohl, S. R., Shah, N., Fries, J., Sung, L. 2023; 13 (1): 3767

Abstract

Temporal distribution shift negatively impacts the performance of clinical prediction models over time. Pretraining foundation models using self-supervised learning on electronic health records (EHR) may be effective in acquiring informative global patterns that can improve the robustness of task-specific models. The objective was to evaluate the utility of EHR foundation models in improving the in-distribution (ID) and out-of-distribution (OOD) performance of clinical prediction models. Transformer- and gated recurrent unit-based foundation models were pretrained on EHR of up to 1.8M patients (382M coded events) collected within pre-determined year groups (e.g., 2009-2012) and were subsequently used to construct patient representations for patients admitted to inpatient units. These representations were used to train logistic regression models to predict hospital mortality, long length of stay, 30-day readmission, and ICU admission. We compared our EHR foundation models with baseline logistic regression models learned on count-based representations (count-LR) in ID and OOD year groups. Performance was measured using area-under-the-receiver-operating-characteristic curve (AUROC), area-under-the-precision-recall curve, and absolute calibration error. Both transformer and recurrent-based foundation models generally showed better ID and OOD discrimination relative to count-LR and often exhibited less decay in tasks where there is observable degradation of discrimination performance (average AUROC decay of 3% for transformer-based foundation model vs. 7% for count-LR after 5-9 years). In addition, the performance and robustness of transformer-based foundation models continued to improve as pretraining set size increased. These results suggest that pretraining EHR foundation models at scale is a useful approach for developing clinical prediction models that perform well in the presence of temporal distribution shift.

View details for DOI 10.1038/s41598-023-30820-8

View details for PubMedID 36882576
A framework to identify ethical concerns with ML-guided care workflows: a case study of mortality prediction to guide advance care planning. Journal of the American Medical Informatics Association : JAMIA Cagliero, D., Deuitch, N., Shah, N., Feudtner, C., Char, D. 2023

Abstract

Identifying ethical concerns with ML applications to healthcare (ML-HCA) before problems arise is now a stated goal of ML design oversight groups and regulatory agencies. Lack of accepted standard methodology for ethical analysis, however, presents challenges. In this case study, we evaluate use of a stakeholder "values-collision" approach to identify consequential ethical challenges associated with an ML-HCA for advanced care planning (ACP). Identification of ethical challenges could guide revision and improvement of the ML-HCA. We conducted semistructured interviews of the designers, clinician-users, affiliated administrators, and patients, and inductive qualitative analysis of transcribed interviews using modified grounded theory. Seventeen stakeholders were interviewed. Five "values-collisions"-where stakeholders disagreed about decisions with ethical implications-were identified: (1) end-of-life workflow and how model output is introduced; (2) which stakeholders receive predictions; (3) benefit-harm trade-offs; (4) whether the ML design team has a fiduciary relationship to patients and clinicians; and, (5) how and if to protect early deployment research from external pressures, like news scrutiny, before research is completed. From these findings, the ML design team prioritized: (1) alternative workflow implementation strategies; (2) clarification that prediction was only evaluated for ACP need, not other mortality-related ends; and (3) shielding research from scrutiny until endpoint driven studies were completed. In this case study, our ethical analysis of this ML-HCA for ACP was able to identify multiple sites of intrastakeholder disagreement that mark areas of ethical and value tension. These findings provided a useful initial ethical screening.

View details for DOI 10.1093/jamia/ocad022

View details for PubMedID 36826400
Assessing the net benefit of machine learning models in the presence of resource constraints. Journal of the American Medical Informatics Association : JAMIA Singh, K., Shah, N. H., Vickers, A. J. 2023

Abstract

OBJECTIVE: The objective of this study is to provide a method to calculate model performance measures in the presence of resource constraints, with a focus on net benefit (NB). MATERIALS AND METHODS: To quantify a model's clinical utility, the Equator Network's TRIPOD guidelines recommend the calculation of the NB, which reflects whether the benefits conferred by intervening on true positives outweigh the harms conferred by intervening on false positives. We refer to the NB achievable in the presence of resource constraints as the realized net benefit (RNB), and provide formulae for calculating the RNB. RESULTS: Using 4 case studies, we demonstrate the degree to which an absolute constraint (eg, only 3 available intensive care unit [ICU] beds) diminishes the RNB of a hypothetical ICU admission model. We show how the introduction of a relative constraint (eg, surgical beds that can be converted to ICU beds for very high-risk patients) allows us to recoup some of the RNB but with a higher penalty for false positives. DISCUSSION: RNB can be calculated in silico before the model's output is used to guide care. Accounting for the constraint changes the optimal strategy for ICU bed allocation. CONCLUSIONS: This study provides a method to account for resource constraints when planning model-based interventions, either to avoid implementations where constraints are expected to play a larger role or to design more creative solutions (eg, converted ICU beds) to overcome absolute constraints when possible.
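
Illustrative sketch (not taken from the paper): the abstract above describes net benefit and how an absolute capacity limit can reduce it. The Python below computes the standard decision-curve net benefit, TP/N - (FP/N) * pt/(1 - pt), and a simplified capacity-limited variant. The function names, the highest-risk-first rule for filling the available slots, and the toy data are all assumptions made for illustration; the paper's own RNB formulae are not reproduced here and may differ.

```python
from typing import Sequence


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Standard net benefit: TP/N - (FP/N) * threshold / (1 - threshold)."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * (threshold / (1 - threshold))


def capacity_limited_net_benefit(
    y_true: Sequence[int],
    y_prob: Sequence[float],
    threshold: float,
    capacity: int,
) -> float:
    """Net benefit when at most `capacity` flagged patients can be acted on.

    Assumption for this sketch: flagged patients are served in order of
    predicted risk (highest first), and flagged patients beyond capacity
    receive no intervention, so they count as neither TP nor FP.
    """
    n = len(y_true)
    flagged = sorted(
        ((p, y) for y, p in zip(y_true, y_prob) if p >= threshold),
        key=lambda pair: pair[0],
        reverse=True,
    )
    treated = flagged[:capacity]
    tp = sum(1 for _, y in treated if y == 1)
    fp = len(treated) - tp
    return tp / n - (fp / n) * (threshold / (1 - threshold))


# Toy cohort of 10 patients (made-up numbers, for illustration only).
outcomes = [1, 0, 1, 0, 1, 0, 0, 0, 0, 0]
risks = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2, 0.1, 0.1, 0.1, 0.1]

unconstrained = net_benefit(outcomes, risks, threshold=0.2)
three_beds = capacity_limited_net_benefit(outcomes, risks, threshold=0.2, capacity=3)
print(unconstrained, three_beds)
```

In this toy cohort, limiting action to three patients lowers the computed net benefit because one true positive among the flagged patients goes unserved. A relative constraint of the kind the abstract mentions (eg, converted beds at a higher false-positive penalty) would need an additional cost term that this sketch does not model.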

View details for DOI 10.1093/jamia/ocad006

View details for PubMedID 36810659
Clinical utility gains from incorporating comorbidity and geographic location information into risk estimation equations for atherosclerotic cardiovascular disease. Journal of the American Medical Informatics Association : JAMIA Xu, Y., Foryciarz, A., Steinberg, E., Shah, N. H. 2023

Abstract

There are over 363 customized risk models of the American College of Cardiology and the American Heart Association (ACC/AHA) pooled cohort equations (PCE) in the literature, but their gains in clinical utility are rarely evaluated. We build new risk models for patients with specific comorbidities and geographic locations and evaluate whether performance improvements translate to gains in clinical utility. We retrain a baseline PCE using the ACC/AHA PCE variables and revise it to incorporate subject-level information of geographic location and 2 comorbidity conditions. We apply fixed effects, random effects, and extreme gradient boosting (XGB) models to handle the correlation and heterogeneity induced by locations. Models are trained using 2 464 522 claims records from Optum©'s Clinformatics® Data Mart and validated in the hold-out set (N = 1 056 224). We evaluate models' performance overall and across subgroups defined by the presence or absence of chronic kidney disease (CKD) or rheumatoid arthritis (RA) and geographic locations. We evaluate models' expected utility using net benefit and models' statistical properties using several discrimination and calibration metrics. The revised fixed effects and XGB models yielded improved discrimination, compared to baseline PCE, overall and in all comorbidity subgroups. XGB improved calibration for the subgroups with CKD or RA. However, the gains in net benefit are negligible, especially under low exchange rates. Common approaches to revising risk calculators incorporating extra information or applying flexible models may enhance statistical performance; however, such improvement does not necessarily translate to higher clinical utility. Thus, we recommend future works to quantify the consequences of using risk calculators to guide clinical decisions.

View details for DOI 10.1093/jamia/ocad017

View details for PubMedID 36795076
APLUS: A Python Library for Usefulness Simulations of Machine Learning Models in Healthcare. Journal of biomedical informatics Wornow, M., Gyang Ross, E., Callahan, A., Shah, N. H. 2023: 104319

Abstract

Despite the creation of thousands of machine learning (ML) models, the promise of improving patient care with ML remains largely unrealized. Adoption into clinical practice is lagging, in large part due to disconnects between how ML practitioners evaluate models and what is required for their successful integration into care delivery. Models are just one component of care delivery workflows whose constraints determine clinicians' abilities to act on models' outputs. However, methods to evaluate the usefulness of models in the context of their corresponding workflows are currently limited. To bridge this gap we developed APLUS, a reusable framework for quantitatively assessing via simulation the utility gained from integrating a model into a clinical workflow. We describe the APLUS simulation engine and workflow specification language, and apply it to evaluate a novel ML-based screening pathway for detecting peripheral artery disease at Stanford Health Care.

View details for DOI 10.1016/j.jbi.2023.104319

View details for PubMedID 36791900
Investigating real-world consequences of biases in commonly used clinical calculators. The American journal of managed care Yoo, R. M., Dash, D., Lu, J. H., Genkins, J. Z., Rabbani, N., Fries, J. A., Shah, N. H. 2023; 29 (1): e1-e7

Abstract

OBJECTIVES: To evaluate whether one summary metric of calculator performance sufficiently conveys equity across different demographic subgroups, as well as to evaluate how calculator predictive performance affects downstream health outcomes. STUDY DESIGN: We evaluate 3 commonly used clinical calculators-Model for End-Stage Liver Disease (MELD), CHA2DS2-VASc, and simplified Pulmonary Embolism Severity Index (sPESI)-on the cohort extracted from the Stanford Medicine Research Data Repository, following the cohort selection process as described in respective calculator derivation papers. METHODS: We quantified the predictive performance of the 3 clinical calculators across sex and race. Then, using the clinical guidelines that guide care based on these calculators' output, we quantified potential disparities in subsequent health outcomes. RESULTS: Across the examined subgroups, the MELD calculator exhibited worse performance for female and White populations, CHA2DS2-VASc calculator for the male population, and sPESI for the Black population. The extent to which such performance differences translated into differential health outcomes depended on the distribution of the calculators' scores around the thresholds used to trigger a care action via the corresponding guidelines. In particular, under the old guideline for CHA2DS2-VASc, among those who would not have been offered anticoagulant therapy, the Hispanic subgroup exhibited the highest rate of stroke. CONCLUSIONS: Clinical calculators, even when they do not include variables such as sex and race as inputs, can have very different care consequences across those subgroups. These differences in health care outcomes across subgroups can be explained by examining the distribution of scores and their calibration around the thresholds encoded in the accompanying care guidelines.

View details for DOI 10.37765/ajmc.2023.89306

View details for PubMedID 36716157
Holistic evaluation of large language models for medical tasks with MedHELM. Nature medicine Bedi, S., Cui, H., Fuentes, M., Unell, A., Wornow, M., Banda, J. M., Kotecha, N., Keyes, T., Mai, Y., Oez, M., Qiu, H., Jain, S., Schettini, L., Kashyap, M., Fries, J. A., Swaminathan, A., Chung, P., Haredasht, F. N., Lopez, I., Aali, A., Tse, G., Nayak, A., Vedak, S., Jain, S. S., Patel, B., Fayanju, O., Shah, S., Goh, E., Yao, D. H., Soetikno, B., Reis, E., Gatidis, S., Divi, V., Capasso, R., Saralkar, R., Chiang, C. C., Jindal, J., Pham, T., Ghoddusi, F., Lin, S., Chiou, A. S., Hong, H. J., Roy, M., Gensheimer, M. F., Patel, H., Schulman, K., Dash, D., Char, D., Downing, L., Grolleau, F., Black, K., Mieso, B., Zahedivash, A., Yim, W. W., Sharma, H., Lee, T., Kirsch, H., Lee, J., Ambers, N., Lugtu, C., Sharma, A., Mawji, B., Alekseyev, A., Zhou, V., Kakkar, V., Helzer, J., Revri, A., Bannett, Y., Daneshjou, R., Chen, J., Alsentzer, E., Morse, K., Ravi, N., Aghaeepour, N., Kennedy, V., Chaudhari, A., Wang, T., Koyejo, S., Lungren, M. P., Horvitz, E., Liang, P., Pfeffer, M. A., Shah, N. H. 2026

Abstract

While large language models (LLMs) achieve near-perfect scores on medical licensing exams, these evaluations inadequately reflect the complexity and diversity of real-world clinical practice. Here we introduce MedHELM, an extensible evaluation framework with three contributions. First, a clinician-validated taxonomy organizing medical AI applications into five categories that mirror real clinical tasks: clinical decision support (diagnostic decisions, treatment planning), clinical note generation (visit documentation, procedure reports), patient communication (education materials, care instructions), medical research (literature analysis, clinical data analysis) and administration (scheduling, workflow coordination). These encompass 22 subcategories and 121 specific tasks reflecting daily medical practice. Second, a comprehensive benchmark suite of 37 evaluations covering all subcategories. Third, systematic comparison of nine frontier LLMs (Claude 3.5 Sonnet, Claude 3.7 Sonnet, DeepSeek R1, Gemini 1.5 Pro, Gemini 2.0 Flash, GPT-4o, GPT-4o mini, Llama 3.3 and o3-mini) using an automated LLM-jury evaluation method. Our LLM-jury uses multiple AI evaluators to assess model outputs against expert-defined criteria. Advanced reasoning models (DeepSeek R1, o3-mini) demonstrated superior performance with win rates of 66%, although Claude 3.5 Sonnet achieved comparable results at 15% lower computational cost. These results not only highlight current model capabilities but also demonstrate how MedHELM could enable evidence-based selection of medical AI systems for healthcare applications.

View details for DOI 10.1038/s41591-025-04151-2

View details for PubMedID 41559415

View details for PubMedCentralID 10916499
How to interpret 'zero-shot' results from generative EHR models. Nature medicine Bedi, S., Fries, J. A., Shah, N. H. 2026

View details for DOI 10.1038/s41591-025-04094-8

View details for PubMedID 41501487

View details for PubMedCentralID 11412988
QUEST-AI: A System for Question Generation, Verification, and Refinement using AI for USMLE-Style Exams. Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing Bedi, S., Fleming, S. L., Chiang, C. C., Morse, K., Kumar, A., Patel, B., Jindal, J. A., Davenport, C., Yamaguchi, C., Shah, N. H. 2025; 30: 54-69

Abstract

The United States Medical Licensing Examination (USMLE) is a critical step in assessing the competence of future physicians, yet the process of creating exam questions and study materials is both time-consuming and costly. While Large Language Models (LLMs), such as OpenAI's GPT-4, have demonstrated proficiency in answering medical exam questions, their potential in generating such questions remains underexplored. This study presents QUEST-AI, a novel system that utilizes LLMs to (1) generate USMLE-style questions, (2) identify and flag incorrect questions, and (3) correct errors in the flagged questions. We evaluated this system's output by constructing a test set of 50 LLM-generated questions mixed with 50 human-generated questions and conducting a two-part assessment with three physicians and two medical students. The assessors attempted to distinguish between LLM and human-generated questions and evaluated the validity of the LLM-generated content. A majority of exam questions generated by QUEST-AI were deemed valid by a panel of three clinicians, with strong correlations between performance on LLM-generated and human-generated questions. This pioneering application of LLMs in medical education could significantly increase the ease and efficiency of developing USMLE-style medical exam content, offering a cost-effective and accessible alternative for exam preparation.

View details for PubMedID 39670361
Evaluating transparency in AI/ML model characteristics for FDA-reviewed medical devices. NPJ digital medicine Mehta, V., Komanduri, A., Bhadouriya, R. S., Mehta, V., Johnson, M. D., Shrestha, P., Nikolov, M., Jain, B., Shah, N., Schulman, K. 2025; 8 (1): 673

Abstract

The rapid integration of artificial intelligence (AI) and machine learning (ML) into medical devices has underscored the need for transparency in regulatory reporting. In 2021, the U.S. Food and Drug Administration (FDA) issued Good Machine Learning Practice (GMLP) principles, but adherence in FDA-reviewed devices remains uncertain. We reviewed 1,012 summaries of safety and effectiveness (SSEDs) for AI/ML-enabled devices approved or cleared by the FDA between 1970 and December 2024. Transparency in model development and performance was assessed using a novel AI Characteristics Transparency Reporting (ACTR) score across 17 categories. The average ACTR score was 3.3 out of 17, with modest improvement by 0.88 points (95% CI, 0.54-1.23) after the 2021 guidelines. Nearly half of devices did not report a clinical study and over half did not report any performance metric. These findings highlight transparency gaps and emphasize the need for enforceable standards to ensure trust in AI/ML medical technologies.

View details for DOI 10.1038/s41746-025-02052-9

View details for PubMedID 41249460

View details for PubMedCentralID 11041443
Target Product Profile to Evaluate the Clinical Utility, Financial Impact, and Ethical Implications of an AI-Based HCM Detection Model Parsa, S., Keyes, T., Dash, D., Mello, M., Salisbury, H., Callahan, A., Goto, S., Salerno, M., Parikh, V., Mahaffey, K., Ashley, E., Shah, N., Jain, S. LIPPINCOTT WILLIAMS & WILKINS. 2025

View details for DOI 10.1161/circ.152.suppl_3.4363340

View details for Web of Science ID 001613939400018
Physician Perspectives on Large Language Models in Healthcare: A Cross-Sectional Survey Study. Applied clinical informatics Hong, H. J., Shah, N., Pfeffer, M. A., Lehmann, L. S. 2025

Abstract

This study aims to evaluate physicians' practices and perspectives regarding large language models (LLMs) in healthcare settings. A cross-sectional survey study was conducted between May and July 2024 comparing physician perspectives at two major academic medical centers (AMCs), one with institutional LLM access and one without. Participants included both clinical faculty and trainees recruited through departmental leadership and snowball sampling. Primary outcomes were current LLM use frequency, ranked importance of evaluation metrics, liability concerns, and preferred learning topics. Among 306 respondents (217 attending physicians [70.9%], 80 trainees [26.1%]), 197 (64.4%) reported using LLMs. The AMC with institutional LLM access reported significantly lower liability concerns (49.2% vs 66.7% reporting high concern; 17.5 percentage points difference [95% CI, 6.8-28.2]; P=.0082). Accuracy was prioritized across all specialties (median rank 1.0 [IQR, 1.0-2.0]). Of the respondents, 287 physicians (94%) requested additional training. Key learning priorities were clinical applications (206 [71.9%]) and risk management (181 [63.1%]). Despite widespread personal use, only 8 physicians (2.6%) recommended LLMs to patients. Notable specialty and demographic variations emerged, with younger physicians showing higher enthusiasm but also elevated legal concerns. This survey study provides insights into physicians' current usage patterns and perspectives on LLMs. Liability concerns appear to be lessened in settings with institutional LLM access. The findings suggest opportunities for medical centers to consider when developing LLM-related policies and educational programs.

View details for DOI 10.1055/a-2735-0527

View details for PubMedID 41167595
Evaluating Treatment Prioritization Rules via Rank-Weighted Average Treatment Effects. Journal of the American Statistical Association Yadlowsky, S., Fleming, S., Shah, N., Brunskill, E., Wager, S. 2025; 120 (549): 38-51

Abstract

There are a number of available methods for selecting whom to prioritize for treatment, including ones based on treatment effect estimation, risk scoring, and hand-crafted rules. We propose rank-weighted average treatment effect (RATE) metrics as a simple and general family of metrics for comparing and testing the quality of treatment prioritization rules. RATE metrics are agnostic as to how the prioritization rules were derived, and only assess how well they identify individuals that benefit the most from treatment. We define a family of RATE estimators and prove a central limit theorem that enables asymptotically exact inference in a wide variety of randomized and observational study settings. RATE metrics subsume a number of existing metrics, including the Qini coefficient, and our analysis directly yields inference methods for these metrics. We showcase RATE in the context of a number of applications, including optimal targeting of aspirin to stroke patients.

View details for DOI 10.1080/01621459.2024.2393466

View details for PubMedID 40248684

View details for PubMedCentralID PMC12002561
Generative artificial intelligence in medicine. Nature medicine Teo, Z. L., Thirunavukarasu, A. J., Elangovan, K., Cheng, H., Moova, P., Soetikno, B., Nielsen, C., Pollreisz, A., Ting, D. S., Morris, R. J., Shah, N. H., Langlotz, C. P., Ting, D. S. 2025

Abstract

Generative artificial intelligence (GAI) can automate a growing number of biomedical tasks, ranging from clinical decision support to design and analysis of research studies. GAI uses machine learning and transformer model architectures to generate useful text, images and sound data in response to user queries. While previous biomedical deep-learning applications have used general-purpose datasets and enormous volumes of labeled data for training, evidence now suggests that GAI models may perform better while requiring less training data, for example by using smaller, domain-specific datasets. Moreover, AI techniques have progressed from fully supervised training to less label-intensive approaches, such as weakly supervised or unsupervised fine-tuning and reinforcement learning. Recent iterations of GAI, such as agents, mixture-of-expert models and reasoning models, have further extended their capabilities to assist with complex and multistage tasks. Here, we provide an overview of recent technical advancements in GAI. We explore the potential of the latest generation of models to improve healthcare for clinicians and patients, and discuss validation approaches using specific examples to illustrate challenges and opportunities for further work.

View details for DOI 10.1038/s41591-025-03983-2

View details for PubMedID 41053447

View details for PubMedCentralID 12082058
TIMER: temporal instruction modeling and evaluation for longitudinal clinical records. NPJ digital medicine Cui, H., Unell, A., Chen, B., Fries, J. A., Alsentzer, E., Koyejo, S., Shah, N. H. 2025; 8 (1): 577

Abstract

Electronic health records (EHRs) contain rich longitudinal information for clinical decision-making, yet LLMs struggle to reason across patient timelines. We introduce TIMER (Temporal Instruction Modeling and Evaluation for Longitudinal Clinical Records), a method to improve LLMs' temporal reasoning over multi-visit EHRs through time-aware instruction tuning. TIMER grounds LLMs in patient-specific temporal contexts by linking each instruction-response pair to specific timestamps, ensuring temporal fidelity throughout the training process. Evaluations show that TIMER-tuned models outperform conventional medical instruction-tuned approaches by 6.6% in completeness on clinician-curated benchmarks, with distribution-matched training demonstrating advantages up to 6.5% in temporal reasoning. Qualitative analyses reveal that using TIMER enhances temporal boundary adherence, trend detection, and chronological precision, necessary for applications such as disease trajectory modeling and treatment response monitoring. Overall, TIMER provides a methodological basis for developing LLMs that can effectively engage with the inherently longitudinal nature of data for patient care. Code is available at TIMER.

View details for DOI 10.1038/s41746-025-01965-9

View details for PubMedID 41006898

View details for PubMedCentralID PMC12475073
Key takeaways from Stanford's symposium on AI for Data Science. Journal of clinical and translational science Desai, M., Auerbach, J., Baker, L., Benjamin-Chung, J., Bondy, M., Boulos, M., Bunning, B. J., Deng, N., Goodman, S. N., Horn, I., Linos, E., Musen, M. A., Sanders, L., Shah, N., Singer, S., Williams, M., Zou, J., Pencina, M. 2025; 9 (1): e237

Abstract

Numerous symposia and conferences have been held to discuss the promise of Artificial Intelligence (AI). Many center on its potential to transform fields like health and medicine, law, education, business, and more. Further, while many AI-focused events include those data scientists involved in developing foundational models, to our knowledge, there has been little attention on AI's role for data science and the data scientist. In a new symposium series with its inaugural debut in December 2024 titled AI for Data Science, thought leaders convened to discuss both the promises and challenges of integrating AI into the workflows of data scientists. A keynote address by Michael Pencina from Duke University together with contributions from three panels covered a wide range of topics including rigor, reproducibility, the training of current and future data scientists, and the potential of AI's integration in public health.

View details for DOI 10.1017/cts.2025.10154

View details for PubMedID 41395171

View details for PubMedCentralID PMC12695503
Key takeaways from Stanford's symposium on AI for Data Science JOURNAL OF CLINICAL AND TRANSLATIONAL SCIENCE Desai, M., Auerbach, J., Baker, L., Benjamin-Chung, J., Bondy, M., Boulos, M., Bunning, B. J., Deng, N., Goodman, S. N., Horn, I., Linos, E., Musen, M. A., Sanders, L., Shah, N., Singer, S., Williams, M., Zou, J., Pencina, M. 2025; 9 (1)

View details for DOI 10.1017/cts.2025.10154

View details for Web of Science ID 001597186100001
Approach to the Postmarket Evaluation of Consumer Wearable Technologies. JAMA cardiology Pundi, K., Bhavnani, S., Seninger, C., Zuckerman, B., Paulsen, J., Aguel, F., Din, N., Viggiano, B., Yoo, R. M., Dalal, N., Go, A. S., Granger, C., Krumholz, H., Lacar, K., Li, R., Lin, S., Mahaffey, K. W., Mahoney, M., McCall, D., Hills, M. T., Harrington, R. A., Hernandez-Boussard, T., Saha, A., Shah, N., Turakhia, M. P. 2025

Abstract

Consumer wearable technologies have wide applications, including some that have US Food and Drug Administration clearance for health-related notifications. While wearable technologies may have premarket testing, validation, and safety evaluation as part of a regulatory authorization process, information on their postmarket use remains limited. The Stanford Center for Digital Health organized 2 pan-stakeholder think tank meetings to develop an organizing concept for empirical research on the postmarket evaluation of consumer-facing wearables. The postmarket evaluation of consumer wearables involves broad consideration of an individual consumer's journey from acquisition, intended and unintended use of the wearable, and access to health care resources on receipt of a notification. For individuals who do access the health care system, a wearable's downstream effects can be studied through appropriate clinical evaluation, delivery of guideline-directed treatments, shared decision-making in areas of clinical equipoise, and analysis of clinical end points and patient harms. Effective postmarket research draws from denominators appropriate to the clinical question, with clearly defined parameters for success and failure. Generalizability related to data completeness and reliability should also be considered. As patients increasingly integrate wearables into their health monitoring, cross-platform data sharing with a focus on privacy and data quality can drive patient-centered innovation and identify opportunities to bridge gaps in medical care. The think tank identified priorities in postmarket research, comprising the journey from consumer to patient and accounting for patient, clinician, health care delivery system, and societal impacts of consumer wearables. Overall, this approach serves not only to organize the study of consumer wearables but also to act as a guidepost for using real-world data in postmarket research.

View details for DOI 10.1001/jamacardio.2025.3006

View details for PubMedID 40928810
Fidelity of Medical Reasoning in Large Language Models. JAMA network open Bedi, S., Jiang, Y., Chung, P., Koyejo, S., Shah, N. 2025; 8 (8): e2526021

View details for DOI 10.1001/jamanetworkopen.2025.26021

View details for PubMedID 40779272
Differential reasoning and chain-of-thought processes in Deepseek-R1 and Open AI o3-mini-high for determining American Society of Anesthesiologists physical status. British journal of anaesthesia Ke, Y. H., Leong, Y. H., Jin, L., Elangovan, K., Abdullah, H. R., Sia, A. T., Ong, J. C., Shah, N. H., Wong, T. Y., Wei Ting, D. S. 2025

View details for DOI 10.1016/j.bja.2025.05.044

View details for PubMedID 40628583
International partnership for governing generative artificial intelligence models in medicine. Nature medicine Ong, J. C., Ning, Y., Collins, G. S., Bitterman, D. S., Beecy, A. N., Chang, R. T., Denniston, A. K., Freyer, O., Gilbert, S., de Hond, A., Leeuwenberg, A. M., Zhao, L., Lim, J. C., Liu, M., Liu, X., Longhurst, C. A., Ma, Y., Qiu, Y., Sarkar, R., Sheng, B., Singh, K., Tan, I. S., Tham, Y. C., Thirunavukarasu, A. J., Ting, D. S., Vogel, S., Zhang, R., Zhao, J., Chapman, W. W., Shah, N. H., Moons, K. G., Wong, T. Y., Liu, N. 2025

View details for DOI 10.1038/s41591-025-03787-4

View details for PubMedID 40588674

View details for PubMedCentralID 12104976
Feasibility of Automated Precharting using GPT-4 in New Specialty Referrals. AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science Liang, A. S., Banda, J. M., Savage, T., Pandya, A., Carey, R., Megwalu, U. C., Chang, M. T., Dash, D., Corbin, C. K., Sharma, A., Thapa, R., Kotecha, N., Shah, N. H., Lee, J. Y., Chen, J. H. 2025; 2025: 312-321

Abstract

This study evaluates the feasibility of using GPT-4 to automate precharting for specialty referrals, focusing on new patients referred to an otolaryngology clinic for nasal congestion. We describe the design decisions and strategies tested in creating this precharting utility, including methods for prompt design and token limit handling. Through iterative testing and building, our tool achieved 95.0% agreement with physician consensus in a small retrospective test sample. Results from a small prospective pilot showed favorable feedback of summaries in a real-world clinical setting, though there was a discrepancy between high intention to use the summary but lower perception of time savings. Our results demonstrate that automated pre-charting with accuracy and clinical relevance can be feasible with large language models such as GPT-4. Our design features can inform the development of vendor chart summarization solutions.

View details for PubMedID 40502261

View details for PubMedCentralID PMC12150724
Answering real-world clinical questions using large language model, retrieval-augmented generation, and agentic systems. Digital health Low, Y. S., Jackson, M. L., Hyde, R. J., Brown, R. E., Sanghavi, N. M., Baldwin, J. D., Pike, C. W., Muralidharan, J., Hui, G., Alexander, N., Hassan, H., Nene, R. V., Pike, M., Pokrzywa, C. J., Vedak, S., Yan, A. P., Yao, D. H., Zipursky, A. R., Dinh, C., Ballentine, P., Derieg, D. C., Polony, V., Chawdry, R. N., Davies, J., Hyde, B. B., Shah, N. H., Gombar, S. 2025; 11: 20552076251348850

Abstract

The practice of evidence-based medicine can be challenging when relevant data are lacking or difficult to contextualize for a specific patient. Large language models (LLMs) could potentially address both challenges by summarizing published literature or generating new studies using real-world data. We submitted 50 clinical questions to five LLM-based systems: OpenEvidence, which uses an LLM for retrieval-augmented generation (RAG); ChatRWD, which uses an LLM as an interface to a data extraction and analysis pipeline; and three general-purpose LLMs (ChatGPT-4, Claude 3 Opus, Gemini 1.5 Pro). Nine independent physicians evaluated the answers for relevance, quality of supporting evidence, and actionability (i.e., sufficient to justify or change clinical practice). General-purpose LLMs rarely produced relevant, evidence-based answers (2-10% of questions). In contrast, RAG-based and agentic LLM systems, respectively, produced relevant, evidence-based answers for 24% (OpenEvidence) to 58% (ChatRWD) of questions. OpenEvidence produced actionable results for 48% of questions with existing evidence, compared to 37% for ChatRWD and <5% for the general-purpose LLMs. ChatRWD provided actionable results for 52% of questions that lacked existing literature compared to <10% for other LLMs. Special-purpose LLM systems greatly outperformed general-purpose LLMs in producing answers to clinical questions. Retrieval-augmented generation-based LLM (OpenEvidence) performed well when existing data were available, while only the agentic ChatRWD was able to provide actionable answers when preexisting studies were lacking. Synergistic systems combining RAG-based evidence summarization and agentic generation of novel evidence could improve the availability of pertinent evidence for patient care.

View details for DOI 10.1177/20552076251348850

View details for PubMedID 40510193

View details for PubMedCentralID PMC12159471
AI in Health Care: The Leadership Role of Board-Certified Clinical Informaticists. Applied clinical informatics Morse, K. E., Pageler, N. M., Shah, N. H., Townsend, T., Sharp, C., Pfeffer, M. A. 2025; 16 (3): 612-613

View details for DOI 10.1055/a-2556-4698

View details for PubMedID 40602801

View details for PubMedCentralID PMC12221687
QUEST-AI: A System for Question Generation, Verification, and Refinement using AI for USMLE-Style Exams. Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing Bedi, S., Fleming, S. L., Chiang, C. C., Morse, K., Kumar, A., Patel, B., Jindal, J. A., Davenport, C., Yamaguchi, C., Shah, N. H. 2025; 30: 54-69

View details for DOI 10.1142/9789819807024_0005

View details for PubMedID 40299581
Reformulating patient stratification for targeting interventions by accounting for severity of downstream outcomes resulting from disease onset: a case study in sepsis. Journal of the American Medical Informatics Association : JAMIA Kamran, F., Tjandra, D., Valley, T. S., Prescott, H. C., Shah, N. H., Liu, V. X., Horvitz, E., Wiens, J. 2025

Abstract

To quantify differences between (1) stratifying patients by predicted disease onset risk alone and (2) stratifying by predicted disease onset risk and severity of downstream outcomes. We perform a case study of predicting sepsis. We performed a retrospective analysis using observational data from Michigan Medicine at the University of Michigan (U-M) between 2016 and 2020 and the Beth Israel Deaconess Medical Center (BIDMC) between 2008 and 2012. We measured the correlation between the estimated sepsis risk and the estimated effect of sepsis on mortality using Spearman's correlation. We compared patients stratified by sepsis risk with patients stratified by sepsis risk and effect of sepsis on mortality. The U-M and BIDMC cohorts included 7282 and 5942 ICU visits; 7.9% and 8.1% developed sepsis, respectively. Among visits with sepsis, 21.9% and 26.3% experienced mortality at U-M and BIDMC. The effect of sepsis on mortality was weakly correlated with sepsis risk (U-M: 0.35 [95% CI: 0.33-0.37], BIDMC: 0.31 [95% CI: 0.28-0.34]). High-risk patients identified by both stratification approaches overlapped by 66.8% and 52.8% at U-M and BIDMC, respectively. Accounting for risk of mortality identified an older population (U-M: age = 66.0 [interquartile range (IQR): 55.0-74.0] vs age = 63.0 [IQR: 51.0-72.0], BIDMC: age = 74.0 [IQR: 61.0-83.0] vs age = 68.0 [IQR: 59.0-78.0]). Predictive models that guide selective interventions ignore the effect of disease on downstream outcomes. Reformulating patient stratification to account for the estimated effect of disease on downstream outcomes identifies a different population compared to stratification on disease risk alone. Models that predict the risk of disease and ignore the effects of disease on downstream outcomes could be suboptimal for stratification.

View details for DOI 10.1093/jamia/ocaf036

View details for PubMedID 40127468
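
The sepsis abstract above reports Spearman's correlation between estimated sepsis risk and the estimated effect of sepsis on mortality. As a rough illustration of what that statistic measures, the sketch below computes Spearman's rank correlation in plain Python. The function names and the five-patient example values are hypothetical and are not taken from the study.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical values for five patients (illustration only).
    sepsis_risk = [0.05, 0.10, 0.20, 0.40, 0.80]
    mortality_effect = [0.02, 0.01, 0.05, 0.03, 0.04]
    print(f"Spearman rho = {spearman(sepsis_risk, mortality_effect):.2f}")
```

A value near 1 would mean that ranking patients by risk and ranking them by effect on mortality give nearly the same order; the weak correlations reported in the abstract are consistent with the two stratifications selecting partly different patients.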
Red teaming ChatGPT in medicine to yield real-world insights on model behavior. NPJ digital medicine Chang, C. T., Farah, H., Gui, H., Rezaei, S. J., Bou-Khalil, C., Park, Y. J., Swaminathan, A., Omiye, J. A., Kolluri, A., Chaurasia, A., Lozano, A., Heiman, A., Jia, A. S., Kaushal, A., Jia, A., Iacovelli, A., Yang, A., Salles, A., Singhal, A., Narasimhan, B., Belai, B., Jacobson, B. H., Li, B., Poe, C. H., Sanghera, C., Zheng, C., Messer, C., Kettud, D. V., Pandya, D., Kaur, D., Hla, D., Dindoust, D., Moehrle, D., Ross, D., Chou, E., Lin, E., Haredasht, F. N., Cheng, G., Gao, I., Chang, J., Silberg, J., Fries, J. A., Xu, J., Jamison, J., Tamaresis, J. S., Chen, J. H., Lazaro, J., Banda, J. M., Lee, J. J., Matthys, K. E., Steffner, K. R., Tian, L., Pegolotti, L., Srinivasan, M., Manimaran, M., Schwede, M., Zhang, M., Nguyen, M., Fathzadeh, M., Zhao, Q., Bajra, R., Khurana, R., Azam, R., Bartlett, R., Truong, S. T., Fleming, S. L., Raj, S., Behr, S., Onyeka, S., Muppidi, S., Bandali, T., Eulalio, T. Y., Chen, W., Zhou, X., Ding, Y., Cui, Y., Tan, Y., Liu, Y., Shah, N., Daneshjou, R. 2025; 8 (1): 149

Abstract

Red teaming, the practice of adversarially exposing unexpected or undesired model behaviors, is critical towards improving equity and accuracy of large language models, but non-model creator-affiliated red teaming is scant in healthcare. We convened teams of clinicians, medical and engineering students, and technical professionals (80 participants total) to stress-test models with real-world clinical cases and categorize inappropriate responses along axes of safety, privacy, hallucinations/accuracy, and bias. Six medically-trained reviewers re-analyzed prompt-response pairs and added qualitative annotations. Of 376 unique prompts (1504 responses), 20.1% were inappropriate (GPT-3.5: 25.8%; GPT-4.0: 16%; GPT-4.0 with Internet: 17.8%). Subsequently, we show the utility of our benchmark by testing GPT-4o, a model released after our event (20.4% inappropriate). 21.5% of responses appropriate with GPT-3.5 were inappropriate in updated models. We share insights for constructing red teaming prompts, and present our benchmark for iterative model assessments.

View details for DOI 10.1038/s41746-025-01542-0

View details for PubMedID 40055532

View details for PubMedCentralID 10564921
Against reflexive recalibration: towards a causal framework for addressing miscalibration. Diagnostic and prognostic research Swaminathan, A., Srivastava, U., Tu, L., Lopez, I., Shah, N. H., Vickers, A. J. 2025; 9 (1): 4

View details for DOI 10.1186/s41512-024-00184-2

View details for PubMedID 39930530
Artificial Intelligence In Health And Health Care: Priorities For Action. Health affairs (Project Hope) Matheny, M. E., Goldsack, J. C., Saria, S., Shah, N. H., Gerhart, J., Cohen, I. G., Price, W. N., Patel, B., Payne, P. R., Embí, P. J., Anderson, B., Horvitz, E. 2025: 101377hlthaff202401003

Abstract

The field of artificial intelligence (AI) has entered a new cycle of intense opportunity, fueled by advances in deep learning, including generative AI. Applications of recent advances affect many aspects of everyday life, yet nowhere is it more important to use this technology safely, effectively, and equitably than in health and health care. Here, as part of the National Academy of Medicine's Vital Directions for Health and Health Care: Priorities for 2025 initiative, which is designed to provide guidance on pressing health care issues for the incoming presidential administration, we describe the steps needed to achieve these goals. We focus on four strategic areas: ensuring safe, effective, and trustworthy use of AI; promotion and development of an AI-competent health care workforce; investing in AI research to support the science, practice, and delivery of health and health care; and promotion of policies and procedures to clarify AI liability and responsibilities.

View details for DOI 10.1377/hlthaff.2024.01003

View details for PubMedID 39841940
Clinical entity augmented retrieval for clinical information extraction. NPJ digital medicine Lopez, I., Swaminathan, A., Vedula, K., Narayanan, S., Nateghi Haredasht, F., Ma, S. P., Liang, A. S., Tate, S., Maddali, M., Gallo, R. J., Shah, N. H., Chen, J. H. 2025; 8 (1): 45

Abstract

Large language models (LLMs) with retrieval-augmented generation (RAG) have improved information extraction over previous methods, yet their reliance on embeddings often leads to inefficient retrieval. We introduce CLinical Entity Augmented Retrieval (CLEAR), a RAG pipeline that retrieves information using entities. We compared CLEAR to embedding RAG and full-note approaches for extracting 18 variables using six LLMs across 20,000 clinical notes. Average F1 scores were 0.90, 0.86, and 0.79; inference times were 4.95, 17.41, and 20.08 s per note; average model queries were 1.68, 4.94, and 4.18 per note; and average input tokens were 1.1k, 3.8k, and 6.1k per note for CLEAR, embedding RAG, and full-note approaches, respectively. In conclusion, CLEAR utilizes clinical entities for information retrieval and achieves >70% reduction in token usage and inference time with improved performance compared to modern methods.

View details for DOI 10.1038/s41746-024-01377-1

View details for PubMedID 39828800

View details for PubMedCentralID 4287068
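
The ">70% reduction" reported in the CLEAR abstract above can be checked with a few lines of arithmetic. The sketch below uses only the per-note figures quoted in that abstract; the dictionary layout and the helper name `percent_reduction` are illustrative assumptions, not part of the paper.

```python
# Per-note figures as quoted in the CLEAR abstract.
# The key names are hypothetical labels chosen for this sketch.
reported = {
    "inference_time_s": {"clear": 4.95, "embedding_rag": 17.41, "full_note": 20.08},
    "input_tokens_k": {"clear": 1.1, "embedding_rag": 3.8, "full_note": 6.1},
}


def percent_reduction(baseline: float, new: float) -> float:
    """Relative reduction of `new` versus `baseline`, in percent."""
    return 100.0 * (baseline - new) / baseline


# Compare CLEAR against each of the two baselines for each metric.
reductions = {
    (metric, baseline): percent_reduction(values[baseline], values["clear"])
    for metric, values in reported.items()
    for baseline in ("embedding_rag", "full_note")
}

for (metric, baseline), pct in reductions.items():
    print(f"{metric} vs {baseline}: {pct:.1f}% lower with CLEAR")
```

With these inputs, every comparison comes out above 70%, which is consistent with the abstract's summary claim.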
Toward expert-level medical question answering with large language models. Nature medicine Singhal, K., Tu, T., Gottweis, J., Sayres, R., Wulczyn, E., Amin, M., Hou, L., Clark, K., Pfohl, S. R., Cole-Lewis, H., Neal, D., Rashid, Q. M., Schaekermann, M., Wang, A., Dash, D., Chen, J. H., Shah, N. H., Lachgar, S., Mansfield, P. A., Prakash, S., Green, B., Dominowska, E., Agüera Y Arcas, B., Tomašev, N., Liu, Y., Wong, R., Semturs, C., Mahdavi, S. S., Barral, J. K., Webster, D. R., Corrado, G. S., Matias, Y., Azizi, S., Karthikesalingam, A., Natarajan, V. 2025

Abstract

Large language models (LLMs) have shown promise in medical question answering, with Med-PaLM being the first to exceed a 'passing' score in United States Medical Licensing Examination style questions. However, challenges remain in long-form medical question answering and handling real-world workflows. Here, we present Med-PaLM 2, which bridges these gaps with a combination of base LLM improvements, medical domain fine-tuning and new strategies for improving reasoning and grounding through ensemble refinement and chain of retrieval. Med-PaLM 2 scores up to 86.5% on the MedQA dataset, improving upon Med-PaLM by over 19%, and demonstrates dramatic performance increases across MedMCQA, PubMedQA and MMLU clinical topics datasets. Our detailed human evaluations framework shows that physicians prefer Med-PaLM 2 answers to those from other physicians on eight of nine clinical axes. Med-PaLM 2 also demonstrates significant improvements over its predecessor across all evaluation metrics, particularly on new adversarial datasets designed to probe LLM limitations (P < 0.001). In a pilot study using real-world medical questions, specialists preferred Med-PaLM 2 answers to generalist physician answers 65% of the time. While specialist answers were still preferred overall, both specialists and generalists rated Med-PaLM 2 to be as safe as physician answers, demonstrating its growing potential in real-world medical applications.

View details for DOI 10.1038/s41591-024-03423-7

View details for PubMedID 39779926

View details for PubMedCentralID 10396962
From Better Models to Better Care. JACC. Heart failure Shah, N. H., Jain, S. S. 2025; 13 (1): 88-90

View details for DOI 10.1016/j.jchf.2024.09.021

View details for PubMedID 39779184
Feasibility of Automated Precharting using GPT-4 in New Specialty Referrals. AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science Liang, A. S., Banda, J. M., Savage, T., Pandya, A., Carey, R., Megwalu, U. C., Chang, M. T., Dash, D., Corbin, C. K., Sharma, A., Thapa, R., Kotecha, N., Shah, N. H., Lee, J. Y., Chen, J. H. 2025; 2025: 312-321

Abstract

This study evaluates the feasibility of using GPT-4 to automate precharting for specialty referrals, focusing on new patients referred to an otolaryngology clinic for nasal congestion. We describe the design decisions and strategies tested in creating this precharting utility, including methods for prompt design and token limit handling. Through iterative testing and building, our tool achieved 95.0% agreement with physician consensus in a small retrospective test sample. Results from a small prospective pilot showed favorable feedback of summaries in a real-world clinical setting, though there was a discrepancy between high intention to use the summary but lower perception of time savings. Our results demonstrate that automated pre-charting with accuracy and clinical relevance can be feasible with large language models such as GPT-4. Our design features can inform the development of vendor chart summarization solutions.

View details for PubMedID 40502261
Developing a Research Center for Artificial Intelligence in Medicine. Mayo Clinic proceedings. Digital health Langlotz, C. P., Kim, J., Shah, N., Lungren, M. P., Larson, D. B., Datta, S., Li, F. F., O'Hara, R., Montine, T. J., Harrington, R. A., Gold, G. E. 2024; 2 (4): 677-686

Abstract

Artificial intelligence (AI) and machine learning (ML) are driving innovation in biosciences and are already affecting key elements of medical scholarship and clinical care. Many schools of medicine are capitalizing on the promise of these new technologies by establishing academic units to catalyze and grow research and innovation in AI/ML. At Stanford University, we have developed a successful model for an AI/ML research center with support from academic leaders, clinical departments, extramural grants, and industry partners. The Center for Artificial Intelligence in Medicine and Imaging uses the following 4 key tactics to support AI/ML research: project-based learning opportunities that build interdisciplinary collaboration; internal grant programs that catalyze extramural funding; infrastructure that facilitates the rapid creation of large multimodal AI-ready clinical data sets; and educational and open data programs that engage the broader research community. The center is based on the premise that foundational and applied research are not in tension but instead are complementary. Solving important biomedical problems with AI/ML requires high-quality foundational team science that incorporates the knowledge and expertise of clinicians, clinician scientists, computer scientists, and data scientists. As AI/ML becomes an essential component of research and clinical care, multidisciplinary centers of excellence in AI/ML will become a key part of the scholarly portfolio of academic medical centers and will provide a foundation for the responsible, ethical, and fair implementation of AI/ML systems.

View details for DOI 10.1016/j.mcpdig.2024.07.005

View details for PubMedID 39802660

View details for PubMedCentralID PMC11720458
The Coming AI Revolution in Clinical Trials. Journal of the American College of Cardiology Jain, S. S., Sarraju, A., Shah, N. H., Schulman, K. A., Ashley, E. A., Harrington, R. A., Mahaffey, K. W. 2024

View details for DOI 10.1016/j.jacc.2024.10.093

View details for PubMedID 39641738
Evaluating Treatment Prioritization Rules via Rank-Weighted Average Treatment Effects. Journal of the American Statistical Association Yadlowsky, S., Fleming, S., Shah, N., Brunskill, E., Wager, S. 2024

View details for DOI 10.1080/01621459.2024.2393466

View details for Web of Science ID 001330048300001
Automated patient selection and care coaches to increase advance care planning for cancer patients. Journal of the National Cancer Institute Gensheimer, M. F., Teuteberg, W., Patel, M. I., Gupta, D., Noroozi, M., Ling, X., Fardeen, T., Seevaratnam, B., Lu, Y., Alves, N., Rogers, B., Asuncion, M. K., Denofrio, J., Hansen, J., Shah, N. H., Chen, T., Cabebe, E., Blayney, D. W., Colevas, A. D., Ramchandran, K. 2024

Abstract

Advance care planning/serious illness conversations can help clinicians understand patients' values and preferences. There are limited data on how to increase these conversations, and their effect on care patterns. We hypothesized that using a machine learning survival model to select patients for serious illness conversations, along with trained care coaches to conduct the conversations, would increase uptake in cancer patients at high risk of short-term mortality. We conducted a cluster-randomized stepped wedge study on the physician level. Oncologists entered the intervention condition in a random order over six months. Adult patients with metastatic cancer were included. Patients with <2 year computer-predicted survival and no prognosis documentation were classified as high-priority for serious illness conversations. In the intervention condition, providers received automated weekly emails highlighting high-priority patients and were asked to document prognosis for them. Care coaches reached out to these patients to conduct the remainder of the conversation. The primary endpoint was proportion of visits with prognosis documentation within 14 days. 6,372 visits in 1,825 patients were included in the primary analysis. The proportion of visits with prognosis documentation within 14 days was higher in the intervention condition than control condition: 2.9% vs 1.1% (adjusted odds ratio 4.3, p < .0001). The proportion of visits with advance care planning documentation was also higher in the intervention condition: 7.7% vs 1.8% (adjusted odds ratio 14.2, p < .0001). In high-priority visits, advance care planning documentation rate in intervention/control visits was 24.2% vs 4.0%. The intervention increased documented conversations, with contributions by both providers and care coaches.

View details for DOI 10.1093/jnci/djae243

View details for PubMedID 39348179
Avoiding Financial Toxicity for Patients from Clinicians' Use of AI. The New England journal of medicine Jain, S. S., Mello, M. M., Shah, N. H. 2024

View details for DOI 10.1056/NEJMp2406135

View details for PubMedID 39348681
Standing on FURM Ground: A Framework for Evaluating Fair, Useful, and Reliable AI Models in Health Care Systems. NEJM Catalyst Innovations in Care Delivery Callahan, A., McElfresh, D., Banda, J. M., Bunney, G., Char, D., Chen, J., Corbin, C. K., Dash, D., Downing, N. L., Jain, S. S., Kotecha, N., Masterson, J., Mello, M. M., Morse, K., Nallan, S., Pandya, A., Revri, A., Sharma, A., Sharp, C., Thapa, R., Wornow, M., Youssef, A., Pfeffer, M. A., Shah, N. H. 2024; 5 (10)

View details for DOI 10.1056/CAT.24.0131

View details for Web of Science ID 001422126900001
The Need for Continuous Evaluation of Artificial Intelligence Prediction Algorithms. JAMA network open Shah, N. H., Pfeffer, M. A., Ghassemi, M. 2024; 7 (9): e2433009

View details for DOI 10.1001/jamanetworkopen.2024.33009

View details for PubMedID 39264634
Evaluating the clinical benefits of LLMs. Nature medicine Bedi, S., Jain, S. S., Shah, N. H. 2024

View details for DOI 10.1038/s41591-024-03181-6

View details for PubMedID 39060659

View details for PubMedCentralID 9931230
Automating the Enterprise with Foundation Models. Proceedings of the VLDB Endowment Wornow, M., Narayan, A., Opsahl-Ong, K., McIntyre, Q., Shah, N., Re, C. 2024; 17 (11): 2805-2812

View details for DOI 10.14778/3681954.3681964

View details for Web of Science ID 001416290800010
Merlin: A Vision Language Foundation Model for 3D Computed Tomography. Research square Blankemeier, L., Cohen, J. P., Kumar, A., Veen, D. V., Gardezi, S., Paschali, M., Chen, Z., Delbrouck, J. B., Reis, E., Truyts, C., Bluethgen, C., Jensen, M., Ostmeier, S., Varma, M., Valanarasu, J., Fang, Z., Huo, Z., Nabulsi, Z., Ardila, D., Weng, W. H., Junior, E. A., Ahuja, N., Fries, J., Shah, N., Johnston, A., Boutin, R., Wentland, A., Langlotz, C., Hom, J., Gatidis, S., Chaudhari, A. 2024

Abstract

Over 85 million computed tomography (CT) scans are performed annually in the US, of which approximately one quarter focus on the abdomen. Given the current shortage of both general and specialized radiologists, there is a large impetus to use artificial intelligence to alleviate the burden of interpreting these complex imaging studies while simultaneously using the images to extract novel physiological insights. Prior state-of-the-art approaches for automated medical image interpretation leverage vision language models (VLMs) that utilize both the image and the corresponding textual radiology reports. However, current medical VLMs are generally limited to 2D images and short reports. To overcome these shortcomings for abdominal CT interpretation, we introduce Merlin - a 3D VLM that leverages both structured electronic health records (EHR) and unstructured radiology reports for pretraining without requiring additional manual annotations. We train Merlin using a high-quality clinical dataset of paired CT scans (6+ million images from 15,331 CTs), EHR diagnosis codes (1.8+ million codes), and radiology reports (6+ million tokens) for training. We comprehensively evaluate Merlin on 6 task types and 752 individual tasks. The non-adapted (off-the-shelf) tasks include zero-shot findings classification (31 findings), phenotype classification (692 phenotypes), and zero-shot cross-modal retrieval (image to findings and image to impressions), while model adapted tasks include 5-year chronic disease prediction (6 diseases), radiology report generation, and 3D semantic segmentation (20 organs). We perform internal validation on a test set of 5,137 CTs, and external validation on 7,000 clinical CTs and on two public CT datasets (VerSe, TotalSegmentator). Beyond these clinically-relevant evaluations, we assess the efficacy of various network architectures and training strategies to depict that Merlin has favorable performance to existing task-specific baselines. We derive data scaling laws to empirically assess training data needs for requisite downstream task performance. Furthermore, unlike conventional VLMs that require hundreds of GPUs for training, we perform all training on a single GPU. This computationally efficient design can help democratize foundation model training, especially for health systems with compute constraints. We plan to release our trained models, code, and dataset, pending manual removal of all protected health information.

View details for DOI 10.21203/rs.3.rs-4546309/v1

View details for PubMedID 38978576

View details for PubMedCentralID PMC11230513
A multi-center study on the adaptability of a shared foundation model for electronic health records. NPJ digital medicine Guo, L. L., Fries, J., Steinberg, E., Fleming, S. L., Morse, K., Aftandilian, C., Posada, J., Shah, N., Sung, L. 2024; 7 (1): 171

Abstract

Foundation models are transforming artificial intelligence (AI) in healthcare by providing modular components adaptable for various downstream tasks, making AI development more scalable and cost-effective. Foundation models for structured electronic health records (EHR), trained on coded medical records from millions of patients, demonstrated benefits including increased performance with fewer training labels, and improved robustness to distribution shifts. However, questions remain on the feasibility of sharing these models across hospitals and their performance in local tasks. This multi-center study examined the adaptability of a publicly accessible structured EHR foundation model (FMSM), trained on 2.57 M patient records from Stanford Medicine. Experiments used EHR data from The Hospital for Sick Children (SickKids) and Medical Information Mart for Intensive Care (MIMIC-IV). We assessed both adaptability via continued pretraining on local data, and task adaptability compared to baselines of locally training models from scratch, including a local foundation model. Evaluations on 8 clinical prediction tasks showed that adapting the off-the-shelf FMSM matched the performance of gradient boosting machines (GBM) locally trained on all data while providing a 13% improvement in settings with few task-specific training labels. Continued pretraining on local data showed FMSM required fewer than 1% of training examples to match the fully trained GBM's performance, and was 60 to 90% more sample-efficient than training local foundation models from scratch. Our findings demonstrate that adapting EHR foundation models across hospitals provides improved prediction performance at less cost, underscoring the utility of base foundation models as modular components to streamline the development of healthcare AI.

View details for DOI 10.1038/s41746-024-01166-w

View details for PubMedID 38937550

View details for PubMedCentralID 10396962
Large Language Models in Medicine: Addressing Ethical Challenges Chang, Y., Ong, J., William, W., Butte, A. J., Shah, N. H., Chew, L., Liu, N., Doshi-Velez, F., Lu, W., Savulescu, J., Ting, D. ASSOC RESEARCH VISION OPHTHALMOLOGY INC. 2024

View details for Web of Science ID 001312227701059
Ethical and regulatory challenges of large language models in medicine. The Lancet. Digital health Ong, J. C., Chang, S. Y., William, W., Butte, A. J., Shah, N. H., Chew, L. S., Liu, N., Doshi-Velez, F., Lu, W., Savulescu, J., Ting, D. S. 2024

Abstract

With the rapid growth of interest in and use of large language models (LLMs) across various industries, we are facing some crucial and profound ethical concerns, especially in the medical field. The unique technical architecture and purported emergent abilities of LLMs differentiate them substantially from other artificial intelligence (AI) models and natural language processing techniques used, necessitating a nuanced understanding of LLM ethics. In this Viewpoint, we highlight ethical concerns stemming from the perspectives of users, developers, and regulators, notably focusing on data privacy and rights of use, data provenance, intellectual property contamination, and broad applications and plasticity of LLMs. A comprehensive framework and mitigating strategies will be imperative for the responsible integration of LLMs into medical practice, ensuring alignment with ethical principles and safeguarding against potential societal risks.

View details for DOI 10.1016/S2589-7500(24)00061-X

View details for PubMedID 38658283
Scalable Approach to Consumer Wearable Postmarket Surveillance: Development and Validation Study. JMIR medical informatics Yoo, R. M., Viggiano, B. T., Pundi, K. N., Fries, J. A., Zahedivash, A., Podchiyska, T., Din, N., Shah, N. H. 2024; 12: e51171

Abstract

Background: With the capability to render prediagnoses, consumer wearables have the potential to affect subsequent diagnoses and the level of care in the health care delivery setting. Despite this, postmarket surveillance of consumer wearables has been hindered by the lack of codified terms in electronic health records (EHRs) to capture wearable use. Objective: We sought to develop a weak supervision-based approach to demonstrate the feasibility and efficacy of EHR-based postmarket surveillance on consumer wearables that render atrial fibrillation (AF) prediagnoses. Methods: We applied data programming, where labeling heuristics are expressed as code-based labeling functions, to detect incidents of AF prediagnoses. A labeler model was then derived from the predictions of the labeling functions using the Snorkel framework. The labeler model was applied to clinical notes to probabilistically label them, and the labeled notes were then used as a training set to fine-tune a classifier called Clinical-Longformer. The resulting classifier identified patients with an AF prediagnosis. A retrospective cohort study was conducted, where the baseline characteristics and subsequent care patterns of patients identified by the classifier were compared against those who did not receive a prediagnosis. Results: The labeler model derived from the labeling functions showed high accuracy (0.92; F1-score=0.77) on the training set. The classifier trained on the probabilistically labeled notes accurately identified patients with an AF prediagnosis (0.95; F1-score=0.83). The cohort study conducted using the constructed system carried enough statistical power to verify the key findings of the Apple Heart Study, which enrolled a much larger number of participants, where patients who received a prediagnosis tended to be older, male, and White with higher CHA2DS2-VASc (congestive heart failure, hypertension, age ≥75 years, diabetes, stroke, vascular disease, age 65-74 years, sex category) scores (P<.001). We also made a novel discovery that patients with a prediagnosis were more likely to use anticoagulants (525/1037, 50.63% vs 5936/16,560, 35.85%) and have an eventual AF diagnosis (305/1037, 29.41% vs 262/16,560, 1.58%). At the index diagnosis, the existence of a prediagnosis did not distinguish patients based on clinical characteristics, but did correlate with anticoagulant prescription (P=.004 for apixaban and P=.01 for rivaroxaban). Conclusions: Our work establishes the feasibility and efficacy of an EHR-based surveillance system for consumer wearables that render AF prediagnoses. Further work is necessary to generalize these findings for patient populations at other sites.

View details for DOI 10.2196/51171

View details for PubMedID 38596848
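
The wearable surveillance abstract above describes data programming, in which labeling heuristics are written as code-based labeling functions whose votes are then combined. The following is a minimal sketch of that idea only. It assumes invented keyword heuristics and combines votes by simple majority; the study itself derived a labeler model with the Snorkel framework, which this sketch does not reproduce. All function names, patterns, and example notes are hypothetical.

```python
import re

# Vote values: a labeling function may abstain when its heuristic does not apply.
ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1


def lf_wearable_flagged_af(note: str) -> int:
    """Hypothetical heuristic: a wearable is mentioned before an AF-related term."""
    pattern = r"\b(watch|wearable)\b.*\b(afib|atrial fibrillation|irregular rhythm)\b"
    return POSITIVE if re.search(pattern, note, re.I) else ABSTAIN


def lf_denies_wearable(note: str) -> int:
    """Hypothetical heuristic: the note states the patient does not use a wearable."""
    pattern = r"\b(does not|doesn't) (use|own|wear)\b.*\b(watch|wearable)\b"
    return NEGATIVE if re.search(pattern, note, re.I) else ABSTAIN


def lf_no_wearable_terms(note: str) -> int:
    """Hypothetical heuristic: no wearable term anywhere in the note."""
    return ABSTAIN if re.search(r"\b(watch|wearable)\b", note, re.I) else NEGATIVE


LABELING_FUNCTIONS = [lf_wearable_flagged_af, lf_denies_wearable, lf_no_wearable_terms]


def majority_vote(note: str) -> int:
    """Combine labeling-function votes by simple majority; ties yield ABSTAIN.

    This stands in for the learned labeler model described in the abstract.
    """
    votes = [lf(note) for lf in LABELING_FUNCTIONS]
    positives = votes.count(POSITIVE)
    negatives = votes.count(NEGATIVE)
    if positives == negatives:
        return ABSTAIN
    return POSITIVE if positives > negatives else NEGATIVE


example_notes = [
    "Patient reports her smart watch flagged an irregular rhythm last week.",
    "Patient does not wear a watch; no palpitations.",
    "Follow-up for hypertension.",
]
weak_labels = [majority_vote(note) for note in example_notes]
print(weak_labels)
```

A learned labeler model differs from majority vote in that it estimates how accurate each labeling function is and weights the votes accordingly, producing probabilistic rather than hard labels.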
MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records. Proceedings of the ... AAAI Conference on Artificial Intelligence. AAAI Conference on Artificial Intelligence Fleming, S. L., Lozano, A., Haberkorn, W. J., Jindal, J. A., Reis, E., Thapa, R., Blankemeier, L., Genkins, J. Z., Steinberg, E., Nayak, A., Patel, B., Chiang, C. C., Callahan, A., Huo, Z., Gatidis, S., Adams, S., Fayanju, O., Shah, S. J., Savage, T., Goh, E., Chaudhari, A. S., Aghaeepour, N., Sharp, C., Pfeffer, M. A., Liang, P., Chen, J. H., Morse, K. E., Brunskill, E. P., Fries, J. A., Shah, N. H. 2024; 38 (20): 22021-22030

Abstract

The ability of large language models (LLMs) to follow natural language instructions with human-level fluency suggests many opportunities in healthcare to reduce administrative burden and improve quality of care. However, evaluating LLMs on realistic text generation tasks for healthcare remains challenging. Existing question answering datasets for electronic health record (EHR) data fail to capture the complexity of information needs and documentation burdens experienced by clinicians. To address these challenges, we introduce MedAlign, a benchmark dataset of 983 natural language instructions for EHR data. MedAlign is curated by 15 clinicians (7 specialities), includes clinician-written reference responses for 303 instructions, and provides 276 longitudinal EHRs for grounding instruction-response pairs. We used MedAlign to evaluate 6 general domain LLMs, having clinicians rank the accuracy and quality of each LLM response. We found high error rates, ranging from 35% (GPT-4) to 68% (MPT-7B-Instruct), and 8.3% drop in accuracy moving from 32k to 2k context lengths for GPT-4. Finally, we report correlations between clinician rankings and automated natural language generation metrics as a way to rank LLMs without human review. MedAlign is provided under a research data use agreement to enable LLM evaluations on tasks aligned with clinician needs and preferences.

View details for DOI 10.1609/aaai.v38i20.30205

View details for PubMedID 41584261

View details for PubMedCentralID PMC12826664
Ensuring useful adoption of generative artificial intelligence in healthcare. Journal of the American Medical Informatics Association : JAMIA Jindal, J. A., Lungren, M. P., Shah, N. H. 2024

Abstract

This article aims to examine how generative artificial intelligence (AI) can be adopted with the most value in health systems, in response to the Executive Order on AI. We reviewed how technology has historically been deployed in healthcare, and evaluated recent examples of deployments of both traditional AI and generative AI (GenAI) with a lens on value. Traditional AI and GenAI are different technologies in terms of their capability and modes of current deployment, which have implications on value in health systems. Traditional AI when applied with a framework top-down can realize value in healthcare. GenAI in the short term when applied top-down has unclear value, but encouraging more bottom-up adoption has the potential to provide more benefit to health systems and patients. GenAI in healthcare can provide the most value for patients when health systems adapt culturally to grow with this new technology and its adoption patterns.

View details for DOI 10.1093/jamia/ocae043

View details for PubMedID 38452298
Characterizing the limitations of using diagnosis codes in the context of machine learning for healthcare. BMC medical informatics and decision making Guo, L. L., Morse, K. E., Aftandilian, C., Steinberg, E., Fries, J., Posada, J., Fleming, S. L., Lemmon, J., Jessa, K., Shah, N., Sung, L. 2024; 24 (1): 51

Abstract

Diagnostic codes are commonly used as inputs for clinical prediction models, to create labels for prediction tasks, and to identify cohorts for multicenter network studies. However, the coverage rates of diagnostic codes and their variability across institutions are underexplored. The primary objective was to describe lab- and diagnosis-based labels for 7 selected outcomes at three institutions. Secondary objectives were to describe agreement, sensitivity, and specificity of diagnosis-based labels against lab-based labels. This study included three cohorts: SickKids from The Hospital for Sick Children, and StanfordPeds and StanfordAdults from Stanford Medicine. We included seven clinical outcomes with lab-based definitions: acute kidney injury, hyperkalemia, hypoglycemia, hyponatremia, anemia, neutropenia and thrombocytopenia. For each outcome, we created four lab-based labels (abnormal, mild, moderate and severe) based on test result and one diagnosis-based label. Proportion of admissions with a positive label were presented for each outcome stratified by cohort. Using lab-based labels as the gold standard, agreement using Cohen's Kappa, sensitivity and specificity were calculated for each lab-based severity level. The number of admissions included were: SickKids (n = 59,298), StanfordPeds (n = 24,639) and StanfordAdults (n = 159,985). The proportion of admissions with a positive diagnosis-based label was significantly higher for StanfordPeds compared to SickKids across all outcomes, with odds ratio (99.9% confidence interval) for abnormal diagnosis-based label ranging from 2.2 (1.7-2.7) for neutropenia to 18.4 (10.1-33.4) for hyperkalemia. Lab-based labels were more similar by institution. When using lab-based labels as the gold standard, Cohen's Kappa and sensitivity were lower at SickKids for all severity levels compared to StanfordPeds. Across multiple outcomes, diagnosis codes were consistently different between the two pediatric institutions. This difference was not explained by differences in test results. These results may have implications for machine learning model development and deployment.

View details for DOI 10.1186/s12911-024-02449-8

View details for PubMedID 38355486

View details for PubMedCentralID PMC10868117
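
The diagnosis-code abstract above compares diagnosis-based labels against lab-based labels using Cohen's Kappa, sensitivity, and specificity. As a rough illustration of how those three quantities are computed for one binary outcome, here is a minimal sketch. The label vectors are made-up toy data, not study data, and the function names are assumptions for this example.

```python
def confusion_counts(gold, predicted):
    """Return (tp, tn, fp, fn) for two equal-length lists of 0/1 labels."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)
    tn = sum(1 for g, p in pairs if g == 0 and p == 0)
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)
    return tp, tn, fp, fn


def sensitivity(tp, fn):
    """Share of gold-standard positives that the candidate label also flags."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Share of gold-standard negatives that the candidate label also clears."""
    return tn / (tn + fp)


def cohens_kappa(tp, tn, fp, fn):
    """Agreement beyond chance between two binary labelings.

    Undefined (division by zero) when chance agreement equals 1.
    """
    n = tp + tn + fp + fn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return (observed - expected) / (1 - expected)


# Toy admissions: lab-based label is treated as the gold standard,
# diagnosis-based label is the one being evaluated.
lab_label = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
diagnosis_label = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]

tp, tn, fp, fn = confusion_counts(lab_label, diagnosis_label)
print("sensitivity:", sensitivity(tp, fn))
print("specificity:", specificity(tn, fp))
print("kappa:", cohens_kappa(tp, tn, fp, fn))
```

In the toy data the diagnosis-based label misses half of the lab-confirmed cases, which is the kind of under-coding pattern that low sensitivity against a lab-based gold standard would indicate.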
Health AI Assurance Laboratories-Reply. JAMA Shah, N. H., Halamka, J. D., Anderson, B. 2024

View details for DOI 10.1001/jama.2024.1087

View details for PubMedID 38345790
Clinfo.ai: An Open-Source Retrieval-Augmented Large Language Model System for Answering Medical Questions using Scientific Literature. Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing Lozano, A., Fleming, S. L., Chiang, C., Shah, N. 2024; 29: 8-23

Abstract

The quickly-expanding nature of published medical literature makes it challenging for clinicians and researchers to keep up with and summarize recent, relevant findings in a timely manner. While several closed-source summarization tools based on large language models (LLMs) now exist, rigorous and systematic evaluations of their outputs are lacking. Furthermore, there is a paucity of high-quality datasets and appropriate benchmark tasks with which to evaluate these tools. We address these issues with four contributions: we release Clinfo.ai, an open-source WebApp that answers clinical questions based on dynamically retrieved scientific literature; we specify an information retrieval and abstractive summarization task to evaluate the performance of such retrieval-augmented LLM systems; we release a dataset of 200 questions and corresponding answers derived from published systematic reviews, which we name PubMed Retrieval and Synthesis (PubMedRS-200); and report benchmark results for Clinfo.ai and other publicly available OpenQA systems on PubMedRS-200.

View details for PubMedID 38160266
MEDALIGN: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records Fleming, S. L., Lozano, A., Haberkorn, W. J., Jindal, J. A., Reis, E., Thapa, R., Blankemeier, L., Genkins, J. Z., Steinberg, E., Nayak, A., Patel, B., Chiang, C., Callahan, A., Huo, Z., Gatidis, S., Adams, S., Fayanju, O., Shah, S. J., Savage, T., Goh, E., Chaudhari, A. S., Aghaeepour, N., Sharp, C., Pfeffer, M. A., Liang, P., Chen, J. H., Morse, K. E., Brunskill, E. P., Fries, J. A., Shah, N. H. edited by Wooldridge, M., Dy, J., Natarajan, S. ASSOC ADVANCEMENT ARTIFICIAL INTELLIGENCE. 2024: 22021-22030

View details for Web of Science ID 001239985800017
A Nationwide Network of Health AI Assurance Laboratories. JAMA Shah, N. H., Halamka, J. D., Saria, S., Pencina, M., Tazbaz, T., Tripathi, M., Callahan, A., Hildahl, H., Anderson, B. 2023

Abstract

Importance: Given the importance of rigorous development and evaluation standards needed of artificial intelligence (AI) models used in health care, nationwide accepted procedures to provide assurance that the use of AI is fair, appropriate, valid, effective, and safe are urgently needed. Observations: While there are several efforts to develop standards and best practices to evaluate AI, there is a gap between having such guidance and the application of such guidance to both existing and new AI models being developed. As of now, there is no publicly available, nationwide mechanism that enables objective evaluation and ongoing assessment of the consequences of using health AI models in clinical care settings. Conclusion and Relevance: The need to create a public-private partnership to support a nationwide health AI assurance labs network is outlined here. In this network, community best practices could be applied for testing health AI models to produce reports on their performance that can be widely shared for managing the lifecycle of AI models over time and across populations and sites where these models are deployed.

View details for DOI 10.1001/jama.2023.26930

View details for PubMedID 38117493
Organizational Factors in Clinical Data Sharing for Artificial Intelligence in Health Care. JAMA network open Youssef, A., Ng, M. Y., Long, J., Hernandez-Boussard, T., Shah, N., Miner, A., Larson, D., Langlotz, C. P. 2023; 6 (12): e2348422

Abstract

Limited sharing of data sets that accurately represent disease and patient diversity limits the generalizability of artificial intelligence (AI) algorithms in health care. To explore the factors associated with organizational motivation to share health data for AI development. This qualitative study investigated organizational readiness for sharing health data across the academic, governmental, nonprofit, and private sectors. Using a multiple case studies approach, 27 semistructured interviews were conducted with leaders in data-sharing roles from August 29, 2022, to January 9, 2023. The interviews were conducted in the English language using a video conferencing platform. Using a purposive and nonprobabilistic sampling strategy, 78 individuals across 52 unique organizations were identified. Of these, 35 participants were enrolled. Participant recruitment concluded after 27 interviews, as theoretical saturation was reached and no additional themes emerged. Concepts defining organizational readiness for data sharing and the association between data-sharing factors and organizational behavior were mapped through iterative qualitative analysis to establish a framework defining organizational readiness for sharing clinical data for AI development. Interviews included 27 leaders from 18 organizations (academia: 10, government: 7, nonprofit: 8, and private: 2). Organizational readiness for data sharing centered around 2 main constructs: motivation and capabilities. Motivation related to the alignment of an organization's values with data-sharing priorities and was associated with its engagement in data-sharing efforts. However, organizational motivation could be modulated by extrinsic incentives for financial or reputational gains. Organizational capabilities comprised infrastructure, people, expertise, and access to data. Cross-sector collaboration was a key strategy to mitigate barriers to access health data. This qualitative study identified sector-specific factors that may affect the data-sharing behaviors of health organizations. External incentives may bolster cross-sector collaborations by helping overcome barriers to accessing health data for AI development. The findings suggest that tailored incentives may boost organizational motivation and facilitate sustainable flow of health data for AI development.

View details for DOI 10.1001/jamanetworkopen.2023.48422

View details for PubMedID 38113040
Beta-2 adrenergic receptor agonism alters astrocyte phagocytic activity and has potential applications to psychiatric disease. Discover mental health Bowen, E. R., DiGiacomo, P., Fraser, H. P., Guttenplan, K., Smith, B. A., Heberling, M. L., Vidano, L., Shah, N., Shamloo, M., Wilson, J. L., Grimes, K. V. 2023; 3 (1): 27

Abstract

Schizophrenia is a debilitating condition necessitating more efficacious therapies. Previous studies suggested that schizophrenia development is associated with aberrant synaptic pruning by glial cells. We pursued an interdisciplinary approach to understand whether therapeutic reduction in glial cell (specifically astrocytic) phagocytosis might benefit neuropsychiatric patients. We discovered that beta-2 adrenergic receptor (ADRB2) agonists reduced phagocytosis using a high-throughput, phenotypic screen of over 3200 compounds in primary human fetal astrocytes. We used protein interaction pathways analysis to associate ADRB2 with schizophrenia and endocytosis. We demonstrated that patients with a pediatric exposure to salmeterol, an ADRB2 agonist, had reduced in-patient psychiatry visits using a novel observational study in the electronic health record. We used a mouse model of inflammatory neurodegenerative disease and measured changes in proteins associated with endocytosis and vesicle-mediated transport after ADRB2 agonism. These results provide substantial rationale for clinical consideration of ADRB2 agonists as possible therapies for patients with schizophrenia.

View details for DOI 10.1007/s44192-023-00050-5

View details for PubMedID 38036718

View details for PubMedCentralID 3129332
President Biden's Executive Order on Artificial Intelligence-Implications for Health Care Organizations. JAMA Mello, M. M., Shah, N. H., Char, D. S. 2023

View details for DOI 10.1001/jama.2023.25051

View details for PubMedID 38032634
Lessons Learned from a Multi-Site, Team-Based Serious Illness Care Program Implementation at an Academic Medical Center. Journal of palliative medicine Seevaratnam, B., Wang, S., Fong, R., Hui, F., Callahan, A., Chobot, S., Gensheimer, M. F., Li, R. C., Nguyen, D., Ramchandran, K., Shah, N. H., Shieh, L., Zeng, J. G., Teuteberg, W. 2023

Abstract

Background: Patients with serious illness benefit from conversations to share prognosis and explore goals and values. To address this, we implemented Ariadne Labs' Serious Illness Care Program (SICP) at Stanford Health Care. Objective: Improve quantity, timing, and quality of serious illness conversations. Methods: Initial implementation followed Ariadne Labs' SICP framework. We later incorporated a team-based approach that included nonphysician care team members. Outcomes included number of patients with documented conversations according to clinician role and practice location. Machine learning algorithms were used in some settings to identify eligible patients. Results: Ambulatory oncology and hospital medicine were our largest implementation sites, engaging 4707 and 642 unique patients in conversations, respectively. Clinicians across eight disciplines engaged in these conversations. Identified barriers included leadership engagement, complex workflows, and patient identification. Conclusion: Several factors contributed to successful SICP implementation across clinical sites: innovative clinical workflows, machine learning based predictive algorithms, and nonphysician care team member engagement.

View details for DOI 10.1089/jpm.2023.0254

View details for PubMedID 37935036
Multinational patterns of second line antihyperglycaemic drug initiation across cardiovascular risk groups: federated pharmacoepidemiological evaluation in LEGEND-T2DM. BMJ medicine Khera, R., Dhingra, L. S., Aminorroaya, A., Li, K., Zhou, J. J., Arshad, F., Blacketer, C., Bowring, M. G., Bu, F., Cook, M., Dorr, D. A., Duarte-Salles, T., DuVall, S. L., Falconer, T., French, T. E., Hanchrow, E. E., Horban, S., Lau, W. C., Li, J., Liu, Y., Lu, Y., Man, K. K., Matheny, M. E., Mathioudakis, N., McLemore, M. F., Minty, E., Morales, D. R., Nagy, P., Nishimura, A., Ostropolets, A., Pistillo, A., Posada, J. D., Pratt, N., Reyes, C., Ross, J. S., Seager, S., Shah, N., Simon, K., Wan, E. Y., Yang, J., Yin, C., You, S. C., Schuemie, M. J., Ryan, P. B., Hripcsak, G., Krumholz, H., Suchard, M. A. 2023; 2 (1): e000651

Abstract

To assess the uptake of second line antihyperglycaemic drugs among patients with type 2 diabetes mellitus who are receiving metformin. Federated pharmacoepidemiological evaluation in LEGEND-T2DM. 10 US and seven non-US electronic health record and administrative claims databases in the Observational Health Data Sciences and Informatics network in eight countries from 2011 to the end of 2021. 4.8 million patients (≥18 years) across US and non-US based databases with type 2 diabetes mellitus who had received metformin monotherapy and had initiated second line treatments. The exposure used to evaluate each database was calendar year trends, with the years in the study that were specific to each cohort. The outcome was the incidence of second line antihyperglycaemic drug use (ie, glucagon-like peptide-1 receptor agonists, sodium-glucose cotransporter-2 inhibitors, dipeptidyl peptidase-4 inhibitors, and sulfonylureas) among individuals who were already receiving treatment with metformin. The relative drug class level uptake across cardiovascular risk groups was also evaluated. 4.6 million patients were identified in US databases, 61 382 from Spain, 32 442 from Germany, 25 173 from the UK, 13 270 from France, 5580 from Scotland, 4614 from Hong Kong, and 2322 from Australia. During 2011-21, the combined proportional initiation of the cardioprotective antihyperglycaemic drugs (glucagon-like peptide-1 receptor agonists and sodium-glucose cotransporter-2 inhibitors) increased across all data sources, with the combined initiation of these drugs as second line drugs in 2021 ranging from 35.2% to 68.2% in the US databases, 15.4% in France, 34.7% in Spain, 50.1% in Germany, and 54.8% in Scotland. From 2016 to 2021, in some US and non-US databases, uptake of glucagon-like peptide-1 receptor agonists and sodium-glucose cotransporter-2 inhibitors increased more significantly among populations with no cardiovascular disease compared with patients with established cardiovascular disease. 
No data source provided evidence of a greater increase in the uptake of these two drug classes in populations with cardiovascular disease compared with no cardiovascular disease. Despite the increase in overall uptake of cardioprotective antihyperglycaemic drugs as second line treatments for type 2 diabetes mellitus, their uptake was lower in patients with cardiovascular disease than in people with no cardiovascular disease over the past decade. A strategy is needed to ensure that medication use is concordant with guideline recommendations to improve outcomes of patients with type 2 diabetes mellitus.

View details for DOI 10.1136/bmjmed-2023-000651

View details for PubMedID 37829182

View details for PubMedCentralID PMC10565313
Using public clinical trial reports to probe non-experimental causal inference methods. BMC medical research methodology Steinberg, E., Ignatiadis, N., Yadlowsky, S., Xu, Y., Shah, N. 2023; 23 (1): 204

Abstract

BACKGROUND: Non-experimental studies (also known as observational studies) are valuable for estimating the effects of various medical interventions, but are notoriously difficult to evaluate because the methods used in non-experimental studies require untestable assumptions. This lack of intrinsic verifiability makes it difficult both to compare different non-experimental study methods and to trust the results of any particular non-experimental study. METHODS: We introduce TrialProbe, a data resource and statistical framework for the evaluation of non-experimental methods. We first collect a dataset of pseudo "ground truths" about the relative effects of drugs by using empirical Bayesian techniques to analyze adverse events recorded in public clinical trial reports. We then develop a framework for evaluating non-experimental methods against that ground truth by measuring concordance between the non-experimental effect estimates and the estimates derived from clinical trials. As a demonstration of our approach, we also perform an example methods evaluation between propensity score matching, inverse propensity score weighting, and an unadjusted approach on a large national insurance claims dataset. RESULTS: From the 33,701 clinical trial records in our version of the ClinicalTrials.gov dataset, we are able to extract 12,967 unique drug/drug adverse event comparisons to form a ground truth set. During our corresponding methods evaluation, we are able to use that reference set to demonstrate that both propensity score matching and inverse propensity score weighting can produce estimates that have high concordance with clinical trial results and substantially outperform an unadjusted baseline. CONCLUSIONS: We find that TrialProbe is an effective approach for probing non-experimental study methods, being able to generate large ground truth sets that are able to distinguish how well non-experimental methods perform in real world observational data.

View details for DOI 10.1186/s12874-023-02025-0

View details for PubMedID 37689623
Ranitidine Use and Incident Cancer in a Multinational Cohort. JAMA network open You, S. C., Seo, S. I., Falconer, T., Yanover, C., Duarte-Salles, T., Seager, S., Posada, J. D., Shah, N. H., Nguyen, P. A., Kim, Y., Hsu, J. C., Van Zandt, M., Hsu, M. H., Lee, H. L., Ko, H., Shin, W. G., Pratt, N., Park, R. W., Reich, C. G., Suchard, M. A., Hripcsak, G., Park, C. H., Prieto-Alhambra, D. 2023; 6 (9): e2333495

Abstract

Ranitidine, the most widely used histamine-2 receptor antagonist (H2RA), was withdrawn because of N-nitrosodimethylamine impurity in 2020. Given the worldwide exposure to this drug, the potential risk of cancer development associated with the intake of known carcinogens is an important epidemiological concern. To examine the comparative risk of cancer associated with the use of ranitidine vs other H2RAs. This new-user active comparator international network cohort study was conducted using 3 health claims and 9 electronic health record databases from the US, the United Kingdom, Germany, Spain, France, South Korea, and Taiwan. Large-scale propensity score (PS) matching was used to minimize confounding of the observed covariates with negative control outcomes. Empirical calibration was performed to account for unobserved confounding. All databases were mapped to a common data model. Database-specific estimates were combined using random-effects meta-analysis. Participants included individuals aged at least 20 years with no history of cancer who used H2RAs for more than 30 days from January 1986 to December 2020, with a 1-year washout period. Data were analyzed from April to September 2021. The main exposure was use of ranitidine vs other H2RAs (famotidine, lafutidine, nizatidine, and roxatidine). The primary outcome was incidence of any cancer, except nonmelanoma skin cancer. Secondary outcomes included all cancer except thyroid cancer, 16 cancer subtypes, and all-cause mortality. Among 1 183 999 individuals in 11 databases, 909 168 individuals (mean age, 56.1 years; 507 316 [55.8%] women) were identified as new users of ranitidine, and 274 831 individuals (mean age, 58.0 years; 145 935 [53.1%] women) were identified as new users of other H2RAs. Crude incidence rates of cancer were 14.30 events per 1000 person-years (PYs) in ranitidine users and 15.03 events per 1000 PYs among other H2RA users. 
After PS matching, cancer risk was similar in ranitidine compared with other H2RA users (incidence, 15.92 events per 1000 PYs vs 15.65 events per 1000 PYs; calibrated meta-analytic hazard ratio, 1.04; 95% CI, 0.97-1.12). No significant associations were found between ranitidine use and any secondary outcomes after calibration. In this cohort study, ranitidine use was not associated with an increased risk of cancer compared with the use of other H2RAs. Further research is needed on the long-term association of ranitidine with cancer development.

View details for DOI 10.1001/jamanetworkopen.2023.33495

View details for PubMedID 37725377

View details for PubMedCentralID PMC10509724
Self-supervised machine learning using adult inpatient data produces effective models for pediatric clinical prediction tasks. Journal of the American Medical Informatics Association : JAMIA Lemmon, J., Guo, L. L., Steinberg, E., Morse, K. E., Fleming, S. L., Aftandilian, C., Pfohl, S. R., Posada, J. D., Shah, N., Fries, J., Sung, L. 2023

Abstract

Development of electronic health records (EHR)-based machine learning models for pediatric inpatients is challenged by limited training data. Self-supervised learning using adult data may be a promising approach to creating robust pediatric prediction models. The primary objective was to determine whether a self-supervised model trained in adult inpatients was noninferior to logistic regression models trained in pediatric inpatients, for pediatric inpatient clinical prediction tasks. This retrospective cohort study used EHR data and included patients with at least one admission to an inpatient unit. One admission per patient was randomly selected. Adult inpatients were 18 years or older while pediatric inpatients were more than 28 days and less than 18 years. Admissions were temporally split into training (January 1, 2008 to December 31, 2019), validation (January 1, 2020 to December 31, 2020), and test (January 1, 2021 to August 1, 2022) sets. Primary comparison was a self-supervised model trained in adult inpatients versus count-based logistic regression models trained in pediatric inpatients. Primary outcome was mean area-under-the-receiver-operating-characteristic-curve (AUROC) for 11 distinct clinical outcomes. Models were evaluated in pediatric inpatients. When evaluated in pediatric inpatients, mean AUROC of self-supervised model trained in adult inpatients (0.902) was noninferior to count-based logistic regression models trained in pediatric inpatients (0.868) (mean difference = 0.034, 95% CI=0.014-0.057; P < .001 for noninferiority and P = .006 for superiority). Self-supervised learning in adult inpatients was noninferior to logistic regression models trained in pediatric inpatients. This finding suggests transferability of self-supervised models trained in adult patients to pediatric patients, without requiring costly model retraining.

View details for DOI 10.1093/jamia/ocad175

View details for PubMedID 37639620
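As an illustration of the temporal split described in the Lemmon et al. abstract above, the following is a minimal sketch, not the authors' code. The date boundaries come from the abstract; the function name `assign_split` and the constants are hypothetical names introduced only for this example.

```python
from datetime import date

# Boundaries quoted in the abstract; names below are hypothetical.
STUDY_START = date(2008, 1, 1)
TRAIN_END = date(2019, 12, 31)
VALID_END = date(2020, 12, 31)
STUDY_END = date(2022, 8, 1)


def assign_split(admit_date):
    """Map an admission date to 'train', 'validation', or 'test'.

    Returns None for dates outside the study window.
    """
    if admit_date < STUDY_START or admit_date > STUDY_END:
        return None
    if admit_date <= TRAIN_END:
        return "train"
    if admit_date <= VALID_END:
        return "validation"
    return "test"


if __name__ == "__main__":
    for d in (date(2015, 6, 1), date(2020, 3, 15), date(2021, 11, 2)):
        print(d.isoformat(), assign_split(d))
```

Splitting by calendar time rather than at random means the test set is drawn from a later period than the training data, which is closer to how a deployed model would be used.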
Characterizing subgroup performance of probabilistic phenotype algorithms within older adults: a case study for dementia, mild cognitive impairment, and Alzheimer's and Parkinson's diseases. JAMIA open Banda, J. M., Shah, N. H., Periyakoil, V. S. 2023; 6 (2): ooad043

Abstract

Objective: Biases within probabilistic electronic phenotyping algorithms are largely unexplored. In this work, we characterize differences in subgroup performance of phenotyping algorithms for Alzheimer's disease and related dementias (ADRD) in older adults. Materials and methods: We created an experimental framework to characterize the performance of probabilistic phenotyping algorithms under different racial distributions allowing us to identify which algorithms may have differential performance, by how much, and under what conditions. We relied on rule-based phenotype definitions as reference to evaluate probabilistic phenotype algorithms created using the Automated PHenotype Routine for Observational Definition, Identification, Training and Evaluation framework. Results: We demonstrate that some algorithms have performance variations anywhere from 3% to 30% for different populations, even when not using race as an input variable. We show that while performance differences in subgroups are not present for all phenotypes, they do affect some phenotypes and groups more disproportionately than others. Discussion: Our analysis establishes the need for a robust evaluation framework for subgroup differences. The underlying patient populations for the algorithms showing subgroup performance differences have great variance between model features when compared with the phenotypes with little to no differences. Conclusion: We have created a framework to identify systematic differences in the performance of probabilistic phenotyping algorithms specifically in the context of ADRD as a use case. Differences in subgroup performance of probabilistic phenotyping algorithms are not widespread nor do they occur consistently. This highlights the great need for careful ongoing monitoring to evaluate, measure, and try to mitigate such differences.

View details for DOI 10.1093/jamiaopen/ooad043

View details for PubMedID 37397506
Use of Electronic Health Record Data for Drug Safety Signal Identification: A Scoping Review. Drug safety Davis, S. E., Zabotka, L., Desai, R. J., Wang, S. V., Maro, J. C., Coughlin, K., Hernández-Muñoz, J. J., Stojanovic, D., Shah, N. H., Smith, J. C. 2023

Abstract

Pharmacovigilance programs protect patient health and safety by identifying adverse event signals through postmarketing surveillance of claims data and spontaneous reports. Electronic health records (EHRs) provide new opportunities to address limitations of traditional approaches and promote discovery-oriented pharmacovigilance. To evaluate the current state of EHR-based medication safety signal identification, we conducted a scoping literature review of studies aimed at identifying safety signals from routinely collected patient-level EHR data. We extracted information on study design, EHR data elements utilized, analytic methods employed, drugs and outcomes evaluated, and key statistical and data analysis choices. We identified 81 eligible studies. Disproportionality methods were the predominant analytic approach, followed by data mining and regression. Variability in study design makes direct comparisons difficult. Studies varied widely in terms of data, confounding adjustment, and statistical considerations. Despite broad interest in utilizing EHRs for safety signal identification, current efforts fail to leverage the full breadth and depth of available data or to rigorously control for confounding. The development of best practices and application of common data models would promote the expansion of EHR-based pharmacovigilance.

View details for DOI 10.1007/s40264-023-01325-0

View details for PubMedID 37340238

View details for PubMedCentralID 8688411
Principled estimation and evaluation of treatment effect heterogeneity: A case study application to dabigatran for patients with atrial fibrillation. Journal of biomedical informatics Xu, Y., Bechler, K., Callahan, A., Shah, N. 2023: 104420

Abstract

To apply the latest guidance for estimating and evaluating heterogeneous treatment effects (HTEs) in an end-to-end case study of the Long-term Anticoagulation Therapy (RE-LY) trial, and summarize the main takeaways from applying state-of-the-art metalearners and novel evaluation metrics in-depth to inform their applications to personalized care in biomedical research. Based on the characteristics of the RE-LY data, we selected four metalearners (S-learner with Lasso, X-learner with Lasso, R-learner with random survival forest and Lasso, and causal survival forest) to estimate the HTEs of dabigatran. For the outcomes of (1) stroke or systemic embolism and (2) major bleeding, we compared dabigatran 150 mg, dabigatran 110 mg, and warfarin. We assessed the overestimation of treatment heterogeneity by the metalearners via a global null analysis and their discrimination and calibration ability using two novel metrics: rank-weighted average treatment effects (RATE) and estimated calibration error for treatment heterogeneity. Finally, we visualized the relationships between estimated treatment effects and baseline covariates using partial dependence plots. The RATE metric suggested that either the applied metalearners had poor performance of estimating HTEs or there was no treatment heterogeneity for either the stroke/SE or major bleeding outcome of any treatment comparison. Partial dependence plots revealed that several covariates had consistent relationships with the treatment effects estimated by multiple metalearners. The applied metalearners showed differential performance across outcomes and treatment comparisons, and the X- and R-learners yielded smaller calibration errors than the others. HTE estimation is difficult, and a principled estimation and evaluation process is necessary to provide reliable evidence and prevent false discoveries. 
We have demonstrated how to choose appropriate metalearners based on specific data properties, applied them using the off-the-shelf implementation tool survlearners, and evaluated their performance using recently defined formal metrics. We suggest that clinical implications should be drawn based on the common trends across the applied metalearners.

View details for DOI 10.1016/j.jbi.2023.104420

View details for PubMedID 37328098
Contextualising adverse events of special interest to characterise the baseline incidence rates in 24 million patients with COVID-19 across 26 databases: a multinational retrospective cohort study. EClinicalMedicine Voss, E. A., Shoaibi, A., Yin Hui Lai, L., Blacketer, C., Alshammari, T., Makadia, R., Haynes, K., Sena, A. G., Rao, G., van Sandijk, S., Fraboulet, C., Boyer, L., Le Carrour, T., Horban, S., Morales, D. R., Martinez Roldan, J., Ramirez-Anguita, J. M., Mayer, M. A., de Wilde, M., John, L. H., Duarte-Salles, T., Roel, E., Pistillo, A., Kolde, R., Maljkovic, F., Denaxas, S., Papez, V., Kahn, M. G., Natarajan, K., Reich, C., Secora, A., Minty, E. P., Shah, N. H., Posada, J. D., Garcia Morales, M. T., Bosca, D., Cadenas Juanino, H., Diaz Holgado, A., Pedrera Jimenez, M., Serrano Balazote, P., Garcia Barrio, N., Sen, S., Uresin, A. Y., Erdogan, B., Belmans, L., Byttebier, G., Malbrain, M. L., Dedman, D. J., Cuccu, Z., Vashisht, R., Butte, A. J., Patel, A., Dahm, L., Han, C., Bu, F., Arshad, F., Ostropolets, A., Nyberg, F., Hripcsak, G., Suchard, M. A., Prieto-Alhambra, D., Rijnbeek, P. R., Schuemie, M. J., Ryan, P. B. 2023; 58: 101932

Abstract

Background: Adverse events of special interest (AESIs) were pre-specified to be monitored for the COVID-19 vaccines. Some AESIs are not only associated with the vaccines, but with COVID-19. Our aim was to characterise the incidence rates of AESIs following SARS-CoV-2 infection in patients and compare these to historical rates in the general population. Methods: A multi-national cohort study with data from primary care, electronic health records, and insurance claims mapped to a common data model. This study's evidence was collected between Jan 1, 2017 and the conclusion of each database (which ranged from Jul 2020 to May 2022). The 16 pre-specified prevalent AESIs were: acute myocardial infarction, anaphylaxis, appendicitis, Bell's palsy, deep vein thrombosis, disseminated intravascular coagulation, encephalomyelitis, Guillain-Barre syndrome, haemorrhagic stroke, non-haemorrhagic stroke, immune thrombocytopenia, myocarditis/pericarditis, narcolepsy, pulmonary embolism, transverse myelitis, and thrombosis with thrombocytopenia. Age-sex standardised incidence rate ratios (SIR) were estimated to compare post-COVID-19 to pre-pandemic rates in each of the databases. Findings: Substantial heterogeneity by age was seen for AESI rates, with some clearly increasing with age but others following the opposite trend. Similarly, differences were also observed across databases for the same health outcome and age-sex strata. All studied AESIs appeared consistently more common in the post-COVID-19 compared to the historical cohorts, with related meta-analytic SIRs ranging from 1.32 (1.05 to 1.66) for narcolepsy to 11.70 (10.10 to 13.70) for pulmonary embolism. Interpretation: Our findings suggest all AESIs are more common after COVID-19 than in the general population. Thromboembolic events were particularly common, and over 10-fold more so. More research is needed to contextualise post-COVID-19 complications in the longer term. Funding: None.

View details for DOI 10.1016/j.eclinm.2023.101932

View details for PubMedID 37034358
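The age-sex standardised incidence rate ratios mentioned in the Voss et al. abstract above can be illustrated with indirect standardisation, which is one common way to compute such a ratio. The abstract does not state the exact estimator used, so this is a sketch under that assumption; the function name, the strata, and all numbers are hypothetical and are not data from the study.

```python
def standardised_incidence_ratio(observed_events, person_years, reference_rates):
    """Indirectly standardised incidence ratio: observed / expected events.

    person_years:    dict mapping stratum -> person-years at risk in the study cohort
    reference_rates: dict mapping stratum -> events per person-year in the reference
                     (for example, historical) population
    """
    expected = sum(person_years[s] * reference_rates[s] for s in person_years)
    if expected <= 0:
        raise ValueError("expected number of events must be positive")
    return observed_events / expected


if __name__ == "__main__":
    # Hypothetical age-sex strata and rates, for illustration only.
    py = {("F", "18-64"): 1000.0, ("M", "65+"): 500.0}
    ref = {("F", "18-64"): 0.001, ("M", "65+"): 0.004}
    # Expected events = 1000 * 0.001 + 500 * 0.004; compare with 6 observed events.
    print(standardised_incidence_ratio(6, py, ref))
```

A ratio above 1 indicates more events in the study cohort than its age-sex composition would predict from the reference rates.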
Evaluation of Feature Selection Methods for Preserving Machine Learning Performance in the Presence of Temporal Dataset Shift in Clinical Medicine. Methods of information in medicine Lemmon, J., Guo, L. L., Posada, J., Pfohl, S. R., Fries, J., Fleming, S. L., Aftandilian, C., Shah, N., Sung, L. 2023

Abstract

BACKGROUND: Temporal dataset shift can cause degradation in model performance as discrepancies between training and deployment data grow over time. The primary objective was to determine whether parsimonious models produced by specific feature selection methods are more robust to temporal dataset shift as measured by out-of-distribution (OOD) performance, while maintaining in-distribution (ID) performance. METHODS: Our dataset consisted of intensive care unit patients from MIMIC-IV categorized by year groups (2008-2010, 2011-2013, 2014-2016, and 2017-2019). We trained baseline models using L2-regularized logistic regression on 2008-2010 to predict in-hospital mortality, long length of stay (LOS), sepsis, and invasive ventilation in all year groups. We evaluated three feature selection methods: L1-regularized logistic regression (L1), Remove and Retrain (ROAR), and causal feature selection. We assessed whether a feature selection method could maintain ID performance (2008-2010) and improve OOD performance (2017-2019). We also assessed whether parsimonious models retrained on OOD data performed as well as oracle models trained on all features in the OOD year group. RESULTS: The baseline model showed significantly worse OOD performance with the long LOS and sepsis tasks when compared with the ID performance. L1 and ROAR retained 3.7 to 12.6% of all features, whereas causal feature selection generally retained fewer features. Models produced by L1 and ROAR exhibited similar ID and OOD performance as the baseline models. The retraining of these models on 2017-2019 data using features selected from training on 2008-2010 data generally reached parity with oracle models trained directly on 2017-2019 data using all available features. 
Causal feature selection led to heterogeneous results with the superset maintaining ID performance while improving OOD calibration only on the long LOS task. CONCLUSIONS: While model retraining can mitigate the impact of temporal dataset shift on parsimonious models produced by L1 and ROAR, new methods are required to proactively improve temporal robustness.

View details for DOI 10.1055/s-0043-1762904

View details for PubMedID 36812932
Standardizing Multi-site Clinical Note Titles to LOINC Document Ontology: A Transformer-based Approach. AMIA ... Annual Symposium proceedings. AMIA Symposium Zuo, X., Zhou, Y., Duke, J., Hripcsak, G., Shah, N., Banda, J. M., Reeves, R., Miller, T., Waitman, L. R., Natarajan, K., Xu, H. 2023; 2023: 834-843

Abstract

The types of clinical notes in electronic health records (EHRs) are diverse, and standardizing them would help ensure unified data retrieval, exchange, and integration. The LOINC Document Ontology (DO) is a subset of LOINC that is created specifically for naming and describing clinical documents. Despite the efforts of promoting and improving this ontology, how to efficiently deploy it in real-world clinical settings has yet to be explored. In this study we evaluated the utility of LOINC DO by mapping clinical note titles collected from five institutions to the LOINC DO and classifying the mapping into three classes based on semantic similarity between note titles and LOINC DO codes. Additionally, we developed a standardization pipeline that automatically maps clinical note titles from multiple sites to suitable LOINC DO codes, without accessing the content of clinical notes. The pipeline can be initialized with different large language models, and we compared the performances between them. The results showed that our automated pipeline achieved an accuracy of 0.90. By comparing the manual and automated mapping results, we analyzed the coverage of LOINC DO in describing multi-site clinical note titles and summarized the potential scope for extension.

View details for DOI 10.13140/RG.2.2.26682.24006

View details for PubMedID 38222429

View details for PubMedCentralID PMC10785935
Efficient Diagnosis Assignment Using Unstructured Clinical Notes Blankemeier, L., Fries, J., Tinn, R., Preston, S., Shah, N., Chaudhari, A. edited by Boyd-Graber, J., Okazaki, N., Rogers, A. ASSOC COMPUTATIONAL LINGUISTICS-ACL. 2023: 485-494

View details for Web of Science ID 001181088800042
EHRSHOT: An EHR Benchmark for Few-Shot Evaluation of Foundation Models Wornow, M., Thapa, R., Steinberg, E., Fries, J. A., Shah, N. H. edited by Oh, A., Neumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S. NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2023

View details for Web of Science ID 001220818804011
INSPECT: A Multimodal Dataset for Pulmonary Embolism Diagnosis and Prognosis Huang, S., Huo, Z., Steinberg, E., Chiang, C., Lungren, M. P., Langlotz, C. P., Yeung, S., Shah, N. H., Fries, J. A. edited by Oh, A., Neumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S. NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2023

View details for Web of Science ID 001224281507036
A computational approach to measure the linguistic characteristics of psychotherapy timing, responsiveness, and consistency. Npj mental health research Miner, A. S., Fleming, S. L., Haque, A., Fries, J. A., Althoff, T., Wilfley, D. E., Agras, W. S., Milstein, A., Hancock, J., Asch, S. M., Stirman, S. W., Arnow, B. A., Shah, N. H. 2022; 1 (1): 19

Abstract

Although individual psychotherapy is generally effective for a range of mental health conditions, little is known about the moment-to-moment language use of effective therapists. Increased access to computational power, coupled with a rise in computer-mediated communication (telehealth), makes feasible the large-scale analyses of language use during psychotherapy. Transparent methodological approaches are lacking, however. Here we present novel methods to increase the efficiency of efforts to examine language use in psychotherapy. We evaluate three important aspects of therapist language use - timing, responsiveness, and consistency - across five clinically relevant language domains: pronouns, time orientation, emotional polarity, therapist tactics, and paralinguistic style. We find therapist language is dynamic within sessions, responds to patient language, and relates to patient symptom diagnosis but not symptom severity. Our results demonstrate that analyzing therapist language at scale is feasible and may help answer longstanding questions about specific behaviors of effective therapists.

View details for DOI 10.1038/s44184-022-00020-9

View details for PubMedID 38609510

View details for PubMedCentralID 3665892
Developing medical imaging AI for emerging infectious diseases. Nature communications Huang, S., Chaudhari, A. S., Langlotz, C. P., Shah, N., Yeung, S., Lungren, M. P. 2022; 13 (1): 7060

View details for DOI 10.1038/s41467-022-34234-4

View details for PubMedID 36400764
Use of Machine Learning and Lay Care Coaches to Increase Advance Care Planning Conversations for Patients With Metastatic Cancer. JCO oncology practice Gensheimer, M. F., Gupta, D., Patel, M. I., Fardeen, T., Hildebrand, R., Teuteberg, W., Seevaratnam, B., Asuncion, M. K., Alves, N., Rogers, B., Hansen, J., DeNofrio, J., Shah, N. H., Parikh, D., Neal, J., Fan, A. C., Moore, K., Ruiz, S., Li, C., Khaki, A. R., Pagtama, J., Chien, J., Brown, T., Tisch, A. H., Das, M., Srinivas, S., Roy, M., Wakelee, H., Myall, N. J., Huang, J., Shah, S., Lee, H., Ramchandran, K. 2022: OP2200128

Abstract

Patients with metastatic cancer benefit from advance care planning (ACP) conversations. We aimed to improve ACP using a computer model to select high-risk patients, with shorter predicted survival, for conversations with providers and lay care coaches. Outcomes included ACP documentation frequency and end-of-life quality measures. In this study of a quality improvement initiative, providers in four medical oncology clinics received Serious Illness Care Program training. Two clinics (thoracic/genitourinary) participated in an intervention, and two (cutaneous/sarcoma) served as controls. ACP conversations were documented in a centralized form in the electronic medical record. In the intervention, providers and care coaches received weekly e-mails highlighting upcoming clinic patients with < 2 year computer-predicted survival and no prior prognosis documentation. Care coaches contacted these patients for an ACP conversation (excluding prognosis). Providers were asked to discuss and document prognosis. In the four clinics, 4,968 clinic visits by 1,251 patients met inclusion criteria (metastatic cancer with no prognosis previously documented). In their first visit, 28% of patients were high-risk (< 2 year predicted survival). Preintervention, 3% of both intervention and control clinic patients had ACP documentation during a visit. By intervention end (February 2021), 35% of intervention clinic patients had ACP documentation compared with 3% of control clinic patients. Providers' prognosis documentation rate also increased in intervention clinics after the intervention (2%-27% in intervention clinics, P < .0001; 0%-1% in control clinics). End-of-life care intensity was similar in intervention versus control clinics, but patients with ≥ 1 provider ACP edit met fewer high-intensity care measures (P = .04). Combining a computer prognosis model with care coaches increased ACP documentation.

View details for DOI 10.1200/OP.22.00128

View details for PubMedID 36395436
Perspective Toward Machine Learning Implementation in Pediatric Medicine: Mixed Methods Study. JMIR medical informatics Alexander, N., Aftandilian, C., Guo, L. L., Plenert, E., Posada, J., Fries, J., Fleming, S., Johnson, A., Shah, N., Sung, L. 2022; 10 (11): e40039

Abstract

BACKGROUND: Given the costs of machine learning implementation, a systematic approach to prioritizing which models to implement into clinical practice may be valuable. OBJECTIVE: The primary objective was to determine the health care attributes respondents at 2 pediatric institutions rate as important when prioritizing machine learning model implementation. The secondary objective was to describe their perspectives on implementation using a qualitative approach. METHODS: In this mixed methods study, we distributed a survey to health system leaders, physicians, and data scientists at 2 pediatric institutions. We asked respondents to rank the following 5 attributes in terms of implementation usefulness: the clinical problem was common, the clinical problem caused substantial morbidity and mortality, risk stratification led to different actions that could reasonably improve patient outcomes, reducing physician workload, and saving money. Important attributes were those ranked as first or second most important. Individual qualitative interviews were conducted with a subsample of respondents. RESULTS: Among 613 eligible respondents, 275 (44.9%) responded. Qualitative interviews were conducted with 17 respondents. The most common important attributes were risk stratification leading to different actions (205/275, 74.5%) and clinical problem causing substantial morbidity or mortality (177/275, 64.4%). The attributes considered least important were reducing physician workload and saving money. Qualitative interviews consistently prioritized implementations that improved patient outcomes. CONCLUSIONS: Respondents prioritized machine learning model implementation where risk stratification would lead to different actions and clinical problems that caused substantial morbidity and mortality. Implementations that improved patient outcomes were prioritized. These results can help provide a framework for machine learning model implementation.

View details for DOI 10.2196/40039

View details for PubMedID 36394938
A network paradigm predicts drug synergistic effects using downstream protein-protein interactions. CPT: pharmacometrics & systems pharmacology Wilson, J. L., Steinberg, E., Racz, R., Altman, R. B., Shah, N., Grimes, K. 2022

Abstract

In some cases, drug combinations affect adverse outcome phenotypes by binding the same protein; however, drug-binding proteins are associated through protein-protein interaction (PPI) networks within the cell, suggesting that drug phenotypes may result from long-range network effects. We first used PPI network analysis to classify drugs based on proteins downstream of their targets and next predicted drug combination effects where drugs shared network proteins but had distinct binding proteins (e.g., targets, enzymes, or transporters). By classifying drugs using their downstream proteins, we had an 80.7% sensitivity for predicting rare drug combination effects documented in gold-standard datasets. We further measured the effect of predicted drug combinations on adverse outcome phenotypes using novel observational studies in the electronic health record. We tested predictions for 60 network-drug classes on seven adverse outcomes and measured changes in clinical outcomes for predicted combinations. These results demonstrate a novel paradigm for anticipating drug synergistic effects using proteins downstream of drug targets.

View details for DOI 10.1002/psp4.12861

View details for PubMedID 36204824
User-centred design for machine learning in health care: a case study from care management. BMJ health & care informatics Seneviratne, M. G., Li, R. C., Schreier, M., Lopez-Martinez, D., Patel, B. S., Yakubovich, A., Kemp, J. B., Loreaux, E., Gamble, P., El-Khoury, K., Vardoulakis, L., Wong, D., Desai, J., Chen, J. H., Morse, K. E., Downing, N. L., Finger, L. T., Chen, M., Shah, N. 2022; 29 (1)

Abstract

OBJECTIVES: Few machine learning (ML) models are successfully deployed in clinical practice. One of the common pitfalls across the field is inappropriate problem formulation: designing ML to fit the data rather than to address a real-world clinical pain point. METHODS: We introduce a practical toolkit for user-centred design consisting of four questions covering: (1) solvable pain points, (2) the unique value of ML (eg, automation and augmentation), (3) the actionability pathway and (4) the model's reward function. This toolkit was implemented in a series of six participatory design workshops with care managers in an academic medical centre. RESULTS: Pain points amenable to ML solutions included outpatient risk stratification and risk factor identification. The endpoint definitions, triggering frequency and evaluation metrics of the proposed risk scoring model were directly influenced by care manager workflows and real-world constraints. CONCLUSIONS: Integrating user-centred design early in the ML life cycle is key for configuring models in a clinically actionable way. This toolkit can guide problem selection and influence choices about the technical setup of the ML problem.

View details for DOI 10.1136/bmjhci-2022-100656

View details for PubMedID 36220304
Assessment of Adherence to Reporting Guidelines by Commonly Used Clinical Prediction Models From a Single Vendor: A Systematic Review. JAMA network open Lu, J. H., Callahan, A., Patel, B. S., Morse, K. E., Dash, D., Pfeffer, M. A., Shah, N. H. 2022; 5 (8): e2227779

Abstract

Importance: Various model reporting guidelines have been proposed to ensure clinical prediction models are reliable and fair. However, no consensus exists about which model details are essential to report, and commonalities and differences among reporting guidelines have not been characterized. Furthermore, how well documentation of deployed models adheres to these guidelines has not been studied. Objectives: To assess information requested by model reporting guidelines and whether the documentation for commonly used machine learning models developed by a single vendor provides the information requested. Evidence Review: MEDLINE was queried using machine learning model card and reporting machine learning from November 4 to December 6, 2020. References were reviewed to find additional publications, and publications without specific reporting recommendations were excluded. Similar elements requested for reporting were merged into representative items. Four independent reviewers and 1 adjudicator assessed how often documentation for the most commonly used models developed by a single vendor reported the items. Findings: From 15 model reporting guidelines, 220 unique items were identified that represented the collective reporting requirements. Although 12 items were commonly requested (requested by 10 or more guidelines), 77 items were requested by just 1 guideline. Documentation for 12 commonly used models from a single vendor reported a median of 39% (IQR, 37%-43%; range, 31%-47%) of items from the collective reporting requirements. Many of the commonly requested items had 100% reporting rates, including items concerning outcome definition, area under the receiver operating characteristics curve, internal validation, and intended clinical use. Several items reported half the time or less related to reliability, such as external validation, uncertainty measures, and strategy for handling missing data. Other frequently unreported items related to fairness (summary statistics and subgroup analyses, including for race and ethnicity or sex). Conclusions and Relevance: These findings suggest that consistent reporting recommendations for clinical predictive models are needed for model developers to share necessary information for model deployment. The many published guidelines would, collectively, require reporting more than 200 items. Model documentation from 1 vendor reported the most commonly requested items from model reporting guidelines. However, areas for improvement were identified in reporting items related to model reliability and fairness. This analysis led to feedback to the vendor, which motivated updates to the documentation for future users.

View details for DOI 10.1001/jamanetworkopen.2022.27779

View details for PubMedID 35984654
Nursing Workflow Change in a COVID-19 Inpatient Unit Following the Deployment of Inpatient Telehealth: An Observational Study Using a Real-Time Locating System. Journal of medical Internet research Vilendrer, S., Lough, M. E., Garvert, D. W., Lambert, M. H., Lu, J. H., Patel, B., Shah, N. H., Williams, M. Y., Kling, S. M. 2022

Abstract

BACKGROUND: The COVID-19 pandemic prompted widespread implementation of telehealth, including in the inpatient setting, with the goals of reducing potential pathogen exposure events and personal protective equipment (PPE) utilization. Nursing workflow adaptations in these novel environments are of particular interest given the association between nursing time at the bedside and patient safety. Understanding the frequency and duration of nurse-patient encounters following the introduction of a novel telehealth platform in the context of COVID-19 may therefore provide insight into downstream impacts on patient safety, pathogen exposure, and PPE utilization. OBJECTIVE: To evaluate changes in nursing workflow relative to pre-pandemic levels using a real-time locating system (RTLS) following the deployment of inpatient telehealth on a COVID-19 unit. METHODS: In March 2020, telehealth was installed in patient rooms in a COVID-19 unit and on movable carts in 3 comparison units. Existing RTLS captured nurse movement during 1 pre- and 5 post-pandemic stages (January-December 2020). Change in direct nurse-patient encounters, time spent in patient rooms per encounter, and total time spent with patients per shift relative to baseline were calculated. Generalized linear models assessed difference-in-differences in outcomes between COVID-19 and comparison units. Telehealth adoption was captured and reported at the unit level. RESULTS: Change in frequency of encounters and time spent per encounter from baseline differed between the COVID-19 and comparison units at all stages of the pandemic (all P's < 0.0001). Frequency of encounters decreased (difference-in-differences range: -6.6 to -14.1 encounters) and duration of encounters increased (difference-in-differences range: 1.8 to 6.2 minutes) from baseline to a greater extent in the COVID-19 units compared to the comparison units. At most stages of the pandemic, the change in total time nurses spent in patient rooms per patient per shift from baseline did not differ between the COVID-19 and comparison units (P's > 0.17). The primary COVID-19 unit quickly adopted telehealth technology during the observation period, initiating 15,088 encounters that averaged 6.6 minutes (standard deviation = 13.6) each. CONCLUSIONS: RTLS movement data suggest total nursing time at the bedside remained unchanged following the deployment of inpatient telehealth in a COVID-19 unit. Compared to other units with shared mobile telehealth units, frequency of nurse-patient in-person encounters decreased and duration lengthened on a COVID-19 unit with in-room telehealth availability, indicating "batched" redistribution of work to maintain total time at bedside relative to pre-pandemic periods. The simultaneous adoption of telehealth suggests virtual care was a complement to, rather than a replacement for, in-person care. Study limitations, however, preclude our ability to draw a causal link between nursing workflow change and telehealth adoption, and further evaluation is needed to determine potential downstream implications on disease transmission, PPE utilization, and patient safety.

View details for DOI 10.2196/36882

View details for PubMedID 35635840
Evaluating algorithmic fairness in the presence of clinical guidelines: the case of atherosclerotic cardiovascular disease risk estimation. BMJ health & care informatics Foryciarz, A., Pfohl, S. R., Patel, B., Shah, N. 2022; 29 (1)

Abstract

OBJECTIVES: The American College of Cardiology and the American Heart Association guidelines on primary prevention of atherosclerotic cardiovascular disease (ASCVD) recommend using 10-year ASCVD risk estimation models to initiate statin treatment. For guideline-concordant decision-making, risk estimates need to be calibrated. However, existing models are often miscalibrated for race, ethnicity and sex based subgroups. This study evaluates two algorithmic fairness approaches to adjust the risk estimators (group recalibration and equalised odds) for their compatibility with the assumptions underpinning the guidelines' decision rules. METHODS: Using an updated pooled cohorts data set, we derive unconstrained, group-recalibrated and equalised odds-constrained versions of the 10-year ASCVD risk estimators, and compare their calibration at guideline-concordant decision thresholds. RESULTS: We find that, compared with the unconstrained model, group-recalibration improves calibration at one of the relevant thresholds for each group, but exacerbates differences in false positive and false negative rates between groups. An equalised odds constraint, meant to equalise error rates across groups, does so by miscalibrating the model overall and at relevant decision thresholds. DISCUSSION: Hence, because of induced miscalibration, decisions guided by risk estimators learned with an equalised odds fairness constraint are not concordant with existing guidelines. Conversely, recalibrating the model separately for each group can increase guideline compatibility, while increasing intergroup differences in error rates. As such, comparisons of error rates across groups can be misleading when guidelines recommend treating at fixed decision thresholds. CONCLUSION: The illustrated tradeoffs between satisfying a fairness criterion and retaining guideline compatibility underscore the need to evaluate models in the context of downstream interventions.

View details for DOI 10.1136/bmjhci-2021-100460

View details for PubMedID 35396247
DLMM as a lossless one-shot algorithm for collaborative multi-site distributed linear mixed models. Nature communications Luo, C., Islam, M. N., Sheils, N. E., Buresh, J., Reps, J., Schuemie, M. J., Ryan, P. B., Edmondson, M., Duan, R., Tong, J., Marks-Anglin, A., Bian, J., Chen, Z., Duarte-Salles, T., Fernandez-Bertolin, S., Falconer, T., Kim, C., Park, R. W., Pfohl, S. R., Shah, N. H., Williams, A. E., Xu, H., Zhou, Y., Lautenbach, E., Doshi, J. A., Werner, R. M., Asch, D. A., Chen, Y. 2022; 13 (1): 1678

Abstract

Linear mixed models are commonly used in healthcare-based association analyses for analyzing multi-site data with heterogeneous site-specific random effects. Due to regulations for protecting patients' privacy, sensitive individual patient data (IPD) typically cannot be shared across sites. We propose an algorithm for fitting distributed linear mixed models (DLMMs) without sharing IPD across sites. This algorithm achieves results identical to those achieved using pooled IPD from multiple sites (i.e., the same effect size and standard error estimates), hence demonstrating the lossless property. The algorithm requires each site to contribute minimal aggregated data in only one round of communication. We demonstrate the lossless property of the proposed DLMM algorithm by investigating the associations between demographic and clinical characteristics and length of hospital stay in COVID-19 patients using administrative claims from the UnitedHealth Group Clinical Discovery Database. We extend this association study by incorporating 120,609 COVID-19 patients from 11 collaborative data sources worldwide.

View details for DOI 10.1038/s41467-022-29160-4

View details for PubMedID 35354802
A comparison of approaches to improve worst-case predictive model performance over patient subpopulations. Scientific reports Pfohl, S. R., Zhang, H., Xu, Y., Foryciarz, A., Ghassemi, M., Shah, N. H. 2022; 12 (1): 3254

Abstract

Predictive models for clinical outcomes that are accurate on average in a patient population may underperform drastically for some subpopulations, potentially introducing or reinforcing inequities in care access and quality. Model training approaches that aim to maximize worst-case model performance across subpopulations, such as distributionally robust optimization (DRO), attempt to address this problem without introducing additional harms. We conduct a large-scale empirical study of DRO and several variations of standard learning procedures to identify approaches for model development and selection that consistently improve disaggregated and worst-case performance over subpopulations compared to standard approaches for learning predictive models from electronic health records data. In the course of our evaluation, we introduce an extension to DRO approaches that allows for specification of the metric used to assess worst-case performance. We conduct the analysis for models that predict in-hospital mortality, prolonged length of stay, and 30-day readmission for inpatient admissions, and predict in-hospital mortality using intensive care data. We find that, with relatively few exceptions, no approach performs better, for each patient subpopulation examined, than standard learning procedures using the entire training dataset. These results imply that when it is of interest to improve model performance for patient subpopulations beyond what can be achieved with standard practices, it may be necessary to do so via data collection techniques that increase the effective sample size or reduce the level of noise in the prediction problem.

View details for DOI 10.1038/s41598-022-07167-7

View details for PubMedID 35228563
Evaluation of domain generalization and adaptation on improving model robustness to temporal dataset shift in clinical medicine. Scientific reports Guo, L. L., Pfohl, S. R., Fries, J., Johnson, A. E., Posada, J., Aftandilian, C., Shah, N., Sung, L. 2022; 12 (1): 2726

Abstract

Temporal dataset shift associated with changes in healthcare over time is a barrier to deploying machine learning-based clinical decision support systems. Algorithms that learn robust models by estimating invariant properties across time periods for domain generalization (DG) and unsupervised domain adaptation (UDA) might be suitable to proactively mitigate dataset shift. The objective was to characterize the impact of temporal dataset shift on clinical prediction models and benchmark DG and UDA algorithms on improving model robustness. In this cohort study, intensive care unit patients from the MIMIC-IV database were categorized by year groups (2008-2010, 2011-2013, 2014-2016 and 2017-2019). Tasks were predicting mortality, long length of stay, sepsis and invasive ventilation. Feedforward neural networks were used as prediction models. The baseline experiment trained models using empirical risk minimization (ERM) on 2008-2010 (ERM[08-10]) and evaluated them on subsequent year groups. DG experiment trained models using algorithms that estimated invariant properties using 2008-2016 and evaluated them on 2017-2019. UDA experiment leveraged unlabelled samples from 2017 to 2019 for unsupervised distribution matching. DG and UDA models were compared to ERM[08-16] models trained using 2008-2016. Main performance measures were area-under-the-receiver-operating-characteristic curve (AUROC), area-under-the-precision-recall curve and absolute calibration error. Threshold-based metrics including false-positives and false-negatives were used to assess the clinical impact of temporal dataset shift and its mitigation strategies. In the baseline experiments, dataset shift was most evident for sepsis prediction (maximum AUROC drop, 0.090; 95% confidence interval (CI), 0.080-0.101). Considering a scenario of 100 consecutively admitted patients showed that ERM[08-10] applied to 2017-2019 was associated with one additional false-negative among 11 patients with sepsis, when compared to the model applied to 2008-2010. When compared with ERM[08-16], DG and UDA experiments failed to produce more robust models (range of AUROC difference, -0.003 to 0.050). In conclusion, DG and UDA failed to produce more robust models compared to ERM in the setting of temporal dataset shift. Alternate approaches are required to preserve model performance over time in clinical medicine.

View details for DOI 10.1038/s41598-022-06484-1

View details for PubMedID 35177653
Characteristics and outcomes of COVID-19 patients with and without asthma from the United States, South Korea, and Europe. The Journal of asthma : official journal of the Association for the Care of Asthma Morales, D., Ostropolets, A., Lai, L., Sena, A., Duvall, S., Suchard, M., Verhamme, K., Rjinbeek, P., Posada, J., Ahmed, W., Alshammary, T., Alghoul, H., Alser, O., Areia, C., Blacketer, C., Burn, E., Casajust, P., You, S., Dawoud, D., Golozar, A., Gong, M., Jonnagaddala, J., Lynch, K., Matheny, M., Minty, E., Nyberg, F., Uribe, A., Recalde, M., Reich, C., Scheumie, M., Shah, K., Shah, N., Schilling, L., Vizcaya, D., Zhang, L., Hripcsak, G., Ryan, P., Prieto-Alhambra, D., Durate-Salles, T., Kostka, K. 1800: 1-14

Abstract

Objective: Large international comparisons describing the clinical characteristics of patients with COVID-19 are limited. The aim of the study was to perform a large-scale descriptive characterization of COVID-19 patients with asthma. Methods: We included nine databases contributing data from January-June 2020 from the US, South Korea (KR), Spain, UK and the Netherlands. We defined two cohorts of COVID-19 patients ('diagnosed' and 'hospitalized') based on COVID-19 disease codes. We followed patients from COVID-19 index date to 30 days or death. We performed descriptive analysis and reported the frequency of characteristics and outcomes in people with asthma defined by codes and prescriptions. Results: The diagnosed and hospitalized cohorts contained 666,933 and 159,552 COVID-19 patients respectively. Exacerbation in people with asthma was recorded in 1.6%-8.6% of patients at presentation. Asthma prevalence ranged from 6.2% (95%CI 5.7-6.8) to 18.5% (95%CI 18.2-18.8) in the diagnosed cohort and 5.2% (95%CI 4.0-6.8) to 20.5% (95%CI 18.6-22.6) in the hospitalized cohort. Asthma patients with COVID-19 had high prevalence of comorbidity including hypertension, heart disease, diabetes and obesity. Mortality ranged from 2.1% (95%CI 1.8-2.4) to 16.9% (95%CI 13.8-20.5) and was similar or lower compared to COVID-19 patients without asthma. Acute respiratory distress syndrome occurred in 15%-30% of hospitalized COVID-19 asthma patients. Conclusion: The prevalence of asthma among COVID-19 patients varies internationally. Asthma patients with COVID-19 have high comorbidity. The prevalence of asthma exacerbation at presentation was low. Whilst mortality was similar among COVID-19 patients with and without asthma, this could be confounded by differences in clinical characteristics. Further research could help identify high-risk asthma patients.

View details for DOI 10.1080/02770903.2021.2025392

View details for PubMedID 35012410
Considerations in the reliability and fairness audits of predictive models for advance care planning. Frontiers in Digital Health Lu, J., Sattler, A., Wang, S., Khaki, A. R., Callahan, A., Fleming, S., Fong, R., Ehlert, B., Li, R., Shieh, L., Ramchandran, K., Gensheimer, M., Chobot, S., Pfohl, S., Li, S., Shum, K., Parikh, N., Desai, P., Seevaratnam, B., Hanson, M., Smith, M., Xu, Y., Gokhale, A., Lin, S., Shah, N. 2022: 943768

View details for DOI 10.3389/fdgth.2022.943768
Predicting patients who are likely to develop Lupus Nephritis of those newly diagnosed with Systemic Lupus Erythematosus. AMIA ... Annual Symposium proceedings. AMIA Symposium Bechler, K. K., Stolyar, L., Steinberg, E., Posada, J., Minty, E., Shah, N. H. 2022; 2022: 221-230

Abstract

Patients diagnosed with systemic lupus erythematosus (SLE) suffer from a decreased quality of life, an increased risk of medical complications, and an increased risk of death. In particular, approximately 50% of SLE patients progress to develop lupus nephritis, which oftentimes leads to life-threatening end stage renal disease (ESRD) and requires dialysis or kidney transplant. The challenge is that lupus nephritis is diagnosed via a kidney biopsy, which is typically performed only after noticeable decreased kidney function, leaving little room for proactive or preventative measures. The ability to predict which patients are most likely to develop lupus nephritis has the potential to shift lupus nephritis disease management from reactive to proactive. We present a clinically useful prediction model to predict which patients with newly diagnosed SLE will go on to develop lupus nephritis in the next five years.

View details for PubMedID 37128416
Characteristics and outcomes of COVID-19 patients with COPD from the United States, South Korea, and Europe. Wellcome open research Moreno-Martos, D., Verhamme, K., Ostropolets, A., Kostka, K., Duarte-Sales, T., Prieto-Alhambra, D., Alshammari, T. M., Alghoul, H., Ahmed, W., Blacketer, C., DuVall, S., Lai, L., Matheny, M., Nyberg, F., Posada, J., Rijnbeek, P., Spotnitz, M., Sena, A., Shah, N., Suchard, M., Chan You, S., Hripcsak, G., Ryan, P., Morales, D. 2022; 7: 22

Abstract

Background: Characterization studies of COVID-19 patients with chronic obstructive pulmonary disease (COPD) are limited in size and scope. The aim of the study is to provide a large-scale characterization of COVID-19 patients with COPD. Methods: We included thirteen databases contributing data from January-June 2020 from North America (US), Europe and Asia. We defined two cohorts of patients with COVID-19 namely a 'diagnosed' and 'hospitalized' cohort. We followed patients from COVID-19 index date to 30 days or death. We performed descriptive analysis and reported the frequency of characteristics and outcomes among COPD patients with COVID-19. Results: The study included 934,778 patients in the diagnosed COVID-19 cohort and 177,201 in the hospitalized COVID-19 cohort. Observed COPD prevalence in the diagnosed cohort ranged from 3.8% (95%CI 3.5-4.1%) in French data to 22.7% (95%CI 22.4-23.0) in US data, and from 1.9% (95%CI 1.6-2.2) in South Korean to 44.0% (95%CI 43.1-45.0) in US data, in the hospitalized cohorts. COPD patients in the hospitalized cohort had greater comorbidity than those in the diagnosed cohort, including hypertension, heart disease, diabetes and obesity. Mortality was higher in COPD patients in the hospitalized cohort and ranged from 7.6% (95%CI 6.9-8.4) to 32.2% (95%CI 28.0-36.7) across databases. ARDS, acute renal failure, cardiac arrhythmia and sepsis were the most common outcomes among hospitalized COPD patients. Conclusion: COPD patients with COVID-19 have high levels of COVID-19-associated comorbidities and poor COVID-19 outcomes. Further research is required to identify patients with COPD at high risk of worse outcomes.

View details for DOI 10.12688/wellcomeopenres.17403.1

View details for PubMedID 36845321
Building a Learning Health System: Creating an Analytical Workflow for Evidence Generation to Inform Institutional Clinical Care Guidelines. Applied clinical informatics Dash, D., Gokhale, A., Patel, B. S., Callahan, A., Posada, J., Krishnan, G., Collins, W., Li, R., Schulman, K., Ren, L., Shah, N. H. 2022; 13 (1): 315-321

Abstract

BACKGROUND: One key aspect of a learning health system (LHS) is utilizing data generated during care delivery to inform clinical care. However, institutional guidelines that utilize observational data are rare and require months to create, making current processes impractical for more urgent scenarios such as those posed by the COVID-19 pandemic. There exists a need to rapidly analyze institutional data to drive guideline creation where evidence from randomized control trials is unavailable. OBJECTIVES: This article provides a background on the current state of observational data generation in institutional guideline creation and details our institution's experience in creating a novel workflow to (1) demonstrate the value of such a workflow, (2) demonstrate a real-world example, and (3) discuss difficulties encountered and future directions. METHODS: Utilizing a multidisciplinary team of database specialists, clinicians, and informaticists, we created a workflow for identifying and translating a clinical need into a queryable format in our clinical data warehouse, creating data summaries and feeding this information back into clinical guideline creation. RESULTS: Clinical questions posed by the hospital medicine division were answered in a rapid time frame and informed creation of institutional guidelines for the care of patients with COVID-19. The cost of setting up a workflow, answering the questions, and producing data summaries required around 300 hours of effort and $300,000 USD. CONCLUSION: A key component of an LHS is the ability to learn from data generated during care delivery. There are rare examples in the literature and we demonstrate one such example along with proposed thoughts of ideal multidisciplinary team formation and deployment.

View details for DOI 10.1055/s-0042-1743241

View details for PubMedID 35235994
Unraveling COVID-19: A Large-Scale Characterization of 4.5 Million COVID-19 Cases Using CHARYBDIS. Clinical epidemiology Kostka, K., Duarte-Salles, T., Prats-Uribe, A., Sena, A. G., Pistillo, A., Khalid, S., Lai, L. Y., Golozar, A., Alshammari, T. M., Dawoud, D. M., Nyberg, F., Wilcox, A. B., Andryc, A., Williams, A., Ostropolets, A., Areia, C., Jung, C. Y., Harle, C. A., Reich, C. G., Blacketer, C., Morales, D. R., Dorr, D. A., Burn, E., Roel, E., Tan, E. H., Minty, E., DeFalco, F., de Maeztu, G., Lipori, G., Alghoul, H., Zhu, H., Thomas, J. A., Bian, J., Park, J., Martinez Roldan, J., Posada, J. D., Banda, J. M., Horcajada, J. P., Kohler, J., Shah, K., Natarajan, K., Lynch, K. E., Liu, L., Schilling, L. M., Recalde, M., Spotnitz, M., Gong, M., Matheny, M. E., Valveny, N., Weiskopf, N. G., Shah, N., Alser, O., Casajust, P., Park, R. W., Schuff, R., Seager, S., DuVall, S. L., You, S. C., Song, S., Fernandez-Bertolin, S., Fortin, S., Magoc, T., Falconer, T., Subbian, V., Huser, V., Ahmed, W., Carter, W., Guan, Y., Galvan, Y., He, X., Rijnbeek, P. R., Hripcsak, G., Ryan, P. B., Suchard, M. A., Prieto-Alhambra, D. 2022; 14: 369-384

Abstract

Purpose: Routinely collected real world data (RWD) have great utility in aiding the novel coronavirus disease (COVID-19) pandemic response. Here we present the international Observational Health Data Sciences and Informatics (OHDSI) Characterizing Health Associated Risks and Your Baseline Disease In SARS-COV-2 (CHARYBDIS) framework for standardisation and analysis of COVID-19 RWD. Patients and Methods: We conducted a descriptive retrospective database study using a federated network of data partners in the United States, Europe (the Netherlands, Spain, the UK, Germany, France and Italy) and Asia (South Korea and China). The study protocol and analytical package were released on 11th June 2020 and are iteratively updated via GitHub. We identified three non-mutually exclusive cohorts of 4,537,153 individuals with a clinical COVID-19 diagnosis or positive test, 886,193 hospitalized with COVID-19, and 113,627 hospitalized with COVID-19 requiring intensive services. Results: We aggregated over 22,000 unique characteristics describing patients with COVID-19. All comorbidities, symptoms, medications, and outcomes are described by cohort in aggregate counts and are readily available online. Globally, we observed similarities in the USA and Europe: more women diagnosed than men but more men hospitalized than women, most diagnosed cases between 25 and 60 years of age versus most hospitalized cases between 60 and 80 years of age. South Korea differed with more women than men hospitalized. Common comorbidities included type 2 diabetes, hypertension, chronic kidney disease and heart disease. Common presenting symptoms were dyspnea, cough and fever. Symptom data availability was more common in hospitalized cohorts than diagnosed. Conclusion: We constructed a global, multi-centre view to describe trends in COVID-19 progression, management and evolution over time. By characterising baseline variability in patients and geography, our work provides critical context that may otherwise be misconstrued as data quality issues. This is important as we perform studies on adverse events of special interest in COVID-19 vaccine surveillance.

View details for DOI 10.2147/CLEP.S323292

View details for PubMedID 35345821
Monitoring Approaches for a Pediatric Chronic Kidney Disease Machine Learning Model. Applied clinical informatics Morse, K. E., Brown, C., Fleming, S., Todd, I., Powell, A., Russell, A., Scheinker, D., Sutherland, S. M., Lu, J., Watkins, B., Shah, N. H., Pageler, N. M., Palma, J. P. 2022; 13 (2): 431-438

Abstract

OBJECTIVE: The purpose of this study is to evaluate the ability of three metrics to monitor for a reduction in performance of a chronic kidney disease (CKD) model deployed at a pediatric hospital. METHODS: The CKD risk model estimates a patient's risk of developing CKD 3 to 12 months following an inpatient admission. The model was developed on a retrospective dataset of 4,879 admissions from 2014 to 2018, then run silently on 1,270 admissions from April to October 2019. Three metrics were used to monitor its performance during the silent phase: (1) standardized mean differences (SMDs); (2) performance of a "membership model"; and (3) response distribution analysis. Observed patient outcomes for the 1,270 admissions were used to calculate prospective model performance and the ability of the three metrics to detect performance changes. RESULTS: The deployed model had an area under the receiver-operator curve (AUROC) of 0.63 in the prospective evaluation, which was a significant decrease from an AUROC of 0.76 on retrospective data (p = 0.033). Among the three metrics, SMDs were significantly different for 66/75 (88%) of the model's input variables (p < 0.05) between retrospective and deployment data. The membership model was able to discriminate between the two settings (AUROC = 0.71, p < 0.0001) and the response distributions were significantly different (p < 0.0001) for the two settings. CONCLUSION: This study suggests that the three metrics examined could provide early indication of performance deterioration in deployed models.

View details for DOI 10.1055/s-0042-1746168

View details for PubMedID 35508197
Characteristics and outcomes of patients with COVID-19 with and without prevalent hypertension: a multinational cohort study. BMJ open Reyes, C., Pistillo, A., Fernandez-Bertolin, S., Recalde, M., Roel, E., Puente, D., Sena, A. G., Blacketer, C., Lai, L., Alshammari, T. M., Ahmed, W., Alser, O., Alghoul, H., Areia, C., Dawoud, D., Prats-Uribe, A., Valveny, N., de Maeztu, G., Sorli Redo, L., Martinez Roldan, J., Lopez Montesinos, I., Schilling, L. M., Golozar, A., Reich, C., Posada, J. D., Shah, N., You, S. C., Lynch, K. E., DuVall, S. L., Matheny, M. E., Nyberg, F., Ostropolets, A., Hripcsak, G., Rijnbeek, P. R., Suchard, M. A., Ryan, P., Kostka, K., Duarte-Salles, T. 2021; 11 (12): e057632

Abstract

OBJECTIVE: To characterise patients with and without prevalent hypertension and COVID-19 and to assess adverse outcomes in both inpatients and outpatients. DESIGN AND SETTING: This is a retrospective cohort study using 15 healthcare databases (primary and secondary electronic healthcare records, insurance and national claims data) from the USA, Europe and South Korea, standardised to the Observational Medical Outcomes Partnership common data model. Data were gathered from 1 March to 31 October 2020. PARTICIPANTS: Two non-mutually exclusive cohorts were defined: (1) individuals diagnosed with COVID-19 (diagnosed cohort) and (2) individuals hospitalised with COVID-19 (hospitalised cohort), and stratified by hypertension status. Follow-up was from COVID-19 diagnosis/hospitalisation to death, end of the study period or 30 days. OUTCOMES: Demographics, comorbidities and 30-day outcomes (hospitalisation and death for the 'diagnosed' cohort and adverse events and death for the 'hospitalised' cohort) were reported. RESULTS: We identified 2851035 diagnosed and 563708 hospitalised patients with COVID-19. Hypertension was more prevalent in the latter (ranging across databases from 17.4% (95% CI 17.2 to 17.6) to 61.4% (95% CI 61.0 to 61.8) and from 25.6% (95% CI 24.6 to 26.6) to 85.9% (95% CI 85.2 to 86.6)). Patients in both cohorts with hypertension were predominantly >50 years old and female. Patients with hypertension were frequently diagnosed with obesity, heart disease, dyslipidaemia and diabetes. Compared with patients without hypertension, patients with hypertension in the COVID-19 diagnosed cohort had more hospitalisations (ranging from 1.3% (95% CI 0.4 to 2.2) to 41.1% (95% CI 39.5 to 42.7) vs from 1.4% (95% CI 0.9 to 1.9) to 15.9% (95% CI 14.9 to 16.9)) and increased mortality (ranging from 0.3% (95% CI 0.1 to 0.5) to 18.5% (95% CI 15.7 to 21.3) vs from 0.2% (95% CI 0.2 to 0.2) to 11.8% (95% CI 10.8 to 12.8)). Patients in the COVID-19 hospitalised cohort with hypertension were more likely to have acute respiratory distress syndrome (ranging from 0.1% (95% CI 0.0 to 0.2) to 65.6% (95% CI 62.5 to 68.7) vs from 0.1% (95% CI 0.0 to 0.2) to 54.7% (95% CI 50.5 to 58.9)), arrhythmia (ranging from 0.5% (95% CI 0.3 to 0.7) to 45.8% (95% CI 42.6 to 49.0) vs from 0.4% (95% CI 0.3 to 0.5) to 36.8% (95% CI 32.7 to 40.9)) and increased mortality (ranging from 1.8% (95% CI 0.4 to 3.2) to 25.1% (95% CI 23.0 to 27.2) vs from 0.7% (95% CI 0.5 to 0.9) to 10.9% (95% CI 10.4 to 11.4)) than patients without hypertension. CONCLUSIONS: COVID-19 patients with hypertension were more likely to suffer severe outcomes, hospitalisations and deaths compared with those without hypertension.

View details for DOI 10.1136/bmjopen-2021-057632

View details for PubMedID 34937726
Predictors of diagnostic transition from major depressive disorder to bipolar disorder: a retrospective observational network study. Translational psychiatry Nestsiarovich, A., Reps, J. M., Matheny, M. E., DuVall, S. L., Lynch, K. E., Beaton, M., Jiang, X., Spotnitz, M., Pfohl, S. R., Shah, N. H., Torre, C. O., Reich, C. G., Lee, D. Y., Son, S. J., You, S. C., Park, R. W., Ryan, P. B., Lambert, C. G. 2021; 11 (1): 642

Abstract

Many patients with bipolar disorder (BD) are initially misdiagnosed with major depressive disorder (MDD) and are treated with antidepressants, whose potential iatrogenic effects are widely discussed. It is unknown whether MDD is a comorbidity of BD or its earlier stage, and no consensus exists on individual conversion predictors, delaying BD's timely recognition and treatment. We aimed to build a predictive model of MDD to BD conversion and to validate it across a multi-national network of patient databases using the standardization afforded by the Observational Medical Outcomes Partnership (OMOP) common data model. Five "training" US databases were retrospectively analyzed: IBM MarketScan CCAE, MDCR, MDCD, Optum EHR, and Optum Claims. Cyclops regularized logistic regression models were developed on one-year MDD-BD conversion with all standard covariates from the HADES PatientLevelPrediction package. Time-to-conversion Kaplan-Meier analysis was performed up to a decade after MDD, stratified by model-estimated risk. External validation of the final prediction model was performed across 9 patient record databases within the Observational Health Data Sciences and Informatics (OHDSI) network internationally. The model's area under the curve (AUC) varied 0.633-0.745 (μ = 0.689) across the five US training databases. Nine variables predicted one-year MDD-BD transition. Factors that increased risk were: younger age, severe depression, psychosis, anxiety, substance misuse, self-harm thoughts/actions, and prior mental disorder. AUCs of the validation datasets ranged 0.570-0.785 (μ = 0.664). An assessment algorithm was built for MDD to BD conversion that allows distinguishing as much as 100-fold risk differences among patients and validates well across multiple international data sources.

View details for DOI 10.1038/s41398-021-01760-6

View details for PubMedID 34930903
Unsupervised Learning for Automated Detection of Coronary Artery Disease Subgroups. Journal of the American Heart Association Flores, A. M., Schuler, A., Eberhard, A. V., Olin, J. W., Cooke, J. P., Leeper, N. J., Shah, N. H., Ross, E. G. 2021: e021976

Abstract

Background The promise of precision population health includes the ability to use robust patient data to tailor prevention and care to specific groups. Advanced analytics may allow for automated detection of clinically informative subgroups that account for clinical, genetic, and environmental variability. This study sought to evaluate whether unsupervised machine learning approaches could interpret heterogeneous and missing clinical data to discover clinically important coronary artery disease subgroups. Methods and Results The Genetic Determinants of Peripheral Arterial Disease study is a prospective cohort that includes individuals with newly diagnosed and/or symptomatic coronary artery disease. We applied generalized low rank modeling and K-means cluster analysis using 155 phenotypic and genetic variables from 1329 participants. Cox proportional hazard models were used to examine associations between clusters and major adverse cardiovascular and cerebrovascular events and all-cause mortality. We then compared performance of risk stratification based on clusters and the American College of Cardiology/American Heart Association pooled cohort equations. Unsupervised analysis identified 4 phenotypically and prognostically distinct clusters. All-cause mortality was highest in cluster 1 (oldest/most comorbid; 26%), whereas major adverse cardiovascular and cerebrovascular event rates were highest in cluster 2 (youngest/multiethnic; 41%). Cluster 4 (middle-aged/healthiest behaviors) experienced more incident major adverse cardiovascular and cerebrovascular events (30%) than cluster 3 (middle-aged/lowest medication adherence; 23%), despite apparently similar risk factor and lifestyle profiles. In comparison with the pooled cohort equations, cluster membership was more informative for risk assessment of myocardial infarction, stroke, and mortality. 
Conclusions Unsupervised clustering identified 4 unique coronary artery disease subgroups with distinct clinical trajectories. Flexible unsupervised machine learning algorithms offer the ability to meaningfully process heterogeneous patient data and provide sharper insights into disease characterization and risk assessment. Registration URL: https://www.clinicaltrials.gov; Unique identifier: NCT00380185.

View details for DOI 10.1161/JAHA.121.021976

View details for PubMedID 34845917
An informatics consult approach for generating clinical evidence for treatment decisions. BMC medical informatics and decision making Lai, A. G., Chang, W. H., Parisinos, C. A., Katsoulis, M., Blackburn, R. M., Shah, A. D., Nguyen, V., Denaxas, S., Davey Smith, G., Gaunt, T. R., Nirantharakumar, K., Cox, M. P., Forde, D., Asselbergs, F. W., Harris, S., Richardson, S., Sofat, R., Dobson, R. J., Hingorani, A., Patel, R., Sterne, J., Banerjee, A., Denniston, A. K., Ball, S., Sebire, N. J., Shah, N. H., Foster, G. R., Williams, B., Hemingway, H. 2021; 21 (1): 281

Abstract

BACKGROUND: An Informatics Consult has been proposed in which clinicians request novel evidence from large scale health data resources, tailored to the treatment of a specific patient. However, the availability of such consultations is lacking. We seek to provide an Informatics Consult for a situation where a treatment indication and contraindication coexist in the same patient, i.e., anti-coagulation use for stroke prevention in a patient with both atrial fibrillation (AF) and liver cirrhosis. METHODS: We examined four sources of evidence for the effect of warfarin on stroke risk or all-cause mortality: (1) randomised controlled trials (RCTs), (2) meta-analysis of prior observational studies, (3) trial emulation (using population electronic health records, N=3,854,710) and (4) genetic evidence (Mendelian randomisation). We developed prototype forms to request an Informatics Consult and return of results in electronic health record systems. RESULTS: We found 0 RCT reports and 0 trials recruiting for patients with AF and cirrhosis. We found broad concordance across the three new sources of evidence we generated. Meta-analysis of prior observational studies showed that warfarin use was associated with lower stroke risk (hazard ratio [HR]=0.71, CI 0.39-1.29). In a target trial emulation, warfarin was associated with lower all-cause mortality (HR=0.61, CI 0.49-0.76) and ischaemic stroke (HR=0.27, CI 0.08-0.91). Mendelian randomisation served as a drug target validation where we found that lower levels of vitamin K1 (warfarin is a vitamin K1 antagonist) are associated with lower stroke risk. A pilot survey with an independent sample of 34 clinicians revealed that 85% of clinicians found information on prognosis useful and that 79% thought that they should have access to the Informatics Consult as a service within their healthcare systems. We identified candidate steps for automation to scale evidence generation and to accelerate the return of results. CONCLUSION: We performed a proof-of-concept Informatics Consult for evidence generation, which may inform treatment decisions in situations where there is a dearth of randomised trials. Patients are surprised to know that their clinicians are currently not able to learn in clinic from data on 'patients like me'. We identify the key challenges in offering such an Informatics Consult as a service.

View details for DOI 10.1186/s12911-021-01638-z

View details for PubMedID 34641870
A quality assessment tool for artificial intelligence-centered diagnostic test accuracy studies: QUADAS-AI. Nature medicine Sounderajah, V., Ashrafian, H., Rose, S., Shah, N. H., Ghassemi, M., Golub, R., Kahn, C. E., Esteva, A., Karthikesalingam, A., Mateen, B., Webster, D., Milea, D., Ting, D., Treanor, D., Cushnan, D., King, D., McPherson, D., Glocker, B., Greaves, F., Harling, L., Ordish, J., Cohen, J. F., Deeks, J., Leeflang, M., Diamond, M., McInnes, M. D., McCradden, M., Abramoff, M. D., Normahani, P., Markar, S. R., Chang, S., Liu, X., Mallett, S., Shetty, S., Denniston, A., Collins, G. S., Moher, D., Whiting, P., Bossuyt, P. M., Darzi, A. 2021

View details for DOI 10.1038/s41591-021-01517-0

View details for PubMedID 34635854
Exploring Workplace Testing with Real-Time Polymerase Chain Reaction SARS-CoV-2 Testing. Journal of the American Board of Family Medicine : JABFM Fuentes, L., Shah, N., Kelly, S., Harnett, G., Schulman, K. A. 2021; 35 (1): 96-101

Abstract

Molecular tests (ie, real-time polymerase chain reaction [RT-PCR]) and antigen tests are used to detect SARS-CoV-2. RT-PCR tests are generally considered to be the standard for clinical diagnosis of SARS-CoV-2 due to accuracy and reliability but can take longer to return results than antigen tests. Our aim was to examine if point-of-care (POC) testing for SARS-CoV-2 infection would provide a flexible resource to help achieve workplace safety. We compared test results and time-to-test results between a POC RT-PCR test and a send-out PCR test in a program implemented in summer 2020. POC testing shortened the time to results to 110 minutes in the POC setting from 754 minutes for send-out tests. The specificity of single POC RT-PCR testing was 98.7% compared with send-out RT-PCR testing and was confirmed at 99.8% in a validation analysis. The sensitivity of the POC testing was 100% compared with send-out RT-PCR, although in a validation analysis, sensitivity appeared as 0% because only the 12 positive or indeterminate samples on the first analysis were retested and the majority were false-positives that were correctly ruled out. POC testing for SARS-CoV-2 with RT-PCR technology is possible at reduced time compared with send-out PCR testing.

View details for DOI 10.3122/jabfm.2022.01.210284

View details for PubMedID 35039415
Computational drug repositioning of atorvastatin for ulcerative colitis. Journal of the American Medical Informatics Association : JAMIA Bai, L., Scott, M. K., Steinberg, E., Kalesinskas, L., Habtezion, A., Shah, N. H., Khatri, P. 2021

Abstract

OBJECTIVE: Ulcerative colitis (UC) is a chronic inflammatory disorder with limited effective therapeutic options for long-term treatment and disease maintenance. We hypothesized that a multi-cohort analysis of independent cohorts representing real-world heterogeneity of UC would identify a robust transcriptomic signature to improve identification of FDA-approved drugs that can be repurposed to treat patients with UC. MATERIALS AND METHODS: We performed a multi-cohort analysis of 272 colon biopsy transcriptome samples across 11 publicly available datasets to identify a robust UC disease gene signature. We compared the gene signature to in vitro transcriptomic profiles induced by 781 FDA-approved drugs to identify potential drug targets. We used a retrospective cohort study design modeled after a target trial to evaluate the protective effect of predicted drugs on colectomy risk in patients with UC from the Stanford Research Repository (STARR) database and Optum Clinformatics DataMart. RESULTS: Atorvastatin treatment had the highest inverse-correlation with the UC gene signature among non-oncolytic FDA-approved therapies. In both STARR (n = 827) and Optum (n = 7821), atorvastatin intake was significantly associated with a decreased risk of colectomy, a marker of treatment-refractory disease, compared to patients prescribed a comparator drug (STARR: HR = 0.47, P = .03; Optum: HR = 0.66, P = .03), irrespective of age and length of atorvastatin treatment. DISCUSSION & CONCLUSION: These findings suggest that atorvastatin may serve as a novel therapeutic option for ameliorating disease in patients with UC. Importantly, we provide a systematic framework for integrating publicly available heterogeneous molecular data with clinical data at a large scale to repurpose existing FDA-approved drugs for a wide range of human diseases.

View details for DOI 10.1093/jamia/ocab165

View details for PubMedID 34529084
Summarizing Patients Like Mine via an On-demand Consultation Service. PROCEEDINGS OF THE VLDB ENDOWMENT Shah, N. 2021; 14 (13): 3417

View details for DOI 10.14778/3484224.3484242

View details for Web of Science ID 000742944600016
A survey of extant organizational and computational setups for deploying predictive models in health systems. Journal of the American Medical Informatics Association : JAMIA Kashyap, S., Morse, K. E., Patel, B., Shah, N. H. 2021

Abstract

OBJECTIVE: Artificial intelligence (AI) and machine learning (ML) enabled healthcare is now feasible for many health systems, yet little is known about effective strategies of system architecture and governance mechanisms for implementation. Our objective was to identify the different computational and organizational setups that early-adopter health systems have utilized to integrate AI/ML clinical decision support (AI-CDS) and scrutinize their trade-offs. MATERIALS AND METHODS: We conducted structured interviews with health systems with AI deployment experience about their organizational and computational setups for deploying AI-CDS at point of care. RESULTS: We contacted 34 health systems and interviewed 20 healthcare sites (58% response rate). Twelve (60%) sites used the native electronic health record vendor configuration for model development and deployment, making it the most common shared infrastructure. Nine (45%) sites used alternative computational configurations which varied significantly. Organizational configurations for managing AI-CDS were distinguished by how they identified model needs, built and implemented models, and were separable into 3 major types: Decentralized translation (n=10, 50%), IT Department led (n=2, 10%), and AI in Healthcare (AIHC) Team (n=8, 40%). DISCUSSION: No singular computational configuration enables all current use cases for AI-CDS. Health systems need to consider their desired applications for AI-CDS and whether investment in extending the off-the-shelf infrastructure is needed. Each organizational setup confers trade-offs for health systems planning strategies to implement AI-CDS. CONCLUSION: Health systems will be able to use this framework to understand strengths and weaknesses of alternative organizational and computational setups when designing their strategy for artificial intelligence.

View details for DOI 10.1093/jamia/ocab154

View details for PubMedID 34423364
Learning decision thresholds for risk stratification models from aggregate clinician behavior. Journal of the American Medical Informatics Association : JAMIA Patel, B. S., Steinberg, E., Pfohl, S. R., Shah, N. H. 2021

Abstract

Using a risk stratification model to guide clinical practice often requires the choice of a cutoff, called the decision threshold, on the model's output to trigger a subsequent action such as an electronic alert. Choosing this cutoff is not always straightforward. We propose a flexible approach that leverages the collective information in treatment decisions made in real life to learn reference decision thresholds from physician practice. Using the example of prescribing a statin for primary prevention of cardiovascular disease based on 10-year risk calculated by the 2013 pooled cohort equations, we demonstrate the feasibility of using real-world data to learn the implicit decision threshold that reflects existing physician behavior. Learning a decision threshold in this manner allows for evaluation of a proposed operating point against the threshold reflective of the community standard of care. Furthermore, this approach can be used to monitor and audit model-guided clinical decision making following model deployment.

View details for DOI 10.1093/jamia/ocab159

View details for PubMedID 34350942
Systematic Review of Approaches to Preserve Machine Learning Performance in the Presence of Temporal Dataset Shift in Clinical Medicine. Applied clinical informatics Guo, L. L., Pfohl, S. R., Fries, J., Posada, J., Fleming, S. L., Aftandilian, C., Shah, N., Sung, L. 2021; 12 (4): 808-815

Abstract

OBJECTIVE: The change in performance of machine learning models over time as a result of temporal dataset shift is a barrier to machine learning-derived models facilitating decision-making in clinical practice. Our aim was to describe technical procedures used to preserve the performance of machine learning models in the presence of temporal dataset shifts. METHODS: Studies were included if they were fully published articles that used machine learning and implemented a procedure to mitigate the effects of temporal dataset shift in a clinical setting. We described how dataset shift was measured, the procedures used to preserve model performance, and their effects. RESULTS: Of 4,457 potentially relevant publications identified, 15 were included. The impact of temporal dataset shift was primarily quantified using changes, usually deterioration, in calibration or discrimination. Calibration deterioration was more common (n=11) than discrimination deterioration (n=3). Mitigation strategies were categorized as model level or feature level. Model-level approaches (n=15) were more common than feature-level approaches (n=2), with the most common approaches being model refitting (n=12), probability calibration (n=7), model updating (n=6), and model selection (n=6). In general, all mitigation strategies were successful at preserving calibration but not uniformly successful in preserving discrimination. CONCLUSION: There was limited research in preserving the performance of machine learning models in the presence of temporal dataset shift in clinical medicine. Future research could focus on the impact of dataset shift on clinical decision making, benchmark the mitigation strategies on a wider range of datasets and tasks, and identify optimal strategies for specific settings.

View details for DOI 10.1055/s-0041-1735184

View details for PubMedID 34470057
Heterogeneity and temporal variation in the management of COVID-19: A multinational drug utilisation study including 274,719 hospitalised patients from the United States of America, China, Spain, and South Korea Prats-Uribe, A., Sena, A. G., Lai, L., Ahmed, W., Alghoul, H., Alser, O., Alshammari, T. M., Areia, C., Carter, W. A., Casajust, P., Dawoud, D., Golozar, A., Jonnagaddala, J., Mehta, P., Mengchun, G., Morales, D. R., Nyberg, F., Posada, J. D., Recalde, M., Roel, E., Shah, K., Shah, N. H., Schilling, L. M., Subbian, V., Vizcaya, D., Zhang, L., Zhang, Y., Zhu, H., Liu, L., You, S., Rijnbeek, P. R., Hripcsak, G., Lane, J., Burn, E., Reich, C., Suchard, M. A., Duartes-Salles, T., Kostka, K., Ryan, P., Prieto-Alhambra, D. WILEY. 2021: 78-79

View details for Web of Science ID 000687807300157
Characteristics and outcomes of over 300,000 COVID-19 individuals with history of cancer in the United States and Spain. Cancer epidemiology, biomarkers & prevention : a publication of the American Association for Cancer Research, cosponsored by the American Society of Preventive Oncology Roel, E., Pistillo, A., Recalde, M., Sena, A. G., Fernandez-Bertolin, S., Aragon, M., Puente, D., Ahmed, W., Alghoul, H., Alser, O., Alshammari, T. M., Areia, C., Blacketer, C., Carter, W., Casajust, P., Culhane, A. C., Dawoud, D., DeFalco, F., DuVall, S. L., Falconer, T., Golozar, A., Gong, M., Hester, L., Hripcsak, G., Tan, E. H., Jeon, H., Jonnagaddala, J., Lai, L. Y., Lynch, K. E., Matheny, M. E., Morales, D. R., Natarajan, K., Nyberg, F., Ostropolets, A., Posada, J. D., Prats-Uribe, A., Reich, C. G., Rivera, D. R., Schilling, L. M., Soerjomataram, I., Shah, K., Shah, N. H., Shen, Y., Spotnitz, M., Subbian, V., Suchard, M. A., Trama, A., Zhang, L., Zhang, Y., Ryan, P. B., Prieto-Alhambra, D., Kostka, K., Duarte-Salles, T. 2021

Abstract

BACKGROUND: We described the demographics, cancer subtypes, comorbidities, and outcomes of patients with a history of cancer and COVID-19. Secondly, we compared patients hospitalized with COVID-19 to patients diagnosed with COVID-19 and patients hospitalized with influenza. METHODS: We conducted a cohort study using eight routinely-collected healthcare databases from Spain and the US, standardized to the Observational Medical Outcome Partnership common data model. Three cohorts of patients with a history of cancer were included: i) diagnosed with COVID-19, ii) hospitalized with COVID-19, and iii) hospitalized with influenza in 2017-2018. Patients were followed from index date to 30 days or death. We reported demographics, cancer subtypes, comorbidities, and 30-day outcomes. RESULTS: We included 366,050 and 119,597 patients diagnosed and hospitalized with COVID-19, respectively. Prostate and breast cancers were the most frequent cancers (range: 5-19% and 1-14% in the diagnosed cohort, respectively). Hematological malignancies were also frequent, with non-Hodgkin's lymphoma being among the 5 most common cancer subtypes in the diagnosed cohort. Overall, patients were aged above 65 years and had multiple comorbidities. Occurrence of death ranged from 2% to 14% and from 6% to 26% in the diagnosed and hospitalized COVID-19 cohorts, respectively. Patients hospitalized with influenza (n=67,743) had a similar distribution of cancer subtypes, sex, age and comorbidities but lower occurrence of adverse events. CONCLUSIONS: Patients with a history of cancer and COVID-19 had multiple comorbidities and a high occurrence of COVID-19-related events. Hematological malignancies were frequent. IMPACT: This study provides epidemiologic characteristics that can inform clinical care and etiological studies.

View details for DOI 10.1158/1055-9965.EPI-21-0266

View details for PubMedID 34272262
Characteristics and outcomes of 627 044 COVID-19 patients living with and without obesity in the United States, Spain, and the United Kingdom. International journal of obesity (2005) Recalde, M., Roel, E., Pistillo, A., Sena, A. G., Prats-Uribe, A., Ahmed, W., Alghoul, H., Alshammari, T. M., Alser, O., Areia, C., Burn, E., Casajust, P., Dawoud, D., DuVall, S. L., Falconer, T., Fernandez-Bertolin, S., Golozar, A., Gong, M., Lai, L. Y., Lane, J. C., Lynch, K. E., Matheny, M. E., Mehta, P. P., Morales, D. R., Natarjan, K., Nyberg, F., Posada, J. D., Reich, C. G., Rijnbeek, P. R., Schilling, L. M., Shah, K., Shah, N. H., Subbian, V., Zhang, L., Zhu, H., Ryan, P., Prieto-Alhambra, D., Kostka, K., Duarte-Salles, T. 2021

Abstract

BACKGROUND: A detailed characterization of patients with COVID-19 living with obesity has not yet been undertaken. We aimed to describe and compare the demographics, medical conditions, and outcomes of COVID-19 patients living with obesity (PLWO) to those of patients living without obesity. METHODS: We conducted a cohort study based on outpatient/inpatient care and claims data from January to June 2020 from Spain, the UK, and the US. We used six databases standardized to the OMOP common data model. We defined two non-mutually exclusive cohorts of patients diagnosed and/or hospitalized with COVID-19; patients were followed from index date to 30 days or death. We report the frequency of demographics, prior medical conditions, and 30-day outcomes (hospitalization, events, and death) by obesity status. RESULTS: We included 627 044 (Spain: 122 058, UK: 2336, and US: 502 650) diagnosed and 160 013 (Spain: 18 197, US: 141 816) hospitalized patients with COVID-19. The prevalence of obesity was higher among patients hospitalized (39.9%, 95% CI: 39.8-40.0) than among those diagnosed with COVID-19 (33.1%; 95% CI: 33.0-33.2). In both cohorts, PLWO were more often female. Hospitalized PLWO were younger than patients without obesity. Overall, COVID-19 PLWO were more likely to have prior medical conditions, present with cardiovascular and respiratory events during hospitalization, or require intensive services compared to COVID-19 patients without obesity. CONCLUSION: We show that PLWO differ from patients without obesity in a wide range of medical conditions and present with more severe forms of COVID-19, with higher hospitalization rates and intensive services requirements. These findings can help guide preventive strategies for COVID-19 infection and complications and generate hypotheses for causal inference studies.

View details for DOI 10.1038/s41366-021-00893-4

View details for PubMedID 34267326
30-Day Outcomes of Children and Adolescents With COVID-19: An International Experience. Pediatrics Talita, D., Vizcaya, D., Pistillo, A., Casajust, P., Sena, A. G., Lai, L. Y., Prats-Uribe, A., Ahmed, W., Alshammari, T. M., Alghoul, H., Alser, O., Burn, E., You, S. C., Areia, C., Blacketer, C., DuVall, S., Falconer, T., Fernandez-Bertolin, S., Fortin, S., Golozar, A., Gong, M., Tan, E. H., Huser, V., Iveli, P., Morales, D. R., Nyberg, F., Posada, J. D., Recalde, M., Roe, E., Schilling, L. M., Shah, N. H., Shah, K., Suchard, M. A., Zhang, L., Zhang, Y., Williams, A. E., Reich, C. G., Hripcsak, G., Rijnbeek, P., Ryan, P., Kostka, K., Prieto-Alhambra, D. 2021

View details for DOI 10.1542/peds.2020-042929

View details for PubMedID 34049958
Use of repurposed and adjuvant drugs in hospital patients with covid-19: multinational network cohort study. BMJ (Clinical research ed.) Prats-Uribe, A., Sena, A. G., Lai, L. Y., Ahmed, W., Alghoul, H., Alser, O., Alshammari, T. M., Areia, C., Carter, W., Casajust, P., Dawoud, D., Golozar, A., Jonnagaddala, J., Mehta, P. P., Gong, M., Morales, D. R., Nyberg, F., Posada, J. D., Recalde, M., Roel, E., Shah, K., Shah, N. H., Schilling, L. M., Subbian, V., Vizcaya, D., Zhang, L., Zhang, Y., Zhu, H., Liu, L., Cho, J., Lynch, K. E., Matheny, M. E., You, S. C., Rijnbeek, P. R., Hripcsak, G., Lane, J. C., Burn, E., Reich, C., Suchard, M. A., Duarte-Salles, T., Kostka, K., Ryan, P. B., Prieto-Alhambra, D. 2021; 373: n1038

Abstract

OBJECTIVE: To investigate the use of repurposed and adjuvant drugs in patients admitted to hospital with covid-19 across three continents. DESIGN: Multinational network cohort study. SETTING: Hospital electronic health records from the United States, Spain, and China, and nationwide claims data from South Korea. PARTICIPANTS: 303264 patients admitted to hospital with covid-19 from January 2020 to December 2020. MAIN OUTCOME MEASURES: Prescriptions or dispensations of any drug on or 30 days after the date of hospital admission for covid-19. RESULTS: Of the 303264 patients included, 290131 were from the US, 7599 from South Korea, 5230 from Spain, and 304 from China. 3455 drugs were identified. Common repurposed drugs were hydroxychloroquine (ranging from <5 (<2%) patients in China to 2165 (85.1%) in Spain), azithromycin (from 15 (4.9%) in China to 1473 (57.9%) in Spain), combined lopinavir and ritonavir (from 156 (<2%) in the VA-OMOP US to 2652 (34.9%) in South Korea and 1285 (50.5%) in Spain), and umifenovir (0% in the US, South Korea, and Spain and 238 (78.3%) in China). Use of adjunctive drugs varied greatly, with the five most used treatments being enoxaparin, fluoroquinolones, ceftriaxone, vitamin D, and corticosteroids. Hydroxychloroquine use increased rapidly from March to April 2020 but declined steeply in May to June and remained low for the rest of the year. The use of dexamethasone and corticosteroids increased steadily during 2020. CONCLUSIONS: Multiple drugs were used in the first few months of the covid-19 pandemic, with substantial geographical and temporal variation. Hydroxychloroquine, azithromycin, lopinavir-ritonavir, and umifenovir (in China only) were the most prescribed repurposed drugs. Antithrombotics, antibiotics, H2 receptor antagonists, and corticosteroids were often used as adjunctive treatments. Research is needed on the comparative risk and benefit of these treatments in the management of covid-19.

View details for DOI 10.1136/bmj.n1038

View details for PubMedID 33975825
Ontology-driven weak supervision for clinical entity classification in electronic health records. Nature communications Fries, J. A., Steinberg, E., Khattar, S., Fleming, S. L., Posada, J., Callahan, A., Shah, N. H. 2021; 12 (1): 2017

Abstract

In the electronic health record, using clinical notes to identify entities such as disorders and their temporality (e.g. the order of an event relative to a time index) can inform many important analyses. However, creating training data for clinical entity tasks is time consuming and sharing labeled data is challenging due to privacy concerns. The information needs of the COVID-19 pandemic highlight the need for agile methods of training machine learning models for clinical notes. We present Trove, a framework for weakly supervised entity classification using medical ontologies and expert-generated rules. Our approach, unlike hand-labeled notes, is easy to share and modify, while offering performance comparable to learning from manually labeled training data. In this work, we validate our framework on six benchmark tasks and demonstrate Trove's ability to analyze the records of patients visiting the emergency department at Stanford Health Care for COVID-19 presenting symptoms and risk factors.

View details for DOI 10.1038/s41467-021-22328-4

View details for PubMedID 33795682
ACE: the Advanced Cohort Engine for searching longitudinal patient records. Journal of the American Medical Informatics Association : JAMIA Callahan, A., Polony, V., Posada, J. D., Banda, J. M., Gombar, S., Shah, N. H. 2021

Abstract

OBJECTIVE: To propose a paradigm for a scalable time-aware clinical data search, and to describe the design, implementation and use of a search engine realizing this paradigm. MATERIALS AND METHODS: The Advanced Cohort Engine (ACE) uses a temporal query language and in-memory datastore of patient objects to provide a fast, scalable, and expressive time-aware search. ACE accepts data in the Observational Medical Outcomes Partnership Common Data Model, and is configurable to balance performance with compute cost. ACE's temporal query language supports automatic query expansion using clinical knowledge graphs. The ACE API can be used with R, Python, Java, HTTP, and a Web UI. RESULTS: ACE offers an expressive query language for complex temporal search across many clinical data types with multiple output options. ACE enables electronic phenotyping and cohort-building with subsecond response times in searching the data of millions of patients for a variety of use cases. DISCUSSION: ACE enables fast, time-aware search using a patient object-centric datastore, thereby overcoming many technical and design shortcomings of relational algebra-based querying. Integrating electronic phenotype development with cohort-building enables a variety of high-value uses for a learning health system. Tradeoffs include the need to learn a new query language and the technical setup burden. CONCLUSION: ACE is a tool that combines a unique query language for time-aware search of longitudinal patient records with a patient object datastore for rapid electronic phenotyping, cohort extraction, and exploratory data analyses.

View details for DOI 10.1093/jamia/ocab027

View details for PubMedID 33712854
Conflicting information from the Food and Drug Administration: Missed opportunity to lead standards for safe and effective medical artificial intelligence solutions. Journal of the American Medical Informatics Association : JAMIA Hernandez-Boussard, T., Lundgren, M. P., Shah, N. 2021

Abstract

The Food & Drug Administration (FDA) is considering the permanent exemption of premarket notification requirements for several Class I and II medical device products, including several artificial intelligence (AI)-driven devices. The exemption is based on the need to more quickly disseminate devices to the public, estimated cost savings, and a lack of documented adverse events reported to the FDA's database. However, this ignores emerging issues related to AI-based devices, including utility, reproducibility, and bias, that may affect not only an individual but entire populations. We urge the FDA to reinforce the messaging on safety and effectiveness regulations of AI-based Software as a Medical Device products to better promote fair AI-driven clinical decision tools and to prevent harm to the patients we serve.

View details for DOI 10.1093/jamia/ocab035

View details for PubMedID 33674865
Unraveling COVID-19: a large-scale characterization of 4.5 million COVID-19 cases using CHARYBDIS. Research square Prieto-Alhambra, D., Kostka, K., Duarte-Salles, T., Prats-Uribe, A., Sena, A., Pistillo, A., Khalid, S., Lai, L., Golozar, A., Alshammari, T. M., Dawoud, D., Nyberg, F., Wilcox, A., Andryc, A., Williams, A., Ostropolets, A., Areia, C., Jung, C. Y., Harle, C., Reich, C., Blacketer, C., Morales, D., Dorr, D. A., Burn, E., Roel, E., Tan, E. H., Minty, E., DeFalco, F., de Maeztu, G., Lipori, G., Alghoul, H., Zhu, H., Thomas, J., Bian, J., Park, J., Roldán, J. M., Posada, J., Banda, J. M., Horcajada, J. P., Kohler, J., Shah, K., Natarajan, K., Lynch, K., Liu, L., Schilling, L., Recalde, M., Spotnitz, M., Gong, M., Matheny, M., Valveny, N., Weiskopf, N., Shah, N., Alser, O., Casajust, P., Park, R. W., Schuff, R., Seager, S., DuVall, S., You, S. C., Song, S., Fernández-Bertolín, S., Fortin, S., Magoc, T., Falconer, T., Subbian, V., Huser, V., Ahmed, W. U., Carter, W., Guan, Y., Galvan, Y., He, X., Rijnbeek, P., Hripcsak, G., Ryan, P., Suchard, M. 2021

Abstract

Background: Routinely collected real world data (RWD) have great utility in aiding the novel coronavirus disease (COVID-19) pandemic response [1,2]. Here we present the international Observational Health Data Sciences and Informatics (OHDSI) [3] Characterizing Health Associated Risks, and Your Baseline Disease In SARS-COV-2 (CHARYBDIS) framework for standardisation and analysis of COVID-19 RWD. Methods: We conducted a descriptive cohort study using a federated network of data partners in the United States, Europe (the Netherlands, Spain, the UK, Germany, France and Italy) and Asia (South Korea and China). The study protocol and analytical package were released on 11th June 2020 and are iteratively updated via GitHub [4]. Findings: We identified three non-mutually exclusive cohorts of 4,537,153 individuals with a clinical COVID-19 diagnosis or positive test, 886,193 hospitalized with COVID-19, and 113,627 hospitalized with COVID-19 requiring intensive services. All comorbidities, symptoms, medications, and outcomes are described by cohort in aggregate counts, and are available in an interactive website: https://data.ohdsi.org/Covid19CharacterizationCharybdis/. Interpretation: CHARYBDIS findings provide benchmarks that contribute to our understanding of COVID-19 progression, management and evolution over time. This can enable timely assessment of real-world outcomes of preventative and therapeutic options as they are introduced in clinical practice.

View details for DOI 10.21203/rs.3.rs-279400/v1

View details for PubMedID 33688639

View details for PubMedCentralID PMC7941629
Assessment of Extractability and Accuracy of Electronic Health Record Data for Joint Implant Registries. JAMA network open Giori, N. J., Radin, J., Callahan, A., Fries, J. A., Halilaj, E., Re, C., Delp, S. L., Shah, N. H., Harris, A. H. 2021; 4 (3): e211728

Abstract

Importance: Implant registries provide valuable information on the performance of implants in a real-world setting, yet they have traditionally been expensive to establish and maintain. Electronic health records (EHRs) are widely used and may include the information needed to generate clinically meaningful reports similar to a formal implant registry. Objectives: To quantify the extractability and accuracy of registry-relevant data from the EHR and to assess the ability of these data to track trends in implant use and the durability of implants (hereafter referred to as implant survivorship), using data stored since 2000 in the EHR of the largest integrated health care system in the United States. Design, Setting, and Participants: Retrospective cohort study of a large EHR of veterans who had 45 351 total hip arthroplasty procedures in Veterans Health Administration hospitals from 2000 to 2017. Data analysis was performed from January 1, 2000, to December 31, 2017. Exposures: Total hip arthroplasty. Main Outcomes and Measures: Number of total hip arthroplasty procedures extracted from the EHR, trends in implant use, and relative survivorship of implants. Results: A total of 45 351 total hip arthroplasty procedures were identified from 2000 to 2017 with 192 805 implant parts. Data completeness improved over time. After 2014, 85% of prosthetic heads, 91% of shells, 81% of stems, and 85% of liners used in the Veterans Health Administration health care system were identified by part number. Revision burden and trends in metal vs ceramic prosthetic femoral head use were found to reflect data from the American Joint Replacement Registry. Recalled implants were obvious negative outliers in implant survivorship using Kaplan-Meier curves. Conclusions and Relevance: Although loss to follow-up remains a challenge that requires additional attention to improve the quantitative nature of calculated implant survivorship, we conclude that data collected during routine clinical care and stored in the EHR of a large health system over 18 years were sufficient to provide clinically meaningful data on trends in implant use and to identify poor implants that were subsequently recalled. This automated approach was low cost and had no reporting burden. This low-cost, low-overhead method to assess implant use and performance within a large health care setting may be useful to internal quality assurance programs and, on a larger scale, to postmarket surveillance of implant performance.

View details for DOI 10.1001/jamanetworkopen.2021.1728

View details for PubMedID 33720372
Occurrence and Timing of Subsequent Severe Acute Respiratory Syndrome Coronavirus 2 Reverse-transcription Polymerase Chain Reaction Positivity Among Initially Negative Patients. Clinical infectious diseases : an official publication of the Infectious Diseases Society of America Long, D. R., Gombar, S., Hogan, C. A., Greninger, A. L., O'Reilly-Shah, V., Bryson-Cahn, C., Stevens, B., Rustagi, A., Jerome, K. R., Kong, C. S., Zehnder, J., Shah, N. H., Weiss, N. S., Pinsky, B. A., Sunshine, J. E. 2021; 72 (2): 323-326

Abstract

Using data for 20 912 patients from 2 large academic health systems, we analyzed the frequency of severe acute respiratory syndrome coronavirus 2 reverse-transcription polymerase chain reaction test discordance among individuals initially testing negative by nasopharyngeal swab who were retested on clinical grounds within 7 days. The frequency of subsequent positivity within this window was 3.5% and was similar across institutions.

View details for DOI 10.1093/cid/ciaa722

View details for PubMedID 33501950
Treatment and Monitoring Variability in US Metastatic Breast Cancer Care. JCO clinical cancer informatics Caswell-Jin, J. L., Callahan, A., Purington, N., Han, S. S., Itakura, H., John, E. M., Blayney, D. W., Sledge, G. W., Shah, N. H., Kurian, A. W. 2021; 5: 600-614

Abstract

Treatment and monitoring options for patients with metastatic breast cancer (MBC) are increasing, but little is known about variability in care. We sought to improve understanding of MBC care and its correlates by analyzing real-world claims data using a search engine with a novel query language to enable temporal electronic phenotyping. Using the Advanced Cohort Engine, we identified 6,180 women who met criteria for having estrogen receptor-positive, human epidermal growth factor receptor 2-negative MBC from IBM MarketScan US insurance claims (2007-2014). We characterized treatment, monitoring, and hospice usage, along with clinical and nonclinical factors affecting care. We observed wide variability in treatment modality and monitoring across patients and geography. Most women received first-recorded therapy with endocrine (67%) versus chemotherapy, underwent more computed tomography (CT) (76%) than positron emission tomography-CT, and were monitored using tumor markers (58%). Nearly half (46%) met criteria for aggressive disease, which were associated with receiving chemotherapy first, monitoring primarily with CT, and more frequent imaging. Older age was associated with endocrine therapy first, less frequent imaging, and less use of tumor markers. After controlling for clinical factors, care strategies varied significantly by nonclinical factors (median regional income with first-recorded therapy and imaging type, geographic region with these and with imaging frequency and use of tumor markers; P < .0001). Variability in US MBC care is explained by patient and disease factors and by nonclinical factors such as geographic region, suggesting that treatment decisions are influenced by local practice patterns and/or resources. A search engine designed to express complex electronic phenotypes from longitudinal patient records enables the identification of variability in patient care, helping to define disparities and areas for improvement.

View details for DOI 10.1200/CCI.21.00031

View details for PubMedID 34043432
Improving Hospital Readmission Prediction using Individualized Utility Analysis. Journal of biomedical informatics Ko, M., Chen, E., Agrawal, A., Rajpurkar, P., Avati, A., Ng, A., Basu, S., Shah, N. H. 2021: 103826

Abstract

Machine learning (ML) models for allocating readmission-mitigating interventions are typically selected according to their discriminative ability, which may not necessarily translate into utility in allocation of resources. Our objective was to determine whether ML models for allocating readmission-mitigating interventions have different usefulness based on their overall utility and discriminative ability. We conducted a retrospective utility analysis of ML models using claims data acquired from the Optum Clinformatics Data Mart, including 513,495 commercially-insured inpatients (mean [SD] age 69 [19] years; 294,895 [57%] female) over the period January 2016 through January 2017 from all 50 states with mean 90-day cost of $11,552. Utility analysis estimates the cost, in dollars, of allocating interventions for lowering readmission risk based on the reduction in the 90-day cost. Allocating readmission-mitigating interventions based on a GBDT model trained to predict readmissions achieved an estimated utility gain of $104 per patient, and an AUC of 0.76 (95% CI 0.76, 0.77); allocating interventions based on a model trained to predict cost as a proxy achieved a higher utility of $175.94 per patient, and an AUC of 0.62 (95% CI 0.61, 0.62). A hybrid model combining both intervention strategies is comparable with the best models on either metric. Estimated utility varies by intervention cost and efficacy, with each model performing the best under different intervention settings. We demonstrate that machine learning models may be ranked differently based on overall utility and discriminative ability. Machine learning models for allocation of limited health resources should consider directly optimizing for utility.

View details for DOI 10.1016/j.jbi.2021.103826

View details for PubMedID 34087428
An open repository of real-time COVID-19 indicators. Proceedings of the National Academy of Sciences of the United States of America Reinhart, A., Brooks, L., Jahja, M., Rumack, A., Tang, J., Agrawal, S., Al Saeed, W., Arnold, T., Basu, A., Bien, J., Cabrera, Á. A., Chin, A., Chua, E. J., Clark, B., Colquhoun, S., DeFries, N., Farrow, D. C., Forlizzi, J., Grabman, J., Gratzl, S., Green, A., Haff, G., Han, R., Harwood, K., Hu, A. J., Hyde, R., Hyun, S., Joshi, A., Kim, J., Kuznetsov, A., La Motte-Kerr, W., Lee, Y. J., Lee, K., Lipton, Z. C., Liu, M. X., Mackey, L., Mazaitis, K., McDonald, D. J., McGuinness, P., Narasimhan, B., O'Brien, M. P., Oliveira, N. L., Patil, P., Perer, A., Politsch, C. A., Rajanala, S., Rucker, D., Scott, C., Shah, N. H., Shankar, V., Sharpnack, J., Shemetov, D., Simon, N., Smith, B. Y., Srivastava, V., Tan, S., Tibshirani, R., Tuzhilina, E., Van Nortwick, A. K., Ventura, V., Wasserman, L., Weaver, B., Weiss, J. C., Whitman, S., Williams, K., Rosenfeld, R., Tibshirani, R. J. 2021; 118 (51)

Abstract

The COVID-19 pandemic presented enormous data challenges in the United States. Policy makers, epidemiological modelers, and health researchers all require up-to-date data on the pandemic and relevant public behavior, ideally at fine spatial and temporal resolution. The COVIDcast API is our attempt to fill this need: Operational since April 2020, it provides open access to both traditional public health surveillance signals (cases, deaths, and hospitalizations) and many auxiliary indicators of COVID-19 activity, such as signals extracted from deidentified medical claims data, massive online surveys, cell phone mobility data, and internet search trends. These are available at a fine geographic resolution (mostly at the county level) and are updated daily. The COVIDcast API also tracks all revisions to historical data, allowing modelers to account for the frequent revisions and backfill that are common for many public health data sources. All of the data are available in a common format through the API and accompanying R and Python software packages. This paper describes the data sources and signals, and provides examples demonstrating that the auxiliary signals in the COVIDcast API present information relevant to tracking COVID activity, augmenting traditional public health reporting and empowering research and decision-making.

View details for DOI 10.1073/pnas.2111452118

View details for PubMedID 34903654
SARS-CoV-2 infection and COVID-19 severity in individuals with prior seasonal coronavirus infection. Diagnostic microbiology and infectious disease Gombar, S., Bergquist, T., Pejaver, V., Hammarlund, N. E., Murugesan, K., Mooney, S., Shah, N., Pinsky, B. A., Banaei, N. 2021; 100 (2): 115338

Abstract

We show that individuals with documented history of seasonal coronavirus have a similar SARS-CoV-2 infection rate and COVID-19 severity as those with no prior history of seasonal coronavirus. Our findings suggest prior infection with seasonal coronavirus does not provide immunity to subsequent infection with SARS-CoV-2.

View details for DOI 10.1016/j.diagmicrobio.2021.115338

View details for PubMedID 33610036
COVID-19 in patients with autoimmune diseases: characteristics and outcomes in a multinational network of cohorts across three countries. Rheumatology (Oxford, England) Tan, E. H., Sena, A. G., Prats-Uribe, A., You, S. C., Ahmed, W. U., Kostka, K., Reich, C., Duvall, S. L., Lynch, K. E., Matheny, M. E., Duarte-Salles, T., Bertolin, S. F., Hripcsak, G., Natarajan, K., Falconer, T., Spotnitz, M., Ostropolets, A., Blacketer, C., Alshammari, T. M., Alghoul, H., Alser, O., Lane, J. C., Dawoud, D. M., Shah, K., Yang, Y., Zhang, L., Areia, C., Golozar, A., Recalde, M., Casajust, P., Jonnagaddala, J., Subbian, V., Vizcaya, D., Lai, L. Y., Nyberg, F., Morales, D. R., Posada, J. D., Shah, N. H., Gong, M., Vivekanantham, A., Abend, A., Minty, E. P., Suchard, M., Rijnbeek, P., Ryan, P. B., Prieto-Alhambra, D. 2021

Abstract

Patients with autoimmune diseases were advised to shield to avoid COVID-19, but information on their prognosis is lacking. We characterised 30-day outcomes and mortality after hospitalisation with COVID-19 among patients with prevalent autoimmune diseases, and compared outcomes after hospital admissions among similar patients with seasonal influenza. A multinational network cohort study was conducted using electronic health records data from Columbia University Irving Medical Center (CUIMC) (United States [US]), Optum (US), Department of Veterans Affairs (VA) (US), Information System for Research in Primary Care-Hospitalisation Linked Data (SIDIAP-H) (Spain), and claims data from IQVIA Open Claims (US) and Health Insurance and Review Assessment (HIRA) (South Korea). All patients with prevalent autoimmune diseases, diagnosed and/or hospitalised between January and June 2020 with COVID-19, and similar patients hospitalised with influenza in 2017-2018 were included. Outcomes were death and complications within 30 days of hospitalisation. We studied 133 589 patients diagnosed and 48 418 hospitalised with COVID-19 with prevalent autoimmune diseases. Most patients were female, aged ≥50 years with previous comorbidities. The prevalence of hypertension (45.5-93.2%), chronic kidney disease (14.0-52.7%) and heart disease (29.0-83.8%) was higher in hospitalised vs diagnosed patients with COVID-19. Compared with 70 660 hospitalised with influenza, those admitted with COVID-19 had more respiratory complications including pneumonia and acute respiratory distress syndrome, and higher 30-day mortality (2.2% to 4.3% vs 6.3% to 24.6%). Compared with influenza, COVID-19 is a more severe disease, leading to more complications and higher mortality.

View details for DOI 10.1093/rheumatology/keab250

View details for PubMedID 33725121
A framework for making predictive models useful in practice. Journal of the American Medical Informatics Association : JAMIA Jung, K., Kashyap, S., Avati, A., Harman, S., Shaw, H., Li, R., Smith, M., Shum, K., Javitz, J., Vetteth, Y., Seto, T., Bagley, S. C., Shah, N. H. 2020

Abstract

OBJECTIVE: To analyze the impact of factors in healthcare delivery on the net benefit of triggering an Advanced Care Planning (ACP) workflow based on predictions of 12-month mortality. MATERIALS AND METHODS: We built a predictive model of 12-month mortality using electronic health record data and evaluated the impact of healthcare delivery factors on the net benefit of triggering an ACP workflow based on the models' predictions. Factors included nonclinical reasons that make ACP inappropriate: limited capacity for ACP, inability to follow up due to patient discharge, and availability of an outpatient workflow to follow up on missed cases. We also quantified the relative benefits of increasing capacity for inpatient ACP versus outpatient ACP. RESULTS: Work capacity constraints and discharge timing can significantly reduce the net benefit of triggering the ACP workflow based on a model's predictions. However, the reduction can be mitigated by creating an outpatient ACP workflow. Given limited resources to either add capacity for inpatient ACP versus developing outpatient ACP capability, the latter is likely to provide more benefit to patient care. DISCUSSION: The benefit of using a predictive model for identifying patients for interventions is highly dependent on the capacity to execute the workflow triggered by the model. We provide a framework for quantifying the impact of healthcare delivery factors and work capacity constraints on achieved benefit. CONCLUSION: An analysis of the sensitivity of the net benefit realized by a predictive model triggered clinical workflow to various healthcare delivery factors is necessary for making predictive models useful in practice.

View details for DOI 10.1093/jamia/ocaa318

View details for PubMedID 33355350
Prediction of Major Depressive Disorder Following Beta-Blocker Therapy in Patients with Cardiovascular Diseases. Journal of personalized medicine Jin, S., Kostka, K., Posada, J. D., Kim, Y., Seo, S. I., Lee, D. Y., Shah, N. H., Roh, S., Lim, Y., Chae, S. G., Jin, U., Son, S. J., Reich, C., Rijnbeek, P. R., Park, R. W., You, S. C. 2020; 10 (4)

Abstract

Incident depression has been reported to be associated with poor prognosis in patients with cardiovascular disease (CVD), which might be associated with beta-blocker therapy. Because early detection and intervention can alleviate the severity of depression, we aimed to develop a machine learning (ML) model predicting the onset of major depressive disorder (MDD). A model based on L1 regularized logistic regression was trained against the South Korean nationwide administrative claims database to identify risk factors for incident MDD after beta-blocker therapy in patients with CVD. We identified 50,397 patients initiating beta-blockers for CVD, with 774 patients developing MDD within 365 days after initiating beta-blocker therapy. An area under the receiver operating characteristic curve (AUC) of 0.74 was achieved. A history of non-selective beta-blockers and factors related to anxiety disorder, sleeping problems, and other chronic diseases were the strongest predictors. AUCs of 0.62-0.71 were achieved in the external validation conducted on six independent electronic health records and claims databases in the USA and South Korea. In conclusion, an ML model that identifies patients at high risk for incident MDD was developed. Application of ML to identify patients susceptible to adverse events of treatment may serve as an important approach for personalized medicine.

View details for DOI 10.3390/jpm10040288

View details for PubMedID 33352870
Use of dialysis, tracheostomy, and extracorporeal membrane oxygenation among 240,392 patients hospitalized with COVID-19 in the United States. medRxiv : the preprint server for health sciences Burn, E., Sena, A. G., Prats-Uribe, A., Spotnitz, M., DuVall, S., Lynch, K. E., Matheny, M. E., Nyberg, F., Ahmed, W. U., Alser, O., Alghoul, H., Alshammari, T., Zhang, L., Casajust, P., Areia, C., Shah, K., Reich, C., Blacketer, C., Andryc, A., Fortin, S., Natarajan, K., Gong, M., Golozar, A., Morales, D., Rijnbeek, P., Subbian, V., Roel, E., Recalde, M., Lane, J. C., Vizcaya, D., Posada, J. D., Shah, N. H., Jonnagaddala, J., Lai, L. Y., Avilés-Jurado, F. X., Hripcsak, G., Suchard, M. A., Ranzani, O. T., Ryan, P., Prieto-Alhambra, D., Kostka, K., Duarte-Salles, T. 2020

Abstract

To estimate the proportion of patients hospitalized with COVID-19 who undergo dialysis, tracheostomy, and extracorporeal membrane oxygenation (ECMO). A network cohort study. Six databases from the United States containing routinely-collected patient data: HealthVerity, Premier, IQVIA Open Claims, Optum EHR, Optum SES, and VA-OMOP. Patients hospitalized with a clinical diagnosis or a positive test result for COVID-19. Dialysis, tracheostomy, and ECMO. 240,392 patients hospitalized with COVID-19 were included (22,887 from HealthVerity, 139,971 from IQVIA Open Claims, 29,061 from Optum EHR, 4,336 from Optum SES, 36,019 from Premier, and 8,118 from VA-OMOP). Across the six databases, 9,703 (4.04% [95% CI: 3.96% to 4.11%]) patients received dialysis, 1,681 (0.70% [95% CI: 0.67% to 0.73%]) had a tracheostomy, and 398 (0.17% [95% CI: 0.15% to 0.18%]) patients underwent ECMO over the 30 days following hospitalization. Use of ECMO was generally concentrated among patients who were younger, male, and with fewer comorbidities except for obesity. Tracheostomy was used for a similar proportion of patients regardless of age, sex, or comorbidity. While dialysis was used for a similar proportion among younger and older patients, it was more frequent among male patients and among those with chronic kidney disease. Use of dialysis among those hospitalized with COVID-19 is high at around 4%. Although less than one percent of patients undergo tracheostomy and ECMO, the absolute number of patients who have undergone these interventions is substantial and can be expected to continue to grow given the continuing spread of COVID-19.

View details for DOI 10.1101/2020.11.25.20229088

View details for PubMedID 33269356

View details for PubMedCentralID PMC7709172
An empirical characterization of fair machine learning for clinical risk prediction. Journal of biomedical informatics Pfohl, S. R., Foryciarz, A., Shah, N. H. 2020: 103621

Abstract

The use of machine learning to guide clinical decision making has the potential to worsen existing health disparities. Several recent works frame the problem as that of algorithmic fairness, a framework that has attracted considerable attention and criticism. However, the appropriateness of this framework is unclear due to both ethical as well as technical considerations, the latter of which include trade-offs between measures of fairness and model performance that are not well-understood for predictive models of clinical outcomes. To inform the ongoing debate, we conduct an empirical study to characterize the impact of penalizing group fairness violations on an array of measures of model performance and group fairness. We repeat the analysis across multiple observational healthcare databases, clinical outcomes, and sensitive attributes. We find that procedures that penalize differences between the distributions of predictions across groups induce nearly-universal degradation of multiple performance metrics within groups. On examining the secondary impact of these procedures, we observe heterogeneity of the effect of these procedures on measures of fairness in calibration and ranking across experimental conditions. Beyond the reported trade-offs, we emphasize that analyses of algorithmic fairness in healthcare lack the contextual grounding and causal awareness necessary to reason about the mechanisms that lead to health disparities, as well as about the potential of algorithmic fairness methods to counteract those mechanisms. In light of these limitations, we encourage researchers building predictive models for clinical use to step outside the algorithmic fairness frame and engage critically with the broader sociotechnical context surrounding the use of machine learning in healthcare.

View details for DOI 10.1016/j.jbi.2020.103621

View details for PubMedID 33220494
Development and utility assessment of a machine learning bloodstream infection classifier in pediatric patients receiving cancer treatments. BMC cancer Sung, L., Corbin, C., Steinberg, E., Vettese, E., Campigotto, A., Lecce, L., Tomlinson, G. A., Shah, N. 2020; 20 (1): 1103

Abstract

BACKGROUND: Objectives were to build a machine learning algorithm to identify bloodstream infection (BSI) among pediatric patients with cancer and hematopoietic stem cell transplantation (HSCT) recipients, and to compare this approach with presence of neutropenia to identify BSI. METHODS: We included patients 0-18 years of age at cancer diagnosis or HSCT between January 2009 and November 2018. Eligible blood cultures were those with no previous blood culture (regardless of result) within 7 days. The primary outcome was BSI. Four machine learning algorithms were used: elastic net, support vector machine and two implementations of gradient boosting machine (GBM and XGBoost). Model training and evaluation were performed using temporally disjoint training (60%), validation (20%) and test (20%) sets. The best model was compared to neutropenia alone in the test set. RESULTS: Of 11,183 eligible blood cultures, 624 (5.6%) were positive. The best model in the validation set was GBM, which achieved an area-under-the-receiver-operator-curve (AUROC) of 0.74 in the test set. Among the 2236 in the test set, the number of false positives and specificity of GBM vs. neutropenia were 508 vs. 592 and 0.76 vs. 0.72 respectively. Among 139 test set BSIs, six (4.3%) non-neutropenic patients were identified by GBM. All received antibiotics prior to culture result availability. CONCLUSIONS: We developed a machine learning algorithm to classify BSI. GBM achieved an AUROC of 0.74 and identified 4.3% additional true cases in the test set. The machine learning algorithm did not perform substantially better than using presence of neutropenia alone to predict BSI.

View details for DOI 10.1186/s12885-020-07618-2

View details for PubMedID 33187484
Baseline characteristics, management, and outcomes of 55,270 children and adolescents diagnosed with COVID-19 and 1,952,693 with influenza in France, Germany, Spain, South Korea and the United States: an international network cohort study. medRxiv : the preprint server for health sciences Duarte-Salles, T., Vizcaya, D., Pistillo, A., Casajust, P., Sena, A. G., Lai, L. Y., Prats-Uribe, A., Ahmed, W. U., Alshammari, T. M., Alghoul, H., Alser, O., Burn, E., You, S. C., Areia, C., Blacketer, C., DuVall, S., Falconer, T., Fernandez-Bertolin, S., Fortin, S., Golozar, A., Gong, M., Tan, E. H., Huser, V., Iveli, P., Morales, D. R., Nyberg, F., Posada, J. D., Recalde, M., Roel, E., Schilling, L. M., Shah, N. H., Shah, K., Suchard, M. A., Zhang, L., Zhang, Y., Williams, A. E., Reich, C. G., Hripcsak, G., Rijnbeek, P., Ryan, P., Kostka, K., Prieto-Alhambra, D. 2020

Abstract

Objectives To characterize the demographics, comorbidities, symptoms, in-hospital treatments, and health outcomes among children/adolescents diagnosed or hospitalized with COVID-19. Secondly, to describe health outcomes amongst children/adolescents diagnosed with previous seasonal influenza. Design International network cohort. Setting Real-world data from European primary care records (France/Germany/Spain), South Korean claims and US claims and hospital databases. Participants Diagnosed and/or hospitalized children/adolescents with COVID-19 at age <18 between January and June 2020; diagnosed with influenza in 2017-2018. Main outcome measures Baseline demographics and comorbidities, symptoms, 30-day in-hospital treatments and outcomes including hospitalization, pneumonia, acute respiratory distress syndrome (ARDS), multi-system inflammatory syndrome (MIS-C), and death. Results A total of 55,270 children/adolescents diagnosed and 3,693 hospitalized with COVID-19 and 1,952,693 diagnosed with influenza were studied. Comorbidities including neurodevelopmental disorders, heart disease, and cancer were all more common among those hospitalized vs diagnosed with COVID-19. The most common COVID-19 symptom was fever. Dyspnea, bronchiolitis, anosmia and gastrointestinal symptoms were more common in COVID-19 than influenza. In-hospital treatments for COVID-19 included repurposed medications (<10%), and adjunctive therapies: systemic corticosteroids (6.8% to 37.6%), famotidine (9.0% to 28.1%), and antithrombotics such as aspirin (2.0% to 21.4%), heparin (2.2% to 18.1%), and enoxaparin (2.8% to 14.8%). Hospitalization was observed in 0.3% to 1.3% of the COVID-19 diagnosed cohort, with undetectable (N<5 per database) 30-day fatality. Thirty-day outcomes including pneumonia, ARDS, and MIS-C were more frequent in COVID-19 than influenza. 
Conclusions Despite negligible fatality, complications including pneumonia, ARDS and MIS-C were more frequent in children/adolescents with COVID-19 than with influenza. Dyspnea, anosmia and gastrointestinal symptoms could help differential diagnosis. A wide range of medications were used for the inpatient management of pediatric COVID-19.

View details for DOI 10.1101/2020.10.29.20222083

View details for PubMedID 33140074

View details for PubMedCentralID PMC7605587
Baseline phenotype and 30-day outcomes of people tested for COVID-19: an international network cohort including >3.32 million people tested with real-time PCR and >219,000 tested positive for SARS-CoV-2 in South Korea, Spain and the United States. medRxiv : the preprint server for health sciences Golozar, A., Lai, L. Y., Sena, A. G., Vizcaya, D., Schilling, L. M., Huser, V., Nyberg, F., Duvall, S. L., Morales, D. R., Alshammari, T. M., Abedtash, H., Ahmed, W. U., Alser, O., Alghoul, H., Zhang, Y., Gong, M., Guan, Y., Areia, C., Jonnagaddala, J., Shah, K., Lane, J. C., Prats-Uribe, A., Posada, J. D., Shah, N. H., Subbian, V., Zhang, L., Abrahão, M. T., Rijnbeek, P. R., You, S. C., Casajust, P., Roel, E., Recalde, M., Fernández-Bertolín, S., Andryc, A., Thomas, J. A., Wilcox, A. B., Fortin, S., Blacketer, C., DeFalco, F., Natarajan, K., Falconer, T., Spotnitz, M., Ostropolets, A., Hripcsak, G., Suchard, M., Lynch, K. E., Matheny, M. E., Williams, A., Reich, C., Duarte-Salles, T., Kostka, K., Ryan, P. B., Prieto-Alhambra, D. 2020

Abstract

Early identification of symptoms and comorbidities most predictive of COVID-19 is critical to identify infection, guide policies to effectively contain the pandemic, and improve health systems' response. Here, we characterised socio-demographics and comorbidity in 3,316,107 persons tested and 219,072 persons tested positive for SARS-CoV-2 since January 2020, and their key health outcomes in the month following the first positive test. Routine care data from primary care electronic health records (EHR) from Spain, hospital EHR from the United States (US), and claims data from South Korea and the US were used. The majority of study participants were women aged 18-65 years old. Positive/tested ratio varied greatly geographically (2.2:100 to 31.2:100) and over time (from 50:100 in February-April to 6.8:100 in May-June). Fever, cough and dyspnoea were the most common symptoms at presentation. Between 4%-38% required admission and 1-10.5% died within a month from their first positive test. Observed disparity in testing practices led to variable baseline characteristics and outcomes, both nationally (US) and internationally. Our findings highlight the importance of large scale characterization of COVID-19 international cohorts to inform planning and resource allocation including testing as countries face a second wave.

View details for DOI 10.1101/2020.10.25.20218875

View details for PubMedID 33140068

View details for PubMedCentralID PMC7605581
Deep phenotyping of 34,128 adult patients hospitalised with COVID-19 in an international network study. Nature communications Burn, E., You, S. C., Sena, A. G., Kostka, K., Abedtash, H., Abrahao, M. T., Alberga, A., Alghoul, H., Alser, O., Alshammari, T. M., Aragon, M., Areia, C., Banda, J. M., Cho, J., Culhane, A. C., Davydov, A., DeFalco, F. J., Duarte-Salles, T., DuVall, S., Falconer, T., Fernandez-Bertolin, S., Gao, W., Golozar, A., Hardin, J., Hripcsak, G., Huser, V., Jeon, H., Jing, Y., Jung, C. Y., Kaas-Hansen, B. S., Kaduk, D., Kent, S., Kim, Y., Kolovos, S., Lane, J. C., Lee, H., Lynch, K. E., Makadia, R., Matheny, M. E., Mehta, P. P., Morales, D. R., Natarajan, K., Nyberg, F., Ostropolets, A., Park, R. W., Park, J., Posada, J. D., Prats-Uribe, A., Rao, G., Reich, C., Rho, Y., Rijnbeek, P., Schilling, L. M., Schuemie, M., Shah, N. H., Shoaibi, A., Song, S., Spotnitz, M., Suchard, M. A., Swerdel, J. N., Vizcaya, D., Volpe, S., Wen, H., Williams, A. E., Yimer, B. B., Zhang, L., Zhuk, O., Prieto-Alhambra, D., Ryan, P. 2020; 11 (1): 5009

Abstract

Comorbid conditions appear to be common among individuals hospitalised with coronavirus disease 2019 (COVID-19) but estimates of prevalence vary and little is known about the prior medication use of patients. Here, we describe the characteristics of adults hospitalised with COVID-19 and compare them with influenza patients. We include 34,128 (US: 8362, South Korea: 7341, Spain: 18,425) COVID-19 patients, summarising between 4811 and 11,643 unique aggregate characteristics. COVID-19 patients have been majority male in the US and Spain, but predominantly female in South Korea. Age profiles vary across data sources. Compared to 84,585 individuals hospitalised with influenza in 2014-19, COVID-19 patients have more typically been male, younger, and with fewer comorbidities and lower medication use. While protecting groups vulnerable to influenza is likely a useful starting point in the response to COVID-19, strategies will likely need to be broadened to reflect the particular characteristics of individuals being hospitalised with COVID-19.

View details for DOI 10.1038/s41467-020-18849-z

View details for PubMedID 33024121
Developing a delivery science for artificial intelligence in healthcare. NPJ digital medicine Li, R. C., Asch, S. M., Shah, N. H. 2020; 3 (1): 107

View details for DOI 10.1038/s41746-020-00318-y

View details for PubMedID 33597602
SARS-CoV-2 Antibody Responses Correlate with Resolution of RNAemia But Are Short-Lived in Patients with Mild Illness. medRxiv : the preprint server for health sciences Röltgen, K., Wirz, O. F., Stevens, B. A., Powell, A. E., Hogan, C. A., Najeeb, J., Hunter, M., Sahoo, M. K., Huang, C., Yamamoto, F., Manalac, J., Otrelo-Cardoso, A. R., Pham, T. D., Rustagi, A., Rogers, A. J., Shah, N. H., Blish, C. A., Cochran, J. R., Nadeau, K. C., Jardetzky, T. S., Zehnder, J. L., Wang, T. T., Kim, P. S., Gombar, S., Tibshirani, R., Pinsky, B. A., Boyd, S. D. 2020

Abstract

SARS-CoV-2-specific antibodies, particularly those preventing viral spike receptor binding domain (RBD) interaction with host angiotensin-converting enzyme 2 (ACE2) receptor, could offer protective immunity, and may affect clinical outcomes of COVID-19 patients. We analyzed 625 serial plasma samples from 40 hospitalized COVID-19 patients and 170 SARS-CoV-2-infected outpatients and asymptomatic individuals. Severely ill patients developed significantly higher SARS-CoV-2-specific antibody responses than outpatients and asymptomatic individuals. The development of plasma antibodies was correlated with decreases in viral RNAemia, consistent with potential humoral immune clearance of virus. Using a novel competition ELISA, we detected antibodies blocking RBD-ACE2 interactions in 68% of inpatients and 40% of outpatients tested. Cross-reactive antibodies recognizing SARS-CoV RBD were found almost exclusively in hospitalized patients. Outpatient and asymptomatic individuals' serological responses to SARS-CoV-2 decreased within 2 months, suggesting that humoral protection may be short-lived.

View details for DOI 10.1101/2020.08.15.20175794

View details for PubMedID 32839786

View details for PubMedCentralID PMC7444305
Artificial Intelligence and Suicide Prevention: A Systematic Review of Machine Learning Investigations. International journal of environmental research and public health Bernert, R. A., Hilberg, A. M., Melia, R., Kim, J. P., Shah, N. H., Abnousi, F. 2020; 17 (16)

Abstract

Suicide is a leading cause of death that defies prediction and challenges prevention efforts worldwide. Artificial intelligence (AI) and machine learning (ML) have emerged as a means of investigating large datasets to enhance risk detection. A systematic review of ML investigations evaluating suicidal behaviors was conducted using PubMed/MEDLINE, PsychInfo, Web-of-Science, and EMBASE, employing search strings and MeSH terms relevant to suicide and AI. Databases were supplemented by hand-search techniques and Google Scholar. Inclusion criteria: (1) journal article, available in English, (2) original investigation, (3) employment of AI/ML, (4) evaluation of a suicide risk outcome. N = 594 records were identified based on abstract search, and 25 hand-searched reports. N = 461 reports remained after duplicates were removed, n = 316 were excluded after abstract screening. Of n = 149 full-text articles assessed for eligibility, n = 87 were included for quantitative synthesis, grouped according to suicide behavior outcome. Reports varied widely in methodology and outcomes. Results suggest high levels of risk classification accuracy (>90%) and Area Under the Curve (AUC) in the prediction of suicidal behaviors. We report key findings and central limitations in the use of AI/ML frameworks to guide additional research, which hold the potential to impact suicide on broad scale.

View details for DOI 10.3390/ijerph17165929

View details for PubMedID 32824149
Trove: Ontology-driven weak supervision for medical entity classification. ArXiv Fries, J. A., Steinberg, E., Khattar, S., Fleming, S. L., Posada, J., Callahan, A., Shah, N. H. 2020

Abstract

Recognizing named entities (NER) and their associated attributes like negation are core tasks in natural language processing. However, manually labeling data for entity tasks is time consuming and expensive, creating barriers to using machine learning in new medical applications. Weakly supervised learning, which automatically builds imperfect training sets from low cost, less accurate labeling rules, offers a potential solution. Medical ontologies are compelling sources for generating labels, however combining multiple ontologies without ground truth data creates challenges due to label noise introduced by conflicting entity definitions. Key questions remain on the extent to which weakly supervised entity classification can be automated using ontologies, or how much additional task-specific rule engineering is required for state-of-the-art performance. Also unclear is how pre-trained language models, such as BioBERT, improve the ability to generalize from imperfectly labeled data. We present Trove, a framework for weakly supervised entity classification using medical ontologies. We report state-of-the-art, weakly supervised performance on two NER benchmark datasets and establish new baselines for two entity classification tasks in clinical text. We perform within an average of 3.5 F1 points (4.2%) of NER classifiers trained with hand-labeled data. Automatically learning label source accuracies to correct for label noise provided an average improvement of 3.9 F1 points. BioBERT provided an average improvement of 0.9 F1 points. We measure the impact of combining large numbers of ontologies and present a case study on rapidly building classifiers for COVID-19 clinical tasks. Our framework demonstrates how a wide range of medical entity classifiers can be quickly constructed using weak supervision and without requiring manually-labeled training data.

View details for PubMedID 32793768

View details for PubMedCentralID PMC7418750
Estimating the efficacy of symptom-based screening for COVID-19. NPJ digital medicine Callahan, A., Steinberg, E., Fries, J. A., Gombar, S., Patel, B., Corbin, C. K., Shah, N. H. 2020; 3 (1): 95

Abstract

There is substantial interest in using presenting symptoms to prioritize testing for COVID-19 and establish symptom-based surveillance. However, little is currently known about the specificity of COVID-19 symptoms. To assess the feasibility of symptom-based screening for COVID-19, we used data from tests for common respiratory viruses and SARS-CoV-2 in our health system to measure the ability to correctly classify virus test results based on presenting symptoms. Based on these results, symptom-based screening may not be an effective strategy to identify individuals who should be tested for SARS-CoV-2 infection or to obtain a leading indicator of new COVID-19 cases.

View details for DOI 10.1038/s41746-020-0300-0

View details for PubMedID 33597700
Comparative safety and effectiveness of alendronate versus raloxifene in women with osteoporosis. Scientific reports Kim, Y., Tian, Y., Yang, J., Huser, V., Jin, P., Lambert, C. G., Park, H., You, S. C., Park, R. W., Rijnbeek, P. R., Van Zandt, M., Reich, C., Vashisht, R., Wu, Y., Duke, J., Hripcsak, G., Madigan, D., Shah, N. H., Ryan, P. B., Schuemie, M. J., Suchard, M. A. 2020; 10 (1): 11115

Abstract

Alendronate and raloxifene are among the most popular anti-osteoporosis medications. However, there is a lack of head-to-head comparative effectiveness studies comparing the two treatments. We conducted a retrospective large-scale multicenter study encompassing over 300 million patients across nine databases encoded in the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM). The primary outcome was the incidence of osteoporotic hip fracture, while secondary outcomes were vertebral fracture, atypical femoral fracture (AFF), osteonecrosis of the jaw (ONJ), and esophageal cancer. We used propensity score trimming and stratification based on an expansive propensity score model with all pre-treatment patient characteristics. We accounted for unmeasured confounding using negative control outcomes to estimate and adjust for residual systematic bias in each data source. We identified 283,586 alendronate patients and 40,463 raloxifene patients. There were 7.48 hip fracture, 8.18 vertebral fracture, 1.14 AFF, 0.21 esophageal cancer and 0.09 ONJ events per 1,000 person-years in the alendronate cohort and 6.62, 7.36, 0.69, 0.22 and 0.06 events per 1,000 person-years, respectively, in the raloxifene cohort. Alendronate and raloxifene have a similar hip fracture risk (hazard ratio [HR] 1.03, 95% confidence interval [CI] 0.94-1.13), but alendronate users are more likely to have vertebral fractures (HR 1.07, 95% CI 1.01-1.14). Alendronate has higher risk for AFF (HR 1.51, 95% CI 1.23-1.84) but similar risk for esophageal cancer (HR 0.95, 95% CI 0.53-1.70), and ONJ (HR 1.62, 95% CI 0.78-3.34). We demonstrated substantial control of measured confounding by propensity score adjustment, and minimal residual systematic bias through negative control experiments, lending credibility to our effect estimates. Raloxifene is as effective as alendronate and may remain an option in the prevention of osteoporotic fracture.

View details for DOI 10.1038/s41598-020-68037-8

View details for PubMedID 32632237
Toward Automated Detection of Peripheral Artery Disease Using Electronic Health Records Vy Thuy Ho, Leeper, N., Shah, N., Ross, E. MOSBY-ELSEVIER. 2020: E41

View details for Web of Science ID 000544100700060
MINIMAR (MINimum Information for Medical AI Reporting): Developing reporting standards for artificial intelligence in health care. Journal of the American Medical Informatics Association : JAMIA Hernandez-Boussard, T., Bozkurt, S., Ioannidis, J. P., Shah, N. H. 2020

Abstract

The rise of digital data and computing power have contributed to significant advancements in artificial intelligence (AI), leading to the use of classification and prediction models in health care to enhance clinical decision-making for diagnosis, treatment and prognosis. However, such advances are limited by the lack of reporting standards for the data used to develop those models, the model architecture, and the model evaluation and validation processes. Here, we present MINIMAR (MINimum Information for Medical AI Reporting), a proposal describing the minimum information necessary to understand intended predictions, target populations, and hidden biases, and the ability to generalize these emerging technologies. We call for a standard to accurately and responsibly report on AI in health care. This will facilitate the design and implementation of these models and promote the development and use of associated clinical decision support tools, as well as manage concerns regarding accuracy and bias.

View details for DOI 10.1093/jamia/ocaa088

View details for PubMedID 32594179
Measure what matters: Counts of hospitalized patients are a better metric for health system capacity planning for a reopening. Journal of the American Medical Informatics Association : JAMIA Kashyap, S., Gombar, S., Yadlowsky, S., Callahan, A., Fries, J., Pinsky, B. A., Shah, N. H. 2020

Abstract

OBJECTIVE: Responding to the COVID-19 pandemic requires accurate forecasting of health system capacity requirements using readily available inputs. We examined whether testing and hospitalization data could help quantify the anticipated burden on the health system given a shelter-in-place (SIP) order. MATERIALS AND METHODS: 16,103 SARS-CoV-2 RT-PCR tests were performed on 15,807 patients at Stanford facilities between March 2 and April 11, 2020. We analyzed the fraction of tested patients that were confirmed positive for COVID-19, the fraction of those needing hospitalization, and the fraction requiring ICU admission over the 40 days between March 2nd and April 11th 2020. RESULTS: We find a marked slowdown in the hospitalization rate within ten days of SIP even as cases continued to rise. We also find a shift towards younger patients in the age distribution of those testing positive for COVID-19 over the four weeks of SIP. The impact of this shift is a divergence between increasing positive case confirmations and slowing new hospitalizations, both of which affect the demand on health systems. CONCLUSION: Without using local hospitalization rates and the age distribution of positive patients, current models are likely to overestimate the resource burden of COVID-19. It is imperative that health systems start using these data to quantify effects of SIP and aid reopening planning.

View details for DOI 10.1093/jamia/ocaa076

View details for PubMedID 32548636
Assessing the accuracy of automatic speech recognition for psychotherapy. NPJ digital medicine Miner, A. S., Haque, A., Fries, J. A., Fleming, S. L., Wilfley, D. E., Wilson, G., Milstein, A., Jurafsky, D., Arnow, B. A., Agras, W., Li Fei-Fei, Shah, N. H. 2020; 3 (1)

View details for DOI 10.1038/s41746-020-0285-8

View details for Web of Science ID 000537719700001
Assessing the accuracy of automatic speech recognition for psychotherapy. NPJ digital medicine Miner, A. S., Haque, A., Fries, J. A., Fleming, S. L., Wilfley, D. E., Terence Wilson, G., Milstein, A., Jurafsky, D., Arnow, B. A., Stewart Agras, W., Fei-Fei, L., Shah, N. H. 2020; 3: 82

Abstract

Accurate transcription of audio recordings in psychotherapy would improve therapy effectiveness, clinician training, and safety monitoring. Although automatic speech recognition software is commercially available, its accuracy in mental health settings has not been well described. It is unclear which metrics and thresholds are appropriate for different clinical use cases, which may range from population descriptions to individual safety monitoring. Here we show that automatic speech recognition is feasible in psychotherapy, but further improvements in accuracy are needed before widespread use. Our HIPAA-compliant automatic speech recognition system demonstrated a transcription word error rate of 25%. For depression-related utterances, sensitivity was 80% and positive predictive value was 83%. For clinician-identified harm-related sentences, the word error rate was 34%. These results suggest that automatic speech recognition may support understanding of language patterns and subgroup variation in existing treatments but may not be ready for individual-level safety surveillance.

View details for DOI 10.1038/s41746-020-0285-8

View details for PubMedID 32550644

View details for PubMedCentralID PMC7270106
Research and Reporting Considerations for Observational Studies Using Electronic Health Record Data. Annals of internal medicine Callahan, A., Shah, N. H., Chen, J. H. 2020; 172 (11_Supplement): S79–S84

Abstract

Electronic health records (EHRs) are an increasingly important source of real-world health care data for observational research. Analyses of data collected for purposes other than research require careful consideration of data quality as well as the general research and reporting principles relevant to observational studies. The core principles for observational research in general also apply to observational research using EHR data, and these are well addressed in prior literature and guidelines. This article provides additional recommendations for EHR-based research. Considerations unique to EHR-based studies include assessment of the accuracy of computer-executable cohort definitions that can incorporate unstructured data from clinical notes and management of data challenges, such as irregular sampling, missingness, and variation across time and place. Principled application of existing research and reporting guidelines alongside these additional considerations will improve the quality of EHR-based observational studies.

View details for DOI 10.7326/M19-0873

View details for PubMedID 32479175
Persistent detection of SARS-CoV-2 RNA in patients and healthcare workers with COVID-19. Journal of clinical virology : the official publication of the Pan American Society for Clinical Virology Gombar, S., Chang, M., Hogan, C. A., Zehnder, J., Boyd, S., Pinsky, B. A., Shah, N. H. 2020; 129: 104477

Abstract

BACKGROUND: Current guidelines for returning health care workers (HCW) to service after a positive SARS-CoV-2 RT-PCR test and ceasing of transmission precautions for patients are based on two general strategies. A test-based strategy requires negative respiratory RT-PCR tests obtained after the resolution of symptoms. Alternatively, due to the limited availability of testing, many sites employ a symptom-based strategy that recommends excluding HCW from the workforce and keeping patients on contact precautions until a fixed period of time has elapsed from symptom recovery. The underlying assumption of the symptom-based strategy is that waiting for a fixed period of time is a surrogate for negative RT-PCR testing, which itself is a surrogate for the absence of shedding infectious virus. OBJECTIVES: To better understand the appropriate length of symptom-based return to work and contact precaution strategies. STUDY DESIGN: We performed an observational analysis of 150 patients and HCW that transitioned from RT-PCR SARS-CoV-2 positive to negative over the course of 2 months at a US academic medical center. RESULTS: We found that the average time to transition from RT-PCR positive to negative was 24 days after symptom onset and 10% remained positive even 33 days after symptom onset. No difference was seen in HCW and patients. CONCLUSIONS: These findings suggest that, until definitive evidence of the length of infective viral shedding is obtained, the fixed length of time before returning to work or ceasing contact precautions be revised to over one month.

View details for DOI 10.1016/j.jcv.2020.104477

View details for PubMedID 32505778
Linking insurance claims across time to characterize treatment, monitoring, and end-of-life care in metastatic breast cancer. Caswell-Jin, J., Callahan, A., Purington, N., Han, S. S., Itakura, H., Sledge, G. W., Shah, N., Kurian, A. W. AMER SOC CLINICAL ONCOLOGY. 2020

View details for Web of Science ID 000560368303141
Occurrence and Timing of Subsequent SARS-CoV-2 RT-PCR Positivity Among Initially Negative Patients. medRxiv : the preprint server for health sciences Long, D. R., Gombar, S., Hogan, C. A., Greninger, A. L., OReilly Shah, V., Bryson-Cahn, C., Stevens, B., Rustagi, A., Jerome, K. R., Kong, C. S., Zehnder, J., Shah, N. H., Weiss, N. S., Pinsky, B. A., Sunshine, J. 2020

Abstract

BACKGROUND: SARS-CoV-2 reverse transcriptase polymerase chain reaction (RT-PCR) testing remains the cornerstone of laboratory-based identification of patients with COVID-19. As the availability and speed of SARS-CoV-2 testing platforms improve, results are increasingly relied upon to inform critical decisions related to therapy, use of personal protective equipment, and workforce readiness. However, early reports of RT-PCR test performance have left clinicians and the public with concerns regarding the reliability of this predominant testing modality and the interpretation of negative results. In this work, two independent research teams report the frequency of discordant SARS-CoV-2 test results among initially negative, repeatedly tested patients in regions of the United States with early community transmission and access to testing. METHODS: All patients at the University of Washington (UW) and Stanford Health Care undergoing initial testing by nasopharyngeal (NP) swab between March 2nd and April 7th, 2020 were included. SARS-CoV-2 RT-PCR was performed targeting the N, RdRp, S, and E genes and ORF1ab, using a combination of Emergency Use Authorization laboratory-developed tests and commercial assays. Results through April 14th were extracted to allow for a complete 7-day observation period and an additional day for reporting. RESULTS: A total of 23,126 SARS-CoV-2 RT-PCR tests (10,583 UW, 12,543 Stanford) were performed in 20,912 eligible patients (8,977 UW, 11,935 Stanford) undergoing initial testing by NP swab; 626 initially test-negative patients were re-tested within 7 days.
Among this group, repeat testing within 7 days yielded a positive result in 3.5% (4.3% UW, 2.8% Stanford) of cases, suggesting an initial false negative RT-PCR result; the majority (96.5%) of patients with an initial negative result who warranted reevaluation for any reason remained negative on all subsequent tests performed within this window.CONCLUSIONS: Two independent research teams report the similar finding that, among initially negative patients subjected to repeat SARS-CoV-2 RT-PCR testing, the occurrence of a newly positive result within 7 days is uncommon. These observations suggest that false negative results at the time of initial presentation do occur, but potentially at a lower frequency than is currently believed. Although it is not possible to infer the clinical sensitivity of NP SARS-CoV-2 RT-PCR testing using these data, they may be used in combination with other reports to guide the use and interpretation of this common testing modality.

View details for DOI 10.1101/2020.05.03.20089151

View details for PubMedID 32511542
Development and validation of phenotype classifiers across multiple sites in the observational health data sciences and informatics network. Journal of the American Medical Informatics Association : JAMIA Kashyap, M., Seneviratne, M., Banda, J. M., Falconer, T., Ryu, B., Yoo, S., Hripcsak, G., Shah, N. H. 2020

Abstract

OBJECTIVE: Accurate electronic phenotyping is essential to support collaborative observational research. Supervised machine learning methods can be used to train phenotype classifiers in a high-throughput manner using imperfectly labeled data. We developed 10 phenotype classifiers using this approach and evaluated performance across multiple sites within the Observational Health Data Sciences and Informatics (OHDSI) network. MATERIALS AND METHODS: We constructed classifiers using the Automated PHenotype Routine for Observational Definition, Identification, Training and Evaluation (APHRODITE) R-package, an open-source framework for learning phenotype classifiers using datasets in the Observational Medical Outcomes Partnership Common Data Model. We labeled training data based on the presence of multiple mentions of disease-specific codes. Performance was evaluated on cohorts derived using rule-based definitions and real-world disease prevalence. Classifiers were developed and evaluated across 3 medical centers, including 1 international site. RESULTS: Compared to the multiple mentions labeling heuristic, classifiers showed a mean recall boost of 0.43 with a mean precision loss of 0.17. Performance decreased slightly when classifiers were shared across medical centers, with mean recall and precision decreasing by 0.08 and 0.01, respectively, at a site within the USA, and by 0.18 and 0.10, respectively, at an international site. DISCUSSION AND CONCLUSION: We demonstrate a high-throughput pipeline for constructing and sharing phenotype classifiers across sites within the OHDSI network using APHRODITE. Classifiers exhibit good portability between sites within the USA, however limited portability internationally, indicating that classifier generalizability may have geographic limitations, and, consequently, sharing the classifier-building recipe, rather than the pretrained classifiers, may be more useful for facilitating collaborative observational research.

View details for DOI 10.1093/jamia/ocaa032

View details for PubMedID 32374408
Deep Phenotyping: Embracing Complexity and Temporality-Towards Scalability, Portability, and Interoperability. Journal of biomedical informatics Weng, C., Shah, N., Hripcsak, G. 2020: 103433

View details for DOI 10.1016/j.jbi.2020.103433

View details for PubMedID 32335224
Rates of Co-infection Between SARS-CoV-2 and Other Respiratory Pathogens. JAMA Kim, D., Quinn, J., Pinsky, B., Shah, N. H., Brown, I. 2020

View details for DOI 10.1001/jama.2020.6266

View details for PubMedID 32293646
Bridging the implementation gap of machine learning in healthcare. BMJ INNOVATIONS Seneviratne, M. G., Shah, N. H., Chu, L. 2020; 6 (2): 45-47

View details for DOI 10.1136/bmjinnov-2019-000359

View details for Web of Science ID 000850799300001
Estimate the hidden deployment cost of predictive models to improve patient care. Nature medicine Morse, K. E., Bagley, S. C., Shah, N. H. 2020; 26 (1): 18–19

View details for DOI 10.1038/s41591-019-0651-8

View details for PubMedID 31932778
Characteristics, outcomes, and mortality amongst 133,589 patients with prevalent autoimmune diseases diagnosed with, and 48,418 hospitalised for COVID-19: a multinational distributed network cohort analysis. medRxiv : the preprint server for health sciences Tan, E. H., Sena, A. G., Prats-Uribe, A., You, S. C., Ahmed, W. U., Kostka, K., Reich, C., Duvall, S. L., Lynch, K. E., Matheny, M. E., Duarte-Salles, T., Bertolin, S. F., Hripcsak, G., Natarajan, K., Falconer, T., Spotnitz, M., Ostropolets, A., Blacketer, C., Alshammari, T. M., Alghoul, H., Alser, O., Lane, J. C., Dawoud, D. M., Shah, K., Yang, Y., Zhang, L., Areia, C., Golozar, A., Relcade, M., Casajust, P., Jonnagaddala, J., Subbian, V., Vizcaya, D., Lai, L. Y., Nyberg, F., Morales, D. R., Posada, J. D., Shah, N. H., Gong, M., Vivekanantham, A., Abend, A., Minty, E. P., Suchard, M., Rijnbeek, P., Ryan, P. B., Prieto-Alhambra, D. 2020

Abstract

Patients with autoimmune diseases were advised to shield to avoid COVID-19, but information on their prognosis is lacking. We characterised 30-day outcomes and mortality after hospitalisation with COVID-19 among patients with prevalent autoimmune diseases, and compared outcomes after hospital admissions among similar patients with seasonal influenza. Multinational network cohort study. Electronic health records data from Columbia University Irving Medical Center (CUIMC) (NYC, United States [US]), Optum [US], Department of Veterans Affairs (VA) (US), Information System for Research in Primary Care-Hospitalisation Linked Data (SIDIAP-H) (Spain), and claims data from IQVIA Open Claims (US) and Health Insurance and Review Assessment (HIRA) (South Korea). All patients with prevalent autoimmune diseases, diagnosed and/or hospitalised between January and June 2020 with COVID-19, and similar patients hospitalised with influenza in 2017-2018 were included. 30-day complications during hospitalisation and death. We studied 133,589 patients diagnosed and 48,418 hospitalised with COVID-19 with prevalent autoimmune diseases. The majority of participants were female (60.5% to 65.9%) and aged ≥50 years. The most prevalent autoimmune conditions were psoriasis (3.5 to 32.5%), rheumatoid arthritis (3.9 to 18.9%), and vasculitis (3.3 to 17.6%). Amongst hospitalised patients, Type 1 diabetes was the most common autoimmune condition (4.8% to 7.5%) in US databases, rheumatoid arthritis in HIRA (18.9%), and psoriasis in SIDIAP-H (26.4%). Compared to 70,660 hospitalised with influenza, those admitted with COVID-19 had more respiratory complications including pneumonia and acute respiratory distress syndrome, and higher 30-day mortality (2.2% to 4.3% versus 6.3% to 24.6%). Patients with autoimmune diseases had high rates of respiratory complications and 30-day mortality following a hospitalization with COVID-19. Compared to influenza, COVID-19 is a more severe disease, leading to more complications and higher mortality. Future studies should investigate predictors of poor outcomes in COVID-19 patients with autoimmune diseases. Patients with autoimmune conditions may be at increased risk of COVID-19 infection and complications. There is a paucity of evidence characterising the outcomes of hospitalised COVID-19 patients with prevalent autoimmune conditions. Most people with autoimmune diseases who required hospitalisation for COVID-19 were women, aged 50 years or older, and had substantial previous comorbidities. Patients who were hospitalised with COVID-19 and had prevalent autoimmune diseases had higher prevalence of hypertension, chronic kidney disease, heart disease, and Type 2 diabetes as compared to those with prevalent autoimmune diseases who were diagnosed with COVID-19. A variable proportion of 6% to 25% across data sources died within one month of hospitalisation with COVID-19 and prevalent autoimmune diseases. For people with autoimmune diseases, COVID-19 hospitalisation was associated with worse outcomes and 30-day mortality compared to admission with influenza in the 2017-2018 season.

View details for DOI 10.1101/2020.11.24.20236802

View details for PubMedID 33269355

View details for PubMedCentralID PMC7709171
Countdown Regression: Sharp and Calibrated Survival Predictions Avati, A., Duan, T., Zhou, S., Jung, K., Shah, N. H., Ng, A. Y. edited by Adams, R. P., Gogate JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2020: 145-155

View details for Web of Science ID 000722423500013
Normalizing Clinical Document Titles to LOINC Document Ontology: an Initial Study. AMIA ... Annual Symposium proceedings. AMIA Symposium Zuo, X., Li, J., Zhao, B., Zhou, Y., Dong, X., Duke, J., Natarajan, K., Hripcsak, G., Shah, N., Banda, J. M., Reeves, R., Miller, T., Xu, H. 2020; 2020: 1441–50

Abstract

The normalization of clinical documents is essential for health information management with the enormous amount of clinical documentation generated each year. The LOINC Document Ontology (DO) is a universal clinical document standard in a hierarchical structure. The objective of this study is to investigate the feasibility and generalizability of LOINC DO by mapping from clinical note titles across five institutions to five DO axes. We first developed an annotation framework based on the definition of LOINC DO axes and manually mapped 4,000 titles. Then we introduced a pre-trained deep learning model named Bidirectional Encoder Representations from Transformers (BERT) to enable automatic mapping from titles to LOINC DO axes. The results showed that the BERT-based automatic mapping achieved improved performance compared with the baseline model. By analyzing both manual annotations and predicted results, ambiguities in LOINC DO axes definition were discussed.

View details for PubMedID 33936520
Treatment Patterns for Chronic Comorbid Conditions in Patients With Cancer Using a Large-Scale Observational Data Network. JCO clinical cancer informatics Chen, R., Ryan, P., Natarajan, K., Falconer, T., Crew, K. D., Reich, C. G., Vashisht, R., Randhawa, G., Shah, N. H., Hripcsak, G. 2020; 4: 171–83

Abstract

Patients with cancer are predisposed to developing chronic, comorbid conditions that affect prognosis, quality of life, and mortality. While treatment guidelines and care variations for these comorbidities have been described for the general noncancer population, less is known about real-world treatment patterns in patients with cancer. We sought to characterize the prevalence and distribution of initial treatment patterns across a large-scale data network for depression, hypertension, and type II diabetes mellitus (T2DM) among patients with cancer. We used the Observational Health Data Sciences and Informatics network, an international collaborative implementing the Observational Medical Outcomes Partnership Common Data Model to standardize more than 2 billion patient records. For this study, we used 8 databases across 3 countries-the United States, France, and Germany-with 295,529,655 patient records. We identified patients with cancer using SNOMED (Systematized Nomenclature of Medicine) codes validated via manual review. We then characterized the treatment patterns of these patients initiating treatment of depression, hypertension, or T2DM with persistent treatment and at least 365 days of observation. Across databases, wide variations exist in treatment patterns for depression (n = 1,145,510), hypertension (n = 3,178,944), and T2DM (n = 886,766). When limited to 6-node (6-drug) sequences, we identified 61,052 unique sequences for depression, 346,067 sequences for hypertension, and 40,629 sequences for T2DM. These variations persisted across sites, databases, countries, and conditions, with the exception of metformin (73.8%) being the most common initial T2DM treatment. The most common initial medications were sertraline (17.5%) and escitalopram (17.5%) for depression and hydrochlorothiazide (20.5%) and lisinopril (19.6%) for hypertension. We identified wide variations in the treatment of common comorbidities in patients with cancer, similar to the general population, and demonstrate the feasibility of conducting research on patients with cancer across a large-scale observational data network using a common data model.

View details for DOI 10.1200/CCI.19.00107

View details for PubMedID 32134687
A predictive tool for identification of SARS-CoV-2 PCR-negative emergency department patients using routine test results. Journal of clinical virology : the official publication of the Pan American Society for Clinical Virology Joshi, R. P., Pejaver, V., Hammarlund, N. E., Sung, H., Lee, S. K., Furmanchuk, A., Lee, H. Y., Scott, G., Gombar, S., Shah, N., Shen, S., Nassiri, A., Schneider, D., Ahmad, F. S., Liebovitz, D., Kho, A., Mooney, S., Pinsky, B. A., Banaei, N. 2020; 129: 104502

Abstract

Testing for COVID-19 remains limited in the United States and across the world. Poor allocation of limited testing resources leads to misutilization of health system resources, which complementary rapid testing tools could ameliorate. To predict SARS-CoV-2 PCR positivity based on complete blood count components and patient sex. A retrospective case-control design for collection of data and a logistic regression prediction model was used. Participants were emergency department patients > 18 years old who had concurrent complete blood counts and SARS-CoV-2 PCR testing. 33 confirmed SARS-CoV-2 PCR positive and 357 negative patients at Stanford Health Care were used for model training. Validation cohorts consisted of emergency department patients > 18 years old who had concurrent complete blood counts and SARS-CoV-2 PCR testing in Northern California (41 PCR positive, 495 PCR negative), Seattle, Washington (40 PCR positive, 306 PCR negative), Chicago, Illinois (245 PCR positive, 1015 PCR negative), and South Korea (9 PCR positive, 236 PCR negative). A decision support tool that utilizes components of complete blood count and patient sex for prediction of SARS-CoV-2 PCR positivity demonstrated a C-statistic of 78 %, an optimized sensitivity of 93 %, and generalizability to other emergency department populations. By restricting PCR testing to predicted positive patients in a hypothetical scenario of 1000 patients requiring testing but testing resources limited to 60 % of patients, this tool would allow a 33 % increase in properly allocated resources. A prediction tool based on complete blood count results can better allocate SARS-CoV-2 testing and other health care resources such as personal protective equipment during a pandemic surge.

View details for DOI 10.1016/j.jcv.2020.104502

View details for PubMedID 32544861
An international characterisation of patients hospitalised with COVID-19 and a comparison with those previously hospitalised with influenza. medRxiv : the preprint server for health sciences Burn, E., You, S. C., Sena, A. G., Kostka, K., Abedtash, H., Abrahão, M. T., Alberga, A., Alghoul, H., Alser, O., Alshammari, T. M., Areia, C., Banda, J. M., Cho, J., Culhane, A. C., Davydov, A., DeFalco, F. J., Duarte-Salles, T., DuVall, S., Falconer, T., Gao, W., Golozar, A., Hardin, J., Hripcsak, G., Huser, V., Jeon, H., Jing, Y., Jung, C. Y., Kaas-Hansen, B. S., Kaduk, D., Kent, S., Kim, Y., Kolovos, S., Lane, J. C., Lee, H., Lynch, K. E., Makadia, R., Matheny, M. E., Mehta, P., Morales, D. R., Natarajan, K., Nyberg, F., Ostropolets, A., Park, R. W., Park, J., Posada, J. D., Prats-Uribe, A., Rao, G., Reich, C., Rho, Y., Rijnbeek, P., Sathappan, S. M., Schilling, L. M., Schuemie, M., Shah, N. H., Shoaibi, A., Song, S., Spotnitz, M., Suchard, M. A., Swerdel, J. N., Vizcaya, D., Volpe, S., Wen, H., Williams, A. E., Yimer, B. B., Zhang, L., Zhuk, O., Prieto-Alhambra, D., Ryan, P. 2020

Abstract

To better understand the profile of individuals with severe coronavirus disease 2019 (COVID-19), we characterised individuals hospitalised with COVID-19 and compared them to individuals previously hospitalised with influenza. We report the characteristics (demographics, prior conditions and medication use) of patients hospitalised with COVID-19 between December 2019 and April 2020 in the US (Columbia University Irving Medical Center [CUIMC], STAnford Medicine Research data Repository [STARR-OMOP], and the Department of Veterans Affairs [VA OMOP]) and Health Insurance Review & Assessment [HIRA] of South Korea. Patients hospitalised with COVID-19 were compared with patients previously hospitalised with influenza in 2014-19. 6,806 (US: 1,634, South Korea: 5,172) individuals hospitalised with COVID-19 were included. Patients in the US were majority male (VA OMOP: 94%, STARR-OMOP: 57%, CUIMC: 52%), but were majority female in HIRA (56%). Age profiles varied across data sources. Prevalence of asthma ranged from 7% to 14%, diabetes from 18% to 43%, and hypertensive disorder from 22% to 70% across data sources, while between 9% and 39% were taking drugs acting on the renin-angiotensin system in the 30 days prior to their hospitalisation. Compared to 52,422 individuals hospitalised with influenza, patients admitted with COVID-19 were more likely male, younger, and, in the US, had fewer comorbidities and lower medication use. Rates of comorbidities and medication use are high among individuals hospitalised with COVID-19. However, COVID-19 patients are more likely to be male and appear to be younger and, in the US, generally healthier than those typically admitted with influenza.

View details for DOI 10.1101/2020.04.22.20074336

View details for PubMedID 32511443

View details for PubMedCentralID PMC7239064
The accuracy vs. coverage trade-off in patient-facing diagnosis models. AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science Kannan, A., Fries, J. A., Kramer, E., Chen, J. J., Shah, N., Amatriain, X. 2020; 2020: 298–307

Abstract

A third of adults in America use the Internet to diagnose medical concerns, and online symptom checkers are increasingly part of this process. These tools are powered by diagnosis models similar to clinical decision support systems, with the primary difference being the coverage of symptoms and diagnoses. To be useful to patients and physicians, these models must have high accuracy while covering a meaningful space of symptoms and diagnoses. To the best of our knowledge, this paper is the first in studying the trade-off between the coverage of the model and its performance for diagnosis. To this end, we learn diagnosis models with different coverage from EHR data. We find a 1% drop in top-3 accuracy for every 10 diseases added to the coverage. We also observe that complexity for these models does not affect performance, with linear models performing as well as neural networks.

View details for PubMedID 32477649
Author Correction: Estimate the hidden deployment cost of predictive models to improve patient care. Nature medicine Morse, K. E., Bagley, S. C., Shah, N. H. 2020

Abstract

An amendment to this paper has been published and can be accessed via a link at the top of the paper.

View details for DOI 10.1038/s41591-020-0862-z

View details for PubMedID 32291415
Developing a delivery science for artificial intelligence in healthcare. NPJ digital medicine Li, R. C., Asch, S. M., Shah, N. H. 2020; 3: 107

Abstract

Artificial Intelligence (AI) has generated a large amount of excitement in healthcare, mostly driven by the emergence of increasingly accurate machine learning models. However, the promise of AI delivering scalable and sustained value for patient care in the real world setting has yet to be realized. In order to safely and effectively bring AI into use in healthcare, there needs to be a concerted effort around not just the creation, but also the delivery of AI. This AI "delivery science" will require a broader set of tools, such as design thinking, process improvement, and implementation science, as well as a broader definition of what AI will look like in practice, which includes not just machine learning models and their predictions, but also the new systems for care delivery that they enable. The careful design, implementation, and evaluation of these AI enabled systems will be important in the effort to understand how AI can improve healthcare.

View details for DOI 10.1038/s41746-020-00318-y

View details for PubMedID 32885053

View details for PubMedCentralID PMC7443141
Automated model versus treating physician for predicting survival time of patients with metastatic cancer. Journal of the American Medical Informatics Association : JAMIA Gensheimer, M. F., Aggarwal, S., Benson, K. R., Carter, J. N., Henry, A. S., Wood, D. J., Soltys, S. G., Hancock, S., Pollom, E., Shah, N. H., Chang, D. T. 2020

Abstract

Being able to predict a patient's life expectancy can help doctors and patients prioritize treatments and supportive care. For predicting life expectancy, physicians have been shown to outperform traditional models that use only a few predictor variables. It is possible that a machine learning model that uses many predictor variables and diverse data sources from the electronic medical record can improve on physicians' performance. For patients with metastatic cancer, we compared accuracy of life expectancy predictions by the treating physician, a machine learning model, and a traditional model. A machine learning model was trained using 14 600 metastatic cancer patients' data to predict each patient's distribution of survival time. Data sources included note text, laboratory values, and vital signs. From 2015-2016, 899 patients receiving radiotherapy for metastatic cancer were enrolled in a study in which their radiation oncologist estimated life expectancy. Survival predictions were also made by the machine learning model and a traditional model using only performance status. Performance was assessed with area under the curve for 1-year survival and calibration plots. The radiotherapy study included 1190 treatment courses in 899 patients. A total of 879 treatment courses in 685 patients were included in this analysis. Median overall survival was 11.7 months. Physicians, machine learning model, and traditional model had area under the curve for 1-year survival of 0.72 (95% CI 0.63-0.81), 0.77 (0.73-0.81), and 0.68 (0.65-0.71), respectively. The machine learning model's predictions were more accurate than those of the treating physician or a traditional model.

View details for DOI 10.1093/jamia/ocaa290

View details for PubMedID 33313792
Estimating the efficacy of symptom-based screening for COVID-19. NPJ digital medicine Callahan, A., Steinberg, E., Fries, J. A., Gombar, S., Patel, B., Corbin, C. K., Shah, N. H. 2020; 3: 95

Abstract

There is substantial interest in using presenting symptoms to prioritize testing for COVID-19 and establish symptom-based surveillance. However, little is currently known about the specificity of COVID-19 symptoms. To assess the feasibility of symptom-based screening for COVID-19, we used data from tests for common respiratory viruses and SARS-CoV-2 in our health system to measure the ability to correctly classify virus test results based on presenting symptoms. Based on these results, symptom-based screening may not be an effective strategy to identify individuals who should be tested for SARS-CoV-2 infection or to obtain a leading indicator of new COVID-19 cases.

View details for DOI 10.1038/s41746-020-0300-0

View details for PubMedID 32695885
Language models are an effective representation learning technique for electronic health record data. Journal of biomedical informatics Steinberg, E., Jung, K., Fries, J. A., Corbin, C. K., Pfohl, S. R., Shah, N. H. 2020: 103637

Abstract

Widespread adoption of electronic health records (EHRs) has fueled the development of using machine learning to build prediction models for various clinical outcomes. However, this process is often constrained by having a relatively small number of patient records for training the model. We demonstrate that using patient representation schemes inspired from techniques in natural language processing can increase the accuracy of clinical prediction models by transferring information learned from the entire patient population to the task of training a specific model, where only a subset of the population is relevant. Such patient representation schemes enable a 3.5% mean improvement in AUROC on five prediction tasks compared to standard baselines, with the average improvement rising to 19% when only a small number of patient records are available for training the clinical prediction model.

View details for DOI 10.1016/j.jbi.2020.103637

View details for PubMedID 33290879
Development and validation of a prognostic model predicting symptomatic hemorrhagic transformation in acute ischemic stroke at scale in the OHDSI network. PloS one Wang, Q., Reps, J. M., Kostka, K. F., Ryan, P. B., Zou, Y., Voss, E. A., Rijnbeek, P. R., Chen, R., Rao, G. A., Morgan Stewart, H., Williams, A. E., Williams, R. D., Van Zandt, M., Falconer, T., Fernandez-Chas, M., Vashisht, R., Pfohl, S. R., Shah, N. H., Kasthurirathne, S. N., You, S. C., Jiang, Q., Reich, C., Zhou, Y. 2020; 15 (1): e0226718

Abstract

BACKGROUND AND PURPOSE: Hemorrhagic transformation (HT) after cerebral infarction is a complex and multifactorial phenomenon in the acute stage of ischemic stroke, and often results in a poor prognosis. Thus, identifying risk factors and making an early prediction of HT in acute cerebral infarction contributes not only to the selections of therapeutic regimen but also, more importantly, to the improvement of prognosis of acute cerebral infarction. The purpose of this study was to develop and validate a model to predict a patient's risk of HT within 30 days of initial ischemic stroke. METHODS: We utilized a retrospective multicenter observational cohort study design to develop a Lasso Logistic Regression prediction model with a large, US Electronic Health Record dataset which structured to the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM). To examine clinical transportability, the model was externally validated across 10 additional real-world healthcare datasets include EHR records for patients from America, Europe and Asia. RESULTS: In the database the model was developed, the target population cohort contained 621,178 patients with ischemic stroke, of which 5,624 patients had HT within 30 days following initial ischemic stroke. 612 risk predictors, including the distance a patient travels in an ambulance to get to care for a HT, were identified. An area under the receiver operating characteristic curve (AUC) of 0.75 was achieved in the internal validation of the risk model. External validation was performed across 10 databases totaling 5,515,508 patients with ischemic stroke, of which 86,401 patients had HT within 30 days following initial ischemic stroke. The mean external AUC was 0.71 and ranged between 0.60-0.78. CONCLUSIONS: A HT prognostic predict model was developed with Lasso Logistic Regression based on routinely collected EMR data. This model can identify patients who have a higher risk of HT than the population average with an AUC of 0.78. It shows the OMOP CDM is an appropriate data standard for EMR secondary use in clinical multicenter research for prognostic prediction model development and validation. In the future, combining this model with clinical information systems will assist clinicians to make the right therapy decision for patients with acute ischemic stroke.

View details for DOI 10.1371/journal.pone.0226718

View details for PubMedID 31910437
Assessing the accuracy of automatic speech recognition for psychotherapy. NPJ digital medicine Miner, A. S., Haque, A., Fries, J. A., Fleming, S. L., Wilfley, D. E., Terence Wilson, G., Milstein, A., Jurafsky, D., Arnow, B. A., Stewart Agras, W., Fei-Fei, L., Shah, N. H. 2020; 3 (1): 82

Abstract

Accurate transcription of audio recordings in psychotherapy would improve therapy effectiveness, clinician training, and safety monitoring. Although automatic speech recognition software is commercially available, its accuracy in mental health settings has not been well described. It is unclear which metrics and thresholds are appropriate for different clinical use cases, which may range from population descriptions to individual safety monitoring. Here we show that automatic speech recognition is feasible in psychotherapy, but further improvements in accuracy are needed before widespread use. Our HIPAA-compliant automatic speech recognition system demonstrated a transcription word error rate of 25%. For depression-related utterances, sensitivity was 80% and positive predictive value was 83%. For clinician-identified harm-related sentences, the word error rate was 34%. These results suggest that automatic speech recognition may support understanding of language patterns and subgroup variation in existing treatments but may not be ready for individual-level safety surveillance.

View details for DOI 10.1038/s41746-020-0285-8

View details for PubMedID 33597677
Occurrence and Timing of Subsequent SARS-CoV-2 RT-PCR Positivity Among Initially Negative Patients. Clinical infectious diseases : an official publication of the Infectious Diseases Society of America Long, D. R., Gombar, S., Hogan, C. A., Greninger, A. L., Shah, V. O., Bryson-Cahn, C., Stevens, B., Rustagi, A., Jerome, K. R., Kong, C. S., Zehnder, J., Shah, N. H., Weiss, N. S., Pinsky, B. A., Sunshine, J. 2020

Abstract

Using data for 20,912 patients from two large academic health systems, we analyzed the frequency of SARS-CoV-2 RT-PCR test-discordance among individuals initially testing negative by nasopharyngeal swab who were retested on clinical grounds within 7 days. The frequency of subsequent positivity within this window was 3.5% and similar across institutions.

View details for DOI 10.1093/cid/ciaa722

View details for PubMedID 32506118
Ethics of Using and Sharing Clinical Imaging Data for Artificial Intelligence: A Proposed Framework. Radiology Larson, D. B., Magnus, D. C., Lungren, M. P., Shah, N. H., Langlotz, C. P. 2020: 192536

Abstract

In this article, the authors propose an ethical framework for using and sharing clinical data for the development of artificial intelligence (AI) applications. The philosophical premise is as follows: when clinical data are used to provide care, the primary purpose for acquiring the data is fulfilled. At that point, clinical data should be treated as a form of public good, to be used for the benefit of future patients. In their 2013 article, Faden et al argued that all who participate in the health care system, including patients, have a moral obligation to contribute to improving that system. The authors extend that framework to questions surrounding the secondary use of clinical data for AI applications. Specifically, the authors propose that all individuals and entities with access to clinical data become data stewards, with fiduciary (or trust) responsibilities to patients to carefully safeguard patient privacy, and to the public to ensure that the data are made widely available for the development of knowledge and tools to benefit future patients. According to this framework, the authors maintain that it is unethical for providers to "sell" clinical data to other parties by granting access to clinical data, especially under exclusive arrangements, in exchange for monetary or in-kind payments that exceed costs. The authors also propose that patient consent is not required before the data are used for secondary purposes when obtaining such consent is prohibitively costly or burdensome, as long as mechanisms are in place to ensure that ethical standards are strictly followed. Rather than debate whether patients or provider organizations "own" the data, the authors propose that clinical data are not owned at all in the traditional sense, but rather that all who interact with or control the data have an obligation to ensure that the data are used for the benefit of future patients and society.

View details for DOI 10.1148/radiol.2020192536

View details for PubMedID 32208097
Defining the features and duration of antibody responses to SARS-CoV-2 infection associated with disease severity and outcome. Science immunology Röltgen, K., Powell, A. E., Wirz, O. F., Stevens, B. A., Hogan, C. A., Najeeb, J., Hunter, M., Wang, H., Sahoo, M. K., Huang, C., Yamamoto, F., Manohar, M., Manalac, J., Otrelo-Cardoso, A. R., Pham, T. D., Rustagi, A., Rogers, A. J., Shah, N. H., Blish, C. A., Cochran, J. R., Jardetzky, T. S., Zehnder, J. L., Wang, T. T., Narasimhan, B., Gombar, S., Tibshirani, R., Nadeau, K. C., Kim, P. S., Pinsky, B. A., Boyd, S. D. 2020; 5 (54)

Abstract

SARS-CoV-2-specific antibodies, particularly those preventing viral spike receptor binding domain (RBD) interaction with host angiotensin-converting enzyme 2 (ACE2) receptor, can neutralize the virus. It is, however, unknown which features of the serological response may affect clinical outcomes of COVID-19 patients. We analyzed 983 longitudinal plasma samples from 79 hospitalized COVID-19 patients and 175 SARS-CoV-2-infected outpatients and asymptomatic individuals. Within this cohort, 25 patients died of their illness. Higher ratios of IgG antibodies targeting S1 or RBD domains of spike compared to nucleocapsid antigen were seen in outpatients who had mild illness versus severely ill patients. Plasma antibody increases correlated with decreases in viral RNAemia, but antibody responses in acute illness were insufficient to predict inpatient outcomes. Pseudovirus neutralization assays and a scalable ELISA measuring antibodies blocking RBD-ACE2 interaction were well correlated with patient IgG titers to RBD. Outpatient and asymptomatic individuals' SARS-CoV-2 antibodies, including IgG, progressively decreased during observation up to five months post-infection.

View details for DOI 10.1126/sciimmunol.abe0240

View details for PubMedID 33288645
Precision screening for familial hypercholesterolaemia: a machine learning study applied to electronic health encounter data. The Lancet. Digital health Myers, K. D., Knowles, J. W., Staszak, D., Shapiro, M. D., Howard, W., Yadava, M., Zuzick, D., Williamson, L., Shah, N. H., Banda, J. M., Leader, J., Cromwell, W. C., Trautman, E., Murray, M. F., Baum, S. J., Myers, S., Gidding, S. S., Wilemon, K., Rader, D. J. 2019; 1 (8): e393-e402

Abstract

Cardiovascular outcomes for people with familial hypercholesterolaemia can be improved with diagnosis and medical management. However, 90% of individuals with familial hypercholesterolaemia remain undiagnosed in the USA. We aimed to accelerate early diagnosis and timely intervention for more than 1·3 million undiagnosed individuals with familial hypercholesterolaemia at high risk for early heart attacks and strokes by applying machine learning to large health-care encounter datasets. We trained the FIND FH machine learning model using deidentified health-care encounter data, including procedure and diagnostic codes, prescriptions, and laboratory findings, from 939 clinically diagnosed individuals with familial hypercholesterolaemia (395 of whom had a molecular diagnosis) and 83 136 individuals presumed free of familial hypercholesterolaemia, sampled from four US institutions. The model was then applied to a national health-care encounter database (170 million individuals) and an integrated health-care delivery system dataset (174 000 individuals). Individuals used in model training and those evaluated by the model were required to have at least one cardiovascular disease risk factor (eg, hypertension, hypercholesterolaemia, or hyperlipidemia). A Health Insurance Portability and Accountability Act of 1996-compliant programme was developed to allow providers to receive identification of individuals likely to have familial hypercholesterolaemia in their practice. Using a model with a measured precision (positive predictive value) of 0·85, recall (sensitivity) of 0·45, area under the precision-recall curve of 0·55, and area under the receiver operating characteristic curve of 0·89, we flagged 1 331 759 of 170 416 201 patients in the national database and 866 of 173 733 individuals in the health-care delivery system dataset as likely to have familial hypercholesterolaemia.
Familial hypercholesterolaemia experts reviewed a sample of flagged individuals (45 from the national database and 103 from the health-care delivery system dataset) and applied clinical familial hypercholesterolaemia diagnostic criteria. Of those reviewed, 87% (95% CI 73-100) in the national database and 77% (68-86) in the health-care delivery system dataset were categorised as having a high enough clinical suspicion of familial hypercholesterolaemia to warrant guideline-based clinical evaluation and treatment. The FIND FH model successfully scans large, diverse, and disparate health-care encounter databases to identify individuals with familial hypercholesterolaemia. The FH Foundation funded this study. Support was received from Amgen, Sanofi, and Regeneron.
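The counts reported in this abstract can be turned into flag rates with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and multiplying the flagged count by the measured precision of 0·85 is an assumption made here for illustration, not a calculation reported by the authors.

```python
def flag_rate(flagged: int, screened: int) -> float:
    """Fraction of screened individuals that the model flagged."""
    return flagged / screened


def expected_true_positives(flagged: int, precision: float) -> float:
    """Flagged individuals expected to be true cases.

    Assumption (not from the abstract): the measured precision
    carries over unchanged to the flagged population.
    """
    return flagged * precision


# Counts taken directly from the abstract.
national_rate = flag_rate(1_331_759, 170_416_201)
system_rate = flag_rate(866, 173_733)

# Illustrative only: applies the measured precision of 0.85.
national_expected = expected_true_positives(1_331_759, 0.85)
```

Under these figures the model flagged about 0.78% of the national database and about 0.50% of the health-care delivery system dataset.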

View details for DOI 10.1016/S2589-7500(19)30150-5

View details for PubMedID 33323221
Precision screening for familial hypercholesterolaemia: a machine learning study applied to electronic health encounter data LANCET DIGITAL HEALTH Myers, K. D., Knowles, J. W., Staszak, D., Shapiro, M. D., Howard, W., Yadava, M., Zuzick, D., Williamson, L., Shah, N. H., Banda, J. M., Leader, J., Cromwell, W. C., Trautman, E., Murray, M. F., Baum, S. J., Myers, S., Gidding, S. S., Wilemon, K., Rader, D. J. 2019; 1 (8): E393–E402

View details for DOI 10.1016/S2589-7500(19)30150-5

View details for Web of Science ID 000525874100011
Profiling off-label prescriptions in cancer treatment using social health networks. JAMIA open Nikfarjam, A., Ransohoff, J. D., Callahan, A., Polony, V., Shah, N. H. 2019; 2 (3): 301–5

Abstract

Objectives: To investigate using patient posts in social media as a resource to profile off-label prescriptions of cancer drugs. Methods: We analyzed patient posts from the Inspire health forums (www.inspire.com) and extracted mentions of cancer drugs from the 14 most active cancer-type specific support groups. To quantify drug-disease associations, we calculated information component scores from the frequency of posts in each cancer-specific group with mentions of a given drug. We evaluated the results against three sources: manual review, Wolters-Kluwer Medi-span, and Truven MarketScan insurance claims. Results: We identified 279 frequently discussed and therefore highly associated drug-disease pairs from Inspire posts. Of these, 96 are FDA approved, 9 are known off-label uses, and 174 do not have records of known usage (potentially novel off-label uses). We achieved a mean average precision of 74.9% in identifying drug-disease pairs with a true indication association from patient posts and found consistent evidence in medical claims records. We achieved a recall of 69.2% in identifying known off-label drug uses (based on Wolters-Kluwer Medi-span) from patient posts.
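The abstract does not give the formula behind its information component scores. As a minimal sketch, the shrinkage form widely used in disproportionality analysis compares the observed number of co-mentions with the number expected if drug and group were independent. The function name, argument names, and the 0.5 shrinkage constant below are assumptions, not details taken from the paper.

```python
import math


def information_component(n_drug_group: int, n_drug: int,
                          n_group: int, n_total: int,
                          shrink: float = 0.5) -> float:
    """Shrunken log2 ratio of observed to expected co-mention counts.

    n_drug_group: posts in the cancer group that mention the drug
    n_drug:       posts anywhere that mention the drug
    n_group:      posts in the cancer group
    n_total:      all posts
    shrink:       assumed smoothing constant for small counts
    """
    expected = n_drug * n_group / n_total  # count expected under independence
    return math.log2((n_drug_group + shrink) / (expected + shrink))
```

A score near zero means the drug is mentioned in the group about as often as chance would predict, and a positive score means it is mentioned more often.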

View details for DOI 10.1093/jamiaopen/ooz025

View details for PubMedID 31709388
Development and Performance of the Pulmonary Embolism Result Forecast Model (PERFORM) for Computed Tomography Clinical Decision Support. JAMA network open Banerjee, I., Sofela, M., Yang, J., Chen, J. H., Shah, N. H., Ball, R., Mushlin, A. I., Desai, M., Bledsoe, J., Amrhein, T., Rubin, D. L., Zamanian, R., Lungren, M. P. 2019; 2 (8): e198719

Abstract

Importance: Pulmonary embolism (PE) is a life-threatening clinical problem, and computed tomographic imaging is the standard for diagnosis. Clinical decision support rules based on PE risk-scoring models have been developed to compute pretest probability but are underused and tend to underperform in practice, leading to persistent overuse of CT imaging for PE. Objective: To develop a machine learning model to generate a patient-specific risk score for PE by analyzing longitudinal clinical data as clinical decision support for patients referred for CT imaging for PE. Design, Setting, and Participants: In this diagnostic study, the proposed workflow for the machine learning model, the Pulmonary Embolism Result Forecast Model (PERFORM), transforms raw electronic medical record (EMR) data into temporal feature vectors and develops a decision analytical model targeted toward adult patients referred for CT imaging for PE. The model was tested on holdout patient EMR data from 2 large, academic medical practices. A total of 3397 annotated CT imaging examinations for PE from 3214 unique patients seen at Stanford University hospitals and clinics were used for training and validation. The models were externally validated on 240 unique patients seen at Duke University Medical Center. The comparison with clinical scoring systems was done on randomly selected 100 outpatient samples from Stanford University hospitals and clinics and 101 outpatient samples from Duke University Medical Center. Main Outcomes and Measures: Prediction performance of diagnosing acute PE was evaluated using ElasticNet, artificial neural networks, and other machine learning approaches on holdout data sets from both institutions, and performance of models was measured by area under the receiver operating characteristic curve (AUROC). Results: Of the 3214 patients included in the study, 1704 (53.0%) were women from Stanford University hospitals and clinics; mean (SD) age was 60.53 (19.43) years.
The 240 patients from Duke University Medical Center used for validation included 132 women (55.0%); mean (SD) age was 70.2 (14.2) years. In the samples for clinical scoring system comparisons, the 100 outpatients from Stanford University hospitals and clinics included 67 women (67.0%); mean (SD) age was 57.74 (19.87) years, and the 101 patients from Duke University Medical Center included 59 women (58.4%); mean (SD) age was 73.06 (15.3) years. The best-performing model achieved an AUROC performance of predicting a positive PE study of 0.90 (95% CI, 0.87-0.91) on intrainstitutional holdout data with an AUROC of 0.71 (95% CI, 0.69-0.72) on an external data set from Duke University Medical Center; superior AUROC performance and cross-institutional generalization of the model of 0.81 (95% CI, 0.77-0.87) and 0.81 (95% CI, 0.73-0.82), respectively, were noted on holdout outpatient populations from both intrainstitutional and extrainstitutional data. Conclusions and Relevance: The machine learning model, PERFORM, may consider multitudes of applicable patient-specific risk factors and dependencies to arrive at a PE risk prediction that generalizes to new population distributions. This approach might be used as an automated clinical decision-support tool for patients referred for CT PE imaging to improve CT use.

View details for DOI 10.1001/jamanetworkopen.2019.8719

View details for PubMedID 31390040
The Emotional Toll of Inflammatory Bowel Disease: Using Machine Learning to Analyze Online Community Forum Discourse CROHNS & COLITIS 360 Lerrigo, R., Coffey, J. T. R., Kravitz, J. L., Jadhav, P., Nikfarjam, A., Shah, N. H., Jurafsky, D., Sinha, S. R. 2019; 1 (2)

View details for DOI 10.1093/crocol/otz011

View details for Web of Science ID 000755688900002
Increased monocyte count as a cellular biomarker for poor outcomes in fibrotic diseases: a retrospective, multicentre cohort study LANCET RESPIRATORY MEDICINE Scott, M. K. D., Quinn, K., Li, Q., Carroll, R., Warsinske, H., Vallania, F., Chen, S., Carns, M. A., Aren, K., Sun, J., Koloms, K., Lee, J., Baral, J., Kropski, J., Zhao, H., Herzog, E., Martinez, F. J., Moore, B. B., Hinchcliff, M., Denny, J., Kaminski, N., Herazo-Maya, J. D., Shah, N. H., Khatri, P. 2019; 7 (6): 497–508

View details for DOI 10.1016/S2213-2600(18)30508-3

View details for Web of Science ID 000468488400018
Finding missed cases of familial hypercholesterolemia in health systems using machine learning. NPJ digital medicine Banda, J. M., Sarraju, A., Abbasi, F., Parizo, J., Pariani, M., Ison, H., Briskin, E., Wand, H., Dubois, S., Jung, K., Myers, S. A., Rader, D. J., Leader, J. B., Murray, M. F., Myers, K. D., Wilemon, K., Shah, N. H., Knowles, J. W. 2019; 2: 23

Abstract

Familial hypercholesterolemia (FH) is an underdiagnosed dominant genetic condition affecting approximately 0.4% of the population and has up to a 20-fold increased risk of coronary artery disease if untreated. Simple screening strategies have false positive rates greater than 95%. As part of the FH Foundation's FIND FH initiative, we developed a classifier to identify potential FH patients using electronic health record (EHR) data at Stanford Health Care. We trained a random forest classifier using data from known patients (n = 197) and matched non-cases (n = 6590). Our classifier obtained a positive predictive value (PPV) of 0.88 and sensitivity of 0.75 on a held-out test-set. We evaluated the accuracy of the classifier's predictions by chart review of 100 patients at risk of FH not included in the original dataset. The classifier correctly flagged 84% of patients at the highest probability threshold, with decreasing performance as the threshold lowers. In external validation on 466 FH patients (236 with genetically proven FH) and 5000 matched non-cases from the Geisinger Healthcare System our FH classifier achieved a PPV of 0.85. Our EHR-derived FH classifier is effective in finding candidate patients for further FH screening. Such machine learning guided strategies can lead to effective identification of the highest risk patients for enhanced management strategies.

View details for DOI 10.1038/s41746-019-0101-5

View details for PubMedID 31304370

View details for PubMedCentralID PMC6550268
Finding missed cases of familial hypercholesterolemia in health systems using machine learning NPJ DIGITAL MEDICINE Banda, J. M., Sarraju, A., Abbasi, F., Parizo, J., Pariani, M., Ison, H., Briskin, E., Wand, H., Dubois, S., Jung, K., Myers, S. A., Rader, D. J., Leader, J. B., Murray, M. F., Myers, K. D., Wilemon, K., Shah, N. H., Knowles, J. W. 2019; 2

View details for DOI 10.1038/s41746-019-0101-5

View details for Web of Science ID 000466788500001
Predicting need for advanced illness or palliative care in a primary care population using electronic health record data JOURNAL OF BIOMEDICAL INFORMATICS Jung, K., Sudat, S. E. K., Kwon, N., Stewart, W. F., Shah, N. H. 2019; 92

View details for DOI 10.1016/j.jbi.2019.103115

View details for Web of Science ID 000525688900009
Increased monocyte count as a cellular biomarker for poor outcomes in fibrotic diseases: a retrospective, multicentre cohort study. The Lancet. Respiratory medicine Scott, M. K., Quinn, K., Li, Q., Carroll, R., Warsinske, H., Vallania, F., Chen, S., Carns, M. A., Aren, K., Sun, J., Koloms, K., Lee, J., Baral, J., Kropski, J., Zhao, H., Herzog, E., Martinez, F. J., Moore, B. B., Hinchcliff, M., Denny, J., Kaminski, N., Herazo-Maya, J. D., Shah, N. H., Khatri, P. 2019

Abstract

BACKGROUND: There is an urgent need for biomarkers to better stratify patients with idiopathic pulmonary fibrosis by risk for lung transplantation allocation who have the same clinical presentation. We aimed to investigate whether a specific immune cell type from patients with idiopathic pulmonary fibrosis could identify those at higher risk of poor outcomes. We then sought to validate our findings using cytometry and electronic health records. METHODS: We first did a discovery analysis with transcriptome data from the Gene Expression Omnibus at the National Center for Biotechnology Information for 120 peripheral blood mononuclear cell (PBMC) samples of patients with idiopathic pulmonary fibrosis. We estimated percentages of 13 immune cell types using statistical deconvolution, and investigated the association of these cell types with transplant-free survival. We validated these results using PBMC samples from patients with idiopathic pulmonary fibrosis in two independent cohorts (COMET and Yale). COMET profiled monocyte counts in 45 patients with idiopathic pulmonary fibrosis from March 12, 2010, to March 10, 2011, using flow cytometry; we tested if increased monocyte count was associated with the primary outcome of disease progression. In the Yale cohort, 15 patients with idiopathic pulmonary fibrosis (with five healthy controls) were classed as high risk or low risk from April 28, 2014, to Aug 20, 2015, using a 52-gene signature, and we assessed whether monocyte percentage (measured by cytometry by time of flight) was higher in high-risk patients.
We then examined complete blood count values in the electronic health records (EHR) of 45 068 patients with idiopathic pulmonary fibrosis, systemic sclerosis, hypertrophic cardiomyopathy, or myelofibrosis from Stanford (Jan 01, 2008, to Dec 31, 2015), Northwestern (Feb 15, 2001 to July 31, 2017), Vanderbilt (Jan 01, 2008, to Dec 31, 2016), and Optum Clinformatics DataMart (Jan 01, 2004, to Dec 31, 2016) cohorts, and examined whether absolute monocyte counts of 0·95 K/μL or greater were associated with all-cause mortality in these patients. FINDINGS: In the discovery analysis, estimated CD14+ classical monocyte percentages above the mean were associated with shorter transplant-free survival times (hazard ratio [HR] 1·82, 95% CI 1·05-3·14), whereas higher percentages of T cells and B cells were not (0·97, 0·59-1·66; and 0·78, 0·45-1·34 respectively). In two validation cohorts (COMET trial and the Yale cohort), patients with higher monocyte counts were at higher risk for poor outcomes (COMET Wilcoxon p=0·025; Yale Wilcoxon p=0·049). Monocyte counts of 0·95 K/μL or greater were associated with mortality after adjusting for forced vital capacity (HR 2·47, 95% CI 1·48-4·15; p=0·0063), and the gender, age, and physiology index (HR 2·06, 95% CI 1·22-3·47; p=0·0068) across the COMET, Stanford, and Northwestern datasets. Analysis of medical records of 7459 patients with idiopathic pulmonary fibrosis showed that patients with monocyte counts of 0·95 K/μL or greater were at increased risk of mortality with lung transplantation as a censoring event, after adjusting for age at diagnosis and sex (Stanford HR=2·30, 95% CI 0·94-5·63; Vanderbilt 1·52, 1·21-1·89; Optum 1·74, 1·33-2·27).
Likewise, higher absolute monocyte count was associated with shortened survival in patients with hypertrophic cardiomyopathy across all three cohorts, and in patients with systemic sclerosis or myelofibrosis in two of the three cohorts. INTERPRETATION: Monocyte count could be incorporated into the clinical assessment of patients with idiopathic pulmonary fibrosis and other fibrotic disorders. Further investigation into the mechanistic role of monocytes in fibrosis might lead to insights that assist the development of new therapies. FUNDING: Bill & Melinda Gates Foundation, US National Institute of Allergy and Infectious Diseases, and US National Library of Medicine.

View details for PubMedID 30935881
It is time to learn from patients like mine. NPJ digital medicine Gombar, S., Callahan, A., Califf, R., Harrington, R., Shah, N. H. 2019; 2: 16

Abstract

Clinicians are often faced with situations where published treatment guidelines do not provide a clear recommendation. In such situations, evidence generated from similar patients' data captured in electronic health records (EHRs) can aid decision making. However, challenges in generating and making such evidence available have prevented its on-demand use to inform patient care. We propose that a specialty consultation service staffed by a team of medical and informatics experts can rapidly summarize 'what happened to patients like mine' using data from the EHR and other health data sources. By emulating a familiar physician workflow, and keeping experts in the loop, such a service can translate physician inquiries about situations with evidence gaps into actionable reports. The demand for and benefits gained from such a consult service will naturally vary by practice type and data robustness. However, we cannot afford to miss the opportunity to use the patient data captured every day via EHR systems to close the evidence gap between available clinical guidelines and realities of clinical practice. We have begun offering such a service to physicians at our academic medical center and believe that such a service should be a core offering by clinical informatics professionals throughout the country. Only if we launch such efforts broadly can we systematically study the utility of learning from the record of routine clinical practice.

View details for DOI 10.1038/s41746-019-0091-3

View details for PubMedID 31304364

View details for PubMedCentralID PMC6550176
It is time to learn from patients like mine NPJ DIGITAL MEDICINE Gombar, S., Callahan, A., Califf, R., Harrington, R., Shah, N. H. 2019; 2

View details for DOI 10.1038/s41746-019-0091-3

View details for Web of Science ID 000462452600001
Predicting Future Cardiovascular Events in Patients With Peripheral Artery Disease Using Electronic Health Record Data CIRCULATION-CARDIOVASCULAR QUALITY AND OUTCOMES Ross, E., Jung, K., Dudley, J. T., Li, L., Leeper, N. J., Shah, N. H. 2019; 12 (3)

View details for DOI 10.1161/CIRCOUTCOMES.118.004741

View details for Web of Science ID 000469334000002
Predicting Future Cardiovascular Events in Patients With Peripheral Artery Disease Using Electronic Health Record Data. Circulation. Cardiovascular quality and outcomes Ross, E. G., Jung, K., Dudley, J. T., Li, L., Leeper, N. J., Shah, N. H. 2019; 12 (3): e004741

Abstract

BACKGROUND: Patients with peripheral artery disease (PAD) are at risk of major adverse cardiac and cerebrovascular events. There are no readily available risk scores that can accurately identify which patients are most likely to sustain an event, making it difficult to identify those who might benefit from more aggressive intervention. Thus, we aimed to develop a novel predictive model, using machine learning methods on electronic health record data, to identify which PAD patients are most likely to develop major adverse cardiac and cerebrovascular events. METHODS AND RESULTS: Data were derived from patients diagnosed with PAD at 2 tertiary care institutions. Predictive models were built using a common data model that allowed for utilization of both structured (coded) and unstructured (text) data. Only data from time of entry into the health system up to PAD diagnosis were used for modeling. Models were developed and tested using nested cross-validation. A total of 7686 patients were included in learning our predictive models. Utilizing almost 1000 variables, our best predictive model accurately determined which PAD patients would go on to develop major adverse cardiac and cerebrovascular events with an area under the curve of 0.81 (95% CI, 0.80-0.83). CONCLUSIONS: Machine learning algorithms applied to data in the electronic health record can learn models that accurately identify PAD patients at risk of future major adverse cardiac and cerebrovascular events, highlighting the great potential of electronic health records to provide automated risk stratification for cardiovascular diseases. Common data models that can enable cross-institution research and technology development could potentially be an important aspect of widespread adoption of newer risk-stratification models.

View details for PubMedID 30857412
Predicting Need for Advanced Illness or Palliative Care In A Primary Care Population Using Electronic Health Record Data. Journal of biomedical informatics Jung, K., Sudat, S. E., Kwon, N., Stewart, W. F., Shah, N. H. 2019: 103115

Abstract

Timely outreach to individuals in an advanced stage of illness offers opportunities to exercise decision control over health care. Predictive models built using electronic health record (EHR) data are being explored as a way to anticipate such need with enough lead time for patient engagement. Prior studies have focused on hospitalized patients, who typically have more data available for predicting care needs. It is unclear if prediction driven outreach is feasible in the primary care setting. In this study, we apply predictive modeling to the primary care population of a large, regional health system and systematically examine the impact of technical choices, such as requiring a minimum number of health care encounters (data density requirements) and aggregating diagnosis codes using Clinical Classifications Software (CCS) groupings to reduce dimensionality, on model performance in terms of discrimination and positive predictive value. We assembled a cohort of 349,667 primary care patients between 65 and 90 years of age who sought care from Sutter Health between July 1, 2011 and June 30, 2014, of whom 2.1% died during the study period. EHR data comprising demographics, encounters, orders, and diagnoses for each patient from a 12 month observation window prior to the point when a prediction is made were extracted. L1 regularized logistic regression and gradient boosted tree models were fit to training data and tuned by cross validation. Model performance in predicting one year mortality was assessed using held-out test patients. Our experiments systematically varied three factors: model type, diagnosis coding, and data density requirements. We found substantial, consistent benefit from using gradient boosting vs logistic regression (mean AUROC over all other technical choices of 84.8% vs 80.7% respectively). There was no benefit from aggregation of ICD codes into CCS code groups (mean AUROC over all other technical choices of 82.9% vs 82.6% respectively).
Likewise, increasing data density requirements did not affect discrimination (mean AUROC over other technical choices ranged from 82.5% to 83%). We also examine model performance as a function of lead time, which is the interval between death and when a prediction was made. In subgroup analysis by lead time, mean AUROC over all other choices ranged from 87.9% for patients who died within 0 to 3 months to 83.6% for those who died 9 to 12 months after prediction time.
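For readers unfamiliar with the discrimination metric used throughout this abstract, AUROC can be read as the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen patient who survived. The sketch below computes it by direct pairwise comparison. It is a generic illustration with a hypothetical function name and is not the authors' evaluation code.

```python
from typing import Sequence


def auroc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking. Runs in O(len(pos) * len(neg)),
    which is fine for small illustrative inputs.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two groups.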

View details for PubMedID 30753951
Incorporating Observed Physiological Data to Personalize Pediatric Vital Sign Alarm Thresholds BIOMEDICAL INFORMATICS INSIGHTS Poole, S., Shah, N. 2019; 11

View details for DOI 10.1177/1178222618818478

View details for Web of Science ID 000457550100001
Early Detection of Adverse Drug Reactions in Social Health Networks: A Natural Language Processing Pipeline for Signal Detection. JMIR public health and surveillance Nikfarjam, A., Ransohoff, J. D., Callahan, A., Jones, E., Loew, B., Kwong, B. Y., Sarin, K. Y., Shah, N. H. 2019; 5 (2): e11264

Abstract

Adverse drug reactions (ADRs) occur in nearly all patients on chemotherapy, causing morbidity and therapy disruptions. Detection of such ADRs is limited in clinical trials, which are underpowered to detect rare events. Early recognition of ADRs in the postmarketing phase could substantially reduce morbidity and decrease societal costs. Internet community health forums provide a mechanism for individuals to discuss real-time health concerns and can enable computational detection of ADRs. The goal of this study is to identify cutaneous ADR signals in social health networks and compare the frequency and timing of these ADRs to clinical reports in the literature. We present a natural language processing-based, ADR signal-generation pipeline based on patient posts on Internet social health networks. We identified user posts from the Inspire health forums related to two chemotherapy classes: erlotinib, an epidermal growth factor receptor inhibitor, and nivolumab and pembrolizumab, immune checkpoint inhibitors. We extracted mentions of ADRs from unstructured content of patient posts. We then performed population-level association analyses and time-to-detection analyses. Our system detected cutaneous ADRs from patient reports with high precision (0.90) and at frequencies comparable to those documented in the literature but an average of 7 months ahead of their literature reporting. Known ADRs were associated with higher proportional reporting ratios compared to negative controls, demonstrating the robustness of our analyses. Our named entity recognition system achieved a 0.738 microaveraged F-measure in detecting ADR entities, not limited to cutaneous ADRs, in health forum posts.
Additionally, we discovered the novel ADR of hypohidrosis reported by 23 patients in erlotinib-related posts; this ADR was absent from 15 years of literature on this medication and we recently reported the finding in a clinical oncology journal. Several hundred million patients report health concerns in social health networks, yet this information is markedly underutilized for pharmacosurveillance. We demonstrated the ability of a natural language processing-based signal-generation pipeline to accurately detect patient reports of ADRs months in advance of literature reporting and the robustness of statistical analyses to validate system detections. Our findings suggest the important contributions that social health network data can play in contributing to more comprehensive and timely pharmacovigilance.
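The proportional reporting ratio mentioned in this abstract is a standard disproportionality measure built from a 2x2 table of reports. The sketch below shows the textbook definition. The function and argument names are hypothetical, and the abstract does not say how the authors populated the table from forum posts.

```python
def proportional_reporting_ratio(a: int, b: int, c: int, d: int) -> float:
    """Textbook PRR from a 2x2 contingency table.

    a: reports mentioning the drug of interest and the reaction
    b: reports mentioning the drug of interest without the reaction
    c: reports mentioning other drugs and the reaction
    d: reports mentioning other drugs without the reaction
    """
    if a + b == 0 or c + d == 0 or c == 0:
        raise ValueError("PRR is undefined for an empty row or zero comparator events")
    rate_drug = a / (a + b)    # reaction rate among reports for the drug
    rate_other = c / (c + d)   # reaction rate among all other reports
    return rate_drug / rate_other
```

A ratio well above 1 indicates that the reaction is reported disproportionately often for the drug of interest relative to the comparator drugs.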

View details for DOI 10.2196/11264

View details for PubMedID 31162134
Creating Fair Models of Atherosclerotic Cardiovascular Disease Pfohl, S., Marafino, B., Coulet, A., Rodriguez, F., Palaniappan, L., Shah, N. H. Association for Computing Machinery. 2019: 271–78

View details for DOI 10.1145/3306618.3314278

View details for Web of Science ID 000556121100038
Key Considerations for Incorporating Conversational AI in Psychotherapy. Frontiers in psychiatry Miner, A. S., Shah, N., Bullock, K. D., Arnow, B. A., Bailenson, J., Hancock, J. 2019; 10: 746

Abstract

Conversational artificial intelligence (AI) is changing the way mental health care is delivered. By gathering diagnostic information, facilitating treatment, and reviewing clinician behavior, conversational AI is poised to impact traditional approaches to delivering psychotherapy. While this transition is not disconnected from existing professional services, specific formulations of clinician-AI collaboration and migration paths between forms remain vague. In this viewpoint, we introduce four approaches to AI-human integration in mental health service delivery. To inform future research and policy, these four approaches are addressed through four dimensions of impact: access to care, quality, clinician-patient relationship, and patient self-disclosure and sharing. Although many research questions are yet to be investigated, we view safety, trust, and oversight as crucial first steps. If conversational AI isn't safe it should not be used, and if it isn't trusted, it won't be. In order to assess safety, trust, interfaces, procedures, and system level workflows, oversight and collaboration is needed between AI systems, patients, clinicians, and administrators.

View details for DOI 10.3389/fpsyt.2019.00746

View details for PubMedID 31681047
Making Machine Learning Models Clinically Useful. JAMA Shah, N. H., Milstein, A., Bagley, S. C. 2019

View details for DOI 10.1001/jama.2019.10306

View details for PubMedID 31393527
The Effectiveness of Multitask Learning for Phenotyping with Electronic Health Records Data Ding, D., Simpson, C., Pfohl, S., Kale, D. C., Jung, K., Shah, N. H. edited by Altman, R. B., Dunker, A. K., Hunter, L., Ritchie, M. D., Murray, T., Klein, T. E. WORLD SCIENTIFIC PUBL CO PTE LTD. 2019: 18–29

View details for Web of Science ID 000461866400003
Incorporating Observed Physiological Data to Personalize Pediatric Vital Sign Alarm Thresholds. Biomedical informatics insights Poole, S., Shah, N. 2019; 11: 1178222618818478

Abstract

Bedside monitors are intended as a safety net in patient care, but their management in the inpatient setting is a significant patient safety concern. The low precision of vital sign alarm systems leads to clinical staff becoming desensitized to the sound of the alarm, a phenomenon known as alarm fatigue. Alarm fatigue has been shown to increase response time to alarms or result in alarms being ignored altogether and has negative consequences for patient safety. We present methods to establish personalized thresholds for heart rate and respiratory rate alarms. These thresholds are first chosen based on patient characteristics available at the time of admission and are then adapted to incorporate vital signs observed in the first 2 hours of monitoring. We demonstrate that the adapted thresholds are similar to those chosen by clinicians for individual patients and would result in fewer alarms than the currently used age-based thresholds. Personalized vital sign alarm thresholds can help to alleviate the problem of alarm fatigue in an inpatient setting while ensuring that all critical vital signs are detected.

View details for PubMedID 30675101
The Effectiveness of Multitask Learning for Phenotyping with Electronic Health Records Data. Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing Ding, D. Y., Simpson, C., Pfohl, S., Kale, D. C., Jung, K., Shah, N. H. 2019; 24: 18–29

Abstract

Electronic phenotyping is the task of ascertaining whether an individual has a medical condition of interest by analyzing their medical record and is foundational in clinical informatics. Increasingly, electronic phenotyping is performed via supervised learning. We investigate the effectiveness of multitask learning for phenotyping using electronic health records (EHR) data. Multitask learning aims to improve model performance on a target task by jointly learning additional auxiliary tasks and has been used in disparate areas of machine learning. However, its utility when applied to EHR data has not been established, and prior work suggests that its benefits are inconsistent. We present experiments that elucidate when multitask learning with neural nets improves performance for phenotyping using EHR data relative to neural nets trained for a single phenotype and to well-tuned baselines. We find that multitask neural nets consistently outperform single-task neural nets for rare phenotypes but underperform for relatively more common phenotypes. The effect size increases as more auxiliary tasks are added. Moreover, multitask learning reduces the sensitivity of neural nets to hyperparameter settings for rare phenotypes. Last, we quantify phenotype complexity and find that neural nets trained with or without multitask learning do not improve on simple baselines unless the phenotypes are sufficiently complex.

View details for PubMedID 30864307
The number needed to benefit: estimating the value of predictive analytics in healthcare. Journal of the American Medical Informatics Association : JAMIA Liu, V. X., Bates, D. W., Wiens, J., Shah, N. H. 2019

Abstract

Predictive analytics in health care has generated increasing enthusiasm recently, as reflected in a rapidly growing body of predictive models reported in literature and in real-time embedded models using electronic health record data. However, estimating the benefit of applying any single model to a specific clinical problem remains challenging today. Developing a shared framework for estimating model value is therefore critical to facilitate the effective, safe, and sustainable use of predictive tools into the future. We highlight key concepts within the prediction-action dyad that together are expected to impact model benefit. These include factors relevant to model prediction (including the number needed to screen) as well as those relevant to the subsequent action (number needed to treat). In the simplest terms, a number needed to benefit contextualizes the numbers needed to screen and treat, offering an opportunity to estimate the value of a clinical predictive model in action.
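
As a rough illustration of the arithmetic behind the prediction-action dyad, the sketch below assumes that the number needed to screen is the reciprocal of the model's positive predictive value, that the number needed to treat is the reciprocal of the absolute risk reduction of the follow-up action, and that the number needed to benefit is their product. These definitions, the function names, and the example values are assumptions made for illustration; the abstract does not spell them out.

```python
def number_needed_to_screen(ppv: float) -> float:
    """Flagged patients reviewed per true positive (assumed here to be 1 / PPV)."""
    if not 0 < ppv <= 1:
        raise ValueError("ppv must be in (0, 1]")
    return 1.0 / ppv


def number_needed_to_treat(absolute_risk_reduction: float) -> float:
    """True positives treated per patient who benefits (1 / absolute risk reduction)."""
    if not 0 < absolute_risk_reduction <= 1:
        raise ValueError("absolute_risk_reduction must be in (0, 1]")
    return 1.0 / absolute_risk_reduction


def number_needed_to_benefit(ppv: float, absolute_risk_reduction: float) -> float:
    """Assumed combination: flagged patients per patient who benefits (NNS * NNT)."""
    return number_needed_to_screen(ppv) * number_needed_to_treat(absolute_risk_reduction)


if __name__ == "__main__":
    # Hypothetical model and intervention: PPV of 25%, absolute risk reduction of 10%.
    print(number_needed_to_benefit(ppv=0.25, absolute_risk_reduction=0.10))
```

Under these assumptions, a model with a higher positive predictive value or an action with a larger effect both lower the number of flagged patients needed for one patient to benefit.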

View details for DOI 10.1093/jamia/ocz088

View details for PubMedID 31192367
Using natural language processing to construct a metastatic breast cancer cohort from linked cancer registry and electronic medical records data. JAMIA open Ling, A. Y., Kurian, A. W., Caswell-Jin, J. L., Sledge, G. W., Shah, N. H., Tamang, S. R. 2019; 2 (4): 528–37

Abstract

Most population-based cancer databases lack information on metastatic recurrence. Electronic medical records (EMR) and cancer registries contain complementary information on cancer diagnosis, treatment and outcome, yet are rarely used synergistically. To construct a cohort of metastatic breast cancer (MBC) patients, we applied natural language processing techniques within a semisupervised machine learning framework to linked EMR-California Cancer Registry (CCR) data. We studied all female patients treated at Stanford Health Care with an incident breast cancer diagnosis from 2000 to 2014. Our database consisted of structured fields and unstructured free-text clinical notes from EMR, linked to CCR, a component of the Surveillance, Epidemiology and End Results Program (SEER). We identified de novo MBC patients from CCR and extracted information on distant recurrences from patient notes in EMR. Furthermore, we trained a regularized logistic regression model for recurrent MBC classification and evaluated its performance on a gold standard set of 146 patients. There were 11 459 breast cancer patients in total and the median follow-up time was 96.3 months. We identified 1886 MBC patients, 512 (27.1%) of whom were de novo MBC patients and 1374 (72.9%) were recurrent MBC patients. Our final MBC classifier achieved an area under the receiver operating characteristic curve (AUC) of 0.917, with sensitivity 0.861, specificity 0.878, and accuracy 0.870. To enable population-based research on MBC, we developed a framework for retrospective case detection combining EMR and CCR data. Our classifier achieved good AUC, sensitivity, and specificity without expert-labeled examples.

View details for DOI 10.1093/jamiaopen/ooz040

View details for PubMedID 32025650

View details for PubMedCentralID PMC6994019
Medical device surveillance with electronic health records. NPJ digital medicine Callahan, A., Fries, J. A., Ré, C., Huddleston, J. I., Giori, N. J., Delp, S., Shah, N. H. 2019; 2: 94

Abstract

Post-market medical device surveillance is a challenge facing manufacturers, regulatory agencies, and health care providers. Electronic health records are valuable sources of real-world evidence for assessing device safety and tracking device-related patient outcomes over time. However, distilling this evidence remains challenging, as information is fractured across clinical notes and structured records. Modern machine learning methods for machine reading promise to unlock increasingly complex information from text, but face barriers due to their reliance on large and expensive hand-labeled training sets. To address these challenges, we developed and validated state-of-the-art deep learning methods that identify patient outcomes from clinical notes without requiring hand-labeled training data. Using hip replacements, one of the most common implantable devices, as a test case, our methods accurately extracted implant details and reports of complications and pain from electronic health records with up to 96.3% precision, 98.5% recall, and 97.4% F1, improved classification performance by 12.8-53.9% over rule-based methods, and detected over six times as many complication events compared to using structured data alone. Using these additional events to assess complication-free survivorship of different implant systems, we found significant variation between implants, including for risk of revision surgery, which could not be detected using coded data alone. Patients with revision surgeries had more hip pain mentions in the post-hip replacement, pre-revision period compared to patients with no evidence of revision surgery (mean hip pain mentions 4.97 vs. 3.23; t = 5.14; p < 0.001). Some implant models were associated with higher or lower rates of hip pain mentions. Our methods complement existing surveillance mechanisms by requiring orders of magnitude less hand-labeled training data, offering a scalable solution for national medical device surveillance using electronic health records.
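
For readers less familiar with the metrics quoted above, the following minimal sketch shows how F1 is derived from precision and recall as their harmonic mean. The function name is illustrative, and the reported maxima may come from different extraction tasks, so plugging them in together is only a sanity check, not a reproduction of the study's evaluation.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Best-case precision and recall quoted in the abstract.
    print(f"{f1_score(0.963, 0.985):.3f}")  # prints 0.974
```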

View details for DOI 10.1038/s41746-019-0168-z

View details for PubMedID 31583282

View details for PubMedCentralID PMC6761113
Improving palliative care with deep learning. BMC medical informatics and decision making Avati, A., Jung, K., Harman, S., Downing, L., Ng, A., Shah, N. H. 2018; 18 (Suppl 4): 122

Abstract

BACKGROUND: Access to palliative care is a key quality metric which most healthcare organizations strive to improve. The primary challenges to increasing palliative care access are a combination of physicians over-estimating patient prognoses and a shortage of palliative staff in general. This, in combination with treatment inertia, can result in a mismatch between patient wishes and their actual care towards the end of life. METHODS: In this work, we address this problem, with Institutional Review Board approval, using machine learning and Electronic Health Record (EHR) data of patients. We train a Deep Neural Network model on the EHR data of patients from previous years to predict mortality of patients within the next 3-12 month period. This prediction is used as a proxy decision for identifying patients who could benefit from palliative care. RESULTS: The EHR data of all admitted patients are evaluated every night by this algorithm, and the palliative care team is automatically notified of the list of patients with a positive prediction. In addition, we present a novel technique for decision interpretation, which we use to provide explanations for the model's predictions. CONCLUSION: The automatic screening and notification saves the palliative care team the burden of time-consuming chart reviews of all patients, and allows them to take a proactive approach in reaching out to such patients rather than relying on referrals from the treating physicians.

View details for PubMedID 30537977
Improving palliative care with deep learning Avati, A., Jung, K., Harman, S., Downing, L., Ng, A., Shah, N. H. BMC. 2018

View details for DOI 10.1186/s12911-018-0677-8

View details for Web of Science ID 000452837700005
A Second Opinion From Observational Data on Second-line Diabetes Drugs. JAMA network open Callahan, A., Shah, N. H. 2018; 1 (8): e186119

View details for PubMedID 30646309
A Second Opinion From Observational Data on Second-line Diabetes Drugs JAMA NETWORK OPEN Callahan, A., Shah, N. H. 2018; 1 (8)

View details for DOI 10.1001/jamanetworkopen.2018.6119

View details for Web of Science ID 000456295400012
Predicting the need for a reduced drug dose, at first prescription. Scientific reports Coulet, A., Shah, N. H., Wack, M., Chawki, M. B., Jay, N., Dumontier, M. 2018; 8 (1): 15558

Abstract

Prescribing the right drug with the right dose is a central tenet of precision medicine. We examined the use of patients' prior Electronic Health Records to predict a reduction in drug dosage. We focus on drugs that interact with the P450 enzyme family, because their dosage is known to be sensitive and variable. We extracted diagnostic codes, conditions reported in clinical notes, and laboratory orders from Stanford's clinical data warehouse to construct cohorts of patients that either did or did not need a dose change. After feature selection, we trained models to predict the patients who will (or will not) require a dose change after being prescribed one of 34 drugs across 23 drug classes. Overall, we can predict (AUC≥0.70-0.95) a dose reduction for 23 drugs and 22 drug classes. Several of these drugs are associated with clinical guidelines that recommend dose reduction exclusively in the case of adverse reaction. For these cases, a reduction in dosage may be considered as a surrogate for an adverse reaction, which our system could indirectly help predict and prevent. Our study illustrates the role machine learning may take in providing guidance in setting the starting dose for drugs associated with response variability.

View details for PubMedID 30349060
Predicting the need for a reduced drug dose, at first prescription SCIENTIFIC REPORTS Coulet, A., Shah, N. H., Wack, M., Chawki, M. B., Jay, N., Dumontier, M. 2018; 8

View details for DOI 10.1038/s41598-018-33980-0

View details for Web of Science ID 000447848300006
An evaluation of clinical order patterns machine-learned from clinician cohorts stratified by patient mortality outcomes JOURNAL OF BIOMEDICAL INFORMATICS Wang, J. K., Hom, J., Balasubramanian, S., Schuler, A., Shah, N. H., Goldstein, M. K., Baiocchi, M. T. M., Chen, J. H. 2018; 86: 109–19

View details for DOI 10.1016/j.jbi.2018.09.005

View details for Web of Science ID 000460600800011
An Evaluation of Clinical Order Patterns Machine-Learned from Clinician Cohorts Stratified by Patient Mortality Outcomes. Journal of biomedical informatics Wang, J. K., Hom, J., Balasubramanian, S., Schuler, A., Shah, N. H., Goldstein, M. K., Baiocchi, M. T., Chen, J. H. 2018

Abstract

OBJECTIVE: Evaluate the quality of clinical order practice patterns machine-learned from clinician cohorts stratified by patient mortality outcomes. MATERIALS AND METHODS: Inpatient electronic health records from 2010-2013 were extracted from a tertiary academic hospital. Clinicians (n=1,822) were stratified into low-mortality (21.8%, n=397) and high-mortality (6.0%, n=110) extremes using a two-sided P-value score quantifying deviation of observed vs. expected 30-day patient mortality rates. Three patient cohorts were assembled: patients seen by low-mortality clinicians, high-mortality clinicians, and an unfiltered crowd of all clinicians (n=1,046, 1,046, and 5,230 post-propensity score matching, respectively). Predicted order lists were automatically generated from recommender system algorithms trained on each patient cohort and evaluated against i) real-world practice patterns reflected in patient cases with better-than-expected mortality outcomes and ii) reference standards derived from clinical practice guidelines. RESULTS: Across six common admission diagnoses, order lists learned from the crowd demonstrated the greatest alignment with guideline references (AUROC range=0.86-0.91), performing on par with or better than those learned from low-mortality clinicians (0.79-0.84, P<10^-5) or manually-authored hospital order sets (0.65-0.77, P<10^-3). The same trend was observed in evaluating model predictions against better-than-expected patient cases, with the crowd model (AUROC mean=0.91) outperforming the low-mortality model (0.87, P<10^-16) and order set benchmarks (0.78, P<10^-35). DISCUSSION: Whether machine-learning models are trained on all clinicians or a subset of experts illustrates a bias-variance tradeoff in data usage. Defining robust metrics to assess quality based on internal (e.g. practice patterns from better-than-expected patient cases) or external reference standards (e.g. clinical practice guidelines) is critical to assess decision support content. CONCLUSION: Learning relevant decision support content from all clinicians is as robust as, if not more robust than, learning from a select subgroup of clinicians favored by patient outcomes.

View details for PubMedID 30195660
Association of Hemoglobin A1c Levels With Use of Sulfonylureas, Dipeptidyl Peptidase 4 Inhibitors, and Thiazolidinediones in Patients With Type 2 Diabetes Treated With Metformin: Analysis From the Observational Health Data Sciences and Informatics Initiative. JAMA network open Vashisht, R., Jung, K., Schuler, A., Banda, J. M., Park, R. W., Jin, S., Li, L., Dudley, J. T., Johnson, K. W., Shervey, M. M., Xu, H., Wu, Y., Natrajan, K., Hripcsak, G., Jin, P., Van Zandt, M., Reckard, A., Reich, C. G., Weaver, J., Schuemie, M. J., Ryan, P. B., Callahan, A., Shah, N. H. 2018; 1 (4): e181755

Abstract

Consensus around an efficient second-line treatment option for type 2 diabetes (T2D) remains ambiguous. The availability of electronic medical records and insurance claims data, which capture routine medical practice, accessed via the Observational Health Data Sciences and Informatics network presents an opportunity to generate evidence for the effectiveness of second-line treatments. To identify which drug classes among sulfonylureas, dipeptidyl peptidase 4 (DPP-4) inhibitors, and thiazolidinediones are associated with reduced hemoglobin A1c (HbA1c) levels and lower risk of myocardial infarction, kidney disorders, and eye disorders in patients with T2D treated with metformin as a first-line therapy. Three retrospective, propensity-matched, new-user cohort studies with replication across 8 sites were performed from 1975 to 2017. Medical data of 246 558 805 patients from multiple countries from the Observational Health Data Sciences and Informatics (OHDSI) initiative were included and medical data sets were transformed into a unified common data model, with analysis done using open-source analytical tools. Participants included patients with T2D receiving metformin with at least 1 prior HbA1c laboratory test who were then prescribed either sulfonylureas, DPP-4 inhibitors, or thiazolidinediones. Data analysis was conducted from 2015 to 2018. Treatment with sulfonylureas, DPP-4 inhibitors, or thiazolidinediones starting at least 90 days after the initial prescription of metformin. The primary outcome is the first observation of the reduction of HbA1c level to 7% of total hemoglobin or less after prescription of a second-line drug. Secondary outcomes are myocardial infarction, kidney disorder, and eye disorder after prescription of a second-line drug. A total of 246 558 805 patients (126 977 785 women [51.5%]) were analyzed. Effectiveness of sulfonylureas, DPP-4 inhibitors, and thiazolidinediones prescribed after metformin to lower HbA1c level to 7% or less of total hemoglobin remained indistinguishable in patients with T2D. Patients treated with sulfonylureas compared with DPP-4 inhibitors had a small increased consensus hazard ratio of myocardial infarction (1.12; 95% CI, 1.02-1.24) and eye disorders (1.15; 95% CI, 1.11-1.19) in the meta-analysis. Hazard of observing kidney disorders after treatment with sulfonylureas, DPP-4 inhibitors, or thiazolidinediones was equally likely. The examined drug classes did not differ in lowering HbA1c and in hazards of kidney disorders in patients with T2D treated with metformin as a first-line therapy. Sulfonylureas had a small, higher observed hazard of myocardial infarction and eye disorders compared with DPP-4 inhibitors in the meta-analysis. The OHDSI collaborative network can be used to conduct a large international study examining the effectiveness of second-line treatment choices made in clinical management of T2D.

View details for DOI 10.1001/jamanetworkopen.2018.1755

View details for PubMedID 30646124

View details for PubMedCentralID PMC6324274
Comparative safety and effectiveness of alendronate versus raloxifene in women with osteoporosis Tian, Y., Kim, Y., Yang, J., Huser, V., Jin, P., Lambert, C., Park, H., Park, R., Rijnbeek, P., Van Zandt, M., Vashisht, R., Wu, Y., You, S., Duke, J., Hripcsak, G., Madigan, D., Reich, C., Shah, N., Ryan, P., Schuemie, M., Suchard, M. WILEY. 2018: 184

View details for Web of Science ID 000441893801170
The Impact of Acute Organ Dysfunction on Long-Term Survival in Sepsis. Critical care medicine Schuler, A., Wulf, D. A., Lu, Y., Iwashyna, T. J., Escobar, G. J., Shah, N. H., Liu, V. X. 2018; 46 (6): 843–49

Abstract

OBJECTIVES: To estimate the impact of each of six types of acute organ dysfunction (hepatic, renal, coagulation, neurologic, cardiac, and respiratory) on long-term mortality after surviving sepsis hospitalization. DESIGN: Multicenter, retrospective study. SETTINGS: Twenty-one hospitals within an integrated healthcare delivery system in Northern California. PATIENTS: Thirty thousand one hundred sixty-three sepsis patients admitted through the emergency department between 2010 and 2013, with mortality follow-up through April 2015. INTERVENTIONS: None. MEASUREMENTS AND MAIN RESULTS: Acute organ dysfunction was quantified using modified Sequential Organ Failure Assessment scores. The main outcome was long-term mortality among sepsis patients who survived hospitalization. The estimates of the impact of each type of acute organ dysfunction on long-term mortality were based on adjusted Cox proportional hazards models. Sensitivity analyses were conducted based on propensity score-matching and adjusted logistic regression. Hospital mortality was 9.4% and mortality was 31.7% at 1 year. Median follow-up time among sepsis survivors was 797 days (interquartile range: 384-1,219 d). Acute neurologic (odds ratio, 1.86; p < 0.001), respiratory (odds ratio, 1.43; p < 0.001), and cardiac (odds ratio, 1.31; p < 0.001) dysfunction were most strongly associated with short-term hospital mortality, compared with sepsis patients without these organ dysfunctions. Evaluating only patients surviving their sepsis hospitalization, acute neurologic dysfunction was also most strongly associated with long-term mortality (odds ratio, 1.52; p < 0.001) corresponding to a marginal increase in predicted 1-year mortality of 6.0% for the presence of any neurologic dysfunction (p < 0.001). Liver dysfunction was also associated with long-term mortality in all models, whereas the association for other organ dysfunction subtypes was inconsistent between models. CONCLUSIONS: Acute sepsis-related neurologic dysfunction was the organ dysfunction most strongly associated with short- and long-term mortality and represents a key mediator of long-term adverse outcomes following sepsis.

View details for PubMedID 29432349
The Impact of Acute Organ Dysfunction on Long-Term Survival in Sepsis CRITICAL CARE MEDICINE Schuler, A., Wulf, D. A., Lu, Y., Iwashyna, T. J., Escobar, G. J., Shah, N. H., Liu, V. X. 2018; 46 (6): 843–49

View details for DOI 10.1097/CCM.0000000000003023

View details for Web of Science ID 000439575100032
Some methods for heterogeneous treatment effect estimation in high dimensions STATISTICS IN MEDICINE Powers, S., Qian, J., Jung, K., Schuler, A., Shah, N. H., Hastie, T., Tibshirani, R. 2018; 37 (11): 1767–87

Abstract

When devising a course of treatment for a patient, doctors often have little quantitative evidence on which to base their decisions, beyond their medical education and published clinical trials. Stanford Health Care alone has millions of electronic medical records that are only just recently being leveraged to inform better treatment recommendations. These data present a unique challenge because they are high dimensional and observational. Our goal is to make personalized treatment recommendations based on the outcomes for past patients similar to a new patient. We propose and analyze 3 methods for estimating heterogeneous treatment effects using observational data. Our methods perform well in simulations using a wide variety of treatment effect functions, and we present results of applying the 2 most promising methods to data from The SPRINT Data Analysis Challenge, from a large randomized trial of a treatment for high blood pressure.

View details for PubMedID 29508417

View details for PubMedCentralID PMC5938172
Scalable and accurate deep learning with electronic health records NPJ DIGITAL MEDICINE Rajkomar, A., Oren, E., Chen, K., Dai, A. M., Hajaj, N., Hardt, M., Liu, P. J., Liu, X., Marcus, J., Sun, M., Sundberg, P., Yee, H., Zhang, K., Zhang, Y., Flores, G., Duggan, G. E., Irvine, J., Le, Q., Litsch, K., Mossin, A., Tansuwan, J., Wang, D., Wexler, J., Wilson, J., Ludwig, D., Volchenboum, S. L., Chou, K., Pearson, M., Madabushi, S., Shah, N. H., Butte, A. J., Howell, M. D., Cui, C., Corrado, G. S., Dean, J. 2018; 1

View details for DOI 10.1038/s41746-018-0029-1

View details for Web of Science ID 000444179800001
Interpretation of biological experiments changes with evolution of the Gene Ontology and its annotations SCIENTIFIC REPORTS Tomczak, A., Mortensen, J. M., Winnenburg, R., Liu, C., Alessi, D. T., Swamy, V., Vallania, F., Lofgren, S., Haynes, W., Shah, N. H., Musen, M. A., Khatri, P. 2018; 8: 5115

Abstract

Gene Ontology (GO) enrichment analysis is ubiquitously used for interpreting high throughput molecular data and generating hypotheses about underlying biological phenomena of experiments. However, the two building blocks of this analysis - the ontology and the annotations - evolve rapidly. We used gene signatures derived from 104 disease analyses to systematically evaluate how enrichment analysis results were affected by evolution of the GO over a decade. We found low consistency between enrichment analyses results obtained with early and more recent GO versions. Furthermore, there continues to be a strong annotation bias in the GO annotations where 58% of the annotations are for 16% of the human genes. Our analysis suggests that GO evolution may have affected the interpretation and possibly reproducibility of experiments over time. Hence, researchers must exercise caution when interpreting GO enrichment analyses and should reexamine previous analyses with the most recent GO version.

View details for PubMedID 29572502
U-Index, a dataset and an impact metric for informatics tools and databases SCIENTIFIC DATA Callahan, A., Winnenburg, R., Shah, N. H. 2018; 5: 180043

Abstract

Measuring the usage of informatics resources such as software tools and databases is essential to quantifying their impact, value and return on investment. We have developed a publicly available dataset of informatics resource publications and their citation network, along with an associated metric (u-Index) to measure informatics resources' impact over time. Our dataset differentiates the context in which citations occur to distinguish between 'awareness' and 'usage', and uses a citing universe of open access publications to derive citation counts for quantifying impact. Resources with a high ratio of usage citations to awareness citations are likely to be widely used by others and have a high u-Index score. We have pre-calculated the u-Index for nearly 100,000 informatics resources. We demonstrate how the u-Index can be used to track informatics resource impact over time. The method of calculating the u-Index metric, the pre-computed u-Index values, and the dataset we compiled to calculate the u-Index are publicly available.

View details for PubMedID 29557976
Implementing Machine Learning in Health Care - Addressing Ethical Challenges NEW ENGLAND JOURNAL OF MEDICINE Char, D. S., Shah, N. H., Magnus, D. 2018; 378 (11): 981–83

View details for PubMedID 29539284

View details for PubMedCentralID PMC5962261
Performing an Informatics Consult: Methods and Challenges JOURNAL OF THE AMERICAN COLLEGE OF RADIOLOGY Schuler, A., Callahan, A., Jung, K., Shah, N. H. 2018; 15 (3): 563–68

Abstract

Our health care system is plagued by missed opportunities, waste, and harm. Data generated in the course of care are often underutilized, scientific insight goes untranslated, and evidence is overlooked. To address these problems, we envisioned a system where aggregate patient data can be used at the bedside to provide practice-based evidence. To create that system, we directly connect practicing physicians to clinical researchers and data scientists through an informatics consult. Our team processes and classifies questions posed by clinicians, identifies the appropriate patient data to use, runs the appropriate analyses, and returns an answer, ideally in a 48-hour time window. Here, we discuss the methods that are used for data extraction, processing, and analysis in our consult. We continue to refine our informatics consult service, moving closer to a learning health care system.

View details for PubMedID 29396125
What This Computer Needs Is a Physician: Humanism and Artificial Intelligence JAMA-JOURNAL OF THE AMERICAN MEDICAL ASSOCIATION Verghese, A., Shah, N. H., Harrington, R. 2018; 319 (1): 19–20

View details for PubMedID 29261830
Inpatient Clinical Order Patterns Machine-Learned From Teaching Versus Attending-Only Medical Services. AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science Wang, J. K., Schuler, A., Shah, N. H., Baiocchi, M. T., Chen, J. H. 2018; 2017: 226–35

Abstract

Clinical order patterns derived from data-mining electronic health records can be a valuable source of decision support content. However, the quality of crowdsourcing such patterns may be suspect depending on the population learned from. For example, it is unclear whether learning inpatient practice patterns from a university teaching service, characterized by physician-trainee teams with an emphasis on medical education, will be of variable quality versus an attending-only medical service that focuses strictly on clinical care. Machine learning clinical order patterns by association rule episode mining from teaching versus attending-only inpatient medical services illustrated some practice variability, but converged towards similar top results in either case. We further validated the automatically generated content by confirming alignment with external reference standards extracted from clinical practice guidelines.

View details for PubMedID 29888077
Inferring Physical Function from Wearable Activity Monitors: Analysis of Activity Data from Patients with Knee Osteoarthritis. JMIR Mhealth Uhealth Agarwal, V., Smuck, M., Tomkins-Lane, C., Shah, N. H. 2018

View details for DOI 10.2196/11315
Democratizing Health Data for Translational Research Payne, P. R. O., Shah, N. H., Tenenbaum, J. D., Mangravite, L. edited by Altman, R. B., Dunker, A. K., Hunter, L., Ritchie, M. D., Murray, T., Klein, T. E. WORLD SCIENTIFIC PUBL CO PTE LTD. 2018: 240–46

Abstract

There is an expanding and intensive focus on the accessibility, reproducibility, and rigor of basic, clinical, and translational research. This focus complements the need to identify sustainable ways to generate actionable research results that improve human health. The principles and practices of open science offer a promising path to address both issues by facilitating: 1) increased transparency of data and methods, which promotes research reproducibility and rigor; and 2) cumulative efficiencies wherein research tools and the output of research are combined to accelerate the delivery of new knowledge. While great strides have been made in terms of enabling the open science paradigm in the biological sciences, progress in sharing of patient-derived health data has been more moderate. This lack of widespread access to common and well characterized health data is a substantial impediment to the timely, efficient, and multi-disciplinary conduct of translational research, particularly in those instances where hypotheses spanning multiple scales (from molecules to patients to populations) are being developed and tested. To address such challenges, we review current best practices and lessons learned, and explore the need for policy changes and technical innovation that can enhance the sharing of health data for translational research.

View details for Web of Science ID 000461831500022

View details for PubMedID 29218885
Addressing vital sign alarm fatigue using personalized alarm thresholds Poole, S., Shah, N. edited by Altman, R. B., Dunker, A. K., Hunter, L., Ritchie, M. D., Murray, T., Klein, T. E. WORLD SCIENTIFIC PUBL CO PTE LTD. 2018: 472–83

Abstract

Alarm fatigue, a condition in which clinical staff become desensitized to alarms due to the high frequency of unnecessary alarms, is a major patient safety concern. Alarm fatigue is particularly prevalent in the pediatric setting, due to the high level of variation in vital signs with patient age. Existing studies have shown that the current default pediatric vital sign alarm thresholds are inappropriate, and lead to a larger than necessary alarm load. This study leverages a large database containing over 190 patient-years of heart rate data to accurately identify the 1st and 99th percentiles of an individual's heart rate on their first day of vital sign monitoring. These percentiles are then used as personalized vital sign thresholds, which are evaluated by comparing to non-default alarm thresholds used in practice, and by using the presence of major clinical events to infer alarm labels. Using the proposed personalized thresholds would decrease low and high heart rate alarms by up to 50% and 44% respectively, while maintaining sensitivity of 62% and increasing specificity to 49%. The proposed personalized vital sign alarm thresholds will reduce alarm fatigue, thus contributing to improved patient outcomes, shorter hospital stays, and reduced hospital costs.
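
The percentile-based thresholds described above can be sketched as follows. This is a minimal illustration that assumes a plain list of heart rate readings from the first day of monitoring and uses linear interpolation between order statistics; the function names and the interpolation choice are assumptions, not the authors' implementation.

```python
from typing import Sequence, Tuple


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile (0-100) using linear interpolation between order statistics."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= q <= 100:
        raise ValueError("q must be between 0 and 100")
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def personalized_thresholds(
    heart_rates: Sequence[float], low_q: float = 1.0, high_q: float = 99.0
) -> Tuple[float, float]:
    """Low and high alarm thresholds taken from a patient's own readings."""
    return percentile(heart_rates, low_q), percentile(heart_rates, high_q)


def count_alarms(heart_rates: Sequence[float], low: float, high: float) -> int:
    """Number of readings that would trigger a low or high heart rate alarm."""
    return sum(1 for hr in heart_rates if hr < low or hr > high)


if __name__ == "__main__":
    first_day = list(range(60, 161))  # hypothetical readings, 60 to 160 bpm
    low, high = personalized_thresholds(first_day)
    print(low, high, count_alarms(first_day, low, high))
```

Comparing `count_alarms` under personalized thresholds with the count under fixed age-based thresholds gives the kind of alarm-load comparison the abstract reports.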

View details for PubMedID 29218906
Inferring Physical Function From Wearable Activity Monitors: Analysis of Free-Living Activity Data From Patients With Knee Osteoarthritis. JMIR mHealth and uHealth Agarwal, V., Smuck, M., Tomkins-Lane, C., Shah, N. H. 2018; 6 (12): e11315

Abstract

Clinical assessments for physical function do not objectively quantify routine daily activities. Wearable activity monitors (WAMs) enable objective measurement of daily activities, but it remains unclear how these map to clinically measured physical function measures. This study aims to derive a representation of physical function from daily measurements of free-living activity obtained through a WAM. In addition, we evaluate our derived measure against objectively measured function using an ordinal classification setup. We defined function profiles representing average time spent in a set of pattern classes over consecutive days. We constructed a function profile using minute-level activity data from a WAM available from the Osteoarthritis Initiative. Using the function profile as input, we trained statistical models that classified subjects into quartiles of objective measurements of physical function as measured through the 400-m walk test, 20-m walk test, and 5 times sit-stand test. Furthermore, we evaluated model performance on held-out data. The function profile derived from minute-level activity data can accurately predict physical performance as measured through clinical assessments. Using held-out data, the Goodman-Kruskal Gamma statistic obtained in classifying performance values in the first quartile, interquartile range, and the fourth quartile was 0.62, 0.53, and 0.51 for the 400-m walk, 20-m walk, and 5 times sit-stand tests, respectively. Function profiles accurately represent physical function, as demonstrated by the relationship between the profiles and clinically measured physical performance. The estimation of physical performance through function profiles derived from free-living activity data may enable remote functional monitoring of patients.
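For readers unfamiliar with the Goodman-Kruskal Gamma statistic reported in this abstract, the sketch below shows the standard definition, gamma = (C - D) / (C + D), where C and D are the numbers of concordant and discordant pairs and tied pairs are ignored. The function name and the toy inputs are hypothetical; this is not the authors' evaluation code.

```python
def goodman_kruskal_gamma(predicted, observed):
    """Goodman-Kruskal gamma for two equally long sequences of ordinal labels."""
    concordant = 0
    discordant = 0
    n = len(predicted)
    for i in range(n):
        for j in range(i + 1, n):
            # The sign of the product tells whether the pair is ordered the
            # same way in both sequences; a zero product means a tie.
            s = (predicted[i] - predicted[j]) * (observed[i] - observed[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    if concordant + discordant == 0:
        raise ValueError("gamma is undefined when every pair is tied")
    return (concordant - discordant) / (concordant + discordant)


if __name__ == "__main__":
    # Ordinal classes coded 1 (first quartile), 2 (interquartile range),
    # 3 (fourth quartile); values are made up for illustration.
    print(goodman_kruskal_gamma([1, 2, 3], [1, 3, 2]))
```

A value of 1 means predicted and observed classes are ordered identically across all untied pairs, and a value of 0 means no ordinal association.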

View details for PubMedID 30394876
Association of Hemoglobin A1c Levels With Use of Sulfonylureas, Dipeptidyl Peptidase 4 Inhibitors, and Thiazolidinediones in Patients With Type 2 Diabetes Treated With Metformin: Analysis From the Observational Health Data Sciences and Informatics Initiative. JAMA Network Open Vashisht, R., Jung, K., Schuler, A., Banda, J. M., et al 2018

View details for DOI 10.1001/jamanetworkopen.2018.1755
Advances in Electronic Phenotyping: From Rule-Based Definitions to Machine Learning Models ANNUAL REVIEW OF BIOMEDICAL DATA SCIENCE, VOL 1 Banda, J. M., Seneviratne, M., Hernandez-Boussard, T., Shah, N. H. edited by Altman, R. B., Levitt, M. 2018; 1: 53–68

View details for DOI 10.1146/annurev-biodatasci-080917-013315

View details for Web of Science ID 000466876200003
Identifying Cases of Metastatic Prostate Cancer Using Machine Learning on Electronic Health Records. AMIA ... Annual Symposium proceedings. AMIA Symposium Seneviratne, M. G., Banda, J. M., Brooks, J. D., Shah, N. H., Hernandez-Boussard, T. M. 2018; 2018: 1498–1504

Abstract

Cancer stage is rarely captured in structured form in the electronic health record (EHR). We evaluate the performance of a classifier, trained on structured EHR data, in identifying prostate cancer patients with metastatic disease. Using EHR data for a cohort of 5,861 prostate cancer patients mapped to the Observational Health Data Sciences and Informatics (OHDSI) data model, we constructed feature vectors containing frequency counts of conditions, procedures, medications, observations and laboratory values. Staging information from the California Cancer Registry was used as the ground-truth. For identifying patients with metastatic disease, a random forest model achieved precision and recall of 0.90, 0.40 using data within 12 months of diagnosis. This compared to precision 0.33, recall 0.54 for an ICD code-based query. High-precision classifiers using hundreds of structured data elements significantly outperform ICD queries, and may assist in identifying cohorts for observational research or clinical trial matching.

View details for PubMedID 30815195
Accurate and interpretable intensive care risk adjustment for fused clinical data with generalized additive models. AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science Marafino, B. J., Dudley, R. A., Shah, N. H., Chen, J. H. 2018; 2017: 166–75

Abstract

Risk adjustment models for intensive care outcomes have yet to realize the full potential of data unlocked by the increasing adoption of EHRs. In particular, they fail to fully leverage the information present in longitudinal, structured clinical data - including laboratory test results and vital signs - nor can they infer patient state from unstructured clinical narratives without lengthy manual abstraction. A fully electronic ICU risk model fusing these two types of data sources may yield improved accuracy and more personalized risk estimates, and in obviating manual abstraction, could also be used for real-time decision-making. As a first step towards fully "electronic" ICU models based on fused data, we present results of generalized additive modeling applied to a sample of over 36,000 ICU patients. Our approach outperforms those based on the SAPS and OASIS systems (AUC: 0.908 vs. 0.794 and 0.874), and appears to yield more granular and easily visualized risk estimates.

View details for PubMedID 29888065
Detecting Chemotherapeutic Skin Adverse Reactions in Social Health Networks Using Deep Learning. JAMA oncology Ransohoff, J. D., Nikfarjam, A., Jones, E., Loew, B., Kwong, B. Y., Sarin, K. Y., Shah, N. H. 2018; 4 (4): 581–83

View details for PubMedID 29494731
Toward multimodal signal detection of adverse drug reactions JOURNAL OF BIOMEDICAL INFORMATICS Harpaz, R., DuMouchel, W., Schuemie, M., Bodenreider, O., Friedman, C., Horvitz, E., Ripple, A., Sorbello, A., White, R. W., Winnenburg, R., Shah, N. H. 2017; 76: 41–49

View details for DOI 10.1016/j.jbi.2017.10.013

View details for Web of Science ID 000426221400005
Toward multimodal signal detection of adverse drug reactions. Journal of biomedical informatics Harpaz, R., DuMouchel, W., Schuemie, M., Bodenreider, O., Friedman, C., Horvitz, E., Ripple, A., Sorbello, A., White, R. W., Winnenburg, R., Shah, N. H. 2017; 76: 41-49

Abstract

Improving mechanisms to detect adverse drug reactions (ADRs) is key to strengthening post-marketing drug safety surveillance. Signal detection is presently unimodal, relying on a single information source. Multimodal signal detection is based on jointly analyzing multiple information sources. Building on, and expanding the work done in prior studies, the aim of the article is to further research on multimodal signal detection, explore its potential benefits, and propose methods for its construction and evaluation. Four data sources are investigated: FDA's adverse event reporting system, insurance claims, the MEDLINE citation database, and the logs of major Web search engines. Published methods are used to generate and combine signals from each data source. Two distinct reference benchmarks corresponding to well-established and recently labeled ADRs respectively are used to evaluate the performance of multimodal signal detection in terms of area under the ROC curve (AUC) and lead-time-to-detection, with the latter relative to labeling revision dates. Limited to our reference benchmarks, multimodal signal detection provides AUC improvements ranging from 0.04 to 0.09 based on a widely used evaluation benchmark, and a comparative added lead-time of 7-22 months relative to labeling revision dates from a time-indexed benchmark. The results support the notion that utilizing and jointly analyzing multiple data sources may lead to improved signal detection. Given certain data and benchmark limitations, the early stage of development, and the complexity of ADRs, it is currently not possible to make definitive statements about the ultimate utility of the concept. Continued development of multimodal signal detection requires a deeper understanding of the data sources used, additional benchmarks, and further research on methods to generate and synthesize signals.

View details for DOI 10.1016/j.jbi.2017.10.013

View details for PubMedID 29081385
Pharmacovigilance Using Textual Data: The Need to Go Deeper and Wider into the Con(text) DRUG SAFETY Sethi, T., Shah, N. H. 2017; 40 (11): 1047-1048

View details for DOI 10.1007/s40264-017-0585-3

View details for Web of Science ID 000413287500001

View details for PubMedID 28786036
A dataset quantifying polypharmacy in the United States SCIENTIFIC DATA Quinn, K. J., Shah, N. H. 2017; 4: 170167

Abstract

Polypharmacy is increasingly common in the United States, and contributes to the substantial burden of drug-related morbidity. Yet real-world polypharmacy patterns remain poorly characterized. We have counted the incidence of multi-drug combinations observed in four billion patient-months of outpatient prescription drug claims from 2007-2014 in the Truven Health MarketScan® Databases. Prescriptions are grouped into discrete windows of concomitant drug exposure, which are used to count exposure incidences for combinations of up to five drug ingredients or ATC drug classes. Among patients taking any prescription drug, half are exposed to two or more drugs, and 5% are exposed to 8 or more. The most common multi-drug combinations treat manifestations of metabolic syndrome. Patients are exposed to unique drug combinations in 10% of all exposure windows. Our analysis of multi-drug exposure incidences provides a detailed summary of polypharmacy in a large US cohort, which can prioritize common drug combinations for future safety and efficacy studies.
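The counting step described in this abstract can be sketched as follows, assuming each exposure window has already been reduced to the set of drug ingredients a patient was taking concurrently. The function name, the input layout, and the decision to count single drugs alongside multi-drug combinations are illustrative assumptions rather than details of the published dataset.

```python
from collections import Counter
from itertools import combinations


def count_combinations(exposure_windows, max_size=5):
    """Count how many exposure windows contain each drug combination.

    exposure_windows: iterable of collections of ingredient names, one per
    window of concomitant exposure (hypothetical input format).
    """
    counts = Counter()
    for drugs in exposure_windows:
        # Sort so the same combination always maps to the same tuple key.
        ingredients = sorted(set(drugs))
        for size in range(1, min(max_size, len(ingredients)) + 1):
            counts.update(combinations(ingredients, size))
    return counts


if __name__ == "__main__":
    windows = [
        ["metformin", "lisinopril"],
        ["metformin", "lisinopril", "atorvastatin"],
    ]
    counts = count_combinations(windows)
    print(counts[("lisinopril", "metformin")])
```

Because the number of subsets grows quickly with the number of concurrent drugs, capping the combination size (here at five, as in the abstract) keeps the enumeration tractable.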

View details for DOI 10.1038/sdata.2017.167

View details for Web of Science ID 000414093600001

View details for PubMedID 29087369

View details for PubMedCentralID PMC5663207
Androgen Deprivation Therapy and Subsequent Dementia-Reply. JAMA oncology Nead, K. T., Swisher-McClure, S., Shah, N. H. 2017

View details for DOI 10.1001/jamaoncol.2017.0405

View details for PubMedID 28472237
Research on Gun Violence vs Other Causes of Death. JAMA Stark, D. E., Shah, N. H. 2017; 317 (13): 1379

View details for DOI 10.1001/jama.2017.2440

View details for PubMedID 28384827
Predicting patient 'cost blooms' in Denmark: a longitudinal population-based study. BMJ open Tamang, S., Milstein, A., Sørensen, H. T., Pedersen, L., Mackey, L., Betterton, J., Janson, L., Shah, N. 2017; 7 (1)

Abstract

To compare the ability of standard versus enhanced models to predict future high-cost patients, especially those who move from a lower to the upper decile of per capita healthcare expenditures within 1 year (that is, 'cost bloomers'). We developed alternative models to predict being in the upper decile of healthcare expenditures in year 2 of a sample, based on data from year 1. Our 6 alternative models ranged from a standard cost-prediction model with 4 variables (ie, traditional model features), to our largest enhanced model with 1053 non-traditional model features. To quantify any increases in predictive power that enhanced models achieved over standard tools, we compared the prospective predictive performance of each model. We used the population of Western Denmark between 2004 and 2011 (2 146 801 individuals) to predict future high-cost patients and characterise high-cost patient subgroups. Using the most recent 2-year period (2010-2011) for model evaluation, our whole-population model used a cohort of 1 557 950 individuals with a full year of active residency in year 1 (2010). Our cost-bloom model excluded the 155 795 individuals who were already high cost at the population level in year 1, resulting in 1 402 155 individuals for prediction of cost bloomers in year 2 (2011). Using unseen data from a future year, we evaluated each model's prospective predictive performance by calculating the ratio of predicted high-cost patient expenditures to the actual high-cost patient expenditures in Year 2 (that is, cost capture). Our best enhanced model achieved a 21% and 30% improvement in cost capture over a standard diagnosis-based model for predicting population-level high-cost patients and cost bloomers, respectively. In combination with modern statistical learning methods for analysing large data sets, models enhanced with a large and diverse set of features led to better performance, especially for predicting future cost bloomers.
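One plausible reading of the cost capture metric in this abstract is sketched below: the year-2 expenditures of the patients a model ranks in the top decile, divided by the year-2 expenditures of the patients who actually ended up in the top decile. This interpretation, the function name, and the toy numbers are assumptions for illustration and may differ from the authors' exact computation.

```python
def cost_capture(predicted_scores, actual_costs, fraction=0.1):
    """Ratio of costs captured by predicted high-cost patients to the
    costs of the true high-cost patients (top `fraction` of the cohort)."""
    n = len(actual_costs)
    k = max(1, int(n * fraction))  # size of the high-cost group
    # Indices of the k patients with the highest predicted score.
    predicted_top = sorted(range(n), key=lambda i: predicted_scores[i], reverse=True)[:k]
    # Indices of the k patients with the highest observed year-2 cost.
    actual_top = sorted(range(n), key=lambda i: actual_costs[i], reverse=True)[:k]
    captured = sum(actual_costs[i] for i in predicted_top)
    possible = sum(actual_costs[i] for i in actual_top)
    return captured / possible


if __name__ == "__main__":
    costs = [1000, 500, 10, 10, 10, 10, 10, 10, 10, 10]
    scores = [0.2, 0.9, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
    print(cost_capture(scores, costs))
```

Under this definition a model that ranks patients perfectly scores 1.0, and lower values indicate that the predicted high-cost group accounts for less spending than the true one.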

View details for DOI 10.1136/bmjopen-2016-011580

View details for PubMedID 28077408

View details for PubMedCentralID PMC5253526
Funding and Publication of Research on Gun Violence and Other Leading Causes of Death. JAMA Stark, D. E., Shah, N. H. 2017; 317 (1): 84-85

View details for DOI 10.1001/jama.2016.16215

View details for PubMedID 28030692
Association Between Androgen Deprivation Therapy and Risk of Dementia JAMA ONCOLOGY Nead, K. T., Gaskin, G., Chester, C., Swisher-McClure, S., Leeper, N. J., Shah, N. H. 2017; 3 (1): 49-55

Abstract

A growing body of evidence supports a link between androgen deprivation therapy (ADT) and cognitive dysfunction, including Alzheimer disease. However, it is currently unknown whether ADT may contribute to the risk of dementia more broadly. To use an informatics approach to examine the association of ADT as a treatment for prostate cancer with the subsequent development of dementia (eg, senile dementia, vascular dementia, frontotemporal dementia, and Alzheimer dementia). In this cohort study, a text-processing method was used to analyze electronic medical record data from an academic medical center from 1994 to 2013, with a median follow-up of 3.4 years (interquartile range, 1.0-7.2 years). We identified 9455 individuals with prostate cancer who were 18 years or older at diagnosis with data recorded in the electronic health record and follow-up after diagnosis. We excluded 183 patients with a previous diagnosis of dementia. Our final cohort comprised 9272 individuals with prostate cancer, including 1826 men (19.7%) who received ADT. We tested the effect of ADT on the risk of dementia using propensity score-matched Cox proportional hazards regression models and Kaplan-Meier survival analysis. Among 9272 men with prostate cancer (mean [SD] age, 66.9 [10.9] years; 5450 [58.8%] white), there was a statistically significant association between use of ADT and risk of dementia (hazard ratio, 2.17; 95% CI, 1.58-2.99; P < .001). In sensitivity analyses, results were similar when excluding patients with Alzheimer disease (hazard ratio, 2.32; 95% CI, 1.73-3.12; P < .001). The absolute increased risk of developing dementia among those who received ADT was 4.4% at 5 years (7.9% among those who received ADT vs 3.5% in those who did not receive ADT). Analyses stratified by duration of ADT found that individuals with at least 12 months of ADT use had the greatest absolute increased risk of dementia (hazard ratio, 2.36; 95% CI, 1.64-3.38; P < .001). Kaplan-Meier analysis demonstrated that ADT users 70 years or older had the lowest cumulative probability of remaining dementia free (log-rank P < .001). Androgen deprivation therapy in the treatment of prostate cancer may be associated with an increased risk of dementia. This finding should be further evaluated in prospective studies.

View details for DOI 10.1001/jamaoncol.2016.3662

View details for Web of Science ID 000394257300011
OPEN DATA FOR DISCOVERY SCIENCE Payne, P. R. O., Huang, K., Shah, N. H., Tenenbaum, J. edited by Altman, R. B., Dunker, A. K., Hunter, L., Ritchie, M. D., Murray, T., Klein, T. E. WORLD SCIENTIFIC PUBL CO PTE LTD. 2017: 649-652

Abstract

The modern healthcare and life sciences ecosystem is moving towards an increasingly open and data-centric approach to discovery science. This evolving paradigm is predicated on a complex set of information needs related to our collective ability to share, discover, reuse, integrate, and analyze open biological, clinical, and population level data resources of varying composition, granularity, and syntactic or semantic consistency. Such an evolution is further impacted by a concomitant growth in the size of data sets that can and should be employed for both hypothesis discovery and testing. When such open data can be accessed and employed for discovery purposes, a broad spectrum of high impact end-points is made possible. These span the spectrum from identification of de novo biomarker complexes that can inform precision medicine, to the repositioning or repurposing of extant agents for new and cost-effective therapies, to the assessment of population level influences on disease and wellness. Of note, these types of uses of open data can be either primary, wherein open data is the substantive basis for inquiry, or secondary, wherein open data is used to augment or enrich project-specific or proprietary data that is not open in and of itself. This workshop is concerned with the key challenges, opportunities, and methodological best practices whereby open data can be used to drive the advancement of discovery science in all of the aforementioned capacities.

View details for Web of Science ID 000391254200061

View details for PubMedID 27897016
Electronic phenotyping with APHRODITE and the Observational Health Sciences and Informatics (OHDSI) data network. AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science Banda, J. M., Halpern, Y., Sontag, D., Shah, N. H. 2017; 2017: 48-57

Abstract

The widespread usage of electronic health records (EHRs) for clinical research has produced multiple electronic phenotyping approaches. Methods for electronic phenotyping range from those needing extensive specialized medical expert supervision to those based on semi-supervised learning techniques. We present Automated PHenotype Routine for Observational Definition, Identification, Training and Evaluation (APHRODITE), an R package phenotyping framework that combines noisy labeling and anchor learning. APHRODITE makes these cutting-edge phenotyping approaches available for use with the Observational Health Data Sciences and Informatics (OHDSI) data model for standardized and scalable deployment. APHRODITE uses EHR data available in the OHDSI Common Data Model to build classification models for electronic phenotyping. We demonstrate the utility of APHRODITE by comparing its performance versus traditional rule-based phenotyping approaches. Finally, the resulting phenotype models and model construction workflows built with APHRODITE can be shared between multiple OHDSI sites. Such sharing allows their application on large and diverse patient populations.

View details for PubMedID 28815104
Quantifying the relative change in physical activity after Total Knee Arthroplasty using accelerometer based measurements. AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science Agarwal, V., Smuck, M., Shah, N. H. 2017; 2017: 463–72

Abstract

Osteoarthritis is amongst the top five most disabling conditions affecting Americans over 65 years of age and imposes an annual economic burden estimated at $89.1 billion. Nearly half of the cost of care of Osteoarthritis is attributable to hospitalizations for total knee arthroplasties (TKA) and total hip arthroplasties (THA). The current clinical practice relies predominantly on subjective assessment of physical function and pain via patient reported outcome measures (PROM) that have proven inadequate for providing a validated, reliable and responsive measure of TKA outcomes. Wearable activity monitors, which produce a trace of regularly monitored physical activity derived from accelerometer measurements, provide a novel opportunity to objectively assess physical functional status in Osteoarthritis patients. Using data from the Osteoarthritis Initiative (OAI), we demonstrate the feasibility of quantifying the relative change in physical activity patterns in Osteoarthritis subjects using accelerometer based measurements of daily physical activity.

View details for PubMedID 28815146
Risk of angioedema associated with levetiracetam compared with phenytoin: Findings of the observational health data sciences and informatics research network. Epilepsia Duke, J. D., Ryan, P. B., Suchard, M. A., Hripcsak, G., Jin, P., Reich, C., Schwalm, M. S., Khoma, Y., Wu, Y., Xu, H., Shah, N. H., Banda, J. M., Schuemie, M. J. 2017; 58 (8): e101–e106

Abstract

Recent adverse event reports have raised the question of increased angioedema risk associated with exposure to levetiracetam. To help address this question, the Observational Health Data Sciences and Informatics research network conducted a retrospective observational new-user cohort study of seizure patients exposed to levetiracetam (n = 276,665) across 10 databases. With phenytoin users (n = 74,682) as a comparator group, propensity score-matching was conducted and hazard ratios computed for angioedema events by per-protocol and intent-to-treat analyses. Angioedema events were rare in both the levetiracetam and phenytoin groups (54 vs. 71 in per-protocol and 248 vs. 435 in intent-to-treat). No significant increase in angioedema risk with levetiracetam was seen in any individual database (hazard ratios ranging from 0.43 to 1.31). Meta-analysis showed a summary hazard ratio of 0.72 (95% confidence interval [CI] 0.39-1.31) and 0.64 (95% CI 0.52-0.79) for the per-protocol and intent-to-treat analyses, respectively. The results suggest that levetiracetam has the same or lower risk for angioedema than phenytoin, which does not currently carry a labeled warning for angioedema. Further studies are warranted to evaluate angioedema risk across all antiepileptic drugs.

View details for PubMedID 28681416
Assessing Screening Guidelines for Cardiovascular Disease Risk Factors using Routinely Collected Data. Scientific reports Pannu, J., Poole, S., Shah, N., Shah, N. H. 2017; 7 (1): 6488

Abstract

This study investigates if laboratory data can be used to assess whether physician-retesting patterns are in line with established guidelines, and if these guidelines identify deteriorating patients in a timely manner. A total of 7594 patients with high cholesterol were studied, along with 2764 patients with diabetes. More than 90% of borderline high cholesterol patients are retested within the 3-year recommended period; however, fewer than 75% of pre-diabetic patients have repeated tests within the suggested 1-year time frame. Patients with borderline high cholesterol typically progress to full high cholesterol in 2-3 years, and pre-diabetic patients progress to full diabetes in 1-2 years. Data from routinely ordered laboratory tests can be used to monitor adherence to clinical guidelines. These data may also be useful in the design of adaptive testing strategies that reduce unnecessary testing, while ensuring that patient deterioration is identified in a timely manner. Established guidelines for testing of total serum cholesterol for hypercholesterolemia are appropriate and are well-adhered to, whereas guidelines for glycated hemoglobin A1c testing for type 2 diabetes mellitus could be improved to bring them in line with current practice and avoid unnecessary testing.

View details for PubMedID 28747722
Improving Palliative Care with Deep Learning Avati, A., Jung, K., Harman, S., Downing, L., Ng, A., Shah, N. H. edited by Hu, X. H., Shyu, C. R., Bromberg, Y., Gao, J., Gong, Y., Korkin, D., Yoo, Zheng, J. H. IEEE. 2017: 311–16

View details for Web of Science ID 000426504100061
A Clinical Score for Predicting Atrial Fibrillation in Patients with Cryptogenic Stroke or Transient Ischemic Attack CARDIOLOGY Kwong, C., Ling, A. Y., Crawford, M. H., Zhao, S. X., Shah, N. H. 2017; 138 (3): 133–40

Abstract

Detection of atrial fibrillation (AF) in post-cryptogenic stroke (CS) or transient ischemic attack (TIA) patients carries important therapeutic implications. To risk stratify CS/TIA patients for later development of AF, we conducted a retrospective cohort study using data from 1995 to 2015 in the Stanford Translational Research Integrated Database Environment (STRIDE). Of the 9,589 adult patients (age ≥40 years) with CS/TIA included, 482 (5%) patients developed AF post CS/TIA. Of those patients, 28.4, 26.3, and 45.3% were diagnosed with AF 1-12 months, 1-3 years, and >3 years after the index CS/TIA, respectively. Age (≥75 years), obesity, congestive heart failure, hypertension, coronary artery disease, peripheral vascular disease, and valve disease are significant risk factors, with the following respective odds ratios (95% CI): 1.73 (1.39-2.16), 1.53 (1.05-2.18), 3.34 (2.61-4.28), 2.01 (1.53-2.68), 1.72 (1.35-2.19), 1.37 (1.02-1.84), and 2.05 (1.55-2.69). A risk-scoring system, i.e., the HAVOC score, was constructed using these 7 clinical variables that successfully stratify patients into 3 risk groups, with good model discrimination (area under the curve = 0.77). Findings from this study support the strategy of looking longer and harder for AF in post-CS/TIA patients. The HAVOC score identifies different levels of AF risk and may be used to select patients for extended rhythm monitoring.

View details for DOI 10.1159/000476030

View details for Web of Science ID 000414421500001

View details for PubMedID 28654919

View details for PubMedCentralID PMC5683906
Machine Learning in Healthcare KEY ADVANCES IN CLINICAL INFORMATICS: TRANSFORMING HEALTH CARE THROUGH HEALTH INFORMATION TECHNOLOGY Callahan, A., Shah, N. H. edited by Sheikh, A., Cresswell, K. M., Wright, A., Bates, D. W. 2017: 279–91

View details for DOI 10.1016/B978-0-12-809523-2.00019-4

View details for Web of Science ID 000416260800020
Enhanced Quality Measurement Event Detection: An Application to Physician Reporting. EGEMS (Washington, DC) Tamang, S. R., Hernandez-Boussard, T., Ross, E. G., Gaskin, G., Patel, M. I., Shah, N. H. 2017; 5 (1): 5

Abstract

The wide-scale adoption of electronic health records (EHRs) has increased the availability of routinely collected clinical data in electronic form that can be used to improve the reporting of quality of care. However, the bulk of information in the EHR is in unstructured form (e.g., free-text clinical notes) and not amenable to automated reporting. Traditional methods are based on structured diagnostic and billing data that provide efficient, but inaccurate or incomplete summaries of actual or relevant care processes and patient outcomes. To assess the feasibility and benefit of implementing enhanced EHR-based physician quality measurement and reporting, which includes the analysis of unstructured free-text clinical notes, we conducted a retrospective study to compare traditional and enhanced approaches for reporting ten physician quality measures from multiple National Quality Strategy domains. We found that our enhanced approach enabled the calculation of five Physician Quality and Performance System measures not measurable in billing or diagnostic codes and resulted in over a five-fold increase in event detection at an average precision of 88 percent (95 percent CI: 83-93 percent). Our work suggests that enhanced EHR-based quality measurement can increase event detection for establishing value-based payment arrangements and can expedite quality reporting for physician practices, which are increasingly burdened by the process of manual chart review for quality reporting.

View details for PubMedID 29881731
Thematic issue of the Second combined Bio-ontologies and Phenotypes Workshop JOURNAL OF BIOMEDICAL SEMANTICS Verspoor, K., Oellrich, A., Collier, N., Groza, T., Rocca-Serra, P., Soldatova, L., Dumontier, M., Shah, N. 2016; 7

Abstract

This special issue covers selected papers from the 18th Bio-Ontologies Special Interest Group meeting and Phenotype Day, which took place at the Intelligent Systems for Molecular Biology (ISMB) conference in Dublin in 2015. The papers presented in this collection range from descriptions of software tools supporting ontology development and annotation of objects with ontology terms, to applications of text mining for structured relation extraction involving diseases and phenotypes, to detailed proposals for new ontologies and mapping of existing ontologies. Together, the papers consider a range of representational issues in bio-ontology development, and demonstrate the applicability of bio-ontologies to support biological and clinical knowledge-based decision making and analysis. The full set of papers in the Thematic Issue is available at http://www.biomedcentral.com/collections/sig.

View details for DOI 10.1186/s13326-016-0108-7

View details for Web of Science ID 000391059800001

View details for PubMedID 27955708

View details for PubMedCentralID PMC5154111
Synergistic drug combinations from electronic health records and gene expression. Journal of the American Medical Informatics Association Low, Y. S., Daugherty, A. C., Schroeder, E. A., Chen, W., Seto, T., Weber, S., Lim, M., Hastie, T., Mathur, M., Desai, M., Farrington, C., Radin, A. A., Sirota, M., Kenkare, P., Thompson, C. A., Yu, P. P., Gomez, S. L., Sledge, G. W., Kurian, A. W., Shah, N. H. 2016

Abstract

Using electronic health records (EHRs) and biomolecular data, we sought to discover drug pairs with synergistic repurposing potential. EHRs provide real-world treatment and outcome patterns, while complementary biomolecular data, including disease-specific gene expression and drug-protein interactions, provide mechanistic understanding. We applied Group Lasso INTERaction NETwork (glinternet), an overlap group lasso penalty on a logistic regression model, with pairwise interactions to identify variables and interacting drug pairs associated with reduced 5-year mortality using EHRs of 9945 breast cancer patients. We identified differentially expressed genes from 14 case-control human breast cancer gene expression datasets and integrated them with drug-protein networks. Drugs in the network were scored according to their association with breast cancer individually or in pairs. Lastly, we determined whether synergistic drug pairs found in the EHRs were enriched among synergistic drug pairs from gene-expression data using a method similar to gene set enrichment analysis. From EHRs, we discovered 3 drug-class pairs associated with lower mortality: anti-inflammatories and hormone antagonists, anti-inflammatories and lipid modifiers, and lipid modifiers and obstructive airway drugs. The first 2 pairs were also enriched among pairs discovered using gene expression data and are supported by molecular interactions in drug-protein networks and preclinical and epidemiologic evidence. This is a proof-of-concept study demonstrating that a combination of complementary data sources, such as EHRs and gene expression, can corroborate discoveries and provide mechanistic insight into drug synergism for repurposing.

View details for DOI 10.1093/jamia/ocw161

View details for PubMedID 27940607
The use of machine learning for the identification of peripheral artery disease and future mortality risk. Journal of vascular surgery Ross, E. G., Shah, N. H., Dalman, R. L., Nead, K. T., Cooke, J. P., Leeper, N. J. 2016; 64 (5): 1515-1522 e3

Abstract

A key aspect of the precision medicine effort is the development of informatics tools that can analyze and interpret "big data" sets in an automated and adaptive fashion while providing accurate and actionable clinical information. The aims of this study were to develop machine learning algorithms for the identification of disease and the prognostication of mortality risk and to determine whether such models perform better than classical statistical analyses. Focusing on peripheral artery disease (PAD), patient data were derived from a prospective, observational study of 1755 patients who presented for elective coronary angiography. We employed multiple supervised machine learning algorithms and used diverse clinical, demographic, imaging, and genomic information in a hypothesis-free manner to build models that could identify patients with PAD and predict future mortality. Comparison was made to standard stepwise linear regression models. Our machine-learned models outperformed stepwise logistic regression models both for the identification of patients with PAD (area under the curve, 0.87 vs 0.76, respectively; P = .03) and for the prediction of future mortality (area under the curve, 0.76 vs 0.65, respectively; P = .10). Both machine-learned models were markedly better calibrated than the stepwise logistic regression models, thus providing more accurate disease and mortality risk estimates. Machine learning approaches can produce more accurate disease classification and prediction models. These tools may prove clinically useful for the automated identification of patients with highly morbid diseases for which aggressive risk factor management can improve outcomes.

View details for DOI 10.1016/j.jvs.2016.04.026

View details for PubMedID 27266594

View details for PubMedCentralID PMC5079774
Influence of age on androgen deprivation therapy-associated Alzheimer's disease SCIENTIFIC REPORTS Nead, K. T., Gaskin, G., Chester, C., Swisher-McClure, S., Dudley, J. T., Leeper, N. J., Shah, N. H. 2016; 6

Abstract

We recently found an association between androgen deprivation therapy (ADT) and Alzheimer's disease. As Alzheimer's disease is a disease of advanced age, we hypothesize that older individuals on ADT may be at greatest risk. We conducted a retrospective multi-institutional analysis among 16,888 individuals with prostate cancer using an informatics approach. We tested the effect of ADT on Alzheimer's disease using Kaplan-Meier age-stratified analyses in a propensity score-matched cohort. We found a lower cumulative probability of remaining Alzheimer's disease-free between non-ADT users age ≥70 versus those age <70 years (p < 0.001) and between ADT versus non-ADT users ≥70 years (p = 0.034). The 5-year probability of developing Alzheimer's disease was 2.9%, 1.9% and 0.5% among ADT users ≥70, non-ADT users ≥70 and individuals <70 years, respectively. Compared to younger individuals, older men on ADT may have the greatest absolute Alzheimer's disease risk. Future work should investigate the association between ADT and Alzheimer's disease in advanced age populations given the greater potential clinical impact.
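As an illustration of the Kaplan-Meier analysis named in this abstract, the following is a minimal sketch of the standard product-limit estimator. It is not the authors' code: the function name `kaplan_meier` and the toy follow-up data are invented here, and the propensity score matching and age stratification steps are not shown.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where an event was observed.

    times  -- follow-up time for each subject
    events -- 1 if the event was observed at that time, 0 if censored
    """
    observed = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # every subject leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = observed.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Toy data, invented for illustration: years of follow-up, 1 = diagnosis observed.
toy_times = [1, 2, 2, 3, 4, 5]
toy_events = [1, 0, 1, 0, 1, 0]
curve = kaplan_meier(toy_times, toy_events)
```

In an age-stratified analysis, one such curve would be estimated per stratum (for example, ADT users ≥70 versus non-ADT users ≥70) and the curves compared with a log-rank test.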

View details for DOI 10.1038/srep35695

View details for Web of Science ID 000385588200002

View details for PubMedID 27752112

View details for PubMedCentralID PMC5067668
Association Between Androgen Deprivation Therapy and Risk of Dementia. JAMA oncology Nead, K. T., Gaskin, G., Chester, C., Swisher-McClure, S., Leeper, N. J., Shah, N. H. 2016

Abstract

A growing body of evidence supports a link between androgen deprivation therapy (ADT) and cognitive dysfunction, including Alzheimer disease. However, it is currently unknown whether ADT may contribute to the risk of dementia more broadly. To use an informatics approach to examine the association of ADT as a treatment for prostate cancer with the subsequent development of dementia (eg, senile dementia, vascular dementia, frontotemporal dementia, and Alzheimer dementia). In this cohort study, a text-processing method was used to analyze electronic medical record data from an academic medical center from 1994 to 2013, with a median follow-up of 3.4 years (interquartile range, 1.0-7.2 years). We identified 9455 individuals with prostate cancer who were 18 years or older at diagnosis with data recorded in the electronic health record and follow-up after diagnosis. We excluded 183 patients with a previous diagnosis of dementia. Our final cohort comprised 9272 individuals with prostate cancer, including 1826 men (19.7%) who received ADT. We tested the effect of ADT on the risk of dementia using propensity score-matched Cox proportional hazards regression models and Kaplan-Meier survival analysis. Among 9272 men with prostate cancer (mean [SD] age, 66.9 [10.9] years; 5450 [58.8%] white), there was a statistically significant association between use of ADT and risk of dementia (hazard ratio, 2.17; 95% CI, 1.58-2.99; P < .001). In sensitivity analyses, results were similar when excluding patients with Alzheimer disease (hazard ratio, 2.32; 95% CI, 1.73-3.12; P < .001). The absolute increased risk of developing dementia among those who received ADT was 4.4% at 5 years (7.9% among those who received ADT vs 3.5% in those who did not receive ADT). Analyses stratified by duration of ADT found that individuals with at least 12 months of ADT use had the greatest absolute increased risk of dementia (hazard ratio, 2.36; 95% CI, 1.64-3.38; P < .001). Kaplan-Meier analysis demonstrated that ADT users 70 years or older had the lowest cumulative probability of remaining dementia free (log-rank P < .001). Androgen deprivation therapy in the treatment of prostate cancer may be associated with an increased risk of dementia. This finding should be further evaluated in prospective studies.

View details for DOI 10.1001/jamaoncol.2016.3662

View details for PubMedID 27737437
Evolutionary Pressures on the Electronic Health Record: Caring for Complexity. JAMA Zulman, D. M., Shah, N. H., Verghese, A. 2016; 316 (9): 923-924

View details for DOI 10.1001/jama.2016.9538

View details for PubMedID 27532804
Impact of Predicting Health Care Utilization Via Web Search Behavior: A Data-Driven Analysis JOURNAL OF MEDICAL INTERNET RESEARCH Agarwal, V., Zhang, L., Zhu, J., Fang, S., Cheng, T., Hong, C., Shah, N. H. 2016; 18 (9): 241-253

Abstract

By recent estimates, the steady rise in health care costs has deprived more than 45 million Americans of health care services and has encouraged health care providers to better understand the key drivers of health care utilization from a population health management perspective. Prior studies suggest the feasibility of mining population-level patterns of health care resource utilization from observational analysis of Internet search logs; however, the utility of the endeavor to the various stakeholders in a health ecosystem remains unclear. The aim was to carry out a closed-loop evaluation of the utility of health care use predictions using the conversion rates of advertisements that were displayed to the predicted future utilizers as a surrogate. The statistical models to predict the probability of a user's future visit to a medical facility were built using effective predictors of health care resource utilization, extracted from a deidentified dataset of geotagged mobile Internet search logs representing searches made by users of the Baidu search engine between March 2015 and May 2015. We inferred presence within the geofence of a medical facility from location and duration information from users' search logs and putatively assigned medical facility visit labels to qualifying search logs. We constructed a matrix of general, semantic, and location-based features from search logs of users that had 42 or more search days preceding a medical facility visit as well as from search logs of users that had no medical visits and trained statistical learners for predicting future medical visits. We then carried out a closed-loop evaluation of the utility of health care use predictions using the show conversion rates of advertisements displayed to the predicted future utilizers. In the context of behaviorally targeted advertising, wherein health care providers are interested in minimizing their cost per conversion, the association between show conversion rate and predicted utilization score served as a surrogate measure of the model's utility. We obtained the highest area under the curve (0.796) in medical visit prediction with our random forests model and daywise features. Ablating feature categories one at a time showed that the model performance worsened the most when location features were dropped. An online evaluation in which advertisements were served to users who had a high predicted probability of a future medical visit showed a 3.96% increase in the show conversion rate. Results from our experiments done in a research setting suggest that it is possible to accurately predict future patient visits from geotagged mobile search logs. Results from the offline and online experiments on the utility of health utilization predictions suggest that such prediction can have utility for health care providers.

View details for DOI 10.2196/jmir.6240

View details for Web of Science ID 000384107200020

View details for PubMedID 27655225

View details for PubMedCentralID PMC5052461
The digital revolution in phenotyping BRIEFINGS IN BIOINFORMATICS Oellrich, A., Collier, N., Groza, T., Rebholz-Schuhmann, D., Shah, N., Bodenreider, O., Boland, M. R., Georgiev, I., Liu, H., Livingston, K., Luna, A., Mallon, A., Manda, P., Robinson, P. N., Rustici, G., Simon, M., Wang, L., Winnenburg, R., Dumontier, M. 2016; 17 (5): 819-830

Abstract

Phenotypes have gained increased notoriety in the clinical and biological domain owing to their application in numerous areas such as the discovery of disease genes and drug targets, phylogenetics and pharmacogenomics. Phenotypes, defined as observable characteristics of organisms, can be seen as one of the bridges that lead to a translation of experimental findings into clinical applications and thereby support 'bench to bedside' efforts. However, to build this translational bridge, a common and universal understanding of phenotypes is required that goes beyond domain-specific definitions. To achieve this ambitious goal, a digital revolution is ongoing that enables the encoding of data in computer-readable formats and the data storage in specialized repositories, ready for integration, enabling translational research. While phenome research is an ongoing endeavor, the true potential hidden in the currently available data still needs to be unlocked, offering exciting opportunities for the forthcoming years. Here, we provide insights into the state-of-the-art in digital phenotyping, by means of representing, acquiring and analyzing phenotype data. In addition, we provide visions of this field for future research work that could enable better applications of phenotype data.

View details for DOI 10.1093/bib/bbv083

View details for Web of Science ID 000386971500008

View details for PubMedID 26420780
Reply to R.L. Bowen et al, M. Froehner et al, J.L. Leow et al, and C. Brady et al. Journal of clinical oncology Nead, K. T., Gaskin, G., Chester, C., Swisher-McClure, S., Dudley, J. T., Leeper, N. J., Shah, N. H. 2016; 34 (23): 2804-2805

View details for DOI 10.1200/JCO.2016.67.9449

View details for PubMedID 27298415

View details for PubMedCentralID PMC5019764
Characterizing treatment pathways at scale using the OHDSI network PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Hripcsak, G., Ryan, P. B., Duke, J. D., Shah, N. H., Park, R. W., Huser, V., Suchard, M. A., Schuemie, M. J., DeFalco, F. J., Perotte, A., Banda, J. M., Reich, C. G., Schilling, L. M., Matheny, M. E., Meeker, D., Pratt, N., Madigan, D. 2016; 113 (27): 7329-7336

Abstract

Observational research promises to complement experimental research by providing large, diverse populations that would be infeasible for an experiment. Observational research can test its own clinical hypotheses, and observational studies also can contribute to the design of experiments and inform the generalizability of experimental research. Understanding the diversity of populations and the variance in care is one component. In this study, the Observational Health Data Sciences and Informatics (OHDSI) collaboration created an international data network with 11 data sources from four countries, including electronic health records and administrative claims data on 250 million patients. All data were mapped to common data standards, patient privacy was maintained by using a distributed model, and results were aggregated centrally. Treatment pathways were elucidated for type 2 diabetes mellitus, hypertension, and depression. The pathways revealed that the world is moving toward more consistent therapy over time across diseases and across locations, but significant heterogeneity remains among sources, pointing to challenges in generalizing clinical trial results. Diabetes favored a single first-line medication, metformin, to a much greater extent than hypertension or depression. About 10% of diabetes and depression patients and almost 25% of hypertension patients followed a treatment pathway that was unique within the cohort. Aside from factors such as sample size and underlying population (academic medical center versus general population), electronic health records data and administrative claims data revealed similar results. Large-scale international observational research is feasible.

View details for DOI 10.1073/pnas.1510502113

View details for Web of Science ID 000379021700036

View details for PubMedID 27274072

View details for PubMedCentralID PMC4941483
Generalized enrichment analysis improves the detection of adverse drug events from the biomedical literature BMC BIOINFORMATICS Winnenburg, R., Shah, N. H. 2016; 17

Abstract

Identification of associations between marketed drugs and adverse events from the biomedical literature assists drug safety monitoring efforts. Assessing the significance of such literature-derived associations and determining the granularity at which they should be captured remains a challenge. Here, we assess how defining a selection of adverse event terms from MeSH, based on information content, can improve the detection of adverse events for drugs and drug classes. We analyze a set of 105,354 candidate drug-adverse event pairs extracted from article indexes in MEDLINE. First, we harmonize extracted adverse event terms by aggregating them into higher-level MeSH terms based on the terms' information content. Then, we determine statistical enrichment of adverse events associated with drug and drug classes using a conditional hypergeometric test that adjusts for dependencies among associated terms. We compare our results with methods based on disproportionality analysis (proportional reporting ratio, PRR) and quantify the improvement in signal detection with our generalized enrichment analysis (GEA) approach using a gold standard of drug-adverse event associations spanning 174 drugs and four events. For single drugs, the best GEA method (Precision: .92/Recall: .71/F1-measure: .80) outperforms the best PRR based method (.69/.69/.69) on all four adverse event outcomes in our gold standard. For drug classes, our GEA performs similarly (.85/.69/.74) when increasing the level of abstraction for adverse event terms. Finally, on examining the 1609 individual drugs in our MEDLINE set, which map to chemical substances in ATC, we find signals for 1379 drugs (10,122 unique adverse event associations) on applying GEA with p < 0.005. We present an approach based on generalized enrichment analysis that can be used to detect associations between drugs, drug classes and adverse events at a given level of granularity, at the same time correcting for known dependencies among events. Our study demonstrates the use of GEA, and the importance of choosing appropriate abstraction levels to complement current drug safety methods. We provide an R package for exploration of alternative abstraction levels of adverse event terms based on information content.
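The two kinds of statistic compared in this abstract can be sketched briefly. The block below is an illustration only, not the R package mentioned above: it computes the proportional reporting ratio from a 2x2 contingency table and a plain (unconditional) hypergeometric upper-tail p-value, which is a simpler stand-in for the conditional test the authors describe. The function names and all counts are hypothetical.

```python
from math import comb


def prr(a, b, c, d):
    """Proportional reporting ratio from a 2x2 table.

    a -- records with the drug and the event
    b -- records with the drug, without the event
    c -- records with other drugs and the event
    d -- records with other drugs, without the event
    """
    return (a / (a + b)) / (c / (c + d))


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when n items are drawn without replacement from N,
    of which K carry the event of interest."""
    total = comb(N, n)
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Invented counts: 2000 indexed articles, 100 mention the drug,
# 120 mention the adverse event, 20 mention both.
a, b, c, d = 20, 80, 100, 1800
ratio = prr(a, b, c, d)
p_value = hypergeom_upper_tail(k=a, N=a + b + c + d, K=a + c, n=a + b)
```

A PRR well above 1 together with a small enrichment p-value would flag the pair as a candidate signal; the conditional variant additionally accounts for dependencies among related adverse event terms, which this sketch does not attempt.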

View details for DOI 10.1186/s12859-016-1080-z

View details for Web of Science ID 000378846600002

View details for PubMedID 27333889

View details for PubMedCentralID PMC4918084
Predictive modeling of risk factors and complications of cataract surgery. European journal of ophthalmology Gaskin, G. L., Pershing, S., Cole, T. S., Shah, N. H. 2016; 26 (4): 328-337

Abstract

Cataract surgery is generally safe; however, severe complications exist. Preexisting conditions are known to predispose patients to intraoperative and postoperative complications. This study quantifies the relationship between aggregated preoperative risk factors and cataract surgery complications, and builds a model predicting outcomes on an individual level, given a constellation of patient characteristics. This study utilized a retrospective cohort of patients age 40 years or older who received cataract surgery. Risk factors, complications, and demographic information were extracted from the Electronic Health Record based on International Classification of Diseases, 9th edition codes, Current Procedural Terminology codes, drug prescription information, and text data mining. We used a bootstrapped least absolute shrinkage and selection operator model to identify highly associated variables. We built random forest classifiers for each complication to create predictive models. Our data corroborated existing literature, including the association of intraoperative complications, complex cataract surgery, black race, and/or prior eye surgery with increased risk of any postoperative complications. We also found other, less well-described risk factors, including diabetes mellitus, young age (<60 years), and hyperopia, as risk factors for complex cataract surgery and intraoperative and postoperative complications. Our predictive models outperformed existing published models. The aggregated risk factors and complications described here can guide new avenues of research and provide specific, personalized risk assessment for a patient considering cataract surgery. Furthermore, the predictive capacity of our models can enable risk stratification of patients, which has utility as a teaching tool as well as informing quality/value-based reimbursements.

View details for DOI 10.5301/ejo.5000706

View details for PubMedID 26692059

View details for PubMedCentralID PMC4930873
Use of Predictive Analytics for the Identification of Latent Vascular Disease and Future Adverse Cardiac Events Ross, E. G., Shah, N., Dalman, R. L., Nead, K., Leeper, N. J. MOSBY-ELSEVIER. 2016: 28S–29S

View details for DOI 10.1016/j.jvs.2016.03.209

View details for Web of Science ID 000376230600042
Use of Machine Learning to Accurately Predict Adverse Events in Patients with Peripheral Artery Disease Using Electronic Health Record Data Ross, E. G., Shah, N., Leeper, N. SAGE PUBLICATIONS LTD. 2016: 290

View details for Web of Science ID 000377101000015
Statin Intensity or Achieved LDL? Practice-based Evidence for the Evaluation of New Cholesterol Treatment Guidelines PLOS ONE Ross, E. G., Shah, N., Leeper, N. 2016; 11 (5)

Abstract

The recently updated American College of Cardiology/American Heart Association cholesterol treatment guidelines outline a paradigm shift in the approach to cardiovascular risk reduction. One major change included a recommendation that practitioners prescribe fixed dose statin regimens rather than focus on specific LDL targets. The goal of this study was to determine whether achieved LDL or statin intensity was more strongly associated with major adverse cardiac events (MACE) using practice-based data from electronic health records (EHR). We analyzed the EHR data of more than 40,000 adult patients on statin therapy between 1995 and 2013. Demographic and clinical variables were extracted from coded data and unstructured clinical text. To account for treatment selection bias we performed propensity score stratification as well as 1:1 propensity score matched analyses. Conditional Cox proportional hazards modeling was used to identify variables associated with MACE. We identified 7,373 adults with complete data whose cholesterol appeared to be actively managed. In a stratified propensity score analysis of the entire cohort over 3.3 years of follow-up, achieved LDL was a significant predictor of MACE outcome (Hazard Ratio 1.1; 95% confidence interval, 1.05-1.2; P < 0.0004), while statin intensity was not. In a 1:1 propensity score matched analysis performed to more aggressively control for covariate balance between treatment groups, achieved LDL remained significantly associated with MACE (HR 1.3; 95% CI, 1.03-1.7; P = 0.03) while treatment intensity again was not a significant predictor. Using EHR data we found that on-treatment achieved LDL level was a significant predictor of MACE. Statin intensity alone was not associated with outcomes. These findings imply that despite recent guidelines, achieved LDL levels are clinically important and LDL titration strategies warrant further investigation in clinical trials.

View details for DOI 10.1371/journal.pone.0154952

View details for Web of Science ID 000376882500009

View details for PubMedID 27227451

View details for PubMedCentralID PMC4881915
Learning statistical models of phenotypes using noisy labeled training data. Journal of the American Medical Informatics Association Agarwal, V., Podchiyska, T., Banda, J. M., Goel, V., Leung, T. I., Minty, E. P., Sweeney, T. E., Gyang, E., Shah, N. H. 2016: -?

Abstract

Traditionally, patient groups with a phenotype are selected through rule-based definitions whose creation and validation are time-consuming. Machine learning approaches to electronic phenotyping are limited by the paucity of labeled training datasets. We demonstrate the feasibility of utilizing semi-automatically labeled training sets to create phenotype models via machine learning, using a comprehensive representation of the patient medical record. We use a list of keywords specific to the phenotype of interest to generate noisy labeled training data. We train L1 penalized logistic regression models for a chronic and an acute disease and evaluate the performance of the models against a gold standard. Our models for Type 2 diabetes mellitus and myocardial infarction achieve precision and accuracy of 0.90, 0.89, and 0.86, 0.89, respectively. Local implementations of the previously validated rule-based definitions for Type 2 diabetes mellitus and myocardial infarction achieve precision and accuracy of 0.96, 0.92 and 0.84, 0.87, respectively. We have demonstrated feasibility of learning phenotype models using imperfectly labeled data for a chronic and acute phenotype. Further research in feature engineering and in specification of the keyword list can improve the performance of the models and the scalability of the approach. Our method provides an alternative to manual labeling for creating training sets for statistical models of phenotypes. Such an approach can accelerate research with large observational healthcare datasets and may also be used to create local phenotype models.

View details for DOI 10.1093/jamia/ocw028

View details for PubMedID 27174893

View details for PubMedCentralID PMC5070523
Postmarket Surveillance of Point-of-Care Glucose Meters through Analysis of Electronic Medical Records CLINICAL CHEMISTRY Schroeder, L. F., Giacherio, D., Gianchandani, R., Engoren, M., Shah, N. H. 2016; 62 (5): 716-724

Abstract

The electronic medical record (EMR) holds a promising source of data for active postmarket surveillance of diagnostic accuracy, particularly for point-of-care (POC) devices. Through a comparison with prospective bedside and laboratory accuracy studies, we demonstrate the validity of active surveillance via an EMR data mining method [Data Mining EMRs to Evaluate Coincident Testing (DETECT)], comparing POC glucose results to near-in-time central laboratory glucose results. The Roche ACCU-CHEK Inform II® POC glucose meter was evaluated in a laboratory validation study (n = 73), a prospective bedside intensive care unit (ICU) study (n = 124), and with DETECT (n = 852-27 503). For DETECT, the EMR was queried for POC and central laboratory glucose results with filtering based on bedside collection timestamps, central laboratory time delays, patient location, time period, absence of repeat testing, and presence of peripheral lines. DETECT and the bedside ICU study produced similar estimates of average bias (4.5 vs 5.0 mg/dL) and relative random error (6.3% vs 5.6%), with overlapping CIs. For glucose <100 mg/dL, the laboratory validation study estimated a lower relative random error of 3.6%. POC average bias correlated with central laboratory turnaround times, consistent with 4.8 mg·dL⁻¹·h⁻¹ glycolysis. After glycolysis adjustment, average bias was estimated by the bedside ICU study at -0.4 mg/dL (CI, -1.6 to 0.9) and DETECT at -0.7 (CI, -1.3 to 0.2), and percentage POC results occurring outside Clinical Laboratory Standards Institute quality goals were 2.4% and 4.8%, respectively. This study validates DETECT for estimating POC glucose meter accuracy compared with a prospective bedside ICU study and establishes it as a reliable postmarket surveillance methodology.
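The bias arithmetic in this abstract can be made concrete with a short sketch. The abstract does not spell out how the glycolysis adjustment was applied, so the approach below is an assumption: glucose lost in transit is added back to the central laboratory value at the quoted rate of 4.8 mg/dL per hour before differencing. The function name `average_bias` and the paired readings are invented for illustration.

```python
from statistics import mean

GLYCOLYSIS_RATE = 4.8  # mg/dL per hour, the rate quoted in the abstract


def average_bias(pairs, rate=0.0):
    """Mean of POC minus laboratory glucose.

    pairs -- iterable of (poc_mg_dl, lab_mg_dl, delay_hours)
    rate  -- assumed glucose loss per hour of laboratory delay; the loss is
             added back to the laboratory value before differencing
             (an assumption of this sketch, not a documented step)
    """
    return mean(poc - (lab + rate * delay) for poc, lab, delay in pairs)


# Invented paired readings: POC result, lab result, hours until lab analysis.
toy_pairs = [(105.0, 100.0, 1.0), (152.0, 148.0, 0.5), (98.0, 92.0, 1.5)]
raw_bias = average_bias(toy_pairs)
adjusted_bias = average_bias(toy_pairs, GLYCOLYSIS_RATE)
```

With these toy numbers, most of the apparent positive POC bias disappears once the delay is accounted for, which mirrors the direction of the adjustment reported above.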

View details for DOI 10.1373/clinchem.2015.251827

View details for Web of Science ID 000375173400014

View details for PubMedID 26988586
RegenBase: a knowledge base of spinal cord injury biology for translational research DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION Callahan, A., Abeyruwan, S. W., Al-Ali, H., Sakurai, K., Ferguson, A. R., Popovich, P. G., Shah, N. H., Visser, U., Bixby, J. L., Lemmon, V. P. 2016

Abstract

Spinal cord injury (SCI) research is a data-rich field that aims to identify the biological mechanisms resulting in loss of function and mobility after SCI, as well as develop therapies that promote recovery after injury. SCI experimental methods, data and domain knowledge are locked in the largely unstructured text of scientific publications, making large scale integration with existing bioinformatics resources and subsequent analysis infeasible. The lack of standard reporting for experiment variables and results also makes experiment replicability a significant challenge. To address these challenges, we have developed RegenBase, a knowledge base of SCI biology. RegenBase integrates curated literature-sourced facts and experimental details, raw assay data profiling the effect of compounds on enzyme activity and cell growth, and structured SCI domain knowledge in the form of the first ontology for SCI, using Semantic Web representation languages and frameworks. RegenBase uses consistent identifier schemes and data representations that enable automated linking among RegenBase statements and also to other biological databases and electronic resources. By querying RegenBase, we have identified novel biological hypotheses linking the effects of perturbagens to observed behavioral outcomes after SCI. RegenBase is publicly available for browsing, querying and download. Database URL: http://regenbase.org.

View details for DOI 10.1093/database/baw040

View details for Web of Science ID 000374094100001

View details for PubMedID 27055827

View details for PubMedCentralID PMC4823819
Comparing high-dimensional confounder control methods for rapid cohort studies from electronic health records JOURNAL OF COMPARATIVE EFFECTIVENESS RESEARCH Low, Y. S., Gallego, B., Shah, N. H. 2016; 5 (2): 179-192

Abstract

Electronic health records (EHR), containing rich clinical histories of large patient populations, can provide evidence for clinical decisions when evidence from trials and literature is absent. To enable such observational studies from EHR in real time, particularly in emergencies, rapid confounder control methods that can handle numerous variables and adjust for biases are imperative. This study compares the performance of 18 automatic confounder control methods. Methods include propensity scores, direct adjustment by machine learning, similarity matching and resampling in two simulated and one real-world EHR datasets. Direct adjustment by lasso regression and ensemble models involving multiple resamples have performance comparable to expert-based propensity scores and thus, may help provide real-time EHR-based evidence for timely clinical decisions.

View details for DOI 10.2217/cer.15.53

View details for Web of Science ID 000372475700007
Harnessing next-generation informatics for personalizing medicine: a report from AMIA's 2014 Health Policy Invitational Meeting. Journal of the American Medical Informatics Association : JAMIA Wiley, L. K., Tarczy-Hornoch, P., Denny, J. C., Freimuth, R. R., Overby, C. L., Shah, N., Martin, R. D., Sarkar, I. N. 2016; 23 (2): 413-9

Abstract

The American Medical Informatics Association convened the 2014 Health Policy Invitational Meeting to develop recommendations for updates to current policies and to establish an informatics research agenda for personalizing medicine. In particular, the meeting focused on discussing informatics challenges related to personalizing care through the integration of genomic or other high-volume biomolecular data with data from clinical systems to make health care more efficient and effective. This report summarizes the findings (n = 6) and recommendations (n = 15) from the policy meeting, which were clustered into 3 broad areas: (1) policies governing data access for research and personalization of care; (2) policy and research needs for evolving data interpretation and knowledge representation; and (3) policy and research needs to ensure data integrity and preservation. The meeting outcome underscored the need to address a number of important policy and technical considerations in order to realize the potential of personalized or precision medicine in actual clinical contexts.

View details for DOI 10.1093/jamia/ocv111

View details for PubMedID 26911808

View details for PubMedCentralID PMC6457095
Comparing high-dimensional confounder control methods for rapid cohort studies from electronic health records. Journal of comparative effectiveness research Low, Y. S., Gallego, B., Shah, N. H. 2016; 5 (2): 179-192

Abstract

Electronic health records (EHR), containing rich clinical histories of large patient populations, can provide evidence for clinical decisions when evidence from trials and literature is absent. To enable such observational studies from EHR in real time, particularly in emergencies, rapid confounder control methods that can handle numerous variables and adjust for biases are imperative. This study compares the performance of 18 automatic confounder control methods. Methods include propensity scores, direct adjustment by machine learning, similarity matching and resampling in two simulated and one real-world EHR datasets. Direct adjustment by lasso regression and ensemble models involving multiple resamples have performance comparable to expert-based propensity scores and thus, may help provide real-time EHR-based evidence for timely clinical decisions.

View details for DOI 10.2217/cer.15.53

View details for PubMedID 26634383
Androgen Deprivation Therapy and Future Alzheimer's Disease Risk. Journal of clinical oncology Nead, K. T., Gaskin, G., Chester, C., Swisher-McClure, S., Dudley, J. T., Leeper, N. J., Shah, N. H. 2016; 34 (6): 566-571

Abstract

To test the association of androgen deprivation therapy (ADT) in the treatment of prostate cancer with subsequent Alzheimer's disease risk. We used a previously validated and implemented text-processing pipeline to analyze electronic medical record data in a retrospective cohort of patients at Stanford University and Mt. Sinai hospitals. Specifically, we extracted International Classification of Diseases-9th revision diagnosis and Current Procedural Terminology codes, medication lists, and positive-present mentions of drug and disease concepts from all clinical notes. We then tested the effect of ADT on risk of Alzheimer's disease using 1:5 propensity score-matched and traditional multivariable-adjusted Cox proportional hazards models. The duration of ADT use was also tested for association with Alzheimer's disease risk. There were 16,888 individuals with prostate cancer meeting all inclusion and exclusion criteria, with 2,397 (14.2%) receiving ADT during a median follow-up period of 2.7 years (interquartile range, 1.0-5.4 years). Propensity score-matched analysis (hazard ratio, 1.88; 95% CI, 1.10 to 3.20; P = .021) and traditional multivariable-adjusted Cox regression analysis (hazard ratio, 1.66; 95% CI, 1.05 to 2.64; P = .031) both supported a statistically significant association between ADT use and Alzheimer's disease risk. We also observed a statistically significant increased risk of Alzheimer's disease with increasing duration of ADT (P = .016). Our results support an association between the use of ADT in the treatment of prostate cancer and an increased risk of Alzheimer's disease in a general population cohort. This study demonstrates the utility of novel methods to analyze electronic medical record data to generate practice-based evidence.

View details for DOI 10.1200/JCO.2015.63.6266

View details for PubMedID 26644522
Reply. Gastroenterology Shah, N. H., Cooke, J. P., Leeper, N. J. 2016; 150 (2): 528-?

View details for DOI 10.1053/j.gastro.2015.12.017

View details for PubMedID 26721609
Distribution of Opioids by Different Types of Medicare Prescribers. JAMA internal medicine Chen, J. H., Humphreys, K., Shah, N. H., Lembke, A. 2016; 176 (2): 259-61

View details for DOI 10.1001/jamainternmed.2015.6662

View details for PubMedID 26658497

View details for PubMedCentralID PMC5374118
An unsupervised learning method to identify reference intervals from a clinical database. Journal of biomedical informatics Poole, S., Schroeder, L. F., Shah, N. 2016; 59: 276-284

Abstract

Reference intervals are critical for the interpretation of laboratory results. The development of reference intervals using traditional methods is time-consuming and costly. An alternative approach, known as an a posteriori method, requires an expert to enumerate diagnoses and procedures that can affect the measurement of interest. We develop a method, LIMIT, to use laboratory test results from a clinical database to identify ICD9 codes that are associated with extreme laboratory results, thus automating the a posteriori method. LIMIT was developed using sodium serum levels, and validated using potassium serum levels, both tests for which harmonized reference intervals already exist. To test LIMIT, reference intervals for total hemoglobin in whole blood were learned, and were compared with the hemoglobin reference intervals found using an existing a posteriori approach. In addition, prescriptions of iron supplements were used to identify individuals whose hemoglobin levels were low enough for a clinician to choose to take action. This prescription data indicating clinical action was then used to estimate the validity of the hemoglobin reference interval sets. Results show that LIMIT produces usable reference intervals for sodium, potassium and hemoglobin laboratory tests. The hemoglobin intervals produced using the data-driven approaches consistently had higher positive predictive value and specificity in predicting an iron supplement prescription than the existing intervals. LIMIT represents a fast and inexpensive solution for calculating reference intervals, and shows that it is possible to use laboratory results and coded diagnoses to learn laboratory test reference intervals from clinical data warehouses.

View details for DOI 10.1016/j.jbi.2015.12.010

View details for PubMedID 26707631

View details for PubMedCentralID PMC4792744
Feasibility of Prioritizing Drug-Drug-Event Associations Found in Electronic Health Records. Drug safety Banda, J. M., Callahan, A., Winnenburg, R., Strasberg, H. R., Cami, A., Reis, B. Y., Vilar, S., Hripcsak, G., Dumontier, M., Shah, N. H. 2016; 39 (1): 45-57

Abstract

Several studies have demonstrated the ability to detect adverse events potentially related to multiple drug exposure via data mining. However, the number of putative associations produced by such computational approaches is typically large, making experimental validation difficult. We theorized that those potential associations for which there is evidence from multiple complementary sources are more likely to be true, and explored this idea using a published database of drug-drug-adverse event associations derived from electronic health records (EHRs). We prioritized drug-drug-event associations derived from EHRs using four sources of information: (1) public databases, (2) sources of spontaneous reports, (3) literature, and (4) non-EHR drug-drug interaction (DDI) prediction methods. After pre-filtering the associations by removing those found in public databases, we devised a ranking for associations based on the support from the remaining sources, and evaluated the results of this rank-based prioritization. We collected information for 5983 putative EHR-derived drug-drug-event associations involving 345 drugs and ten adverse events from four data sources and four prediction methods. Only seven drug-drug-event associations (<0.5%) had support from the majority of evidence sources, and about one third (1777) had support from at least one of the evidence sources. Our proof-of-concept method for scoring putative drug-drug-event associations from EHRs offers a systematic and reproducible way of prioritizing associations for further study. Our findings also quantify the agreement (or lack thereof) among complementary sources of evidence for drug-drug-event associations and highlight the challenges of developing a robust approach for prioritizing signals of these associations.

View details for DOI 10.1007/s40264-015-0352-2

View details for PubMedID 26446143
DISCOVERY OF MOLECULARLY TARGETED THERAPIES Regan, K., Abrams, Z., Sharpnack, M., Srivastava, A., Huang, K., Shah, N., Payne, P. R. O. edited by Altman, R. B., Dunker, A. K., Hunter, L., Ritchie, M. D., Murray, T., Klein, T. E. WORLD SCIENTIFIC PUBL CO PTE LTD. 2016: 1-8

View details for Web of Science ID 000386326200001

View details for PubMedID 26776168

View details for PubMedCentralID PMC4874173
Predicting hospital visits from geo-tagged Internet search logs. AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science Agarwal, V., Han, L., Madan, I., Saluja, S., Shidham, A., Shah, N. H. 2016; 2016: 15-24

Abstract

The steady rise in healthcare costs has deprived over 45 million Americans of healthcare services (1, 2) and has encouraged healthcare providers to look for opportunities to improve their operational efficiency. Prior studies have shown that evidence of healthcare seeking intent in Internet searches correlates well with healthcare resource utilization. Given the ubiquitous nature of mobile Internet search, we hypothesized that analyzing geo-tagged mobile search logs could enable us to machine-learn predictors of future patient visits. Using a de-identified dataset of geo-tagged mobile Internet search logs, we mined text and location patterns that are predictors of healthcare resource utilization and built statistical models that predict the probability of a user's future visit to a medical facility. Our efforts will enable the development of innovative methods for modeling and optimizing the use of healthcare resources, a crucial prerequisite for securing healthcare access for everyone in the days to come.

View details for PubMedID 27570641
LEARNING ATTRIBUTES OF DISEASE PROGRESSION FROM TRAJECTORIES OF SPARSE LAB VALUES. Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing Agarwal, V., Shah, N. H. 2016; 22: 184-194

Abstract

There is heterogeneity in the manifestation of diseases; therefore, it is essential to understand the patterns of progression of a disease in a given population for disease management as well as for clinical research. Disease status is often summarized by repeated recordings of one or more physiological measures. As a result, historical values of these physiological measures for a population sample can be used to characterize disease progression patterns. We use a method for clustering sparse functional data for identifying sub-groups within a cohort of patients with chronic kidney disease (CKD), based on the trajectories of their creatinine measurements. We demonstrate through a proof-of-principle study how the two sub-groups that display distinct patterns of disease progression may be compared on clinical attributes that correspond to the maximum difference in progression patterns. The key attributes that distinguish the two sub-groups appear to have support in published literature and clinical practice related to CKD.

View details for PubMedID 27896974
Learning Effective Treatment Pathways for Type-2 Diabetes from a clinical data warehouse. AMIA ... Annual Symposium proceedings. AMIA Symposium Vashisht, R., Jung, K., Shah, N. 2016; 2016: 2036-2042

Abstract

Treatment guidelines for management of type-2 diabetes mellitus (T2DM) are controversial because existing evidence from randomized clinical trials does not address many important clinical questions. Data from Electronic Medical Records (EMRs) has been used to profile first line therapy choices, but this work did not elucidate the factors underlying deviations from current treatment guidelines and the relative efficacy of different treatment options. We have used data from the Stanford Hospital to attempt to address these issues. Clinical features associated with the initial choice of treatment were effectively re-discovered using a machine learning approach. In addition, the efficacies of first and second line treatments were evaluated using Cox proportional hazard models for control of Hemoglobin A1c. Factors such as acute kidney disorder and liver disorder were predictive of first line therapy choices. Sitagliptin was the most effective second-line therapy, and as effective as metformin as a first line therapy.

View details for PubMedID 28269963
New Paradigms for Patient-Centered Outcomes Research in Electronic Medical Records: An Example of Detecting Urinary Incontinence Following Prostatectomy. EGEMS (Washington, DC) Hernandez-Boussard, T., Tamang, S., Blayney, D., Brooks, J., Shah, N. 2016; 4 (3): 1231-?

Abstract

National initiatives to develop quality metrics emphasize the need to include patient-centered outcomes. Patient-centered outcomes are complex, require documentation of patient communications, and have not been routinely collected by healthcare providers. The widespread implementation of electronic health records (EHRs) offers opportunities to assess patient-centered outcomes within the routine healthcare delivery system. The objective of this study was to test the feasibility and accuracy of identifying patient-centered outcomes within the EHR. Data from patients with localized prostate cancer undergoing prostatectomy were used to develop and test algorithms to accurately identify patient-centered outcomes in post-operative EHRs; we used urinary incontinence as the use case. Standard data mining techniques were used to extract and annotate free text and structured data to assess urinary incontinence recorded within the EHRs. A total of 5,349 prostate cancer patients were identified in our EHR system between 1998 and 2013. Among these EHRs, 30.3% had a text mention of urinary incontinence within 90 days post-operative compared to less than 1.0% with a structured data field for urinary incontinence (i.e., an ICD-9 code). Our workflow had good precision and recall for urinary incontinence (positive predictive value: 0.73 and sensitivity: 0.84). Our data indicate that important patient-centered outcomes, such as urinary incontinence, are being captured in EHRs as free text and highlight the long-standing importance of accurate clinician documentation. Standard data mining algorithms can accurately and efficiently identify these outcomes in existing EHRs; the complete assessment of these outcomes is essential to move practice into the patient-centered realm of healthcare.

View details for DOI 10.13063/2327-9214.1231

View details for PubMedID 27347492
A curated and standardized adverse drug event resource to accelerate drug safety research. Scientific data Banda, J. M., Evans, L., Vanguri, R. S., Tatonetti, N. P., Ryan, P. B., Shah, N. H. 2016; 3: 160026-?

Abstract

Identification of adverse drug reactions (ADRs) during the post-marketing phase is one of the most important goals of drug safety surveillance. Spontaneous reporting systems (SRS) data, which are the mainstay of traditional drug safety surveillance, are used for hypothesis generation and to validate the newer approaches. The publicly available US Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS) data requires substantial curation before they can be used appropriately, and applying different strategies for data cleaning and normalization can have material impact on analysis results. We provide a curated and standardized version of FAERS removing duplicate case records, applying standardized vocabularies with drug names mapped to RxNorm concepts and outcomes mapped to SNOMED-CT concepts, and pre-computed summary statistics about drug-outcome relationships for general consumption. This publicly available resource, along with the source code, will accelerate drug safety research by reducing the amount of time spent performing data management on the source FAERS reports, improving the quality of the underlying data, and enabling standardized analyses using common vocabularies.

View details for DOI 10.1038/sdata.2016.26

View details for PubMedID 27193236

View details for PubMedCentralID PMC4872271
Rapid identification of slow healing wounds. Wound repair and regeneration Jung, K., Covington, S., Sen, C. K., Januszyk, M., Kirsner, R. S., Gurtner, G. C., Shah, N. H. 2016; 24 (1): 181-188

Abstract

Chronic nonhealing wounds have a prevalence of 2% in the United States, and cost an estimated $50 billion annually. Accurate stratification of wounds for risk of slow healing may help guide treatment and referral decisions. We have applied modern machine learning methods and feature engineering to develop a predictive model for delayed wound healing that uses information collected during routine care in outpatient wound care centers. Patient and wound data was collected at 68 outpatient wound care centers operated by Healogics Inc. in 26 states between 2009 and 2013. The dataset included basic demographic information on 59,953 patients, as well as both quantitative and categorical information on 180,696 wounds. Wounds were split into training and test sets by randomly assigning patients to training and test sets. Wounds were considered delayed with respect to healing time if they took more than 15 weeks to heal after presentation at a wound care center. Eleven percent of wounds in this dataset met this criterion. Prognostic models were developed on training data available in the first week of care to predict delayed healing wounds. A held out subset of the training set was used for model selection, and the final model was evaluated on the test set to evaluate discriminative power and calibration. The model achieved an area under the curve of 0.842 (95% confidence interval 0.834-0.847) for the delayed healing outcome and a Brier reliability score of 0.00018. Early, accurate prediction of delayed healing wounds can improve patient care by allowing clinicians to increase the aggressiveness of intervention in patients most at risk.

View details for DOI 10.1111/wrr.12384

View details for Web of Science ID 000372925500018

View details for PubMedID 26606167

View details for PubMedCentralID PMC4820011
DISCOVERING PATIENT PHENOTYPES USING GENERALIZED LOW RANK MODELS. Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing Schuler, A., Liu, V., Wan, J., Callahan, A., Udell, M., Stark, D. E., Shah, N. H. 2016; 21: 144-155

Abstract

The practice of medicine is predicated on discovering commonalities or distinguishing characteristics among patients to inform corresponding treatment. Given a patient grouping (hereafter referred to as a phenotype), clinicians can implement a treatment pathway accounting for the underlying cause of disease in that phenotype. Traditionally, phenotypes have been discovered by intuition, experience in practice, and advancements in basic science, but these approaches are often heuristic, labor intensive, and can take decades to produce actionable knowledge. Although our understanding of disease has progressed substantially in the past century, there are still important domains in which our phenotypes are murky, such as in behavioral health or in hospital settings. To accelerate phenotype discovery, researchers have used machine learning to find patterns in electronic health records, but have often been thwarted by missing data, sparsity, and data heterogeneity. In this study, we use a flexible framework called Generalized Low Rank Modeling (GLRM) to overcome these barriers and discover phenotypes in two sources of patient data. First, we analyze data from the 2010 Healthcare Cost and Utilization Project National Inpatient Sample (NIS), which contains upwards of 8 million hospitalization records consisting of administrative codes and demographic information. Second, we analyze a small (N=1746), local dataset documenting the clinical progression of autism spectrum disorder patients using granular features from the electronic health record, including text from physician notes. We demonstrate that low rank modeling successfully captures known and putative phenotypes in these vastly different datasets.

View details for PubMedID 26776181
Special issue on bio-ontologies and phenotypes JOURNAL OF BIOMEDICAL SEMANTICS Soldatova, L. N., Collier, N., Oellrich, A., Groza, T., Verspoor, K., Rocca-Serra, P., Dumontier, M., Shah, N. H. 2015; 6

Abstract

The bio-ontologies and phenotypes special issue includes eight papers selected from the 11 papers presented at the Bio-Ontologies SIG (Special Interest Group) and the Phenotype Day at ISMB (Intelligent Systems for Molecular Biology) conference in Boston in 2014. The selected papers span a wide range of topics including the automated re-use and update of ontologies, quality assessment of ontological resources, and the systematic description of phenotype variation, driven by manual, semi- and fully automatic means.

View details for DOI 10.1186/s13326-015-0040-2

View details for PubMedID 26682035

View details for PubMedCentralID PMC4682270
Implications of non-stationarity on predictive modeling using EHRs JOURNAL OF BIOMEDICAL INFORMATICS Jung, K., Shah, N. H. 2015; 58: 168-174

Abstract

The rapidly increasing volume of clinical information captured in Electronic Health Records (EHRs) has led to the application of increasingly sophisticated models for purposes such as disease subtype discovery and predictive modeling. However, increasing adoption of EHRs implies that in the near future, much of the data available for such purposes will be from a time period during which both the practice of medicine and the clinical use of EHRs are in flux due to historic changes in both technology and incentives. In this work, we explore the implications of this phenomenon, called non-stationarity, on predictive modeling. We focus on the problem of predicting delayed wound healing using data available in the EHR during the first week of care in outpatient wound care centers, using a large dataset covering over 150,000 individual wounds and 59,958 patients seen over a period of four years. We manipulate the degree of non-stationarity seen by the model development process by changing the way data is split into training and test sets. We demonstrate that non-stationarity can lead to quite different conclusions regarding the relative merits of different models with respect to predictive power and calibration of their posterior probabilities. Under the non-stationarity exhibited in this dataset, the performance advantage of complex methods such as stacking relative to the best simple classifier disappears. Ignoring non-stationarity can thus lead to sub-optimal model selection in this task.

View details for DOI 10.1016/j.jbi.2015.10.006

View details for PubMedID 26483171

View details for PubMedCentralID PMC4684770
A method for systematic discovery of adverse drug events from clinical notes. Journal of the American Medical Informatics Association Wang, G., Jung, K., Winnenburg, R., Shah, N. H. 2015; 22 (6): 1196-1204

Abstract

Adverse drug events (ADEs) are undesired harmful effects resulting from use of a medication, and occur in 30% of hospitalized patients. The authors have developed a data-mining method for systematic, automated detection of ADEs from electronic medical records. This method uses the text from 9.5 million clinical notes, along with prior knowledge of drug usages and known ADEs, as inputs. These inputs are further processed into statistics used by a discriminative classifier which outputs the probability that a given drug-disorder pair represents a valid ADE association. Putative ADEs identified by the classifier are further filtered for positive support in 2 independent, complementary data sources. The authors evaluate this method by assessing support for the predictions in other curated data sources, including a manually curated, time-indexed reference standard of label change events. This method uses a classifier that achieves an area under the curve of 0.94 on a held-out test set. The classifier is used on 2 362 950 possible drug-disorder pairs comprised of 1602 unique drugs and 1475 unique disorders for which we had data, resulting in 240 high-confidence, well-supported drug-AE associations. Eighty-seven of them (36%) are supported in at least one of the resources that have information that was not available to the classifier. This method demonstrates the feasibility of systematic post-marketing surveillance for ADEs using electronic medical records, a key component of the learning healthcare system.

View details for DOI 10.1093/jamia/ocv102

View details for PubMedID 26232442
Pattern mining of drug prescriptions suggests complications from chronic opioid use Low, Y., Podchiyska, T., Shah, N., Lembke, A. WILEY-BLACKWELL. 2015: 128

View details for Web of Science ID 000369980200221
Proton pump inhibitors and vascular function: A prospective cross-over pilot study VASCULAR MEDICINE Ghebremariam, Y. T., Cooke, J. P., Khan, F., Thakker, R. N., Chang, P., Shah, N. H., Nead, K. T., Leeper, N. J. 2015; 20 (4): 309-316

Abstract

Proton pump inhibitors (PPIs) are commonly used drugs for the treatment of gastric reflux. Recent retrospective cohorts and large database studies have raised concern that the use of PPIs is associated with increased cardiovascular (CV) risk. However, there is no prospective clinical study evaluating whether the use of PPIs directly causes CV harm. We conducted a controlled, open-label, cross-over pilot study among 21 adults aged 18 and older who are healthy (n=11) or have established clinical cardiovascular disease (n=10). Study subjects were assigned to receive a PPI (Prevacid; 30 mg) or a placebo pill once daily for 4 weeks. After a 2-week washout period, participants were crossed over to receive the alternate treatment for the ensuing 4 weeks. Subjects underwent evaluation of vascular function (by the EndoPAT technique) and had plasma levels of asymmetric dimethylarginine (ADMA, an endogenous inhibitor of endothelial function previously implicated in PPI-mediated risk) measured prior to and after each treatment interval. We observed a marginal inverse correlation between the EndoPAT score and plasma levels of ADMA (r = -0.364). Subjects experienced a greater worsening in plasma ADMA levels while on PPI than on placebo, and this trend was more pronounced amongst those subjects with a history of vascular disease. However, these trends did not reach statistical significance, and PPI use was also not associated with an impairment in flow-mediated vasodilation during the course of this study. In conclusion, in this open-label, cross-over pilot study conducted among healthy subjects and coronary disease patients, PPI use did not significantly influence vascular endothelial function. Larger, long-term and blinded trials are needed to mechanistically explain the correlation between PPI use and adverse clinical outcomes, which has recently been reported in retrospective cohort studies.

View details for DOI 10.1177/1358863X14568444

View details for Web of Science ID 000359414300001

View details for PubMedID 25835348

View details for PubMedCentralID PMC4572842
Text-mining methods applied to clinical records support an association between androgen deprivation therapy and subsequent cardiometabolic disease Nead, K. T., Gaskin, G. L., Chester, C., Shah, N. H., Leeper, N. J. AMER ASSOC CANCER RESEARCH. 2015

View details for DOI 10.1158/1538-7445.AM2015-5577

View details for Web of Science ID 000371597106236
Analyzing Information Seeking and Drug-Safety Alert Response by Health Care Professionals as New Methods for Surveillance JOURNAL OF MEDICAL INTERNET RESEARCH Callahan, A., Pernek, I., Stiglic, G., Leskovec, J., Strasberg, H. R., Shah, N. H. 2015; 17 (8)

Abstract

Patterns in general consumer online search logs have been used to monitor health conditions and to predict health-related activities, but the multiple contexts within which consumers perform online searches make significant associations difficult to interpret. Physician information-seeking behavior has typically been analyzed through survey-based approaches and literature reviews. Activity logs from health care professionals using online medical information resources are thus a valuable yet relatively untapped resource for large-scale medical surveillance. To analyze health care professionals' information-seeking behavior and assess the feasibility of measuring drug-safety alert response from the usage logs of an online medical information resource. Using two years (2011-2012) of usage logs from UpToDate, we measured the volume of searches related to medical conditions with significant burden in the United States, as well as the seasonal distribution of those searches. We quantified the relationship between searches and resulting page views. Using a large collection of online mainstream media articles and Web log posts we also characterized the uptake of a Food and Drug Administration (FDA) alert via changes in UpToDate search activity compared with general online media activity related to the subject of the alert. Diseases and symptoms dominate UpToDate searches. Some searches result in page views of only short duration, while others consistently result in longer-than-average page views. The response to an FDA alert for Celexa, characterized by a change in UpToDate search activity, differed considerably from general online media activity. Changes in search activity appeared later and persisted longer in UpToDate logs. The volume of searches and page view durations related to Celexa before the alert also differed from those after the alert. Understanding the information-seeking behavior associated with online evidence sources can offer insight into the information needs of health professionals and enable large-scale medical surveillance. Our Web log mining approach has the potential to monitor responses to FDA alerts at a national level. Our findings can also inform the design and content of evidence-based medical information resources such as UpToDate.

View details for DOI 10.2196/jmir.4427

View details for Web of Science ID 000360306600007
Proton Pump Inhibitor Usage and the Risk of Myocardial Infarction in the General Population PLOS ONE Shah, N. H., LePendu, P., Bauer-Mehren, A., Ghebremariam, Y. T., Iyer, S. V., Marcus, J., Nead, K. T., Cooke, J. P., Leeper, N. J. 2015; 10 (6)

Abstract

Proton pump inhibitors (PPIs) have been associated with adverse clinical outcomes amongst clopidogrel users after an acute coronary syndrome. Recent pre-clinical results suggest that this risk might extend to subjects without any prior history of cardiovascular disease. We explore this potential risk in the general population via data-mining approaches. Using a novel approach for mining clinical data for pharmacovigilance, we queried over 16 million clinical documents on 2.9 million individuals to examine whether PPI usage was associated with cardiovascular risk in the general population. In multiple data sources, we found gastroesophageal reflux disease (GERD) patients exposed to PPIs to have a 1.16 fold increased association (95% CI 1.09-1.24) with myocardial infarction (MI). Survival analysis in a prospective cohort found a two-fold (HR = 2.00; 95% CI 1.07-3.78; P = 0.031) increase in association with cardiovascular mortality. We found that this association exists regardless of clopidogrel use. We also found that H2 blockers, an alternate treatment for GERD, were not associated with increased cardiovascular risk. Had such pharmacovigilance algorithms been in place, they could have flagged this risk as early as the year 2000. Consistent with our pre-clinical findings that PPIs may adversely impact vascular function, our data-mining study supports the association of PPI exposure with risk for MI in the general population. These data provide an example of how a combination of experimental studies and data-mining approaches can be applied to prioritize drug safety signals for further investigation.

View details for DOI 10.1371/journal.pone.0124653

View details for Web of Science ID 000355979500007

View details for PubMedID 26061035

View details for PubMedCentralID PMC4462578
What Matters Most, Statin Intensity or Achieved LDL? - Evaluating Concordance of AHA/ACC Guidelines for Statin Use with Practice Outcomes at Stanford Hospital & Clinics Gyang, E., Shah, N., Leeper, N. SAGE PUBLICATIONS LTD. 2015: 295

View details for Web of Science ID 000355334000049
A formal concept analysis and semantic query expansion cooperation to refine health outcomes of interest BMC MEDICAL INFORMATICS AND DECISION MAKING Cure, O. C., Maurer, H., Shah, N. H., Le Pendu, P. 2015; 15

Abstract

Electronic Health Records (EHRs) are frequently used by clinicians and researchers to search for, extract, and analyze groups of patients by defining Health Outcomes of Interest (HOIs). The definition of an HOI is generally considered a complex and time-consuming task for health care professionals. In our clinical note-based pharmacovigilance research, we often operate upon potentially hundreds of ontologies at once, expand query inputs, and increase the search space over clinical text as well as structured data. Such a method requires specifying an initial set of seed concepts, which are based on concept unique identifiers. This paper presents a novel method based on Formal Concept Analysis (FCA) and Semantic Query Expansion (SQE) to assist end-users in defining their seed queries and in refining the expanded search space that those queries encompass. We evaluate our method over a gold-standard corpus from the 2008 i2b2 Obesity Challenge. This experimentation emphasizes positive results for sensitivity and specificity measures. Our new approach provides better recall with high precision of the obtained results. The most promising aspect of this approach consists in the discovery of positive results not present in our Obesity NLP reference set. Together with a Web graphical user interface, our FCA and SQE cooperation ends up being an efficient approach for refining health outcomes of interest using plain terms. We consider that this approach can be extended to support other domains such as cohort building tools.

View details for DOI 10.1186/1472-6947-15-S1-S8

View details for Web of Science ID 000367479300008

View details for PubMedCentralID PMC4460622
Using clinical data text-mining analysis to examine the association between androgen deprivation therapy and depression. Nead, K. T., Gaskin, G. L., Chester, C., Leeper, N. J., Shah, N. H. AMER SOC CLINICAL ONCOLOGY. 2015

View details for DOI 10.1200/jco.2015.33.15_suppl.e12595

View details for Web of Science ID 000358036902399
Lymphopenia after adjuvant radiotherapy (RT) to predict poor survival in triple-negative breast cancer (TNBC). Afghahi, A., Mathur, M., Seto, T., Desai, M., Kenkare, P., Horst, K. C., Das, A. K., Thompson, C. A., Luft, H. S., Yu, P., Gomez, S., Low, Y., Shah, N. H., Kurian, A. W., Sledge, G. W. AMER SOC CLINICAL ONCOLOGY. 2015

View details for Web of Science ID 000358036900228
Detecting unplanned care from clinician notes in electronic health records. Journal of oncology practice / American Society of Clinical Oncology Tamang, S., Patel, M. I., Blayney, D. W., Kuznetsov, J., Finlayson, S. G., Vetteth, Y., Shah, N. 2015; 11 (3): e313-9

Abstract

Reductions in unplanned episodes of care, such as emergency department visits and unplanned hospitalizations, are important quality outcome measures. However, many events are only documented in free-text clinician notes and are labor intensive to detect by manual medical record review. We studied 308,096 free-text machine-readable documents linked to individual entries in our electronic health records, representing care for patients with breast, GI, or thoracic cancer, whose treatment was initiated at one academic medical center, Stanford Health Care (SHC). Using a clinical text-mining tool, we detected unplanned episodes documented in clinician notes (for non-SHC visits) or in coded encounter data for SHC-delivered care and the most frequent symptoms documented in emergency department (ED) notes. Combined reporting increased the identification of patients with one or more unplanned care visits by 32% (15% using coded data; 20% using all the data) among patients with 3 months of follow-up and by 21% (23% using coded data; 28% using all the data) among those with 1 year of follow-up. Based on the textual analysis of SHC ED notes, pain (75%), followed by nausea (54%), vomiting (47%), infection (36%), fever (28%), and anemia (27%), were the most frequent symptoms mentioned. Pain, nausea, and vomiting co-occur in 35% of all ED encounter notes. The text-mining methods we describe can be applied to automatically review free-text clinician notes to detect unplanned episodes of care mentioned in these notes. These methods have broad application for quality improvement efforts in which events of interest occur outside of a network that allows for patient data sharing.

View details for DOI 10.1200/JOP.2014.002741

View details for PubMedID 25980019

View details for PubMedCentralID PMC4438112
Comment on: "Zoo or savannah? Choice of training ground for evidence-based pharmacovigilance". Drug safety Harpaz, R., DuMouchel, W., Shah, N. H. 2015; 38 (1): 113-114

View details for DOI 10.1007/s40264-014-0245-9

View details for PubMedID 25432779
Provenance-Centered Dataset of Drug-Drug Interactions Banda, J. M., Kuhn, T., Shah, N. H., Dumontier, M. edited by Arenas, M., Corcho, O., Simperl, E., Strohmaier, M., DAquin, M., Srinivas, K., Groth, P., Dumontier, M., Heflin, J., Thirunarayan, K., Staab, S. SPRINGER INTERNATIONAL PUBLISHING AG. 2015: 293-300

View details for DOI 10.1007/978-3-319-25010-6_18

View details for Web of Science ID 000374242500018
Observational Health Data Sciences and Informatics (OHDSI): Opportunities for Observational Researchers. Studies in health technology and informatics Hripcsak, G., Duke, J. D., Shah, N. H., Reich, C. G., Huser, V., Schuemie, M. J., Suchard, M. A., Park, R. W., Wong, I. C., Rijnbeek, P. R., van der Lei, J., Pratt, N., Norén, G. N., Li, Y. C., Stang, P. E., Madigan, D., Ryan, P. B. 2015; 216: 574–78

Abstract

The vision of creating accessible, reliable clinical evidence by accessing the clinical experience of hundreds of millions of patients across the globe is a reality. Observational Health Data Sciences and Informatics (OHDSI) has built on learnings from the Observational Medical Outcomes Partnership to turn methods research and insights into a suite of applications and exploration tools that move the field closer to the ultimate goal of generating evidence about all aspects of healthcare to serve the needs of patients, clinicians and all other decision-makers around the world.

View details for PubMedID 26262116
Analyzing Information Seeking and Drug-Safety Alert Response by Health Care Professionals as New Methods for Surveillance. Journal of medical Internet research Callahan, A., Pernek, I., Stiglic, G., Leskovec, J., Strasberg, H. R., Shah, N. H. 2015; 17 (8)

Abstract

Patterns in general consumer online search logs have been used to monitor health conditions and to predict health-related activities, but the multiple contexts within which consumers perform online searches make significant associations difficult to interpret. Physician information-seeking behavior has typically been analyzed through survey-based approaches and literature reviews. Activity logs from health care professionals using online medical information resources are thus a valuable yet relatively untapped resource for large-scale medical surveillance. To analyze health care professionals' information-seeking behavior and assess the feasibility of measuring drug-safety alert response from the usage logs of an online medical information resource. Using two years (2011-2012) of usage logs from UpToDate, we measured the volume of searches related to medical conditions with significant burden in the United States, as well as the seasonal distribution of those searches. We quantified the relationship between searches and resulting page views. Using a large collection of online mainstream media articles and Web log posts we also characterized the uptake of a Food and Drug Administration (FDA) alert via changes in UpToDate search activity compared with general online media activity related to the subject of the alert. Diseases and symptoms dominate UpToDate searches. Some searches result in page views of only short duration, while others consistently result in longer-than-average page views. The response to an FDA alert for Celexa, characterized by a change in UpToDate search activity, differed considerably from general online media activity. Changes in search activity appeared later and persisted longer in UpToDate logs. The volume of searches and page view durations related to Celexa before the alert also differed from those after the alert. Understanding the information-seeking behavior associated with online evidence sources can offer insight into the information needs of health professionals and enable large-scale medical surveillance. Our Web log mining approach has the potential to monitor responses to FDA alerts at a national level. Our findings can also inform the design and content of evidence-based medical information resources such as UpToDate.

View details for DOI 10.2196/jmir.4427

View details for PubMedID 26293444
Bringing cohort studies to the bedside: framework for a "green button" to support clinical decision-making JOURNAL OF COMPARATIVE EFFECTIVENESS RESEARCH Gallego, B., Walter, S. R., Day, R. O., Dunn, A. G., Sivaraman, V., Shah, N., Longhurst, C. A., Coiera, E. 2015; 4 (3): 191-197

Abstract

When providing care, clinicians are expected to take note of clinical practice guidelines, which offer recommendations based on the available evidence. However, guidelines may not apply to individual patients with comorbidities, as they are typically excluded from clinical trials. Guidelines also tend not to provide relevant evidence on risks, secondary effects and long-term outcomes. Querying the electronic health records of similar patients may, for many, provide an alternate source of evidence to inform decision-making. It is important to develop methods to support these personalized observational studies at the point-of-care, to understand when these methods may provide valid results, and to validate and integrate these findings with those from clinical trials.

View details for DOI 10.2217/cer.15.12

View details for Web of Science ID 000355701500002
Proton Pump Inhibitor Usage and the Risk of Myocardial Infarction in the General Population. PloS one Shah, N. H., LePendu, P., Bauer-Mehren, A., Ghebremariam, Y. T., Iyer, S. V., Marcus, J., Nead, K. T., Cooke, J. P., Leeper, N. J. 2015; 10 (6)

Abstract

Proton pump inhibitors (PPIs) have been associated with adverse clinical outcomes amongst clopidogrel users after an acute coronary syndrome. Recent pre-clinical results suggest that this risk might extend to subjects without any prior history of cardiovascular disease. We explore this potential risk in the general population via data-mining approaches. Using a novel approach for mining clinical data for pharmacovigilance, we queried over 16 million clinical documents on 2.9 million individuals to examine whether PPI usage was associated with cardiovascular risk in the general population. In multiple data sources, we found gastroesophageal reflux disease (GERD) patients exposed to PPIs to have a 1.16 fold increased association (95% CI 1.09-1.24) with myocardial infarction (MI). Survival analysis in a prospective cohort found a two-fold (HR = 2.00; 95% CI 1.07-3.78; P = 0.031) increase in association with cardiovascular mortality. We found that this association exists regardless of clopidogrel use. We also found that H2 blockers, an alternate treatment for GERD, were not associated with increased cardiovascular risk; had they been in place, such pharmacovigilance algorithms could have flagged this risk as early as the year 2000. Consistent with our pre-clinical findings that PPIs may adversely impact vascular function, our data-mining study supports the association of PPI exposure with risk for MI in the general population. These data provide an example of how a combination of experimental studies and data-mining approaches can be applied to prioritize drug safety signals for further investigation.

View details for DOI 10.1371/journal.pone.0124653

View details for PubMedID 26061035

View details for PubMedCentralID PMC4462578
A formal concept analysis and semantic query expansion cooperation to refine health outcomes of interest. BMC medical informatics and decision making Curé, O. C., Maurer, H., Shah, N. H., Le Pendu, P. 2015; 15: S8

View details for DOI 10.1186/1472-6947-15-S1-S8

View details for PubMedID 26043839
Analyzing search behavior of healthcare professionals for drug safety surveillance. Pacific Symposium on Biocomputing Odgers, D. J., Harpaz, R., Callahan, A., Stiglic, G., Shah, N. H. 2015; 20: 306-317

Abstract

Post-market drug safety surveillance is hugely important and is a significant challenge despite the existence of adverse event (AE) reporting systems. Here we describe a preliminary analysis of search logs from healthcare professionals as a source for detecting adverse drug events. We annotate search log query terms with biomedical terminologies for drugs and events, and then perform a statistical analysis to identify associations among drugs and events within search sessions. We evaluate our approach using two different types of reference standards consisting of known adverse drug events (ADEs) and negative controls. Our approach achieves a discrimination accuracy of 0.85 in terms of the area under the receiver operating characteristic curve (AUC) for the reference set of well-established ADEs and an AUC of 0.68 for the reference set of recently labeled ADEs. We also find that the majority of associations in the reference sets have support in the search log data. Despite these promising results, additional research is required to better understand users' search behavior, biasing factors, and the overall utility of analyzing healthcare professional search logs for drug safety surveillance.

View details for PubMedID 25592591
Functional evaluation of out-of-the-box text-mining tools for data-mining tasks. Journal of the American Medical Informatics Association Jung, K., LePendu, P., Iyer, S., Bauer-Mehren, A., Percha, B., Shah, N. H. 2015; 22 (1): 121-131

Abstract

The trade-off between the speed and simplicity of dictionary-based term recognition and the richer linguistic information provided by more advanced natural language processing (NLP) is an area of active discussion in clinical informatics. In this paper, we quantify this trade-off among text processing systems that make different trade-offs between speed and linguistic understanding. We tested both types of systems in three clinical research tasks: phase IV safety profiling of a drug, learning adverse drug-drug interactions, and learning used-to-treat relationships between drugs and indications. We first benchmarked the accuracy of the NCBO Annotator and REVEAL in a manually annotated, publicly available dataset from the 2008 i2b2 Obesity Challenge. We then applied the NCBO Annotator and REVEAL to 9 million clinical notes from the Stanford Translational Research Integrated Database Environment (STRIDE) and used the resulting data for three research tasks. There is no significant difference between using the NCBO Annotator and REVEAL in the results of the three research tasks when using large datasets. In one subtask, REVEAL achieved higher sensitivity with smaller datasets. For a variety of tasks, employing simple term recognition methods instead of advanced NLP methods results in little or no impact on accuracy when using large datasets. Simpler dictionary-based methods have the advantage of scaling well to very large datasets. Promoting the use of simple, dictionary-based methods for population level analyses can advance adoption of NLP in practice.

View details for DOI 10.1136/amiajnl-2014-002902

View details for PubMedID 25336595
A time-indexed reference standard of adverse drug reactions. Scientific data Harpaz, R., Odgers, D., Gaskin, G., DuMouchel, W., Winnenburg, R., Bodenreider, O., Ripple, A., Szarfman, A., Sorbello, A., Horvitz, E., White, R. W., Shah, N. H. 2014; 1: 140043

Abstract

Undetected adverse drug reactions (ADRs) pose a major burden on the health system. Data mining methodologies designed to identify signals of novel ADRs are of deep importance for drug safety surveillance. The development and evaluation of these methodologies requires proper reference benchmarks. While progress has recently been made in developing such benchmarks, our understanding of the performance characteristics of the data mining methodologies is limited because existing benchmarks do not support prospective performance evaluations. We address this shortcoming by providing a reference standard to support prospective performance evaluations. The reference standard was systematically curated from drug labeling revisions, such as new warnings, which were issued and communicated by the US Food and Drug Administration in 2013. The reference standard includes 62 positive test cases and 75 negative controls, and covers 44 drugs and 38 events. We provide usage guidance and empirical support for the reference standard by applying it to analyze two data sources commonly mined for drug safety surveillance.

View details for DOI 10.1038/sdata.2014.43

View details for PubMedID 25632348

View details for PubMedCentralID PMC4306188
Toward personalizing treatment for depression: predicting diagnosis and severity JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION Huang, S. H., LePendu, P., Iyer, S. V., Tai-Seale, M., Carrell, D., Shah, N. H. 2014; 21 (6): 1069-1075

Abstract

Depression is a prevalent disorder difficult to diagnose and treat. In particular, depressed patients exhibit largely unpredictable responses to treatment. Toward the goal of personalizing treatment for depression, we develop and evaluate computational models that use electronic health record (EHR) data for predicting the diagnosis and severity of depression, and response to treatment. We develop regression-based models for predicting depression, its severity, and response to treatment from EHR data, using structured diagnosis and medication codes as well as free-text clinical reports. We used two datasets: 35,000 patients (5000 depressed) from the Palo Alto Medical Foundation and 5651 patients treated for depression from the Group Health Research Institute. Our models are able to predict a future diagnosis of depression up to 12 months in advance (area under the receiver operating characteristic curve (AUC) 0.70-0.80). We can differentiate patients with severe baseline depression from those with minimal or mild baseline depression (AUC 0.72). Baseline depression severity was the strongest predictor of treatment response for medication and psychotherapy. It is possible to use EHR data to predict a diagnosis of depression up to 12 months in advance and to differentiate between extreme baseline levels of depression. The models use commonly available data on diagnosis, medication, and clinical progress notes, making them easily portable. The ability to automatically determine severity can facilitate assembly of large patient cohorts with similar severity from multiple sites, which may enable elucidation of the moderators of treatment response in the future.

View details for DOI 10.1136/amiajnl-2014-002733

View details for Web of Science ID 000343776700019

View details for PubMedCentralID PMC4215055
Toward personalizing treatment for depression: predicting diagnosis and severity. Journal of the American Medical Informatics Association Huang, S. H., LePendu, P., Iyer, S. V., Tai-Seale, M., Carrell, D., Shah, N. H. 2014; 21 (6): 1069-1075

View details for DOI 10.1136/amiajnl-2014-002733

View details for PubMedID 24988898

View details for PubMedCentralID PMC4215055
Repurposing cAMP-Modulating Medications to Promote beta-Cell Replication MOLECULAR ENDOCRINOLOGY Zhao, Z., Low, Y. S., Armstrong, N. A., Ryu, J. H., Sun, S. A., Arvanites, A. C., Hollister-Lock, J., Shah, N. H., Weir, G. C., Annes, J. P. 2014; 28 (10): 1682-1697

Abstract

Loss of β-cell mass is a cardinal feature of diabetes. Consequently, developing medications to promote β-cell regeneration is a priority. 3'-5'-Cyclic adenosine monophosphate (cAMP) is an intracellular second messenger that modulates β-cell replication. We investigated whether medications that increase cAMP stability or synthesis selectively stimulate β-cell growth. To identify cAMP stabilizing medications that promote β-cell replication we performed high-content screening of a phosphodiesterase-inhibitor (PDE-I) library. PDE3, 4, and 10 inhibitors, including dipyridamole, were found to promote β-cell replication in an adenosine receptor-dependent manner. Dipyridamole's action is specific for β-cells and not α-cells. Next we demonstrated that norepinephrine (NE), a physiologic suppressor of cAMP synthesis in β-cells, impairs β-cell replication via activation of α2-adrenergic receptors. Accordingly, mirtazapine, an α2-adrenergic receptor antagonist and antidepressant, prevents NE-dependent suppression of β-cell replication. Interestingly, NE's growth-suppressive effect is modulated by endogenously expressed catecholamine-inactivating enzymes (COMT and MAO) and is dominant over the growth-promoting effects of PDE-Is. Treatment with dipyridamole and/or mirtazapine promotes β-cell replication in mice and treatment with dipyridamole is associated with reduced glucose levels in humans. This work provides new mechanistic insights into cAMP-dependent growth regulation of β-cells and highlights the potential of commonly prescribed medications to influence β-cell growth.

View details for DOI 10.1210/me.2014-1120

View details for Web of Science ID 000346837000010

View details for PubMedID 25083741
Text mining for adverse drug events: the promise, challenges, and state of the art. Drug safety Harpaz, R., Callahan, A., Tamang, S., Low, Y., Odgers, D., Finlayson, S., Jung, K., LePendu, P., Shah, N. H. 2014; 37 (10): 777-790

Abstract

Text mining is the computational process of extracting meaningful information from large amounts of unstructured text. It is emerging as a tool to leverage underutilized data sources that can improve pharmacovigilance, including the objective of adverse drug event (ADE) detection and assessment. This article provides an overview of recent advances in pharmacovigilance driven by the application of text mining, and discusses several data sources (such as biomedical literature, clinical narratives, product labeling, social media, and Web search logs) that are amenable to text mining for pharmacovigilance. Given the state of the art, it appears text mining can be applied to extract useful ADE-related information from multiple textual sources. Nonetheless, further research is required to address remaining technical challenges associated with the text mining methodologies, and to conclusively determine the relative contribution of each textual source to improving pharmacovigilance.

View details for DOI 10.1007/s40264-014-0218-z

View details for PubMedID 25151493

View details for PubMedCentralID PMC4217510
Text Mining for Adverse Drug Events: the Promise, Challenges, and State of the Art DRUG SAFETY Harpaz, R., Callahan, A., Tamang, S., Low, Y., Odgers, D., Finlayson, S., Jung, K., LePendu, P., Shah, N. H. 2014; 37 (10): 777-790

View details for DOI 10.1007/s40264-014-0218-z

View details for Web of Science ID 000344615300005

View details for PubMedCentralID PMC4217510
Toward Enhanced Pharmacovigilance Using Patient-Generated Data on the Internet CLINICAL PHARMACOLOGY & THERAPEUTICS White, R. W., Harpaz, R., Shah, N. H., DuMouchel, W., Horvitz, E. 2014; 96 (2): 239-246

Abstract

The promise of augmenting pharmacovigilance with patient-generated data drawn from the Internet was called out by a scientific committee charged with conducting a review of the current and planned pharmacovigilance practices of the US Food and Drug Administration (FDA). To this end, we present a study on harnessing behavioral data drawn from Internet search logs to detect adverse drug reactions (ADRs). By analyzing search queries collected from 80 million consenting users and by using a widely recognized benchmark of ADRs, we found that the performance of ADR detection via search logs is comparable and complementary to detection based on the FDA's adverse event reporting system (AERS). We show that by jointly leveraging data from the AERS and search logs, the accuracy of ADR detection can be improved by 19% relative to the use of each data source independently. The results suggest that leveraging nontraditional sources such as online search logs could supplement existing pharmacovigilance approaches.

View details for DOI 10.1038/clpt.2014.77

View details for Web of Science ID 000339602900035

View details for PubMedID 24713590
A 'green button' for using aggregate patient data at the point of care. Health affairs Longhurst, C. A., Harrington, R. A., Shah, N. H. 2014; 33 (7): 1229-1235

Abstract

Randomized controlled trials have traditionally been the gold standard against which all other sources of clinical evidence are measured. However, the cost of conducting these trials can be prohibitive. In addition, evidence from the trials frequently rests on narrow patient-inclusion criteria and thus may not generalize well to real clinical situations. Given the increasing availability of comprehensive clinical data in electronic health records (EHRs), some health system leaders are now advocating for a shift away from traditional trials and toward large-scale retrospective studies, which can use practice-based evidence that is generated as a by-product of clinical processes. Other thought leaders in clinical research suggest that EHRs should be used to lower the cost of trials by integrating point-of-care randomization and data capture into clinical processes. We believe that a successful learning health care system will require both approaches, and we suggest a model that resolves this escalating tension: a "green button" function within EHRs to help clinicians leverage aggregate patient data for decision making at the point of care. Giving clinicians such a tool would support patient care decisions in the absence of gold-standard evidence and would help prioritize clinical questions for which EHR-enabled randomization should be carried out. The privacy rule in the Health Insurance Portability and Accountability Act (HIPAA) of 1996 may require revision to support this novel use of patient data.

View details for DOI 10.1377/hlthaff.2014.0099

View details for PubMedID 25006150
Selected papers from the 16th Annual Bio-Ontologies Special Interest Group Meeting JOURNAL OF BIOMEDICAL SEMANTICS Soldatova, L. N., Rocca-Serra, P., Dumontier, M., Shah, N. H. 2014; 5

View details for DOI 10.1186/2041-1480-5-S1-I1

View details for Web of Science ID 000345687700001
Mining the internet for drug information. Clinical advances in hematology & oncology : H&O Shah, N. 2014; 12 (6): 391-393

View details for PubMedID 25003570
Measurement of urinary incontinence after prostate surgery from data-mining electronic health records (EHR). Hernandez-Boussard, T., Tamang, S., Brooks, J. D., Blayney, D. W., Shah, N. AMER SOC CLINICAL ONCOLOGY. 2014

View details for Web of Science ID 000358613203796
Response to letters regarding article, "unexpected effect of proton pump inhibitors: elevation of the cardiovascular risk factor asymmetric dimethylarginine". Circulation Ghebremariam, Y. T., Lee, J. C., LePendu, P., Erlanson, D. A., Slaviero, A., Shah, N. H., Leiper, J. M., Cooke, J. P. 2014; 129 (13)

View details for DOI 10.1161/CIRCULATIONAHA.114.009343

View details for PubMedID 24687654
Mining clinical text for signals of adverse drug-drug interactions. Journal of the American Medical Informatics Association Iyer, S. V., Harpaz, R., LePendu, P., Bauer-Mehren, A., Shah, N. H. 2014; 21 (2): 353-362

Abstract

Electronic health records (EHRs) are increasingly being used to complement the FDA Adverse Event Reporting System (FAERS) and to enable active pharmacovigilance. Over 30% of all adverse drug reactions are caused by drug-drug interactions (DDIs) and result in significant morbidity every year, making their early identification vital. We present an approach for identifying DDI signals directly from the textual portion of EHRs. We recognize mentions of drug and event concepts from over 50 million clinical notes from two sites to create a timeline of concept mentions for each patient. We then use adjusted disproportionality ratios to identify significant drug-drug-event associations among 1165 drugs and 14 adverse events. To validate our results, we evaluate our performance on a gold standard of 1698 DDIs curated from existing knowledge bases, as well as with signaling DDI associations directly from FAERS using established methods. Our method achieves good performance, as measured by our gold standard (area under the receiver operator characteristic (ROC) curve >80%), on two independent EHR datasets and the performance is comparable to that of signaling DDIs from FAERS. We demonstrate the utility of our method for early detection of DDIs and for identifying alternatives for risky drug combinations. Finally, we publish a first-of-its-kind database of population event rates among patients on drug combinations based on an EHR corpus. It is feasible to identify DDI signals and estimate the rate of adverse events among patients on drug combinations, directly from clinical text; this could have utility in prioritizing drug interaction surveillance as well as in clinical decision support.

View details for DOI 10.1136/amiajnl-2013-001612

View details for PubMedID 24158091

View details for PubMedCentralID PMC3932451
Automated detection of off-label drug use. PloS one Jung, K., LePendu, P., Chen, W. S., Iyer, S. V., Readhead, B., Dudley, J. T., Shah, N. H. 2014; 9 (2)

Abstract

Off-label drug use, defined as use of a drug in a manner that deviates from its approved use defined by the drug's FDA label, is problematic because such uses have not been evaluated for safety and efficacy. Studies estimate that 21% of prescriptions are off-label, and only 27% of those have evidence of safety and efficacy. We describe a data-mining approach for systematically identifying off-label usages using features derived from free text clinical notes and features extracted from two databases on known usage (Medi-Span and DrugBank). We trained a highly accurate predictive model that detects novel off-label uses among 1,602 unique drugs and 1,472 unique indications. We validated 403 predicted uses across independent data sources. Finally, we prioritize well-supported novel usages for further investigation on the basis of drug safety and cost.

View details for DOI 10.1371/journal.pone.0089324

View details for PubMedID 24586689

View details for PubMedCentralID PMC3929699
TEXT AND DATA MINING FOR BIOMEDICAL DISCOVERY Gonzalez, G., Cohen, K., Leaman, R., Greene, C. S., Shah, N., Kann, M. G., Ye, J. edited by Altman, R. B., Dunker, A. K., Hunter, L., Ritchie, M. D., Murray, T., Klein, T. E. WORLD SCIENTIFIC PUBL CO PTE LTD. 2014: 312-315

View details for Web of Science ID 000461865200029
Medicine in the Age of Electronic Health Records Shah, N., ACM ASSOC COMPUTING MACHINERY. 2014: 1518

View details for DOI 10.1145/2623330.2630822

View details for Web of Science ID 000668155900156
Finding Progression Stages in Time-evolving Event Sequences Yang, J., McAuley, J., Leskovec, J., LePendu, P., Shah, N. ASSOC COMPUTING MACHINERY. 2014: 783-793

View details for DOI 10.1145/2566486.2568044

View details for Web of Science ID 000455945100071
Building the graph of medicine from millions of clinical narratives SCIENTIFIC DATA Finlayson, S. G., LePendu, P., Shah, N. H. 2014; 1

Abstract

Electronic health records (EHR) represent a rich and relatively untapped resource for characterizing the true nature of clinical practice and for quantifying the degree of inter-relatedness of medical entities such as drugs, diseases, procedures and devices. We provide a unique set of co-occurrence matrices, quantifying the pairwise mentions of 3 million terms mapped onto 1 million clinical concepts, calculated from the raw text of 20 million clinical notes spanning 19 years of data. Co-frequencies were computed by means of a parallelized annotation, hashing, and counting pipeline that was applied over clinical notes from Stanford Hospitals and Clinics. The co-occurrence matrix quantifies the relatedness among medical concepts which can serve as the basis for many statistical tests, and can be used to directly compute Bayesian conditional probabilities, association rules, as well as a range of test statistics such as relative risks and odds ratios. This dataset can be leveraged to quantitatively assess comorbidity, drug-drug, and drug-disease patterns for a range of clinical, epidemiological, and financial applications.

View details for DOI 10.1038/sdata.2014.32

View details for Web of Science ID 000209843500026

View details for PubMedCentralID PMC4322575
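
The abstract above notes that co-occurrence counts can be used to compute conditional probabilities, relative risks and odds ratios. A minimal sketch of that arithmetic follows, assuming four counts are available for a pair of concepts A and B. The function name, argument names and the example counts are hypothetical illustrations and are not taken from the published dataset or its pipeline; the confidence interval uses the standard Wald approximation on the log odds ratio.

```python
import math


def association_stats(n_ab, n_a, n_b, n_total):
    """Derive simple association statistics from co-occurrence counts.

    n_ab    -- units (e.g. patients) mentioning both concept A and concept B
    n_a     -- units mentioning concept A
    n_b     -- units mentioning concept B
    n_total -- all units considered
    (All names are hypothetical; they only illustrate the 2x2 arithmetic.)
    """
    # Cells of the 2x2 contingency table.
    a = n_ab                          # A and B
    b = n_a - n_ab                    # A without B
    c = n_b - n_ab                    # B without A
    d = n_total - n_a - n_b + n_ab    # neither A nor B
    if min(a, b, c, d) <= 0:
        raise ValueError("every cell of the 2x2 table must be positive")

    p_b_given_a = a / (a + b)           # P(B | A)
    p_b_given_not_a = c / (c + d)       # P(B | not A)
    relative_risk = p_b_given_a / p_b_given_not_a
    odds_ratio = (a * d) / (b * c)

    # Wald 95% confidence interval on the log odds ratio.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci_low = math.exp(math.log(odds_ratio) - 1.96 * se)
    ci_high = math.exp(math.log(odds_ratio) + 1.96 * se)

    return {
        "p_b_given_a": p_b_given_a,
        "relative_risk": relative_risk,
        "odds_ratio": odds_ratio,
        "ci95": (ci_low, ci_high),
    }


# Made-up counts, for illustration only.
stats = association_stats(n_ab=30, n_a=100, n_b=130, n_total=1000)
print(stats)
```

Statistics of this kind are crude and unadjusted; they describe co-mention patterns, not causal effects.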
A time-indexed reference standard of adverse drug reactions SCIENTIFIC DATA Harpaz, R., Odgers, D., Gaskin, G., DuMouchel, W., Winnenburg, R., Bodenreider, O., Ripple, A., Szarfman, A., Sorbello, A., Horvitz, E., White, R. W., Shah, N. H. 2014; 1

Abstract

Undetected adverse drug reactions (ADRs) pose a major burden on the health system. Data mining methodologies designed to identify signals of novel ADRs are of deep importance for drug safety surveillance. The development and evaluation of these methodologies requires proper reference benchmarks. While progress has recently been made in developing such benchmarks, our understanding of the performance characteristics of the data mining methodologies is limited because existing benchmarks do not support prospective performance evaluations. We address this shortcoming by providing a reference standard to support prospective performance evaluations. The reference standard was systematically curated from drug labeling revisions, such as new warnings, which were issued and communicated by the US Food and Drug Administration in 2013. The reference standard includes 62 positive test cases and 75 negative controls, and covers 44 drugs and 38 events. We provide usage guidance and empirical support for the reference standard by applying it to analyze two data sources commonly mined for drug safety surveillance.

View details for DOI 10.1038/sdata.2014.43

View details for Web of Science ID 000209843500039

View details for PubMedCentralID PMC4306188
Profiling risk factors for chronic uveitis in juvenile idiopathic arthritis: a new model for EHR-based research PEDIATRIC RHEUMATOLOGY Cole, T. S., Frankovich, J., Iyer, S., LePendu, P., Bauer-Mehren, A., Shah, N. H. 2013; 11

Abstract

Juvenile idiopathic arthritis is the most common rheumatic disease in children. Chronic uveitis is a common and serious comorbid condition of juvenile idiopathic arthritis, with insidious presentation and potential to cause blindness. Knowledge of clinical associations will improve risk stratification. Based on clinical observation, we hypothesized that allergic conditions are associated with chronic uveitis in juvenile idiopathic arthritis patients. This study is a retrospective cohort study using Stanford's clinical data warehouse containing data from Lucile Packard Children's Hospital from 2000-2011 to analyze patient characteristics associated with chronic uveitis in a large juvenile idiopathic arthritis cohort. Clinical notes in patients under 16 years of age were processed via a validated text analytics pipeline. Bivariate-associated variables were used in a multivariate logistic regression adjusted for age, gender, and race. Previously reported associations were evaluated to validate our methods. The main outcome measure was presence of terms indicating allergy or allergy medications use overrepresented in juvenile idiopathic arthritis patients with chronic uveitis. Residual text features were then used in unsupervised hierarchical clustering to compare clinical text similarity between patients with and without uveitis. Previously reported associations with uveitis in juvenile idiopathic arthritis patients (earlier age at arthritis diagnosis, oligoarticular-onset disease, antinuclear antibody status, history of psoriasis) were reproduced in our study. Use of allergy medications and terms describing allergic conditions were independently associated with chronic uveitis. The association with allergy drugs when adjusted for known associations remained significant (OR 2.54, 95% CI 1.22-5.4). This study shows the potential of using a validated text analytics pipeline on clinical data warehouses to examine practice-based evidence for evaluating hypotheses formed during patient care. Our study reproduces four known associations with uveitis development in juvenile idiopathic arthritis patients, and reports a new association between allergic conditions and chronic uveitis in juvenile idiopathic arthritis patients.

View details for DOI 10.1186/1546-0096-11-45

View details for Web of Science ID 000328822300001

View details for PubMedID 24299016
Mining the ultimate phenome repository NATURE BIOTECHNOLOGY Shah, N. H. 2013; 31 (12): 1095-1097

View details for DOI 10.1038/nbt.2757

View details for Web of Science ID 000328251900020

View details for PubMedID 24316646
Identifying phenotypic signatures of neuropsychiatric disorders from electronic medical records. Journal of the American Medical Informatics Association Lyalina, S., Percha, B., LePendu, P., Iyer, S. V., Altman, R. B., Shah, N. H. 2013; 20 (e2): e297-305

Abstract

Mental illness is the leading cause of disability in the USA, but boundaries between different mental illnesses are notoriously difficult to define. Electronic medical records (EMRs) have recently emerged as a powerful new source of information for defining the phenotypic signatures of specific diseases. We investigated how EMR-based text mining and statistical analysis could elucidate the phenotypic boundaries of three important neuropsychiatric illnesses: autism, bipolar disorder, and schizophrenia. We analyzed the medical records of over 7000 patients at two facilities using an automated text-processing pipeline to annotate the clinical notes with Unified Medical Language System codes and then searching for enriched codes, and associations among codes, that were representative of the three disorders. We used dimensionality-reduction techniques on individual patient records to understand individual-level phenotypic variation within each disorder, as well as the degree of overlap among disorders. We demonstrate that automated EMR mining can be used to extract relevant drugs and phenotypes associated with neuropsychiatric disorders and characteristic patterns of associations among them. Patient-level analyses suggest a clear separation between autism and the other disorders, while revealing significant overlap between schizophrenia and bipolar disorder. They also enable localization of individual patients within the phenotypic 'landscape' of each disorder. Because EMRs reflect the realities of patient care rather than idealized conceptualizations of disease states, we argue that automated EMR mining can help define the boundaries between different mental illnesses, facilitate cohort building for clinical and genomic studies, and reveal how clear expert-defined disease boundaries are in practice.

View details for DOI 10.1136/amiajnl-2013-001933

View details for PubMedID 23956017

View details for PubMedCentralID PMC3861917
A Nondegenerate Code of Deleterious Variants in Mendelian Loci Contributes to Complex Disease Risk CELL Blair, D. R., Lyttle, C. S., Mortensen, J. M., Bearden, C. F., Jensen, A. B., Khiabanian, H., Melamed, R., Rabadan, R., Bernstam, E. V., Brunak, S., Jensen, L. J., Nicolae, D., Shah, N. H., Grossman, R. L., Cox, N. J., White, K. P., Rzhetsky, A. 2013; 155 (1): 70-80

Abstract

Although countless highly penetrant variants have been associated with Mendelian disorders, the genetic etiologies underlying complex diseases remain largely unresolved. By mining the medical records of over 110 million patients, we examine the extent to which Mendelian variation contributes to complex disease risk. We detect thousands of associations between Mendelian and complex diseases, revealing a nondegenerate, phenotypic code that links each complex disorder to a unique collection of Mendelian loci. Using genome-wide association results, we demonstrate that common variants associated with complex diseases are enriched in the genes indicated by this "Mendelian code." Finally, we detect hundreds of comorbidity associations among Mendelian disorders, and we use probabilistic genetic modeling to demonstrate that Mendelian variants likely contribute nonadditively to the risk for a subset of complex diseases. Overall, this study illustrates a complementary approach for mapping complex disease loci and provides unique predictions concerning the etiologies of specific diseases.

View details for DOI 10.1016/j.cell.2013.08.030

View details for Web of Science ID 000324916700010

View details for PubMedID 24074861
Response to "Logistic regression in signal detection: another piece added to the puzzle". Clinical pharmacology & therapeutics Harpaz, R., DuMouchel, W., LePendu, P., Bauer-Mehren, A., Ryan, P., Shah, N. H. 2013; 94 (3): 313

View details for DOI 10.1038/clpt.2013.125

View details for PubMedID 23756371
Unexpected effect of proton pump inhibitors: elevation of the cardiovascular risk factor asymmetric dimethylarginine. Circulation Ghebremariam, Y. T., LePendu, P., Lee, J. C., Erlanson, D. A., Slaviero, A., Shah, N. H., Leiper, J., Cooke, J. P. 2013; 128 (8): 845-853

Abstract

Proton pump inhibitors (PPIs) are gastric acid-suppressing agents widely prescribed for the treatment of gastroesophageal reflux disease. Recently, several studies in patients with acute coronary syndrome have raised the concern that use of PPIs in these patients may increase their risk of major adverse cardiovascular events. The mechanism of this possible adverse effect is not known. Whether the general population might also be at risk has not been addressed. Plasma asymmetrical dimethylarginine (ADMA) is an endogenous inhibitor of nitric oxide synthase. Elevated plasma ADMA is associated with increased risk for cardiovascular disease, likely because of its attenuation of the vasoprotective effects of endothelial nitric oxide synthase. We find that PPIs elevate plasma ADMA levels and reduce nitric oxide levels and endothelium-dependent vasodilation in a murine model and ex vivo human tissues. PPIs increase ADMA because they bind to and inhibit dimethylarginine dimethylaminohydrolase, the enzyme that degrades ADMA. We present a plausible biological mechanism to explain the association of PPIs with increased major adverse cardiovascular events in patients with unstable coronary syndromes. Of concern, this adverse mechanism is also likely to extend to the general population using PPIs. This finding compels additional clinical investigations and pharmacovigilance directed toward understanding the cardiovascular risk associated with the use of the PPIs in the general population.

View details for DOI 10.1161/CIRCULATIONAHA.113.003602

View details for PubMedID 23825361
Performance of pharmacovigilance signal-detection algorithms for the FDA adverse event reporting system. Clinical pharmacology & therapeutics Harpaz, R., DuMouchel, W., LePendu, P., Bauer-Mehren, A., Ryan, P., Shah, N. H. 2013; 93 (6): 539-546

Abstract

Signal-detection algorithms (SDAs) are recognized as vital tools in pharmacovigilance. However, their performance characteristics are generally unknown. By leveraging a unique gold standard recently made public by the Observational Medical Outcomes Partnership (OMOP) and by conducting a unique systematic evaluation, we provide new insights into the diagnostic potential and characteristics of SDAs that are routinely applied to the US Food and Drug Administration (FDA) Adverse Event Reporting System (AERS). We find that SDAs can attain reasonable predictive accuracy in signaling adverse events. Two performance classes emerge, indicating that the class of approaches that address confounding and masking effects benefits safety surveillance. Our study shows that not all events are equally detectable, suggesting that specific events might be monitored more effectively using other data sources. We provide performance guidelines for several operating scenarios to inform the trade-off between sensitivity and specificity for specific use cases. We also propose an approach and demonstrate its application in identifying optimal signaling thresholds, given specific misclassification tolerances.

View details for DOI 10.1038/clpt.2013.24

View details for PubMedID 23571771
Pharmacovigilance using clinical notes. Clinical pharmacology & therapeutics LePendu, P., Iyer, S. V., Bauer-Mehren, A., Harpaz, R., Mortensen, J. M., Podchiyska, T., Ferris, T. A., Shah, N. H. 2013; 93 (6): 547-555

Abstract

With increasing adoption of electronic health records (EHRs), there is an opportunity to use the free-text portion of EHRs for pharmacovigilance. We present novel methods that annotate the unstructured clinical notes and transform them into a deidentified patient-feature matrix encoded using medical terminologies. We demonstrate the use of the resulting high-throughput data for detecting drug-adverse event associations and adverse events associated with drug-drug interactions. We show that these methods flag adverse events early (in most cases before an official alert), allow filtering of spurious signals by adjusting for potential confounding, and compile prevalence information. We argue that analyzing large volumes of free-text clinical notes enables drug safety surveillance using a yet untapped data source. Such data mining can be used for hypothesis generation and for rapid analysis of suspected adverse event risk.

View details for DOI 10.1038/clpt.2013.47

View details for PubMedID 23571773
Combing signals from spontaneous reports and electronic health records for detection of adverse drug reactions JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION Harpaz, R., Vilar, S., DuMouchel, W., Salmasian, H., Haerian, K., Shah, N. H., Chase, H. S., Friedman, C. 2013; 20 (3): 413-419

Abstract

Data-mining algorithms that can produce accurate signals of potentially novel adverse drug reactions (ADRs) are a central component of pharmacovigilance. We propose a signal-detection strategy that combines the adverse event reporting system (AERS) of the Food and Drug Administration and electronic health records (EHRs) by requiring signaling in both sources. We claim that this approach leads to improved accuracy of signal detection when the goal is to produce a highly selective ranked set of candidate ADRs. Our investigation was based on over 4 million AERS reports and information extracted from 1.2 million EHR narratives. Well-established methodologies were used to generate signals from each source. The study focused on ADRs related to three high-profile serious adverse reactions. A reference standard of over 600 established and plausible ADRs was created and used to evaluate the proposed approach against a comparator. The combined signaling system achieved a statistically significant large improvement over AERS (baseline) in the precision of top ranked signals. The average improvement ranged from 31% to almost threefold for different evaluation categories. Using this system, we identified a new association between the agent, rasburicase, and the adverse event, acute pancreatitis, which was supported by clinical review. The results provide promising initial evidence that combining AERS with EHRs via the framework of replicated signaling can improve the accuracy of signal detection for certain operating scenarios. The use of additional EHR data is required to further evaluate the capacity and limits of this system and to extend the generalizability of these results.

View details for DOI 10.1136/amiajnl-2012-000930

View details for Web of Science ID 000317477500003

View details for PubMedID 23118093

View details for PubMedCentralID PMC3628045
Web-scale pharmacovigilance: listening to signals from the crowd JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION White, R. W., Tatonetti, N. P., Shah, N. H., Altman, R. B., Horvitz, E. 2013; 20 (3): 404-408

Abstract

Adverse drug events cause substantial morbidity and mortality and are often discovered after a drug comes to market. We hypothesized that Internet users may provide early clues about adverse drug events via their online information-seeking. We conducted a large-scale study of Web search log data gathered during 2010. We pay particular attention to the specific drug pairing of paroxetine and pravastatin, whose interaction was reported to cause hyperglycemia after the time period of the online logs used in the analysis. We also examine sets of drug pairs known to be associated with hyperglycemia and those not associated with hyperglycemia. We find that anonymized signals on drug interactions can be mined from search logs. Compared to analyses of other sources such as electronic health records (EHR), logs are inexpensive to collect and mine. The results demonstrate that logs of the search activities of populations of computer users can contribute to drug safety surveillance.

View details for DOI 10.1136/amiajnl-2012-001482

View details for Web of Science ID 000317477500001

View details for PubMedID 23467469

View details for PubMedCentralID PMC3628066
Selected papers from the 15th Annual Bio-Ontologies Special Interest Group Meeting. Journal of biomedical semantics Soldatova, L. N., Sansone, S., Dumontier, M., Shah, N. H. 2013; 4: I1

Abstract

Over the past 15 years, the Bio-Ontologies SIG at ISMB has provided a forum for discussion of the latest and most innovative research in bio-ontologies development, its applications to biomedicine and, more generally, the organisation, presentation and dissemination of knowledge in biomedicine and the life sciences. The seven papers and the commentary selected for this supplement span a wide range of topics including: web-based querying over multiple ontologies, integration of data, annotating patent records, NCBO Web services, ontology developments for probabilistic reasoning and for physiological processes, and analysis of the progress of annotation and structural GO changes.

View details for DOI 10.1186/2041-1480-4-S1-I1

View details for PubMedID 23735191
STOP using just GO: a multi-ontology hypothesis generation tool for high throughput experimentation BMC BIOINFORMATICS Wittkop, T., Teravest, E., Evani, U. S., Fleisch, K. M., Berman, A. E., Powell, C., Shah, N. H., Mooney, S. D. 2013; 14

Abstract

Gene Ontology (GO) enrichment analysis remains one of the most common methods for hypothesis generation from high throughput datasets. However, we believe that researchers strive to test other hypotheses that fall outside of GO. Here, we developed and evaluated a tool for hypothesis generation from gene or protein lists using ontological concepts present in manually curated text that describes those genes and proteins. As a consequence we have developed the method Statistical Tracking of Ontological Phrases (STOP) that expands the realm of testable hypotheses in gene set enrichment analyses by integrating automated annotations of genes to terms from over 200 biomedical ontologies. While not as precise as manually curated terms, we find that the additional enriched concepts have value when coupled with traditional enrichment analyses using curated terms. Multiple ontologies have been developed for gene and protein annotation; by using a dataset of both manually curated GO terms and automatically recognized concepts from curated text we can expand the realm of hypotheses that can be discovered. The web application STOP is available at http://mooneygroup.org/stop/.

View details for DOI 10.1186/1471-2105-14-53

View details for Web of Science ID 000318030400001

View details for PubMedID 23409969

View details for PubMedCentralID PMC3635999
Practice-based evidence: profiling the safety of cilostazol by text-mining of clinical notes. PloS one Leeper, N. J., Bauer-Mehren, A., Iyer, S. V., LePendu, P., Olson, C., Shah, N. H. 2013; 8 (5)

Abstract

Peripheral arterial disease (PAD) is a growing problem with few available therapies. Cilostazol is the only FDA-approved medication with a class I indication for intermittent claudication, but carries a black box warning due to concerns for increased cardiovascular mortality. To assess the validity of this black box warning, we employed a novel text-analytics pipeline to quantify the adverse events associated with Cilostazol use in a clinical setting, including patients with congestive heart failure (CHF). We analyzed the electronic medical records of 1.8 million subjects from the Stanford clinical data warehouse spanning 18 years using a novel text-mining/statistical analytics pipeline. We identified 232 PAD patients taking Cilostazol and created a control group of 1,160 PAD patients not taking this drug using 1:5 propensity-score matching. Over a mean follow up of 4.2 years, we observed no association between Cilostazol use and any major adverse cardiovascular event including stroke (OR = 1.13, CI [0.82, 1.55]), myocardial infarction (OR = 1.00, CI [0.71, 1.39]), or death (OR = 0.86, CI [0.63, 1.18]). Cilostazol was not associated with an increase in any arrhythmic complication. We also identified a subset of CHF patients who were prescribed Cilostazol despite its black box warning, and found that it did not increase mortality in this high-risk group of patients. This proof of principle study shows the potential of text-analytics to mine clinical data warehouses to uncover 'natural experiments' such as the use of Cilostazol in CHF patients. We envision this method will have broad applications for examining difficult to test clinical hypotheses and to aid in post-marketing drug safety surveillance. Moreover, our observations argue for a prospective study to examine the validity of a drug safety warning that may be unnecessarily limiting the use of an efficacious therapy.

View details for DOI 10.1371/journal.pone.0063499

View details for PubMedID 23717437

View details for PubMedCentralID PMC3662653
Empirical Bayes Model to Combine Signals of Adverse Drug Reactions Harpaz, R., DuMouchel, W., LePendu, P., Shah, N. H., ACM ASSOC COMPUTING MACHINERY. 2013: 1339-1347

View details for Web of Science ID 000502730600160
Mining Biomedical Ontologies and Data Using RDF Hypergraphs 12th International Conference on Machine Learning and Applications (ICMLA) Liu, H., Dou, D., Jin, R., LePendu, P., Shah, N. IEEE. 2013: 141–146

View details for Web of Science ID 000353637800023
Automated Detection of Systematic Off-label Drug Use in Free Text of Electronic Medical Records. AMIA Summits on Translational Science proceedings AMIA Summit on Translational Science Jung, K., LePendu, P., Shah, N. 2013; 2013: 94-98

Abstract

Off-label use of a drug occurs when it is used in a manner that deviates from its FDA label. Studies estimate that 21% of prescriptions are off-label, with only 27% of those uses supported by evidence of safety and efficacy. We have developed methods to detect population level off-label usage using computationally efficient annotation of free text from clinical notes to generate features encoding empirical information about drug-disease mentions. By including additional features encoding prior knowledge about drugs, diseases, and known usage, we trained a highly accurate predictive model that was used to detect novel candidate off-label usages in a very large clinical corpus. We show that the candidate uses are plausible and can be prioritized for further analysis in terms of safety and efficacy.

View details for PubMedID 24303308
Network analysis of unstructured EHR data for clinical research. AMIA Summits on Translational Science proceedings AMIA Summit on Translational Science Bauer-Mehren, A., LePendu, P., Iyer, S. V., Harpaz, R., Leeper, N. J., Shah, N. H. 2013; 2013: 14-18

Abstract

In biomedical research, network analysis provides a conceptual framework for interpreting data from high-throughput experiments. For example, protein-protein interaction networks have been successfully used to identify candidate disease genes. Recently, advances in clinical text processing and the increasing availability of clinical data have enabled analogous analyses on data from electronic medical records. We constructed networks of diseases, drugs, medical devices and procedures using concepts recognized in clinical notes from the Stanford clinical data warehouse. We demonstrate the use of the resulting networks for clinical research informatics in two ways (cohort construction and outcomes analysis) by examining the safety of cilostazol in peripheral artery disease patients as a use case. We show that the network-based approaches can be used for constructing patient cohorts as well as for analyzing differences in outcomes by comparing with standard methods, and discuss the advantages offered by network-based approaches.

View details for PubMedID 24303229
Chapter 9: Analyses Using Disease Ontologies PLOS COMPUTATIONAL BIOLOGY Shah, N. H., Cole, T., Musen, M. A. 2012; 8 (12)

Abstract

Advanced statistical methods used to analyze high-throughput data such as gene-expression assays result in long lists of "significant genes." One way to gain insight into the significance of altered expression levels is to determine whether Gene Ontology (GO) terms associated with a particular biological process, molecular function, or cellular component are over- or under-represented in the set of genes deemed significant. This process, referred to as enrichment analysis, profiles a gene set, and is widely used to make sense of the results of high-throughput experiments. The canonical example of enrichment analysis is when the output dataset is a list of genes differentially expressed in some condition. To determine the biological relevance of a lengthy gene list, the usual solution is to perform enrichment analysis with the GO. We can aggregate the annotating GO concepts for each gene in this list, and arrive at a profile of the biological processes or mechanisms affected by the condition under study. While GO has been the principal target for enrichment analysis, the methods of enrichment analysis are generalizable. We can conduct the same sort of profiling along other ontologies of interest. Just as scientists can ask "Which biological process is over-represented in my set of interesting genes or proteins?" we can also ask "Which disease (or class of diseases) is over-represented in my set of interesting genes or proteins?". For example, by annotating known protein mutations with disease terms from the ontologies in BioPortal, Mort et al. recently identified a class of diseases (blood coagulation disorders) that were associated with a 14-fold depletion in substitutions at O-linked glycosylation sites. With the availability of tools for automatic annotation of datasets with terms from disease ontologies, there is no reason to restrict enrichment analyses to the GO.
In this chapter, we will discuss methods to perform enrichment analysis using any ontology available in the biomedical domain. We will review the general methodology of enrichment analysis, the associated challenges, and discuss the novel translational analyses enabled by the existence of public, national computational infrastructure and by the use of disease ontologies in such analyses.
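The over-representation question described in this abstract can be sketched in a few lines. This is a minimal illustration and not the chapter's own implementation: it assumes a one-sided hypergeometric test, and the names `hypergeom_sf` and `enrichment`, along with the toy gene and disease identifiers, are hypothetical. It applies no multiple-testing correction, which a real analysis over many ontology terms would need.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n items are drawn without replacement from N items,
    K of which carry the annotation of interest."""
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def enrichment(gene_set, annotations, background):
    """Rank ontology terms by over-representation in gene_set.

    annotations maps a term (hypothetical labels here) to the set of genes
    annotated with it; background is the set of all genes considered.
    Returns a list of (term, overlap, p_value), smallest p-value first.
    """
    background = set(background)
    gene_set = set(gene_set) & background
    N, n = len(background), len(gene_set)
    results = []
    for term, genes in annotations.items():
        annotated = set(genes) & background
        k = len(annotated & gene_set)
        results.append((term, k, hypergeom_sf(k, N, len(annotated), n)))
    return sorted(results, key=lambda r: r[2])


# Toy data: 20 background genes and two made-up disease terms.
background = {f"g{i}" for i in range(20)}
annotations = {
    "disease_A": {f"g{i}" for i in range(0, 5)},
    "disease_B": {f"g{i}" for i in range(10, 15)},
}
interesting = {"g0", "g1", "g2", "g3", "g10"}
ranked = enrichment(interesting, annotations, background)
```

The same function works for any ontology, since it only needs a mapping from terms to annotated items and a background set; the choice of background is the part that needs the most care in practice.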

View details for DOI 10.1371/journal.pcbi.1002827

View details for Web of Science ID 000312901500032

View details for PubMedID 23300417

View details for PubMedCentralID PMC3531278
Analyzing Unstructured Clinical Notes for Phase IV Drug Safety Surveillance LePendu, P., Bauer-Mehren, A., Iyer, S., Shah, N. H. LIPPINCOTT WILLIAMS & WILKINS. 2012

View details for Web of Science ID 000208885005299
Proton Pump Inhibitors (PPIs) Increase Risk of MACE: Role of DDAH Ghebremariam, Y. T., Lee, J. C., LePendu, P., Erlanson, D. A., Shah, N. H., Cooke, J. P. LIPPINCOTT WILLIAMS & WILKINS. 2012

View details for Web of Science ID 000208885002082
Mining the pharmacogenomics literature-a survey of the state of the art BRIEFINGS IN BIOINFORMATICS Hahn, U., Cohen, K. B., Garten, Y., Shah, N. H. 2012; 13 (4): 460-494

Abstract

This article surveys efforts on text mining of the pharmacogenomics literature, mainly from the period 2008 to 2011. Pharmacogenomics (or pharmacogenetics) is the field that studies how human genetic variation impacts drug response. Therefore, publications span the intersection of research in genotypes, phenotypes and pharmacology, a topic that has increasingly become a focus of active research in recent years. This survey covers efforts dealing with the automatic recognition of relevant named entities (e.g. genes, gene variants and proteins, diseases and other pathological phenomena, drugs and other chemicals relevant for medical treatment), as well as various forms of relations between them. A wide range of text genres is considered, such as scientific publications (abstracts, as well as full texts), patent texts and clinical narratives. We also discuss infrastructure and resources needed for advanced text analytics, e.g. document corpora annotated with corresponding semantic metadata (gold standards and training data), biomedical terminologies and ontologies providing domain-specific background knowledge at different levels of formality and specificity, software architectures for building complex and scalable text analytics pipelines and Web services grounded to them, as well as comprehensive ways to disseminate and interact with the typically huge amounts of semiformal knowledge structures extracted by text mining tools. Finally, we consider some of the novel applications that have already been developed in the field of pharmacogenomic text mining and point out perspectives for future research.

View details for DOI 10.1093/bib/bbs018

View details for Web of Science ID 000306925000007

View details for PubMedID 22833496

View details for PubMedCentralID PMC3404399
Using ontology-based annotation to profile disease research JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION Liu, Y., Coulet, A., LePendu, P., Shah, N. H. 2012; 19 (E1): E177-E186

Abstract

Profiling the allocation and trend of research activity is of interest to funding agencies, administrators, and researchers. However, the lack of a common classification system hinders the comprehensive and systematic profiling of research activities. This study introduces ontology-based annotation as a method to overcome this difficulty. Analyzing over a decade of funding data and publication data, the trends of disease research are profiled across topics, across institutions, and over time. This study introduces and explores the notions of research sponsorship and allocation and shows that leaders of research activity can be identified within specific disease areas of interest, such as those with high mortality or high sponsorship. The funding profiles of disease topics readily cluster themselves in agreement with the ontology hierarchy and closely mirror the funding agency priorities. Finally, four temporal trends are identified among research topics. This work utilizes disease ontology (DO)-based annotation to profile effectively the landscape of biomedical research activity. By using DO in this manner a use-case driven mechanism is also proposed to evaluate the utility of classification hierarchies.

View details for DOI 10.1136/amiajnl-2011-000631

View details for Web of Science ID 000314151400029

View details for PubMedID 22494789

View details for PubMedCentralID PMC3392849
Unified Medical Language System term occurrences in clinical notes: a large-scale corpus analysis JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION Wu, S. T., Liu, H., Li, D., Tao, C., Musen, M. A., Chute, C. G., Shah, N. H. 2012; 19 (E1): E149-E156

Abstract

To characterise empirical instances of Unified Medical Language System (UMLS) Metathesaurus term strings in a large clinical corpus, and to illustrate what types of term characteristics are generalisable across data sources. Based on the occurrences of UMLS terms in a 51 million document corpus of Mayo Clinic clinical notes, this study computes statistics about the terms' string attributes, source terminologies, semantic types and syntactic categories. Term occurrences in 2010 i2b2/VA text were also mapped; eight example filters were designed from the Mayo-based statistics and applied to i2b2/VA data. For the corpus analysis, negligible numbers of mapped terms in the Mayo corpus had over six words or 55 characters. Of source terminologies in the UMLS, the Consumer Health Vocabulary and Systematized Nomenclature of Medicine-Clinical Terms (SNOMED-CT) had the best coverage in Mayo clinical notes at 106426 and 94788 unique terms, respectively. Of 15 semantic groups in the UMLS, seven groups accounted for 92.08% of term occurrences in Mayo data. Syntactically, over 90% of matched terms were in noun phrases. For the cross-institutional analysis, using five example filters on i2b2/VA data reduces the actual lexicon to 19.13% of the size of the UMLS and only sees a 2% reduction in matched terms. The corpus statistics presented here are instructive for building lexicons from the UMLS. Features intrinsic to Metathesaurus terms (well formedness, length and language) generalise easily across clinical institutions, but term frequencies should be adapted with caution. The semantic groups of mapped terms may differ slightly from institution to institution, but they differ greatly when moving to the biomedical literature domain.

View details for DOI 10.1136/amiajnl-2011-000744

View details for Web of Science ID 000314151400025

View details for PubMedID 22493050

View details for PubMedCentralID PMC3392861
Novel Data-Mining Methodologies for Adverse Drug Event Discovery and Analysis CLINICAL PHARMACOLOGY & THERAPEUTICS Harpaz, R., Dumouchel, W., Shah, N. H., Madigan, D., Ryan, P., Friedman, C. 2012; 91 (6): 1010-1021

Abstract

An important goal of the health system is to identify new adverse drug events (ADEs) in the postapproval period. Data-mining methods that can transform data into meaningful knowledge to inform patient safety have proven essential for this purpose. New opportunities have emerged to harness data sources that have not been used within the traditional framework. This article provides an overview of recent methodological innovations and data sources used to support ADE discovery and analysis.

View details for DOI 10.1038/clpt.2012.50

View details for Web of Science ID 000304245800019

View details for PubMedID 22549283
The coming age of data-driven medicine: translational bioinformatics' next frontier JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION Shah, N. H., Tenenbaum, J. D. 2012; 19 (E1): E2-E4

View details for DOI 10.1136/amiajnl-2012-000969

View details for Web of Science ID 000314151400002

View details for PubMedID 22718035

View details for PubMedCentralID PMC3392866
The National Center for Biomedical Ontology JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION Musen, M. A., Noy, N. F., Shah, N. H., Whetzel, P. L., Chute, C. G., Story, M., Smith, B. 2012; 19 (2): 190-195

Abstract

The National Center for Biomedical Ontology is now in its seventh year. The goals of this National Center for Biomedical Computing are to: create and maintain a repository of biomedical ontologies and terminologies; build tools and web services to enable the use of ontologies and terminologies in clinical and translational research; educate their trainees and the scientific community broadly about biomedical ontology and ontology-based technology and best practices; and collaborate with a variety of groups who develop and use ontologies and terminologies in biomedicine. The centerpiece of the National Center for Biomedical Ontology is a web-based resource known as BioPortal. BioPortal makes available for research in computationally useful forms more than 270 of the world's biomedical ontologies and terminologies, and supports a wide range of web services that enable investigators to use the ontologies to annotate and retrieve data, to generate value sets and special-purpose lexicons, and to perform advanced analytics on a wide range of biomedical data.

View details for DOI 10.1136/amiajnl-2011-000523

View details for Web of Science ID 000300768100010

View details for PubMedID 22081220

View details for PubMedCentralID PMC3277625
Translational bioinformatics embraces big data. Yearbook of medical informatics Shah, N. H. 2012; 7 (1): 130-134

Abstract

We review the latest trends and major developments in translational bioinformatics in the year 2011-2012. Our emphasis is on highlighting the key events in the field and pointing at promising research areas for the future. The key take-home points are:
• Translational informatics is ready to revolutionize human health and healthcare using large-scale measurements on individuals.
• Data-centric approaches that compute on massive amounts of data (often called "Big Data") to discover patterns and to make clinically relevant predictions will gain adoption.
• Research that bridges the latest multimodal measurement technologies with large amounts of electronic healthcare data is increasing, and this is where new breakthroughs will occur.

View details for PubMedID 22890354
TEXT AND KNOWLEDGE MINING FOR PHARMACOGENOMICS: GENOTYPE-PHENOTYPE-DRUG RELATIONSHIPS Cohen, K., Garten, Y., Shah, N., Hahn, U. edited by Altman, R. B., Dunker, A. K., Hunter, L., Murray, T., Klein, T. E. WORLD SCIENTIFIC PUBL CO PTE LTD. 2012: 375

View details for Web of Science ID 000407150800036
Using temporal patterns in medical records to discern adverse drug events from indications. AMIA Summits on Translational Science proceedings AMIA Summit on Translational Science Liu, Y., LePendu, P., Iyer, S., Shah, N. H. 2012; 2012: 47-56

Abstract

Researchers estimate that electronic health record systems record roughly 2 million ambulatory adverse drug events and that patients suffer from adverse drug events in roughly 30% of hospital stays. Some have recently used structured databases of patient medical records and health insurance claims, going beyond the current paradigm of using spontaneous reporting systems like AERS, to detect drug-safety signals. However, most efforts do not use the free text from clinical notes in monitoring for drug-safety signals. We hypothesize that drug-disease co-occurrences, extracted from ontology-based annotations of the clinical notes, can be examined for statistical enrichment and used for drug safety surveillance. When analyzing such co-occurrences of drugs and diseases, one major challenge is to differentiate whether the disease in a drug-disease pair represents an indication or an adverse event. We demonstrate that it is possible to make this distinction by combining the frequency distribution of the drug, the disease, and the drug-disease pair as well as the temporal ordering of the drugs and diseases in each pair across more than one million patients.

View details for PubMedID 22779050
Selected papers from the 14th Annual Bio-Ontologies Special Interest Group Meeting. Journal of biomedical semantics Soldatova, L. N., Sansone, S., Dumontier, M., Shah, N. H. 2012; 3: I1-?

Abstract

Over the past 14 years, the Bio-Ontologies SIG at ISMB has provided a forum for discussion of the latest and most innovative research in bio-ontologies development, its applications to biomedicine and more generally the organisation, presentation and dissemination of knowledge in biomedicine and the life sciences. The seven papers selected for this supplement span a wide range of topics including: web-based querying over multiple ontologies, integration of data from wikis, innovative methods of annotating and mining electronic health records, advances in annotating web documents and biomedical literature, quality control of ontology alignments, and the ontology support for predictive models about toxicity and open access to the toxicity data.

View details for DOI 10.1186/2041-1480-3-S1-I1

View details for PubMedID 22541591
Annotation Analysis for Testing Drug Safety Signals using Unstructured Clinical Notes. Journal of biomedical semantics LePendu, P., Iyer, S. V., Fairon, C., Shah, N. H. 2012; 3: S5-?

Abstract

The electronic surveillance for adverse drug events is largely based upon the analysis of coded data from reporting systems. Yet, the vast majority of electronic health data lies embedded within the free text of clinical notes and is not gathered into centralized repositories. With the increasing access to large volumes of electronic medical data, in particular the clinical notes, it may be possible to computationally encode and to test drug safety signals in an active manner. We describe the application of simple annotation tools on clinical text and the mining of the resulting annotations to compute the risk of getting a myocardial infarction for patients with rheumatoid arthritis that take Vioxx. Our analysis clearly reveals elevated risks for myocardial infarction in rheumatoid arthritis patients taking Vioxx (odds ratio 2.06) before 2005. Our results show that it is possible to apply annotation analysis methods for testing hypotheses about drug safety using electronic medical records.
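An odds ratio like the one reported in this abstract is computed from a 2x2 table of exposure by outcome. The sketch below assumes the standard Woolf (log-odds) confidence interval; the function name `odds_ratio_ci` and the counts are hypothetical, chosen only for illustration, and are not taken from the study.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf confidence interval for a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    z: normal quantile (1.96 gives an approximate 95% interval)
    All four cells must be non-zero for this formula to apply.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not data from the paper.
or_, lower, upper = odds_ratio_ci(a=20, b=80, c=10, d=90)
```

An interval that includes 1 means the data are compatible with no association at the chosen confidence level, which is why the interval is usually reported alongside the point estimate.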

View details for DOI 10.1186/2041-1480-3-S1-S5

View details for PubMedID 22541596

View details for PubMedCentralID PMC3337270
Analyzing patterns of drug use in clinical notes for patient safety. AMIA Summits on Translational Science proceedings AMIA Summit on Translational Science LePendu, P., Liu, Y., Iyer, S., Udell, M. R., Shah, N. H. 2012; 2012: 63-70

Abstract

Doctors prescribe drugs for indications that are not FDA approved. Research indicates that 21% of prescriptions filled are for off-label indications. Of those, more than 73% lack supporting scientific evidence. Traditional drug safety alerts may not cover usages that are not FDA approved. Therefore, analyzing patterns of off-label drug usage in the clinical setting is an important step toward reducing the incidence of adverse events and for improving patient safety. We applied term extraction tools on the clinical notes of a million patients to compile a database of statistically significant patterns of drug use. We validated some of the usage patterns learned from the data against sources of known on-label and off-label use. Given our ability to quantify adverse event risks using the clinical notes, this will enable us to address patient safety because we can now rank-order off-label drug use and prioritize the search for their adverse event profiles.

View details for PubMedID 22779054
Enabling enrichment analysis with the Human Disease Ontology. Journal of biomedical informatics LePendu, P., Musen, M. A., Shah, N. H. 2011; 44: S31-8

Abstract

Advanced statistical methods used to analyze high-throughput data such as gene-expression assays result in long lists of "significant genes." One way to gain insight into the significance of altered expression levels is to determine whether Gene Ontology (GO) terms associated with a particular biological process, molecular function, or cellular component are over- or under-represented in the set of genes deemed significant. This process, referred to as enrichment analysis, profiles a gene set, and is widely used to make sense of the results of high-throughput experiments. Our goal is to develop and apply general enrichment analysis methods to profile other sets of interest, such as patient cohorts from the electronic medical record, using a variety of ontologies including SNOMED CT, MedDRA, RxNorm, and others. Although it is possible to perform enrichment analysis using ontologies other than the GO, a key pre-requisite is the availability of a background set of annotations to enable the enrichment calculation. In the case of the GO, this background set is provided by the Gene Ontology Annotations. In the current work, we describe: (i) a general method that uses hand-curated GO annotations as a starting point for creating background datasets for enrichment analysis using other ontologies; and (ii) a gene-disease background annotation set - that enables disease-based enrichment - to demonstrate feasibility of our method.

View details for DOI 10.1016/j.jbi.2011.04.007

View details for PubMedID 21550421

View details for PubMedCentralID PMC3392036
NCBO Resource Index: Ontology-based search and mining of biomedical resources JOURNAL OF WEB SEMANTICS Jonquet, C., LePendu, P., Falconer, S., Coulet, A., Noy, N. F., Musen, M. A., Shah, N. H. 2011; 9 (3): 316-324

Abstract

The volume of publicly available data in biomedicine is constantly increasing. However, these data are stored in different formats and on different platforms. Integrating these data will enable us to facilitate the pace of medical discoveries by providing scientists with a unified view of this diverse information. Under the auspices of the National Center for Biomedical Ontology (NCBO), we have developed the Resource Index, a growing, large-scale ontology-based index of more than twenty heterogeneous biomedical resources. The resources come from a variety of repositories maintained by organizations from around the world. We use a set of over 200 publicly available ontologies contributed by researchers in various domains to annotate the elements in these resources. We use the semantics that the ontologies encode, such as different properties of classes, the class hierarchies, and the mappings between ontologies, in order to improve the search experience for the Resource Index user. Our user interface enables scientists to search the multiple resources quickly and efficiently using domain terms, without even being aware that there is semantics "under the hood."

View details for DOI 10.1016/j.websem.2011.06.005

View details for Web of Science ID 000300169800007

View details for PubMedCentralID PMC3170774
NCBO Resource Index: Ontology-Based Search and Mining of Biomedical Resources. Web semantics (Online) Jonquet, C., Lependu, P., Falconer, S., Coulet, A., Noy, N. F., Musen, M. A., Shah, N. H. 2011; 9 (3): 316-324

View details for DOI 10.1016/j.websem.2011.06.005

View details for PubMedID 21918645

View details for PubMedCentralID PMC3170774
BioPortal: enhanced functionality via new Web services from the National Center for Biomedical Ontology to access and use ontologies in software applications NUCLEIC ACIDS RESEARCH Whetzel, P. L., Noy, N. F., Shah, N. H., Alexander, P. R., Nyulas, C., Tudorache, T., Musen, M. A. 2011; 39: W541-W545

Abstract

The National Center for Biomedical Ontology (NCBO) is one of the National Centers for Biomedical Computing funded under the NIH Roadmap Initiative. Contributing to the national computing infrastructure, NCBO has developed BioPortal, a web portal that provides access to a library of biomedical ontologies and terminologies (http://bioportal.bioontology.org) via the NCBO Web services. BioPortal enables community participation in the evaluation and evolution of ontology content by providing features to add mappings between terms, to add comments linked to specific ontology terms and to provide ontology reviews. The NCBO Web services (http://www.bioontology.org/wiki/index.php/NCBO_REST_services) enable this functionality and provide a uniform mechanism to access ontologies from a variety of knowledge representation formats, such as Web Ontology Language (OWL) and Open Biological and Biomedical Ontologies (OBO) format. The Web services provide multi-layered access to the ontology content, from getting all terms in an ontology to retrieving metadata about a term. Users can easily incorporate the NCBO Web services into software applications to generate semantically aware applications and to facilitate structured data collection.

View details for DOI 10.1093/nar/gkr469

View details for Web of Science ID 000292325300088

View details for PubMedID 21672956

View details for PubMedCentralID PMC3125807
Computationally translating molecular discoveries into tools for medicine: translational bioinformatics articles now featured in JAMIA JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION Butte, A. J., Shah, N. H. 2011; 18 (4): 352-353

View details for DOI 10.1136/amiajnl-2011-000343

View details for Web of Science ID 000292061700002

View details for PubMedID 21672904

View details for PubMedCentralID PMC3128419
Integration and publication of heterogeneous text-mined relationships on the Semantic Web. Journal of biomedical semantics Coulet, A., Garten, Y., Dumontier, M., Altman, R. B., Musen, M. A., Shah, N. H. 2011; 2: S10-?

Abstract

Advances in Natural Language Processing (NLP) techniques enable the extraction of fine-grained relationships mentioned in biomedical text. The variability and the complexity of natural language in expressing similar relationships cause the extracted relationships to be highly heterogeneous, which makes the construction of knowledge bases difficult and poses a challenge in using these for data mining or question answering. We report on the semi-automatic construction of the PHARE relationship ontology (the PHArmacogenomic RElationships Ontology) consisting of 200 curated relations from over 40,000 heterogeneous relationships extracted via text-mining. These heterogeneous relations are then mapped to the PHARE ontology using synonyms, entity descriptions and hierarchies of entities and roles. Once mapped, relationships can be normalized and compared using the structure of the ontology to identify relationships that have similar semantics but different syntax. We compare and contrast the manual procedure with a fully automated approach using WordNet to quantify the degree of integration enabled by iterative curation and refinement of the PHARE ontology. The result of such integration is a repository of normalized biomedical relationships, named PHARE-KB, which can be queried using Semantic Web technologies such as SPARQL and can be visualized in the form of a biological network. The PHARE ontology serves as a common semantic framework to integrate more than 40,000 relationships pertinent to pharmacogenomics. The PHARE ontology forms the foundation of a knowledge base named PHARE-KB. Once populated with relationships, PHARE-KB (i) can be visualized in the form of a biological network to guide human tasks such as database curation and (ii) can be queried programmatically to guide bioinformatics applications such as the prediction of molecular interactions. PHARE is available at http://purl.bioontology.org/ontology/PHARE.

View details for DOI 10.1186/2041-1480-2-S2-S10

View details for PubMedID 21624156

View details for PubMedCentralID PMC3102890
MINING THE PHARMACOGENOMICS LITERATURE Cohen, K., Garten, Y., Hahn, U., Shah, N. H. edited by Altman, R. B., Dunker, A. K., Hunter, L., Murray, T., Klein, T. E. WORLD SCIENTIFIC PUBL CO PTE LTD. 2011: 362-368

View details for Web of Science ID 000413599200037
Mapping between the OBO and OWL ontology languages. Journal of biomedical semantics Tirmizi, S. H., Aitken, S., Moreira, D. A., Mungall, C., Sequeda, J., Shah, N. H., Miranker, D. P. 2011; 2: S3-?

Abstract

Ontologies are commonly used in biomedicine to organize concepts to describe domains such as anatomies, environments, experiments, taxonomies, etc. NCBO BioPortal currently hosts about 180 different biomedical ontologies. These ontologies have been mainly expressed in either the Open Biomedical Ontology (OBO) format or the Web Ontology Language (OWL). OBO emerged from the Gene Ontology, and supports most of the biomedical ontology content. In comparison, OWL is a Semantic Web language, and is supported by the World Wide Web Consortium together with integral query languages, rule languages and distributed infrastructure for information interchange. These features are highly desirable for the OBO content as well. A convenient method for leveraging these features for OBO ontologies is by transforming OBO ontologies to OWL. We have developed a methodology for translating OBO ontologies to OWL using the organization of the Semantic Web itself to guide the work. The approach reveals that the constructs of OBO can be grouped together to form a similar layer cake. Thus we were able to decompose the problem into two parts. Most OBO constructs have easy and obvious equivalence to a construct in OWL. A small subset of OBO constructs requires deeper consideration. We have defined transformations for all constructs in an effort to foster a standard common mapping between OBO and OWL. Our mapping produces OWL-DL, a Description Logics based subset of OWL with desirable computational properties for efficiency and correctness. Our Java implementation of the mapping is part of the official Gene Ontology project source. Our transformation system provides a lossless roundtrip mapping for OBO ontologies, i.e. an OBO ontology may be translated to OWL and back without loss of knowledge. In addition, it provides a roadmap for bridging the gap between the two ontology languages in order to enable the use of ontology content in a language independent manner.

View details for DOI 10.1186/2041-1480-2-S1-S3

View details for PubMedID 21388572
Selected papers from the 13th Annual Bio-Ontologies Special Interest Group Meeting. Journal of biomedical semantics Soldatova, L. N., Sansone, S., Stephens, S. M., Shah, N. H. 2011; 2: I1-?

Abstract

Over the years, the Bio-Ontologies SIG at ISMB has provided a forum for discussion of the latest and most innovative research in the application of ontologies and more generally the organisation, presentation and dissemination of knowledge in biomedicine and the life sciences. The ten papers selected for this supplement are extended versions of the original papers presented at the 2010 SIG. The papers span a wide range of topics including practical solutions for data and knowledge integration for translational medicine, hypothesis-based querying, understanding kidney and urinary pathways, mining the pharmacogenomics literature; theoretical research into the orthogonality of biomedical ontologies, the representation of diseases, the representation of research hypotheses, the combination of ontologies and natural language processing for an annotation framework, the generation of textual definitions, and the discovery of gene interaction networks.

View details for DOI 10.1186/2041-1480-2-S2-I1

View details for PubMedID 21624154
HyQue: evaluating hypotheses using Semantic Web technologies. Journal of biomedical semantics Callahan, A., Dumontier, M., Shah, N. H. 2011; 2: S3-?

Abstract

Key to the success of e-Science is the ability to computationally evaluate expert-composed hypotheses for validity against experimental data. Researchers face the challenge of collecting, evaluating and integrating large amounts of diverse information to compose and evaluate a hypothesis. Confronted with rapidly accumulating data, researchers currently do not have the software tools to undertake the required information integration tasks. We present HyQue, a Semantic Web tool for querying scientific knowledge bases with the purpose of evaluating user-submitted hypotheses. HyQue features a knowledge model to accommodate diverse hypotheses structured as events and represented using Semantic Web languages (RDF/OWL). Hypothesis validity is evaluated against experimental and literature-sourced evidence through a combination of SPARQL queries and evaluation rules. Inference over OWL ontologies (for type specifications, subclass assertions and parthood relations) and retrieval of facts stored as Bio2RDF linked data provide support for a given hypothesis. We evaluate hypotheses of varying levels of detail about the genetic network controlling galactose metabolism in Saccharomyces cerevisiae to demonstrate the feasibility of deploying such semantic computing tools over a growing body of structured knowledge in Bio2RDF. HyQue is a query-based hypothesis evaluation system that can currently evaluate hypotheses about the galactose metabolism in S. cerevisiae. Hypotheses as well as the supporting or refuting data are represented in RDF and directly linked to one another, allowing scientists to browse from data to hypothesis and vice versa. HyQue hypotheses and data are available at http://semanticscience.org/projects/hyque.

View details for DOI 10.1186/2041-1480-2-S2-S3

View details for PubMedID 21624158
Using text to build semantic networks for pharmacogenomics JOURNAL OF BIOMEDICAL INFORMATICS Coulet, A., Shah, N. H., Garten, Y., Musen, M., Altman, R. B. 2010; 43 (6): 1009-1019

Abstract

Most pharmacogenomics knowledge is contained in the text of published studies, and is thus not available for automated computation. Natural Language Processing (NLP) techniques for extracting relationships in specific domains often rely on hand-built rules and domain-specific ontologies to achieve good performance. In a new and evolving field such as pharmacogenomics (PGx), rules and ontologies may not be available. Recent progress in syntactic NLP parsing in the context of a large corpus of pharmacogenomics text provides new opportunities for automated relationship extraction. We describe an ontology of PGx relationships built starting from a lexicon of key pharmacogenomic entities and a syntactic parse of more than 87 million sentences from 17 million MEDLINE abstracts. We used the syntactic structure of PGx statements to systematically extract commonly occurring relationships and to map them to a common schema. Our extracted relationships have a 70-87.7% precision and involve not only key PGx entities such as genes, drugs, and phenotypes (e.g., VKORC1, warfarin, clotting disorder), but also critical entities that are frequently modified by these key entities (e.g., VKORC1 polymorphism, warfarin response, clotting disorder treatment). The result of our analysis is a network of 40,000 relationships between more than 200 entity types with clear semantics. This network is used to guide the curation of PGx knowledge and provide a computable resource for knowledge discovery.

View details for DOI 10.1016/j.jbi.2010.08.005

View details for Web of Science ID 000285036700017

View details for PubMedID 20723615

View details for PubMedCentralID PMC2991587
The BioPAX community standard for pathway data sharing NATURE BIOTECHNOLOGY Demir, E., Cary, M. P., Paley, S., Fukuda, K., Lemer, C., Vastrik, I., Wu, G., D'Eustachio, P., Schaefer, C., Luciano, J., Schacherer, F., Martinez-Flores, I., Hu, Z., Jimenez-Jacinto, V., Joshi-Tope, G., Kandasamy, K., Lopez-Fuentes, A. C., Mi, H., Pichler, E., Rodchenkov, I., Splendiani, A., Tkachev, S., Zucker, J., Gopinath, G., Rajasimha, H., Ramakrishnan, R., Shah, I., Syed, M., Anwar, N., Babur, O., Blinov, M., Brauner, E., Corwin, D., Donaldson, S., Gibbons, F., Goldberg, R., Hornbeck, P., Luna, A., Murray-Rust, P., Neumann, E., Reubenacker, O., Samwald, M., van Iersel, M., Wimalaratne, S., Allen, K., Braun, B., Whirl-Carrillo, M., Cheung, K., Dahlquist, K., Finney, A., Gillespie, M., Glass, E., Gong, L., Haw, R., Honig, M., Hubaut, O., Kane, D., Krupa, S., Kutmon, M., Leonard, J., Marks, D., Merberg, D., Petri, V., Pico, A., Ravenscroft, D., Ren, L., Shah, N., Sunshine, M., Tang, R., Whaley, R., Letovksy, S., Buetow, K. H., Rzhetsky, A., Schachter, V., Sobral, B. S., Dogrusoz, U., McWeeney, S., Aladjem, M., Birney, E., Collado-Vides, J., Goto, S., Hucka, M., Le Novere, N., Maltsev, N., Pandey, A., Thomas, P., Wingender, E., Karp, P. D., Sander, C., Bader, G. D. 2010; 28 (9): 935-942

Abstract

Biological Pathway Exchange (BioPAX) is a standard language to represent biological pathways at the molecular and cellular level and to facilitate the exchange of pathway data. The rapid growth of the volume of pathway data has spurred the development of databases and computational tools to aid interpretation; however, use of these data is hampered by the current fragmentation of pathway information across many databases with incompatible formats. BioPAX, which was created through a community process, solves this problem by making pathway data substantially easier to collect, index, interpret and share. BioPAX can represent metabolic and signaling pathways, molecular and genetic interactions and gene regulation networks. Using BioPAX, millions of interactions, organized into thousands of pathways, from many organisms are available from a growing number of databases. This large amount of pathway data in a computable form will support visualization, analysis and biological discovery.

View details for DOI 10.1038/nbt.1666

View details for Web of Science ID 000281719100019

View details for PubMedID 20829833

View details for PubMedCentralID PMC3001121
A UIMA wrapper for the NCBO annotator BIOINFORMATICS Roeder, C., Jonquet, C., Shah, N. H., Baumgartner, W. A., Verspoor, K., Hunter, L. 2010; 26 (14): 1800-1801

Abstract

The Unstructured Information Management Architecture (UIMA) framework and web services are emerging as useful tools for integrating biomedical text mining tools. This note describes our work, which wraps the National Center for Biomedical Ontology (NCBO) Annotator, an ontology-based annotation service, to make it available as a component in UIMA workflows. This wrapper is freely available on the web at http://bionlp-uima.sourceforge.net/ as part of the UIMA tools distribution from the Center for Computational Pharmacology (CCP) at the University of Colorado School of Medicine. It has been implemented in Java for support on Mac OS X, Linux and MS Windows.

View details for DOI 10.1093/bioinformatics/btq250

View details for Web of Science ID 000279474400025

View details for PubMedID 20505005

View details for PubMedCentralID PMC2894505
In Silico Functional Profiling of Human Disease-Associated and Polymorphic Amino Acid Substitutions HUMAN MUTATION Mort, M., Evani, U. S., Krishnan, V. G., Kamati, K. K., Baenziger, P. H., Bagchi, A., Peters, B. J., Sathyesh, R., Li, B., Sun, Y., Xue, B., Shah, N. H., Kann, M. G., Cooper, D. N., Radivojac, P., Mooney, S. D. 2010; 31 (3): 335-346

Abstract

An important challenge in translational bioinformatics is to understand how genetic variation gives rise to molecular changes at the protein level that can precipitate both monogenic and complex disease. To this end, we compiled datasets of human disease-associated amino acid substitutions (AAS) in the contexts of inherited monogenic disease, complex disease, functional polymorphisms with no known disease association, and somatic mutations in cancer, and compared them with respect to predicted functional sites in proteins. Using the sequence homology-based tool SIFT to estimate the proportion of deleterious AAS in each dataset, only complex disease AAS were found to be indistinguishable from neutral polymorphic AAS. Monogenic disease AAS predicted to be nondeleterious by SIFT were characterized by a significant enrichment for inherited AAS within solvent accessible residues, regions of intrinsic protein disorder, and an association with the loss or gain of various posttranslational modifications. Sites of structural and/or functional interest were therefore surmised to constitute useful additional features with which to identify the molecular disruptions caused by deleterious AAS. A range of bioinformatic tools, designed to predict structural and functional sites in protein sequences, were then employed to demonstrate that intrinsic biases exist in terms of the distribution of different types of human AAS with respect to specific structural, functional and pathological features. Our Web tool, designed to potentiate the functional profiling of novel AAS, has been made available at http://profile.mutdb.org/.

View details for DOI 10.1002/humu.21192

View details for Web of Science ID 000275419900014

View details for PubMedID 20052762

View details for PubMedCentralID PMC3098813
Selected papers from the 12th annual Bio-Ontologies meeting. Journal of biomedical semantics Soldatova, L. N., Lord, P., Sansone, S., Stephens, S. M., Shah, N. H. 2010; 1: I1-?

View details for DOI 10.1186/2041-1480-1-S1-I1

View details for PubMedID 20626920
Optimize First, Buy Later: Analyzing Metrics to Ramp-Up Very Large Knowledge Bases 9th International Semantic Web Conference LePendu, P., Noy, N. F., Jonquet, C., Alexander, P. R., Shah, N. H., Musen, M. A. SPRINGER-VERLAG BERLIN. 2010: 486–501

View details for Web of Science ID 000297613200031
The Lexicon Builder Web service: Building Custom Lexicons from two hundred Biomedical Ontologies. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Parai, G. K., Jonquet, C., Xu, R., Musen, M. A., Shah, N. H. 2010; 2010: 587-591

Abstract

Domain-specific biomedical lexicons are extensively used by researchers for natural language processing tasks. Currently these lexicons are created manually by expert curators and there is a pressing need for automated methods to compile such lexicons. The Lexicon Builder Web service addresses this need and reduces the investment of time and effort involved in lexicon maintenance. The service has three components: Inclusion - selects one or several ontologies (or their branches) and includes preferred names and synonym terms; Exclusion - filters terms based on the term's Medline frequency, syntactic type, UMLS semantic type and match with stopwords; Output - aggregates information, handles compression and output formats. Evaluation demonstrates that the service has high accuracy and runtime performance. It is currently being evaluated for several use cases to establish its utility in biomedical information processing tasks. The Lexicon Builder promotes collaboration, sharing and standardization of lexicons amongst researchers by automating the creation, maintenance and cross referencing of custom lexicons.

View details for PubMedID 21347046
Building a biomedical ontology recommender web service. Journal of biomedical semantics Jonquet, C., Musen, M. A., Shah, N. H. 2010; 1: S1-?

Abstract

Researchers in biomedical informatics use ontologies and terminologies to annotate their data in order to facilitate data integration and translational discoveries. As the use of ontologies for annotation of biomedical datasets has risen, a common challenge is to identify ontologies that are best suited to annotating specific datasets. The number and variety of biomedical ontologies is large, and it is cumbersome for a researcher to figure out which ontology to use. We present the Biomedical Ontology Recommender web service. The system uses textual metadata or a set of keywords describing a domain of interest and suggests appropriate ontologies for annotating or representing the data. The service makes a decision based on three criteria. The first one is coverage, or the ontologies that provide most terms covering the input text. The second is connectivity, or the ontologies that are most often mapped to by other ontologies. The final criterion is size, or the number of concepts in the ontologies. The service scores the ontologies as a function of scores of the annotations created using the National Center for Biomedical Ontology (NCBO) Annotator web service. We used all the ontologies from the UMLS Metathesaurus and the NCBO BioPortal. We compare and contrast our Recommender by an exhaustive functional comparison to previously published efforts. We evaluate and discuss the results of several recommendation heuristics in the context of three real world use cases. The best recommendation heuristics, rated 'very relevant' by expert evaluators, are the ones based on coverage and connectivity criteria. The Recommender service (alpha version) is available to the community and is embedded into BioPortal.

View details for DOI 10.1186/2041-1480-1-S1-S1

View details for PubMedID 20626921

View details for PubMedCentralID PMC2903720
An ontology-neutral framework for enrichment analysis. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Tirrell, R., Evani, U., Berman, A. E., Mooney, S. D., Musen, M. A., Shah, N. H. 2010; 2010: 797-801

Abstract

Advanced statistical methods used to analyze high-throughput data (e.g. gene-expression assays) result in long lists of "significant genes." One way to gain insight into the significance of altered expression levels is to determine whether Gene Ontology (GO) terms associated with a particular biological process, molecular function, or cellular component are over- or under-represented in the set of genes deemed significant. This process, referred to as enrichment analysis, profiles a gene-set, and is relevant for and extensible to data analysis with other high-throughput measurement modalities such as proteomics, metabolomics, and tissue-microarray assays. With the availability of tools for automatic ontology-based annotation of datasets with terms from biomedical ontologies besides the GO, we need not restrict enrichment analysis to the GO. We describe RANSUM (Rich Annotation Summarizer), which performs enrichment analysis using any ontology in the National Center for Biomedical Ontology's (NCBO) BioPortal. We outline the methodology of enrichment analysis, the associated challenges, and discuss novel analyses enabled by RANSUM.
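
To make the idea of over-representation concrete, here is a minimal sketch of a one-sided hypergeometric test, a common choice for enrichment analysis. The abstract does not state which statistic RANSUM uses, so the test and the function name are assumptions made for illustration.

```python
from math import comb


def enrichment_p_value(population: int, annotated: int, selected: int, hits: int) -> float:
    """Return P(X >= hits) for X ~ Hypergeometric(population, annotated, selected).

    population: all genes considered
    annotated:  genes in the population carrying the ontology term
    selected:   genes in the "significant" list
    hits:       genes in the significant list carrying the term
    """
    upper = min(annotated, selected)
    total = comb(population, selected)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # 10 genes, 4 carry the term, 3 are significant, 2 of those carry the term.
    print(enrichment_p_value(10, 4, 3, 2))
```

A small p-value suggests the term is over-represented in the selected set. In practice one test is run per ontology term, so a multiple-testing correction is usually applied to the results.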

View details for PubMedID 21347088
Extraction of genotype-phenotype-drug relationships from text: from entity recognition to bioinformatics application. Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing Coulet, A., Shah, N., Hunter, L., Barral, C., Altman, R. B. 2010: 485-487

Abstract

Advances in concept recognition and natural language parsing have led to the development of various tools that enable the identification of biomedical entities and relationships between them in text. The aim of the Genotype-Phenotype-Drug Relationship Extraction from Text workshop (or GPD-Rx workshop) is to examine the current state of the art and discuss the next steps for making the extraction of relationships between biomedical entities integral to the curation and knowledge management workflow in Pharmacogenomics. The workshop will focus particularly on the extraction of Genotype-Phenotype, Genotype-Drug, and Phenotype-Drug relationships that are of interest to Pharmacogenomics. Extracting and structuring such text-mined relationships is key to supporting the evaluation and the validation of multiple hypotheses that emerge from high throughput translational studies spanning multiple measurement modalities. In order to advance this agenda, it is essential that existing relationship extraction methods be compared to one another and that a community wide benchmark corpus emerges, against which future methods can be compared. The workshop aims to bring together researchers working on the automatic or semi-automatic extraction of relationships between biomedical entities from research literature in order to identify the key groups interested in creating such a benchmark.

View details for PubMedID 19904832
A Comprehensive Analysis of Five Million UMLS Metathesaurus Terms Using Eighteen Million MEDLINE Citations. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Xu, R., Musen, M. A., Shah, N. H. 2010; 2010: 907-911

Abstract

The Unified Medical Language System (UMLS) Metathesaurus is widely used for biomedical natural language processing (NLP) tasks. In this study, we systematically analyzed UMLS Metathesaurus terms by analyzing their occurrences in over 18 million MEDLINE abstracts. Our goals were: 1. analyze the frequency and syntactic distribution of Metathesaurus terms in MEDLINE; 2. create a filtered UMLS Metathesaurus based on the MEDLINE analysis; 3. augment the UMLS Metathesaurus where each term is associated with metadata on its MEDLINE frequency and syntactic distribution statistics. After MEDLINE frequency-based filtering, the augmented UMLS Metathesaurus contains 518,835 terms and is roughly 13% of its original size. We have shown that the syntactic and frequency information is useful to identify errors in the Metathesaurus. This filtered and augmented UMLS Metathesaurus can potentially be used to improve efficiency and precision of UMLS-based information retrieval and NLP tasks.
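
The frequency-based filtering step can be pictured with the minimal sketch below. It counts, for each term, the number of abstracts that mention it as a whole phrase and keeps terms at or above a threshold. The function names, the matching rule and the threshold are assumptions for illustration; the abstract does not describe the actual procedure at this level of detail.

```python
import re
from collections import Counter


def term_frequencies(terms, abstracts):
    """Count, for each term, how many abstracts contain it (case-insensitive)."""
    counts = Counter()
    lowered = [text.lower() for text in abstracts]
    for term in terms:
        # Word boundaries keep "ion" from matching inside "inhibition".
        pattern = re.compile(r"\b" + re.escape(term.lower()) + r"\b")
        counts[term] = sum(1 for text in lowered if pattern.search(text))
    return counts


def filter_terms(terms, abstracts, min_count=1):
    """Keep only terms seen in at least min_count abstracts, with their counts."""
    counts = term_frequencies(terms, abstracts)
    return {term: n for term, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    terms = ["warfarin", "clotting disorder", "zebra fish fin"]
    abstracts = [
        "Warfarin dosing varies between patients.",
        "A clotting disorder was treated with warfarin.",
    ]
    print(filter_terms(terms, abstracts))
```

Scanning every abstract once per term is fine for a toy example but would not scale to millions of terms and citations; a real pipeline would need an indexed or dictionary-based matcher.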

View details for PubMedID 21347110
BioPortal: ontologies and integrated data resources at the click of a mouse NUCLEIC ACIDS RESEARCH Noy, N. F., Shah, N. H., Whetzel, P. L., Dai, B., Dorf, M., Griffith, N., Jonquet, C., Rubin, D. L., Storey, M., Chute, C. G., Musen, M. A. 2009; 37: W170-W173

Abstract

Biomedical ontologies provide essential domain knowledge to drive data integration, information retrieval, data annotation, natural-language processing and decision support. BioPortal (http://bioportal.bioontology.org) is an open repository of biomedical ontologies that provides access via Web services and Web browsers to ontologies developed in OWL, RDF, OBO format and Protégé frames. BioPortal functionality includes the ability to browse, search and visualize ontologies. The Web interface also facilitates community-based participation in the evaluation and evolution of ontology content by providing features to add notes to ontology terms, mappings between terms and ontology reviews based on criteria such as usability, domain coverage, quality of content, and documentation and support. BioPortal also enables integrated search of biomedical data resources such as the Gene Expression Omnibus (GEO), ClinicalTrials.gov, and ArrayExpress, through the annotation and indexing of these resources with ontologies in BioPortal. Thus, BioPortal not only provides investigators, clinicians, and developers 'one-stop shopping' to programmatically access biomedical ontologies, but also provides support to integrate data from a variety of biomedical resources.

View details for DOI 10.1093/nar/gkp440

View details for Web of Science ID 000267889100031

View details for PubMedID 19483092

View details for PubMedCentralID PMC2703982
Ontology-driven indexing of public datasets for translational bioinformatics 1st Summit on Translational Bioinformatics Shah, N. H., Jonquet, C., Chiang, A. P., Butte, A. J., Chen, R., Musen, M. A. BIOMED CENTRAL LTD. 2009

Abstract

The volume of publicly available genomic scale data is increasing. Genomic datasets in public repositories are annotated with free-text fields describing the pathological state of the studied sample. These annotations are not mapped to concepts in any ontology, making it difficult to integrate these datasets across repositories. We have previously developed methods to map text-annotations of tissue microarrays to concepts in the NCI thesaurus and SNOMED-CT. In this work we generalize our methods to map text annotations of gene expression datasets to concepts in the UMLS. We demonstrate the utility of our methods by processing annotations of datasets in the Gene Expression Omnibus. We demonstrate that we enable ontology-based querying and integration of tissue and gene expression microarray data. We enable identification of datasets on specific diseases across both repositories. Our approach provides the basis for ontology-driven data integration for translational research on gene and protein expression data. Based on this work we have built a prototype system for ontology based annotation and indexing of biomedical data. The system processes the text metadata of diverse resource elements such as gene expression data sets, descriptions of radiology images, clinical-trial reports, and PubMed article abstracts to annotate and index them with concepts from appropriate ontologies. The key functionality of this system is to enable users to locate biomedical data resources related to particular ontology concepts.

View details for Web of Science ID 000265602500002

View details for PubMedID 19208184

View details for PubMedCentralID PMC2646250
The open biomedical annotator. Summit on translational bioinformatics Jonquet, C., Shah, N. H., Musen, M. A. 2009; 2009: 56-60

Abstract

The range of publicly available biomedical data is enormous and is expanding fast. This expansion means that researchers now face a hurdle to extracting the data they need from the large amounts of data that are available. Biomedical researchers have turned to ontologies and terminologies to structure and annotate their data with ontology concepts for better search and retrieval. However, this annotation process cannot be easily automated and often requires expert curators. In addition, there is a lack of easy-to-use systems that facilitate the use of ontologies for annotation. This paper presents the Open Biomedical Annotator (OBA), an ontology-based Web service that annotates public datasets with biomedical ontology concepts based on their textual metadata (www.bioontology.org). The biomedical community can use the annotator service to tag datasets automatically with ontology terms (from UMLS and NCBO BioPortal ontologies). Such annotations facilitate translational discoveries by integrating annotated data.

View details for PubMedID 21347171
What Four Million Mappings Can Tell You about Two Hundred Ontologies 8th International Semantic Web Conference Ghazvinian, A., Noy, N. F., Jonquet, C., Shah, N., Musen, M. A. SPRINGER-VERLAG BERLIN. 2009: 229–242

View details for Web of Science ID 000273977000015
Comparison of concept recognizers for building the Open Biomedical Annotator 2nd Summit on Translational Bioinformatics Shah, N. H., Bhatia, N., Jonquet, C., Rubin, D., Chiang, A. P., Musen, M. A. BIOMED CENTRAL LTD. 2009

Abstract

The National Center for Biomedical Ontology (NCBO) is developing a system for automated, ontology-based access to online biomedical resources (Shah NH, et al.: Ontology-driven indexing of public datasets for translational bioinformatics. BMC Bioinformatics 2009, 10(Suppl 2):S1). The system's indexing workflow processes the text metadata of diverse resources such as datasets from GEO and ArrayExpress to annotate and index them with concepts from appropriate ontologies. This indexing requires the use of a concept-recognition tool to identify ontology concepts in the resource's textual metadata. In this paper, we present a comparison of two concept recognizers - NLM's MetaMap and the University of Michigan's Mgrep. We utilize a number of data sources and dictionaries to evaluate the concept recognizers in terms of precision, recall, speed of execution, scalability and customizability. Our evaluations demonstrate that Mgrep has a clear edge over MetaMap for large-scale service oriented applications. Based on our analysis we also suggest areas of potential improvements for Mgrep. We have subsequently used Mgrep to build the Open Biomedical Annotator service. The Annotator service has access to a large dictionary of biomedical terms derived from the Unified Medical Language System (UMLS) and NCBO ontologies. The Annotator also leverages the hierarchical structure of the ontologies and their mappings to expand annotations. The Annotator service is available to the community as a REST Web service for creating ontology-based annotations of their data.
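
The two-stage idea described here, dictionary-based concept recognition followed by expansion along the ontology hierarchy, can be sketched as follows. This is not Mgrep or the Annotator's code: a simple regular-expression match stands in for the concept recognizer, and the dictionary, concept identifiers and hierarchy are hypothetical toy data.

```python
import re

# Hypothetical toy dictionary: term string -> concept identifier.
DICTIONARY = {
    "melanoma": "C:melanoma",
    "skin cancer": "C:skin_cancer",
}

# Hypothetical toy is_a hierarchy: child concept -> list of parent concepts.
PARENTS = {
    "C:melanoma": ["C:skin_cancer"],
    "C:skin_cancer": ["C:neoplasm"],
}


def direct_annotations(text, dictionary):
    """Return concepts whose dictionary term occurs in the text as a whole phrase."""
    found = set()
    lowered = text.lower()
    for term, concept in dictionary.items():
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            found.add(concept)
    return found


def expand_with_ancestors(concepts, parents):
    """Add every ancestor reachable through is_a links to the annotation set."""
    expanded = set(concepts)
    stack = list(concepts)
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in expanded:
                expanded.add(parent)
                stack.append(parent)
    return expanded


if __name__ == "__main__":
    direct = direct_annotations("Biopsy confirmed melanoma of the arm.", DICTIONARY)
    print(sorted(expand_with_ancestors(direct, PARENTS)))
```

With the expansion step, a dataset whose metadata mentions only "melanoma" also becomes retrievable by a query for the broader concept.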

View details for Web of Science ID 000270371700015

View details for PubMedID 19761568

View details for PubMedCentralID PMC2745685
BioPortal: ontologies and data resources with the click of a mouse. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Musen, M. A., Shah, N. H., Noy, N. F., Dai, B. Y., Dorf, M., Griffith, N., Buntrok, J., Jonquet, C., Montegut, M. J., Rubin, D. L. 2008: 1223-1224

View details for PubMedID 18999306
Pathway knowledge base: An integrated pathway resource using BioPAX APPLIED ONTOLOGY Kotecha, N., Bruck, K., Lu, W., Shah, N. 2008; 3 (4): 235-245

View details for DOI 10.3233/AO-2008-0054

View details for Web of Science ID 000526966400006
A system for ontology-based annotation of biomedical data 5th International Workshop on Data Integration in the Life Sciences Jonquet, C., Musen, M. A., Shah, N. SPRINGER-VERLAG BERLIN. 2008: 144–152

View details for Web of Science ID 000257304400014
Comparison of ontology-based semantic-similarity measures. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Lee, W., Shah, N., Sundlass, K., Musen, M. 2008: 384-388

Abstract

Semantic-similarity measures quantify concept similarities in a given ontology. Potential applications for these measures include search, data mining, and knowledge discovery in database or decision-support systems that utilize ontologies. To date, there have not been comparisons of the different semantic-similarity approaches on a single ontology. Such a comparison can offer insight on the validity of different approaches. We compared 3 approaches to semantic similarity-metrics (which rely on expert opinion, ontologies only, and information content) with 4 metrics applied to SNOMED-CT. We found that there was poor agreement among those metrics based on information content with the ontology only metric. The metric based only on the ontology structure correlated most with expert opinion. Our results suggest that metrics based on the ontology only may be preferable to information-content-based metrics, and point to the need for more research on validating the different approaches.
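
The difference between an ontology-structure metric and an information-content metric can be illustrated with the sketch below. The abstract does not name the four metrics that were compared, so the two shown here (an inverse path-length score and a Resnik-style score) are assumed stand-ins for each family, and the hierarchy and counts are hypothetical toy data rather than SNOMED-CT.

```python
from math import log

# Hypothetical toy is_a tree: child -> parent (the root has parent None).
PARENT = {
    "disease": None,
    "heart disease": "disease",
    "lung disease": "disease",
    "myocardial infarction": "heart disease",
    "heart failure": "heart disease",
}

# Hypothetical corpus counts of direct mentions of each concept.
COUNTS = {
    "disease": 10,
    "heart disease": 20,
    "lung disease": 30,
    "myocardial infarction": 25,
    "heart failure": 15,
}


def ancestors(concept):
    """Return the concept followed by its ancestors up to the root."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def path_similarity(a, b):
    """Structure-only metric: 1 / (1 + number of is_a edges between a and b)."""
    chain_a, chain_b = ancestors(a), ancestors(b)
    for steps_a, node in enumerate(chain_a):
        if node in chain_b:
            return 1.0 / (1 + steps_a + chain_b.index(node))
    raise ValueError("concepts share no ancestor")


def information_content(concept):
    """IC = -log p(concept), where p counts the concept and all its descendants."""
    total = sum(COUNTS.values())
    subtree = sum(n for c, n in COUNTS.items() if concept in ancestors(c))
    return -log(subtree / total)


def resnik_similarity(a, b):
    """Information-content metric: IC of the most informative common ancestor."""
    common = set(ancestors(a)) & set(ancestors(b))
    return max(information_content(c) for c in common)


if __name__ == "__main__":
    print(path_similarity("myocardial infarction", "heart failure"))
    print(resnik_similarity("myocardial infarction", "heart failure"))
```

The structure-only score depends solely on the shape of the hierarchy, while the information-content score changes with the corpus counts, which is one reason the two families can disagree.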

View details for PubMedID 18999312
UMLS-Query: a perl module for querying the UMLS. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Shah, N. H., Musen, M. A. 2008: 652-656

Abstract

The Metathesaurus from the Unified Medical Language System (UMLS) is a widely used ontology resource, which is mostly used in a relational database form for terminology research, mapping and information indexing. A significant section of UMLS users use a MySQL installation of the metathesaurus and Perl programming language as their access mechanism. We describe UMLS-Query, a Perl module that provides functions for retrieving concept identifiers, mapping text-phrases to Metathesaurus concepts and graph traversal in the Metathesaurus stored in a MySQL database. UMLS-Query can be used to build applications for semi-automated sample annotation, terminology based browsers for tissue sample databases and for terminology research. We describe the results of such uses of UMLS-Query and present the module for others to use.

View details for PubMedID 18998805
Biomedical ontologies: a functional perspective BRIEFINGS IN BIOINFORMATICS Rubin, D. L., Shah, N. H., Noy, N. F. 2008; 9 (1): 75-90

Abstract

The information explosion in biology makes it difficult for researchers to stay abreast of current biomedical knowledge and to make sense of the massive amounts of online information. Ontologies--specifications of the entities, their attributes and relationships among the entities in a domain of discourse--are increasingly enabling biomedical researchers to accomplish these tasks. In fact, bio-ontologies are beginning to proliferate in step with accruing biological data. The myriad of ontologies being created not only enables researchers to solve some of the problems in handling the data explosion but also introduces new challenges. One of the key difficulties in realizing the full potential of ontologies in biomedical research is the isolation of various communities involved: some workers spend their career developing ontologies and ontology-related tools, while few researchers (biologists and physicians) know how ontologies can accelerate their research. The objective of this review is to give an overview of biomedical ontology in practical terms by providing a functional perspective--describing how bio-ontologies can be and are being used. As biomedical scientists begin to recognize the many different ways ontologies enable biomedical research, they will drive the emergence of new computer applications that will help them exploit the wealth of research data now at their fingertips.

View details for DOI 10.1093/bib/bbm059

View details for Web of Science ID 000251864600008

View details for PubMedID 18077472
The Stanford Tissue Microarray Database NUCLEIC ACIDS RESEARCH Marinelli, R. J., Montgomery, K., Liu, C. L., Shah, N. H., Prapong, W., Nitzberg, M., Zachariah, Z. K., Sherlock, G. J., Natkunam, Y., West, R. B., van de Rijn, M., Brown, P. O., Ball, C. A. 2008; 36: D871-D877

Abstract

The Stanford Tissue Microarray Database (TMAD; http://tma.stanford.edu) is a public resource for disseminating annotated tissue images and associated expression data. Stanford University pathologists, researchers and their collaborators worldwide use TMAD for designing, viewing, scoring and analyzing their tissue microarrays. The use of tissue microarrays allows hundreds of human tissue cores to be simultaneously probed by antibodies to detect protein abundance (Immunohistochemistry; IHC), or by labeled nucleic acids (in situ hybridization; ISH) to detect transcript abundance. TMAD archives multi-wavelength fluorescence and bright-field images of tissue microarrays for scoring and analysis. As of July 2007, TMAD contained 205 161 images archiving 349 distinct probes on 1488 tissue microarray slides. Of these, 31 306 images for 68 probes on 125 slides have been released to the public. To date, 12 publications have been based on these raw public data. TMAD incorporates the NCI Thesaurus ontology for searching tissues in the cancer domain. Image processing researchers can extract images and scores for training and testing classification algorithms. The production server uses the Apache HTTP Server, Oracle Database and Perl application code. Source code is available to interested researchers under a no-cost license.

View details for DOI 10.1093/nar/gkm861

View details for PubMedID 17989087
The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration NATURE BIOTECHNOLOGY Smith, B., Ashburner, M., Rosse, C., Bard, J., Bug, W., Ceusters, W., Goldberg, L. J., Eilbeck, K., Ireland, A., Mungall, C. J., Leontis, N., Rocca-Serra, P., Ruttenberg, A., Sansone, S., Scheuermann, R. H., Shah, N., Whetzel, P. L., Lewis, S. 2007; 25 (11): 1251-1255

Abstract

The value of any kind of data is greatly enhanced when it exists in a form that allows it to be integrated with other data. One approach to integration is through the annotation of multiple bodies of data using common controlled vocabularies or 'ontologies'. Unfortunately, the very success of this approach has led to a proliferation of ontologies, which itself creates obstacles to integration. The Open Biomedical Ontologies (OBO) consortium is pursuing a strategy to overcome this problem. Existing OBO ontologies, including the Gene Ontology, are undergoing coordinated reform, and new ontologies are being created on the basis of an evolving set of shared principles governing ontology development. The result is an expanding family of ontologies designed to be interoperable and logically well formed and to incorporate accurate representations of biological reality. We describe this OBO Foundry initiative and provide guidelines for those who might wish to become involved.

View details for DOI 10.1038/nbt1346

View details for Web of Science ID 000251086500025

View details for PubMedID 17989687

View details for PubMedCentralID PMC2814061
Current progress in network research: toward reference networks for key model organisms BRIEFINGS IN BIOINFORMATICS Srinivasan, B. S., Shah, N. H., Flannick, J. A., Abeliuk, E., Novak, A. F., Batzoglou, S. 2007; 8 (5): 318-332

Abstract

The collection of multiple genome-scale datasets is now routine, and the frontier of research in systems biology has shifted accordingly. Rather than clustering a single dataset to produce a static map of functional modules, the focus today is on data integration, network alignment, interactive visualization and ontological markup. Because of the intrinsic noisiness of high-throughput measurements, statistical methods have been central to this effort. In this review, we briefly survey available datasets in functional genomics, review methods for data integration and network alignment, and describe recent work on using network models to guide experimental validation. We explain how the integration and validation steps spring from a Bayesian description of network uncertainty, and conclude by describing an important near-term milestone for systems biology: the construction of a set of rich reference networks for key model organisms.

View details for DOI 10.1093/bib/bbm038

View details for Web of Science ID 000251034700005

View details for PubMedID 17728341
Annotation and query of tissue microarray data using the NCI Thesaurus BMC BIOINFORMATICS Shah, N. H., Rubin, D. L., Espinosa, I., Montgomery, K., Musen, M. A. 2007; 8

Abstract

The Stanford Tissue Microarray Database (TMAD) is a repository of data serving a consortium of pathologists and biomedical researchers. The tissue samples in TMAD are annotated with multiple free-text fields, specifying the pathological diagnoses for each sample. These text annotations are not structured according to any ontology, making future integration of this resource with other biological and clinical data difficult. We developed methods to map these annotations to the NCI thesaurus. Using the NCI-T we can effectively represent annotations for about 86% of the samples. We demonstrate how this mapping enables ontology driven integration and querying of tissue microarray data. We have deployed the mapping and ontology driven querying tools at the TMAD site for general use. We have demonstrated that we can effectively map the diagnosis-related terms describing a sample in TMAD to the NCI-T. The NCI thesaurus terms have a wide coverage and provide terms for about 86% of the samples. In our opinion the NCI thesaurus can facilitate integration of this resource with other biological data.

View details for DOI 10.1186/1471-2105-8-296

View details for Web of Science ID 000249734300001

View details for PubMedID 17686183

View details for PubMedCentralID PMC1988837
Interpretation errors related to the GO annotation file format. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Moreira, D. A., Shah, N. H., Musen, M. A. 2007: 538-542

Abstract

The Gene Ontology (GO) is the most widely used ontology for creating biomedical annotations. GO annotations are statements associating a biological entity with a GO term. These statements comprise a large dataset of biological knowledge that is used widely in biomedical research. GO Annotations are available as "gene association files" from the GO website in a tab-delimited file format (GO Annotation File Format) composed of rows of 15 tab-delimited fields. This simple format lacks the knowledge representation (KR) capabilities to represent unambiguously semantic relationships between each field. This paper demonstrates that this KR shortcoming leads users to interpret the files in ways that can be erroneous. We propose a complementary format to represent GO annotation files as knowledge bases using the W3C recommended Web Ontology Language (OWL).

View details for PubMedID 18693894
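
As an illustration of the flat format this abstract describes, the following minimal sketch reads one 15-field, tab-delimited annotation row. The column names follow the GAF 1.0 layout as commonly documented and are an assumption here, as are the helper names `parse_gaf_line` and `is_negated` and every value in the example row. The qualifier check shows one way a row can be misread when fields are taken in isolation; it is not taken from the paper itself.

```python
# Column names are assumed from the GAF 1.0 layout (15 tab-delimited fields).
GAF_COLUMNS = (
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence_code", "with_or_from", "aspect",
    "db_object_name", "db_object_synonym", "db_object_type",
    "taxon", "date", "assigned_by",
)


def parse_gaf_line(line):
    """Split one annotation row into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(GAF_COLUMNS):
        raise ValueError(
            f"expected {len(GAF_COLUMNS)} fields, got {len(fields)}"
        )
    return dict(zip(GAF_COLUMNS, fields))


def is_negated(row):
    """True when the qualifier column carries NOT (pipe-separated values)."""
    return "NOT" in row["qualifier"].split("|")


# Hypothetical row: every value below is made up for illustration.
example = "\t".join([
    "SGD", "S000000001", "GENE1", "NOT", "GO:0003700",
    "PMID:0000000", "IDA", "", "F", "hypothetical protein",
    "", "gene", "taxon:4932", "20070101", "SGD",
])

row = parse_gaf_line(example)
# Reading only go_id would suggest a positive association; the qualifier
# column reverses that reading.
print(row["db_object_symbol"], row["go_id"], "negated:", is_negated(row))
```

Nothing in the row itself states how the qualifier relates to the GO term; that relationship lives only in the format documentation, which is the kind of gap an explicit knowledge representation is meant to close.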
Using annotations from controlled vocabularies to find meaningful associations 4th International Workshop on Data Integration in the Life Sciences Lee, W., Raschid, L., Srinivasan, P., Shah, N., Rubin, D., Noy, N. SPRINGER-VERLAG BERLIN. 2007: 247–263

View details for Web of Science ID 000250355600019
Searching Ontologies Based on Content: Experiments in the Biomedical Domain 4th International Conference on Knowledge Capture Alani, H., Noy, N. F., Shah, N., Shadbolt, N., Musen, M. A. ASSOC COMPUTING MACHINERY. 2007: 55–62

View details for Web of Science ID 000266369400009
A case study in pathway knowledgebase verification BMC BIOINFORMATICS Racunas, S. A., Shah, N. H., Fedoroff, N. V. 2006; 7

Abstract

Biological databases and pathway knowledge-bases are proliferating rapidly. We are developing software tools for computer-aided hypothesis design and evaluation, and we would like our tools to take advantage of the information stored in these repositories. But before we can reliably use a pathway knowledge-base as a data source, we need to proofread it to ensure that it can fully support computer-aided information integration and inference. We design a series of logical tests to detect potential problems we might encounter using a particular knowledge-base, the Reactome database, with a particular computer-aided hypothesis evaluation tool, HyBrow. We develop an explicit formal language from the language implicit in the Reactome data format and specify a logic to evaluate models expressed using this language. We use the formalism of finite model theory in this work. We then use this logic to formulate tests for desirable properties (such as completeness, consistency, and well-formedness) for pathways stored in Reactome. We apply these tests to the publicly available Reactome releases (releases 10 through 14) and compare the results, which highlight Reactome's steady improvement in terms of decreasing inconsistencies. We also investigate and discuss Reactome's potential for supporting computer-aided inference tools. The case study described in this work demonstrates that it is possible to use our model theory based approach to identify problems one might encounter using a knowledge-base to support hypothesis evaluation tools. The methodology we use is general and is in no way restricted to the specific knowledge-base employed in this case study. Future application of this methodology will enable us to compare pathway resources with respect to the generic properties such resources will need to possess if they are to support automated reasoning.

View details for DOI 10.1186/1471-2105-7-196

View details for Web of Science ID 000239302400001

View details for PubMedID 16603083

View details for PubMedCentralID PMC1522024
Ontology-based annotation and query of tissue microarray data. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Shah, N. H., Rubin, D. L., Supekar, K. S., Musen, M. A. 2006: 709-713

Abstract

The Stanford Tissue Microarray Database (TMAD) is a repository of data amassed by a consortium of pathologists and biomedical researchers. The TMAD data are annotated with multiple free-text fields, specifying the pathological diagnoses for each tissue sample. These annotations are spread out over multiple text fields and are not structured according to any ontology, making it difficult to integrate this resource with other biological and clinical data. We developed methods to map these annotations to the NCI thesaurus and the SNOMED-CT ontologies. Using these two ontologies we can effectively represent about 80% of the annotations in a structured manner. This mapping offers the ability to perform ontology driven querying of the TMAD data. We also found that 40% of annotations can be mapped to terms from both ontologies, providing the potential to align the two ontologies based on experimental data. Our approach provides the basis for a data-driven ontology alignment by mapping annotations of experimental data.

View details for PubMedID 17238433
Temporal evolution of the Arabidopsis oxidative stress response PLANT MOLECULAR BIOLOGY Mahalingam, R., Shah, N., Scrymgeour, A., Fedoroff, N. 2005; 57 (5): 709-730

Abstract

We have carried out a detailed analysis of the changes in gene expression levels in Arabidopsis thaliana ecotype Columbia (Col-0) plants during and for 6 h after exposure to ozone (O3) at 350 parts per billion (ppb) for 6 h. This O3 exposure is sufficient to induce a marked transcriptional response and an oxidative burst, but not to cause substantial tissue damage in Col-0 wild-type plants and is within the range encountered in some major metropolitan areas. We have developed analytical and visualization tools to automate the identification of expression profile groups with common gene ontology (GO) annotations based on the sub-cellular localization and function of the proteins encoded by the genes, as well as to automate promoter analysis for such gene groups. We describe application of these methods to identify stress-induced genes whose transcript abundance is likely to be controlled by common regulatory mechanisms and summarize our findings in a temporal model of the stress response.

View details for DOI 10.1007/s11103-005-2860-4

View details for Web of Science ID 000231220400007

View details for PubMedID 15988565
HyBrow: a prototype system for computer-aided hypothesis evaluation. Bioinformatics Racunas, S. A., Shah, N. H., Albert, I., Fedoroff, N. V. 2004; 20: i257-64

Abstract

Experimental design, hypothesis-testing and model-building in the current data-rich environment require biologists to collect, evaluate and integrate large amounts of information of many disparate kinds. Developing a unified framework for the representation and conceptual integration of biological data and processes is a major challenge in bioinformatics because of the variety of available data and the different levels of detail at which biological processes can be considered. We have developed the HyBrow (Hypothesis Browser) system as a prototype bioinformatics tool for designing hypotheses and evaluating them for consistency with existing knowledge. HyBrow consists of a modeling framework with the ability to accommodate diverse biological information sources, an event-based ontology for representing biological processes at different levels of detail, a database to query information in the ontology and programs to perform hypothesis design and evaluation. We demonstrate the HyBrow prototype using the galactose gene network in Saccharomyces cerevisiae as our test system, and evaluate alternative hypotheses for consistency with stored information. Availability: www.hybrow.org

View details for PubMedID 15262807
HyBrow: a prototype system for computer-aided hypothesis evaluation BIOINFORMATICS Racunas, S. A., Shah, N. H., Albert, I., Fedoroff, N. V. 2004; 20: 257-264

View details for DOI 10.1093/bioinformatics/bth905

View details for Web of Science ID 000208392400034
CLENCH: a program for calculating Cluster ENriCHment using the Gene Ontology BIOINFORMATICS Shah, N. H., Fedoroff, N. V. 2004; 20 (7): 1196-1197

Abstract

Analysis of microarray data most often produces lists of genes with similar expression patterns, which are then subdivided into functional categories for biological interpretation. Such functional categorization is most commonly accomplished using Gene Ontology (GO) categories. Although there are several programs that identify and analyze functional categories for human, mouse and yeast genes, none of them accept Arabidopsis thaliana data. In order to address this need for the A.thaliana community, we have developed a program that retrieves GO annotations for A.thaliana genes and performs functional category analysis for lists of genes selected by the user. Availability: http://www.personal.psu.edu/nhs109/Clench

View details for DOI 10.1093/bioinformatics/bth056

View details for Web of Science ID 000221139700024

View details for PubMedID 14764555
A finite model theory for biological hypotheses IEEE Computational Systems Bioinformatics Conference (CSB 2004) Racunas, S., Griffin, C., Shah, N. IEEE COMPUTER SOC. 2004: 616–620

View details for Web of Science ID 000224127800106
A tool-kit for cDNA microarray and promoter analysis BIOINFORMATICS Shah, N. H., King, D. C., Shah, P. N., Fedoroff, N. V. 2003; 19 (14): 1846-1848

Abstract

We describe two sets of programs for expediting routine tasks in analysis of cDNA microarray data and promoter sequences. The first set permits bad data points to be flagged with respect to a number of parameters and performs normalization in three different ways. It allows combining of result files into comprehensive data sets, evaluation of the quality of both technical and biological replicates and row and/or column standardization of data matrices. The second set supports mapping ESTs in the genome, identifying the corresponding genes and recovering their promoters, analyzing promoters for transcription factor binding sites, and visual representation of the results. The programs are designed primarily for Arabidopsis thaliana researchers, but can be adapted readily for other model systems. Availability and Supplementary information: http://www.personal.psu.edu/nhs109/Programs/

View details for DOI 10.1093/bioinformatics/btg253

View details for Web of Science ID 000185701100017

View details for PubMedID 14512358
A contradiction-based framework for testing gene regulation hypotheses 2nd International Computational Systems Bioinformatics Conference Racunas, S., Shah, N., Fedoroff, N. V. IEEE COMPUTER SOC. 2003: 634–638

View details for Web of Science ID 000188997700143
Characterizing the stress/defense transcriptome of Arabidopsis GENOME BIOLOGY Mahalingam, R., Gomez-Buitrago, A., Eckardt, N., Shah, N., Guevara-Garcia, A., Day, P., Raina, R., Fedoroff, N. V. 2003; 4 (3)

Abstract

To understand the gene networks that underlie plant stress and defense responses, it is necessary to identify and characterize the genes that respond both initially and as the physiological response to the stress or pathogen develops. We used PCR-based suppression subtractive hybridization to identify Arabidopsis genes that are differentially expressed in response to ozone, bacterial and oomycete pathogens and the signaling molecules salicylic acid (SA) and jasmonic acid. We identified a total of 1,058 differentially expressed genes from eight stress cDNA libraries. Digital northern analysis revealed that 55% of the stress-inducible genes are rarely transcribed in unstressed plants and 17% of them were not previously represented in Arabidopsis expressed sequence tag databases. More than two-thirds of the genes in the stress cDNA collection have not been identified in previous studies as stress/defense response genes. Several stress-responsive cis-elements showed a statistically significant over-representation in the promoters of the genes in the stress cDNA collection. These include W- and G-boxes, the SA-inducible element, the abscisic acid response element and the TGA motif. The stress cDNA collection comprises a broad repertoire of stress-responsive genes encoding proteins that are involved in both the initial and subsequent stages of the physiological response to abiotic stress and pathogens. This set of stress-, pathogen- and hormone-modulated genes is an important resource for understanding the genetic interactions underlying stress signaling and responses and may contribute to the characterization of the stress transcriptome through the construction of standardized specialized arrays.

View details for Web of Science ID 000182694200009

View details for PubMedID 12620105
StressDB: A locally installable web-based relational microarray database designed for small user communities COMPARATIVE AND FUNCTIONAL GENOMICS Mitra, M., Shah, N., Mueller, L., Pin, S., Fedoroff, N. 2002; 3 (2): 91-96

Abstract

We have built a microarray database, StressDB, for management of microarray data from our studies on stress-modulated genes in Arabidopsis. StressDB provides small user groups with a locally installable web-based relational microarray database. It has a simple and intuitive architecture and has been designed for cDNA microarray technology users. StressDB uses Windows 2000 as the centralized database server with Oracle 8i as the relational database management system. It allows users to manage microarray data and data-related biological information over the Internet using a web browser. The source-code is currently available on request from the authors and will soon be made freely available for downloading from our website at http://arastressdb.cac.psu.edu.

View details for DOI 10.1002/cfg.153

View details for Web of Science ID 000175388900001

View details for PubMedID 18628845

View details for PubMedCentralID PMC2447266
