Bio


Dr. Suzanne Tamang is an Assistant Professor in the Department of Medicine, Division of Immunology and Rheumatology, and a Faculty Fellow at the Stanford Center for Population Health Sciences. She is also the Computational Systems Evaluation Lead at the VA Office of Mental Health and Suicide Prevention's Program Evaluation Resource Center. Dr. Tamang uses her training in biology, computer science, health services research and biomedical informatics to work with interdisciplinary teams of experts on population health problems of public interest. Integral to her research is the analysis of large and complex population-based datasets, using techniques from natural language processing, machine learning and deep learning. Her expertise spans US and Danish population-based registries, Electronic Medical Records from various vendors, administrative healthcare claims and other types of observational health and demographic data sources in the US and internationally, as well as constructing, populating and applying knowledge-bases for automated reasoning. Dr. Tamang has developed open-source tools for the extraction of health information from unstructured free-text clinical progress notes and licensed machine learning prediction models to Silicon Valley health analytics startups. She is the faculty mentor for the Stanford community working group Stats for Social Good.

Administrative Appointments


  • Computational Systems Evaluation Lead, Program Evaluation Resource Center, Office of Mental Health and Suicide Prevention, VA (2023 - Present)
  • Faculty Fellow, Center for Population Health Sciences (2023 - Present)

Professional Education


  • Postdoctoral Training, Stanford School of Medicine, Biomedical Informatics (2015)
  • Doctor of Philosophy, Graduate Center, City University of New York (CUNY), Computer Science (2013)
  • Master of Science, Brooklyn College, CUNY, Computer Science and Health Science (2006)
  • Bachelor of Science, Brooklyn College, CUNY, Biology

All Publications


  • Disparities in Cancer Mortality among Disaggregated Asian American Subpopulations, 2018-2021. Journal of racial and ethnic health disparities Zhu, D. T., Lai, A., Park, A., Zhong, A., Tamang, S. 2024

    Abstract

    Federal, state, and institutional data collection practices and analyses involving Asian Americans as a single, aggregated group obscure critical health disparities among the vast diversity of Asian American subpopulations. Using data from the Centers for Disease Control and Prevention Wide-Ranging Online Data for Epidemiologic Research (CDC WONDER) Underlying Causes of Death database, we conducted a cross-sectional study of disaggregated Asian American subgroups (Asian Indian, Chinese, Filipino, Japanese, Korean, Vietnamese, other Asians) between 2018 and 2021. We examined deaths from 22 cancer types and in situ, benign neoplasms, identified using ICD-10 codes C00-C97 and D00-D48. Overall, our study comprised 327,311 Asian American decedents, with a mean age of death at 70.57 years (SD=2.79), wherein females accounted for approximately half of the sample (n=36,596/73,207; 49.99%). Notably, compared to the aggregated Asian American reference group, we found higher proportions of deaths from total cancers among Chinese (25.99% vs. 22.37% [ref]), Korean (25.29% vs. 22.37% [ref]), and Vietnamese (24.98% vs. 22.37% [ref]) subgroups. In contrast, total cancer deaths were less prevalent among Asian Indians (17.49% vs. 22.37% [ref]), Japanese (18.90% vs. 22.37% [ref]), and other Asians (20.37% vs. 22.37% [ref]). We identified further disparities by cancer type, sex, and age. Disaggregated data collection and analyses are imperative to understanding differences in cancer mortality among Asian American subgroups, illustrating at-risk populations with greater granularity. Future studies should aim to describe the association between these trends and social, demographic, and environmental risk factors.

    View details for DOI 10.1007/s40615-024-02067-0

    View details for PubMedID 38918322

    View details for PubMedCentralID 5325676

  • Evaluating accuracy and fairness of clinical decision support algorithms when health care resources are limited. Journal of biomedical informatics Meerwijk, E. L., Mcelfresh, D. C., Martins, S., Tamang, S. R. 2024: 104664

    Abstract

    Guidance on how to evaluate accuracy and algorithmic fairness across subgroups is missing for clinical models that flag patients for an intervention when health care resources to administer that intervention are limited. We aimed to propose a framework of metrics that would fit this specific use case. We evaluated the following metrics and applied them to a Veterans Health Administration clinical model that flags patients for intervention who are at risk of overdose or a suicidal event among outpatients who were prescribed opioids (N = 405,817): receiver operating characteristic and area under the curve, precision-recall curve, calibration-reliability curve, false positive rate, false negative rate, and false omission rate. In addition, we developed a new approach to visualize false positives and false negatives that we named 'per true positive bars.' We demonstrate the utility of these metrics to our use case for three cohorts of patients at the highest risk (top 0.5%, 1.0%, and 5.0%) by evaluating algorithmic fairness across the following age groups: <30, <50, <=65, and >65 years old. Metrics that allowed us to assess group differences more clearly were the false positive rate, false negative rate, false omission rate, and the new 'per true positive bars'. Metrics with limited utility to our use case were the receiver operating characteristic and area under the curve, the calibration-reliability curve, and the precision-recall curve. CONCLUSION: There is no "one size fits all" approach to model performance monitoring and bias analysis. Our work informs future researchers and clinicians who seek to evaluate accuracy and fairness of predictive models that identify patients to intervene on in the context of limited health care resources. In terms of ease of interpretation and utility for our use case, the new 'per true positive bars' may be the most intuitive to a range of stakeholders and facilitates choosing a threshold that allows weighing false positives against false negatives, which is especially important when predicting severe adverse events.

    View details for DOI 10.1016/j.jbi.2024.104664

    View details for PubMedID 38851413

  • Question-answering system extracts information on injection drug use from clinical notes. Communications medicine Mahbub, M., Goethert, I., Danciu, I., Knight, K., Srinivasan, S., Tamang, S., Rozenberg-Ben-Dror, K., Solares, H., Martins, S., Trafton, J., Begoli, E., Peterson, G. D. 2024; 4 (1): 61

    Abstract

    Injection drug use (IDU) can increase mortality and morbidity. Therefore, identifying IDU early and initiating harm reduction interventions can benefit individuals at risk. However, extracting IDU behaviors from patients' electronic health records (EHR) is difficult because there is no other structured data available, such as International Classification of Disease (ICD) codes, and IDU is most often documented in unstructured free-text clinical notes. Although natural language processing can efficiently extract this information from unstructured data, there are no validated tools. To address this gap in clinical information, we design a question-answering (QA) framework to extract information on IDU from clinical notes for use in clinical operations. Our framework involves two main steps: (1) generating a gold-standard QA dataset and (2) developing and testing the QA model. We use 2323 clinical notes of 1145 patients curated from the US Department of Veterans Affairs (VA) Corporate Data Warehouse to construct the gold-standard dataset for developing and evaluating the QA model. We also demonstrate the QA model's ability to extract IDU-related information from temporally out-of-distribution data. Here, we show that for a strict match between gold-standard and predicted answers, the QA model achieves a 51.65% F1 score. For a relaxed match between the gold-standard and predicted answers, the QA model obtains a 78.03% F1 score, along with 85.38% Precision and 79.02% Recall scores. Moreover, the QA model demonstrates consistent performance when subjected to temporally out-of-distribution data. Our study introduces a QA framework designed to extract IDU information from clinical notes, aiming to enhance the accurate and efficient detection of people who inject drugs, extract relevant information, and ultimately facilitate informed patient care.

    View details for DOI 10.1038/s43856-024-00470-6

    View details for PubMedID 38570620

    View details for PubMedCentralID PMC10991373

  • Disaggregating Asian-American Mortality in Drug-Related Overdoses and Behavioral Disorders: A Cross-Sectional Study. Journal of racial and ethnic health disparities Zhu, D. T., Zhong, A., Ho, W. J., Tamang, S. 2024

    Abstract

    Asian Americans have been historically underrepresented in the national drug overdose discourse due to their lower substance use and overdose rates compared to other racial/ethnic groups. However, aggregated analyses fail to capture the vast diversity among Asian-American subgroups, obscuring critical disparities. We conducted a cross-sectional study between 2018 and 2021 examining Asian-American individuals within the CDC WONDER database with drug overdoses as the underlying cause of death (n = 3195; ICD-10 codes X40-X44, X60-X64, X85, and Y10-Y14) or psychoactive substance-related mental and behavioral disorders as one of multiple causes of death (n = 15,513; ICD-10 codes F10-F19). Proportional mortality ratios were calculated, comparing disaggregated Asian-American subgroups to the reference group (Asian Americans as a single aggregate group). Z-tests identified significant differences between subgroups. Compared to the reference group (0.99%), drug overdose deaths were less prevalent among Japanese (0.46%; p < 0.001), Chinese (0.47%; p < 0.001), and Filipino (0.82%; p < 0.001) subgroups, contrasting with a higher prevalence among Asian Indian (1.20%; p < 0.001), Vietnamese (1.35%; p < 0.001), Korean (1.36%; p < 0.001), and other Asian (1.79%; p < 0.001) subgroups. Similarly, compared to the reference group (4.80%), deaths from mental and behavioral disorders were less prevalent among Chinese (3.18%; p < 0.001), Filipino (4.52%; p < 0.001), and Asian Indian (4.56%; p < 0.001) subgroups, while more prevalent among Korean (5.60%; p < 0.001), Vietnamese (5.64%; p < 0.001), Japanese (5.81%; p < 0.001), and other Asian (6.14%; p < 0.001) subgroups. Disaggregated data also revealed substantial geographical variations in these deaths obscured by aggregated analyses. Our findings revealed pronounced intra-racial disparities, underscoring the importance of data disaggregation to inform targeted clinical and public health interventions.

    View details for DOI 10.1007/s40615-024-01983-5

    View details for PubMedID 38530623

    View details for PubMedCentralID 9539247

  • High dimensional predictions of suicide risk in 4.2 million US Veterans using ensemble transfer learning. Scientific reports Dhaubhadel, S., Ganguly, K., Ribeiro, R. M., Cohn, J. D., Hyman, J. M., Hengartner, N. W., Kolade, B., Singley, A., Bhattacharya, T., Finley, P., Levin, D., Thelen, H., Cho, K., Costa, L., Ho, Y. L., Justice, A. C., Pestian, J., Santel, D., Zamora-Resendiz, R., Crivelli, S., Tamang, S., Martins, S., Trafton, J., Oslin, D. W., Beckham, J. C., Kimbrel, N. A., McMahon, B. H. 2024; 14 (1): 1793

    Abstract

    We present an ensemble transfer learning method to predict suicide from Veterans Affairs (VA) electronic medical records (EMR). A diverse set of base models was trained to predict a binary outcome constructed from reported suicide, suicide attempt, and overdose diagnoses with varying choices of study design and prediction methodology. Each model used twenty cross-sectional and 190 longitudinal variables observed in eight time intervals covering 7.5 years prior to the time of prediction. Ensembles of seven base models were created and fine-tuned with ten variables expected to change with study design and outcome definition in order to predict suicide and combined outcome in a prospective cohort. The ensemble models achieved c-statistics of 0.73 on 2-year suicide risk and 0.83 on the combined outcome when predicting on a prospective cohort of 4.2 M veterans. The ensembles rely on nonlinear base models trained using a matched retrospective nested case-control (Rcc) study cohort and show good calibration across a diversity of subgroups, including risk strata, age, sex, race, and level of healthcare utilization. In addition, a linear Rcc base model provided a rich set of biological predictors, including indicators of suicide, substance use disorder, mental health diagnoses and treatments, hypoxia and vascular damage, and demographics.

    View details for DOI 10.1038/s41598-024-51762-9

    View details for PubMedID 38245528

    View details for PubMedCentralID 5319859

  • Development of a 3-Step theory of suicide ontology to facilitate 3ST factor extraction from clinical progress notes. Journal of biomedical informatics Meerwijk, E. L., Jones, G. A., Shotqara, A. S., Reyes, S., Tamang, S. R., Eddington, H. S., Reeves, R. M., Finlay, A. K., Harris, A. H. 2023: 104582

    Abstract

    OBJECTIVE: Suicide risk prediction algorithms at the Veterans Health Administration (VHA) do not include predictors based on the 3-Step Theory of suicide (3ST), which builds on hopelessness, psychological pain, connectedness, and capacity for suicide. These four factors are not available from structured fields in VHA electronic health records, but they are found in unstructured clinical text. An ontology and controlled vocabulary that maps psychosocial and behavioral terms to these factors does not exist. The objectives of this study were 1) to develop an ontology with a controlled vocabulary of terms that map onto classes that represent the 3ST factors as identified within electronic clinical progress notes, and 2) to determine the accuracy of automated extractions based on terms in the controlled vocabulary. METHODS: A team of four annotators did linguistic annotation of 30,000 clinical progress notes from 231 Veterans in VHA electronic health records who attempted suicide or who died by suicide for terms relating to the 3ST factors. Annotation involved manually assigning a label to words or phrases that indicated presence or absence of the factor (polarity). These words and phrases were entered into a controlled vocabulary that was then used by our computational system to tag 14 million clinical progress notes from Veterans who attempted or died by suicide after 2013. Tagged text was extracted and machine-labelled for presence or absence of the 3ST factors. Accuracy of these machine-labels was determined for 1000 randomly selected extractions for each factor against a ground truth created by our annotators. RESULTS: Linguistic annotation identified 8486 terms that related to 33 subclasses across the four factors and polarities. Precision of machine-labeled extractions ranged from 0.73 to 1.00 for most factor-polarity combinations, whereas recall was somewhat lower (0.65-0.91). CONCLUSION: The ontology that was developed consists of classes that represent each of the four 3ST factors, subclasses, relationships, and terms that map onto those classes which are stored in a controlled vocabulary (https://bioportal.bioontology.org/ontologies/THREE-ST). The use case that we present shows how scores based on clinical notes tagged for terms in the controlled vocabulary capture meaningful change in the 3ST factors during weeks preceding a suicidal event.

    View details for DOI 10.1016/j.jbi.2023.104582

    View details for PubMedID 38160758

  • Deploying a national clinical text processing infrastructure. Journal of the American Medical Informatics Association : JAMIA McManus, K. F., Stringer, J. M., Corson, N., Fodeh, S., Steinhardt, S., Levin, F. L., Shotqara, A. S., D'Auria, J., Fielstein, E. M., Gobbel, G. T., Scott, J., Trafton, J. A., Taddei, T. H., Erdos, J., Tamang, S. R. 2023

    Abstract

    OBJECTIVES: Clinical text processing offers a promising avenue for improving multiple aspects of healthcare, though operational deployment remains a substantial challenge. This case report details the implementation of a national clinical text processing infrastructure within the Department of Veterans Affairs (VA). METHODS: Two foundational use cases, cancer case management and suicide and overdose prevention, illustrate how text processing can be practically implemented at scale for diverse clinical applications using shared services. RESULTS: Insights from these use cases underline both commonalities and differences, providing a replicable model for future text processing applications. CONCLUSIONS: This project enables more efficient initiation, testing, and future deployment of text processing models, streamlining the integration of these use cases into healthcare operations. This project implementation is in a large integrated health delivery system in the United States, but we expect the lessons learned to be relevant to any health system, including smaller local and regional health systems in the United States.

    View details for DOI 10.1093/jamia/ocad249

    View details for PubMedID 38146986

  • The Problem of Pain in Rheumatology: Variations in Case Definitions Derived From Chronic Pain Phenotyping Algorithms Using Electronic Health Records. The Journal of rheumatology Falasinnu, T., Nguyen, T., En Jiang, T., Tamang, S., Chaichian, Y., Darnall, B. D., Mackey, S., Simard, J. F., Chen, J. H. 2023

    Abstract

    The aim of this study was to investigate and compare different case definitions for chronic pain to provide estimates of possible misclassification when researchers are limited by available electronic health record and administrative claims data, allowing for greater precision in case definitions. We compared the prevalence of different case definitions for chronic pain (N = 3042) in patients with autoimmune rheumatic diseases. We estimated the prevalence of chronic pain based on 15 unique combinations of pain scores, diagnostic codes, analgesic medications, and pain interventions. Chronic pain prevalence was lowest in unimodal pain phenotyping algorithms: 15% using analgesic medications, 18% using pain scores, 21% using pain diagnostic codes, and 22% using pain interventions. In comparison, the prevalence using a well-validated phenotyping algorithm was 37%. The prevalence of chronic pain also increased with the increasing number (bimodal to quadrimodal) of phenotyping algorithms that comprised the multimodal phenotyping algorithms. The highest estimated chronic pain prevalence (47%) was the multimodal phenotyping algorithm that combined pain scores, diagnostic codes, analgesic medications, and pain interventions. However, this quadrimodal phenotyping algorithm yielded a 10% overestimation of chronic pain compared to the well-validated algorithm. This is the first empirical study to our knowledge that shows that established common modes of phenotyping chronic pain can lead to substantially varying estimates of the number of patients with chronic pain. These findings can be a reference for biases in case definitions for chronic pain and could be used to estimate the extent of possible misclassifications or corrections in using datasets that cannot include specific data elements.

    View details for DOI 10.3899/jrheum.2023-0416

    View details for PubMedID 38101917

  • Eliminating Algorithmic Racial Bias in Clinical Decision Support Algorithms: Use Cases from the Veterans Health Administration. Health equity List, J. M., Palevsky, P., Tamang, S., Crowley, S., Au, D., Yarbrough, W. C., Navathe, A. S., Kreisler, C., Parikh, R. B., Wang-Rodriguez, J., Klutts, J. S., Conlin, P., Pogach, L., Meerwijk, E., Moy, E. 2023; 7 (1): 809-816

    Abstract

    The Veterans Health Administration uses equity- and evidence-based principles to examine, correct, and eliminate use of potentially biased clinical equations and predictive models. We discuss the processes, successes, challenges, and next steps in four examples. We detail elimination of the race modifier for estimated kidney function and discuss steps to achieve more equitable pulmonary function testing measurement. We detail the use of equity lenses in two predictive clinical modeling tools: Stratification Tool for Opioid Risk Mitigation (STORM) and Care Assessment Need (CAN) predictive models. We conclude with consideration of ways to advance racial health equity in clinical decision support algorithms.

    View details for DOI 10.1089/heq.2023.0037

    View details for PubMedID 38076213

    View details for PubMedCentralID PMC10698768

  • Editorial: Artificial intelligence for human function and disability. Frontiers in digital health Newman-Griffis, D. R., Desmet, B., Zirikly, A., Tamang, S., Chang, C. H. 2023; 5: 1282287

    View details for DOI 10.3389/fdgth.2023.1282287

    View details for PubMedID 37744682

    View details for PubMedCentralID PMC10515276

  • The emerging fentanyl-xylazine syndemic in the USA: challenges and future directions. Lancet (London, England) Zhu, D. T., Friedman, J., Bourgois, P., Montero, F., Tamang, S. 2023

    View details for DOI 10.1016/S0140-6736(23)01686-0

    View details for PubMedID 37634523

  • A call for better validation of opioid overdose risk algorithms. Journal of the American Medical Informatics Association : JAMIA McElfresh, D. C., Chen, L., Oliva, E., Joyce, V., Rose, S., Tamang, S. 2023

    Abstract

    Clinical decision support (CDS) systems powered by predictive models have the potential to improve the accuracy and efficiency of clinical decision-making. However, without sufficient validation, these systems have the potential to mislead clinicians and harm patients. This is especially true for CDS systems used by opioid prescribers and dispensers, where a flawed prediction can directly harm patients. To prevent these harms, regulators and researchers have proposed guidance for validating predictive models and CDS systems. However, this guidance is not universally followed and is not required by law. We call on CDS developers, deployers, and users to hold these systems to higher standards of clinical and technical validation. We provide a case study on two CDS systems deployed on a national scale in the United States for predicting a patient's risk of adverse opioid-related events: the Stratification Tool for Opioid Risk Mitigation (STORM), used by the Veterans Health Administration, and NarxCare, a commercial system.

    View details for DOI 10.1093/jamia/ocad110

    View details for PubMedID 37428897

  • Promises and perils of the FDA's over-the-counter naloxone reclassification. Lancet regional health. Americas Zhu, D. T., Tamang, S., Humphreys, K. 2023; 23: 100518

    View details for DOI 10.1016/j.lana.2023.100518

    View details for PubMedID 37497396

  • Sarcoidosis rates in BCG-vaccinated and unvaccinated young adults: A natural experiment using Danish registers. Seminars in arthritis and rheumatism Baker, M. C., Vágó, E., Tamang, S., Horváth-Puhó, E., Sørensen, H. T. 2023; 60: 152205

    Abstract

    Sarcoidosis may have an infectious trigger, including Mycobacterium spp. The Bacille Calmette-Guérin (BCG) vaccine provides partial protection against tuberculosis and induces trained immunity. We examined the incidence rate (IR) of sarcoidosis in Danish individuals born during high BCG vaccine uptake (born before 1976) compared with individuals born during low BCG vaccine uptake (born in or after 1976). We performed a quasi-randomized registry-based incidence study using data from the Danish Civil Registration System and the Danish National Patient Registry between 1995 and 2016. We included individuals aged 25-35 years old and born between 1970 and 1981. Using Poisson regression models, we calculated the incidence rate ratio (IRR) of sarcoidosis in individuals born during low BCG vaccine uptake versus high BCG vaccine uptake, adjusting for age and calendar year (separately for men and women). The IR of sarcoidosis was increased for individuals born during low BCG vaccine uptake compared with individuals born during high BCG vaccine uptake, which was largely attributed to men. The IRR of sarcoidosis for men born during low BCG vaccine uptake versus high BCG vaccine uptake was 1.22 (95% confidence interval [CI] 1.02-1.45). In women, the IRR was 1.08 (95% CI 0.88-1.31). In this quasi-experimental study that minimizes confounding, the time period with high BCG vaccine uptake was associated with a lower incidence rate of sarcoidosis in men, with a similar effect seen in women that did not reach significance. Our findings support a potential protective effect of BCG vaccination against the development of sarcoidosis. Future interventional studies for high-risk individuals could be considered.

    View details for DOI 10.1016/j.semarthrit.2023.152205

    View details for PubMedID 37054583

  • Practical Considerations for Developing Clinical Natural Language Processing Systems for Population Health Management and Measurement. JMIR medical informatics Tamang, S., Humbert-Droz, M., Gianfrancesco, M., Izadi, Z., Schmajuk, G., Yazdany, J. 2023; 11: e37805

    Abstract

    Experts have noted a concerning gap between clinical natural language processing (NLP) research and real-world applications, such as clinical decision support. To help address this gap, in this viewpoint, we enumerate a set of practical considerations for developing an NLP system to support real-world clinical needs and improve health outcomes. They include determining (1) the readiness of the data and compute resources for NLP, (2) the organizational incentives to use and maintain the NLP systems, and (3) the feasibility of implementation and continued monitoring. These considerations are intended to benefit the design of future clinical NLP projects and can be applied across a variety of settings, including large health systems or smaller clinical practices that have adopted electronic medical records in the United States and globally.

    View details for DOI 10.2196/37805

    View details for PubMedID 36595345

  • Revelations from a Machine Learning Analysis of the Most Downloaded Articles Published in Journal of Palliative Medicine 1999-2018. Journal of palliative medicine Tamang, S., Jin, Z., Periyakoil, V. S. 2023; 26 (1): 13-16

    Abstract

    The Journal of Palliative Medicine (JPM) is globally recognized as a leading interdisciplinary peer-reviewed palliative care journal providing balanced information that informs and improves the practice of palliative care. JPM shapes the values, integrity, and standards of the subspecialty of palliative medicine by what it chooses to publish. The global JPM readership chooses to download the articles that are of most relevance and utility to them. Utilizing machine learning methods, the top 100 most downloaded articles in JPM were analyzed to gain a better understanding of any latent trends and patterns in the topics between 1999 and 2018. The top five topic themes identified in the first decade were different from the ones identified in the second decade of publication. There is evidence of differentiation and maturation of the field in the context of comprehensive health care. Although noncancer serious illnesses have still not risen to the same prominence as cancer palliation, there is a directional quality to the emerging evidence as it pertains to cardiac, respiratory, neurological, renal, and other etiologies. Across both decades under study, there was persistent evidence of the importance of understanding and managing the mental health care needs of seriously ill patients and their families. A cause for concern is that the word "spirituality" was prominent in the first decade and was lacking in the second. Future palliative care clinical and research initiatives should focus on its development as an essential interprofessional and medical subspecialty germane to all types of serious illnesses and across all venues.

    View details for DOI 10.1089/jpm.2022.0574

    View details for PubMedID 36607778

  • cpgQA: A Benchmark Dataset for Machine Reading Comprehension Tasks on Clinical Practice Guidelines and a Case Study Using Transfer Learning IEEE ACCESS Mahbub, M., Begoli, E., Martins, S., Peluso, A., Tamang, S., Peterson, G. 2023; 11: 3691-3705
  • Development and validation of MedDRA Tagger: a tool for extraction and structuring medical information from clinical notes. medRxiv : the preprint server for health sciences Humbert-Droz, M., Corley, J., Tamang, S., Gevaert, O. 2022

    Abstract

    Rapid and automated extraction of clinical information from patients' notes is a desirable though difficult task. Natural language processing (NLP) and machine learning have great potential to automate and accelerate such applications, but developing such models can require a large amount of labeled clinical text, which can be a slow and laborious process. To address this gap, we propose the MedDRA tagger, a fast annotation tool that makes use of industrial level libraries such as spaCy, biomedical ontologies and weak supervision to annotate and extract clinical concepts at scale. The tool can be used to annotate clinical text and obtain labels for training machine learning models and further refine the clinical concept extraction performance, or to extract clinical concepts for observational study purposes. To demonstrate the usability and versatility of our tool, we present three different use cases: we use the tagger to determine patients with a primary brain cancer diagnosis, we show evidence of rising mental health symptoms at the population level and our last use case shows the evolution of COVID-19 symptomatology throughout three waves between February 2020 and October 2021. The validation of our tool showed good performance on both specific annotations from our development set (F1 score 0.81) and open source annotated data set (F1 score 0.79). We successfully demonstrate the versatility of our pipeline with three different use cases. Finally, we note that the modular nature of our tool allows for a straightforward adaptation to another biomedical ontology. We also show that our tool is independent of EHR system, and as such generalizable.

    View details for DOI 10.1101/2022.12.14.22283470

    View details for PubMedID 36561189

  • Sarcoidosis incidence after mTOR inhibitor treatment. Seminars in arthritis and rheumatism Baker, M. C., Vago, E., Liu, Y., Lu, R., Tamang, S., Horvath-Puho, E., Sorensen, H. T. 2022; 57: 152102

    Abstract

    OBJECTIVE: Mechanistic target of rapamycin (mTOR) inhibitors are effective in animal models of granulomatous disease, but their benefit in sarcoidosis patients is unknown. We evaluated the incidence of sarcoidosis in patients treated with mTOR inhibitors versus calcineurin inhibitors. METHODS: This was a cohort study using the Optum Clinformatics Data Mart (CDM) Database (2003-2019), IBM MarketScan Research Database (2006-2016), and Danish health and administrative registries (1996-2018). Patients aged ≥18 years with ≥1 year continuous enrollment before and after kidney, liver, heart, or lung transplant treated with an mTOR inhibitor or calcineurin inhibitor were included. Patients diagnosed with sarcoidosis before, or up to 90 days after, transplant were excluded. The incidence of sarcoidosis by treatment group was calculated. RESULTS: In the Optum CDM/IBM MarketScan cohort, 1,898 patients were treated with an mTOR inhibitor (mean age 49 years; 34% female) and 9,894 patients were treated with a calcineurin inhibitor (mean age 50 years; 37% female). The mean follow-up in the mTOR inhibitor group was 1.1 years, with no incident sarcoidosis diagnosed. In the calcineurin inhibitor group, the mean follow-up was 2.2 years, with 12 incident sarcoidosis cases diagnosed. In the Danish cohort, 230 patients were treated with an mTOR inhibitor (mean age 49; 45% female), with no incident sarcoidosis diagnosed. There were 3,411 patients treated with a calcineurin inhibitor (mean age 45; 40% female), with 10 incident cases of sarcoidosis diagnosed. CONCLUSIONS: This study indicates a potential protective effect of mTOR inhibitor treatment compared with calcineurin inhibitor treatment against the development of sarcoidosis.

    View details for DOI 10.1016/j.semarthrit.2022.152102

    View details for PubMedID 36182721

  • Sarcoidosis in patients after solid organ transplantation treated with mTOR inhibitors versus calcineurin inhibitors Vago, E. K., Baker, M. C., Tamang, S., Sorensen, H., Horvath-Puho, E. WILEY. 2022: 263-264
  • Sarcoidosis Incidence After mTOR Inhibitor Treatment Baker, M., Vago, E., Liu, Y., Lu, R., Tamang, S., Horvath-Puho, E., Sorensen, H. WILEY. 2022: 254-256
  • Sarcoidosis Rates in BCG-Vaccinated and Unvaccinated Young Adults: A Danish Register-Based Study Baker, M., Vago, E., Tamang, S., Horvath-Puho, E., Sorensen, H. WILEY. 2022: 2207-2208
  • Application of Natural Language Processing to Identify Varicella Zoster Infection in Clinical Notes Ho, A., Izadi, Z., Schmajuk, G., Yazdany, J., Tamang, S., Gianfrancesco, M. WILEY. 2022: 1455-1457
  • Suicide theory-guided natural language processing of clinical progress notes to improve prediction of veteran suicide risk: protocol for a mixed-method study. BMJ open Meerwijk, E. L., Tamang, S. R., Finlay, A. K., Ilgen, M. A., Reeves, R. M., Harris, A. H. 2022; 12 (8): e065088

    Abstract

    The state-of-the-art 3-step Theory of Suicide (3ST) describes why people consider suicide and who will act on their suicidal thoughts and attempt suicide. The central concepts of 3ST (psychological pain, hopelessness, connectedness, and capacity for suicide) are among the most important drivers of suicidal behaviour but they are missing from clinical suicide risk prediction models in use at the US Veterans Health Administration (VHA). These four concepts are not systematically recorded in structured fields of VHA's electronic healthcare records. Therefore, this study will develop a domain-specific ontology that will enable automated extraction of these concepts from clinical progress notes using natural language processing (NLP), and test whether NLP-based predictors for these concepts improve accuracy of existing VHA suicide risk prediction models. Our mixed-method study has an exploratory sequential design where a qualitative component (aim 1) will inform quantitative analyses (aims 2 and 3). For aim 1, subject matter experts will manually annotate progress notes of clinical encounters with veterans who attempted or died by suicide to develop a domain-specific ontology for the 3ST concepts. During aim 2, we will use NLP to machine-annotate clinical progress notes and derive longitudinal representations for each patient with respect to the presence and intensity of hopelessness, psychological pain, connectedness and capacity for suicide in temporal proximity of suicide attempts and deaths by suicide. These longitudinal representations will be evaluated during aim 3 for their ability to improve existing VHA prediction models of suicide and suicide attempts, STORM (Stratification Tool for Opioid Risk Mitigation) and REACHVET (Recovery Engagement and Coordination for Health - Veterans Enhanced Treatment). Ethics approval for this study was granted by the Stanford University Institutional Review Board and the Research and Development Committee of the VA Palo Alto Health Care System. Results of the study will be disseminated through several outlets, including peer-reviewed publications and presentations at national conferences.

    View details for DOI 10.1136/bmjopen-2022-065088

    View details for PubMedID 36002210

  • Preparing for the next pandemic via transfer learning from existing diseases with hierarchical multi-modal BERT: a study on COVID-19 outcome prediction. Scientific reports Agarwal, K., Choudhury, S., Tipirneni, S., Mukherjee, P., Ham, C., Tamang, S., Baker, M., Tang, S., Kocaman, V., Gevaert, O., Rallo, R., Reddy, C. K. 2022; 12 (1): 10748

    Abstract

    Developing prediction models for emerging infectious diseases from relatively small numbers of cases is a critical need for improving pandemic preparedness. Using COVID-19 as an exemplar, we propose a transfer learning methodology for developing predictive models from multi-modal electronic healthcare records by leveraging information from more prevalent diseases with shared clinical characteristics. Our novel hierarchical, multi-modal model ([Formula: see text]) integrates baseline risk factors from the natural language processing of clinical notes at admission, time-series measurements of biomarkers obtained from laboratory tests, and discrete diagnostic, procedure and drug codes. We demonstrate the alignment of [Formula: see text]'s predictions with well-established clinical knowledge about COVID-19 through univariate and multivariate risk factor driven sub-cohort analysis. [Formula: see text]'s superior performance over state-of-the-art methods shows that leveraging patient data across modalities and transferring prior knowledge from similar disorders is critical for accurate prediction of patient outcomes, and this approach may serve as an important tool in the early response to future pandemics.

    View details for DOI 10.1038/s41598-022-13072-w

    View details for PubMedID 35750878

  • A case for developing domain-specific vocabularies for extracting suicide factors from healthcare notes. Journal of psychiatric research Morrow, D., Zamora-Resendiz, R., Beckham, J. C., Kimbrel, N. A., Oslin, D. W., Tamang, S., Million Veteran Program Suicide Exemplar Work Group, Crivelli, S. 2022; 151: 328-338

    Abstract

    The onset and persistence of life events (LE) such as housing instability, job instability, and reduced social connection have been shown to increase risk of suicide. Predictive models for suicide risk have low sensitivity to many of these factors due to under-reporting in structured electronic health records (EHR) data. In this study, we show how natural language processing (NLP) can help identify LE in clinical notes at higher rates than reported medical codes. We compare domain-specific lexicons formulated from Unified Medical Language System (UMLS) selection, content analysis by subject matter experts (SME) and the Gravity Project, to data-driven expansion through contextual word embedding using Word2Vec. Our analysis covers EHR from the Veterans Affairs (VA) Corporate Data Warehouse (CDW) and measures the prevalence of LE across time for patients with known underlying cause of death in the National Death Index (NDI). We found that NLP methods had higher sensitivity of detecting LE relative to structured EHR (S-EHR) variables. We observed that, on average, suicide cases had higher rates of LE over time when compared to patients who died of non-suicide related causes with no previous history of diagnosed mental illness. When used to discriminate these outcomes, the inclusion of NLP derived variables increased the concentration of LE along the top 0.1%, 0.5% and 1% of predicted risk. LE were less informative when discriminating suicide death from non-suicide related death for patients with diagnosed mental illness.

    View details for DOI 10.1016/j.jpsychires.2022.04.009

    View details for PubMedID 35533516

  • Development of a natural language processing system for extracting rheumatoid arthritis outcomes from clinical notes using the national RISE registry. Arthritis care & research Humbert-Droz, M., Izadi, Z., Schmajuk, G., Gianfrancesco, M., Baker, M. C., Yazdany, J., Tamang, S. 2022

    Abstract

    OBJECTIVE: To accelerate the use of outcome measures in rheumatology, we developed and evaluated a natural language processing (NLP) pipeline for extracting these measures from free-text outpatient rheumatology notes within the ACR's Rheumatology Informatics System for Effectiveness (RISE) registry. METHODS: We included all patients in RISE (2015 to 2018). The NLP pipeline extracted scores corresponding to eight measures of RA disease activity (DA) and functional status (FS) documented in outpatient rheumatology notes. Score extraction performance was evaluated by chart review, and we assessed agreement with scores documented in structured data. We conducted an external validation of our NLP pipeline using data from rheumatology notes from an academic medical center that is not included in the RISE registry. RESULTS: We processed over 34 million notes from 854,628 patients, 158 practices, and 24 EHR systems from RISE. Manual chart review revealed a sensitivity, positive predictive value (PPV), and F1 score of 95%, 87%, and 91%, respectively. Substantial agreement was observed between scores extracted from RISE notes and scores derived from structured data (kappa: 0.43-0.68 among DA and 0.86-0.98 among FS measures). In the external validation, we found a sensitivity, PPV, and F1 score of 92%, 69%, and 79%, respectively. CONCLUSIONS: We developed an NLP pipeline to extract RA outcome measures from a national registry of notes from multiple EHR systems and found it to have good internal and external validity. This pipeline can facilitate measurement of clinical and patient reported outcomes for use in research and quality measurement.

    View details for DOI 10.1002/acr.24869

    View details for PubMedID 35157365

  • Unstructured clinical notes within the 24 hours since admission predict short, mid & long-term mortality in adult ICU patients. PloS one Mahbub, M., Srinivasan, S., Danciu, I., Peluso, A., Begoli, E., Tamang, S., Peterson, G. D. 2022; 17 (1): e0262182

    Abstract

    Mortality prediction for intensive care unit (ICU) patients is crucial for improving outcomes and efficient utilization of resources. Accessibility of electronic health records (EHR) has enabled data-driven predictive modeling using machine learning. However, very few studies rely solely on unstructured clinical notes from the EHR for mortality prediction. In this work, we propose a framework to predict short, mid, and long-term mortality in adult ICU patients using unstructured clinical notes from the MIMIC III database, natural language processing (NLP), and machine learning (ML) models. Depending on the statistical description of the patients' length of stay, we define the short-term as 48-hour and 4-day period, the mid-term as 7-day and 10-day period, and the long-term as 15-day and 30-day period after admission. We found that by only using clinical notes within the 24 hours of admission, our framework can achieve a high area under the receiver operating characteristics (AU-ROC) score for short, mid and long-term mortality prediction tasks. The test AU-ROC scores are 0.87, 0.83, 0.83, 0.82, 0.82, and 0.82 for 48-hour, 4-day, 7-day, 10-day, 15-day, and 30-day period mortality prediction, respectively. We also provide a comparative study among three types of feature extraction techniques from NLP: frequency-based technique, fixed embedding-based technique, and dynamic embedding-based technique. Lastly, we provide an interpretation of the NLP-based predictive models using feature-importance scores.

    View details for DOI 10.1371/journal.pone.0262182

    View details for PubMedID 34990485

  • Natural Language Processing Tool for Extraction of Patient-Reported Outcomes from a National Multi-Electronic Health Records Registry Humbert-Droz, M., Izadi, Z., Schmajuk, G., Gianfrancesco, M., Yazdany, J., Tamang, S. WILEY. 2021: 3955-3957
  • Association of alpha1-Blocker Receipt With 30-Day Mortality and Risk of Intensive Care Unit Admission Among Adults Hospitalized With Influenza or Pneumonia in Denmark. JAMA network open Thomsen, R. W., Christiansen, C. F., Heide-Jorgensen, U., Vogelstein, J. T., Vogelstein, B., Bettegowda, C., Tamang, S., Athey, S., Sorensen, H. T. 2021; 4 (2): e2037053

    Abstract

    Importance: Alpha 1-adrenergic receptor blocking agents (alpha1-blockers) have been reported to have protective benefits against hyperinflammation and cytokine storm syndrome, conditions that are associated with mortality in patients with coronavirus disease 2019 and other severe respiratory tract infections. However, studies of the association of alpha1-blockers with outcomes among human participants with respiratory tract infections are scarce. Objective: To examine the association between the receipt of alpha1-blockers and outcomes among adult patients hospitalized with influenza or pneumonia. Design, Setting, and Participants: This population-based cohort study used data from Danish national registries to identify individuals 40 years and older who were hospitalized with influenza or pneumonia between January 1, 2005, and November 30, 2018, with follow-up through December 31, 2018. In the main analyses, patients currently receiving alpha1-blockers were compared with those not receiving alpha1-blockers (defined as patients with no prescription for an alpha1-blocker filled within 365 days before the index date) and those currently receiving 5alpha-reductase inhibitors. Propensity scores were used to address confounding factors and to compute weighted risks, absolute risk differences, and risk ratios. Data were analyzed from April 21 to December 21, 2020. Exposures: Current receipt of alpha1-blockers compared with nonreceipt of alpha1-blockers and with current receipt of 5alpha-reductase inhibitors. Main Outcomes and Measures: Death within 30 days of hospital admission and risk of intensive care unit (ICU) admission. Results: A total of 528 467 adult patients (median age, 75.0 years; interquartile range, 64.4-83.6 years; 273 005 men [51.7%]) were hospitalized with influenza or pneumonia in Denmark between 2005 and 2018. Of those, 21 772 patients (4.1%) were currently receiving alpha1-blockers compared with a population of 22 117 patients not receiving alpha1-blockers who were weighted to the propensity score distribution of those receiving alpha1-blockers. In the propensity score-weighted analyses, patients receiving alpha1-blockers had lower 30-day mortality (15.9%) compared with patients not receiving alpha1-blockers (18.5%), with a corresponding risk difference of -2.7% (95% CI, -3.2% to -2.2%) and a risk ratio (RR) of 0.85 (95% CI, 0.83-0.88). The risk of ICU admission was 7.3% among patients receiving alpha1-blockers and 7.7% among those not receiving alpha1-blockers (risk difference, -0.4% [95% CI, -0.8% to 0%]; RR, 0.95 [95% CI, 0.90-1.00]). A comparison between 18 280 male patients currently receiving alpha1-blockers and 18 228 propensity score-weighted male patients currently receiving 5alpha-reductase inhibitors indicated that those receiving alpha1-blockers had lower 30-day mortality (risk difference, -2.0% [95% CI, -3.4% to -0.6%]; RR, 0.89 [95% CI, 0.82-0.96]) and a similar risk of ICU admission (risk difference, -0.3% [95% CI, -1.4% to 0.7%]; RR, 0.96 [95% CI, 0.83-1.10]). Conclusions and Relevance: This cohort study's findings suggest that the receipt of alpha1-blockers is associated with protective benefits among adult patients hospitalized with influenza or pneumonia.

    View details for DOI 10.1001/jamanetworkopen.2020.37053

    View details for PubMedID 33566109

  • Ten Rules for Conducting Retrospective Pharmacoepidemiological Analyses: Example COVID-19 Study. Frontiers in pharmacology Powell, M., Koenecke, A., Byrd, J. B., Nishimura, A., Konig, M. F., Xiong, R., Mahmood, S., Mucaj, V., Bettegowda, C., Rose, L., Tamang, S., Sacarny, A., Caffo, B., Athey, S., Stuart, E. A., Vogelstein, J. T. 2021; 12: 700776

    Abstract

    Since the beginning of the COVID-19 pandemic, pharmaceutical treatment hypotheses have abounded, each requiring careful evaluation. A randomized controlled trial generally provides the most credible evaluation of a treatment, but the efficiency and effectiveness of the trial depend on the existing evidence supporting the treatment. The researcher must therefore compile a body of evidence justifying the use of time and resources to further investigate a treatment hypothesis in a trial. An observational study can provide this evidence, but the lack of randomized exposure and the researcher's inability to control treatment administration and data collection introduce significant challenges. A proper analysis of observational health care data thus requires contributions from experts in a diverse set of topics ranging from epidemiology and causal analysis to relevant medical specialties and data sources. Here we summarize these contributions as 10 rules that serve as an end-to-end introduction to retrospective pharmacoepidemiological analyses of observational health care data using a running example of a hypothetical COVID-19 study. A detailed supplement presents a practical how-to guide for following each rule. When carefully designed and properly executed, a retrospective pharmacoepidemiological analysis framed around these rules will inform the decisions of whether and how to investigate a treatment hypothesis in a randomized controlled trial. This work has important implications for any future pandemic by prescribing what we can and should do while the world waits for global vaccine distribution.

    View details for DOI 10.3389/fphar.2021.700776

    View details for PubMedID 34393782

  • Application of Text Mining Methods to Identify Lupus Nephritis from Electronic Health Records Gianfrancesco, M., Tamang, S., Schmajuk, G., Yazdany, J. WILEY. 2020
  • Risk of primary urological and genital cancers following incident breast cancer: a Danish population-based cohort study. Breast cancer research and treatment Sundboll, J., Farkas, D. K., Adelborg, K., Schapira, L., Tamang, S., Norgaard, M., Cullen, M. R., Cronin-Fenton, D., Sorensen, H. T. 2020

    Abstract

    PURPOSE: The prevalence of breast cancer survivors has increased due to dissemination of population-based mammographic screening and improved treatments. Recent changes in anti-hormonal therapies for breast cancer may have modified the risks of subsequent urological and genital cancers. We examine the risk of subsequent primary urological and genital cancers in patients with incident breast cancer compared with risks in the general population. METHODS: Using population-based Danish medical registries, we identified a cohort of women with primary breast cancer (1990-2017). We followed them from one year after their breast cancer diagnosis until any subsequent urological or genital cancer diagnosis. We computed incidence rates and standardized incidence ratios (SIRs) with 95% confidence intervals (CIs) as the observed number of cancers relative to the expected number based on national incidence rates (by sex, age, and calendar year). RESULTS: Among 84,972 patients with breast cancer (median age 61 years), we observed 623 urological cancers and 1397 genital cancers during a median follow-up of 7.4 years. The incidence rate per 100,000 person-years was stable during follow-up (83 for urological cancers and 176 for genital cancers). The SIR was increased for ovarian cancer (1.37, 95% CI 1.23-1.52) and uterine cancer (1.37, 95% CI 1.25-1.50), but only during the pre-aromatase inhibitor era (before 2007). Moreover, the SIR of kidney cancer was increased (1.52, 95% CI 1.15-1.97), but only during 2007-2017. The SIR for urinary bladder cancer was marginally increased (1.15, 95% CI 1.04-1.28) with no temporal effects. No associations were observed for cervical cancer. CONCLUSION: Breast cancer survivors had higher risks of uterine and ovarian cancer than expected, but only before 2007, and of kidney cancer, but only after 2007. The risk of urinary bladder cancer was moderately increased without temporal effects, and we observed no association with cervical cancer.

    View details for DOI 10.1007/s10549-020-05879-w

    View details for PubMedID 32845432

  • A Machine Learning Approach to Identifying Changes in Suicidal Language. Suicide & life-threatening behavior Pestian, J., Santel, D., Sorter, M., Bayram, U., Connolly, B., Glauser, T., DelBello, M., Tamang, S., Cohen, K. 2020

    Abstract

    OBJECTIVE: With early identification and intervention, many suicidal deaths are preventable. Tools that include machine learning methods have been able to identify suicidal language. This paper examines the persistence of this suicidal language up to 30 days after discharge from care. METHOD: In a multi-center study, 253 subjects were enrolled into either suicidal or control cohorts. Their responses to standardized instruments and interviews were analyzed using machine learning algorithms. Subjects were re-interviewed approximately 30 days later, and their language was compared to the original language to determine the presence of suicidal ideation. RESULTS: The results show that language characteristics used to classify suicidality at the initial encounter are still present in the speech 30 days later (AUC=89% (95% CI: 85-95%), p<.0001) and that algorithms trained on the second interviews could also identify the subjects that produced the first interviews (AUC=85% (95% CI: 81-90%), p<.0001). CONCLUSIONS: This approach explores the stability of suicidal language. When using advanced computational methods, the results show that a patient's language is similar 30 days after first captured, while responses to standard measures change. This can be useful when developing methods that identify the data-based phenotype of a subject.

    View details for DOI 10.1111/sltb.12642

    View details for PubMedID 32484597

  • Risk of primary gastrointestinal cancers following incident non-metastatic breast cancer: a Danish population-based cohort study. BMJ open gastroenterology Adelborg, K., Farkas, D. K., Sundboll, J., Schapira, L., Tamang, S., Cullen, M. R., Cronin-Fenton, D., Sorensen, H. T. 2020; 7 (1)

    Abstract

    OBJECTIVE: We examined the risk of primary gastrointestinal cancers in women with breast cancer and compared this risk with that of the general population. DESIGN: Using population-based Danish registries, we conducted a cohort study of women with incident non-metastatic breast cancer (1990-2017). We computed cumulative cancer incidences and standardised incidence ratios (SIRs). RESULTS: Among 84972 patients with breast cancer, we observed 2340 gastrointestinal cancers. After 20 years of follow-up, the cumulative incidence of gastrointestinal cancers was 4%, driven mainly by colon cancers. Only risk of stomach cancer was continually increased beyond 1 year following breast cancer. The SIR for colon cancer was neutral during 2-5 years of follow-up and approximately 1.2-fold increased thereafter. For cancer of the oesophagus, the SIR was increased only during 6-10 years. There was a weak association with pancreas cancer beyond 10 years. Between 1990-2006 and 2007-2017, the 1-10 years SIR estimate decreased and reached unity for upper gastrointestinal cancers (oesophagus, stomach, and small intestine). For lower gastrointestinal cancers (colon, rectum, and anal canal), the SIR estimate was increased only after 2007. No temporal effects were observed for the remaining gastrointestinal cancers. Treatment effects were negligible. CONCLUSION: Breast cancer survivors were at increased risk of oesophagus and stomach cancer, but only before 2007. The risk of colon cancer was increased, but only after 2007.

    View details for DOI 10.1136/bmjgast-2020-000413

    View details for PubMedID 32611556

  • Stress Disorders and Dementia in the Danish Population. American journal of epidemiology Gradus, J. L., Horvath-Puho, E., Lash, T. L., Ehrenstein, V., Tamang, S., Adler, N. E., Milstein, A., Glymour, M., Henderson, V. W., Sorensen, H. T. 2019; 188 (3): 493–99

    View details for DOI 10.1093/aje/kwy269

    View details for Web of Science ID 000467881700001

  • Using natural language processing to construct a metastatic breast cancer cohort from linked cancer registry and electronic medical records data. JAMIA open Ling, A. Y., Kurian, A. W., Caswell-Jin, J. L., Sledge, G. W., Shah, N. H., Tamang, S. R. 2019; 2 (4): 528–37

    Abstract

    Most population-based cancer databases lack information on metastatic recurrence. Electronic medical records (EMR) and cancer registries contain complementary information on cancer diagnosis, treatment and outcome, yet are rarely used synergistically. To construct a cohort of metastatic breast cancer (MBC) patients, we applied natural language processing techniques within a semisupervised machine learning framework to linked EMR-California Cancer Registry (CCR) data. We studied all female patients treated at Stanford Health Care with an incident breast cancer diagnosis from 2000 to 2014. Our database consisted of structured fields and unstructured free-text clinical notes from EMR, linked to CCR, a component of the Surveillance, Epidemiology and End Results Program (SEER). We identified de novo MBC patients from CCR and extracted information on distant recurrences from patient notes in EMR. Furthermore, we trained a regularized logistic regression model for recurrent MBC classification and evaluated its performance on a gold standard set of 146 patients. There were 11 459 breast cancer patients in total and the median follow-up time was 96.3 months. We identified 1886 MBC patients, 512 (27.1%) of whom were de novo MBC patients and 1374 (72.9%) were recurrent MBC patients. Our final MBC classifier achieved an area under the receiver operating characteristic curve (AUC) of 0.917, with sensitivity 0.861, specificity 0.878, and accuracy 0.870. To enable population-based research on MBC, we developed a framework for retrospective case detection combining EMR and CCR data. Our classifier achieved good AUC, sensitivity, and specificity without expert-labeled examples.

    View details for DOI 10.1093/jamiaopen/ooz040

    View details for PubMedID 32025650

    View details for PubMedCentralID PMC6994019

  • Stress Disorders and Dementia in the Danish Population. American journal of epidemiology Gradus, J. L., Horvath-Puho, E., Lash, T. L., Ehrenstein, V., Tamang, S., Adler, N. E., Milstein, A., Glymour, M. M., Henderson, V. W., Sorensen, H. T. 2018

    Abstract

    There is an association between stress and dementia. However, less is known about dementia among persons with varied stress responses and sex differences in these associations. This population-based cohort study examined dementia among persons with a range of clinician-diagnosed stress disorders, and the interaction between stress disorders and sex in predicting dementia, in Denmark from 1995 to 2011. This study included Danes 40 years or older with a stress disorder diagnosis (n=47,047) and a matched comparison cohort (n=232,141) without a stress disorder diagnosis from 1995 through 2011. Diagnoses were culled from national registries. We used Cox proportional-hazards regression to estimate associations between stress disorders and dementia. Risk of dementia was higher for persons with stress disorders than for persons without such diagnosis; adjusted hazard ratios ranged from 1.6 to 2.8. There was evidence of an interaction between sex and stress disorders in predicting dementia, with a greater rate of dementia among men with stress disorders except posttraumatic stress disorder, for which women had a greater rate. Results support existing evidence of an association between stress and dementia. This study contributes novel information regarding dementia risk across a range of stress responses, and interactions between stress disorders and sex.

    View details for PubMedID 30576420

  • Potential Biases in Machine Learning Algorithms Using Electronic Health Record Data. JAMA internal medicine Gianfrancesco, M. A., Tamang, S., Yazdany, J., Schmajuk, G. 2018; 178 (11): 1544–47

    Abstract

    A promise of machine learning in health care is the avoidance of biases in diagnosis and treatment; a computer algorithm could objectively synthesize and interpret the data in the medical record. Integration of machine learning with clinical decision support tools, such as computerized alerts or diagnostic support, may offer physicians and others who provide health care targeted and timely information that can improve clinical decisions. Machine learning algorithms, however, may also be subject to biases. The biases include those related to missing data and patients not identified by algorithms, sample size and underestimation, and misclassification and measurement error. There is concern that biases and deficiencies in the data used by machine learning algorithms may contribute to socioeconomic disparities in health care. This Special Communication outlines the potential biases that may be introduced into machine learning-based clinical decision support tools that use electronic health record data and proposes potential solutions to the problems of overreliance on automation, algorithms based on biased data, and algorithms that do not provide information that is clinically meaningful. Existing health care disparities should not be amplified by thoughtless or excessive reliance on machines.

    View details for PubMedID 30128552

  • Scalable Electronic Phenotyping For Studying Patient Comorbidities. AMIA ... Annual Symposium proceedings. AMIA Symposium Ling, A. Y., Alsentzer, E., Chen, J., Banda, J. M., Tamang, S., Minty, E. 2018; 2018: 740–49

    Abstract

    Over 75 million Americans have multiple concurrent chronic conditions and medical decision making for these patients is mostly based on retrospective cohort studies. Current methods to generate cohorts of patients with comorbidities are neither scalable nor generalizable. We propose a supervised machine learning algorithm for learning comorbidity phenotypes without requiring manually created training sets. First, we generated myocardial infarction (MI) and type-2 diabetes (T2DM) patient cohorts using ICD9-based imperfectly labeled samples upon which LASSO logistic regression models were trained. Second, we assessed the effects of training sample size, inclusion of physician input, and inclusion of clinical text features on model performance. Using ICD9 codes as our labeling heuristic, we achieved comparable performance to models created using keywords as labeling heuristic. We found that expert input and higher training sample sizes could compensate for the lack of clinical text derived features. However, our best performing model included clinical text as features with a large training sample size.

    View details for PubMedID 30815116

  • Predicting patient 'cost blooms' in Denmark: a longitudinal population-based study. BMJ open Tamang, S., Milstein, A., Sørensen, H. T., Pedersen, L., Mackey, L., Betterton, J., Janson, L., Shah, N. 2017; 7 (1)

    Abstract

    To compare the ability of standard versus enhanced models to predict future high-cost patients, especially those who move from a lower to the upper decile of per capita healthcare expenditures within 1 year (that is, 'cost bloomers'). We developed alternative models to predict being in the upper decile of healthcare expenditures in year 2 of a sample, based on data from year 1. Our 6 alternative models ranged from a standard cost-prediction model with 4 variables (ie, traditional model features), to our largest enhanced model with 1053 non-traditional model features. To quantify any increases in predictive power that enhanced models achieved over standard tools, we compared the prospective predictive performance of each model. We used the population of Western Denmark between 2004 and 2011 (2 146 801 individuals) to predict future high-cost patients and characterise high-cost patient subgroups. Using the most recent 2-year period (2010-2011) for model evaluation, our whole-population model used a cohort of 1 557 950 individuals with a full year of active residency in year 1 (2010). Our cost-bloom model excluded the 155 795 individuals who were already high cost at the population level in year 1, resulting in 1 402 155 individuals for prediction of cost bloomers in year 2 (2011). Using unseen data from a future year, we evaluated each model's prospective predictive performance by calculating the ratio of predicted high-cost patient expenditures to the actual high-cost patient expenditures in Year 2, that is, cost capture. Our best enhanced model achieved a 21% and 30% improvement in cost capture over a standard diagnosis-based model for predicting population-level high-cost patients and cost bloomers, respectively. In combination with modern statistical learning methods for analysing large data sets, models enhanced with a large and diverse set of features led to better performance, especially for predicting future cost bloomers.

    View details for DOI 10.1136/bmjopen-2016-011580

    View details for PubMedID 28077408

    View details for PubMedCentralID PMC5253526

  • Enhanced Quality Measurement Event Detection: An Application to Physician Reporting. EGEMS (Washington, DC) Tamang, S. R., Hernandez-Boussard, T., Ross, E. G., Gaskin, G., Patel, M. I., Shah, N. H. 2017; 5 (1): 5

    Abstract

    The wide-scale adoption of electronic health records (EHRs) has increased the availability of routinely collected clinical data in electronic form that can be used to improve the reporting of quality of care. However, the bulk of information in the EHR is in unstructured form (e.g., free-text clinical notes) and not amenable to automated reporting. Traditional methods are based on structured diagnostic and billing data that provide efficient, but inaccurate or incomplete summaries of actual or relevant care processes and patient outcomes. To assess the feasibility and benefit of implementing enhanced EHR-based physician quality measurement and reporting, which includes the analysis of unstructured free-text clinical notes, we conducted a retrospective study to compare traditional and enhanced approaches for reporting ten physician quality measures from multiple National Quality Strategy domains. We found that our enhanced approach enabled the calculation of five Physician Quality and Performance System measures not measurable in billing or diagnostic codes and resulted in over a five-fold increase in event detection at an average precision of 88 percent (95 percent CI: 83-93 percent). Our work suggests that enhanced EHR-based quality measurement can increase event detection for establishing value-based payment arrangements and can expedite quality reporting for physician practices, which are increasingly burdened by the process of manual chart review for quality reporting.

    View details for PubMedID 29881731

  • New Paradigms for Patient-Centered Outcomes Research in Electronic Medical Records: An Example of Detecting Urinary Incontinence Following Prostatectomy. EGEMS (Washington, DC) Hernandez-Boussard, T., Tamang, S., Blayney, D., Brooks, J., Shah, N. 2016; 4 (3): 1231

    Abstract

    National initiatives to develop quality metrics emphasize the need to include patient-centered outcomes. Patient-centered outcomes are complex, require documentation of patient communications, and have not been routinely collected by healthcare providers. The widespread implementation of electronic medical records (EHR) offers opportunities to assess patient-centered outcomes within the routine healthcare delivery system. The objective of this study was to test the feasibility and accuracy of identifying patient-centered outcomes within the EHR. Data from patients with localized prostate cancer undergoing prostatectomy were used to develop and test algorithms to accurately identify patient-centered outcomes in post-operative EHRs; we used urinary incontinence as the use case. Standard data mining techniques were used to extract and annotate free text and structured data to assess urinary incontinence recorded within the EHRs. A total of 5,349 prostate cancer patients were identified in our EHR system between 1998 and 2013. Among these EHRs, 30.3% had a text mention of urinary incontinence within 90 days post-operative compared to less than 1.0% with a structured data field for urinary incontinence (i.e., ICD-9 code). Our workflow had good precision and recall for urinary incontinence (positive predictive value: 0.73 and sensitivity: 0.84). Our data indicate that important patient-centered outcomes, such as urinary incontinence, are being captured in EHRs as free text and highlight the long-standing importance of accurate clinician documentation. Standard data mining algorithms can accurately and efficiently identify these outcomes in existing EHRs; the complete assessment of these outcomes is essential to move practice into the patient-centered realm of healthcare.

    View details for DOI 10.13063/2327-9214.1231

    View details for PubMedID 27347492

  • Detecting unplanned care from clinician notes in electronic health records. Journal of oncology practice / American Society of Clinical Oncology Tamang, S., Patel, M. I., Blayney, D. W., Kuznetsov, J., Finlayson, S. G., Vetteth, Y., Shah, N. 2015; 11 (3): e313-9

    Abstract

    Reductions in unplanned episodes of care, such as emergency department visits and unplanned hospitalizations, are important quality outcome measures. However, many events are only documented in free-text clinician notes and are labor intensive to detect by manual medical record review. We studied 308,096 free-text machine-readable documents linked to individual entries in our electronic health records, representing care for patients with breast, GI, or thoracic cancer, whose treatment was initiated at one academic medical center, Stanford Health Care (SHC). Using a clinical text-mining tool, we detected unplanned episodes documented in clinician notes (for non-SHC visits) or in coded encounter data for SHC-delivered care and the most frequent symptoms documented in emergency department (ED) notes. Combined reporting increased the identification of patients with one or more unplanned care visits by 32% (15% using coded data; 20% using all the data) among patients with 3 months of follow-up and by 21% (23% using coded data; 28% using all the data) among those with 1 year of follow-up. Based on the textual analysis of SHC ED notes, pain (75%), followed by nausea (54%), vomiting (47%), infection (36%), fever (28%), and anemia (27%), were the most frequent symptoms mentioned. Pain, nausea, and vomiting co-occur in 35% of all ED encounter notes. The text-mining methods we describe can be applied to automatically review free-text clinician notes to detect unplanned episodes of care mentioned in these notes. These methods have broad application for quality improvement efforts in which events of interest occur outside of a network that allows for patient data sharing.

    View details for DOI 10.1200/JOP.2014.002741

    View details for PubMedID 25980019

    View details for PubMedCentralID PMC4438112

  • Text Mining for Adverse Drug Events: the Promise, Challenges, and State of the Art. Drug Safety Harpaz, R., Callahan, A., Tamang, S., Low, Y., Odgers, D., Finlayson, S., Jung, K., LePendu, P., Shah, N. H. 2014; 37 (10): 777-790

    Abstract

    Text mining is the computational process of extracting meaningful information from large amounts of unstructured text. It is emerging as a tool to leverage underutilized data sources that can improve pharmacovigilance, including the objective of adverse drug event (ADE) detection and assessment. This article provides an overview of recent advances in pharmacovigilance driven by the application of text mining, and discusses several data sources (such as biomedical literature, clinical narratives, product labeling, social media, and Web search logs) that are amenable to text mining for pharmacovigilance. Given the state of the art, it appears text mining can be applied to extract useful ADE-related information from multiple textual sources. Nonetheless, further research is required to address remaining technical challenges associated with the text mining methodologies, and to conclusively determine the relative contribution of each textual source to improving pharmacovigilance.

    View details for DOI 10.1007/s40264-014-0218-z

    View details for Web of Science ID 000344615300005

    View details for PubMedCentralID PMC4217510