Dr. Suzanne Tamang is an Assistant Professor in the Department of Medicine, Division of Immunology and Rheumatology, and a Faculty Fellow at the Stanford Center for Population Health Sciences. She is also the Computational Systems Evaluation Lead at the VA Office of Mental Health and Suicide Prevention's Program Evaluation Resource Center. Dr. Tamang uses her training in biology, computer science, health services research and biomedical informatics to work with interdisciplinary teams of experts on population health problems of public interest. Integral to her research is the analysis of large and complex population-based datasets, using techniques from natural language processing, machine learning and deep learning. Her expertise spans US and Danish population-based registries, Electronic Medical Records from various vendors, administrative healthcare claims and other types of observational health and demographic data sources in the US and internationally. She also constructs, populates and applies knowledge bases for automated reasoning. Dr. Tamang has developed open-source tools for the extraction of health information from unstructured free-text clinical progress notes and has licensed machine learning prediction models to Silicon Valley health analytics startups. She is the faculty mentor for the Stanford community working group Stats for Social Good.
Assistant Professor, Medicine - Immunology & Rheumatology
Computational Systems Evaluation Lead, Program Evaluation Resource Center, Office of Mental Health and Suicide Prevention, VA (2023 - Present)
Faculty Fellow, Center for Population Health Sciences (2023 - Present)
Postdoctoral Training, Stanford School of Medicine, Biomedical Informatics (2015)
Doctor of Philosophy, Graduate Center, City University of New York (CUNY), Computer Science (2013)
Master of Science, Brooklyn College, CUNY, Computer Science and Health Science (2006)
Bachelor of Science, Brooklyn College, CUNY, Biology
- The emerging fentanyl-xylazine syndemic in the USA: challenges and future directions. Lancet (London, England) 2023
A call for better validation of opioid overdose risk algorithms.
Journal of the American Medical Informatics Association : JAMIA
Clinical decision support (CDS) systems powered by predictive models have the potential to improve the accuracy and efficiency of clinical decision-making. However, without sufficient validation, these systems have the potential to mislead clinicians and harm patients. This is especially true for CDS systems used by opioid prescribers and dispensers, where a flawed prediction can directly harm patients. To prevent these harms, regulators and researchers have proposed guidance for validating predictive models and CDS systems. However, this guidance is not universally followed and is not required by law. We call on CDS developers, deployers, and users to hold these systems to higher standards of clinical and technical validation. We provide a case study on two CDS systems deployed on a national scale in the United States for predicting a patient's risk of adverse opioid-related events: the Stratification Tool for Opioid Risk Mitigation (STORM), used by the Veterans Health Administration, and NarxCare, a commercial system.
View details for DOI 10.1093/jamia/ocad110
View details for PubMedID 37428897
- Promises and perils of the FDA's over-the-counter naloxone reclassification. Lancet regional health. Americas 2023; 23: 100518
Sarcoidosis rates in BCG-vaccinated and unvaccinated young adults: A natural experiment using Danish registers.
Seminars in arthritis and rheumatism
2023; 60: 152205
Sarcoidosis may have an infectious trigger, including Mycobacterium spp. The Bacille Calmette-Guérin (BCG) vaccine provides partial protection against tuberculosis and induces trained immunity. We examined the incidence rate (IR) of sarcoidosis in Danish individuals born during high BCG vaccine uptake (born before 1976) compared with individuals born during low BCG vaccine uptake (born in or after 1976). We performed a quasi-randomized registry-based incidence study using data from the Danish Civil Registration System and the Danish National Patient Registry between 1995 and 2016. We included individuals aged 25-35 years old and born between 1970 and 1981. Using Poisson regression models, we calculated the incidence rate ratio (IRR) of sarcoidosis in individuals born during low BCG vaccine uptake versus high BCG vaccine uptake, adjusting for age and calendar year (separately for men and women). The IR of sarcoidosis was increased for individuals born during low BCG vaccine uptake compared with individuals born during high BCG vaccine uptake, which was largely attributed to men. The IRR of sarcoidosis for men born during low BCG vaccine uptake versus high BCG vaccine uptake was 1.22 (95% confidence interval [CI] 1.02-1.45). In women, the IRR was 1.08 (95% CI 0.88-1.31). In this quasi-experimental study that minimizes confounding, the time period with high BCG vaccine uptake was associated with a lower incidence rate of sarcoidosis in men, with a similar effect seen in women that did not reach significance. Our findings support a potential protective effect of BCG vaccination against the development of sarcoidosis. Future interventional studies for high-risk individuals could be considered.
View details for DOI 10.1016/j.semarthrit.2023.152205
View details for PubMedID 37054583
Practical Considerations for Developing Clinical Natural Language Processing Systems for Population Health Management and Measurement.
JMIR medical informatics
2023; 11: e37805
Experts have noted a concerning gap between clinical natural language processing (NLP) research and real-world applications, such as clinical decision support. To help address this gap, in this viewpoint, we enumerate a set of practical considerations for developing an NLP system to support real-world clinical needs and improve health outcomes. They include determining (1) the readiness of the data and compute resources for NLP, (2) the organizational incentives to use and maintain the NLP systems, and (3) the feasibility of implementation and continued monitoring. These considerations are intended to benefit the design of future clinical NLP projects and can be applied across a variety of settings, including large health systems or smaller clinical practices that have adopted electronic medical records in the United States and globally.
View details for DOI 10.2196/37805
View details for PubMedID 36595345
- cpgQA: A Benchmark Dataset for Machine Reading Comprehension Tasks on Clinical Practice Guidelines and a Case Study Using Transfer Learning IEEE ACCESS 2023; 11: 3691-3705
Revelations from a Machine Learning Analysis of the Most Downloaded Articles Published in Journal of Palliative Medicine 1999-2018.
Journal of palliative medicine
2023; 26 (1): 13-16
The Journal of Palliative Medicine (JPM) is globally recognized as a leading interdisciplinary peer-reviewed palliative care journal providing balanced information that informs and improves the practice of palliative care. JPM shapes the values, integrity, and standards of the subspecialty of palliative medicine by what it chooses to publish. The global JPM readership chooses to download the articles that are of most relevance and utility to them. Utilizing machine learning methods, the top 100 most downloaded articles in JPM were analyzed to gain a better understanding of any latent trends and patterns in the topics between 1999 and 2018. The top five topic themes identified in the first decade were different from the ones identified in the second decade of publication. There is evidence of differentiation and maturation of the field in the context of comprehensive health care. Although noncancer serious illnesses have still not risen to the same prominence as cancer palliation, there is a directional quality to the emerging evidence as it pertains to cardiac, respiratory, neurological, renal, and other etiologies. Across both decades under study, there was persistent evidence of the importance of understanding and managing the mental health care needs of seriously ill patients and their families. A cause for concern is that the word "spirituality" was prominent in the first decade and was lacking in the second. Future palliative care clinical and research initiatives should focus on its development as an essential interprofessional and medical subspecialty germane to all types of serious illnesses and across all venues.
View details for DOI 10.1089/jpm.2022.0574
View details for PubMedID 36607778
Sarcoidosis incidence after mTOR inhibitor treatment.
Seminars in arthritis and rheumatism
2022; 57: 152102
OBJECTIVE: Mechanistic target of rapamycin (mTOR) inhibitors are effective in animal models of granulomatous disease, but their benefit in sarcoidosis patients is unknown. We evaluated the incidence of sarcoidosis in patients treated with mTOR inhibitors versus calcineurin inhibitors. METHODS: This was a cohort study using the Optum Clinformatics Data Mart (CDM) Database (2003-2019), IBM MarketScan Research Database (2006-2016), and Danish health and administrative registries (1996-2018). Patients aged ≥18 years with ≥1 year continuous enrollment before and after kidney, liver, heart, or lung transplant treated with an mTOR inhibitor or calcineurin inhibitor were included. Patients diagnosed with sarcoidosis before, or up to 90 days after, transplant were excluded. The incidence of sarcoidosis by treatment group was calculated. RESULTS: In the Optum CDM/IBM MarketScan cohort, 1,898 patients were treated with an mTOR inhibitor (mean age 49 years; 34% female) and 9,894 patients were treated with a calcineurin inhibitor (mean age 50 years; 37% female). The mean follow-up in the mTOR inhibitor group was 1.1 years, with no incident sarcoidosis diagnosed. In the calcineurin inhibitor group, the mean follow-up was 2.2 years, with 12 incident sarcoidosis cases diagnosed. In the Danish cohort, 230 patients were treated with an mTOR inhibitor (mean age 49; 45% female), with no incident sarcoidosis diagnosed. There were 3,411 patients treated with a calcineurin inhibitor (mean age 45; 40% female), with 10 incident cases of sarcoidosis diagnosed. CONCLUSIONS: This study indicates a potential protective effect of mTOR inhibitor treatment compared with calcineurin inhibitor treatment against the development of sarcoidosis.
View details for DOI 10.1016/j.semarthrit.2022.152102
View details for PubMedID 36182721
Sarcoidosis in patients after solid organ transplantation treated with mtor inhibitors versus calcineurin inhibitors
WILEY. 2022: 263-264
View details for Web of Science ID 000859084401127
Sarcoidosis Rates in BCG-Vaccinated and Unvaccinated Young Adults: A Danish Register-Based Study
WILEY. 2022: 2207-2208
View details for Web of Science ID 000877386502125
Application of Natural Language Processing to Identify Varicella Zoster Infection in Clinical Notes
WILEY. 2022: 1455-1457
View details for Web of Science ID 000877386501248
Sarcoidosis Incidence After mTOR Inhibitor Treatment
WILEY. 2022: 254-256
View details for Web of Science ID 000877386500136
Suicide theory-guided natural language processing of clinical progress notes to improve prediction of veteran suicide risk: protocol for a mixed-method study.
2022; 12 (8): e065088
The state-of-the-art 3-step Theory of Suicide (3ST) describes why people consider suicide and who will act on their suicidal thoughts and attempt suicide. The central concepts of 3ST (psychological pain, hopelessness, connectedness, and capacity for suicide) are among the most important drivers of suicidal behaviour, but they are missing from clinical suicide risk prediction models in use at the US Veterans Health Administration (VHA). These four concepts are not systematically recorded in structured fields of VHA's electronic healthcare records. Therefore, this study will develop a domain-specific ontology that will enable automated extraction of these concepts from clinical progress notes using natural language processing (NLP), and test whether NLP-based predictors for these concepts improve accuracy of existing VHA suicide risk prediction models. Our mixed-method study has an exploratory sequential design where a qualitative component (aim 1) will inform quantitative analyses (aims 2 and 3). For aim 1, subject matter experts will manually annotate progress notes of clinical encounters with veterans who attempted or died by suicide to develop a domain-specific ontology for the 3ST concepts. During aim 2, we will use NLP to machine-annotate clinical progress notes and derive longitudinal representations for each patient with respect to the presence and intensity of hopelessness, psychological pain, connectedness and capacity for suicide in temporal proximity of suicide attempts and deaths by suicide.
These longitudinal representations will be evaluated during aim 3 for their ability to improve existing VHA prediction models of suicide and suicide attempts, STORM (Stratification Tool for Opioid Risk Mitigation) and REACHVET (Recovery Engagement and Coordination for Health - Veterans Enhanced Treatment). Ethics approval for this study was granted by the Stanford University Institutional Review Board and the Research and Development Committee of the VA Palo Alto Health Care System. Results of the study will be disseminated through several outlets, including peer-reviewed publications and presentations at national conferences.
View details for DOI 10.1136/bmjopen-2022-065088
View details for PubMedID 36002210
Preparing for the next pandemic via transfer learning from existing diseases with hierarchical multi-modal BERT: a study on COVID-19 outcome prediction.
2022; 12 (1): 10748
Developing prediction models for emerging infectious diseases from relatively small numbers of cases is a critical need for improving pandemic preparedness. Using COVID-19 as an exemplar, we propose a transfer learning methodology for developing predictive models from multi-modal electronic healthcare records by leveraging information from more prevalent diseases with shared clinical characteristics. Our novel hierarchical, multi-modal model ([Formula: see text]) integrates baseline risk factors from the natural language processing of clinical notes at admission, time-series measurements of biomarkers obtained from laboratory tests, and discrete diagnostic, procedure and drug codes. We demonstrate the alignment of [Formula: see text]'s predictions with well-established clinical knowledge about COVID-19 through univariate and multivariate risk factor driven sub-cohort analysis. [Formula: see text]'s superior performance over state-of-the-art methods shows that leveraging patient data across modalities and transferring prior knowledge from similar disorders is critical for accurate prediction of patient outcomes, and this approach may serve as an important tool in the early response to future pandemics.
View details for DOI 10.1038/s41598-022-13072-w
View details for PubMedID 35750878
A case for developing domain-specific vocabularies for extracting suicide factors from healthcare notes.
Journal of psychiatric research
2022; 151: 328-338
The onset and persistence of life events (LE) such as housing instability, job instability, and reduced social connection have been shown to increase risk of suicide. Predictive models for suicide risk have low sensitivity to many of these factors due to under-reporting in structured electronic health records (EHR) data. In this study, we show how natural language processing (NLP) can help identify LE in clinical notes at higher rates than reported medical codes. We compare domain-specific lexicons formulated from Unified Medical Language System (UMLS) selection, content analysis by subject matter experts (SME) and the Gravity Project, to data-driven expansion through contextual word embedding using Word2Vec. Our analysis covers EHR from the Veterans Affairs (VA) Corporate Data Warehouse (CDW) and measures the prevalence of LE across time for patients with known underlying cause of death in the National Death Index (NDI). We found that NLP methods had higher sensitivity of detecting LE relative to structured EHR (S-EHR) variables. We observed that, on average, suicide cases had higher rates of LE over time when compared to patients who died of non-suicide related causes with no previous history of diagnosed mental illness. When used to discriminate these outcomes, the inclusion of NLP derived variables increased the concentration of LE along the top 0.1%, 0.5% and 1% of predicted risk. LE were less informative when discriminating suicide death from non-suicide related death for patients with diagnosed mental illness.
View details for DOI 10.1016/j.jpsychires.2022.04.009
View details for PubMedID 35533516
Development of a natural language processing system for extracting rheumatoid arthritis outcomes from clinical notes using the national RISE registry.
Arthritis care & research
OBJECTIVE: To accelerate the use of outcome measures in rheumatology, we developed and evaluated a natural language processing (NLP) pipeline for extracting these measures from free-text outpatient rheumatology notes within the ACR's Rheumatology Informatics System for Effectiveness (RISE) registry. METHODS: We included all patients in RISE (2015 to 2018). The NLP pipeline extracted scores corresponding to eight measures of RA disease activity (DA) and functional status (FS) documented in outpatient rheumatology notes. Score extraction performance was evaluated by chart review, and we assessed agreement with scores documented in structured data. We conducted an external validation of our NLP pipeline using data from rheumatology notes from an academic medical center that is not included in the RISE registry. RESULTS: We processed over 34 million notes from 854,628 patients, 158 practices, and 24 EHR systems from RISE. Manual chart review revealed a sensitivity, positive predictive value (PPV), and F1 score of 95%, 87%, and 91%, respectively. Substantial agreement was observed between scores extracted from RISE notes and scores derived from structured data (kappa: 0.43-0.68 among DA and 0.86-0.98 among FS measures). In the external validation, we found a sensitivity, PPV, and F1 score of 92%, 69%, and 79%, respectively. CONCLUSIONS: We developed an NLP pipeline to extract RA outcome measures from a national registry of notes from multiple EHR systems and found it to have good internal and external validity. This pipeline can facilitate measurement of clinical and patient reported outcomes for use in research and quality measurement.
View details for DOI 10.1002/acr.24869
View details for PubMedID 35157365
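The F1 scores reported in the abstract above can be checked from the stated sensitivity and PPV, since F1 is the harmonic mean of the two. The snippet below is a minimal sketch of that arithmetic; the function name `f1_score` is illustrative and is not taken from the authors' pipeline.

```python
def f1_score(sensitivity: float, ppv: float) -> float:
    """Harmonic mean of sensitivity (recall) and PPV (precision)."""
    if sensitivity + ppv == 0:
        return 0.0
    return 2 * sensitivity * ppv / (sensitivity + ppv)


# Values as reported in the abstract (chart review and external validation).
internal_f1 = f1_score(0.95, 0.87)
external_f1 = f1_score(0.92, 0.69)

print(f"internal F1: {internal_f1:.2f}")
print(f"external F1: {external_f1:.2f}")
```

Rounded to two decimals, both results agree with the 91% and 79% stated in the abstract.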
Unstructured clinical notes within the 24 hours since admission predict short, mid & long-term mortality in adult ICU patients.
2022; 17 (1): e0262182
Mortality prediction for intensive care unit (ICU) patients is crucial for improving outcomes and efficient utilization of resources. Accessibility of electronic health records (EHR) has enabled data-driven predictive modeling using machine learning. However, very few studies rely solely on unstructured clinical notes from the EHR for mortality prediction. In this work, we propose a framework to predict short, mid, and long-term mortality in adult ICU patients using unstructured clinical notes from the MIMIC III database, natural language processing (NLP), and machine learning (ML) models. Depending on the statistical description of the patients' length of stay, we define the short-term as 48-hour and 4-day period, the mid-term as 7-day and 10-day period, and the long-term as 15-day and 30-day period after admission. We found that by only using clinical notes within the 24 hours of admission, our framework can achieve a high area under the receiver operating characteristics (AU-ROC) score for short, mid and long-term mortality prediction tasks. The test AU-ROC scores are 0.87, 0.83, 0.83, 0.82, 0.82, and 0.82 for 48-hour, 4-day, 7-day, 10-day, 15-day, and 30-day period mortality prediction, respectively. We also provide a comparative study among three types of feature extraction techniques from NLP: frequency-based technique, fixed embedding-based technique, and dynamic embedding-based technique. Lastly, we provide an interpretation of the NLP-based predictive models using feature-importance scores.
View details for DOI 10.1371/journal.pone.0262182
View details for PubMedID 34990485
Natural Language Processing Tool for Extraction of Patient-Reported Outcomes from a National Multi-Electronic Health Records Registry
WILEY. 2021: 3955-3957
View details for Web of Science ID 000744545207208
Association of alpha1-Blocker Receipt With 30-Day Mortality and Risk of Intensive Care Unit Admission Among Adults Hospitalized With Influenza or Pneumonia in Denmark.
JAMA network open
2021; 4 (2): e2037053
Importance: Alpha 1-adrenergic receptor blocking agents (alpha1-blockers) have been reported to have protective benefits against hyperinflammation and cytokine storm syndrome, conditions that are associated with mortality in patients with coronavirus disease 2019 and other severe respiratory tract infections. However, studies of the association of alpha1-blockers with outcomes among human participants with respiratory tract infections are scarce. Objective: To examine the association between the receipt of alpha1-blockers and outcomes among adult patients hospitalized with influenza or pneumonia. Design, Setting, and Participants: This population-based cohort study used data from Danish national registries to identify individuals 40 years and older who were hospitalized with influenza or pneumonia between January 1, 2005, and November 30, 2018, with follow-up through December 31, 2018. In the main analyses, patients currently receiving alpha1-blockers were compared with those not receiving alpha1-blockers (defined as patients with no prescription for an alpha1-blocker filled within 365 days before the index date) and those currently receiving 5alpha-reductase inhibitors. Propensity scores were used to address confounding factors and to compute weighted risks, absolute risk differences, and risk ratios. Data were analyzed from April 21 to December 21, 2020. Exposures: Current receipt of alpha1-blockers compared with nonreceipt of alpha1-blockers and with current receipt of 5alpha-reductase inhibitors. Main Outcomes and Measures: Death within 30 days of hospital admission and risk of intensive care unit (ICU) admission. Results: A total of 528 467 adult patients (median age, 75.0 years; interquartile range, 64.4-83.6 years; 273 005 men [51.7%]) were hospitalized with influenza or pneumonia in Denmark between 2005 and 2018.
Of those, 21 772 patients (4.1%) were currently receiving alpha1-blockers compared with a population of 22 117 patients not receiving alpha1-blockers who were weighted to the propensity score distribution of those receiving alpha1-blockers. In the propensity score-weighted analyses, patients receiving alpha1-blockers had lower 30-day mortality (15.9%) compared with patients not receiving alpha1-blockers (18.5%), with a corresponding risk difference of -2.7% (95% CI, -3.2% to -2.2%) and a risk ratio (RR) of 0.85 (95% CI, 0.83-0.88). The risk of ICU admission was 7.3% among patients receiving alpha1-blockers and 7.7% among those not receiving alpha1-blockers (risk difference, -0.4% [95% CI, -0.8% to 0%]; RR, 0.95 [95% CI, 0.90-1.00]). A comparison between 18 280 male patients currently receiving alpha1-blockers and 18 228 propensity score-weighted male patients currently receiving 5alpha-reductase inhibitors indicated that those receiving alpha1-blockers had lower 30-day mortality (risk difference, -2.0% [95% CI, -3.4% to -0.6%]; RR, 0.89 [95% CI, 0.82-0.96]) and a similar risk of ICU admission (risk difference, -0.3% [95% CI, -1.4% to 0.7%]; RR, 0.96 [95% CI, 0.83-1.10]). Conclusions and Relevance: This cohort study's findings suggest that the receipt of alpha1-blockers is associated with protective benefits among adult patients hospitalized with influenza or pneumonia.
View details for DOI 10.1001/jamanetworkopen.2020.37053
View details for PubMedID 33566109
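The risk difference and risk ratio quoted in the abstract above follow directly from the two group risks. The sketch below recomputes them for ICU admission from the rounded percentages given in the abstract; the function names are illustrative, and the propensity-score weighting that produced the underlying risks is not reproduced here.

```python
def risk_difference(risk_exposed: float, risk_unexposed: float) -> float:
    """Absolute difference in risk (exposed minus unexposed)."""
    return risk_exposed - risk_unexposed


def risk_ratio(risk_exposed: float, risk_unexposed: float) -> float:
    """Ratio of risk in the exposed group to risk in the unexposed group."""
    return risk_exposed / risk_unexposed


# ICU admission risks as reported: 7.3% (alpha1-blockers) vs 7.7% (no alpha1-blockers).
icu_rd = risk_difference(0.073, 0.077)
icu_rr = risk_ratio(0.073, 0.077)

print(f"ICU risk difference: {icu_rd * 100:.1f} percentage points")
print(f"ICU risk ratio: {icu_rr:.2f}")
```

Because the inputs are already rounded, recomputed values can differ from the published ones in the last digit; for example, the 30-day mortality figures (15.9% vs 18.5%) give a ratio near 0.86 rather than the reported 0.85, which was presumably computed from unrounded weighted risks.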
Ten Rules for Conducting Retrospective Pharmacoepidemiological Analyses: Example COVID-19 Study.
Frontiers in pharmacology
2021; 12: 700776
Since the beginning of the COVID-19 pandemic, pharmaceutical treatment hypotheses have abounded, each requiring careful evaluation. A randomized controlled trial generally provides the most credible evaluation of a treatment, but the efficiency and effectiveness of the trial depend on the existing evidence supporting the treatment. The researcher must therefore compile a body of evidence justifying the use of time and resources to further investigate a treatment hypothesis in a trial. An observational study can provide this evidence, but the lack of randomized exposure and the researcher's inability to control treatment administration and data collection introduce significant challenges. A proper analysis of observational health care data thus requires contributions from experts in a diverse set of topics ranging from epidemiology and causal analysis to relevant medical specialties and data sources. Here we summarize these contributions as 10 rules that serve as an end-to-end introduction to retrospective pharmacoepidemiological analyses of observational health care data using a running example of a hypothetical COVID-19 study. A detailed supplement presents a practical how-to guide for following each rule. When carefully designed and properly executed, a retrospective pharmacoepidemiological analysis framed around these rules will inform the decisions of whether and how to investigate a treatment hypothesis in a randomized controlled trial. This work has important implications for any future pandemic by prescribing what we can and should do while the world waits for global vaccine distribution.
View details for DOI 10.3389/fphar.2021.700776
View details for PubMedID 34393782
Application of Text Mining Methods to Identify Lupus Nephritis from Electronic Health Records
View details for Web of Science ID 000587568500258
Risk of primary urological and genital cancers following incident breast cancer: a Danish population-based cohort study.
Breast cancer research and treatment
PURPOSE: The prevalence of breast cancer survivors has increased due to dissemination of population-based mammographic screening and improved treatments. Recent changes in anti-hormonal therapies for breast cancer may have modified the risks of subsequent urological and genital cancers. We examine the risk of subsequent primary urological and genital cancers in patients with incident breast cancer compared with risks in the general population. METHODS: Using population-based Danish medical registries, we identified a cohort of women with primary breast cancer (1990-2017). We followed them from one year after their breast cancer diagnosis until any subsequent urological or genital cancer diagnosis. We computed incidence rates and standardized incidence ratios (SIRs) with 95% confidence intervals (CIs) as the observed number of cancers relative to the expected number based on national incidence rates (by sex, age, and calendar year). RESULTS: Among 84,972 patients with breast cancer (median age 61 years), we observed 623 urological cancers and 1397 genital cancers during a median follow-up of 7.4 years. The incidence rate per 100,000 person-years was stable during follow-up (83 for urological cancers and 176 for genital cancers). The SIR was increased for ovarian cancer (1.37, 95% CI 1.23-1.52) and uterine cancer (1.37, 95% CI 1.25-1.50), but only during the pre-aromatase inhibitor era (before 2007). Moreover, the SIR of kidney cancer was increased (1.52, 95% CI 1.15-1.97), but only during 2007-2017. The SIR for urinary bladder cancer was marginally increased (1.15, 95% CI 1.04-1.28) with no temporal effects. No associations were observed for cervical cancer. CONCLUSION: Breast cancer survivors had higher risks of uterine and ovarian cancer than expected, but only before 2007, and of kidney cancer, but only after 2007. The risk of urinary bladder cancer was moderately increased without temporal effects, and we observed no association with cervical cancer.
View details for DOI 10.1007/s10549-020-05879-w
View details for PubMedID 32845432
A Machine Learning Approach to Identifying Changes in Suicidal Language.
Suicide & life-threatening behavior
OBJECTIVE: With early identification and intervention, many suicidal deaths are preventable. Tools that include machine learning methods have been able to identify suicidal language. This paper examines the persistence of this suicidal language up to 30 days after discharge from care. METHOD: In a multi-center study, 253 subjects were enrolled into either suicidal or control cohorts. Their responses to standardized instruments and interviews were analyzed using machine learning algorithms. Subjects were re-interviewed approximately 30 days later, and their language was compared to the original language to determine the presence of suicidal ideation. RESULTS: The results show that language characteristics used to classify suicidality at the initial encounter are still present in the speech 30 days later (AUC=89% (95% CI: 85-95%), p<.0001) and that algorithms trained on the second interviews could also identify the subjects that produced the first interviews (AUC=85% (95% CI: 81-90%), p<.0001). CONCLUSIONS: This approach explores the stability of suicidal language. When using advanced computational methods, the results show that a patient's language is similar 30 days after first captured, while responses to standard measures change. This can be useful when developing methods that identify the data-based phenotype of a subject.
View details for DOI 10.1111/sltb.12642
View details for PubMedID 32484597
Risk of primary gastrointestinal cancers following incident non-metastatic breast cancer: a Danish population-based cohort study.
BMJ open gastroenterology
2020; 7 (1)
OBJECTIVE: We examined the risk of primary gastrointestinal cancers in women with breast cancer and compared this risk with that of the general population. DESIGN: Using population-based Danish registries, we conducted a cohort study of women with incident non-metastatic breast cancer (1990-2017). We computed cumulative cancer incidences and standardised incidence ratios (SIRs). RESULTS: Among 84 972 patients with breast cancer, we observed 2340 gastrointestinal cancers. After 20 years of follow-up, the cumulative incidence of gastrointestinal cancers was 4%, driven mainly by colon cancers. Only risk of stomach cancer was continually increased beyond 1 year following breast cancer. The SIR for colon cancer was neutral during 2-5 years of follow-up and approximately 1.2-fold increased thereafter. For cancer of the oesophagus, the SIR was increased only during 6-10 years. There was a weak association with pancreas cancer beyond 10 years. Between 1990-2006 and 2007-2017, the 1-10 years SIR estimate decreased and reached unity for upper gastrointestinal cancers (oesophagus, stomach, and small intestine). For lower gastrointestinal cancers (colon, rectum, and anal canal), the SIR estimate was increased only after 2007. No temporal effects were observed for the remaining gastrointestinal cancers. Treatment effects were negligible. CONCLUSION: Breast cancer survivors were at increased risk of oesophagus and stomach cancer, but only before 2007. The risk of colon cancer was increased, but only after 2007.
View details for DOI 10.1136/bmjgast-2020-000413
View details for PubMedID 32611556
The incidence of hematologic cancers after breast cancer. A 35-year population-based cohort study in Denmark
WILEY. 2019: 75
View details for Web of Science ID 000481785600147
Risk of primary urological and genital cancers following incident breast cancer: A Danish population-based cohort study
WILEY. 2019: 79
View details for Web of Science ID 000481785600155
Risk of primary gastrointestinal cancers following incident breast cancer: A Danish population-based cohort study
WILEY. 2019: 81
View details for Web of Science ID 000481785600159
- Stress Disorders and Dementia in the Danish Population AMERICAN JOURNAL OF EPIDEMIOLOGY 2019; 188 (3): 493–99
Using natural language processing to construct a metastatic breast cancer cohort from linked cancer registry and electronic medical records data.
JAMIA open
2019; 2 (4): 528–37
Most population-based cancer databases lack information on metastatic recurrence. Electronic medical records (EMR) and cancer registries contain complementary information on cancer diagnosis, treatment and outcome, yet are rarely used synergistically. To construct a cohort of metastatic breast cancer (MBC) patients, we applied natural language processing techniques within a semisupervised machine learning framework to linked EMR-California Cancer Registry (CCR) data. We studied all female patients treated at Stanford Health Care with an incident breast cancer diagnosis from 2000 to 2014. Our database consisted of structured fields and unstructured free-text clinical notes from EMR, linked to CCR, a component of the Surveillance, Epidemiology and End Results Program (SEER). We identified de novo MBC patients from CCR and extracted information on distant recurrences from patient notes in EMR. Furthermore, we trained a regularized logistic regression model for recurrent MBC classification and evaluated its performance on a gold standard set of 146 patients. There were 11 459 breast cancer patients in total and the median follow-up time was 96.3 months. We identified 1886 MBC patients, 512 (27.1%) of whom were de novo MBC patients and 1374 (72.9%) were recurrent MBC patients. Our final MBC classifier achieved an area under the receiver operating characteristic curve (AUC) of 0.917, with sensitivity 0.861, specificity 0.878, and accuracy 0.870. To enable population-based research on MBC, we developed a framework for retrospective case detection combining EMR and CCR data. Our classifier achieved good AUC, sensitivity, and specificity without expert-labeled examples.
View details for DOI 10.1093/jamiaopen/ooz040
View details for PubMedID 32025650
View details for PubMedCentralID PMC6994019
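The sensitivity, specificity, and accuracy reported in the abstract above all derive from a confusion matrix. The sketch below shows the arithmetic. The counts are an assumption: they are one split of 146 gold-standard patients that happens to reproduce the reported figures, not the confusion matrix from the paper, which the abstract does not give. The function name is hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return common binary-classification metrics from confusion-matrix counts."""
    return {
        # Share of true cases the classifier finds (recall).
        "sensitivity": tp / (tp + fn),
        # Share of non-cases the classifier correctly rejects.
        "specificity": tn / (tn + fp),
        # Share of flagged patients who are true cases (precision).
        "positive_predictive_value": tp / (tp + fp),
        # Share of all patients classified correctly.
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Assumed counts (62 + 10 + 65 + 9 = 146), chosen only to match the
# reported sensitivity, specificity, and accuracy after rounding.
mbc_metrics = classification_metrics(tp=62, fp=9, tn=65, fn=10)
```

Because many different count combinations round to the same three figures, the example should be read as a consistency check on the definitions rather than a reconstruction of the study data.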
Stress Disorders and Dementia in the Danish Population.
American journal of epidemiology
2019; 188 (3): 493–99
There is an association between stress and dementia. However, less is known about dementia among persons with varied stress responses and sex differences in these associations. This population-based cohort study examined dementia among persons with a range of clinician-diagnosed stress disorders, and the interaction between stress disorders and sex in predicting dementia, in Denmark from 1995 to 2011. This study included Danes 40 years or older with a stress disorder diagnosis (n=47,047) and a matched comparison cohort (n=232,141) without a stress disorder diagnosis from 1995 through 2011. Diagnoses were culled from national registries. We used Cox proportional-hazards regression to estimate associations between stress disorders and dementia. Risk of dementia was higher for persons with stress disorders than for persons without such a diagnosis; adjusted hazard ratios ranged from 1.6 to 2.8. There was evidence of an interaction between sex and stress disorders in predicting dementia, with a greater rate of dementia among men with stress disorders except posttraumatic stress disorder, for which women had a greater rate. Results support existing evidence of an association between stress and dementia. This study contributes novel information regarding dementia risk across a range of stress responses, and interactions between stress disorders and sex.
View details for PubMedID 30576420
Potential Biases in Machine Learning Algorithms Using Electronic Health Record Data
JAMA INTERNAL MEDICINE
2018; 178 (11): 1544–47
A promise of machine learning in health care is the avoidance of biases in diagnosis and treatment; a computer algorithm could objectively synthesize and interpret the data in the medical record. Integration of machine learning with clinical decision support tools, such as computerized alerts or diagnostic support, may offer physicians and others who provide health care targeted and timely information that can improve clinical decisions. Machine learning algorithms, however, may also be subject to biases. The biases include those related to missing data and patients not identified by algorithms, sample size and underestimation, and misclassification and measurement error. There is concern that biases and deficiencies in the data used by machine learning algorithms may contribute to socioeconomic disparities in health care. This Special Communication outlines the potential biases that may be introduced into machine learning-based clinical decision support tools that use electronic health record data and proposes potential solutions to the problems of overreliance on automation, algorithms based on biased data, and algorithms that do not provide information that is clinically meaningful. Existing health care disparities should not be amplified by thoughtless or excessive reliance on machines.
View details for PubMedID 30128552
Scalable Electronic Phenotyping For Studying Patient Comorbidities.
AMIA ... Annual Symposium proceedings. AMIA Symposium
2018; 2018: 740–49
Over 75 million Americans have multiple concurrent chronic conditions, and medical decision making for these patients is mostly based on retrospective cohort studies. Current methods to generate cohorts of patients with comorbidities are neither scalable nor generalizable. We propose a supervised machine learning algorithm for learning comorbidity phenotypes without requiring manually created training sets. First, we generated myocardial infarction (MI) and type-2 diabetes (T2DM) patient cohorts using ICD9-based imperfectly labeled samples upon which LASSO logistic regression models were trained. Second, we assessed the effects of training sample size, inclusion of physician input, and inclusion of clinical text features on model performance. Using ICD9 codes as our labeling heuristic, we achieved comparable performance to models created using keywords as the labeling heuristic. We found that expert input and larger training sample sizes could compensate for the lack of clinical text derived features. However, our best performing model included clinical text as features with a large training sample size.
View details for PubMedID 30815116
SynthNotes: A Generator Framework for High-volume, High-fidelity Synthetic Mental Health Notes
IEEE. 2018: 951–58
View details for Web of Science ID 000468499301003
Performance of Machine Learning Methods Using Electronic Medical Records to Predict Varicella Zoster Virus Infection
View details for Web of Science ID 000411824106394
Predicting patient 'cost blooms' in Denmark: a longitudinal population-based study.
BMJ open
2017; 7 (1)
To compare the ability of standard versus enhanced models to predict future high-cost patients, especially those who move from a lower to the upper decile of per capita healthcare expenditures within 1 year (that is, 'cost bloomers'). We developed alternative models to predict being in the upper decile of healthcare expenditures in year 2 of a sample, based on data from year 1. Our 6 alternative models ranged from a standard cost-prediction model with 4 variables (ie, traditional model features), to our largest enhanced model with 1053 non-traditional model features. To quantify any increases in predictive power that enhanced models achieved over standard tools, we compared the prospective predictive performance of each model. We used the population of Western Denmark between 2004 and 2011 (2 146 801 individuals) to predict future high-cost patients and characterise high-cost patient subgroups. Using the most recent 2-year period (2010-2011) for model evaluation, our whole-population model used a cohort of 1 557 950 individuals with a full year of active residency in year 1 (2010). Our cost-bloom model excluded the 155 795 individuals who were already high cost at the population level in year 1, resulting in 1 402 155 individuals for prediction of cost bloomers in year 2 (2011). Using unseen data from a future year, we evaluated each model's prospective predictive performance by calculating the ratio of predicted high-cost patient expenditures to the actual high-cost patient expenditures in Year 2, that is, cost capture. Our best enhanced model achieved a 21% and 30% improvement in cost capture over a standard diagnosis-based model for predicting population-level high-cost patients and cost bloomers, respectively. In combination with modern statistical learning methods for analysing large data sets, models enhanced with a large and diverse set of features led to better performance, especially for predicting future cost bloomers.
View details for DOI 10.1136/bmjopen-2016-011580
View details for PubMedID 28077408
View details for PubMedCentralID PMC5253526
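The cost capture measure described in the abstract above can be sketched as follows. This is one reading of the definition, assumed here: the year-2 spending of the patients a model ranks in its top decile, divided by the year-2 spending of the patients who actually were in the top decile. The function name, the `top_fraction` parameter, and the toy data are hypothetical.

```python
def cost_capture(risk_scores, actual_costs, top_fraction=0.10):
    """Ratio of spending captured by predicted high-cost patients to the
    spending of the actual high-cost patients (both groups of equal size)."""
    if len(risk_scores) != len(actual_costs):
        raise ValueError("risk_scores and actual_costs must have equal length")
    n = len(actual_costs)
    k = max(1, int(round(n * top_fraction)))

    # Patients the model ranks highest, by predicted risk score.
    predicted_top = sorted(range(n), key=lambda i: risk_scores[i], reverse=True)[:k]
    captured = sum(actual_costs[i] for i in predicted_top)

    # Spending of the k patients who truly had the highest costs.
    attainable = sum(sorted(actual_costs, reverse=True)[:k])
    return captured / attainable


# Toy data: 20 patients, so the top decile holds 2 patients.
toy_costs = [900, 800, 50, 40, 30, 20, 10, 10, 10, 10] + [5] * 10
# The toy model ranks patient 0 and patient 2 highest, missing patient 1.
toy_scores = [0.9, 0.1, 0.8] + [0.0] * 17
toy_capture = cost_capture(toy_scores, toy_costs)
```

Under this reading, a model that ranks patients exactly by their future costs achieves a cost capture of 1.0, and any mis-ranking lowers the value.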
Enhanced Quality Measurement Event Detection: An Application to Physician Reporting.
EGEMS (Washington, DC)
2017; 5 (1): 5
The wide-scale adoption of electronic health records (EHRs) has increased the availability of routinely collected clinical data in electronic form that can be used to improve the reporting of quality of care. However, the bulk of information in the EHR is in unstructured form (e.g., free-text clinical notes) and not amenable to automated reporting. Traditional methods are based on structured diagnostic and billing data that provide efficient, but inaccurate or incomplete summaries of actual or relevant care processes and patient outcomes. To assess the feasibility and benefit of implementing enhanced EHR-based physician quality measurement and reporting, which includes the analysis of unstructured free-text clinical notes, we conducted a retrospective study to compare traditional and enhanced approaches for reporting ten physician quality measures from multiple National Quality Strategy domains. We found that our enhanced approach enabled the calculation of five Physician Quality and Performance System measures not measurable in billing or diagnostic codes and resulted in over a five-fold increase in event detection at an average precision of 88 percent (95 percent CI: 83-93 percent). Our work suggests that enhanced EHR-based quality measurement can increase event detection for establishing value-based payment arrangements and can expedite quality reporting for physician practices, which are increasingly burdened by the process of manual chart review for quality reporting.
View details for PubMedID 29881731
New Paradigms for Patient-Centered Outcomes Research in Electronic Medical Records: An Example of Detecting Urinary Incontinence Following Prostatectomy.
EGEMS (Washington, DC)
2016; 4 (3): 1231-?
National initiatives to develop quality metrics emphasize the need to include patient-centered outcomes. Patient-centered outcomes are complex, require documentation of patient communications, and have not been routinely collected by healthcare providers. The widespread implementation of electronic medical records (EHR) offers opportunities to assess patient-centered outcomes within the routine healthcare delivery system. The objective of this study was to test the feasibility and accuracy of identifying patient-centered outcomes within the EHR. Data from patients with localized prostate cancer undergoing prostatectomy were used to develop and test algorithms to accurately identify patient-centered outcomes in post-operative EHRs; we used urinary incontinence as the use case. Standard data mining techniques were used to extract and annotate free text and structured data to assess urinary incontinence recorded within the EHRs. A total of 5,349 prostate cancer patients were identified in our EHR system between 1998 and 2013. Among these EHRs, 30.3% had a text mention of urinary incontinence within 90 days post-operative compared to less than 1.0% with a structured data field for urinary incontinence (i.e. ICD-9 code). Our workflow had good precision and recall for urinary incontinence (positive predictive value: 0.73 and sensitivity: 0.84). Our data indicate that important patient-centered outcomes, such as urinary incontinence, are being captured in EHRs as free text and highlight the long-standing importance of accurate clinician documentation. Standard data mining algorithms can accurately and efficiently identify these outcomes in existing EHRs; the complete assessment of these outcomes is essential to move practice into the patient-centered realm of healthcare.
View details for DOI 10.13063/2327-9214.1231
View details for PubMedID 27347492
Detecting unplanned care from clinician notes in electronic health records.
Journal of oncology practice / American Society of Clinical Oncology
2015; 11 (3): e313-9
Reductions in unplanned episodes of care, such as emergency department visits and unplanned hospitalizations, are important quality outcome measures. However, many events are only documented in free-text clinician notes and are labor intensive to detect by manual medical record review. We studied 308,096 free-text machine-readable documents linked to individual entries in our electronic health records, representing care for patients with breast, GI, or thoracic cancer, whose treatment was initiated at one academic medical center, Stanford Health Care (SHC). Using a clinical text-mining tool, we detected unplanned episodes documented in clinician notes (for non-SHC visits) or in coded encounter data for SHC-delivered care and the most frequent symptoms documented in emergency department (ED) notes. Combined reporting increased the identification of patients with one or more unplanned care visits by 32% (15% using coded data; 20% using all the data) among patients with 3 months of follow-up and by 21% (23% using coded data; 28% using all the data) among those with 1 year of follow-up. Based on the textual analysis of SHC ED notes, pain (75%), followed by nausea (54%), vomiting (47%), infection (36%), fever (28%), and anemia (27%), were the most frequent symptoms mentioned. Pain, nausea, and vomiting co-occur in 35% of all ED encounter notes. The text-mining methods we describe can be applied to automatically review free-text clinician notes to detect unplanned episodes of care mentioned in these notes. These methods have broad application for quality improvement efforts in which events of interest occur outside of a network that allows for patient data sharing.
View details for DOI 10.1200/JOP.2014.002741
View details for PubMedID 25980019
View details for PubMedCentralID PMC4438112
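The relative increases quoted in the abstract above (32% and 21%) compare the share of patients identified with coded data alone against the share identified with all the data. A minimal sketch of that arithmetic follows, using the rounded percentages from the abstract. The function name is hypothetical, and the small gap between these results and the published figures is presumably due to rounding of the underlying proportions, which is an assumption.

```python
def relative_increase(baseline, combined):
    """Percentage increase of `combined` over `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (combined - baseline) / baseline * 100


# Rounded shares of patients with one or more unplanned care visits.
three_month_increase = relative_increase(15, 20)  # coded data vs all data
one_year_increase = relative_increase(23, 28)     # coded data vs all data
```

With the rounded inputs, the two values come out at roughly 33% and 22%, close to the 32% and 21% reported.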
Text Mining for Adverse Drug Events: the Promise, Challenges, and State of the Art
Drug safety
2014; 37 (10): 777-790
Text mining is the computational process of extracting meaningful information from large amounts of unstructured text. It is emerging as a tool to leverage underutilized data sources that can improve pharmacovigilance, including the objective of adverse drug event (ADE) detection and assessment. This article provides an overview of recent advances in pharmacovigilance driven by the application of text mining, and discusses several data sources (such as biomedical literature, clinical narratives, product labeling, social media, and Web search logs) that are amenable to text mining for pharmacovigilance. Given the state of the art, it appears text mining can be applied to extract useful ADE-related information from multiple textual sources. Nonetheless, further research is required to address remaining technical challenges associated with the text mining methodologies, and to conclusively determine the relative contribution of each textual source to improving pharmacovigilance.
View details for DOI 10.1007/s40264-014-0218-z
View details for Web of Science ID 000344615300005
View details for PubMedCentralID PMC4217510