Selen Bozkurt is a biomedical informatician and biostatistician at Stanford University's Center for Biomedical Informatics Research. She was previously a postdoctoral scholar in the Department of Biomedical Data Science at Stanford. Her research focuses on health informatics using electronic health records, machine learning, and natural language processing. She has also worked as a biostatistician on several projects. She has been a member of the RSNA Radiology Reporting Committee since 2009. Her PhD dissertation, "A Real Time Decision Support System for Mammography Interpretations," developed an automated system for deep information extraction from mammography reports and an approach for real-time decision support driven by analysis of dictated radiology reports.
Education & Certifications
PhD, Akdeniz University, Faculty of Medicine, Biostatistics and Medical Informatics
Visiting PhD Student, Stanford University, Biomedical Informatics
MSc, Akdeniz University, Faculty of Medicine, Biostatistics and Medical Informatics
BSc, Dokuz Eylul University, Statistics
Automated Detection of Measurements and Their Descriptors in Radiology Reports Using a Hybrid Natural Language Processing Algorithm.
Journal of digital imaging
Radiological measurements are reported in free-text reports, and it is challenging to extract such measures for treatment planning tasks such as lesion summarization and cancer response assessment. The purpose of this work is to develop and evaluate a natural language processing (NLP) pipeline that can extract measurements and their core descriptors, such as temporality, anatomical entity, imaging observation, RadLex descriptors, series number, image number, and segment, from a wide variety of radiology reports (MR, CT, and mammogram). We created a hybrid NLP pipeline that integrates rule-based feature extraction modules and a conditional random field (CRF) model to extract measurements from radiology reports and link them with clinically relevant features such as anatomical entities or imaging observations. The pipeline was trained on 1117 CT/MR reports, and the performance of the system was evaluated on an independent set of 100 expert-annotated CT/MR reports and also tested on 25 mammography reports. The system detected 813 out of 806 measurements in the CT/MR reports; 784 were true positives, 29 were false positives, and 0 were false negatives. Similarly, 96% of the measurements in the mammography reports were extracted correctly with their modifiers. Our approach could enable the development of computerized applications that use summarized lesion measurements from radiology reports of varying modalities and improve practice by tracking the same lesions across multiple radiologic encounters.
View details for DOI 10.1007/s10278-019-00237-9
View details for PubMedID 31222557
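To illustrate only the rule-based side of such a hybrid pipeline, here is a minimal Python sketch that pulls size measurements such as "3.2 x 1.5 cm" out of a report sentence and normalizes them to millimetres. The regular expression, the `extract_measurements` name, and the example sentence are hypothetical; the sketch does not reproduce the paper's CRF model or its linking of measurements to anatomical entities and other descriptors.

```python
import re

# Hypothetical rule: one to three numbers joined by "x" (or "×"), then a mm/cm unit.
MEASUREMENT = re.compile(
    r"(?P<dims>\d+(?:\.\d+)?(?:\s*[x×]\s*\d+(?:\.\d+)?){0,2})\s*(?P<unit>mm|cm)\b",
    re.IGNORECASE,
)


def extract_measurements(text):
    """Return (dimensions_in_mm, matched_text) pairs found in a report sentence."""
    results = []
    for match in MEASUREMENT.finditer(text):
        factor = 10.0 if match.group("unit").lower() == "cm" else 1.0
        # Split "3.2 x 1.5" into its individual dimensions and convert them to mm.
        parts = re.split(r"\s*[x×]\s*", match.group("dims"), flags=re.IGNORECASE)
        results.append(([float(p) * factor for p in parts], match.group(0)))
    return results


sentence = "Segment 7 lesion measures 3.2 x 1.5 cm (series 4, image 32), previously 28 mm."
for dims, span in extract_measurements(sentence):
    print(span, "->", dims)
```

Numbers that are not followed by a unit, such as the series and image numbers here, are ignored; a real pipeline would capture those as separate descriptors.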
A nomogram for decision-making of completion surgery in endometrial cancer diagnosed after hysterectomy.
Archives of gynecology and obstetrics
Extrauterine tumor spread is one of the essential determinants of disease outcome in endometrial cancer. However, more than 30% of patients still undergo incomplete surgery at the initial attempt. Strategies for managing patients with incompletely staged early-stage disease or with undebulked advanced-stage disease remain controversial. Depending on postoperative uterine features and imaging findings, patients may be put on observation, receive adjuvant therapy, or undergo re-staging or debulking surgery followed by adjuvant therapy. To identify patients who would most benefit from completion surgery, either for restaging or for cytoreduction, we developed a nomogram for estimating extrauterine disease based on findings of the final hysterectomy specimen. Data of 336 patients whose extrauterine disease status was known were analyzed. A nomogram was constructed using patient characteristics including age, grade, myometrial invasion, lymphovascular space involvement, cervical involvement, and peritoneal cytology. The nomogram was internally validated in terms of discrimination, calibration and overall performance. The nomogram showed good performance accuracy, with an area under the receiver operating characteristic curve of 0.870, a specificity of 95.5%, and a positive predictive value of 73.9%. Decision curve analysis revealed that using the nomogram in decision-making for completion surgery leads to the equivalent of a net 18 true-positive results per 100 patients without an increase in the number of false-positive results. Estimation of extrauterine disease from the final hysterectomy specimen is possible with high predictive performance using the developed nomogram. The nomogram may help clinicians in decision-making for the management of incomplete surgeries.
View details for DOI 10.1007/s00404-019-05223-8
View details for PubMedID 31250198
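Decision curve analysis, mentioned in the nomogram abstract, compares strategies by their net benefit at a chosen threshold probability p_t: NB = TP/n - FP/n × p_t / (1 - p_t). The Python sketch below implements that standard formula on made-up toy predictions, not on the study's data; the function names are hypothetical.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening when predicted risk >= threshold.

    NB = TP/n - FP/n * threshold / (1 - threshold)
    """
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: intervene in every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy data: 1 = extrauterine disease present (invented values, not study data).
y_true = [1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
y_prob = [0.9, 0.7, 0.6, 0.2, 0.1, 0.4, 0.3, 0.05, 0.15, 0.25]
print(net_benefit(y_true, y_prob, 0.5))     # model-guided strategy
print(net_benefit_treat_all(y_true, 0.5))   # intervene in everyone
```

Multiplying a net benefit (or a difference in net benefit between strategies) by 100 gives the "net true positives per 100 patients" reading used in the abstract.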
Machine Learning Approaches for Extracting Stage from Pathology Reports in Prostate Cancer.
Studies in health technology and informatics
2019; 264: 1522–23
Clinical and pathological stage are defining parameters in oncology, which direct a patient's treatment options and prognosis. Pathology reports contain a wealth of staging information that is not stored in structured form in most electronic health records (EHRs). Therefore, we evaluated three supervised machine learning methods (Support Vector Machine, Decision Trees, Gradient Boosting) to classify free-text pathology reports for prostate cancer into T, N and M stage groups.
View details for DOI 10.3233/SHTI190515
View details for PubMedID 31438212
Natural Language Processing Approaches to Detect the Timeline of Metastatic Recurrence of Breast Cancer.
JCO clinical cancer informatics
2019; 3: 1–12
Electronic medical records (EMRs) and population-based cancer registries contain information on cancer outcomes and treatment, yet rarely capture information on the timing of metastatic cancer recurrence, which is essential to understanding cancer survival outcomes. We developed a natural language processing (NLP) system to identify patient-specific timelines of metastatic breast cancer recurrence. We used the OncoSHARE database, which includes merged data from the California Cancer Registry and EMRs of 8,956 women diagnosed with breast cancer between 2000 and 2018. We curated a comprehensive vocabulary by interviewing expert clinicians and processing radiology and pathology reports and progress notes. We developed and evaluated two distinct NLP approaches to analyze free-text notes: a traditional rule-based model, using rules for metastasis detection drawn from the literature and curated by domain experts, and a contemporary neural network model. For each 3-month period (quarter) from 2000 to 2018, we applied both models to infer recurrence status for that quarter. We trained the NLP models using 894 randomly selected patient records that were manually reviewed by clinical experts and evaluated model performance using 179 held-out patients (20%) as a test set. The median follow-up time was 19 quarters (5 years) for the training set and 15 quarters (4 years) for the test set. The neural network model predicted the timing of distant metastatic recurrence with a sensitivity of 0.83 and specificity of 0.73, outperforming the rule-based model, which had a sensitivity of 0.88 and specificity of 0.35 (P < .001). We developed an NLP method that enables identification of the occurrence and timing of metastatic breast cancer recurrence from EMRs. This approach may be adaptable to other cancer sites and could help unlock the potential of EMRs for research on real-world cancer outcomes.
View details for DOI 10.1200/CCI.19.00034
View details for PubMedID 31584836
Comparison of Orthogonal NLP Methods for Clinical Phenotyping and Assessment of Bone Scan Utilization among Prostate Cancer Patients.
Journal of biomedical informatics
Clinical care guidelines recommend that newly diagnosed prostate cancer patients at high risk for metastatic spread receive a bone scan prior to treatment and that low-risk patients not receive one. The objective was to develop an automated pipeline that interrogates heterogeneous data to evaluate the use of bone scans using two different natural language processing (NLP) approaches. Our cohort was divided into risk groups based on electronic health records (EHR). Information on bone scan utilization was identified in both structured data and free text from clinical notes. Our pipeline annotated sentences with a combination of a rule-based method using the ConText algorithm (a generalization of NegEx) and a convolutional neural network (CNN) method using word2vec to produce word embeddings. A total of 5,500 patients and 369,764 notes were included in the study. A total of 39% of patients were high-risk, and 73% of these received a bone scan; of the 18% low-risk patients, 10% received one. The CNN model outperformed the rule-based model (F-measure = 0.918 and 0.897, respectively). We demonstrate that a combination of both models could maximize precision or recall, depending on the study question. Using structured data, we accurately classified patients' cancer risk group, identified bone scan documentation with two NLP methods, and evaluated guideline adherence. Our pipeline can be used to provide concrete feedback to clinicians and guide treatment decisions.
View details for PubMedID 31014980
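The bone scan abstract notes that combining the two NLP models can favor either precision or recall. One simple way to see this, assuming each model emits a binary label per sentence, is to take the logical AND of the two labels (which tends to raise precision) or the logical OR (which tends to raise recall). The labels below are invented toy data, and this is not necessarily how the authors combined their models.

```python
def precision_recall(gold, pred):
    """Precision and recall for binary labels (1 = positive)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


# Invented sentence-level labels: 1 = sentence documents a bone scan.
gold = [1, 1, 1, 1, 0, 0, 0, 0]
rule_based = [1, 1, 0, 1, 1, 0, 0, 0]
cnn = [1, 0, 1, 1, 0, 1, 0, 0]

both_agree = [int(a == 1 and b == 1) for a, b in zip(rule_based, cnn)]   # AND
either_says = [int(a == 1 or b == 1) for a, b in zip(rule_based, cnn)]   # OR

for name, pred in [("rule", rule_based), ("cnn", cnn), ("AND", both_agree), ("OR", either_says)]:
    p, r = precision_recall(gold, pred)
    print(f"{name}: precision={p:.2f} recall={r:.2f} F={f_measure(p, r):.2f}")
```

On this toy data each model alone has precision and recall of 0.75; AND pushes precision to 1.0 at the cost of recall, while OR pushes recall to 1.0 at the cost of precision.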
Automatic Inference of BI-RADS Final Assessment Categories from Narrative Mammography Report Findings.
Journal of biomedical informatics
We propose an efficient natural language processing approach for inferring BI-RADS final assessment categories by analyzing only the mammogram findings reported by the mammographer in narrative form. The proposed hybrid method integrates semantic term embedding with distributional semantics, producing a context-aware vector representation of unstructured mammography reports. A large corpus of unannotated mammography reports (300,000) was used to learn the context of key terms using a distributional semantics approach, and the trained model was applied to generate context-aware vector representations of the reports annotated with a BI-RADS category (22,091). The vectorized reports were used to train a supervised classifier to derive the BI-RADS assessment class. Even though most of the proposed embedding pipeline is unsupervised, the classifier was able to recognize substantial semantic information for deriving the BI-RADS categorization, not only on a held-out internal test set but also on an external validation set (1,900 reports). Our proposed method outperforms a recently published domain-specific rule-based system and could be relevant for evaluating concordance between radiologists. With minimal requirement for task-specific customization, the proposed method can easily be transferred to a different domain to support large-scale text mining or derivation of patient phenotypes.
View details for DOI 10.1016/j.jbi.2019.103137
View details for PubMedID 30807833
Knowledge, attitudes and medical practice regarding hepatitis B prevention and management among healthcare workers in Northern Vietnam.
PloS one
2019; 14 (10): e0223733
BACKGROUND AND AIM: Vietnam's burden of liver cancer is largely due to its high prevalence of chronic hepatitis B virus (HBV) infection. This study aimed to examine healthcare workers' (HCWs) knowledge, attitudes and practices regarding HBV prevention and management. METHODS: A cross-sectional survey was conducted among healthcare workers at primary and tertiary facilities in two Northern provinces of Vietnam in 2017. A standardized questionnaire was administered to randomly selected HCWs. Multivariate regression was used to identify predictors of the HBV knowledge score. RESULTS: Among the 314 participants, 75.5% did not know that HBV infection at birth carries the highest risk of developing chronic infection. The median knowledge score was 25 out of 42 (59.5%). About one third (30.2%) wrongly believed that HBV can be transmitted through eating or sharing food with chronic hepatitis B patients. About 38.8% did not feel confident that the hepatitis B vaccine is safe. Only 30.1% answered all the questions on injection safety correctly. Up to 48.2% reported that they consistently recap needles with two hands after injection, a practice that puts them at greater risk of needle-stick injury. About 24.2% reported having been pricked by a needle at work within the past 12 months. More than 40% were concerned about having casual contact or sharing food with a person with chronic hepatitis B infection (CHB). In multivariate analysis, physicians scored significantly higher than other healthcare professionals. Having received hepatitis B training within the last two years was also significantly associated with a better HBV knowledge score. CONCLUSIONS: The survey findings indicate an immediate need to implement an effective hepatitis B education and training program to build capacity among Vietnam's healthcare workers in hepatitis B prevention and control and to dispel hepatitis B stigma.
View details for DOI 10.1371/journal.pone.0223733
View details for PubMedID 31609983
Is it possible to automatically assess pretreatment digital rectal examination documentation using natural language processing? A single-centre retrospective study.
BMJ open
2019; 9 (7): e027182
To develop and test a method for automatic assessment of a quality metric, provider-documented pretreatment digital rectal examination (DRE), using the outputs of a natural language processing (NLP) framework. An electronic health records (EHR)-based prostate cancer data warehouse was used to identify patients and associated clinical notes from 1 January 2005 to 31 December 2017. Using a previously developed NLP pipeline, we classified DRE assessment as documented (currently or historically performed), deferred (or suggested as a future examination) or refused. We investigated quality metric performance, defined as DRE documentation within 6 months before treatment, and identified patient and clinical factors associated with metric performance. The cohort included 7215 patients with prostate cancer and 426 227 unique clinical notes associated with pretreatment encounters. DREs of 5958 (82.6%) patients were documented, and 1257 (17.4%) patients did not have a DRE documented in the EHR. A total of 3742 (51.9%) patient DREs were documented within 6 months prior to treatment, meeting the quality metric. Patients with private insurance had a higher rate of DRE 6 months prior to starting treatment than those with Medicaid-based or Medicare-based payors (77.3% vs 69.5%, p=0.001). Patients undergoing chemotherapy, radiation therapy or surgery as the first line of treatment were more likely to have a documented DRE 6 months prior to treatment. EHRs contain valuable unstructured information, and with NLP it is feasible to accurately and efficiently identify quality metrics within current clinician documentation workflows.
View details for DOI 10.1136/bmjopen-2018-027182
View details for PubMedID 31324681
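Once an NLP step has produced dated DRE mentions, checking the quality metric reduces to a date-window test. A minimal sketch, assuming the DRE dates are already extracted and approximating "6 months" as 183 days (both are assumptions, not details from the study); `meets_dre_metric` is a hypothetical helper.

```python
from datetime import date


def meets_dre_metric(dre_dates, treatment_date, window_days=183):
    """True if any documented DRE falls within `window_days` before treatment.

    window_days=183 is an assumed stand-in for "6 months".
    """
    return any(0 <= (treatment_date - d).days <= window_days for d in dre_dates)


treatment = date(2017, 6, 1)
print(meets_dre_metric([date(2017, 1, 15)], treatment))  # DRE about 4.5 months earlier
print(meets_dre_metric([date(2016, 10, 1)], treatment))  # about 8 months earlier
print(meets_dre_metric([], treatment))                   # no documented DRE
```

DREs dated after the treatment start give a negative day difference and are excluded, so only pretreatment examinations count toward the metric.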
Impact of age on intermittent hypoxia in obstructive sleep apnea: a propensity-matched analysis
SLEEP AND BREATHING
2018; 22 (2): 317–22
To determine the independent relationship of aging with chronic intermittent hypoxia, we compared hypoxia-related polysomnographic variables of geriatric patients (aged ≥ 65 years) with a cohort of non-geriatric patients matched for apnea-hypopnea index (AHI), gender, body mass index (BMI), and neck circumference. The study was conducted using clinical and polysomnographic data of 1280 consecutive patients who underwent complete polysomnographic evaluation for suspected sleep-disordered breathing (SDB) at a single sleep disorder center. A propensity score-matched analysis was performed to obtain matched cohorts of geriatric and non-geriatric patients, which resulted in successful matching of 168 patients from each group. Study groups were comparable for gender (P = 0.999), BMI (P = 0.940), neck circumference (P = 0.969), AHI (P = 0.935), and severity of SDB (P = 0.089). Oximetric variables representing the duration of chronic intermittent hypoxia, such as mean (P = 0.001), longest (P = 0.001) and total apnea durations (P = 0.003), mean (P = 0.001) and longest hypopnea durations (P = 0.001), and total sleep time with oxygen saturation below 90% (P = 0.008), were significantly higher in geriatric patients than in younger adults. Geriatric patients had significantly lower minimum (P = 0.013) and mean oxygen saturation (P = 0.001) than non-geriatric patients. The study provides evidence that elderly patients exhibit more severe and deeper nocturnal intermittent hypoxia than younger adults, independent of obstructive sleep apnea severity, BMI, gender, and neck circumference. Hypoxia-related polysomnographic variables in geriatric patients may in fact reflect a physiological aging process rather than the severity of SDB.
View details for DOI 10.1007/s11325-017-1560-z
View details for Web of Science ID 000430993000006
View details for PubMedID 28849299
Expanding a radiology lexicon using contextual patterns in radiology reports.
Journal of the American Medical Informatics Association : JAMIA
Distributional semantics algorithms, which learn vector space representations of words and phrases from large corpora, identify related terms based on contextual usage patterns. We hypothesize that distributional semantics can speed up lexicon expansion in a clinical domain, radiology, by unearthing synonyms from the corpus. We apply word2vec, a distributional semantics software package, to the text of radiology notes to identify synonyms for RadLex, a structured lexicon of radiology terms. We stratify performance by term category, term frequency, number of tokens in the term, vector magnitude, and the context window used in vector building. Ranking candidates based on distributional similarity to a target term results in high curation efficiency: on a ranked list of 775 249 terms, >50% of synonyms occurred within the first 25 terms. Synonyms are easier to find if the target term is a phrase rather than a single word, if it occurs at least 100× in the corpus, and if its vector magnitude is between 4 and 5. Some RadLex categories, such as anatomical substances, are easier to identify synonyms for than others. The unstructured text of clinical notes contains a wealth of information about human diseases and treatment patterns. However, searching and retrieving information from clinical notes is often hampered by variation in how similar concepts are described in the text. Biomedical lexicons address this challenge but are expensive to produce and maintain. Distributional semantics algorithms can assist lexicon curation, saving researchers time and money.
View details for PubMedID 29329435
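The curation strategy in the lexicon-expansion abstract ranks candidate terms by distributional similarity to a target term. Given word vectors (for example from a trained word2vec model, whose training is not shown here), the ranking step can be sketched with cosine similarity. The three-dimensional vectors and the `rank_candidates` helper are invented for illustration; real embeddings have far more dimensions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def rank_candidates(target, vectors, top_k=3):
    """Rank every other term by cosine similarity to `target`, most similar first."""
    target_vec = vectors[target]
    scored = [(term, cosine(target_vec, vec)) for term, vec in vectors.items() if term != target]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Invented 3-dimensional vectors; real word2vec vectors typically have hundreds of dimensions.
vectors = {
    "hepatic": [0.90, 0.10, 0.00],
    "liver": [0.85, 0.15, 0.05],
    "renal": [0.10, 0.90, 0.00],
    "kidney": [0.15, 0.85, 0.10],
    "lesion": [0.30, 0.30, 0.90],
}
for term, score in rank_candidates("hepatic", vectors):
    print(f"{term}: {score:.3f}")
```

A curator would then review the list from the top, which is why the abstract measures efficiency by how early true synonyms appear in the ranking.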
Distribution of global health measures from routinely collected PROMIS surveys in patients with breast cancer or prostate cancer.
The collection of patient-reported outcomes (PROs) is an emerging priority internationally, guiding clinical care, quality improvement projects and research studies. After the deployment of Patient-Reported Outcomes Measurement Information System (PROMIS) surveys in routine outpatient workflows at an academic cancer center, electronic health record data were used to evaluate survey completion rates and self-reported global health measures across 2 tumor types: breast and prostate cancer. This study retrospectively analyzed 11,657 PROMIS surveys from patients with breast cancer and 4411 surveys from patients with prostate cancer, and calculated survey completion rates and global physical health (GPH) and global mental health (GMH) scores between 2013 and 2018. A total of 36.6% of eligible patients with breast cancer and 23.7% of patients with prostate cancer completed at least 1 survey, with completion rates lower among black patients for both tumor types (P < .05). The mean T scores (calibrated to a general population mean of 50) for GPH were 48.4 ± 9 for breast cancer and 50.6 ± 9 for prostate cancer, and the GMH scores were 52.7 ± 8 and 52.1 ± 9, respectively. GPH and GMH were frequently lower among ethnic minorities, patients without private health insurance, and those with advanced disease. This analysis provides important baseline data on patient-reported global health in breast and prostate cancer. In demonstrating that PROs can be integrated into clinical workflows, this study also shows that supportive efforts may be needed to improve PRO collection and global health endpoints in vulnerable populations.
View details for PubMedID 30512191
An Automated Feature Engineering for Digital Rectal Examination Documentation using Natural Language Processing.
AMIA ... Annual Symposium proceedings. AMIA Symposium
2018; 2018: 288–94
Digital rectal examination (DRE) is considered a quality metric for prostate cancer care. However, much of the rich DRE-related information is documented as free text in clinical narratives. Therefore, we aimed to develop a natural language processing (NLP) pipeline for automatically identifying DRE documentation in clinical notes, using a domain-specific dictionary created by clinical experts and an extended version of the same dictionary learned from clinical notes using distributional semantics algorithms. The proposed pipeline was compared to a baseline NLP algorithm and was found superior in terms of precision (0.95) and recall (0.90) for DRE documentation. We believe the rule-based NLP pipeline enriched with terms learned from the whole corpus can provide accurate and efficient identification of this quality metric.
View details for PubMedID 30815067
Impact of coexistent adenomyosis on outcomes of patients with endometrioid endometrial cancer: a propensity score-matched analysis
2018; 104 (1): 60–65
Despite the common occurrence of adenomyosis in endometrial cancer (EC), the literature on its impact on patient outcomes is scarce and conflicting. We sought to compare outcomes of patients with endometrioid-type EC with or without adenomyosis. A total of 314 patients were included in the analysis. Patients were divided into 2 groups according to the presence or absence of adenomyosis. Adenomyosis was identified in 79 patients (25.1%). A propensity score-matched comparison (1:1) was carried out to minimize selection bias. The propensity score was developed through a multivariable logistic regression model including age, stage, and tumor grade as covariates. After propensity score matching, 70 patients from each group were successfully matched. The primary outcome of the study was disease-free survival (DFS), and the secondary outcomes were overall survival (OS) and disease-specific survival (DSS). Median follow-up time was 61 months for the adenomyosis-positive group and 76 months for the adenomyosis-negative group. There were no statistically significant differences in 3- and 5-year DFS, OS, and DSS rates between the 2 groups. Five-year DFS was 92% vs 88% (hazard ratio [HR] 1.54 [0.56-4.27]; p = 0.404), 5-year OS was 94% vs 92% (HR 1.60 [0.49-5.26]; p = 0.441), and 5-year DSS was 94% vs 96% (HR 2.51 [0.46-13.71]; p = 0.290) for patients with and without adenomyosis, respectively. Coexistent adenomyosis in EC is not a prognostic factor and does not impact survival outcomes.
View details for DOI 10.5301/tj.5000698
View details for Web of Science ID 000434682400009
View details for PubMedID 29192745
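The adenomyosis study matched patients 1:1 on a propensity score estimated by logistic regression. The matching step can be sketched as greedy nearest-neighbour matching without replacement, assuming the scores are already estimated. The greedy processing order, the 0.05 caliper, the patient IDs, and the `greedy_match` name are assumptions for illustration; the abstract does not specify the matching algorithm.

```python
def greedy_match(treated, control, caliper=0.05):
    """1:1 greedy nearest-neighbour matching on propensity score, without replacement.

    treated, control: dicts mapping patient ID -> estimated propensity score.
    Returns a list of (treated_id, control_id) pairs; treated patients with no
    control within `caliper` stay unmatched.
    """
    available = dict(control)
    pairs = []
    # Process treated patients from highest to lowest score (one common heuristic).
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1], reverse=True):
        if not available:
            break
        c_id = min(available, key=lambda cid: abs(available[cid] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # without replacement: each control is used once
    return pairs


# Hypothetical scores, e.g. from a logistic regression on age, stage and grade.
adenomyosis = {"a1": 0.80, "a2": 0.55, "a3": 0.30}
no_adenomyosis = {"n1": 0.78, "n2": 0.52, "n3": 0.10, "n4": 0.60}
print(greedy_match(adenomyosis, no_adenomyosis))
```

Patients left unmatched are dropped from the comparison, which is why matched analyses (here 70 per group out of 79 adenomyosis-positive patients) can be smaller than the eligible cohort.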
Can Statistical Machine Learning Algorithms Help for Classification of Obstructive Sleep Apnea Severity to Optimal Utilization of Polysomnography Resources?
Methods of information in medicine
2017; 56 (4)
The goal of this study is to evaluate machine learning methods for classifying the obstructive sleep apnea (OSA) severity of patients with suspected sleep-disordered breathing as normal, mild, moderate or severe based on non-polysomnographic variables: 1) clinical data, 2) symptoms and 3) physical examination. To produce classification models for OSA severity, five different machine learning methods (Bayesian network, Decision Tree, Random Forest, Neural Networks and Logistic Regression) were trained, while relevant variables and their relationships were derived empirically from observed data. Each model was trained and evaluated using 10-fold cross-validation, and the classification performance of all methods was assessed using true positive rate (TPR), false positive rate (FPR), positive predictive value (PPV), F measure and area under the receiver operating characteristic curve (ROC-AUC). Results of 10-fold cross-validated tests with different variable settings promisingly indicated that the OSA severity of suspected OSA patients can be classified using non-polysomnographic features, with a highest true positive rate of 0.71 and a lowest false positive rate of 0.15. Moreover, tests with different variable settings revealed that the accuracy of the classification models was significantly improved when physical examination variables were added to the model. Study results showed that machine learning methods can be used to estimate the probabilities of no, mild, moderate, and severe obstructive sleep apnea, and such approaches may improve the accuracy of initial OSA screening and help refer only patients with suspected moderate or severe OSA to sleep laboratories for expensive tests.
View details for DOI 10.3414/ME16-01-0084
View details for PubMedID 28590499
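The OSA severity models were evaluated with 10-fold cross-validation. The split itself can be sketched in a few lines of Python; this version is plain (unstratified) with a fixed seed, whereas with imbalanced severity classes a stratified split is often preferred. The abstract does not say which variant the study used, and `k_fold_indices` is a hypothetical helper.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Yield (train_indices, test_indices) for plain (unstratified) k-fold CV."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal, disjoint folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Each patient appears in exactly one test fold; the model is refit on the other k-1 folds.
for fold_no, (train, test) in enumerate(k_fold_indices(25, k=10)):
    print(fold_no, len(train), len(test))
```

Metrics such as TPR, FPR, and PPV are computed on each held-out fold and then averaged (or pooled) across the ten folds.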
Usability Study of RSNA Radiology Reporting Template Library.
Studies in health technology and informatics
2017; 245: 1325
This study provides insights, based on a usability evaluation, that could help improve the Radiological Society of North America (RSNA) Reporting Template Digital Library. The results show that most users were satisfied with the website. General comments about the library were positive, although participants suggested a number of areas for improvement. About 40% of visitors were returning visitors, indicating that many people come back to the website.
View details for PubMedID 29295406
Estimation of cardiovascular disease from polysomnographic parameters in sleep-disordered breathing
EUROPEAN ARCHIVES OF OTO-RHINO-LARYNGOLOGY
2016; 273 (12): 4585-4593
We aimed to illustrate the causal relationships between cardiovascular diseases (CVDs) and various polysomnographic variables, and to develop a CVD estimation model from these variables in a population referred for assessment of possible sleep-disordered breathing (SDB). Clinical and polysomnographic data of 1162 consecutive patients with suspected SDB whose comorbidity status was known were reviewed retrospectively. Variable selection was performed in two steps using univariate analysis and tenfold cross-validation information gain analysis. Variables with an average merit value (m) of >0.005 were considered causal factors contributing to CVDs and were used in Bayesian network models to provide estimations. Of the 1162 patients, 234 had CVDs (20.1%). In total, 28 parameters were evaluated for variable selection; of those, 19 were found to be associated with CVDs. Age was the most effective attribute in estimating CVD (m = 0.051), followed by total sleep time with oxygen saturation <90% (m = 0.021). Other important variables were apnea-hypopnea index during non-rapid eye movement sleep (m = 0.018), lowest oxygen saturation (m = 0.018), body mass index (m = 0.016), total apnea duration (m = 0.014), mean apnea duration (m = 0.014), longest apnea duration (m = 0.013), and severity of SDB (m = 0.012). The modeling process resulted in a final model consisting of all selected variables, with 76.9% sensitivity, 96.2% specificity, and 92.6% negative predictive value. The study provides evidence that CVDs can be estimated from polysomnographic parameters with high predictive performance using Bayesian network analysis.
View details for DOI 10.1007/s00405-016-4176-1
View details for Web of Science ID 000387700400066
View details for PubMedID 27363409
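The CVD study ranked variables by information gain. A minimal sketch of information gain for a discrete variable, IG(Y; X) = H(Y) - H(Y | X) with entropy in bits, is below. The toy labels and helper names are invented, and the sketch does not reproduce the tenfold cross-validated "average merit" reported in the abstract.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - sum over x of P(X = x) * H(Y | X = x)."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


# Invented toy data: 1 = has CVD; the features are binary flags.
cvd = [1, 1, 0, 0, 0, 0, 0, 0]
older = [1, 1, 1, 0, 0, 0, 0, 0]
unrelated = [1, 0, 1, 0, 1, 0, 1, 0]
print(round(information_gain(older, cvd), 3))
print(round(information_gain(unrelated, cvd), 3))
```

A feature that separates cases from non-cases well scores high, while a feature whose values leave the outcome distribution unchanged scores zero, which is the basis for a merit-threshold cutoff like the one in the abstract.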
Using automatically extracted information from mammography reports for decision-support.
Journal of biomedical informatics
2016; 62: 224-231
To evaluate a system we developed that connects natural language processing (NLP) for information extraction from narrative text mammography reports with a Bayesian network for decision support about breast cancer diagnosis. The ultimate goal of this system is to provide decision support as part of the workflow of producing the radiology report. We built a system that uses an NLP information extraction system (which extracts BI-RADS descriptors and clinical information from mammography reports) to provide the necessary inputs to a Bayesian network (BN) decision support system (DSS) that estimates lesion malignancy from BI-RADS descriptors. We used this integrated system to predict the diagnosis of breast cancer from radiology text reports and evaluated it against a reference standard of 300 mammography reports. We collected two different outputs from the DSS: (1) the probability of malignancy and (2) the BI-RADS final assessment category. Since NLP may produce imperfect inputs to the DSS, we used the concordance correlation coefficient to compare the DSS output obtained with perfect ("reference standard") structured inputs ("RS-DSS") with that obtained with NLP-derived inputs ("NLP-DSS"). We measured the classification accuracy of the BI-RADS final assessment category when using NLP-DSS, compared with the ground truth category established by the radiologist. The NLP-DSS and RS-DSS had closely matched probabilities, with a mean paired difference of 0.004±0.025. The concordance correlation of these paired measures was 0.95. The accuracy of the NLP-DSS in predicting the correct BI-RADS final assessment category was 97.58%. The information extracted from mammography reports by the NLP system was accurate enough to yield accurate DSS results. We believe our system could ultimately reduce variation in mammography practice related to the assessment of malignant lesions and improve management decisions.
View details for DOI 10.1016/j.jbi.2016.07.001
View details for PubMedID 27388877
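The mammography decision-support study compared NLP-derived and reference-standard DSS outputs with the concordance correlation coefficient. Lin's coefficient, ρ_c = 2 s_xy / (s_x² + s_y² + (x̄ - ȳ)²), penalizes both poor correlation and systematic offset. A minimal sketch using population (1/n) moments, on toy numbers rather than study data; `concordance_correlation` is a hypothetical name.

```python
def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient using population (1/n) moments."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


# Toy values standing in for paired DSS outputs (not study data).
reference = [1.0, 2.0, 3.0]
print(concordance_correlation(reference, reference))         # identical outputs
print(concordance_correlation(reference, [2.0, 3.0, 4.0]))   # same pattern, shifted by 1
```

The shifted pair still has a Pearson correlation of 1, but the coefficient drops to 4/7 because of the constant offset, which is why it suits agreement questions like NLP-DSS versus RS-DSS.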
Automatic abstraction of imaging observations with their characteristics from mammography reports.
Journal of the American Medical Informatics Association
2015; 22 (e1): e81-92
Radiology reports are usually narrative, unstructured text, a format that hinders the ability to input report contents into decision support systems. In addition, reports often describe multiple lesions, and it is challenging to automatically extract information on each lesion and its relationships to characteristics, anatomic locations, and other information that describes it. The goal of our work is to develop natural language processing (NLP) methods to recognize each lesion in free-text mammography reports and to extract its corresponding relationships, producing a complete information frame for each lesion. We built an NLP information extraction pipeline in the General Architecture for Text Engineering (GATE) NLP toolkit. Sequential processing modules are executed, producing the output information frame required by a mammography decision support system. Each lesion described in the report is identified by linking it with its anatomic location in the breast. To evaluate our system, we selected 300 mammography reports from a hospital report database. The gold standard contained 797 lesions, and our system detected 815 lesions (780 true positives, 35 false positives, and 17 false negatives). For detecting all imaging observations with their modifiers, precision was 94.9%, recall was 90.9%, and the F measure was 92.8%. Our NLP system extracts each imaging observation and its characteristics from mammography reports. Although our application focuses on the domain of mammography, we believe our approach can generalize to other domains and may narrow the gap between unstructured clinical report text and the structured information needed for data mining and decision support.
View details for DOI 10.1136/amiajnl-2014-003009
View details for PubMedID 25352567
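The abstract above describes producing one information frame per lesion by linking each imaging observation to its characteristics and its anatomic location. The following is a deliberately simplified, hypothetical sketch of that idea using regular expressions; it is not the published GATE pipeline. It assumes each lesion is described in a single sentence, and the lexicons, the clock-position pattern, and the `LesionFrame` fields are illustrative assumptions.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical mini-lexicons for illustration; the published pipeline used GATE
# processing resources, not these lists.
OBSERVATIONS = ["architectural distortion", "calcifications", "asymmetry", "mass"]
MODIFIERS = ["irregular", "oval", "round", "spiculated", "circumscribed",
             "pleomorphic", "amorphous"]
LATERALITY_RE = re.compile(r"\b(left|right)\b", re.IGNORECASE)
CLOCK_RE = re.compile(r"\b(\d{1,2})\s*o'?clock\b", re.IGNORECASE)


@dataclass
class LesionFrame:
    """One information frame per described lesion (hypothetical fields)."""
    observation: str
    laterality: Optional[str] = None
    clock: Optional[int] = None
    modifiers: List[str] = field(default_factory=list)


def extract_frames(report: str) -> List[LesionFrame]:
    """Create a frame for each sentence that mentions an imaging observation,
    linking it to the laterality, clock position, and modifiers in that sentence."""
    frames = []
    for sentence in re.split(r"(?<=[.!?])\s+", report.strip()):
        lower = sentence.lower()
        observation = next((o for o in OBSERVATIONS if o in lower), None)
        if observation is None:
            continue  # no lesion described in this sentence
        side = LATERALITY_RE.search(sentence)
        clock = CLOCK_RE.search(sentence)
        frames.append(LesionFrame(
            observation=observation,
            laterality=side.group(1).lower() if side else None,
            clock=int(clock.group(1)) if clock else None,
            modifiers=[m for m in MODIFIERS
                       if re.search(r"\b" + re.escape(m) + r"\b", lower)],
        ))
    return frames


report = ("There is an irregular spiculated mass in the left breast at 2 o'clock. "
          "Amorphous calcifications are seen in the right breast at 10 o'clock.")
for frame in extract_frames(report):
    print(frame)
```

The one-sentence-per-lesion assumption is the main simplification: a real report can describe a lesion across several sentences or mention several lesions in one sentence, which is why relation extraction between observations and locations is the hard part of the task.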
Automated detection of ambiguity in BI-RADS assessment categories in mammography reports.
Studies in health technology and informatics
2014; 197: 35-39
An unsolved challenge in biomedical natural language processing (NLP) is detecting ambiguities in reports, which could help physicians improve report clarity. Our goal was to develop NLP methods to identify ambiguous descriptions of the laterality of BI-RADS Final Assessment Categories in mammography radiology reports. We developed a text processing system that uses a BI-RADS ontology we built as a knowledge source for automatic annotation of the entities in mammography reports relevant to this problem. We used the GATE NLP toolkit and developed customized processing resources for report segmentation, named entity recognition, and detection of mismatches between BI-RADS Final Assessment Categories and mammogram laterality. Our system detected 55 mismatched cases in 190 reports, with an accuracy of 81%. We conclude that such NLP techniques can detect ambiguities in mammography reports and may reduce discrepancy and variability in reporting.
View details for PubMedID 24743074
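The mismatch check described above compares the laterality of the examination with the laterality attached to each BI-RADS Final Assessment Category. As a rough, hypothetical illustration only, the sketch below uses two regular expressions in place of the ontology-driven GATE resources; the patterns, the report wording, and the rule that a bilateral exam covers any side are all assumptions.

```python
import re
from typing import List, Tuple

# Hypothetical patterns for illustration; the published system used a BI-RADS
# ontology and GATE processing resources rather than these regular expressions.
EXAM_SIDE_RE = re.compile(r"\b(left|right|bilateral)\b[^.\n]*\bmammogra", re.IGNORECASE)
ASSESSMENT_RE = re.compile(
    r"\b(left|right|bilateral)\b[^.\n]*?\bBI-?RADS\b\D{0,20}([0-6])", re.IGNORECASE)


def laterality_mismatches(report: str) -> List[Tuple[str, int]]:
    """Return (side, category) assessments whose side is not covered by the exam."""
    exam = EXAM_SIDE_RE.search(report)
    if exam is None:
        return []  # exam laterality unknown; nothing to compare against
    exam_side = exam.group(1).lower()
    # Assumption: a bilateral exam covers either side; a unilateral exam only its own.
    covered = {"left", "right", "bilateral"} if exam_side == "bilateral" else {exam_side}
    return [(side.lower(), int(category))
            for side, category in ASSESSMENT_RE.findall(report)
            if side.lower() not in covered]


mismatch = ("EXAM: Left diagnostic mammogram.\n"
            "FINDINGS: No suspicious mass in the left breast.\n"
            "IMPRESSION: Right breast: BI-RADS 2, benign.")
print(laterality_mismatches(mismatch))
```

Here the exam is a left mammogram but the impression assigns a category to the right breast, so that assessment is flagged as a possible laterality ambiguity for review.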
Annotation for Information Extraction from Mammography Reports
Informatics, Management and Technology in Healthcare
2013; 190: 183-185
Inter- and intra-observer variability in mammographic interpretation is a challenging problem, and decision support systems (DSS) may help reduce variation in practice. Since radiology reports are created as unstructured text, natural language processing (NLP) techniques are needed to extract structured information from reports in order to provide the inputs to a DSS. Before NLP systems are built, producing a high-quality annotated data set is essential. The goal of this project is to develop an annotation schema to guide the information extraction tasks needed for free-text mammography reports.
View details for DOI 10.3233/978-1-61499-276-9-183
View details for Web of Science ID 000341032900053
View details for PubMedID 23823416
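An annotation schema of the kind described above fixes which entity labels annotators may use and which relations may connect them. The sketch below encodes a small, hypothetical schema (the labels and relation names are assumptions, not the published schema) and checks standoff annotations against it, so that inconsistent annotations are caught before they are used to train or evaluate an NLP system.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical schema for illustration; these labels are not the published schema.
ENTITY_TYPES = {"ImagingObservation", "Modifier", "AnatomicLocation",
                "Laterality", "BIRADSCategory"}
# Each relation type fixes the (source label, target label) pair it may connect.
RELATION_TYPES: Dict[str, Tuple[str, str]] = {
    "hasModifier": ("ImagingObservation", "Modifier"),
    "locatedAt": ("ImagingObservation", "AnatomicLocation"),
    "hasLaterality": ("AnatomicLocation", "Laterality"),
}


@dataclass(frozen=True)
class Span:
    id: str
    label: str
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive


@dataclass(frozen=True)
class Relation:
    kind: str
    source: str  # Span.id
    target: str  # Span.id


def validate(text: str, spans: List[Span], relations: List[Relation]) -> List[str]:
    """Return a list of schema violations (an empty list means the annotations conform)."""
    errors = []
    by_id = {s.id: s for s in spans}
    for s in spans:
        if s.label not in ENTITY_TYPES:
            errors.append(f"{s.id}: unknown entity label {s.label!r}")
        if not 0 <= s.start < s.end <= len(text):
            errors.append(f"{s.id}: offsets {s.start}-{s.end} out of range")
    for r in relations:
        if r.kind not in RELATION_TYPES:
            errors.append(f"unknown relation type {r.kind!r}")
            continue
        src, tgt = by_id.get(r.source), by_id.get(r.target)
        if src is None or tgt is None:
            errors.append(f"{r.kind}: refers to a missing span")
            continue
        if (src.label, tgt.label) != RELATION_TYPES[r.kind]:
            errors.append(f"{r.kind}: {src.label} -> {tgt.label} not allowed")
    return errors


text = "Irregular mass in the left upper outer quadrant."
spans = [Span("T1", "Modifier", 0, 9), Span("T2", "ImagingObservation", 10, 14),
         Span("T3", "Laterality", 22, 26), Span("T4", "AnatomicLocation", 27, 47)]
relations = [Relation("hasModifier", "T2", "T1"),
             Relation("locatedAt", "T2", "T4"),
             Relation("hasLaterality", "T4", "T3")]
print(validate(text, spans, relations))  # []
```

Reversing a relation, for example `Relation("hasModifier", "T1", "T2")`, produces one violation, because the schema only allows modifiers to hang off imaging observations.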
An Open-Standards Grammar for Outline-Style Radiology Report Templates
Journal of Digital Imaging
2012; 25 (3): 359-364
Structured reporting uses consistent ordering of results and standardized terminology to improve the quality and reduce the complexity of radiology reports. We sought to define a generalized approach for radiology reporting that produces flexible outline-style reports, accommodates structured information and named reporting elements, allows reporting terms to be linked to controlled vocabularies, uses existing informatics standards, and allows structured report data to be extracted readily. We applied the Regular Language for XML-Next Generation (RELAX NG) schema language to encode 110 reporting templates created as part of the Radiological Society of North America reporting initiative. We evaluated how well this approach addressed the project's goals. The RELAX NG schema language expressed the cardinality and hierarchical relationships of reporting concepts, and allowed reporting elements to be mapped to terms in controlled medical vocabularies, such as RadLex®, Systematized Nomenclature of Medicine Clinical Terms®, and Logical Observation Identifiers Names and Codes®. The approach provided extensibility and accommodated the addition of new features. Overall, the approach has proven to be useful and will form the basis for a supplement to the Digital Imaging and Communications in Medicine (DICOM) Standard.
View details for DOI 10.1007/s10278-012-9456-8
View details for Web of Science ID 000304109700007
View details for PubMedID 22258732
View details for PubMedCentralID PMC3348985
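One stated goal above is that structured data can be extracted readily from outline-style reports whose elements are mapped to controlled vocabularies. The sketch below illustrates that extraction step with Python's standard `xml.etree.ElementTree`, assuming a hypothetical XML layout (nested `section` elements containing named `field` elements with `vocabulary` and `code` attributes); this layout and the placeholder codes are not the RSNA templates or real RadLex identifiers. Validating a report against the RELAX NG grammar itself would need a RELAX NG-capable validator (for example lxml's `etree.RelaxNG` class), which is omitted here to keep the sketch standard-library only.

```python
import xml.etree.ElementTree as ET

# Hypothetical report instance; element/attribute names and codes are placeholders.
REPORT_XML = """<report template="Example chest CT">
  <section name="Findings">
    <section name="Lungs">
      <field name="Nodules" vocabulary="RadLex" code="X-0001">None</field>
    </section>
    <section name="Pleura">
      <field name="Pleural effusion" vocabulary="RadLex" code="X-0002">Small, right</field>
    </section>
  </section>
  <section name="Impression">
    <field name="Summary">Small right pleural effusion.</field>
  </section>
</report>"""


def extract_fields(xml_text):
    """Flatten an outline-style report into rows of (outline path, field, code, value)."""
    root = ET.fromstring(xml_text)
    rows = []

    def walk(elem, path):
        for child in elem:
            if child.tag == "section":
                # Nested sections form the outline hierarchy.
                walk(child, path + [child.get("name")])
            elif child.tag == "field":
                rows.append({
                    "path": " > ".join(path),
                    "name": child.get("name"),
                    "vocabulary": child.get("vocabulary"),  # None if unmapped
                    "code": child.get("code"),
                    "value": (child.text or "").strip(),
                })

    walk(root, [])
    return rows


for row in extract_fields(REPORT_XML):
    print(row)
```

Because each field carries its outline path and optional vocabulary code, the same rows can feed either a human-readable outline or a coded data store without re-parsing free text.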