All Publications

  • Comparison of Orthogonal NLP Methods for Clinical Phenotyping and Assessment of Bone Scan Utilization among Prostate Cancer Patients. Journal of biomedical informatics Coquet, J., Bozkurt, S., Kan, K. M., Ferrari, M. K., Blayney, D. W., Brooks, J. D., Hernandez-Boussard, T. 2019: 103184


    Clinical care guidelines recommend that newly diagnosed prostate cancer patients at high risk for metastatic spread receive a bone scan prior to treatment and that low-risk patients not receive it. The objective was to develop an automated pipeline to interrogate heterogeneous data to evaluate the use of bone scans using two different Natural Language Processing (NLP) approaches. Our cohort was divided into risk groups based on Electronic Health Records (EHR). Information on bone scan utilization was identified in both structured data and free text from clinical notes. Our pipeline annotated sentences with a combination of a rule-based method using the ConText algorithm (a generalization of NegEx) and a Convolutional Neural Network (CNN) method using word2vec to produce word embeddings. A total of 5,500 patients and 369,764 notes were included in the study. A total of 39% of patients were high-risk and 73% of these received a bone scan; of the 18% low-risk patients, 10% received one. The CNN model outperformed the rule-based model (F-measure = 0.918 and 0.897, respectively). We demonstrate that a combination of both models could maximize precision or recall, depending on the study question. Using structured data, we accurately classified patients' cancer risk group, identified bone scan documentation with two NLP methods, and evaluated guideline adherence. Our pipeline can be used to provide concrete feedback to clinicians and guide treatment decisions.

    View details for PubMedID 31014980
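
    The abstract above notes that combining the rule-based and CNN models can favour either precision or recall. One common way to do this, shown here only as a minimal sketch and not as the authors' published method, is to require both models to agree (logical AND, which tends to raise precision) or to accept either model's positive (logical OR, which tends to raise recall). The function names and the toy labels are hypothetical.

```python
from typing import List, Sequence, Tuple


def combine(rule_pred: Sequence[int], cnn_pred: Sequence[int], mode: str) -> List[int]:
    """Combine two binary prediction sequences.

    mode="and": positive only when both models say positive (favours precision).
    mode="or":  positive when either model says positive (favours recall).
    """
    if len(rule_pred) != len(cnn_pred):
        raise ValueError("prediction sequences must have equal length")
    if mode == "and":
        return [int(r and c) for r, c in zip(rule_pred, cnn_pred)]
    if mode == "or":
        return [int(r or c) for r, c in zip(rule_pred, cnn_pred)]
    raise ValueError("mode must be 'and' or 'or'")


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy labels (1 = sentence documents a bone scan), not study data.
y_true = [1, 1, 1, 0, 0, 1]
rule_pred = [1, 0, 1, 1, 0, 0]
cnn_pred = [1, 1, 0, 0, 1, 0]

for mode in ("and", "or"):
    p, r, f = precision_recall_f1(y_true, combine(rule_pred, cnn_pred, mode))
    print(f"{mode}: precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

    On this toy data the AND combination gives the higher precision and the OR combination the higher recall, which is the trade-off the abstract describes.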

  • Automatic Inference of BI-RADS Final Assessment Categories from Narrative Mammography Report Findings. Journal of biomedical informatics Banerjee, I., Bozkurt, S., Alkim, E., Sagreiya, H., Kurian, A. W., Rubin, D. L. 2019: 103137


    We propose an efficient natural language processing approach for inferring the BI-RADS final assessment categories by analyzing only the mammogram findings reported by the mammographer in narrative form. The proposed hybrid method integrates semantic term embedding with distributional semantics, producing a context-aware vector representation of unstructured mammography reports. A large corpus of unannotated mammography reports (300,000) was used to learn the context of the key terms using a distributional semantics approach, and the trained model was applied to generate context-aware vector representations of the reports annotated with BI-RADS category (22,091). The vectorized reports were used to train a supervised classifier to derive the BI-RADS assessment class. Even though the majority of the proposed embedding pipeline is unsupervised, the classifier was able to recognize substantial semantic information for deriving the BI-RADS categorization, not only on a held-out internal test set but also on an external validation set (1,900 reports). Our proposed method outperforms a recently published domain-specific rule-based system and could be relevant for evaluating concordance between radiologists. With minimal requirement for task-specific customization, the proposed method can be easily transferred to a different domain to support large-scale text mining or derivation of patient phenotype.

    View details for PubMedID 30807833

  • Impact of age on intermittent hypoxia in obstructive sleep apnea: a propensity-matched analysis SLEEP AND BREATHING Bostanci, A., Bozkurt, S., Turhan, M. 2018; 22 (2): 317–22


    To determine the independent relationship of aging with chronic intermittent hypoxia, we compared hypoxia-related polysomnographic variables of geriatric patients (aged ≥ 65 years) with an apnea-hypopnea index (AHI)-, gender-, body mass index (BMI)-, and neck circumference-matched cohort of non-geriatric patients. The study was conducted using clinical and polysomnographic data of 1280 consecutive patients who underwent complete polysomnographic evaluation for suspected sleep-disordered breathing (SDB) at a single sleep disorder center. A propensity score-matched analysis was performed to obtain matched cohorts of geriatric and non-geriatric patients, which resulted in successful matching of 168 patients from each group. Study groups were comparable for gender (P = 0.999), BMI (P = 0.940), neck circumference (P = 0.969), AHI (P = 0.935), and severity of SDB (P = 0.089). The oximetric variables representing the duration of chronic intermittent hypoxia, such as mean (P = 0.001), longest (P = 0.001), and total apnea durations (P = 0.003), mean (P = 0.001) and longest hypopnea durations (P = 0.001), and total sleep time with oxygen saturation below 90% (P = 0.008), were significantly higher in the geriatric patients than in younger adults. Geriatric patients had significantly lower minimum (P = 0.013) and mean oxygen saturation (P = 0.001) than non-geriatric patients. The study provides evidence that elderly patients exhibit more severe and deeper nocturnal intermittent hypoxia than younger adults, independent of severity of obstructive sleep apnea, BMI, gender, and neck circumference. Hypoxia-related polysomnographic variables in geriatric patients may in fact reflect a physiological aging process rather than the severity of SDB.

    View details for DOI 10.1007/s11325-017-1560-z

    View details for Web of Science ID 000430993000006

    View details for PubMedID 28849299

  • Expanding a radiology lexicon using contextual patterns in radiology reports. Journal of the American Medical Informatics Association : JAMIA Percha, B., Zhang, Y., Bozkurt, S., Rubin, D., Altman, R. B., Langlotz, C. P. 2018


    Distributional semantics algorithms, which learn vector space representations of words and phrases from large corpora, identify related terms based on contextual usage patterns. We hypothesize that distributional semantics can speed up lexicon expansion in a clinical domain, radiology, by unearthing synonyms from the corpus. We apply word2vec, a distributional semantics software package, to the text of radiology notes to identify synonyms for RadLex, a structured lexicon of radiology terms. We stratify performance by term category, term frequency, number of tokens in the term, vector magnitude, and the context window used in vector building. Ranking candidates based on distributional similarity to a target term results in high curation efficiency: on a ranked list of 775 249 terms, >50% of synonyms occurred within the first 25 terms. Synonyms are easier to find if the target term is a phrase rather than a single word, if it occurs at least 100× in the corpus, and if its vector magnitude is between 4 and 5. Some RadLex categories, such as anatomical substances, are easier to identify synonyms for than others. The unstructured text of clinical notes contains a wealth of information about human diseases and treatment patterns. However, searching and retrieving information from clinical notes often suffer due to variations in how similar concepts are described in the text. Biomedical lexicons address this challenge, but are expensive to produce and maintain. Distributional semantics algorithms can assist lexicon curation, saving researchers time and money.

    View details for PubMedID 29329435
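
    The ranking step described in the abstract above (ordering candidate terms by distributional similarity to a target term) is commonly implemented with cosine similarity between embedding vectors. The following is a minimal sketch under that assumption; the abstract does not state the exact similarity measure, and the function names and three-dimensional toy vectors are hypothetical stand-ins for real word2vec output.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(
    target: str, vectors: Dict[str, Sequence[float]], top_k: int = 25
) -> List[Tuple[str, float]]:
    """Return the top_k terms most similar to `target`, most similar first."""
    target_vec = vectors[target]
    scored = [
        (term, cosine(target_vec, vec))
        for term, vec in vectors.items()
        if term != target
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings; real vectors would come from a trained model.
toy_vectors = {
    "lung": [0.9, 0.1, 0.0],
    "pulmonary": [0.8, 0.2, 0.1],
    "liver": [0.1, 0.9, 0.0],
    "hepatic": [0.2, 0.8, 0.1],
}

for term, score in rank_candidates("lung", toy_vectors):
    print(f"{term}: {score:.3f}")
```

    A curator would then review only the head of each ranked list, which is where the abstract reports most synonyms were found.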

  • Distribution of global health measures from routinely collected PROMIS surveys in patients with breast cancer or prostate cancer. Cancer Seneviratne, M. G., Bozkurt, S., Patel, M. I., Seto, T., Brooks, J. D., Blayney, D. W., Kurian, A. W., Hernandez-Boussard, T. 2018


    The collection of patient-reported outcomes (PROs) is an emerging priority internationally, guiding clinical care, quality improvement projects and research studies. After the deployment of Patient-Reported Outcomes Measurement Information System (PROMIS) surveys in routine outpatient workflows at an academic cancer center, electronic health record data were used to evaluate survey completion rates and self-reported global health measures across 2 tumor types: breast and prostate cancer. This study retrospectively analyzed 11,657 PROMIS surveys from patients with breast cancer and 4411 surveys from patients with prostate cancer, and it calculated survey completion rates and global physical health (GPH) and global mental health (GMH) scores between 2013 and 2018. A total of 36.6% of eligible patients with breast cancer and 23.7% of patients with prostate cancer completed at least 1 survey, with completion rates lower among black patients for both tumor types (P < .05). The mean T scores (calibrated to a general population mean of 50) for GPH were 48.4 ± 9 for breast cancer and 50.6 ± 9 for prostate cancer, and the GMH scores were 52.7 ± 8 and 52.1 ± 9, respectively. GPH and GMH were frequently lower among ethnic minorities, patients without private health insurance, and those with advanced disease. This analysis provides important baseline data on patient-reported global health in breast and prostate cancer. Demonstrating that PROs can be integrated into clinical workflows, this study shows that supportive efforts may be needed to improve PRO collection and global health endpoints in vulnerable populations.

    View details for PubMedID 30512191

  • An Automated Feature Engineering for Digital Rectal Examination Documentation using Natural Language Processing. AMIA ... Annual Symposium proceedings. AMIA Symposium Bozkurt, S., Park, J. I., Kan, K. M., Ferrari, M., Rubin, D. L., Brooks, J. D., Hernandez-Boussard, T. 2018; 2018: 288–94


    Digital rectal examination (DRE) is considered a quality metric for prostate cancer care. However, much of the rich DRE-related information is documented as free text in clinical narratives. Therefore, we aimed to develop a natural language processing (NLP) pipeline for automatic documentation of DRE in clinical notes using a domain-specific dictionary created by clinical experts and an extended version of the same dictionary learned from clinical notes using distributional semantics algorithms. The proposed pipeline was compared to a baseline NLP algorithm, and its results were found to be superior in terms of precision (0.95) and recall (0.90) for documentation of DRE. We believe the rule-based NLP pipeline enriched with terms learned from the whole corpus can provide accurate and efficient identification of this quality metric.

    View details for PubMedID 30815067

  • Impact of coexistent adenomyosis on outcomes of patients with endometrioid endometrial cancer: a propensity score-matched analysis TUMORI J Aydin, H., Toptas, T., Bozkurt, S., Pestereli, E., Simsek, T. 2018; 104 (1): 60–65


    Despite the common occurrence of adenomyosis in endometrial cancer (EC), the literature regarding its impact on patient outcomes is sparse and conflicting. We sought to compare outcomes of patients with endometrioid type EC with or without adenomyosis. A total of 314 patients were included in the analysis. Patients were divided into 2 groups according to the presence or absence of adenomyosis. Adenomyosis was identified in 79 patients (25.1%). A propensity score-matched comparison (1:1) was carried out to minimize selection biases. The propensity score was developed through a multivariable logistic regression model including age, stage, and tumor grade as covariates. After performing propensity score matching, 70 patients from each group were successfully matched. The primary outcome of the study was disease-free survival (DFS), and the secondary outcomes were overall survival (OS) and disease-specific survival (DSS). Median follow-up time was 61 months for the adenomyosis-positive group and 76 months for the adenomyosis-negative group. There were no statistically significant differences in 3- and 5-year DFS, OS, and DSS rates between the 2 groups. Five-year DFS was 92% vs 88% (hazard ratio [HR] 1.54 [0.56-4.27]; p = 0.404), 5-year OS was 94% vs 92% (HR 1.60 [0.49-5.26]; p = 0.441), and 5-year DSS was 94% vs 96% (HR 2.51 [0.46-13.71]; p = 0.290) for patients with and without adenomyosis, respectively. Coexistent adenomyosis in EC is not a prognostic factor and does not impact survival outcomes.

    View details for DOI 10.5301/tj.5000698

    View details for Web of Science ID 000434682400009

    View details for PubMedID 29192745

  • Can Statistical Machine Learning Algorithms Help for Classification of Obstructive Sleep Apnea Severity to Optimal Utilization of Polysomnography Resources? Methods of information in medicine Bozkurt, S., Bostanci, A., Turhan, M. 2017; 56 (4)


    The goal of this study is to evaluate the results of machine learning methods for classifying the OSA severity of patients with suspected sleep-disordered breathing as normal, mild, moderate, or severe based on non-polysomnographic variables: 1) clinical data, 2) symptoms and 3) physical examination. To produce classification models for OSA severity, five different machine learning methods (Bayesian network, Decision Tree, Random Forest, Neural Networks and Logistic Regression) were trained, while relevant variables and their relationships were derived empirically from observed data. Each model was trained and evaluated using 10-fold cross-validation. To evaluate the classification performance of all methods, true positive rate (TPR), false positive rate (FPR), Positive Predictive Value (PPV), F measure and Area Under Receiver Operating Characteristics curve (ROC-AUC) were used. Results of 10-fold cross-validated tests with different variable settings indicated that the OSA severity of suspected OSA patients can be classified using non-polysomnographic features, with a highest true positive rate of 0.71 and a lowest false positive rate of 0.15. Moreover, the test results for different variable settings revealed that the accuracy of the classification models improved significantly when physical examination variables were added to the model. Study results showed that machine learning methods can be used to estimate the probabilities of no, mild, moderate, and severe obstructive sleep apnea. Such approaches may improve the accuracy of initial OSA screening and help refer only patients with suspected moderate or severe OSA to sleep laboratories for the expensive tests.

    View details for DOI 10.3414/ME16-01-0084

    View details for PubMedID 28590499

  • Usability Study of RSNA Radiology Reporting Template Library. Studies in health technology and informatics Hong, Y., Zhu, Y., Bozkurt, S., Zhang, J., Kahn, C. E. 2017; 245: 1325


    This study provides insights that could help to improve the Radiological Society of North America (RSNA) Reporting Template Digital Library, based on a usability evaluation. The results show that most users were satisfied with the website. The general comments on the library were positive, although the participants suggested quite a few areas for improvement. About 40% of visitors were returning visitors, which suggests that people often come back to the website.

    View details for PubMedID 29295406

  • Estimation of cardiovascular disease from polysomnographic parameters in sleep-disordered breathing EUROPEAN ARCHIVES OF OTO-RHINO-LARYNGOLOGY Turhan, M., Bostanci, A., Bozkurt, S. 2016; 273 (12): 4585-4593


    We aimed to illustrate the causal relationships between cardiovascular diseases (CVDs) and various polysomnographic variables, and to develop a CVD estimation model from these variables in a population referred for assessment of possible sleep-disordered breathing (SDB). Clinical and polysomnographic data of 1162 consecutive patients with suspected SDB whose comorbidity status was known were reviewed retrospectively. Variable selection was performed in two steps using univariate analysis and tenfold cross-validated information gain analysis. The resulting set of variables with an average merit value (m) of >0.005 was considered to comprise causal factors contributing to CVDs and was used in Bayesian network models to provide estimations. Of the 1162 patients, 234 had CVDs (20.1%). In total, 28 parameters were evaluated for variable selection. Of those, 19 were found to be associated with CVDs. Age was the most effective attribute in estimating CVD (m = 0.051), followed by total sleep time with oxygen saturation <90% (m = 0.021). Some other important variables were apnea-hypopnea index during non-rapid eye movement (m = 0.018), lowest oxygen saturation (m = 0.018), body mass index (m = 0.016), total apnea duration (m = 0.014), mean apnea duration (m = 0.014), longest apnea duration (m = 0.013), and severity of SDB (m = 0.012). The modeling process resulted in a final model consisting of all selected variables, with 76.9% sensitivity, 96.2% specificity, and 92.6% negative predictive value. The study provides evidence that the estimation of CVDs from polysomnographic parameters is possible with high predictive performance using Bayesian network analysis.

    View details for DOI 10.1007/s00405-016-4176-1

    View details for Web of Science ID 000387700400066

    View details for PubMedID 27363409

  • Using automatically extracted information from mammography reports for decision-support. Journal of biomedical informatics Bozkurt, S., Gimenez, F., Burnside, E. S., Gulkesen, K. H., Rubin, D. L. 2016; 62: 224-231


    Our objective was to evaluate a system we developed that connects natural language processing (NLP) for information extraction from narrative text mammography reports with a Bayesian network for decision support about breast cancer diagnosis. The ultimate goal of this system is to provide decision support as part of the workflow of producing the radiology report. We built a system that uses an NLP information extraction system (which extracts BI-RADS descriptors and clinical information from mammography reports) to provide the necessary inputs to a Bayesian network (BN) decision support system (DSS) that estimates lesion malignancy from BI-RADS descriptors. We used this integrated system to predict diagnosis of breast cancer from radiology text reports and evaluated it with a reference standard of 300 mammography reports. We collected two different outputs from the DSS: (1) the probability of malignancy and (2) the BI-RADS final assessment category. Since NLP may produce imperfect inputs to the DSS, we compared the output of the DSS when using perfect ("reference standard") structured inputs ("RS-DSS") with its output when using NLP-derived inputs ("NLP-DSS"), using the concordance correlation coefficient. We measured the classification accuracy of the BI-RADS final assessment category when using NLP-DSS, compared with the ground truth category established by the radiologist. The NLP-DSS and RS-DSS had closely matched probabilities, with a mean paired difference of 0.004 ± 0.025. The concordance correlation of these paired measures was 0.95. The accuracy of the NLP-DSS in predicting the correct BI-RADS final assessment category was 97.58%. The accuracy of the information extracted from mammography reports using the NLP system was sufficient to provide accurate DSS results. We believe our system could ultimately reduce the variation in practice in mammography related to assessment of malignant lesions and improve management decisions.

    View details for DOI 10.1016/j.jbi.2016.07.001

    View details for PubMedID 27388877
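
    The abstract above compares paired DSS outputs with the concordance correlation coefficient. As a minimal sketch, assuming the standard definition due to Lin (twice the covariance divided by the sum of the two variances plus the squared difference of the means, using population moments), it can be computed as follows. The function name and the toy probabilities are hypothetical and are not study data.

```python
from typing import Sequence


def concordance_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient for paired measurements.

    ccc = 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y)) ** 2)
    Population (1/n) moments are used throughout.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    denominator = var_x + var_y + (mean_x - mean_y) ** 2
    if denominator == 0.0:
        raise ValueError("ccc is undefined when both inputs are the same constant")
    return 2.0 * cov_xy / denominator


# Hypothetical malignancy probabilities from two input sources.
rs_dss = [0.10, 0.50, 0.90]
nlp_dss = [0.20, 0.60, 1.00]  # same pattern, shifted up by 0.10

print(round(concordance_correlation(rs_dss, nlp_dss), 4))
```

    Unlike the Pearson correlation, which would be 1 for the shifted toy data, this coefficient penalizes the systematic offset between the two sets of outputs, which is why it suits an agreement check between two versions of the same system.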

  • Automatic abstraction of imaging observations with their characteristics from mammography reports. Journal of the American Medical Informatics Association Bozkurt, S., Lipson, J. A., Senol, U., Rubin, D. L., Bulu, H. 2015; 22 (e1): e81-92


    Radiology reports are usually narrative, unstructured text, a format which hinders the ability to input report contents into decision support systems. In addition, reports often describe multiple lesions, and it is challenging to automatically extract information on each lesion and its relationships to characteristics, anatomic locations, and other information that describes it. The goal of our work is to develop natural language processing (NLP) methods to recognize each lesion in free-text mammography reports and to extract its corresponding relationships, producing a complete information frame for each lesion. We built an NLP information extraction pipeline in the General Architecture for Text Engineering (GATE) NLP toolkit. Sequential processing modules are executed, producing an output information frame required for a mammography decision support system. Each lesion described in the report is identified by linking it with its anatomic location in the breast. In order to evaluate our system, we selected 300 mammography reports from a hospital report database. The gold standard contained 797 lesions, and our system detected 815 lesions (780 true positives, 35 false positives, and 17 false negatives). The precision of detecting all the imaging observations with their modifiers was 94.9, recall was 90.9, and the F measure was 92.8. Our NLP system extracts each imaging observation and its characteristics from mammography reports. Although our application focuses on the domain of mammography, we believe our approach can generalize to other domains and may narrow the gap between unstructured clinical report text and structured information extraction needed for data mining and decision support.

    View details for DOI 10.1136/amiajnl-2014-003009

    View details for PubMedID 25352567

  • Automated detection of ambiguity in BI-RADS assessment categories in mammography reports. Studies in health technology and informatics Bozkurt, S., Rubin, D. 2014; 197: 35-39


    An unsolved challenge in biomedical natural language processing (NLP) is detecting ambiguities in the reports that can help physicians to improve report clarity. Our goal was to develop NLP methods to tackle the challenges of identifying ambiguous descriptions of the laterality of BI-RADS Final Assessment Categories in mammography radiology reports. We developed a text processing system that uses a BI-RADS ontology we built as a knowledge source for automatic annotation of the entities in mammography reports relevant to this problem. We used the GATE NLP toolkit and developed customized processing resources for report segmentation, named entity recognition, and detection of mismatches between BI-RADS Final Assessment Categories and mammogram laterality. Our system detected 55 mismatched cases in 190 reports and the accuracy rate was 81%. We conclude that such NLP techniques can detect ambiguities in mammography reports and may reduce discrepancy and variability in reporting.

    View details for PubMedID 24743074

  • Annotation for Information Extraction from Mammography Reports INFORMATICS, MANAGEMENT AND TECHNOLOGY IN HEALTHCARE Bozkurt, S., Gulkesen, K. H., Rubin, D. 2013; 190: 183-185


    Inter- and intra-observer variability in mammographic interpretation is a challenging problem, and decision support systems (DSS) may be helpful to reduce variation in practice. Since radiology reports are created as unstructured text, natural language processing (NLP) techniques are needed to extract structured information from reports in order to provide the inputs to DSS. Before creating NLP systems, producing a high-quality annotated data set is essential. The goal of this project is to develop an annotation schema to guide the information extraction tasks needed from free-text mammography reports.

    View details for DOI 10.3233/978-1-61499-276-9-183

    View details for Web of Science ID 000341032900053

    View details for PubMedID 23823416

  • An Open-Standards Grammar for Outline-Style Radiology Report Templates JOURNAL OF DIGITAL IMAGING Bozkurt, S., Kahn, C. E. 2012; 25 (3): 359-364


    Structured reporting uses consistent ordering of results and standardized terminology to improve the quality and reduce the complexity of radiology reports. We sought to define a generalized approach for radiology reporting that produces flexible outline-style reports, accommodates structured information and named reporting elements, allows reporting terms to be linked to controlled vocabularies, uses existing informatics standards, and allows structured report data to be extracted readily. We applied the Regular Language for XML-Next Generation (RELAX NG) schema language to encode 110 reporting templates created as part of the Radiological Society of North America reporting initiative. We evaluated how well this approach addressed the project's goals. The RELAX NG schema language expressed the cardinality and hierarchical relationships of reporting concepts, and allowed reporting elements to be mapped to terms in controlled medical vocabularies, such as RadLex®, Systematized Nomenclature of Medicine Clinical Terms®, and Logical Observation Identifiers Names and Codes®. The approach provided extensibility and accommodated the addition of new features. Overall, the approach has proven to be useful and will form the basis for a supplement to the Digital Imaging and Communications in Medicine Standard.

    View details for DOI 10.1007/s10278-012-9456-8

    View details for Web of Science ID 000304109700007

    View details for PubMedID 22258732

    View details for PubMedCentralID PMC3348985