Eun Kyoung (Amy) Hong
Assistant Professor of Radiology (Thoracic Imaging)
Clinical Focus
- Diagnostic Radiology
Academic Appointments
- Assistant Professor - University Medical Line, Radiology
Professional Education
- Fellowship: Brigham & Women’s Hospital (2025)
- Residency: Seoul National University (2019)
- Internship: Seoul National University (2015)
- Medical Education: Seoul National University School of Medicine (2014), South Korea
All Publications
-
Temperature Setting of a Multimodal Generative Artificial Intelligence (AI) Model: Association With Accuracy and Quality of AI-Generated Chest Radiograph Reports.
AJR. American journal of roentgenology
2025
View details for DOI 10.2214/AJR.25.33704
View details for PubMedID 41159788
-
Multimodal Generative Artificial Intelligence Model for Creating Radiology Reports for Chest Radiographs in Patients Undergoing Tuberculosis Screening.
AJR. American journal of roentgenology
2025: 1-9
Abstract
BACKGROUND. Chest radiographs play a crucial role in tuberculosis screening in high-prevalence regions, although widespread radiographic screening requires expertise that may be unavailable in settings with limited medical resources. OBJECTIVE. The purpose of this study was to evaluate a multimodal generative artificial intelligence (AI) model for detecting tuberculosis-associated abnormalities on chest radiography in patients undergoing tuberculosis screening. METHODS. This retrospective study evaluated 800 chest radiographs obtained from two public datasets originating from tuberculosis screening programs. A generative AI model was used to create free-text reports for the radiographs. AI-generated reports were classified in terms of presence versus absence and laterality of tuberculosis-related abnormalities. Two radiologists independently reviewed the radiographs for tuberculosis presence and laterality in separate sessions, without and with use of AI-generated reports, and recorded if they would accept the report without modification. Two additional radiologists reviewed radiographs and clinical readings from the datasets to determine the reference standard. RESULTS. By the reference standard, 378 of 800 radiographs were positive for tuberculosis-related abnormalities. For detection of tuberculosis-related abnormalities, sensitivity, specificity, and accuracy were 95.2%, 86.7%, and 90.8% for AI-generated reports; 93.1%, 93.6%, and 93.4% for reader 1 without AI-generated reports; 93.1%, 95.0%, and 94.1% for reader 1 with AI-generated reports; 95.8%, 87.2%, and 91.3% for reader 2 without AI-generated reports; and 95.8%, 91.5%, and 93.5% for reader 2 with AI-generated reports. Accuracy was significantly lower for AI-generated reports than for both readers alone (p < .001), but significantly higher with than without AI-generated reports for one reader (reader 1: p = .47; reader 2: p = .03). 
Localization performance was significantly lower (p < .001) for AI-generated reports (63.3%) than for reader 1 (79.8%) and reader 2 (77.9%) without AI-generated reports and did not significantly change for either reader with AI-generated reports (reader 1: 78.7%, p = .71; reader 2: 81.5%, p = .23). Among normal and abnormal radiographs, reader 1 accepted 91.7% and 52.4%, whereas reader 2 accepted 83.2% and 37.0%, respectively, of AI-generated reports. CONCLUSION. Although AI-generated reports may augment radiologists' diagnostic assessments, the current model requires human oversight, given inferior standalone performance. CLINICAL IMPACT. The generative AI model could have potential application to aid tuberculosis screening programs in medically underserved regions, although technical improvements remain required.
View details for DOI 10.2214/AJR.25.33059
View details for PubMedID 40600508
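As a rough consistency check on the figures in the abstract above, overall accuracy can be recovered from sensitivity, specificity, and the share of positive cases. The sketch below is illustrative only: the function name `accuracy_from_rates` is hypothetical and not taken from the study, and because the inputs are rounded percentages, the result may differ from the reported accuracy in the last digit.

```python
def accuracy_from_rates(sensitivity, specificity, n_positive, n_total):
    """Accuracy as a prevalence-weighted mean of sensitivity and specificity."""
    prevalence = n_positive / n_total
    return sensitivity * prevalence + specificity * (1 - prevalence)


# Reference standard from the abstract: 378 of 800 radiographs were positive.
ai_reports = accuracy_from_rates(0.952, 0.867, 378, 800)
reader1_alone = accuracy_from_rates(0.931, 0.936, 378, 800)

print(f"AI-generated reports: {ai_reports:.3f}")
print(f"Reader 1 without AI:  {reader1_alone:.3f}")
```

Both values land within rounding distance of the accuracies given in the abstract (90.8% and 93.4%).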
-
Radiologist Interaction with Artificial Intelligence-Generated Preliminary Reports: A Longitudinal Multireader Study.
Journal of the American College of Radiology : JACR
2025
Abstract
To investigate the integration of multimodal AI-generated reports into radiology workflow over time, focusing on their impact on efficiency, acceptability, and report quality. A multicase, multireader study involved 756 publicly available chest radiographs interpreted by five radiologists using preliminary reports generated by a radiology-specific multimodal AI model, divided into 7 sequential batches of 108 radiographs each. Two thoracic radiologists assessed the final reports using RADPEER criteria for agreement and a 5-point Likert scale for quality. Reading times, rate of acceptance without modification, agreement, and quality scores were measured, with statistical analyses evaluating trends across the seven sequential batches. Radiologists' reading times for chest radiographs decreased from 25.8 seconds in batch 1 to 19.3 seconds in batch 7 (P < .001). Acceptability increased from 54.6% to 60.2% (P < .001), with normal chest radiographs demonstrating higher rates (68.9%) than abnormal chest radiographs (52.6%; P < .001). Median agreement and quality scores remained stable for normal chest radiographs but varied significantly for abnormal chest radiographs (all P < .05). The introduction of AI-generated reports improved efficiency of chest radiograph interpretation, and acceptability increased over time. However, agreement and quality scores showed variability, particularly in abnormal cases, emphasizing the need for oversight in the interpretation of complex chest radiographs.
View details for DOI 10.1016/j.jacr.2025.09.015
View details for PubMedID 40983179
-
Diagnostic Accuracy and Clinical Value of a Domain-specific Multimodal Generative AI Model for Chest Radiograph Report Generation.
Radiology
2025; 314 (3): e241476
Abstract
Background Generative artificial intelligence (AI) is anticipated to alter radiology workflows, requiring a clinical value assessment for frequent examinations like chest radiograph interpretation. Purpose To develop and evaluate the diagnostic accuracy and clinical value of a domain-specific multimodal generative AI model for providing preliminary interpretations of chest radiographs. Materials and Methods For training, consecutive radiograph-report pairs from frontal chest radiography were retrospectively collected from 42 hospitals (2005-2023). The trained domain-specific AI model generated radiology reports for the radiographs. The test set included public datasets (PadChest, Open-i, VinDr-CXR, and MIMIC-CXR-JPG) and radiographs excluded from training. The sensitivity and specificity of the model-generated reports for 13 radiographic findings, compared with radiologist annotations (reference standard), were calculated (with 95% CIs). Four radiologists evaluated the subjective quality of the reports in terms of acceptability, agreement score, quality score, and comparative ranking of reports from (a) the domain-specific AI model, (b) radiologists, and (c) a general-purpose large language model (GPT-4Vision). Acceptability was defined as whether the radiologist would endorse the report as their own without changes. Agreement scores from 1 (clinically significant discrepancy) to 5 (complete agreement) were assigned using RADPEER; quality scores were on a 5-point Likert scale from 1 (very poor) to 5 (excellent). Results A total of 8 838 719 radiograph-report pairs (training) and 2145 radiographs (testing) were included (anonymized with respect to sex and gender). Reports generated by the domain-specific AI model demonstrated high sensitivity for detecting two critical radiographic findings: 95.3% (181 of 190) for pneumothorax and 92.6% (138 of 149) for subcutaneous emphysema. 
Acceptance rate, evaluated by four radiologists, was 70.5% (6047 of 8580), 73.3% (6288 of 8580), and 29.6% (2536 of 8580) for model-generated, radiologist, and GPT-4Vision reports, respectively. Agreement scores were highest for the model-generated reports (median = 4 [IQR, 3-5]) and lowest for GPT-4Vision reports (median = 1 [IQR, 1-3]; P < .001). Quality scores were also highest for the model-generated reports (median = 4 [IQR, 3-5]) and lowest for the GPT-4Vision reports (median = 2 [IQR, 1-3]; P < .001). From the ranking analysis, model-generated reports were most frequently ranked the highest (60.0%; 5146 of 8580), and GPT-4Vision reports were most frequently ranked the lowest (73.6%; 6312 of 8580). Conclusion A domain-specific multimodal generative AI model demonstrated potential for high diagnostic accuracy and clinical value in providing preliminary interpretations of chest radiographs for radiologists. © RSNA, 2025 Supplemental material is available for this article. See also the editorial by Little in this issue.
View details for DOI 10.1148/radiol.241476
View details for PubMedID 40131111
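The acceptance rates in the abstract above can be checked with simple arithmetic. The sketch below assumes that each of the four radiologists rated every one of the 2145 test radiographs once per report type, which gives 4 × 2145 = 8580 ratings per report type. That pairing is an assumption made here for illustration; the variable names are hypothetical.

```python
N_RADIOGRAPHS = 2145  # test radiographs, from the abstract
N_READERS = 4         # radiologists who rated the reports

# Assumption: every reader rated every radiograph once per report type.
n_ratings = N_RADIOGRAPHS * N_READERS

# Accepted-report counts as given in the abstract.
accepted = {"model": 6047, "radiologist": 6288, "GPT-4Vision": 2536}
rates = {name: 100 * count / n_ratings for name, count in accepted.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f}%")
```

Under that assumption, the computed percentages round to the 70.5%, 73.3%, and 29.6% stated in the abstract.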
-
Value of Using a Generative AI Model in Chest Radiography Reporting: A Reader Study.
Radiology
2025; 314 (3): e241646
Abstract
Background Multimodal generative artificial intelligence (AI) technologies can produce preliminary radiology reports, and validation with reader studies is crucial for understanding the clinical value of these technologies. Purpose To assess the clinical value of the use of a domain-specific multimodal generative AI tool for chest radiograph interpretation by means of a reader study. Materials and Methods A retrospective, sequential, multireader, multicase reader study was conducted using 758 chest radiographs from a publicly available dataset from 2009 to 2017. Five radiologists interpreted the chest radiographs in two sessions: without AI-generated reports and with AI-generated reports as preliminary reports. Reading times, reporting agreement (RADPEER), and quality scores (five-point scale) were evaluated by two experienced thoracic radiologists and compared between the first and second sessions from October to December 2023. Reading times, report agreement, and quality scores were analyzed using a generalized linear mixed model. Additionally, a subset of 258 chest radiographs was used to assess the factual correctness of the reports, and sensitivities and specificities were compared between the reports from the first and second sessions with use of the McNemar test. Results The introduction of AI-generated reports significantly reduced average reading times from 34.2 seconds ± 20.4 to 19.8 seconds ± 12.5 (P < .001). Report agreement scores shifted from a median of 5.0 (IQR, 4.0-5.0) without AI reports to 5.0 (IQR, 4.5-5.0) with AI reports (P < .001). Report quality scores changed from 4.5 (IQR, 4.0-5.0) without AI reports to 4.5 (IQR, 4.5-5.0) with AI reports (P < .001). From the subset analysis of factual correctness, the sensitivity for detecting various abnormalities increased significantly, including widened mediastinal silhouettes (84.3% to 90.8%; P < .001) and pleural lesions (77.7% to 87.4%; P < .001). 
While the overall diagnostic performance improved, variability among individual radiologists was noted. Conclusion The use of a domain-specific multimodal generative AI model increased the efficiency and quality of radiology report generation. © RSNA, 2025 Supplemental material is available for this article. See also the editorial by Babyn and Adams in this issue.
View details for DOI 10.1148/radiol.241646
View details for PubMedID 40067108
-
Clinical Training As the Foundation for Responsible Artificial Intelligence Use in Radiology: A Perspective From Industry to Fellowship.
AJR. American journal of roentgenology
2025
View details for DOI 10.2214/AJR.25.33797
View details for PubMedID 40899675
-
CXR-CLIP: Toward Large Scale Chest X-ray Language-Image Pre-training
edited by Greenspan, H., Madabhushi, A., Mousavi, P., Salcudean, S., Duncan, J., Syeda-Mahmood, T., Taylor, R.
SPRINGER INTERNATIONAL PUBLISHING AG. 2023: 101-111
View details for DOI 10.1007/978-3-031-43895-0_10
View details for Web of Science ID 001109624900010
-
Colon cancer CT staging according to mismatch repair status: Comparison and suggestion of imaging features for high-risk colon cancer
EUROPEAN JOURNAL OF CANCER
2022; 174: 165-175
Abstract
Neoadjuvant treatment with either chemotherapy or immunotherapy is gaining momentum in colon cancers (CC). To reduce over-treatment, increasing staging accuracy using computed tomography (CT) is of high importance. To assess and compare CT imaging features of CC between mismatch repair-proficient (pMMR) and MMR-deficient (dMMR) tumours and identify CT features that can distinguish high-risk (pT3-4, N+) CC according to MMR status. Primary staging CTs of 266 patients who underwent primary surgical resection of a colon tumour were retrospectively and independently evaluated by two radiologists. Logistic regression analysis was performed to identify significant associations between imaging features and positive lymph node status. Receiver operating characteristic (ROC) curves of significantly associated features were assessed and validated in an external cohort of 104 patients. Among pT3 tumours only, dMMR CC were significantly larger than pMMR CC in both length and thickness (length 59.39 ± 26.28 mm versus 48.70 ± 23.72 mm, respectively, p = 0.031; thickness 20.54 ± 11.17 mm versus 16.34 ± 8.73 mm, respectively, p = 0.027). For pMMR tumours, nodal internal heterogeneity on CT was significantly associated with a positive lymph node status (odds ratio (OR) = 2.66, p = 0.027), while for dMMR tumours, the largest short diameter of the nodes was associated with lymph node status (OR = 2.01, p = 0.049). The best cut-off value of the largest short diameter of involved nodes was 10.4 mm for dMMR and 7.95 mm for pMMR.
In the external validation cohort, AUCs for predicting involved nodes based on the largest short diameter were 0.764 for dMMR tumours using a 10 mm size cut-off and 0.624 for pMMR tumours using a 7 mm cut-off. These data show that CT imaging features of primary CC differ between dMMR and pMMR tumours, suggesting that the assessment of CT-based CC staging should take MMR status into consideration, especially for lymph node status, and thus may help in selecting patients for neoadjuvant treatment.
View details for DOI 10.1016/j.ejca.2022.06.060
View details for Web of Science ID 000911797000010
View details for PubMedID 36029713
-
Identifying high-risk colon cancer on CT: can a radiomics signature improve radiologist's performance for T staging?
ABDOMINAL RADIOLOGY
2022; 47 (8): 2739-2746
Abstract
To assess the role of radiomics in detection of high-risk (pT3-4) colon cancer and develop a model that combines radiomics and CT staging of colon cancer. We included 292 colon cancer patients who underwent pre-operative CT and primary surgical resection within 2 months. Three-dimensional segmentations and CT staging of primary colon tumors were done. From each 3D segmentation of colon tumor, radiomic features were automatically extracted. Logistic regression analysis was performed to identify associations between radiomic features and high-risk (pT3-4) colon tumors. A combined model that integrated both radiomics and CT staging was developed and its diagnostic performance was compared with that of conventional CT staging. Tenfold cross-validation was used to validate the performance of the model and CT staging. The model that combined radiomic features and CT staging demonstrated a significantly better performance in detection of high-risk colon tumors in the training set (AUC = 0.799, 95% CI: 0.720-0.839 for combined model and AUC = 0.697, 95% CI = 0.538-0.756 for CT staging only, p < 0.001 for difference). Cross-validation results also demonstrated significantly better detection performance of the combined model (AUC = 0.727, 95% Confidence Interval (CI): 0.621-0.777 for combined model and AUC = 0.628, 95% CI = 0.558-0.689 for CT staging only, Boot CI = 0.099). CT radiomic features of primary colon cancer, combined with CT staging, can improve the detection of high-risk colon cancer patients.
View details for DOI 10.1007/s00261-022-03534-0
View details for Web of Science ID 000805908800001
View details for PubMedID 35661244
View details for PubMedCentralID 7433093
-
CT for lymph node staging of Colon cancer: not only size but also location and number of lymph node count
ABDOMINAL RADIOLOGY
2021; 46 (9): 4096-4105
Abstract
To evaluate the diagnostic accuracy of imaging features to predict lymph node status of colon cancer using CT. This was a retrospective study from 2 tertiary hospitals in South Korea and the Netherlands. A total of 317 colon cancer patients who underwent primary surgical treatment were included. Number of lymph nodes according to the anatomical location, size, cluster, degree of attenuation, shape, presence of internal heterogeneity and ill-defined margin of the lymph node were assessed and compared according to histological lymph node status. The largest short diameter of lymph node and presence of internal heterogeneity of lymph node showed significant association with malignant lymph node status (P < 0.001 and P = 0.041, respectively). The ROC curve analysis revealed an AUC of 0.703 for the largest short diameter of lymph node (P < 0.001), and the AUC of the presence of internal heterogeneity was 0.630 (P < 0.001). In addition, our study showed that the total number of lymph nodes, regardless of size (P = 0.022), and the number of lymph nodes in the peritumoral area (P < 0.001) and along the mesenteric vessels (P < 0.001) on CT demonstrated significant association with malignant status of lymph nodes in colon cancer. There were significant associations between lymph node status and imaging features of lymph nodes on CT in colon cancer patients. The largest short diameter of lymph node and presence of internal heterogeneity can be used to predict the malignant status of lymph node in colon cancer patients. Also, the number of lymph nodes near the colonic tumor should be considered in assessment of colon cancer lymph node involvement on CT.
View details for DOI 10.1007/s00261-021-03057-0
View details for Web of Science ID 000644762800001
View details for PubMedID 33904991
View details for PubMedCentralID 4051918
-
Comparison of Genetic Profiles and Prognosis of High-Grade Gliomas Using Quantitative and Qualitative MRI Features: A Focus on G3 Gliomas
KOREAN JOURNAL OF RADIOLOGY
2021; 22 (2): 233-242
Abstract
To evaluate the association of MRI features with the major genomic profiles and prognosis of World Health Organization grade III (G3) gliomas compared with those of glioblastomas (GBMs). We enrolled 76 G3 glioma and 155 GBM patients with pathologically confirmed disease who had pretreatment brain MRI and major genetic information of tumors. Qualitative and quantitative imaging features, including volumetrics and histogram parameters, such as normalized cerebral blood volume (nCBV), cerebral blood flow (nCBF), and apparent diffusion coefficient (nADC) were evaluated. The G3 gliomas were divided into three groups for the analysis: isocitrate dehydrogenase (IDH) mutation with chromosome arm 1p/19q codeletion (IDHmut1p/19qdel), IDH mutation without 1p/19q codeletion (IDHmut1p/19qnondel), and IDH wildtype (IDHwt). A prediction model for the genetic profiles of G3 gliomas was developed and validated on a separate cohort. Both the quantitative and qualitative imaging parameters and progression-free survival (PFS) of G3 gliomas were compared and survival analysis was performed. Moreover, the imaging parameters and PFS between IDHwt G3 gliomas and GBMs were compared. IDHmut G3 gliomas showed a larger volume (p = 0.017), lower nCBF (p = 0.048), and higher nADC (p = 0.007) than IDHwt. Between the IDHmut tumors, IDHmut1p/19qdel G3 gliomas had higher nCBV (p = 0.024) and lower nADC (p = 0.002) than IDHmut1p/19qnondel G3 gliomas. Moreover, IDHmut1p/19qdel tumors had the best prognosis and IDHwt tumors had the worst prognosis among G3 gliomas (p < 0.001). PFS was significantly associated with the 95th percentile values of nCBV and nCBF in G3 gliomas.
There was no significant difference in either PFS or imaging features between IDHwt G3 gliomas and IDHwt GBMs. We found significant differences in MRI features, including volumetrics, CBV, and ADC, in G3 gliomas, according to IDH mutation and 1p/19q codeletion status, which can be utilized for the prediction of genomic profiles and the prognosis of G3 glioma patients. The MRI signatures and prognosis of IDHwt G3 gliomas tend to follow those of IDHwt GBMs.
View details for DOI 10.3348/kjr.2020.0011
View details for Web of Science ID 000609579400008
View details for PubMedID 32932560
View details for PubMedCentralID PMC7817637
-
Locoregional CT staging of colon cancer: does a learning curve exist?
ABDOMINAL RADIOLOGY
2021; 46 (2): 476-485
Abstract
To evaluate the learning curve for locoregional staging of colon cancer in radiologist trainees. Eighty-eight cases of colon cancer CT were included in this retrospective study. Four senior radiology residents staged the CTs according to TNM classification. Two out of four radiologists received feedback after reading every 20 cases. Radiologic staging was compared with pathologic staging and the learning curve, diagnostic performance, reader confidence and reading time were evaluated and compared between the two groups (feedback vs. no feedback). Generalized estimating equations logistic regression, QICu statistic, ANOVA and t test/Mann-Whitney test were utilized. Radiologists demonstrated a significant increase in their performance to distinguish between ≤ T2 and ≥ T3 and reached an inflection point at 38 cases, with a significant association with increased number of cases reviewed (P < 0.001). Sensitivity (P < 0.001), specificity (P = 0.030) and NPV (P = 0.002) demonstrated significant associations with increased experience. The overall reader's confidence was significantly higher in the group which received feedback (P < 0.001). There was no significant improvement in performance or in reader's confidence for N staging (N0 vs. ≥ N1) for all readers. Reading time decreased with experience and showed a significant negative association with experience (P < 0.001). Diagnostic performance of senior radiology trainees in differentiating between T2 and T3 colon cancer on CTs improved with increased experience. In contrast, evaluation of lymph node involvement did not improve with more experience. Feedback had no significant effect on improvement of diagnostic performances.
View details for DOI 10.1007/s00261-020-02672-7
View details for Web of Science ID 000557202200001
View details for PubMedID 32734351
View details for PubMedCentralID 4618410
-
Prognostic Value of Dynamic Contrast-Enhanced MRI-Derived Pharmacokinetic Variables in Glioblastoma Patients: Analysis of Contrast-Enhancing Lesions and Non-Enhancing T2 High-Signal Intensity Lesions
KOREAN JOURNAL OF RADIOLOGY
2020; 21 (6): 707-716
Abstract
To evaluate pharmacokinetic variables from contrast-enhancing lesions (CELs) and non-enhancing T2 high signal intensity lesions (NE-T2HSILs) on dynamic contrast-enhanced (DCE) magnetic resonance (MR) imaging for predicting progression-free survival (PFS) in glioblastoma (GBM) patients. Sixty-four GBM patients who had undergone preoperative DCE MR imaging and received standard treatment were retrospectively included. We analyzed the pharmacokinetic variables of the volume transfer constant (Ktrans) and volume fraction of extravascular extracellular space within the CEL and NE-T2HSIL of the entire tumor. Univariate and multivariate Cox regression analyses were performed using preoperative clinical characteristics, pharmacokinetic variables of DCE MR imaging, and postoperative molecular biomarkers to predict PFS. The increased mean Ktrans of the CEL, increased 95th percentile Ktrans of the CELs, and absence of methylated O⁶-methylguanine-DNA methyltransferase promoter were relevant adverse variables for PFS in the univariate analysis (p = 0.041, p = 0.032, and p = 0.083, respectively). The Kaplan-Meier survival curves demonstrated that PFS was significantly shorter in patients with a mean Ktrans of the CEL > 0.068 and 95th percentile Ktrans of the CEL > 0.223 (log-rank p = 0.038 and p = 0.041, respectively). However, only mean Ktrans of the CEL was significantly associated with PFS (p = 0.024; hazard ratio, 553.08; 95% confidence interval, 2.27-134756.74) in the multivariate Cox proportional hazard analysis. None of the pharmacokinetic variables from NE-T2HSILs were significantly related to PFS. Among the pharmacokinetic variables extracted from CELs and NE-T2HSILs on preoperative DCE MR imaging, the mean Ktrans of CELs exhibits potential as a useful imaging predictor of PFS in GBM patients.
View details for DOI 10.3348/kjr.2019.0629
View details for Web of Science ID 000534356000007
View details for PubMedID 32410409
View details for PubMedCentralID PMC7231611
-
Arterial spin labeling perfusion-weighted imaging aids in prediction of molecular biomarkers and survival in glioblastomas
EUROPEAN RADIOLOGY
2020; 30 (2): 1202-1211
Abstract
Prediction of progression-free survival (PFS) and overall survival (OS) and early identification of molecular biomarkers with prognostic information are clinically important in glioblastoma (GBM) patients. We aimed to explore the utility of arterial spin labeling perfusion-weighted imaging (ASL-PWI) in the prediction of molecular biomarkers and survival in GBM patients. We retrospectively analyzed 149 consecutive GBM patients, who had undergone maximal surgical resection or biopsy followed by concurrent chemoradiotherapy and adjuvant chemotherapy using temozolomide between November 2010 and June 2016. On preoperative ASL-PWI, cerebral blood flow (CBF) within contrast-enhancing (CE) and nonenhancing (NE) portions were evaluated both qualitatively (perfusion pattern[CE] and perfusion pattern[NE]) and quantitatively (nCBFCE and nCBFNE). ASL-PWI findings were correlated with molecular biomarkers, including isocitrate dehydrogenase (IDH) and O6-methylguanine-DNA methyltransferase (MGMT) methylation statuses, and survival, using the Mann-Whitney U-test, Spearman rank correlation, Kaplan-Meier analysis, and receiver operating characteristics analysis. nCBFCE was significantly higher in the IDH wild-type group than in the IDH mutant group (p = .013) and in the MGMT unmethylated group than in the methylated group (p = .047). Areas under the receiver operating characteristic curve were 0.678 for IDH mutation (p = .022) and 0.601 for MGMT promoter methylation (p = .043). Hyperperfusion was associated with the shortest median PFS for both perfusion pattern[CE] (7.6 months) and perfusion pattern[NE] (4.0 months). The perfusion pattern[NE] remained an independent predictor for PFS and OS even after adjusting for clinical and molecular predictors, unlike perfusion pattern[CE]. ASL-PWI can aid in predicting survival and molecular biomarkers including IDH mutation and MGMT promoter methylation statuses in GBM patients.
• ASL-PWI can aid in predicting survival in GBM patients.
• ASL-PWI can aid in predicting IDH and MGMT promoter methylation statuses in GBM.
View details for DOI 10.1007/s00330-019-06379-2
View details for Web of Science ID 000511977900058
View details for PubMedID 31468161
-
Deep Learning for Chest Radiograph Diagnosis in the Emergency Department
RADIOLOGY
2019; 293 (3): 573-580
Abstract
Background The performance of a deep learning (DL) algorithm should be validated in actual clinical situations, before its clinical implementation. Purpose To evaluate the performance of a DL algorithm for identifying chest radiographs with clinically relevant abnormalities in the emergency department (ED) setting. Materials and Methods This single-center retrospective study included consecutive patients who visited the ED and underwent initial chest radiography between January 1 and March 31, 2017. Chest radiographs were analyzed with a commercially available DL algorithm. The performance of the algorithm was evaluated by determining the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity at predefined operating cutoffs (high-sensitivity and high-specificity cutoffs). The sensitivities and specificities of the algorithm were compared with those of the on-call radiology residents who interpreted the chest radiographs in the actual practice by using McNemar tests. If there were discordant findings between the algorithm and resident, the residents reinterpreted the chest radiographs by using the algorithm's output. Results A total of 1135 patients (mean age, 53 years ± 18; 582 men) were evaluated. In the identification of abnormal chest radiographs, the algorithm showed an AUC of 0.95 (95% confidence interval [CI]: 0.93, 0.96), a sensitivity of 88.7% (227 of 256 radiographs; 95% CI: 84.1%, 92.3%), and a specificity of 69.6% (612 of 879 radiographs; 95% CI: 66.5%, 72.7%) at the high-sensitivity cutoff and a sensitivity of 81.6% (209 of 256 radiographs; 95% CI: 76.3%, 86.2%) and specificity of 90.3% (794 of 879 radiographs; 95% CI: 88.2%, 92.2%) at the high-specificity cutoff. Radiology residents showed lower sensitivity (65.6% [168 of 256 radiographs; 95% CI: 59.5%, 71.4%], P < .001) and higher specificity (98.1% [862 of 879 radiographs; 95% CI: 96.9%, 98.9%], P < .001) compared with the algorithm.
After reinterpretation of chest radiographs with use of the algorithm's outputs, the sensitivity of the residents improved (73.4% [188 of 256 radiographs; 95% CI: 68.0%, 78.8%], P = .003), whereas specificity was reduced (94.3% [829 of 879 radiographs; 95% CI: 92.8%, 95.8%], P < .001). Conclusion A deep learning algorithm used with emergency department chest radiographs showed diagnostic performance for identifying clinically relevant abnormalities and helped improve the sensitivity of radiology residents' evaluation. Published under a CC BY 4.0 license. Online supplemental material is available for this article. See also the editorial by Munera and Infante in this issue.
View details for DOI 10.1148/radiol.2019191225
View details for Web of Science ID 000498015600012
View details for PubMedID 31638490
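The sensitivity and specificity figures in the abstract above follow directly from the stated counts. The sketch below recomputes the point estimates at the high-sensitivity cutoff and adds a Wilson score interval as one common way to attach a 95% confidence interval to a proportion. The abstract does not say which interval method the authors used, so the Wilson interval is an assumption made here and its bounds need not match the published ones exactly; the helper name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for about 95%)."""
    p = successes / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


# Counts from the abstract (algorithm, high-sensitivity cutoff).
sensitivity = 227 / 256   # abnormal radiographs correctly flagged
specificity = 612 / 879   # normal radiographs correctly passed
sens_lo, sens_hi = wilson_interval(227, 256)

print(f"sensitivity = {sensitivity:.1%}, specificity = {specificity:.1%}")
print(f"Wilson interval for sensitivity: {sens_lo:.1%} to {sens_hi:.1%}")
```

The same two-line calculation applies to the high-specificity cutoff and to the residents' readings by swapping in the corresponding counts.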
-
Diagnostic value of computed tomography combined with ultrasonography in detecting cervical recurrence in patients with thyroid cancer
HEAD AND NECK-JOURNAL FOR THE SCIENCES AND SPECIALTIES OF THE HEAD AND NECK
2019; 41 (5): 1206-1212
Abstract
To determine the diagnostic role of CT added to ultrasound for the diagnosis of recurrent differentiated thyroid cancer (DTC) and to evaluate potential benefits for patients. A total of 193 patients with recurrent DTC were retrospectively included. The diagnostic performances of ultrasound and combination of ultrasound and CT (ultrasound/CT) in detecting recurrence were compared. Benefits of CT were assessed based on the presence of any recurrence detected only with additional CT. In detecting cervical recurrence, ultrasound/CT showed higher sensitivity (P = .001) and lower specificity (P < .001) than ultrasound alone, overall resulting in higher area under the curve (P < .001). Seventy-nine patients (40.9%) benefited from additional CT in detecting recurrence. For reoperation of cervical recurrence in patients with DTC, addition of CT to ultrasound offers better surgical planning by enhancing detection of recurrent cancers that were overlooked with ultrasound alone.
View details for DOI 10.1002/hed.25538
View details for Web of Science ID 000466490000012
View details for PubMedID 30552732
-
Persistent/Recurrent Differentiated Thyroid Cancer: Clinical and Radiological Characteristics of Persistent Disease and Clinical Recurrence Based on Computed Tomography Analysis
THYROID
2018; 28 (11): 1490-1499
Abstract
The natural course of persistent/recurrent differentiated thyroid cancer (DTC) has not been fully elucidated. The purpose of this study was to assess the relative incidence and clinico-radiological characteristics of persistent disease and clinical recurrence based on computed tomography (CT) analysis in patients with persistent/recurrent DTC. From January 2005 to December 2016, this retrospective study included 107 patients (M:F = 28:79; mean age = 53.5 years) with surgically proven cervical locoregional recurrence of DTC. Two neck CT examinations (median interval 1.92 years; range 0.17-7.58 years) before the last thyroid cancer surgery within the study period were reevaluated. Based on the presence of the lesion on the first CT and its progression on the second CT, the locoregional recurrence was classified into the following categories: stable persistence (decrease, no change, or increase by <2 mm in short dimension on the second CT), progressive persistence (increase by ≥2 mm), and clinical recurrence (newly appeared on the second CT). Clinical and radiological characteristics of the three groups were compared using univariate and multivariate logistic regression analyses. The relative incidences of stable persistence, progressive persistence, and clinical recurrence were 56.1% (60/107), 15.0% (16/107), and 29.0% (31/107), respectively. Multivariate analysis between the clinical recurrence (29.0%) and persistence (71.0%) groups revealed various independent factors for prediction of clinical recurrence. These included longer interval between the two CT examinations (median 2.67 vs. 1.79 years; p = 0.021), a smaller number of thyroid surgeries (1.16 ± 0.45 vs. 1.55 ± 0.81; p = 0.002), and a history of neck dissection at the location of the largest locoregional recurrence (70.0% vs. 31.4%; p < 0.001). There was no significant independent factor for differentiation between the stable persistence (78.9%; 60/76) and progressive persistence (21.1%; 16/76) groups.
The results may have been influenced by selection bias because this study included only surgically proven cases. With regard to cervical locoregional recurrence of DTC, active surveillance may be favored because more than half of the cases are structurally persistent and stable. However, meticulous evaluation is necessary to detect progressive persistence and clinical recurrence, considering various clinical factors.
View details for DOI 10.1089/thy.2018.0151
View details for Web of Science ID 000447406200001
View details for PubMedID 30226443
-
Radiogenomics correlation between MR imaging features and major genetic profiles in glioblastoma
EUROPEAN RADIOLOGY
2018; 28 (10): 4350-4361
Abstract
To assess the association between MR imaging features and major genomic profiles in glioblastoma. Qualitative and quantitative imaging features such as volumetrics and histogram analysis from normalised CBV (nCBV) and ADC (nADC) were evaluated based on both T2WI and CET1WI. The imaging parameters of different genetic profile groups were compared and regression analyses were used for identifying imaging-molecular associations. Progression-free survival (PFS) was analysed by a Kaplan-Meier test and Cox proportional hazards model. An IDH mutation was observed in 18/176 patients, and ATRX loss was positive in 17/158 of the IDH-wt cases. The IDH-mut group showed a larger volume on T2WI and a higher volume ratio between T2WI and CET1WI than the IDH-wt group (p < 0.05). In the IDH-mut group, higher mean nADC values were observed compared with the IDH-wt tumours (p < 0.05). Among the IDH-wt tumours, IDH-wt, ATRX-loss tumours revealed higher 5th percentile nADC values than the IDH-wt, ATRX-noloss tumours (p = 0.03). PFS was the longest in the IDH-mut group, followed by the IDH-wt, ATRX-loss groups and the IDH-wt, ATRX-noloss groups, consecutively (p < 0.05). We found significant associations of PFS with the genetic profiles and imaging parameters. Major genetic profiles of glioblastoma showed a significant association with MR imaging features, along with some genetic profiles, which are independent prognostic parameters for GBM.
• Significant correlation exists between radiological parameters such as volumetric and ADC values and major genomic profiles such as IDH mutation and ATRX loss status
• Radiological parameters such as the ADC value were feasible predictors of glioblastoma patients' prognosis
• Imaging features can predict major genomic profiles of the tumours and the prognosis of glioblastoma patients.
View details for DOI 10.1007/s00330-018-5400-8
View details for Web of Science ID 000443692400035
View details for PubMedID 29721688
-
Accurate measurements of liver stiffness using shear wave elastography in children and young adults and the role of the stability index
ULTRASONOGRAPHY
2018; 37 (3): 226-232
Abstract
The purpose of this study was to evaluate the usefulness of the stability index (SI) in liver stiffness measurements using shear wave elastography (SWE) in children. A total of 29 children and young adults (mean age, 16.1 years; range, 8 to 28 years; 11 boys and 18 girls) who underwent liver stiffness measurements using SWE under free-breathing and breath-holding conditions were included in our study. Ten SWE measurements were acquired in each of four groups: free-breathing and breath-holding, and with and without using the SI. The failure rate of acquisition of SI values over 90% was calculated in each group. To evaluate variability in the SWE measurements, the standard deviation, coefficient of variation, and percentage of unreliable measurements were compared. Intraobserver agreement and the optimal minimal number of measurements were calculated using intraclass correlation coefficients. A failure to acquire SI values over 90% was observed in 17% of the scans in the free-breathing group and in 7% of the scans in the breath-holding group. In both groups, utilizing the SI led to a significantly lower standard deviation and coefficient of variation. When using the SI, the percentage of unreliable measurements decreased from 16.7% to 8.3% in the free-breathing group and 14.8% to 0% in the breath-holding group. With the use of the SI, intraobserver agreement increased and the optimal minimal number of repeated measurements decreased in both the free-breathing and breath-holding groups. Utilization of the SI in the measurement of liver SWE in children reduced measurement variability and increased reliability in both free-breathing and breath-holding conditions.
View details for DOI 10.14366/usg.17025
View details for Web of Science ID 000436781800006
View details for PubMedID 29096427
View details for PubMedCentralID PMC6044215
https://orcid.org/0000-0002-5440-0451