I draw on a strong foundation in data science and engineering to address challenges in the biomedical sector. My experience spans a broad range of data types, including radiology, genomics, histopathology, and clinical data. I am committed to integrating these diverse datasets in research aimed at benefiting patients.

Professional Education

  • Doctor of Philosophy, Seoul National University, Bioinformatics, Department of Natural Science (2023)
  • Bachelor of Engineering, Inha University Yonghyeon Campus (2014)

All Publications

  • Ensemble Deep Learning Model to Predict Lymphovascular Invasion in Gastric Cancer. Cancers Lee, J., Cha, S., Kim, J., Kim, J. J., Kim, N., Jae Gal, S. G., Kim, J. H., Lee, J. H., Choi, Y., Kang, S., Song, G., Yang, D., Lee, J., Lee, K., Ahn, S., Moon, K. M., Noh, M. 2024; 16 (2)


    Lymphovascular invasion (LVI) is one of the most important prognostic factors in gastric cancer as it indicates a higher likelihood of lymph node metastasis and poorer overall outcome for the patient. Despite its importance, the detection of LVI(+) in histopathology specimens of gastric cancer can be a challenging task for pathologists as invasion can be subtle and difficult to discern. Herein, we propose a deep learning-based LVI(+) detection method using H&E-stained whole-slide images. The ConViT model showed the best performance in terms of both AUROC and AUPRC among the classification models (AUROC: 0.9796; AUPRC: 0.9648). The AUROC and AUPRC of YOLOX computed based on the augmented patch-level confidence score were slightly lower (AUROC: -0.0094; AUPRC: -0.0225) than those of the ConViT classification model. With weighted averaging of the patch-level confidence scores, the ensemble model exhibited the best AUROC, AUPRC, and F1 scores of 0.9880, 0.9769, and 0.9280, respectively. The proposed model is expected to contribute to precision medicine by potentially saving examination-related time and labor and reducing disagreements among pathologists.

    View details for DOI 10.3390/cancers16020430

    View details for PubMedID 38275871

  • Artificial Intelligence Model Assisting Thyroid Nodule Diagnosis and Management: A Multicenter Diagnostic Study JOURNAL OF CLINICAL ENDOCRINOLOGY & METABOLISM Ha, E., Lee, J., Lee, D., Moon, J., Lee, H., Kim, Y., Kim, M., Na, D., Kim, J. 2023


    To develop and validate a deep-learning-based AI model (AI-Thyroid) for thyroid cancer diagnosis, and to explore how it improves diagnostic performance. The system was trained using 19,711 images of 6,163 patients in a tertiary hospital. It was validated using 11,185 images of 4,820 patients in 24 hospitals (test set 1) and 4,490 images of 2,367 patients in ____ (test set 2). The clinical implications were determined by comparing the findings of six physicians with different levels of experience (group 1: four trainees, and group 2: two faculty radiologists) before and after AI-Thyroid assistance. The area under the receiver operating characteristic (AUROC) curve of AI-Thyroid was 0.939. The AUROC, sensitivity, and specificity were 0.922, 87.0%, and 81.5% for test set 1 and 0.938, 89.9%, and 81.6% for test set 2. The AUROCs of AI-Thyroid did not differ significantly according to the prevalence of malignancies (> 15.0% vs. ≤ 15.0%, p = 0.226). In the simulated scenario, AI-Thyroid assistance changed the AUROC, sensitivity, and specificity from 0.854 to 0.945, from 84.2% to 92.7%, and from 72.9% to 86.6% (all p < 0.001) in group 1, and from 0.914 to 0.939 (p = 0.022), from 78.6% to 85.5% (p = 0.053) and from 91.9% to 92.5% (p = 0.683) in group 2. The interobserver agreement improved from moderate to substantial in both groups. AI-Thyroid can improve diagnostic performance and interobserver agreement in thyroid cancer diagnosis, especially in less-experienced physicians.

    View details for DOI 10.1210/clinem/dgad503

    View details for Web of Science ID 001066519400001

    View details for PubMedID 37622451

  • Development and Application of an Active Pharmacovigilance Framework Based on Electronic Healthcare Records from Multiple Centers in Korea. Drug safety Choe, S., Lee, S., Park, C. H., Lee, J. H., Kim, H. J., Byeon, S. J., Choi, J. H., Yang, H. J., Sim, D. W., Cho, B. J., Koo, H., Kang, M. G., Jeong, J. B., Choi, I. Y., Kim, S. H., Kim, W. J., Jung, J. W., Lhee, S. H., Ko, Y. J., Park, H. K., Kang, D. Y., Kim, J. H. 2023; 46 (7): 647-660


    With the availability of retrospective pharmacovigilance data, the common data model (CDM) has been identified as an efficient approach towards anonymized multicenter analysis; however, the establishment of a suitable model for individual medical systems and applications supporting their analysis is a challenge. The aim of this study was to construct a specialized Korean CDM (K-CDM) for pharmacovigilance systems based on a clinical scenario to detect adverse drug reactions (ADRs). De-identified patient records (n = 5,402,129) from 13 institutions were converted to the K-CDM. From 2005 to 2017, 37,698,535 visits, 39,910,849 conditions, 259,594,727 drug exposures, and 30,176,929 procedures were recorded. The K-CDM, which comprises three layers, is compatible with existing models and is potentially adaptable to extended clinical research. Local codes for electronic medical records (EMRs), including diagnosis, drug prescriptions, and procedures, were mapped using standard vocabulary. Distributed queries based on clinical scenarios were developed and applied to K-CDM through decentralized or distributed networks. Meta-analysis of drug relative risk ratios from ten institutions revealed that non-steroidal anti-inflammatory drugs (NSAIDs) increased the risk of gastrointestinal hemorrhage by twofold compared with aspirin, and non-vitamin K anticoagulants decreased cerebrovascular bleeding risk by 0.18-fold compared with warfarin. These results are similar to those from previous studies and are conducive to new research, thereby demonstrating the feasibility of K-CDM for pharmacovigilance. However, the low quality of original EMR data, incomplete mapping, and heterogeneity between institutions reduced the validity of the analysis, thus necessitating continuous calibration among researchers, clinicians, and the government.

    View details for DOI 10.1007/s40264-023-01296-2

    View details for PubMedID 37243963

    View details for PubMedCentralID 2635959

  • Contrast-enhanced CT-based Radiomics for the Differentiation of Anaplastic or Poorly Differentiated Thyroid Carcinoma from Differentiated Thyroid Carcinoma: A Pilot Study SCIENTIFIC REPORTS Moon, J., Lee, J., Roh, J., Lee, D., Ha, E. 2023; 13 (1): 4562


    Differential diagnosis of anaplastic thyroid carcinoma/poorly differentiated thyroid carcinoma (ATC/PDTC) from differentiated thyroid carcinoma (DTC) is crucial in patients with large thyroid malignancies. This study creates a predictive model using radiomics feature analysis to differentiate ATC/PDTC from DTC. We compared the clinicoradiological characteristics and radiomics features extracted from a volume of interest on contrast-enhanced computed tomography (CT) between the groups. Estimations of variable importance were performed via modeling using the random forest quantile classifier. The diagnostic performance of the model with radiomics features alone had the area under the receiver operating characteristic (AUROC) curve value of 0.883. The sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy were 81.7%, 93.3%, 97.7%, 64.5%, and 84.6%, respectively, for the differential diagnosis of ATC/PDTC and DTC. The model with both radiomics and clinicoradiological information showed the AUROC of 0.908, with sensitivity, specificity, PPV, NPV, and accuracy of 82.9%, 97.6%, 99.2%, 67.1%, and 86.5% respectively. Distant metastasis, moment, shape, age, and gray-level size zone matrix features were the most useful factors for differential diagnosis. Therefore, we concluded that a radiomics approach based on contrast-enhanced CT features can potentially differentiate ATC/PDTC from DTC in patients with large thyroid malignancies.

    View details for DOI 10.1038/s41598-023-31212-8

    View details for Web of Science ID 000984222400034

    View details for PubMedID 36941287

    View details for PubMedCentralID PMC10027684

  • Development of a machine learning-based fine-grained risk stratification system for thyroid nodules using predefined clinicoradiological features EUROPEAN RADIOLOGY Ha, E., Lee, J., Lee, D., Na, D., Kim, J. 2023; 33 (5): 3211-3221


    We constructed and validated a machine learning-based malignancy risk estimation model using predefined clinicoradiological features, and evaluated its clinical utility for the management of thyroid nodules. In total, 5708 benign (n = 4597) and malignant (n = 1111) thyroid nodules were collected from 5081 consecutive patients treated in 26 institutions. Seventeen experienced radiologists evaluated nodule characteristics on ultrasonographic images. Eight predictive models were used to stratify the thyroid nodules according to malignancy risk; model performance was assessed via nested 10-fold cross-validation. The best-performing algorithm was externally validated using data for 454 thyroid nodules from a tertiary hospital, then compared to the Thyroid Imaging Reporting and Data System (TIRADS)-based interpretations of radiologists (American College of Radiology, European and Korean TIRADS, and AACE/ACE/AME guidelines). The area under the receiver operating characteristic (AUROC) curves of the algorithms ranged from 0.773 to 0.862. The sensitivities, specificities, positive predictive values, and negative predictive values of the best-performing models were 74.1-76.6%, 80.9-83.4%, 49.2-51.9%, and 93.0-93.5%, respectively. For the external validation set, the ElasticNet values were 83.2%, 89.2%, 81.8%, and 90.1%, respectively. The corresponding TIRADS values were 66.5-85.0%, 61.3-80.8%, 45.9-72.1%, and 81.5-90.3%, respectively. The new model exhibited a significantly higher AUROC and specificity than did the TIRADS risk stratification, although its sensitivity was similar. We developed a reliable machine learning-based predictive model that demonstrated enhanced specificity when stratifying thyroid nodules according to malignancy risk. This system will contribute to improved personalized management of thyroid nodules. • The area under the receiver operating characteristic (AUROC) curve, sensitivity, and specificity of our model were 0.914, 83.2%, and 89.2%, respectively (derived using the validation dataset). • Compared to the TIRADS values, the AUROC and specificity are significantly higher, while the sensitivity is similar. • An interactive version of our AI algorithm is at .

    View details for DOI 10.1007/s00330-022-09376-0

    View details for Web of Science ID 000907928600002

    View details for PubMedID 36600122

    View details for PubMedCentralID 8628155

  • A Data-Driven Reference Standard for Adverse Drug Reaction (RS-ADR) Signal Assessment: Development and Validation. Journal of medical Internet research Lee, S., Lee, J. H., Kim, G. J., Kim, J. Y., Shin, H., Ko, I., Choe, S., Kim, J. H. 2022; 24 (10): e35464


    Pharmacovigilance using real-world data (RWD), such as multicenter electronic health records (EHRs), yields massively parallel adverse drug reaction (ADR) signals. However, proper validation of computationally detected ADR signals is not possible due to the lack of a reference standard for positive and negative associations. This study aimed to develop a reference standard for ADR (RS-ADR) to streamline the systematic detection, assessment, and understanding of almost all drug-ADR associations suggested by RWD analyses. We integrated well-known reference sets for drug-ADR pairs, including Side Effect Resource, Observational Medical Outcomes Partnership, and EU-ADR. We created a pharmacovigilance dictionary using controlled vocabularies and systematically annotated EHR data. Drug-ADR associations computed from MetaLAB and MetaNurse analyses of multicenter EHRs and extracted from the Food and Drug Administration Adverse Event Reporting System were integrated as "empirically determined" positive and negative reference sets by means of cross-validation between institutions. The RS-ADR consisted of 1344 drugs, 4485 ADRs, and 6,027,840 drug-ADR pairs with positive and negative consensus votes as pharmacovigilance reference sets. After the curation of the initial version of RS-ADR, novel ADR signals such as "famotidine-hepatic function abnormal" were detected and reasonably validated by RS-ADR. Although the validation of the entire reference standard is challenging, especially with this initial version, the reference standard will improve as more RWD participate in the consensus voting with advanced pharmacovigilance dictionaries and analytic algorithms. One can check if a drug-ADR pair has been reported by our web-based search interface for RS-ADRs. RS-ADRs enriched with the pharmacovigilance dictionary, ADR knowledge, and real-world evidence from EHRs may streamline the systematic detection, evaluation, and causality assessment of computationally detected ADR signals.

    View details for DOI 10.2196/35464

    View details for PubMedID 36201386

    View details for PubMedCentralID PMC9585444

  • Predictions for Three-Month Postoperative Vocal Recovery after Thyroid Surgery from Spectrograms with Deep Neural Network. Sensors (Basel, Switzerland) Lee, J. H., Lee, C. Y., Eom, J. S., Pak, M., Jeong, H. S., Son, H. Y. 2022; 22 (17)


    Despite the lack of findings in laryngeal endoscopy, it is common for patients to undergo vocal problems after thyroid surgery. This study aimed to predict the recovery of the patient's voice after 3 months from preoperative and postoperative voice spectrograms. We retrospectively collected voice and the GRBAS score from 114 patients undergoing surgery with thyroid cancer. The data for each patient were taken from three points in time: preoperative, and 2 weeks and 3 months postoperative. Using the pretrained model to predict GRBAS as the backbone, the preoperative and 2-weeks-postoperative voice spectrogram were trained for the EfficientNet architecture deep-learning model with long short-term memory (LSTM) to predict the voice at 3 months postoperation. The correlation analysis of the predicted results for the grade, breathiness, and asthenia scores were 0.741, 0.766, and 0.433, respectively. Based on the scaled prediction results, the area under the receiver operating characteristic curve for the binarized grade, breathiness, and asthenia were 0.894, 0.918, and 0.735, respectively. In the follow-up test results for 12 patients after 6 months, the average of the AUC values for the five scores was 0.822. This study showed the feasibility of predicting vocal recovery after 3 months using the spectrogram. We expect this model could be used to relieve patients' psychological anxiety and encourage them to actively participate in speech rehabilitation.

    View details for DOI 10.3390/s22176387

    View details for PubMedID 36080847

    View details for PubMedCentralID PMC9460363

  • Clinicoradiological Characteristics in the Differential Diagnosis of Follicular-Patterned Lesions of the Thyroid: A Multicenter Cohort Study. Korean journal of radiology Lee, J. H., Ha, E. J., Lee, D. H., Han, M., Park, J. H., Kim, J. H. 2022; 23 (7): 763-772


    Preoperative differential diagnosis of follicular-patterned lesions is challenging. This multicenter cohort study investigated the clinicoradiological characteristics relevant to the differential diagnosis of such lesions. From June to September 2015, 4787 thyroid nodules (≥ 1.0 cm) with a final diagnosis of benign follicular nodule (BN, n = 4461), follicular adenoma (FA, n = 136), follicular carcinoma (FC, n = 62), or follicular variant of papillary thyroid carcinoma (FVPTC, n = 128) collected from 26 institutions were analyzed. The clinicoradiological characteristics of the lesions were compared among the different histological types using multivariable logistic regression analyses. The relative importance of the characteristics that distinguished histological types was determined using a random forest algorithm. Compared to BN (as the control group), the distinguishing features of follicular-patterned neoplasms (FA, FC, and FVPTC) were patient's age (odds ratio [OR], 0.969 per 1-year increase), lesion diameter (OR, 1.054 per 1-mm increase), presence of solid composition (OR, 2.255), presence of hypoechogenicity (OR, 2.181), and presence of halo (OR, 1.761) (all p < 0.05). Compared to FA (as the control), FC differed with respect to lesion diameter (OR, 1.040 per 1-mm increase) and rim calcifications (OR, 17.054), while FVPTC differed with respect to patient age (OR, 0.966 per 1-year increase), lesion diameter (OR, 0.975 per 1-mm increase), macrocalcifications (OR, 3.647), and non-smooth margins (OR, 2.538) (all p < 0.05). The five important features for the differential diagnosis of follicular-patterned neoplasms (FA, FC, and FVPTC) from BN are maximal lesion diameter, composition, echogenicity, orientation, and patient's age. The most important features distinguishing FC and FVPTC from FA are rim calcifications and macrocalcifications, respectively. Although follicular-patterned lesions have overlapping clinical and radiological features, the distinguishing features identified in our large clinical cohort may provide valuable information for preoperative distinction between them and decision-making regarding their management.

    View details for DOI 10.3348/kjr.2022.0079

    View details for PubMedID 35695317

    View details for PubMedCentralID PMC9240300

  • Development and Validation of a Multimodal-Based Prognosis and Intervention Prediction Model for COVID-19 Patients in a Multicenter Cohort SENSORS Lee, J., Ahn, J., Chung, M., Jeong, Y., Kim, J., Lim, J., Kim, J., Kim, Y., Lee, J., Kim, E. 2022; 22 (13)


    The ability to accurately predict the prognosis and intervention requirements for treating highly infectious diseases, such as COVID-19, can greatly support the effective management of patients, especially in resource-limited settings. The aim of the study is to develop and validate a multimodal artificial intelligence (AI) system using clinical findings, laboratory data and AI-interpreted features of chest X-rays (CXRs), and to predict the prognosis and the required interventions for patients diagnosed with COVID-19, using multi-center data. In total, 2282 real-time reverse transcriptase polymerase chain reaction-confirmed COVID-19 patients’ initial clinical findings, laboratory data and CXRs were retrospectively collected from 13 medical centers in South Korea, between January 2020 and June 2021. The prognostic outcomes collected included intensive care unit (ICU) admission and in-hospital mortality. Intervention outcomes included the use of oxygen (O2) supplementation, mechanical ventilation and extracorporeal membrane oxygenation (ECMO). A deep learning algorithm detecting 10 common CXR abnormalities (DLAD-10) was used to infer the initial CXR taken. A random forest model with a quantile classifier was used to predict the prognostic and intervention outcomes, using multimodal data. The area under the receiver operating curve (AUROC) values for the single-modal model, using clinical findings, laboratory data and the outputs from DLAD-10, were 0.742 (95% confidence interval [CI], 0.696−0.788), 0.794 (0.745−0.843) and 0.770 (0.724−0.815), respectively. The AUROC of the combined model, using clinical findings, laboratory data and DLAD-10 outputs, was significantly higher at 0.854 (0.820−0.889) than that of all other models (p < 0.001, using DeLong’s test). In the order of importance, age, dyspnea, consolidation and fever were significant clinical variables for prediction. The most predictive DLAD-10 output was consolidation. We have shown that a multimodal AI model can improve the performance of predicting both the prognosis and intervention in COVID-19 patients, and this could assist in effective treatment and subsequent resource management. Further, image feature extraction using an established AI engine with well-defined clinical outputs, and combining them with different modes of clinical data, could be a useful way of creating an understandable multimodal prediction model.

    View details for DOI 10.3390/s22135007

    View details for Web of Science ID 000822123000001

    View details for PubMedID 35808502

    View details for PubMedCentralID PMC9269794

  • Prognostic Role of Circulating Tumor Cells in the Pulmonary Vein, Peripheral Blood, and Bone Marrow in Resectable Non-Small Cell Lung Cancer. Journal of chest surgery Lee, J. M., Jung, W., Yum, S., Lee, J. H., Cho, S. 2022; 55 (3): 214-224


    Studies of the prognostic role of circulating tumor cells (CTCs) in early-stage non-small cell lung cancer (NSCLC) are still limited. This study investigated the prognostic power of CTCs from the pulmonary vein (PV), peripheral blood (PB), and bone marrow (BM) for postoperative recurrence in patients who underwent curative resection for NSCLC. Forty patients who underwent curative resection for NSCLC were enrolled. Before resection, 10-mL samples were obtained of PB from the radial artery, blood from the PV of the lobe containing the tumor, and BM aspirates from the rib. A microfabricated filter was used for CTC enrichment, and immunofluorescence staining was used to identify CTCs. The pathologic stage was stage I in 8 patients (20%), II in 15 (38%), III in 14 (35%), and IV in 3 (8%). The median number of PB-, PV-, and BM-CTCs was 4, 4, and 5, respectively. A time-dependent receiver operating characteristic curve analysis showed that PB-CTCs had excellent predictive value for recurrence-free survival (RFS), with the highest area under the curve at each time point (first, second, and third quartiles of RFS). In a multivariate Cox proportional hazard regression model, PB-CTCs were an independent risk factor for recurrence (hazard ratio, 10.580; 95% confidence interval, 1.637-68.388; p<0.013). The presence of ≥4 PB-CTCs was an independent poor prognostic factor for RFS, and PV-CTCs and PB-CTCs had a positive linear correlation in patients with recurrence.

    View details for DOI 10.5090/jcs.21.140

    View details for PubMedID 35440519

    View details for PubMedCentralID PMC9178304

  • Improving the Performance of Radiologists Using Artificial Intelligence-Based Detection Support Software for Mammography: A Multi-Reader Study. Korean journal of radiology Lee, J. H., Kim, K. H., Lee, E. H., Ahn, J. S., Ryu, J. K., Park, Y. M., Shin, G. W., Kim, Y. J., Choi, H. Y. 2022; 23 (5): 505-516


    To evaluate whether artificial intelligence (AI) for detecting breast cancer on mammography can improve the performance and time efficiency of radiologists reading mammograms. A commercial deep learning-based software for mammography was validated using external data collected from 200 patients, 100 each with and without breast cancer (40 with benign lesions and 60 without lesions) from one hospital. Ten readers, including five breast specialist radiologists (BSRs) and five general radiologists (GRs), assessed all mammography images using a seven-point scale to rate the likelihood of malignancy in two sessions, with and without the aid of the AI-based software, and the reading time was automatically recorded using a web-based reporting system. Two reading sessions were conducted with a two-month washout period in between. Differences in the area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, and reading time between reading with and without AI were analyzed, accounting for data clustering by readers when indicated. The AUROC of the AI alone, BSR (average across five readers), and GR (average across five readers) groups was 0.915 (95% confidence interval, 0.876-0.954), 0.813 (0.756-0.870), and 0.684 (0.616-0.752), respectively. With AI assistance, the AUROC significantly increased to 0.884 (0.840-0.928) and 0.833 (0.779-0.887) in the BSR and GR groups, respectively (p = 0.007 and p < 0.001, respectively). Sensitivity was improved by AI assistance in both groups (74.6% vs. 88.6% in BSR, p < 0.001; 52.1% vs. 79.4% in GR, p < 0.001), but the specificity did not differ significantly (66.6% vs. 66.4% in BSR, p = 0.238; 70.8% vs. 70.0% in GR, p = 0.689). The average reading time pooled across readers was significantly decreased by AI assistance for BSRs (82.73 vs. 73.04 seconds, p < 0.001) but increased in GRs (35.44 vs. 42.52 seconds, p < 0.001). AI-based software improved the performance of radiologists regardless of their experience and affected the reading time.

    View details for DOI 10.3348/kjr.2021.0476

    View details for PubMedID 35434976

    View details for PubMedCentralID PMC9081685

  • GLRX3, a novel cancer stem cell-related secretory biomarker of pancreatic ductal adenocarcinoma. BMC cancer Jo, J. H., Kim, S. A., Lee, J. H., Park, Y. R., Kim, C., Park, S. B., Jung, D. E., Lee, H. S., Chung, M. J., Song, S. Y. 2021; 21 (1): 1241


    Cancer stem cells (CSCs) are implicated in carcinogenesis, cancer progression, and recurrence. Several biomarkers have been described for pancreatic ductal adenocarcinoma (PDAC) CSCs; however, their function and mechanism remain unclear. In this study, secretome analysis was performed in pancreatic CSC-enriched spheres and control adherent cells for biomarker discovery. Glutaredoxin3 (GLRX3), a novel candidate upregulated in spheres, was evaluated for its function and clinical implication. PDAC CSC populations, cell lines, patient tissues, and blood samples demonstrated GLRX3 overexpression. In contrast, GLRX3 silencing decreased the in vitro proliferation, migration, clonogenicity, and sphere formation of cells. GLRX3 knockdown also reduced tumor formation and growth in vivo. GLRX3 was found to regulate Met/PI3K/AKT signaling and stemness-related molecules. ELISA results indicated GLRX3 overexpression in the serum of patients with PDAC compared to that in healthy controls. The sensitivity and specificity of GLRX3 for PDAC diagnosis were 80.0 and 100%, respectively. When GLRX3 and CA19-9 were combined, sensitivity was significantly increased to 98.3% compared to that with GLRX3 or CA19-9 alone. High GLRX3 expression was also associated with poor disease-free survival in patients receiving curative surgery. Overall, these results indicate GLRX3 as a novel diagnostic marker and therapeutic target for PDAC targeting CSCs.

    View details for DOI 10.1186/s12885-021-08898-y

    View details for PubMedID 34794402

    View details for PubMedCentralID PMC8603516

  • Author Correction: Deep learning from HE slides predicts the clinical benefit from adjuvant chemotherapy in hormone receptor-positive breast cancer patients. Scientific reports Cho, S. Y., Lee, J. H., Ryu, J. M., Lee, J. E., Cho, E. Y., Ahn, C. H., Paeng, K., Yoo, I., Ock, C. Y., Song, S. Y. 2021; 11 (1): 21043

    View details for DOI 10.1038/s41598-021-00546-6

    View details for PubMedID 34671078

    View details for PubMedCentralID PMC8528879

  • Deep learning from HE slides predicts the clinical benefit from adjuvant chemotherapy in hormone receptor-positive breast cancer patients. Scientific reports Cho, S. Y., Lee, J. H., Ryu, J. M., Lee, J. E., Cho, E. Y., Ahn, C. H., Paeng, K., Yoo, I., Ock, C. Y., Song, S. Y. 2021; 11 (1): 17363


    We hypothesized that a deep-learning algorithm using HE images might be capable of predicting the benefits of adjuvant chemotherapy in cancer patients. HE slides were retrospectively collected from 1343 de-identified breast cancer patients at the Samsung Medical Center and used to develop the Lunit SCOPE algorithm. Lunit SCOPE was trained to predict the recurrence using the 21-gene assay (Oncotype DX) and histological parameters. The risk prediction model predicted the Oncotype DX score > 25 and the recurrence survival of the prognosis validation cohort and TCGA cohorts. The most important predictive variable was the mitotic cells in the cancer epithelium. Of the 363 patients who did not receive adjuvant therapy, 104 predicted high risk had a significantly lower survival rate. The top-300 genes highly correlated with the predicted risk were enriched for cell cycle, nuclear division, and cell division. From the Oncotype DX genes, the predicted risk was positively correlated with proliferation-associated genes and negatively correlated with prognostic genes from the estrogen category. An integrative analysis using Lunit SCOPE predicted the risk of cancer recurrence and the early-stage hormone receptor-positive breast cancer patients who would benefit from adjuvant chemotherapy.

    View details for DOI 10.1038/s41598-021-96855-x

    View details for PubMedID 34462515

    View details for PubMedCentralID PMC8405682

  • Effect of aspirin on coronavirus disease 2019: A nationwide case-control study in South Korea. Medicine Son, M., Noh, M. G., Lee, J. H., Seo, J., Park, H., Yang, S. 2021; 100 (30): e26670


    Several studies reported that aspirin can potentially help prevent infection and serious complications of coronavirus disease (COVID-19), but no study has elucidated a definitive association between aspirin and COVID-19. This study aims to investigate the association between aspirin and COVID-19. This case-control study used demographic, clinical, and health screening laboratory test data collected from the National Health Insurance Service database. Patients who tested positive for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection until June 4, 2020, were matched with control patients using propensity score matching according to their SARS-CoV-2 status, the composite of complications, and death. The composite of complications included intensive care unit admission, use of vasopressors, high-flow oxygen therapy, renal replacement therapy, extracorporeal membrane oxygenation, and death. Exposure to aspirin was defined as having a prescription for aspirin for more than 14 days, including the index date. After matching, multivariable-adjusted conditional logistic regression analysis was performed. To confirm the robustness of this study, we used 2 study groups, 3 propensity score matching methods, and 3 models for conditional logistic regression analyses. The crude odds ratio and 95% confidence interval for SARS-CoV-2 infection between the groups without and with exposure to aspirin were 1.21 (1.04-1.41), but the adjusted odds ratios (95% confidence interval) were not significant. There was no association between aspirin exposure and COVID-19 status. Multiple statistical analyses, including subgroup analysis, revealed consistent results. Furthermore, the results of analysis for complications and death were not significant. Aspirin exposure was not associated with COVID-19-related complications and mortality in COVID-19 patients. In this nationwide population-based case-control study, aspirin use was not associated with SARS-CoV-2 infection or related complications. With several ongoing randomized controlled trials of aspirin in COVID-19 patients, more studies would be able to confirm the effectiveness of aspirin in COVID-19.

    View details for DOI 10.1097/MD.0000000000026670

    View details for PubMedID 34397693

    View details for PubMedCentralID PMC8322539

  • Prognostic Significance of the Extranodal Extension of Regional Lymph Nodes in Stage III-N2 Non-Small-Cell Lung Cancer after Curative Resection. Journal of clinical medicine Shih, B. C., Jeon, J. H., Chung, J. H., Kwon, H. J., Lee, J. H., Jung, W., Hwang, Y., Cho, S., Kim, K., Jheon, S. 2021; 10 (15)


    The present study investigated the prognostic role of extranodal extension (ENE) in stage III-N2 non-small-cell lung cancer (NSCLC) following curative surgery. From January 2005 to December 2018, pathologic stage III-N2 disease was diagnosed in 371 patients, all of whom underwent anatomic pulmonary resection accompanied by mediastinal lymph node dissection. This study included 282 patients, after excluding 89 patients who received preoperative chemotherapy or incomplete surgical resection. Their lymph nodes were processed; after hematoxylin and eosin staining, histopathologic slides of the metastatic nodes were reviewed by a designated pathologist. Predictors of disease-free survival (DFS), including age, sex, operation type, pathologic T stage, nodal status, visceral pleural invasion, perioperative treatment, and the presence of ENE, were investigated. Among the 282 patients, ENE was detected in 85 patients (30.1%). ENE presence was associated with advanced T stage (p = 0.034), N2 subgroups (p < 0.001), lymphatic invasion (p = 0.001), and pneumonectomy (p = 0.002). The multivariable analysis demonstrated that old age (p < 0.001), advanced T stage (p = 0.012), N2 subgroups (p = 0.005), and ENE presence (p = 0.005) were significant independent predictors of DFS. The DFS rate at five years was 21.4% in patients who had ENE and 43.4% in patients who did not have ENE (p < 0.001). The presence of ENE, coupled with tumor-node-metastasis staging, should be recognized as a meaningful prognostic factor in stage III-N2 NSCLC patients.

    View details for DOI 10.3390/jcm10153324

    View details for PubMedID 34362108

    View details for PubMedCentralID PMC8347115

  • Sharing genetic variants with the NGS pipeline is essential for effective genomic data sharing and reproducibility in health information exchange. Scientific reports Lee, J. H., Kweon, S., Park, Y. R. 2021; 11 (1): 2268


    Genetic variants causing underlying pharmacogenetic and disease phenotypes have been used as the basis for clinical decision-making. However, due to the lack of standards for next-generation sequencing (NGS) pipelines, reproducing genetic variants among institutions is still difficult. The aim of this study is to show how many important variants for clinical decisions can be individually detected using different pipelines. Genetic variants were derived from 105 breast cancer patient target DNA sequences via three different variant-calling pipelines. HaplotypeCaller, Mutect2 tumor-only mode in the Genome Analysis ToolKit (GATK), and VarScan were used in variant calling from the sequence read data processed by the same NGS preprocessing tools using Variant Effect Predictor. GATK HaplotypeCaller, VarScan, and Mutect2 found 25,130, 16,972, and 4,232 variants, comprising 1,491, 1,400, and 321 annotated variants with ClinVar significance, respectively. The average number of ClinVar significant variants in the patients was 769.43, and 16.50% of the variants were detected by only one variant caller. Although these variants have a significant impact on clinical decision-making, the detected variants differ for each algorithm. To utilize genetic variants in the clinical field, a strict standard for NGS pipelines is essential.

    View details for DOI 10.1038/s41598-021-82006-9

    View details for PubMedID 33500538

    View details for PubMedCentralID PMC7838410
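
    The caller-concordance figure in the abstract above (the share of variants detected by only one variant caller) can be illustrated with a minimal Python sketch. The variant keys, the three toy call sets, and the function name are all hypothetical and chosen only for illustration; this is not the authors' pipeline and does not reproduce their numbers.

```python
from collections import Counter


def fraction_unique_to_one_caller(call_sets):
    """Return the fraction of distinct variants reported by exactly one caller.

    call_sets maps a caller name to a set of variant keys, here
    (chromosome, position, reference allele, alternate allele).
    """
    # Count how many callers reported each distinct variant.
    counts = Counter(v for calls in call_sets.values() for v in calls)
    if not counts:
        return 0.0
    singletons = sum(1 for n in counts.values() if n == 1)
    return singletons / len(counts)


# Toy data (made up): three callers with partially overlapping calls.
toy_calls = {
    "HaplotypeCaller": {
        ("chr17", 100, "A", "G"),
        ("chr17", 200, "C", "T"),
        ("chr13", 300, "G", "A"),
    },
    "Mutect2": {("chr17", 100, "A", "G")},
    "VarScan": {
        ("chr17", 100, "A", "G"),
        ("chr17", 200, "C", "T"),
        ("chr13", 400, "T", "C"),
    },
}

unique_fraction = fraction_unique_to_one_caller(toy_calls)
print(f"Detected by only one caller: {unique_fraction:.2%}")
```

    Keying each variant by position and alleles assumes the callers' outputs have already been normalized to a common representation; without that step, the same variant can appear under different keys and inflate the discordance.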

  • In Silico Inference of Synthetic Cytotoxic Interactions from Paclitaxel Responses. International journal of molecular sciences Lee, J. H., Lee, K. H., Kim, J. H. 2021; 22 (3)


    To exploit negatively interacting pairs of cancer somatic mutations in chemotherapy responses or synthetic cytotoxicity (SC), we systematically determined mutational pairs that had significantly lower paclitaxel half maximal inhibitory concentration (IC50) values. We evaluated 407 cell lines with somatic mutation profiles and estimated their copy number and drug-inhibitory concentrations in Genomics of Drug Sensitivity in Cancer (GDSC) database. The SC effect of 142 mutated gene pairs on response to paclitaxel was successfully cross-validated using human cancer datasets for urogenital cancers available in The Cancer Genome Atlas (TCGA) database. We further analyzed the cumulative effect of increasing SC pair numbers on the TP53 tumor suppressor gene. Patients with TCGA bladder and urogenital cancer exhibited improved cancer survival rates as the number of disrupted SC partners (i.e., SYNE2, SON, and/or PRY) of TP53 increased. The prognostic effect of SC burden on response to paclitaxel treatment could be differentiated from response to other cytotoxic drugs. Thus, the concept of pairwise SC may aid the identification of novel therapeutic and prognostic targets.

    View details for DOI 10.3390/ijms22031097

    View details for PubMedID 33499282

    View details for PubMedCentralID PMC7865701

  • Technical feasibility of radiomics signature analyses for improving detection of occult tonsillar cancer. Scientific reports Lee, J. H., Ha, E. J., Roh, J., Lee, S. J., Jang, J. Y. 2021; 11 (1): 192


    Diagnosis of occult palatine tonsil squamous cell carcinoma (SCC) using conventional magnetic resonance imaging (MRI) is difficult in patients with cervical nodal metastasis from an unknown primary site at presentation. We aimed to establish a radiomics approach based on MRI features extracted from the volume of interest in these patients. An Elastic Net model was developed to differentiate between normal palatine tonsils and occult palatine tonsil SCC. The diagnostic performances of the model with radiomics features extracted from T1-weighted image (WI), T2WI, contrast-enhanced T1WI, and an apparent diffusion coefficient (ADC) map had area under the receiver operating characteristic (AUROC) curve values of 0.831, 0.840, 0.781, and 0.807, respectively, for differential diagnosis. The model with features from the ADC alone showed the highest sensitivity of 90.0%, while the model with features from T1WI + T2WI + contrast-enhanced T1WI showed the highest AUROC of 0.853. The added sensitivity of the radiomics feature analysis was 34.6% over that of conventional MRI to detect occult palatine tonsil SCC. Therefore, we concluded that adding radiomics feature analysis to MRI may improve the detection sensitivity for occult palatine tonsil SCC in patients with a cervical nodal metastasis from cancer of an unknown primary site.

    View details for DOI 10.1038/s41598-020-80597-3

    View details for PubMedID 33420249

    View details for PubMedCentralID PMC7794329

  • Personal Health Information Inference Using Machine Learning on RNA Expression Data from Patients With Cancer: Algorithm Validation Study. Journal of medical Internet research Kweon, S., Lee, J. H., Lee, Y., Park, Y. R. 2020; 22 (8): e18387


    As the need for sharing genomic data grows, privacy issues and concerns, such as the ethics surrounding data sharing and disclosure of personal information, are raised. The main purpose of this study was to verify whether genomic data is sufficient to predict a patient's personal information. RNA expression data and matched patient personal information were collected from 9538 patients in The Cancer Genome Atlas program. Five personal information variables (age, gender, race, cancer type, and cancer stage) were recorded for each patient. Four different machine learning algorithms (support vector machine, decision tree, random forest, and artificial neural network) were used to determine whether a patient's personal information could be accurately predicted from RNA expression data. Performance measurement of the prediction models was based on the accuracy and area under the receiver operating characteristic curve. We selected five cancer types (breast carcinoma, kidney renal clear cell carcinoma, head and neck squamous cell carcinoma, low-grade glioma, and lung adenocarcinoma) with large sample sizes to verify whether predictive accuracy would differ between them. We also validated the efficacy of our four machine learning models in analyzing normal samples from 593 cancer patients. In most samples, personal information with high genetic relevance, such as gender and cancer type, could be predicted from RNA expression data alone. The prediction accuracies for gender and cancer type, which were the best models, were 0.93-0.99 and 0.78-0.94, respectively. Other aspects of personal information, such as age, race, and cancer stage, were difficult to predict from RNA expression data, with accuracies ranging from 0.0026-0.29, 0.76-0.96, and 0.45-0.79, respectively. Among the tested machine learning methods, the highest predictive accuracy was obtained using the support vector machine algorithm (mean accuracy 0.77), while the lowest accuracy was obtained using the random forest method (mean accuracy 0.65). Gender and race were predicted more accurately than other variables in the samples. On average, the accuracy of cancer stage prediction ranged between 0.71-0.67, while the age prediction accuracy ranged between 0.18-0.23 for the five cancer types. We attempted to predict patient information using RNA expression data. We found that some identifiers could be predicted, but most others could not. This study showed that personal information available from RNA expression data is limited and this information cannot be used to identify specific patients.

    View details for DOI 10.2196/18387

    View details for PubMedID 32773372

    View details for PubMedCentralID PMC7445622

  • Application of deep learning to the diagnosis of cervical lymph node metastasis from thyroid cancer with CT: external validation and clinical utility for resident training. European radiology Lee, J. H., Ha, E. J., Kim, D., Jung, Y. J., Heo, S., Jang, Y. H., An, S. H., Lee, K. 2020; 30 (6): 3066-3072


    This study aimed to validate a deep learning model's diagnostic performance in using computed tomography (CT) to diagnose cervical lymph node metastasis (LNM) from thyroid cancer in a large clinical cohort and to evaluate the model's clinical utility for resident training. The performance of eight deep learning models was validated using 3838 axial CT images from 698 consecutive patients with thyroid cancer who underwent preoperative CT imaging between January and August 2018 (3606 and 232 images from benign and malignant lymph nodes, respectively). Six trainees viewed the same patient images (n = 242), and their diagnostic performance and confidence level (5-point scale) were assessed before and after computer-aided diagnosis (CAD) was included. The overall area under the receiver operating characteristics (AUROC) of the eight deep learning algorithms was 0.846 (range 0.784-0.884). The best performing model was Xception, with an AUROC of 0.884. The diagnostic accuracy, sensitivity, specificity, positive predictive value, and negative predictive value of Xception were 82.8%, 80.2%, 83.0%, 83.0%, and 80.2%, respectively. After introducing the CAD system, underperforming trainees received more help from artificial intelligence than the higher performing trainees (p = 0.046), and overall confidence levels significantly increased from 3.90 to 4.30 (p < 0.001). The deep learning-based CAD system used in this study for CT diagnosis of cervical LNM from thyroid cancer was clinically validated with an AUROC of 0.884. This approach may serve as a training tool to help resident physicians to gain confidence in diagnosis. • A deep learning-based CAD system for CT diagnosis of cervical LNM from thyroid cancer was validated using data from a clinical cohort. The AUROC for the eight tested algorithms ranged from 0.784 to 0.884. • Of the eight models, the Xception algorithm was the best performing model for the external validation dataset with 0.884 AUROC. The accuracy, sensitivity, specificity, positive predictive value, and negative predictive value were 82.8%, 80.2%, 83.0%, 83.0%, and 80.2%, respectively. • The CAD system exhibited potential to improve diagnostic specificity and accuracy in underperforming trainees (3 of 6 trainees, 50.0%). This approach may have clinical utility as a training tool to help trainees to gain confidence in diagnoses.

    View details for DOI 10.1007/s00330-019-06652-4

    View details for PubMedID 32065285
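
    The five diagnostic measures quoted in the abstract above (accuracy, sensitivity, specificity, positive predictive value, and negative predictive value) all derive from a 2x2 confusion matrix. The following is a minimal Python sketch of those standard definitions; the function name and the counts are hypothetical toy values, not the study's data.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute standard diagnostic measures from confusion-matrix counts.

    tp/fn: malignant nodes classified as malignant/benign.
    tn/fp: benign nodes classified as benign/malignant.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Toy counts (made up): 100 malignant and 200 benign images.
metrics = diagnostic_metrics(tp=80, fp=30, tn=170, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

    Sensitivity and specificity depend only on the classifier, whereas the predictive values also shift with the proportion of malignant cases in the evaluated set, which is worth keeping in mind when comparing cohorts with different class balance.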

  • Development of a Real-Time Risk Prediction Model for In-Hospital Cardiac Arrest in Critically Ill Patients Using Deep Learning: Retrospective Study. JMIR medical informatics Kim, J., Park, Y. R., Lee, J. H., Lee, J. H., Kim, Y. H., Huh, J. W. 2020; 8 (3): e16349


    Cardiac arrest is the most serious death-related event in intensive care units (ICUs), but it is not easily predicted because of the complex and time-dependent data characteristics of intensive care patients. Given the complexity and time dependence of ICU data, deep learning-based methods are expected to provide a good foundation for developing risk prediction models based on large clinical records. This study aimed to implement a deep learning model that estimates the distribution of cardiac arrest risk probability over time based on clinical data and to assess its potential. A retrospective study of 759 ICU patients was conducted between January 2013 and July 2015. A character-level gated recurrent unit with a Weibull distribution algorithm was used to develop a real-time prediction model. Fivefold cross-validation testing (training set: 80% and validation set: 20%) determined the consistency of model accuracy. The time-dependent area under the curve (TAUC) was analyzed based on the aggregation of 5 validation sets. The TAUCs of the implemented model were 0.963, 0.942, 0.917, 0.875, 0.850, 0.842, and 0.761 before cardiac arrest at 1, 8, 16, 24, 32, 40, and 48 hours, respectively. The sensitivity was between 0.846 and 0.909, and specificity was between 0.923 and 0.946. The distribution of risk between the cardiac arrest group and the non-cardiac arrest group was generally different, and the difference rapidly increased as the time left until cardiac arrest reduced. A deep learning model for forecasting cardiac arrest was implemented and tested by considering the cumulative and fluctuating effects of time-dependent clinical data gathered from a large medical center. This real-time prediction model is expected to improve patient care by allowing early intervention in patients at high risk of unexpected cardiac arrests.

    View details for DOI 10.2196/16349

    View details for PubMedID 32186517

    View details for PubMedCentralID PMC7113801

  • Gene regulatory network analysis with drug sensitivity reveals synergistic effects of combinatory chemotherapy in gastric cancer. Scientific reports Lee, J. H., Park, Y. R., Jung, M., Lim, S. G. 2020; 10 (1): 3932


    The combination of docetaxel, cisplatin, and fluorouracil (DCF) is highly synergistic in advanced gastric cancer. We aimed to explain these synergistic effects at the molecular level. Thus, we constructed a weighted correlation network using the differentially expressed genes between Stage I and IV gastric cancer based on The Cancer Genome Atlas (TCGA), and three modules were derived. Next, we investigated the correlation between the eigengene of the expression of the gene network modules and the chemotherapeutic drug response to DCF from the Genomics of Drug Sensitivity in Cancer (GDSC) database. The three modules were associated with functions related to cell migration, angiogenesis, and the immune response. The eigengenes of the three modules had a high correlation with DCF (-0.41, -0.40, and -0.15). The eigengenes of the three modules tended to increase as the stage increased. Advanced gastric cancer was affected by the interaction among the modules with three functions, namely cell migration, angiogenesis, and the immune response, all of which are related to metastasis. The weighted correlation network analysis model proved the complementary effects of DCF at the molecular level and thus could be used as a unique methodology to determine the optimal combination of chemotherapy drugs for patients with gastric cancer.

    View details for DOI 10.1038/s41598-020-61016-z

    View details for PubMedID 32127608

    View details for PubMedCentralID PMC7054272

  • Prognostic Implication of pAMPK Immunohistochemical Staining by Subcellular Location and Its Association with SMAD Protein Expression in Clear Cell Renal Cell Carcinoma. Cancers Jung, M., Lee, J. H., Lee, C., Park, J. H., Park, Y. R., Moon, K. C. 2019; 11 (10)


    Although cytoplasmic AMP-activated protein kinase (AMPK) has been known as a tumor-suppressor protein, nuclear AMPK is suggested to support clear cell renal cell carcinoma (ccRCC). In addition, pAMPK interacts with TGF-β/SMAD, which is one of the frequently altered pathways in ccRCC. In this study, we investigated the prognostic significance of pAMPK with respect to subcellular location and investigated its interaction with TGF-β/SMAD in ccRCC. Immunohistochemical staining for pAMPK, pSMAD2 and SMAD4 was conducted on tissue microarray of 987 ccRCC specimens. Moreover, the levels of pSMAD2 were measured in Caki-1 cells treated with 5-aminoimidazole-4-carboxamide ribonucleotide. The relationship between AMPK/pAMPK and TGFB1 expression was determined using the TCGA database. As a result, pAMPK positivity, either in the cytoplasm or nuclei, was independently associated with improved ccRCC prognosis, after adjusting for TNM stage and WHO grade. Furthermore, pAMPK-positive ccRCC displayed increased pSMAD2 and SMAD4 expression, while activation of pAMPK increased pSMAD2 in Caki-1 cells. However, AMPK/pAMPK expression was inversely correlated with TGFB1 expression in the TCGA database. Therefore, pAMPK immunostaining, both in the cytoplasm and nuclei, is a useful prognostic biomarker for ccRCC. pAMPK targets TGF-β-independent phosphorylation of SMAD2 and activates pSMAD2/SMAD4, representing a novel anti-tumoral mechanism of pAMPK in ccRCC.

    View details for DOI 10.3390/cancers11101602

    View details for PubMedID 31640193

    View details for PubMedCentralID PMC6826619

  • Application of deep learning to the diagnosis of cervical lymph node metastasis from thyroid cancer with CT. European radiology Lee, J. H., Ha, E. J., Kim, J. H. 2019; 29 (10): 5452-5457


    To develop a deep learning-based computer-aided diagnosis (CAD) system for use in the CT diagnosis of cervical lymph node metastasis (LNM) in patients with thyroid cancer. A total of 995 axial CT images that included benign (n = 647) and malignant (n = 348) lymph nodes were collected from 202 patients with thyroid cancer who underwent CT for surgical planning between July 2017 and January 2018. The datasets were randomly split into training (79.0%), validation (10.5%), and test (10.5%) datasets. Eight deep convolutional neural network (CNN) models were used to classify the images into metastatic or benign lymph nodes. Networks pretrained on ImageNet were used, and the best-performing algorithm was selected. Class-specific discriminative regions were visualized with attention heatmap using a global average pooling method. The area under the ROC curve (AUROC) for the tested algorithms ranged from 0.909 to 0.953. The sensitivity, specificity, and accuracy of the best-performing algorithm were all 90.4%. Attention heatmap highlighted important subregions for further clinical review. A deep learning-based CAD system could accurately classify cervical LNM in patients with thyroid cancer on preoperative CT with an AUROC of 0.953. Whether this approach has clinical utility will require evaluation in a clinical setting. • A deep learning-based CAD system could accurately classify cervical lymph node metastasis. The AUROC for the eight tested algorithms ranged from 0.909 to 0.953. • Of the eight models, the ResNet50 algorithm was the best-performing model for the validation dataset with 0.953 AUROC. The sensitivity, specificity, and accuracy of the ResNet50 model were all 90.4% in the test dataset. • Based on its high accuracy of 90.4%, we consider that this model may be useful in a clinical setting to detect LNM on preoperative CT in patients with thyroid cancer.

    View details for DOI 10.1007/s00330-019-06098-8

    View details for PubMedID 30877461

  • Implementation of Korean Clinical Imaging Guidelines: A Mobile App-Based Decision Support System. Korean journal of radiology Lee, J. H., Ha, E. J., Baek, J. H., Choi, M., Jung, S. E., Yong, H. S. 2019; 20 (2): 182-189


    The aims of this study were to develop a mobile app-based clinical decision support system (CDSS) for implementation of Korean clinical imaging guidelines (K-CIGs) and to assess future developments therein. K-CIGs were implemented in the form of a web-based application. The app containing K-CIGs consists of 53 information databases, including 10 medical subspecialties and 119 guidelines, developed by the Korean Society of Radiology (KSR) between 2015 and 2017. An email survey consisting of 18 questions on the implementation of K-CIGs and the mobile app-based CDSS was distributed to 43 members of the guideline working group (expert members of the KSR and Korean Academy of Oral and Maxillofacial Radiology) and 23 members of the consultant group (clinical experts belonging to related medical societies) to gauge opinion on the future developmental direction of K-CIGs. The web-based mobile app can be downloaded from the Google Play Store. Detailed information on the grade of recommendation, evidence level, and radiation dose for each imaging modality in the K-CIGs can be accessed via the home page and side menus. In total, 32 of the 66 experts contacted completed the survey (response rate, 45%). Twenty-four of the 32 respondents were from the working group and eight were from the consulting group. Most (93.8%) of the respondents agreed on the need for ongoing development and implementation of K-CIGs. This study describes the mobile app-based CDSS designed for implementation of K-CIGs in Korea. The results will allow physicians to have easy access to the K-CIGs and encourage appropriate use of imaging modalities.

    View details for DOI 10.3348/kjr.2018.0621

    View details for PubMedID 30672158

    View details for PubMedCentralID PMC6342762

  • Transcriptional Analysis of Immunohistochemically Defined Subgroups of Non-Muscle-Invasive Papillary High-Grade Upper Tract Urothelial Carcinoma. International journal of molecular sciences Jung, M., Lee, J. H., Kim, B., Park, J. H., Moon, K. C. 2019; 20 (3)


    Immunohistochemical (IHC) staining for CK5/6 and CK20 was reported to be correlated with the prognosis of early urothelial carcinoma in a way contrary to that of advanced tumors for unknown reasons. We aimed to characterize the gene expression profiles of subgroups of non-muscle-invasive papillary high-grade upper tract urothelial carcinoma (UTUC) classified by CK5/6 and CK20 expression levels: group 1 (CK5/6-high/CK20-low), group 2 (CK5/6-high/CK20-high), and group 3 (CK5/6-low/CK20-high). Expression of group 3 was predictive of worse prognosis of non-muscle-invasive papillary high-grade UTUC. Transcriptional analysis revealed 308 differentially expressed genes across the subgroups. Functional analyses of the genes identified cell adhesion as a common process differentially enriched in group 3 compared to the other groups, which could explain its high-risk phenotype. Late cell cycle/proliferation signatures were also enriched in group 3 and in some of the other groups, which may be used as a prognostic biomarker complementary to CK5/6 and CK20. Group 2, characterized by low levels of genes associated with mitogen-activated protein kinase and tumor necrosis factor signaling pathways, was hypothesized to represent the least cancerous subtype considering its normal urothelium-like IHC pattern. This study would facilitate the application of easily accessible prognostic biomarkers in practice.

    View details for DOI 10.3390/ijms20030570

    View details for PubMedID 30699951

    View details for PubMedCentralID PMC6386996

  • Deep Learning-Based Computer-Aided Diagnosis System for Localization and Diagnosis of Metastatic Lymph Nodes on Ultrasound: A Pilot Study. Thyroid : official journal of the American Thyroid Association Lee, J. H., Baek, J. H., Kim, J. H., Shim, W. H., Chung, S. R., Choi, Y. J., Lee, J. H. 2018; 28 (10): 1332-1338


    The presence of metastatic lymph nodes is a prognostic indicator for patients with thyroid carcinomas and is an important determinant of clinical decision making. However, evaluating neck lymph nodes requires experience and is labor- and time-intensive. Therefore, the development of a computer-aided diagnosis (CAD) system to identify and differentiate metastatic lymph nodes may be useful. From January 2008 to December 2016, we retrieved clinical records for 804 consecutive patients with 812 lymph nodes. The status of all lymph nodes was confirmed by fine-needle aspiration. The datasets were split into training (263 benign and 286 metastatic lymph nodes), validation (30 benign and 33 metastatic lymph nodes), and test (100 benign and 100 metastatic lymph nodes). Using the VGG-Class Activation Map model, we developed a CAD system to localize and differentiate the metastatic lymph nodes. We then evaluated the diagnostic performance of this CAD system in our test set. In the test set, the accuracy, sensitivity, and specificity of our model for predicting lymph node malignancy were 83.0%, 79.5%, and 87.5%, respectively. The CAD system clearly detected the locations of the lymph nodes, which not only provided identifying data, but also demonstrated the basis of decisions. We developed a deep learning-based CAD system for the localization and differentiation of metastatic lymph nodes from thyroid cancer on ultrasound. This CAD system is highly sensitive and may be used as a screening tool; however, as it is relatively less specific, the screening results should be validated by experienced physicians.

    View details for DOI 10.1089/thy.2018.0082

    View details for PubMedID 30132411

  • Gene expression profiling of calcifications in breast cancer. Scientific reports Shin, S. U., Lee, J., Kim, J. H., Kim, W. H., Song, S. E., Chu, A., Kim, H. S., Han, W., Ryu, H. S., Moon, W. K. 2017; 7 (1): 11427


    We investigated the gene expression profiles of calcifications in breast cancer. Gene expression analysis of surgical specimen was performed using Affymetrix GeneChip® Human Gene 2.0 ST arrays in 168 breast cancer patients. The mammographic calcifications were reviewed by three radiologists and classified into three groups according to malignancy probability: breast cancers without suspicious calcifications; breast cancers with low-to-intermediate suspicious calcifications; and breast cancers with highly suspicious calcifications. To identify differentially expressed genes (DEGs) between these three groups, a one-way analysis of variance was performed with post hoc comparisons with Tukey's honest significant difference test. To explore the biological significance of DEGs, we used DAVID for gene ontology analysis and BioLattice for clustering analysis. A total of 2551 genes showed differential expression among the three groups. ERBB2 genes are up-regulated in breast cancers with highly suspicious calcifications (fold change 2.474, p < 0.001). Gene ontology analysis revealed that the immune, defense and inflammatory responses were decreased in breast cancers with highly suspicious calcifications compared to breast cancers without suspicious calcifications (p from 10^-23 to 10^-8). The clustering analysis also demonstrated that the immune system is associated with mammographic calcifications (p < 0.001). Our study showed calcifications in breast cancers are associated with high levels of mRNA expression of ERBB2 and decreased immune system activity.

    View details for DOI 10.1038/s41598-017-11331-9

    View details for PubMedID 28900139

    View details for PubMedCentralID PMC5595962

  • Gene Regulatory Network Analysis for Triple-Negative Breast Neoplasms by Using Gene Expression Data. Journal of breast cancer Jung, H. C., Kim, S. H., Lee, J. H., Kim, J. H., Han, S. W. 2017; 20 (3): 240-245


    To better identify the physiology of triple-negative breast neoplasm (TNBN), we analyzed the TNBN gene regulatory network using gene expression data. We collected TNBN gene expression data from The Cancer Genome Atlas to construct a TNBN gene regulatory network using least absolute shrinkage and selection operator regression. In addition, we constructed a triple-positive breast neoplasm (TPBN) network for comparison. Furthermore, survival analysis based on gene expression levels and differentially expressed gene (DEG) analysis were carried out to support and compare the network analysis results, respectively. The TNBN gene regulatory network, which followed a power-law distribution, had 10,237 vertices and 17,773 edges, with an average vertex-to-vertex distance of 8.6. The genes ZDHHC20 and RAPGEF6 were identified by centrality analysis to be important vertices. However, in the DEG analysis, we could not find meaningful fold changes in ZDHHC20 and RAPGEF6 between the TPBN and TNBN gene expression data. In the multivariate survival analysis, the hazard ratio for ZDHHC20 and RAPGEF6 was 1.677 (1.192-2.357) and 1.676 (1.222-2.299), respectively. Our TNBN gene regulatory network was a scale-free one, which means that the network would be easily destroyed if the hub vertices were attacked. Thus, it is important to identify the hub vertices in the network analysis. In the TNBN gene regulatory network, ZDHHC20 and RAPGEF6 were found to be oncogenes. Further study of these genes could help to reveal a novel method for treating TNBN in the future.

    View details for DOI 10.4048/jbc.2017.20.3.240

    View details for PubMedID 28970849

    View details for PubMedCentralID PMC5620438

  • Small molecule-based lineage switch of human adipose-derived stem cells into neural stem cells and functional GABAergic neurons. Scientific reports Park, J., Lee, N., Lee, J., Choe, E. K., Kim, M. K., Lee, J., Byun, M. S., Chon, M. W., Kim, S. W., Lee, C. J., Kim, J. H., Kwon, J. S., Chang, M. S. 2017; 7 (1): 10166


    Cellular reprogramming using small molecules (SMs) without genetic modification provides a promising strategy for generating target cells for cell-based therapy. Human adipose-derived stem cells (hADSCs) are a desirable cell source for clinical application due to their self-renewal capacity, easy obtainability and the lack of safety concerns, such as tumor formation. However, methods to convert hADSCs into neural cells, such as neural stem cells (NSCs), are inefficient, and few if any studies have achieved efficient reprogramming of hADSCs into functional neurons. Here, we developed highly efficient induction protocols to generate NSC-like cells (iNSCs), neuron-like cells (iNs) and GABAergic neuron-like cells (iGNs) from hADSCs via SM-mediated inhibition of SMAD signaling without genetic manipulation. All induced cells adopted morphological, molecular and functional features of their bona fide counterparts. Electrophysiological data demonstrated that iNs and iGNs exhibited electrophysiological properties of neurons and formed neural networks in vitro. Microarray analysis further confirmed that iNSCs and iGNs underwent lineage switch toward a neural fate. Together, these studies provide rapid, reproducible and robust protocols for efficient generation of functional iNSCs, iNs and iGNs from hADSCs, which have utility for modeling disease pathophysiology and providing cell-therapy sources of neurological disorders.

    View details for DOI 10.1038/s41598-017-10394-y

    View details for PubMedID 28860504

    View details for PubMedCentralID PMC5579051

  • GeneNetFinder2: Improved Inference of Dynamic Gene Regulatory Relations with Multiple Regulators. IEEE/ACM transactions on computational biology and bioinformatics Han, K., Lee, J. 2016; 13 (1): 4-11


    A gene involved in complex regulatory interactions may have multiple regulators since gene expression in such interactions is often controlled by more than one gene. Another thing that makes gene regulatory interactions complicated is that regulatory interactions are not static, but change over time during the cell cycle. Most research so far has focused on identifying gene regulatory relations between individual genes in a particular stage of the cell cycle. In this study we developed a method for identifying dynamic gene regulations of several types from the time-series gene expression data. The method can find gene regulations with multiple regulators that work in combination or individually as well as those with single regulators. The method has been implemented as the second version of GeneNetFinder (hereafter called GeneNetFinder2) and tested on several gene expression datasets. Experimental results with gene expression data revealed the existence of genes that are not regulated by individual genes but rather by a combination of several genes. Such gene regulatory relations cannot be found by conventional methods. Our method finds such regulatory relations as well as those with multiple, independent regulators or single regulators, and represents gene regulatory relations as a dynamic network in which different gene regulatory relations are shown in different stages of the cell cycle. GeneNetFinder2 is available at and will be useful for modeling dynamic gene regulations with multiple regulators.

    View details for DOI 10.1109/TCBB.2015.2450728

    View details for PubMedID 26886731

  • Inference of Dynamic Gene Regulatory Relations with Multiple Regulators Lee, J., Chen, Y., Han, K., Huang, D. S., Han, K., Gromiha, M. SPRINGER-VERLAG BERLIN. 2014: 134-140