
Soroosh Tayebi Arasteh
Postdoctoral Scholar, Urology
Professional Education
-
Ph.D., RWTH Aachen University, Aachen, Germany, Theoretical Medicine (Dr. rer. medic.) (2024)
-
Ph.D., FAU Erlangen-Nuremberg, Erlangen, Germany, Computer Science (Dr.-Ing.) (2024)
-
M.Sc. Thesis, Harvard Medical School, Boston, MA (2021)
-
M.Sc., FAU Erlangen-Nuremberg, Erlangen, Germany, Communications and Multimedia Engineering (2021)
-
B.Sc., Bu-Ali Sina University, Hamedan, Iran, Electrical Engineering (2017)
All Publications
-
The Treasure Trove Hidden in Plain Sight: The Utility of GPT-4 in Chest Radiograph Evaluation.
Radiology
2024; 313 (2): e233441
Abstract
Background Limited statistical knowledge can slow critical engagement with and adoption of artificial intelligence (AI) tools for radiologists. Large language models (LLMs) such as OpenAI's GPT-4, and notably its Advanced Data Analysis (ADA) extension, may improve the adoption of AI in radiology. Purpose To validate GPT-4 ADA outputs when autonomously conducting analyses of varying complexity on a multisource clinical dataset. Materials and Methods In this retrospective study, unique itemized radiologic reports of bedside chest radiographs, associated demographic data, and laboratory markers of inflammation from patients in intensive care from January 2009 to December 2019 were evaluated. GPT-4 ADA, accessed between December 2023 and January 2024, was tasked with autonomously analyzing this dataset by plotting radiography usage rates, providing descriptive statistics measures, quantifying factors of pulmonary opacities, and setting up machine learning (ML) models to predict their presence. Three scientists with 6-10 years of ML experience validated the outputs by verifying the methodology, assessing coding quality, re-executing the provided code, and comparing ML models head-to-head with their human-developed counterparts (based on the area under the receiver operating characteristic curve [AUC], accuracy, sensitivity, and specificity). Statistical significance was evaluated using bootstrapping. Results A total of 43 788 radiograph reports, with their laboratory values, from 43 788 patients (mean age, 66 years ± 15 [SD]; 26 804 male) at University Hospital RWTH Aachen were evaluated. While GPT-4 ADA provided largely appropriate visualizations, descriptive statistical measures, quantitative statistical associations based on logistic regression, and gradient boosting machines for the predictive task (AUC, 0.75), some statistical errors and inaccuracies were encountered. 
ML strategies were valid and based on consistent coding routines, resulting in valid outputs on par with human specialist-developed reference models (AUC, 0.80 [95% CI: 0.80, 0.81] vs 0.80 [95% CI: 0.80, 0.81]; P = .51) (accuracy, 79% [6910 of 8758 patients] vs 78% [6875 of 8758 patients], respectively; P = .27). Conclusion LLMs may facilitate data analysis in radiology, from basic statistics to advanced ML-based predictive modeling. © RSNA, 2024 Supplemental material is available for this article.
View details for DOI 10.1148/radiol.233441
View details for PubMedID 39530893
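The head-to-head model comparison in the entry above (AUC 0.80 vs 0.80, P = .51) relied on bootstrapping for statistical significance. As a generic, illustrative sketch of how such a paired bootstrap test of an AUC difference can be set up (not the authors' code; the function names and toy data below are hypothetical):

```python
import numpy as np

def auc(labels, scores):
    # Mann-Whitney formulation of the area under the ROC curve
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))

def bootstrap_auc_pvalue(labels, scores_a, scores_b, n_boot=500, seed=0):
    """Paired bootstrap of the AUC difference between two models:
    resample cases with replacement and check how often the
    difference lands on the other side of zero (two-sided p)."""
    rng = np.random.default_rng(seed)
    n = len(labels)
    diffs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)
        if labels[idx].min() == labels[idx].max():
            continue  # resample contained only one class; skip it
        diffs.append(auc(labels[idx], scores_a[idx])
                     - auc(labels[idx], scores_b[idx]))
    diffs = np.asarray(diffs)
    observed = auc(labels, scores_a) - auc(labels, scores_b)
    p = 2 * min((diffs <= 0).mean(), (diffs >= 0).mean())
    return observed, min(p, 1.0)

# toy data: two models scoring the same 200 cases
rng = np.random.default_rng(42)
y = rng.integers(0, 2, 200)
s_a = y + rng.normal(0, 1.0, 200)  # model A
s_b = y + rng.normal(0, 1.1, 200)  # model B, slightly noisier
delta, p = bootstrap_auc_pvalue(y, s_a, s_b)
```

A small p would indicate that the observed AUC gap is unlikely under resampling noise alone; the study's P = .51 corresponds to no detectable difference.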
-
Addressing challenges in speaker anonymization to maintain utility while ensuring privacy of pathological speech.
Communications medicine
2024; 4 (1): 182
Abstract
Integration of speech into healthcare has intensified privacy concerns due to its potential as a non-invasive biomarker containing individual biometric information. In response, speaker anonymization aims to conceal personally identifiable information while retaining crucial linguistic content. However, the application of anonymization techniques to pathological speech, a critical area where privacy is especially vital, has not been extensively examined. This study investigates anonymization's impact on pathological speech across over 2700 speakers from multiple German institutions, focusing on privacy, pathological utility, and demographic fairness. We explore both deep-learning-based and signal processing-based anonymization methods. We document substantial privacy improvements across disorders, evidenced by equal error rate increases up to 1933%, with minimal overall impact on utility. Specific disorders such as Dysarthria, Dysphonia, and Cleft Lip and Palate experience minimal utility changes, while Dysglossia shows slight improvements. Our findings underscore that the impact of anonymization varies substantially across different disorders. This necessitates disorder-specific anonymization strategies to optimally balance privacy with diagnostic utility. Additionally, our fairness analysis reveals consistent anonymization effects across most of the demographics. This study demonstrates the effectiveness of anonymization in pathological speech for enhancing privacy, while also highlighting the importance of customized and disorder-specific approaches to account for inversion attacks.
View details for DOI 10.1038/s43856-024-00609-5
View details for PubMedID 39322637
View details for PubMedCentralID PMC11424628
-
Intraindividual Comparison of Different Methods for Automated BPE Assessment at Breast MRI: A Call for Standardization.
Radiology
2024; 312 (1): e232304
Abstract
Background The level of background parenchymal enhancement (BPE) at breast MRI provides predictive and prognostic information and can have diagnostic implications. However, there is a lack of standardization regarding BPE assessment. Purpose To investigate how well results of quantitative BPE assessment methods correlate among themselves and with assessments made by radiologists experienced in breast MRI. Materials and Methods In this pseudoprospective analysis of 5773 breast MRI examinations from 3207 patients (mean age, 60 years ± 10 [SD]), the level of BPE was prospectively categorized according to the Breast Imaging Reporting and Data System by radiologists experienced in breast MRI. For automated extraction of BPE, fibroglandular tissue (FGT) was segmented in an automated pipeline. Four different published methods for automated quantitative BPE extractions were used: two methods (A and B) based on enhancement intensity and two methods (C and D) based on the volume of enhanced FGT. The results from all methods were correlated, and agreement was investigated in comparison with the respective radiologist-based categorization. For surrogate validation of BPE assessment, how accurately the methods distinguished premenopausal women with (n = 50) versus without (n = 896) antihormonal treatment was determined. Results Intensity-based methods (A and B) exhibited a correlation with radiologist-based categorization of 0.56 ± 0.01 and 0.55 ± 0.01, respectively, and volume-based methods (C and D) had a correlation of 0.52 ± 0.01 and 0.50 ± 0.01 (P < .001). There were notable correlation differences (P < .001) between the BPE determined with the four methods. Among the four quantitation methods, method D offered the highest accuracy for distinguishing women with versus without antihormonal therapy (P = .01). 
Conclusion Results of different methods for quantitative BPE assessment agree only moderately among themselves or with visual categories reported by experienced radiologists; intensity-based methods correlate more closely with radiologists' ratings than volume-based methods. © RSNA, 2024 Supplemental material is available for this article. See also the editorial by Mann in this issue.
View details for DOI 10.1148/radiol.232304
View details for PubMedID 39012249
-
Preserving fairness and diagnostic accuracy in private large-scale AI models for medical imaging.
Communications medicine
2024; 4 (1): 46
Abstract
Artificial intelligence (AI) models are increasingly used in the medical domain. However, as medical data is highly sensitive, special precautions to ensure its protection are required. The gold standard for privacy preservation is the introduction of differential privacy (DP) to model training. Prior work indicates that DP has negative implications on model accuracy and fairness, which are unacceptable in medicine and represent a main barrier to the widespread use of privacy-preserving techniques. In this work, we evaluated the effect of privacy-preserving training of AI models regarding accuracy and fairness compared to non-private training. We used two datasets: (1) a large dataset (N = 193,311) of high-quality clinical chest radiographs, and (2) a dataset (N = 1625) of 3D abdominal computed tomography (CT) images, with the task of classifying the presence of pancreatic ductal adenocarcinoma (PDAC). Both were retrospectively collected and manually labeled by experienced radiologists. We then compared non-private deep convolutional neural networks (CNNs) and privacy-preserving (DP) models with respect to privacy-utility trade-offs, measured as area under the receiver operating characteristic curve (AUROC), and privacy-fairness trade-offs, measured as Pearson's r or Statistical Parity Difference. We find that, while the privacy-preserving training yields lower accuracy, it largely does not amplify discrimination against age, sex or co-morbidity. However, we find an indication that difficult diagnoses and subgroups suffer stronger performance hits in private training. Our study shows that, under the challenging realistic circumstances of a real-life clinical dataset, the privacy-preserving training of diagnostic deep learning models is possible with excellent diagnostic accuracy and fairness.
View details for DOI 10.1038/s43856-024-00462-6
View details for PubMedID 38486100
View details for PubMedCentralID PMC10940659
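The fairness analysis in the entry above reports Statistical Parity Difference among its metrics. A minimal numpy sketch of that metric (a generic illustration, not the study's code; the toy predictions and group labels are made up):

```python
import numpy as np

def statistical_parity_difference(y_pred, group):
    """SPD: difference in positive-prediction rates between two
    subgroups; 0 indicates parity. y_pred and group are binary arrays."""
    rate_a = y_pred[group == 0].mean()
    rate_b = y_pred[group == 1].mean()
    return rate_b - rate_a

# toy example: binary predictions for two demographic subgroups
pred  = np.array([1, 1, 0, 1, 0, 0, 1, 0])
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
spd = statistical_parity_difference(pred, group)  # 0.25 - 0.75 = -0.5
```

An SPD close to zero under both private and non-private training would support the paper's conclusion that DP does not amplify discrimination.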
-
Large language models streamline automated machine learning for clinical studies.
Nature communications
2024; 15 (1): 1603
Abstract
A knowledge gap persists between machine learning (ML) developers (e.g., data scientists) and practitioners (e.g., clinicians), hampering the full utilization of ML for clinical data analysis. We investigated the potential of the ChatGPT Advanced Data Analysis (ADA), an extension of GPT-4, to bridge this gap and perform ML analyses efficiently. Real-world clinical datasets and study details from large trials across various medical specialties were presented to ChatGPT ADA without specific guidance. ChatGPT ADA autonomously developed state-of-the-art ML models based on the original study's training data to predict clinical outcomes such as cancer development, cancer progression, disease complications, or biomarkers such as pathogenic gene sequences. Following the re-implementation and optimization of the published models, the head-to-head comparison of the ChatGPT ADA-crafted ML models and their respective manually crafted counterparts revealed no significant differences in traditional performance metrics (p ≥ 0.072). Strikingly, the ChatGPT ADA-crafted ML models often outperformed their counterparts. In conclusion, ChatGPT ADA offers a promising avenue to democratize ML in medicine by simplifying complex data analyses, yet should enhance, not replace, specialized training and resources, to promote broader applications in medical research and practice.
View details for DOI 10.1038/s41467-024-45879-8
View details for PubMedID 38383555
View details for PubMedCentralID PMC10881983
-
Encrypted federated learning for secure decentralized collaboration in cancer image analysis.
Medical image analysis
2024; 92: 103059
Abstract
Artificial intelligence (AI) has a multitude of applications in cancer research and oncology. However, the training of AI systems is impeded by the limited availability of large datasets due to data protection requirements and other regulatory obstacles. Federated and swarm learning represent possible solutions to this problem by collaboratively training AI models while avoiding data transfer. However, in these decentralized methods, weight updates are still transferred to the aggregation server for merging the models. This leaves the possibility for a breach of data privacy, for example by model inversion or membership inference attacks by untrusted servers. Somewhat-homomorphically-encrypted federated learning (SHEFL) is a solution to this problem because only encrypted weights are transferred, and model updates are performed in the encrypted space. Here, we demonstrate the first successful implementation of SHEFL in a range of clinically relevant tasks in cancer image analysis on multicentric datasets in radiology and histopathology. We show that SHEFL enables the training of AI models which outperform locally trained models and perform on par with models which are centrally trained. In the future, SHEFL can enable multiple institutions to co-train AI models without forsaking data governance and without ever transmitting any decryptable data to untrusted servers.
View details for DOI 10.1016/j.media.2023.103059
View details for PubMedID 38104402
View details for PubMedCentralID PMC10804934
-
Securing Collaborative Medical AI by Using Differential Privacy: Domain Transfer for Classification of Chest Radiographs
RADIOLOGY-ARTIFICIAL INTELLIGENCE
2024; 6 (1)
View details for DOI 10.1148/ryai.230212
View details for Web of Science ID 001171836700002
-
Multimodal Deep Learning for Integrating Chest Radiographs and Clinical Parameters: A Case for Transformers.
Radiology
2023; 309 (1): e230806
Abstract
Background Clinicians consider both imaging and nonimaging data when diagnosing diseases; however, current machine learning approaches primarily consider data from a single modality. Purpose To develop a neural network architecture capable of integrating multimodal patient data and compare its performance to models incorporating a single modality for diagnosing up to 25 pathologic conditions. Materials and Methods In this retrospective study, imaging and nonimaging patient data were extracted from the Medical Information Mart for Intensive Care (MIMIC) database and an internal database comprising chest radiographs and clinical parameters of patients in the intensive care unit (ICU) (January 2008 to December 2020). The MIMIC and internal data sets were each split into training (n = 33 893, n = 28 809), validation (n = 740, n = 7203), and test (n = 1909, n = 9004) sets. A novel transformer-based neural network architecture was trained to diagnose up to 25 conditions using nonimaging data alone, imaging data alone, or multimodal data. Diagnostic performance was assessed using area under the receiver operating characteristic curve (AUC) analysis. Results The MIMIC and internal data sets included 36 542 patients (mean age, 63 years ± 17 [SD]; 20 567 male patients) and 45 016 patients (mean age, 66 years ± 16; 27 577 male patients), respectively. The multimodal model showed improved diagnostic performance for all pathologic conditions. For the MIMIC data set, the mean AUC was 0.77 (95% CI: 0.77, 0.78) when both chest radiographs and clinical parameters were used, compared with 0.70 (95% CI: 0.69, 0.71; P < .001) for only chest radiographs and 0.72 (95% CI: 0.72, 0.73; P < .001) for only clinical parameters. These findings were confirmed on the internal data set. Conclusion A model trained on imaging and nonimaging data outperformed models trained on only one type of data for diagnosing multiple diseases in patients in an ICU setting. 
© RSNA, 2023 Supplemental material is available for this article. See also the editorial by Kitamura and Topol in this issue.
View details for DOI 10.1148/radiol.230806
View details for PubMedID 37787671
-
Using Machine Learning to Reduce the Need for Contrast Agents in Breast MRI through Synthetic Images.
Radiology
2023; 307 (3): e222211
Abstract
Background Reducing the amount of contrast agent needed for contrast-enhanced breast MRI is desirable. Purpose To investigate if generative adversarial networks (GANs) can recover contrast-enhanced breast MRI scans from unenhanced images and virtual low-contrast-enhanced images. Materials and Methods In this retrospective study of breast MRI performed from January 2010 to December 2019, simulated low-contrast images were produced by adding virtual noise to the existing contrast-enhanced images. GANs were then trained to recover the contrast-enhanced images from the simulated low-contrast images (approach A) or from the unenhanced T1- and T2-weighted images (approach B). Two experienced radiologists were tasked with distinguishing between real and synthesized contrast-enhanced images using both approaches. Image appearance and conspicuity of enhancing lesions on the real versus synthesized contrast-enhanced images were independently compared and rated on a five-point Likert scale. P values were calculated by using bootstrapping. Results A total of 9751 breast MRI examinations from 5086 patients (mean age, 56 years ± 10 [SD]) were included. Readers who were blinded to the nature of the images could not distinguish real from synthetic contrast-enhanced images (average accuracy of differentiation: approach A, 52 of 100; approach B, 61 of 100). The test set included images with and without enhancing lesions (29 enhancing masses and 21 nonmass enhancement; 50 total). When readers who were not blinded compared the appearance of the real versus synthetic contrast-enhanced images side by side, approach A image ratings were significantly higher than those of approach B (mean rating, 4.6 ± 0.1 vs 3.0 ± 0.2; P < .001), with the noninferiority margin met by synthetic images from approach A (P < .001) but not B (P > .99). Conclusion Generative adversarial networks may be useful to enable breast MRI with reduced contrast agent dose. 
© RSNA, 2023 Supplemental material is available for this article. See also the editorial by Bahl in this issue.
View details for DOI 10.1148/radiol.222211
View details for PubMedID 36943080
-
Accelerating breast MRI acquisition with generative AI models.
European radiology
2025; 35 (2): 1092-1100
Abstract
To investigate the use of the score-based diffusion model to accelerate breast MRI reconstruction. We trained a score-based model on 9549 MRI examinations of the female breast and employed it to reconstruct undersampled MRI images with undersampling factors of 2, 5, and 20. Images were evaluated by two experienced radiologists who rated the images based on their overall quality and diagnostic value on an independent test set of 100 additional MRI examinations. The score-based model produces MRI images of high quality and diagnostic value. Both T1- and T2-weighted MRI images could be reconstructed to a high degree of accuracy. Two radiologists rated the images as almost indistinguishable from the original images (rating 4 or 5 on a scale of 5) in 100% (radiologist 1) and 99% (radiologist 2) of cases when the acceleration factor was 2. This fraction dropped to 88% and 70% for an acceleration factor of 5 and to 5% and 21% with an extreme acceleration factor of 20. Score-based models can reconstruct MRI images at high fidelity, even at comparatively high acceleration factors, but further work on a larger scale of images is needed to ensure that diagnostic quality holds. The number of MRI examinations of the breast is expected to rise with MRI screening recommended for women with dense breasts. Accelerated image acquisition methods can help in making this examination more accessible. Accelerating breast MRI reconstruction remains a significant challenge in clinical settings. Score-based diffusion models can achieve near-perfect reconstruction for moderate undersampling factors. Faster breast MRI scans with maintained image quality could revolutionize clinical workflows and patient experience.
View details for DOI 10.1007/s00330-024-10853-x
View details for PubMedID 39088043
View details for PubMedCentralID PMC11782449
-
Enhancing diagnostic deep learning via self-supervised pretraining on large-scale, unlabeled non-medical images.
European radiology experimental
2024; 8 (1): 10
Abstract
Labeled pretraining datasets, like ImageNet, have become a technical standard in advanced medical image analysis. However, the emergence of self-supervised learning (SSL), which leverages unlabeled data to learn robust features, presents an opportunity to bypass the intensive labeling process. In this study, we explored if SSL for pretraining on non-medical images can be applied to chest radiographs and how it compares to supervised pretraining on non-medical images and on medical images. We utilized a vision transformer and initialized its weights based on the following: (i) SSL pretraining on non-medical images (DINOv2), (ii) supervised learning (SL) pretraining on non-medical images (ImageNet dataset), and (iii) SL pretraining on chest radiographs from the MIMIC-CXR database, the largest labeled public dataset of chest radiographs to date. We tested our approach on over 800,000 chest radiographs from 6 large global datasets, diagnosing more than 20 different imaging findings. Performance was quantified using the area under the receiver operating characteristic curve and evaluated for statistical significance using bootstrapping. SSL pretraining on non-medical images not only outperformed ImageNet-based pretraining (p < 0.001 for all datasets) but, in certain cases, also exceeded SL on the MIMIC-CXR dataset. Our findings suggest that selecting the right pretraining strategy, especially with SSL, can be pivotal for improving diagnostic accuracy of artificial intelligence in medical imaging. By demonstrating the promise of SSL in chest radiograph analysis, we underline a transformative shift towards more efficient and accurate AI models in medical imaging. Self-supervised learning highlights a paradigm shift towards the enhancement of AI-driven accuracy and efficiency in medical imaging. 
Given its promise, the broader application of self-supervised learning in medical imaging calls for deeper exploration, particularly in contexts where comprehensive annotated datasets are limited.
View details for DOI 10.1186/s41747-023-00411-3
View details for PubMedID 38326501
View details for PubMedCentralID PMC10850044
-
Enhancing domain generalization in the AI-based analysis of chest radiographs with federated learning.
Scientific reports
2023; 13 (1): 22576
Abstract
Developing robust artificial intelligence (AI) models that generalize well to unseen datasets is challenging and usually requires large and variable datasets, preferably from multiple institutions. In federated learning (FL), a model is trained collaboratively at numerous sites that hold local datasets without exchanging them. So far, the impact of training strategy, i.e., local versus collaborative, on the diagnostic on-domain and off-domain performance of AI models interpreting chest radiographs has not been assessed. Consequently, using 610,000 chest radiographs from five institutions across the globe, we assessed diagnostic performance as a function of training strategy (i.e., local vs. collaborative), network architecture (i.e., convolutional vs. transformer-based), single versus cross-institutional performance (i.e., on-domain vs. off-domain), imaging finding (i.e., cardiomegaly, pleural effusion, pneumonia, atelectasis, consolidation, pneumothorax, and no abnormality), dataset size (i.e., from n = 18,000 to 213,921 radiographs), and dataset diversity. Large datasets not only showed minimal performance gains with FL but, in some instances, even exhibited decreases. In contrast, smaller datasets revealed marked improvements. Thus, on-domain performance was mainly driven by training data size. However, off-domain performance leaned more on training diversity. When trained collaboratively across diverse external institutions, AI models consistently surpassed models trained locally for off-domain tasks, emphasizing FL's potential in leveraging data diversity. In conclusion, FL can bolster diagnostic privacy, reproducibility, and off-domain reliability of AI models and, potentially, optimize healthcare outcomes.
View details for DOI 10.1038/s41598-023-49956-8
View details for PubMedID 38114729
View details for PubMedCentralID PMC10730705
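In federated learning as described in the entry above, sites exchange model weights rather than patient data. A minimal sketch of one weighted FedAvg aggregation round (a generic illustration, not the study's training code; the toy weight vectors and site sizes below are hypothetical):

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """One FedAvg aggregation round: average the clients' model
    weights, weighted by local dataset size. Only weights are
    exchanged; no raw data leaves any site."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# toy example: three sites share only their (flattened) weight vectors
w1, w2, w3 = np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])
merged = fedavg([w1, w2, w3], [100, 100, 200])  # site 3 counts double
```

In practice, each site would then continue local training from `merged` and the cycle repeats; the paper's finding is that this helps off-domain performance most when the participating sites are diverse.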
-
The effect of speech pathology on automatic speaker verification: a large-scale study.
Scientific reports
2023; 13 (1): 20476
Abstract
Navigating the challenges of data-driven speech processing, one of the primary hurdles is accessing reliable pathological speech data. While public datasets appear to offer solutions, they come with inherent risks of potential unintended exposure of patient health information via re-identification attacks. Using a comprehensive real-world pathological speech corpus, with over 3800 test subjects spanning various age groups and speech disorders, we employed a deep-learning-driven automatic speaker verification (ASV) approach. This resulted in a notably low mean equal error rate (EER), outstripping traditional benchmarks. Our comprehensive assessments demonstrate that pathological speech overall faces heightened privacy breach risks compared to healthy speech. Specifically, adults with dysphonia are at heightened re-identification risks, whereas conditions like dysarthria yield results comparable to those of healthy speakers. Crucially, speech intelligibility does not influence the ASV system's performance metrics. In pediatric cases, particularly those with cleft lip and palate, the recording environment plays a decisive role in re-identification. Merging data across pathological types led to a marked EER decrease, suggesting the potential benefits of pathological diversity in ASV, accompanied by a logarithmic boost in ASV effectiveness. In essence, this research sheds light on the dynamics between pathological speech and speaker verification, emphasizing its crucial role in safeguarding patient confidentiality in our increasingly digitized healthcare era.
View details for DOI 10.1038/s41598-023-47711-7
View details for PubMedID 37993490
View details for PubMedCentralID PMC10665418
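The privacy analysis in the entry above is framed around the equal error rate (EER) of a speaker verification system: a lower EER means speakers are easier to re-identify. A generic sketch of computing EER from genuine and impostor similarity scores (illustrative only, not the authors' pipeline; the toy scores are made up):

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """EER: the operating point where the false-acceptance rate (FAR)
    on impostor trials equals the false-rejection rate (FRR) on
    genuine trials. Inputs are similarity scores (higher = same speaker)."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    far = np.array([(impostor >= t).mean() for t in thresholds])
    frr = np.array([(genuine < t).mean() for t in thresholds])
    i = np.argmin(np.abs(far - frr))  # closest FAR/FRR crossing
    return (far[i] + frr[i]) / 2

# perfectly separable scores: verification is perfect, EER = 0
gen = np.array([0.9, 0.8, 0.7, 0.6])
imp = np.array([0.4, 0.3, 0.2, 0.1])
eer_sep = equal_error_rate(gen, imp)      # 0.0
# identical score distributions: chance-level verification
eer_overlap = equal_error_rate(imp, imp)  # 0.5
```

In this framing, anonymization that drives the EER upward (as reported in the companion Communications Medicine paper above) makes re-identification correspondingly harder.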
-
Automated segmentation of 3D cine cardiovascular magnetic resonance imaging.
Frontiers in cardiovascular medicine
2023; 10: 1167500
Abstract
As the life expectancy of children with congenital heart disease (CHD) is rapidly increasing and the adult population with CHD is growing, there is an unmet need to improve clinical workflow and efficiency of analysis. Cardiovascular magnetic resonance (CMR) is a noninvasive imaging modality for monitoring patients with CHD. A CMR exam is based on multiple breath-hold 2-dimensional (2D) cine acquisitions that must be precisely prescribed and is expert and institution dependent. Moreover, 2D cine images have relatively thick slices, which does not allow for isotropic delineation of ventricular structures. Thus, development of an isotropic 3D cine acquisition and automatic segmentation method is worthwhile to make the CMR workflow straightforward and efficient, as the present work aims to establish. Ninety-nine patients with many types of CHD were imaged using a non-angulated 3D cine CMR sequence covering the whole heart and great vessels. Automatic supervised and semi-supervised deep-learning-based methods were developed for whole-heart segmentation of 3D cine images to separately delineate the cardiac structures, including both atria, both ventricles, the aorta, the pulmonary arteries, and the superior and inferior venae cavae. The segmentation results derived from the two methods were compared with the manual segmentation in terms of the Dice score, a degree of overlap agreement, and atrial and ventricular volume measurements. The semi-supervised method resulted in a better overlap agreement with the manual segmentation than the supervised method for all 8 structures (Dice score 83.23 ± 16.76% vs. 77.98 ± 19.64%; P ≤ 0.001). The mean difference error in atrial and ventricular volumetric measurements between the manual segmentation and the semi-supervised method was lower (bias ≤ 5.2 ml) than for the supervised method (bias ≤ 10.1 ml). The proposed semi-supervised method is capable of cardiac segmentation and chamber volume quantification in a CHD population with wide anatomical variability. 
It accurately delineates the heart chambers and great vessels and can be used to accurately calculate ventricular and atrial volumes throughout the cardiac cycle. Such a segmentation method can reduce inter- and intra-observer variability and make CMR exams more standardized and efficient.
View details for DOI 10.3389/fcvm.2023.1167500
View details for PubMedID 37904806
View details for PubMedCentralID PMC10613522
-
Fibroglandular tissue segmentation in breast MRI using vision transformers: a multi-institutional evaluation.
Scientific reports
2023; 13 (1): 14207
Abstract
Accurate and automatic segmentation of fibroglandular tissue in breast MRI screening is essential for the quantification of breast density and background parenchymal enhancement. In this retrospective study, we developed and evaluated a transformer-based neural network for breast segmentation (TraBS) in multi-institutional MRI data, and compared its performance to the well-established convolutional neural network nnUNet. TraBS and nnUNet were trained and tested on 200 internal and 40 external breast MRI examinations using manual segmentations generated by experienced human readers. Segmentation performance was assessed in terms of the Dice score and the average symmetric surface distance. The Dice score for nnUNet was lower than for TraBS on the internal test set (0.909 ± 0.069 versus 0.916 ± 0.067, P < 0.001) and on the external test set (0.824 ± 0.144 versus 0.864 ± 0.081, P = 0.004). Moreover, the average symmetric surface distance was higher (= worse) for nnUNet than for TraBS on the internal test set (0.657 ± 2.856 versus 0.548 ± 2.195, P = 0.001) and on the external test set (0.727 ± 0.620 versus 0.584 ± 0.413, P = 0.03). Our study demonstrates that transformer-based networks improve the quality of fibroglandular tissue segmentation in breast MRI compared to convolutional-based models like nnUNet. These findings might help to enhance the accuracy of breast density and parenchymal enhancement quantification in breast MRI screening.
View details for DOI 10.1038/s41598-023-41331-x
View details for PubMedID 37648728
View details for PubMedCentralID PMC10468506
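Segmentation quality in the entry above is reported as the Dice score. A minimal numpy sketch of that overlap metric (a generic illustration; the toy masks below are made up and this is not the TraBS implementation):

```python
import numpy as np

def dice_score(pred, target):
    """Dice overlap between two binary masks; 1.0 = perfect agreement."""
    pred, target = pred.astype(bool), target.astype(bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # both masks empty: conventionally perfect agreement
    return 2.0 * np.logical_and(pred, target).sum() / denom

# toy 2D masks: a 4-pixel square vs a 6-pixel overlapping rectangle
a = np.zeros((4, 4), dtype=int); a[1:3, 1:3] = 1
b = np.zeros((4, 4), dtype=int); b[1:3, 1:4] = 1
d = dice_score(a, b)  # 2*4 / (4+6) = 0.8
```

The same formula applies voxelwise in 3D; the paper's ~0.91 vs ~0.92 Dice values correspond to very high but imperfect overlap with the manual reference.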
-
A multimodal comparison of latent denoising diffusion probabilistic models and generative adversarial networks for medical image synthesis.
Scientific reports
2023; 13 (1): 12098
Abstract
Although generative adversarial networks (GANs) can produce large datasets, their limited diversity and fidelity have been recently addressed by denoising diffusion probabilistic models, which have demonstrated superiority in natural image synthesis. In this study, we introduce Medfusion, a conditional latent DDPM designed for medical image generation, and evaluate its performance against GANs, which currently represent the state-of-the-art. Medfusion was trained and compared with StyleGAN-3 using fundoscopy images from the AIROGS dataset, radiographs from the CheXpert dataset, and histopathology images from the CRCDX dataset. Based on previous studies, Progressively Growing GAN (ProGAN) and Conditional GAN (cGAN) were used as additional baselines on the CheXpert and CRCDX datasets, respectively. Medfusion exceeded GANs in terms of diversity (recall), achieving better scores of 0.40 compared to 0.19 in the AIROGS dataset, 0.41 compared to 0.02 (cGAN) and 0.24 (StyleGAN-3) in the CRCDX dataset, and 0.32 compared to 0.17 (ProGAN) and 0.08 (StyleGAN-3) in the CheXpert dataset. Furthermore, Medfusion exhibited equal or higher fidelity (precision) across all three datasets. Our study shows that Medfusion constitutes a promising alternative to GAN-based models for generating high-quality medical images, leading to improved diversity and fewer artifacts in the generated images.
View details for DOI 10.1038/s41598-023-39278-0
View details for PubMedID 37495660
View details for PubMedCentralID PMC10372018
-
Medical transformer for multimodal survival prediction in intensive care: integration of imaging and non-imaging data.
Scientific reports
2023; 13 (1): 10666
Abstract
When clinicians assess the prognosis of patients in intensive care, they take imaging and non-imaging data into account. In contrast, many traditional machine learning models rely on only one of these modalities, limiting their potential in medical applications. This work proposes and evaluates a transformer-based neural network as a novel AI architecture that integrates multimodal patient data, i.e., imaging data (chest radiographs) and non-imaging data (clinical data). We evaluate the performance of our model in a retrospective study with 6,125 patients in intensive care. We show that the combined model (area under the receiver operating characteristic curve [AUROC] of 0.863) is superior to the radiographs-only model (AUROC = 0.811, p < 0.001) and the clinical data-only model (AUROC = 0.785, p < 0.001) when tasked with predicting in-hospital survival per patient. Furthermore, we demonstrate that our proposed model is robust in cases where not all (clinical) data points are available.
View details for DOI 10.1038/s41598-023-37835-1
View details for PubMedID 37393383
View details for PubMedCentralID PMC10314902
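One common way to obtain the robustness to missing clinical values described above is to substitute a learned "missing" embedding for absent inputs before fusing them with the image features. The sketch below illustrates the idea with hypothetical names and dimensions; it is a minimal stand-in, not the authors' transformer architecture:

```python
import numpy as np

rng = np.random.default_rng(42)

D_IMG, D_CLIN, D_EMB = 8, 5, 4  # hypothetical feature sizes

# Per-feature embedding weights and a learned "missing" token per clinical feature.
W_clin = rng.normal(size=(D_CLIN, D_EMB))
missing_token = rng.normal(size=(D_CLIN, D_EMB))

def embed_clinical(x: np.ndarray) -> np.ndarray:
    """Embed clinical values; NaNs are replaced by the learned missing-token."""
    out = x[:, None] * W_clin          # (D_CLIN, D_EMB) value-scaled embeddings
    absent = np.isnan(x)
    out[absent] = missing_token[absent]  # substitute the token where data is absent
    return out.mean(axis=0)            # pool over clinical features

def fuse(img_feat: np.ndarray, clin: np.ndarray) -> np.ndarray:
    """Concatenate image features with the pooled clinical embedding."""
    return np.concatenate([img_feat, embed_clinical(clin)])

img = rng.normal(size=D_IMG)
clin_full = np.array([0.2, 1.3, -0.5, 0.0, 2.1])
clin_part = clin_full.copy(); clin_part[2] = np.nan  # one value missing
```

A transformer would attend over the clinical-token embeddings instead of mean-pooling them, but the missing-token substitution works the same way.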
-
Denoising diffusion probabilistic models for 3D medical image generation.
Scientific reports
2023; 13 (1): 7303
Abstract
Recent advances in computer vision have shown promising results in image generation. Diffusion probabilistic models have generated realistic images from textual input, as demonstrated by DALL-E 2, Imagen, and Stable Diffusion. However, their use in medicine, where imaging data typically comprises three-dimensional volumes, has not been systematically evaluated. Synthetic images may play a crucial role in privacy-preserving artificial intelligence and can also be used to augment small datasets. We show that diffusion probabilistic models can synthesize high-quality medical data for magnetic resonance imaging (MRI) and computed tomography (CT). For quantitative evaluation, two radiologists rated the quality of the synthesized images regarding "realistic image appearance", "anatomical correctness", and "consistency between slices". Furthermore, we demonstrate that synthetic images can be used in self-supervised pre-training and improve the performance of breast segmentation models when data is scarce (Dice scores, 0.91 [without synthetic data], 0.95 [with synthetic data]).
View details for DOI 10.1038/s41598-023-34341-2
View details for PubMedID 37147413
View details for PubMedCentralID PMC10163245
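The forward (noising) half of a denoising diffusion probabilistic model has a closed form, q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I), so any timestep can be sampled in one step. A minimal numpy sketch, assuming the linear beta schedule of the original DDPM paper rather than necessarily the schedule used in this work:

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)    # linear noise schedule (assumed)
alpha_bar = np.cumprod(1.0 - betas)   # cumulative signal-retention factor abar_t

def q_sample(x0: np.ndarray, t: int, rng=np.random.default_rng(0)) -> np.ndarray:
    """Sample x_t ~ q(x_t | x_0) directly via the closed form."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
```

Training then amounts to predicting eps from x_t; generation runs the learned reverse process starting from pure noise, which for 3D medical volumes is applied slice- or volume-wise depending on the architecture.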
-
Collaborative training of medical artificial intelligence models with non-uniform labels.
Scientific reports
2023; 13 (1): 6046
Abstract
Due to the rapid advancements in recent years, medical image analysis is largely dominated by deep learning (DL). However, building powerful and robust DL models requires training with large multi-party datasets. While multiple stakeholders have provided publicly available datasets, the ways in which these data are labeled vary widely. For instance, one institution might provide a dataset of chest radiographs with labels denoting the presence of pneumonia, while another institution might focus on determining the presence of metastases in the lung. Training a single AI model on all these data is not feasible with conventional federated learning (FL). This prompts us to propose an extension to the widespread FL process, namely flexible federated learning (FFL), for collaborative training on such data. Using 695,000 chest radiographs from five institutions across the globe, each with differing labels, we demonstrate that, with heterogeneously labeled datasets, FFL-based training leads to a significant performance increase compared to conventional FL training, in which only the uniformly annotated images are utilized. We believe that our proposed algorithm could accelerate the process of bringing collaborative training methods from the research and simulation phase to real-world applications in healthcare.
View details for DOI 10.1038/s41598-023-33303-y
View details for PubMedID 37055456
View details for PubMedCentralID PMC10102221
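The core mechanism that makes training on non-uniformly labeled data possible can be illustrated as a masked loss inside a FedAvg-style round: each institution back-propagates only through the labels it actually annotates, and a size-weighted average merges the local models. The snippet is a hypothetical minimal version with a linear classifier standing in for the network; it is not the authors' FFL implementation:

```python
import numpy as np

LABELS = ["pneumonia", "metastasis", "effusion"]  # hypothetical global label space

def masked_bce_grad(w, X, y, mask):
    """Gradient of binary cross-entropy, counting only labels the site annotates."""
    p = 1.0 / (1.0 + np.exp(-X @ w))            # (n, L) sigmoid scores
    return X.T @ ((p - y) * mask) / max(mask.sum(), 1)

def fedavg_round(w, sites, lr=0.1):
    """One communication round: local masked-gradient step, size-weighted average."""
    updates, sizes = [], []
    for X, y, mask in sites:
        updates.append(w - lr * masked_bce_grad(w, X, y, mask))
        sizes.append(len(X))
    sizes = np.asarray(sizes, float)
    return np.tensordot(sizes / sizes.sum(), np.stack(updates), axes=1)
```

Labels a site never annotates simply contribute zero gradient there, so institutions with disjoint label sets can still train one shared model.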
-
Federated learning for secure development of AI models for Parkinson's disease detection using speech from different languages
Interspeech, ISCA. 2023: 5003-5007
View details for DOI 10.21437/Interspeech.2023-2108
View details for Web of Science ID 001186650305033
-
Vector-Quantized Latent Flows for Medical Image Synthesis and Out-of-Distribution Detection
IEEE ISBI. 2023
View details for DOI 10.1109/ISBI53787.2023.10230460
View details for Web of Science ID 001062050500138
-
Conversion Between Cubic Bézier Curves and Catmull–Rom Splines
SN Computer Science
2021; 2
View details for DOI 10.1007/s42979-021-00770-x
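For the uniform, tension-0.5 case, the Catmull–Rom-to-Bézier direction of this conversion is standard: the Bézier control points of the segment from P1 to P2 are B0 = P1, B1 = P1 + (P2 - P0)/6, B2 = P2 - (P3 - P1)/6, B3 = P2. A numpy sketch of this textbook case (the paper itself may treat a more general parameterization):

```python
import numpy as np

def catmull_rom_to_bezier(p0, p1, p2, p3):
    """Bezier control points of the uniform Catmull-Rom segment from p1 to p2."""
    return (p1,
            p1 + (p2 - p0) / 6.0,
            p2 - (p3 - p1) / 6.0,
            p2)

def bezier(b, t):
    """Evaluate a cubic Bezier curve via the Bernstein basis."""
    b0, b1, b2, b3 = b
    s = 1.0 - t
    return s**3 * b0 + 3 * s**2 * t * b1 + 3 * s * t**2 * b2 + t**3 * b3

def catmull_rom(p0, p1, p2, p3, t):
    """Evaluate the uniform Catmull-Rom segment between p1 and p2."""
    return 0.5 * ((2 * p1) + (p2 - p0) * t
                  + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t**2
                  + (3 * p1 - p0 - 3 * p2 + p3) * t**3)
```

Both forms describe the same cubic, so evaluating them at any t on the segment yields identical points, which makes the conversion exact rather than approximate.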
-
How Will Your Tweet Be Received? Predicting the Sentiment Polarity of Tweet Replies
IEEE. 2021: 370-373
View details for DOI 10.1109/ICSC50631.2021.00068
View details for Web of Science ID 000668692000064