Clinical Focus


  • Diagnostic Radiology

Academic Appointments


  • Assistant Professor - University Medical Line, Radiology

Professional Education


  • Board Certification: Swiss Medical Federation FMH, Diagnostic Radiology (2023)
  • Residency: University Hospital Zurich, Switzerland (2021)
  • Medical Education: University of Heidelberg, Germany (2015)

All Publications


  • Merlin: a computed tomography vision-language foundation model and dataset. Nature Blankemeier, L., Kumar, A., Cohen, J. P., Liu, J., Liu, L., Van Veen, D., Gardezi, S. J., Yu, H., Paschali, M., Chen, Z., Delbrouck, J. B., Reis, E., Holland, R., Truyts, C., Bluethgen, C., Wu, Y., Lian, L., Jensen, M. E., Ostmeier, S., Varma, M., Valanarasu, J. M., Fang, Z., Huo, Z., Nabulsi, Z., Ardila, D., Weng, W. H., Junior, E. A., Ahuja, N., Fries, J., Shah, N. H., Zaharchuk, G., Willis, M., Yala, A., Johnston, A., Boutin, R. D., Wentland, A., Langlotz, C. P., Hom, J., Gatidis, S., Chaudhari, A. S. 2026

    Abstract

    The large volume of abdominal computed tomography (CT) scans [1,2] coupled with the shortage of radiologists [3-6] has intensified the need for automated medical image analysis tools. Previous state-of-the-art approaches for automated analysis leverage vision-language models (VLMs) that jointly model images and radiology reports [7-12]. However, current medical VLMs are generally limited to 2D images and short reports. Here to overcome these shortcomings for abdominal CT interpretation, we introduce Merlin, a 3D VLM that learns from volumetric CT scans, electronic health record data and radiology reports. This approach is enabled by a multistage pretraining framework that does not require additional manual annotations. We trained Merlin using a high-quality clinical dataset of paired CT scans (>6 million images from 15,331 CT scans), diagnosis codes (>1.8 million codes) and radiology reports (>6 million tokens). We comprehensively evaluated Merlin on 6 task types and 752 individual tasks that covered diagnostic, prognostic and quality-related tasks. The non-adapted (off-the-shelf) tasks included zero-shot classification of findings (30 findings), phenotype classification (692 phenotypes) and zero-shot cross-modal retrieval (image-to-findings and image-to-impression). The model-adapted tasks included 5-year chronic disease prediction (6 diseases), radiology report generation and 3D semantic segmentation (20 organs). We validated Merlin at scale, with internal testing on 5,137 CT scans and external testing on 44,098 CT scans from 3 independent sites and 2 public datasets. The results demonstrated high generalization across institutions and anatomies. Merlin outperformed 2D VLMs, CT foundation models and off-the-shelf radiology models. We also computed scaling laws and conducted ablation studies to identify optimal training strategies. We release our trained models, code and dataset for 25,494 pairs of abdominal CT scans and radiology reports. Our results demonstrate how Merlin may assist in the interpretation of abdominal CT scans and mitigate the burden on radiologists while simultaneously adding value for future biomarker discovery and disease risk stratification.

    View details for DOI 10.1038/s41586-026-10181-8

    View details for PubMedID 41781626

    View details for PubMedCentralID 11868850
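
    The zero-shot findings classification described for Merlin above relies on comparing an image embedding against text-prompt embeddings in a shared space. The NumPy sketch below illustrates that general pattern only; the embedding size, the candidate prompts, and the implied encoders are hypothetical placeholders rather than Merlin's released interface.

```python
import numpy as np

def zero_shot_scores(image_emb: np.ndarray, text_embs: np.ndarray) -> np.ndarray:
    """Softmax over cosine-similarity scores between one image embedding and N text embeddings."""
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = text_embs @ image_emb                      # shape (N,)
    return np.exp(logits) / np.exp(logits).sum()        # probability over candidate prompts

# Hypothetical usage: embeddings would come from the model's image and text encoders.
rng = np.random.default_rng(0)
image_emb = rng.normal(size=512)                        # e.g., encode_volume(ct_scan)
text_embs = rng.normal(size=(2, 512))                   # e.g., encode_text(["finding present", "finding absent"])
print(zero_shot_scores(image_emb, text_embs))
```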

  • Reporting checklist for foundation and large language models in medical research (REFINE): an international consensus guideline. Diagnostic and interventional radiology (Ankara, Turkey) Mese, I., Akinci D'Antonoli, T., Bluethgen, C., Bressem, K., Cuocolo, R., Chaudhari, A., Tejani, A. S., Isaac, A., Ponsiglione, A., Meddeb, A., Khosravi, B., Le Guellec, B., Kahn, C. E., Suh, C. H., Pinto Dos Santos, D., Koh, D. M., Tzanis, E., Kotter, E., Colak, E., Kitamura, F., Busch, F., Nensa, F., Yang, G., Müller, H., Kather, J. N., Nawabi, J., Kleesiek, J., Zhong, J., Santinha, J., Haubold, J., de Almeida, J. G., Lekadir, K., Marias, K., Reiner, L. N., Maier-Hein, L., Moy, L., Adams, L. C., Martí-Bonmatí, L., Paschali, M., Moassefi, M., Dietzel, M., Huisman, M., Ingrisch, M., Klontzas, M. E., Papanikolaou, N., Diaz, O., Kuriki, P., Seeböck, P., Rouzrokh, P., Strotzer, Q. D., Park, S. H., Faghani, S., Tayebi Arasteh, S., Kim, S. H., Venugopal, V. K., Kim, W., Kocak, B. 2026

    Abstract

    To develop the REporting checklist for FoundatIon and large laNguagE models (REFINE), an international reporting guideline for transparent and reproducible reporting of foundation model (FM) and large language model (LLM) studies in medical research, including imaging artificial intelligence (AI) applications. The protocol was prespecified and publicly archived. A modified Delphi process was conducted to establish reporting standards for unimodal and multimodal FM and LLM applications involving text, imaging, and structured data. The steering committee coordinated protocol development, expert recruitment, all Delphi rounds, and the harmonization phase. Decisions were made based on predefined consensus thresholds. In Rounds 1 and 2, structured ratings and free-text feedback informed iterative revisions. In the post-Delphi harmonization phase, terminology was standardized, and detailed reporting instructions were finalized. The REFINE development group comprised 57 contributors from 17 countries, and 54 panelists from 16 countries completed Rounds 1 and 2. The harmonization phase was completed by three expert panelists and the steering committee. The entire process produced a 44-item, six-section framework with standardized terminology and detailed reporting instructions, supported by an online platform for practical use (https://refinechecklist.github.io/refine/checklist.html). The REFINE provides a comprehensive, consensus-based reporting standard for medical FM and LLM research, including imaging AI studies. The online version facilitates practical implementation. The REFINE enables transparent, comparable, and reproducible reporting of FM and LLM studies, supporting reliable evidence synthesis in medical and imaging-focused AI studies.

    View details for DOI 10.4274/dir.2026.263812

    View details for PubMedID 41742713

  • Generalist foundation models from a multimodal dataset for 3D computed tomography. Nature biomedical engineering Hamamci, I. E., Er, S., Wang, C., Almas, F., Simsek, A. G., Esirgun, S. N., Dogan, I., Durugol, O. F., Hou, B., Shit, S., Dai, W., Xu, M., Reynaud, H., Dasdelen, M. F., Wittmann, B., Amiranashvili, T., Simsar, E., Simsar, M., Erdemir, E. B., Alanbay, A., Sekuboyina, A., Lafci, B., Kaplan, A., Lu, Z., Polacin, M., Kainz, B., Bluethgen, C., Batmanghelich, K., Ozdemir, M. K., Menze, B. 2026

    Abstract

    Advancements in medical imaging AI, particularly in 3D imaging, have been limited due to the scarcity of comprehensive datasets. We introduce CT-RATE, a public dataset that pairs 3D medical images with corresponding textual reports. CT-RATE comprises 25,692 non-contrast 3D chest CT scans from 21,304 unique patients. Each scan is accompanied by its corresponding radiology report. Leveraging CT-RATE, we develop CT-CLIP, a CT-focused contrastive language-image pretraining framework designed for broad applications without the need for task-specific training. We demonstrate how CT-CLIP can be used in multi-abnormality detection and case retrieval, and outperforms state-of-the-art fully supervised models across all key metrics. By combining CT-CLIP's vision encoder with a pretrained large language model, we create CT-CHAT, a vision-language foundational chat model for 3D chest CT volumes. Fine-tuned on over 2.7 million question-answer pairs derived from the CT-RATE dataset, CT-CHAT underscores the necessity for specialized methods in 3D medical imaging. Collectively, the open-source release of CT-RATE, CT-CLIP and CT-CHAT not only addresses critical challenges in 3D medical imaging but also lays the groundwork for future innovations in medical AI and improved patient care.

    View details for DOI 10.1038/s41551-025-01599-y

    View details for PubMedID 41680439

    View details for PubMedCentralID 10644676
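
    CT-CLIP above follows the contrastive language-image pretraining recipe, in which embeddings of paired scans and reports are pulled together while mismatched pairs are pushed apart. Below is a minimal symmetric InfoNCE loss in PyTorch as an illustration of that recipe; it assumes already-encoded, batched embeddings and is not the authors' released training code.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_embs: torch.Tensor,
                          text_embs: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings."""
    image_embs = F.normalize(image_embs, dim=-1)
    text_embs = F.normalize(text_embs, dim=-1)
    logits = image_embs @ text_embs.t() / temperature    # (B, B) similarity matrix
    targets = torch.arange(logits.size(0))                # matching pairs lie on the diagonal
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_i2t + loss_t2i)

# Hypothetical usage with random tensors standing in for encoder outputs.
loss = clip_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
print(loss.item())
```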

  • Guidelines for Reporting Studies on Large Language Models in Radiology: An International Delphi Expert Survey. Radiology Kottlors, J., Iuga, A. I., Bluethgen, C., Bressem, K., Kather, J. N., Moy, L., Wald, C., Wang, W., Liu, T., Ranschaert, E., Dratsch, T., Kleesiek, J., Gertz, R. J., Rajpurkar, P., Bedayat, A., Fink, M. A., Zeeck, A., Chaudhari, A., Alkasab, T., Wu, H., Nensa, F., Wang, B., Große Hokamp, N., Laukamp, K. R., Persigehl, T., Maintz, D., Truhn, D., Lennartz, S. 2026; 318 (2): e250913

    Abstract

    Large language models (LLMs) have transformative potential in radiology, including textual summaries, diagnostic decision support, proofreading, and image analysis. However, the rapid increase in studies investigating these models, along with the lack of standardized LLM-specific reporting practices, affects reproducibility, reliability, and clinical applicability. To address this, reporting guidelines for LLM studies in radiology were developed using a two-step process. First, a systematic review of LLM studies in radiology was conducted across PubMed, IEEE Xplore, and the ACM Digital Library, covering publications between May 2023 and March 2024. Of 511 screened studies, 57 were included to identify relevant aspects for the guidelines. Then, in a Delphi process, 20 international experts developed the final list of items for inclusion. Items consented as relevant were summarized into a structured checklist containing 32 items across six key categories: general information and data input; prompting and fine-tuning; performance metrics; ethics and data transparency; implementation, risks, and limitations; and further/optional aspects. The final FLAIR (Framework for LLM Assessment in Radiology) checklist aims to standardize reporting of LLM studies in radiology, fostering transparency, reproducibility, comparability, and clinical applicability to enhance clinical translation and patient care.

    View details for DOI 10.1148/radiol.250913

    View details for PubMedID 41631991

  • Automated lung texture analysis for assessing interstitial lung disease in systemic sclerosis: Diagnostic accuracy in photon-counting-detector and conventional energy-integrating-detector CT. European journal of radiology Happe, J., Bruni, C., Jungblut, L., Landini, N., Strappa, C., Bluethgen, C., Elhai, M., Dobrota, R., Mihai, C., Muraru, S., Hoffmann-Vold, A. M., Larici, A. R., Frauenfelder, T., Distler, O., Kroschke, J. 2025; 195: 112605

    Abstract

    To evaluate the performance of automated Lung Texture Analysis (LTA) in assessing interstitial lung disease (ILD) in systemic sclerosis (SSc) using low-dose photon-counting detector CT (PCD-CT) compared to conventional low-dose energy-integrating detector CT (EID-CT). In this study of a prospectively enrolled SSc cohort, a post-hoc analysis on 186 patients (93 PCD-CT, 93 EID-CT), matched by propensity scoring, was performed. Visual ILD assessment by three expert radiologists served as the reference standard. Image quality assessment was performed using Likert-scales by expert radiologists and signal-to-noise ratios (SNR). Quantitative ILD features and extent were extracted using LTA (Imbio, CALIPER-based). Diagnostic accuracy was assessed using ROC-AUC analysis. LTA-based assessment of ILD on PCD-CT demonstrated a higher AUC for detecting ILD presence (AUC = 0.846) compared to EID-CT (AUC = 0.772). PCD-CT also exhibited superior AUCs in identifying specific ILD features, including ground-glass opacities, reticulation, and honeycombing. However, EID-CT showed higher AUCs than PCD-CT in detecting extensive ILD (>20 % lung involvement; AUC = 0.978 vs. 0.842). Despite significantly lower radiation dose, PCD-CT achieved comparable SNR and superior image quality ratings on Likert-scale. Both EID-CT and PCD-CT demonstrated acceptable to excellent AUC values, indicating their strong applicability in ILD assessment. Further, LTA using PCD-CT consistently provided excellent AUCs for detecting individual ILD features in SSc, supporting its clinical utility despite not being trained on PCD-CT data. PCD-CT's enhanced image quality and lower radiation dose make it a promising tool for longitudinal ILD assessment. Further multicenter validation is warranted.

    View details for DOI 10.1016/j.ejrad.2025.112605

    View details for PubMedID 41401537
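
    Diagnostic accuracy in the study above is summarized with ROC-AUC. The snippet below shows the standard scikit-learn call for such an analysis; the LTA extent scores and reference labels are invented for illustration and do not come from the study.

```python
from sklearn.metrics import roc_auc_score

# Hypothetical example: expert visual reading as reference (1 = ILD present),
# and an LTA-derived quantitative ILD extent (%) as the continuous score.
ild_reference = [1, 0, 1, 1, 0, 0, 1, 0]
lta_extent = [12.5, 0.8, 22.0, 5.4, 1.1, 3.0, 30.2, 0.4]

auc = roc_auc_score(ild_reference, lta_extent)
print(f"AUC = {auc:.3f}")
```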

  • Improving Performance, Robustness, and Fairness of Radiographic AI Models with Finely-Controllable Synthetic Data. Research square Moroianu, S. L., Bluethgen, C., Chambon, P., Cherti, M., Delbrouck, J. B., Paschali, M., Price, B., Gichoya, J., Jitsev, J., Langlotz, C. P., Chaudhari, A. S. 2025

    Abstract

    Achieving robust performance and fairness across diverse patient populations remains a central challenge in developing clinically deployable deep learning models for diagnostic imaging. Synthetic data generation has emerged as a promising strategy to address current limitations in dataset scale and diversity. In this study, we introduce RoentGen-v2, a state-of-the-art text-to-image diffusion model for chest radiographs that enables fine-grained control over both radiographic findings and patient demographic attributes, including sex, age, and race/ethnicity. RoentGen-v2 is the first model to generate clinically plausible chest radiographs with explicit demographic conditioning, facilitating the creation of a large, demographically balanced synthetic dataset comprising over 565,000 images. We use this large synthetic dataset to evaluate optimal training pipelines for downstream disease classification models. In contrast to prior work that combines real and synthetic data naively, we propose an improved training strategy that leverages synthetic data for supervised pretraining, followed by fine-tuning on real data. Through extensive evaluation on over 137,000 held-out chest radiographs from five institutions, we demonstrate that synthetic pretraining consistently improves model performance, generalization to out-of-distribution settings, and fairness across demographic subgroups defined across varying fairness metrics. Across datasets, synthetic pretraining led to a 6.5% accuracy increase in the performance of downstream classification models, compared to a modest 2.7% increase when naively combining real and synthetic data. We observe this performance improvement simultaneously with the reduction of the underdiagnosis fairness gap by 19.3%, with marked improvements across intersectional subgroups of sex, age, and race/ethnicity. Our proposed data-centric training approach that combines high-fidelity synthetic training data with multi-stage training pipelines is label-efficient, reducing reliance on large quantities of annotated real data. These results highlight the potential of demographically controllable synthetic imaging to advance equitable and generalizable medical deep learning under real-world data constraints. We open source our code, trained models, and synthetic dataset.

    View details for DOI 10.21203/rs.3.rs-7687810/v1

    View details for PubMedID 41356360

    View details for PubMedCentralID PMC12676388
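
    One of the fairness quantities reported above is an underdiagnosis gap across demographic subgroups. A common way to operationalize such a gap is as the spread in false-negative (missed-finding) rates between subgroups; the NumPy sketch below implements that reading on toy data and is an assumption about the metric, not the authors' exact definition.

```python
import numpy as np

def underdiagnosis_gap(y_true: np.ndarray, y_pred: np.ndarray, groups: np.ndarray) -> float:
    """Max minus min false-negative rate across subgroups (one plausible 'underdiagnosis gap')."""
    fnrs = []
    for g in np.unique(groups):
        mask = (groups == g) & (y_true == 1)         # positive cases within subgroup g
        if mask.sum() == 0:
            continue
        fnrs.append(np.mean(y_pred[mask] == 0))      # fraction of positives that were missed
    return float(max(fnrs) - min(fnrs))

# Hypothetical toy data: 1 = finding present / predicted present.
y_true = np.array([1, 1, 0, 1, 1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 0, 1, 0])
groups = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])
print(underdiagnosis_gap(y_true, y_pred, groups))
```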

  • The Effect of X-ray Dose Photon-Counting Detector Computed Tomography on Nodule Properties in a Lung Cancer Screening Cohort INVESTIGATIVE RADIOLOGY Kerber, B., Ensle, F., Kroschke, J., Strappa, C., Stolzmann-Hinzpeter, R., Bluthgen, C., Marty, M., Larici, A., Frauenfelder, T., Jungblut, L. 2025; 60 (10): 627-635

    Abstract

    The aim of the study was to evaluate the effect of photon-counting detector (PCD-)CT dose reduction to x-ray equivalent levels on nodule detection, diameter, volume, and density compared to a low-dose reference standard using semiautomated and manual methods. Between February and July 2023, 101 prospectively enrolled participants underwent noncontrast same-study low- and chest x-ray-dose CT scans using PCD-CT. Patients who were not referred for lung cancer screening or nodule follow-up, as well as those with nodules smaller than 5 mm in diameter, were excluded. Nodule detection and measurement of nodule diameters and volumes was semiautomatically performed for low- and x-ray-dose scans using computer-aided diagnosis software. Additionally, 2 blinded readers manually measured largest nodule diameters and examined nodule density. Nodules were classified using Lung-RADS v2022. Image quality was assessed with subjective and objective measures. Mean CTDIvol for x-ray dose scans was 0.11 ± 0.03 mGy, compared to 0.65 ± 0.15 mGy for low-dose images (P < 0.001). One hundred seventy-two nodules larger than 5 mm were detected in 53 of the 101 participants (32 male, 61.6 ± 12.5 years; 21 female, 60.3 ± 12.5 years). The semiautomated method had high overall sensitivity for nodule detection (0.94) on x-ray dose scans, with a higher sensitivity for solid nodules (>0.95) and lower for subsolid nodules (>0.86). Nodules not detected on x-ray dose scans were significantly smaller. Semiautomated measurements underestimated nodule diameter for solid nodules on x-ray dose scans (P = 0.01), but no significant effect for nodule volume was found (P = 0.775). Readers rated nodule density less dense on x-ray dose scans (R1: P < 0.001, R2: P = 0.006). There was no significant difference in nodule diameter for both readers between scan doses (R1: P = 0.141; R2: P = 0.554). There were good to excellent correlations between semiautomated and reader nodule diameters. Agreement and accuracy between low-dose and x-ray dose Lung-RADS classifications across methods were good (Cohen's κ = 0.73, 0.62, 0.76 for the semiautomated method, R1, and R2, respectively; accuracy: 0.82, 0.78, 0.85). No Lung-RADS classification changes were observed with semiautomated volumetric measurements of nodules. Semiautomated nodule detection is highly sensitive in PCD-CT x-ray dose scans. Semiautomated nodule volume measurement is more robust to image quality changes than nodule diameter. Accurate semiautomated and manual nodule measurements are feasible on x-ray dose scans, but nodule density tended to be underestimated. Nodule classification using Lung-RADS was shown to be accurate on x-ray dose scans.

    View details for DOI 10.1097/RLI.0000000000001174

    View details for Web of Science ID 001566777900003

    View details for PubMedID 40054009
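
    Agreement between low-dose and x-ray-dose Lung-RADS classifications in the study above is reported with Cohen's kappa. The scikit-learn call below shows the computation on invented category labels, purely to illustrate the metric.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical Lung-RADS categories assigned by the same method on the two scan doses.
lungrads_lowdose = ["2", "3", "4A", "2", "3", "2", "4B", "3"]
lungrads_xraydose = ["2", "3", "3", "2", "3", "2", "4B", "2"]

kappa = cohen_kappa_score(lungrads_lowdose, lungrads_xraydose)
print(f"Cohen's kappa = {kappa:.2f}")
```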

  • Generative Artificial Intelligence Increases Radiograph Reporting Efficiency. AJR. American journal of roentgenology Bluethgen, C. 2025

    View details for DOI 10.2214/AJR.25.33695

    View details for PubMedID 40801499

  • Foundation models for radiology: fundamentals, applications, opportunities, challenges, risks, and prospects. Diagnostic and interventional radiology (Ankara, Turkey) Akinci D'Antonoli, T., Bluethgen, C., Cuocolo, R., Klontzas, M. E., Ponsiglione, A., Kocak, B. 2025

    Abstract

    Foundation models (FMs) represent a significant evolution in artificial intelligence (AI), impacting diverse fields. Within radiology, this evolution offers greater adaptability, multimodal integration, and improved generalizability compared with traditional narrow AI. Utilizing large-scale pre-training and efficient fine-tuning, FMs can support diverse applications, including image interpretation, report generation, integrative diagnostics combining imaging with clinical/laboratory data, and synthetic data creation, holding significant promise for advancements in precision medicine. However, clinical translation of FMs faces several substantial challenges. Key concerns include the inherent opacity of model decision-making processes, environmental and social sustainability issues, risks to data privacy, complex ethical considerations, such as bias and fairness, and navigating the uncertainty of regulatory frameworks. Moreover, rigorous validation is essential to address inherent stochasticity and the risk of hallucination. This international collaborative effort provides a comprehensive overview of the fundamentals, applications, opportunities, challenges, and prospects of FMs, aiming to guide their responsible and effective adoption in radiology and healthcare.

    View details for DOI 10.4274/dir.2025.253445

    View details for PubMedID 40626693

  • Cybersecurity Threats and Mitigation Strategies for Large Language Models in Health Care. Radiology. Artificial intelligence Akinci D'Antonoli, T., Tejani, A. S., Khosravi, B., Bluethgen, C., Busch, F., Bressem, K. K., Adams, L. C., Moassefi, M., Faghani, S., Gichoya, J. W. 2025: e240739

    Abstract

    "Just Accepted" papers have undergone full peer review and have been accepted for publication in Radiology: Artificial Intelligence. This article will undergo copyediting, layout, and proof review before it is published in its final version. Please note that during production of the final copyedited article, errors may be discovered which could affect the content. The integration of large language models (LLMs) into health care offers tremendous opportunities to improve medical practice and patient care. Besides being susceptible to biases and threats common to all artificial intelligence systems, LLMs pose unique cybersecurity risks that must be carefully evaluated before these AI models are deployed in health care. LLMs can be exploited in several ways, such as malicious attacks, privacy breaches, and unauthorized manipulation of patient data. Moreover, malicious actors could use LLMs to infer sensitive patient information from training data. Furthermore, manipulated or poisoned data fed into these models could change their results in a way that is beneficial for the malicious actors. This report presents the cybersecurity challenges posed by LLMs in health care and provides strategies for mitigation. By implementing robust security measures and adhering to best practices during the model development, training, and deployment stages, stakeholders can help minimize these risks and protect patient privacy. ©RSNA, 2025.

    View details for DOI 10.1148/ryai.240739

    View details for PubMedID 40366259

  • Best Practices for Large Language Models in Radiology. Radiology Bluethgen, C., Van Veen, D., Zakka, C., Link, K. E., Fanous, A. H., Daneshjou, R., Frauenfelder, T., Langlotz, C. P., Gatidis, S., Chaudhari, A. 2025; 315 (1): e240528

    Abstract

    Radiologists must integrate complex imaging data with clinical information to produce actionable insights. This task requires a nuanced application of language across many activities, including managing clinical requests, analyzing imaging findings in the context of clinical data, interpreting these through the radiologist's lens, and effectively documenting and communicating the outcomes. Radiology practices must ensure reliable communication among numerous systems and stakeholders critical for medical decision-making. Large language models (LLMs) offer an opportunity to improve the management and interpretation of the vast amounts of text data in radiology. Despite being developed as general-purpose tools, these advanced computational models demonstrate impressive capabilities in specialized tasks, even without specific training. Unlocking the potential of LLMs for radiology requires an understanding of their foundations and a strategic approach to navigate their idiosyncrasies. This review, drawing from practical radiology and machine learning expertise, provides general and technically adept radiologists insight into the potential of LLMs in radiology. It also equips those interested in implementing applicable best practices that have so far stood the test of time in the rapidly evolving landscape of LLMs. The review provides practical advice for optimizing LLM characteristics for radiology practices, including advice on limitations, effective prompting, and fine-tuning strategies.

    View details for DOI 10.1148/radiol.240528

    View details for PubMedID 40298602
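
    Among the practices discussed in the review above is deliberate prompt construction, for example stating a role, the task, and constraints before the delimited input text. The snippet below assembles such a prompt as a plain Python string; the wording and the commented-out call_llm function are illustrative placeholders, not recommendations taken verbatim from the article.

```python
def build_radiology_prompt(report_text: str) -> str:
    """Assemble a structured prompt: role, task, constraints, then clearly delimited input."""
    return (
        "You are assisting a board-certified radiologist.\n"
        "Task: summarize the findings of the report below into a short impression.\n"
        "Constraints: do not add findings that are not stated; keep it under three sentences.\n"
        "Report:\n"
        "<<<\n"
        f"{report_text}\n"
        ">>>"
    )

prompt = build_radiology_prompt("Heart size normal. No focal consolidation. Small left pleural effusion.")
print(prompt)
# response = call_llm(prompt)  # hypothetical call to whichever LLM interface is in use
```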

  • Using deep feature distances for evaluating the perceptual quality of MR image reconstructions. Magnetic resonance in medicine Adamson, P. M., Desai, A. D., Dominic, J., Varma, M., Bluethgen, C., Wood, J. P., Syed, A. B., Boutin, R. D., Stevens, K. J., Vasanawala, S., Pauly, J. M., Gunel, B., Chaudhari, A. S. 2025

    Abstract

    Commonly used MR image quality (IQ) metrics have poor concordance with radiologist-perceived diagnostic IQ. Here, we develop and explore deep feature distances (DFDs)-distances computed in a lower-dimensional feature space encoded by a convolutional neural network (CNN)-as improved perceptual IQ metrics for MR image reconstruction. We further explore the impact of distribution shifts between images in the DFD CNN encoder training data and the IQ metric evaluation. We compare commonly used IQ metrics (PSNR and SSIM) to two "out-of-domain" DFDs with encoders trained on natural images, an "in-domain" DFD trained on MR images alone, and two domain-adjacent DFDs trained on large medical imaging datasets. We additionally compare these with several state-of-the-art but less commonly reported IQ metrics, visual information fidelity (VIF), noise quality metric (NQM), and the high-frequency error norm (HFEN). IQ metric performance is assessed via correlations with five expert radiologist reader scores of perceived diagnostic IQ of various accelerated MR image reconstructions. We characterize the behavior of these IQ metrics under common distortions expected during image acquisition, including their sensitivity to acquisition noise. All DFDs and HFEN correlate more strongly with radiologist-perceived diagnostic IQ than SSIM, PSNR, and other state-of-the-art metrics, with correlations being comparable to radiologist inter-reader variability. Surprisingly, out-of-domain DFDs perform comparably to in-domain and domain-adjacent DFDs. A suite of IQ metrics, including DFDs and HFEN, should be used alongside commonly-reported IQ metrics for a more holistic evaluation of MR image reconstruction perceptual quality. We also observe that general vision encoders are capable of assessing visual IQ even for MR images.

    View details for DOI 10.1002/mrm.30437

    View details for PubMedID 39921580
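
    A deep feature distance compares two images in the feature space of a convolutional encoder rather than in pixel space. The PyTorch sketch below computes a simple channel-normalized squared distance between feature maps; the encoder itself (for example a network pretrained on natural images, as for the "out-of-domain" DFDs above) is left as an assumption and is represented here by random feature tensors.

```python
import torch
import torch.nn.functional as F

def deep_feature_distance(feats_a: torch.Tensor, feats_b: torch.Tensor) -> torch.Tensor:
    """Mean squared distance between channel-normalized feature maps of shape (B, C, H, W)."""
    feats_a = F.normalize(feats_a, dim=1)   # unit-normalize along the channel axis
    feats_b = F.normalize(feats_b, dim=1)
    return ((feats_a - feats_b) ** 2).sum(dim=1).mean(dim=(1, 2))  # one distance per image pair

# Hypothetical usage: features would come from a CNN encoder applied to a reference image
# and its reconstruction; random tensors stand in for those activations here.
feats_ref = torch.randn(2, 64, 32, 32)
feats_rec = torch.randn(2, 64, 32, 32)
print(deep_feature_distance(feats_ref, feats_rec))
```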

  • Foundation Models in Radiology: What, How, Why, and Why Not. Radiology Paschali, M., Chen, Z., Blankemeier, L., Varma, M., Youssef, A., Bluethgen, C., Langlotz, C., Gatidis, S., Chaudhari, A. 2025; 314 (2): e240597

    Abstract

    Recent advances in artificial intelligence have witnessed the emergence of large-scale deep learning models capable of interpreting and generating both textual and imaging data. Such models, typically referred to as foundation models (FMs), are trained on extensive corpora of unlabeled data and demonstrate high performance across various tasks. FMs have recently received extensive attention from academic, industry, and regulatory bodies. Given the potentially transformative impact that FMs can have on the field of radiology, radiologists must be aware of potential pathways to train these radiology-specific FMs, including understanding both the benefits and challenges. Thus, this review aims to explain the fundamental concepts and terms of FMs in radiology, with a specific focus on the requirements of training data, model training paradigms, model capabilities, and evaluation strategies. Overall, the goal of this review is to unify technical advances and clinical needs for safe and responsible training of FMs in radiology to ultimately benefit patients, providers, and radiologists.

    View details for DOI 10.1148/radiol.240597

    View details for PubMedID 39903075

  • GenerateCT: Text-Conditional Generation of 3D Chest CT Volumes Hamamci, I., Er, S., Sekuboyina, A., Simsar, E., Tezcan, A., Simsek, A., Esirgun, S., Almas, F., Dogan, I., Dasdelen, M., Prabhakar, C., Reynaud, H., Pati, S., Bluethgen, C., Ozdemir, M., Menze, B. edited by Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. SPRINGER INTERNATIONAL PUBLISHING AG. 2025: 126-143
  • Automated Structured Radiology Report Generation Delbrouck, J., Xu, J., Moll, J., Thomas, A., Chen, Z., Ostmeier, S., Azhar, A., Li, K., Johnston, A., Bluethgen, C., Reis, E., Muneer, M., Varma, M., Langlotz, C. edited by Che, W., Nabende, J., Shutova, E., Pilehvar, M. T. ASSOC COMPUTATIONAL LINGUISTICS-ACL. 2025: 26813-26829
  • CheXalign: Preference fine-tuning in chest X-ray interpretation models without human feedback Hein, D., Chen, Z., Ostmeier, S., Xu, J., Varma, M., Reis, E., Michalson, A., Bluethgen, C., Shin, H., Langlotz, C., Chaudhari, A. S. edited by Che, W., Nabende, J., Shutova, E., Pilehvar, M. T. ASSOC COMPUTATIONAL LINGUISTICS-ACL. 2025: 27679-27702
  • A dataset and benchmark for hospital course summarization with adapted large language models. Journal of the American Medical Informatics Association : JAMIA Aali, A., Van Veen, D., Arefeen, Y. I., Hom, J., Bluethgen, C., Reis, E. P., Gatidis, S., Clifford, N., Daws, J., Tehrani, A. S., Kim, J., Chaudhari, A. S. 2024

    Abstract

    Brief hospital course (BHC) summaries are clinical documents that summarize a patient's hospital stay. While large language models (LLMs) depict remarkable capabilities in automating real-world tasks, their capabilities for healthcare applications such as synthesizing BHCs from clinical notes have not been shown. We introduce a novel preprocessed dataset, the MIMIC-IV-BHC, encapsulating clinical note and BHC pairs to adapt LLMs for BHC synthesis. Furthermore, we introduce a benchmark of the summarization performance of 2 general-purpose LLMs and 3 healthcare-adapted LLMs. Using clinical notes as input, we apply prompting-based (using in-context learning) and fine-tuning-based adaptation strategies to 3 open-source LLMs (Clinical-T5-Large, Llama2-13B, and FLAN-UL2) and 2 proprietary LLMs (Generative Pre-trained Transformer [GPT]-3.5 and GPT-4). We evaluate these LLMs across multiple context-length inputs using natural language similarity metrics. We further conduct a clinical study with 5 clinicians, comparing clinician-written and LLM-generated BHCs across 30 samples, focusing on their potential to enhance clinical decision-making through improved summary quality. We compare reader preferences for the original and LLM-generated summary using Wilcoxon signed-rank tests. We further request optional qualitative feedback from clinicians to gain deeper insights into their preferences, and we present the frequency of common themes arising from these comments. The Llama2-13B fine-tuned LLM outperforms other domain-adapted models given quantitative evaluation metrics of Bilingual Evaluation Understudy (BLEU) and Bidirectional Encoder Representations from Transformers (BERT)-Score. GPT-4 with in-context learning shows more robustness to increasing context lengths of clinical note inputs than fine-tuned Llama2-13B. Despite comparable quantitative metrics, the reader study depicts a significant preference for summaries generated by GPT-4 with in-context learning compared to both Llama2-13B fine-tuned summaries and the original summaries (P<.001), highlighting the need for qualitative clinical evaluation. We release a foundational clinically relevant dataset, the MIMIC-IV-BHC, and present an open-source benchmark of LLM performance in BHC synthesis from clinical notes. We observe high-quality summarization performance for both in-context proprietary and fine-tuned open-source LLMs using both quantitative metrics and a qualitative clinical reader study. Our research effectively integrates elements from the data assimilation pipeline: our methods use (1) clinical data sources to integrate, (2) data translation, and (3) knowledge creation, while our evaluation strategy paves the way for (4) deployment.

    View details for DOI 10.1093/jamia/ocae312

    View details for PubMedID 39786555
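
    Reader preferences between original and LLM-generated summaries in the study above are compared with Wilcoxon signed-rank tests. The SciPy call below runs that comparison on fabricated paired reader scores, not the study's data.

```python
from scipy.stats import wilcoxon

# Hypothetical paired reader scores (e.g., 1-5 quality ratings) for the same 10 cases.
scores_clinician = [3, 4, 3, 2, 4, 3, 3, 4, 2, 3]
scores_llm = [4, 4, 5, 3, 5, 2, 4, 4, 3, 4]

stat, p_value = wilcoxon(scores_clinician, scores_llm)
print(f"Wilcoxon statistic = {stat}, p = {p_value:.4f}")
```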

  • Reconstruction of patient-specific confounders in AI-based radiologic image interpretation CELL REPORTS MEDICINE Han, T., Zigutyte, L., Huck, L., Huppertz, M., Siepmann, R., Gandelsma, Y., Bluthgen, C., Khader, F., Kuhl, C., Nebelung, S., Kather, J., Truhn, D. 2024; 5 (9): 101713

    Abstract

    Reliably detecting potentially misleading patterns in automated diagnostic assistance systems, such as those powered by artificial intelligence (AI), is crucial for instilling user trust and ensuring reliability. Current techniques fall short in visualizing such confounding factors. We propose DiffChest, a self-conditioned diffusion model trained on 515,704 chest radiographs from 194,956 patients across the US and Europe. DiffChest provides patient-specific explanations and visualizes confounding factors that might mislead the model. The high inter-reader agreement, with Fleiss' kappa values of 0.8 or higher, validates its capability to identify treatment-related confounders. Confounders are accurately detected with 10%-100% prevalence rates. The pretraining process optimizes the model for relevant imaging information, resulting in excellent diagnostic accuracy for 11 chest conditions, including pleural effusion and heart insufficiency. Our findings highlight the potential of diffusion models in medical image classification, providing insights into confounding factors and enhancing model robustness and reliability.

    View details for DOI 10.1016/j.xcrm.2024.101713

    View details for Web of Science ID 001322282000001

    View details for PubMedID 39241771

    View details for PubMedCentralID PMC11528237

  • A vision-language foundation model for the generation of realistic chest X-ray images. Nature biomedical engineering Bluethgen, C., Chambon, P., Delbrouck, J. B., van der Sluijs, R., Połacin, M., Zambrano Chaves, J. M., Abraham, T. M., Purohit, S., Langlotz, C. P., Chaudhari, A. S. 2024

    Abstract

    The paucity of high-quality medical imaging datasets could be mitigated by machine learning models that generate compositionally diverse images that faithfully represent medical concepts and pathologies. However, large vision-language models are trained on natural images, and the diversity distribution of the generated images substantially differs from that of medical images. Moreover, medical language involves specific and semantically rich vocabulary. Here we describe a domain-adaptation strategy for large vision-language models that overcomes distributional shifts. Specifically, by leveraging publicly available datasets of chest X-ray images and the corresponding radiology reports, we adapted a latent diffusion model pre-trained on pairs of natural images and text descriptors to generate diverse and visually plausible synthetic chest X-ray images (as confirmed by board-certified radiologists) whose appearance can be controlled with free-form medical text prompts. The domain-adaptation strategy for the text-conditioned synthesis of medical images can be used to augment training datasets and is a viable alternative to the sharing of real medical images for model training and fine-tuning.

    View details for DOI 10.1038/s41551-024-01246-y

    View details for PubMedID 39187663

    View details for PubMedCentralID 10131505
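
    Because the model described above is an adapted latent diffusion model conditioned on free-form text, sampling follows the usual diffusers text-to-image pattern. The sketch below shows that pattern under assumptions: the checkpoint identifier is a hypothetical placeholder for wherever adapted weights would be hosted, and the prompt is only an example.

```python
import torch
from diffusers import StableDiffusionPipeline

# Hypothetical checkpoint path standing in for the adapted chest X-ray weights.
model_id = "path/or/hub-id/of-adapted-chest-xray-model"

pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# Free-form medical text prompt controlling the appearance of the synthetic radiograph.
prompt = "Chest X-ray showing a large right-sided pleural effusion"
image = pipe(prompt).images[0]
image.save("synthetic_cxr.png")
```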

  • Bias in artificial intelligence for medical imaging: fundamentals, detection, avoidance, mitigation, challenges, ethics, and prospects. Diagnostic and interventional radiology (Ankara, Turkey) Kocak, B., Ponsiglione, A., Stanzione, A., Bluethgen, C., Santinha, J., Ugga, L., Huisman, M., Klontzas, M. E., Cannella, R., Cuocolo, R. 2024

    Abstract

    Although artificial intelligence (AI) methods hold promise for medical imaging-based prediction tasks, their integration into medical practice may present a double-edged sword due to bias (i.e., systematic errors). AI algorithms have the potential to mitigate cognitive biases in human interpretation, but extensive research has highlighted the tendency of AI systems to internalize biases within their model. This fact, whether intentional or not, may ultimately lead to unintentional consequences in the clinical setting, potentially compromising patient outcomes. This concern is particularly important in medical imaging, where AI has been more progressively and widely embraced than any other medical field. A comprehensive understanding of bias at each stage of the AI pipeline is therefore essential to contribute to developing AI solutions that are not only less biased but also widely applicable. This international collaborative review effort aims to increase awareness within the medical imaging community about the importance of proactively identifying and addressing AI bias to prevent its negative consequences from being realized later. The authors began with the fundamentals of bias by explaining its different definitions and delineating various potential sources. Strategies for detecting and identifying bias were then outlined, followed by a review of techniques for its avoidance and mitigation. Moreover, ethical dimensions, challenges encountered, and prospects were discussed.

    View details for DOI 10.4274/dir.2024.242854

    View details for PubMedID 38953330

  • A New Era of Text Mining in Radiology with Privacy-Preserving LLMs. Radiology. Artificial intelligence Akinci D'Antonoli, T., Bluethgen, C. 2024; 6 (4): e240261

    View details for DOI 10.1148/ryai.240261

    View details for PubMedID 38900034

  • Merlin: A Vision Language Foundation Model for 3D Computed Tomography. Research square Blankemeier, L., Cohen, J. P., Kumar, A., Veen, D. V., Gardezi, S., Paschali, M., Chen, Z., Delbrouck, J. B., Reis, E., Truyts, C., Bluethgen, C., Jensen, M., Ostmeier, S., Varma, M., Valanarasu, J., Fang, Z., Huo, Z., Nabulsi, Z., Ardila, D., Weng, W. H., Junior, E. A., Ahuja, N., Fries, J., Shah, N., Johnston, A., Boutin, R., Wentland, A., Langlotz, C., Hom, J., Gatidis, S., Chaudhari, A. 2024

    Abstract

    Over 85 million computed tomography (CT) scans are performed annually in the US, of which approximately one quarter focus on the abdomen. Given the current shortage of both general and specialized radiologists, there is a large impetus to use artificial intelligence to alleviate the burden of interpreting these complex imaging studies while simultaneously using the images to extract novel physiological insights. Prior state-of-the-art approaches for automated medical image interpretation leverage vision language models (VLMs) that utilize both the image and the corresponding textual radiology reports. However, current medical VLMs are generally limited to 2D images and short reports. To overcome these shortcomings for abdominal CT interpretation, we introduce Merlin - a 3D VLM that leverages both structured electronic health records (EHR) and unstructured radiology reports for pretraining without requiring additional manual annotations. We train Merlin using a high-quality clinical dataset of paired CT scans (6+ million images from 15,331 CTs), EHR diagnosis codes (1.8+ million codes), and radiology reports (6+ million tokens) for training. We comprehensively evaluate Merlin on 6 task types and 752 individual tasks. The non-adapted (off-the-shelf) tasks include zero-shot findings classification (31 findings), phenotype classification (692 phenotypes), and zero-shot cross-modal retrieval (image to findings and image to impressions), while model adapted tasks include 5-year chronic disease prediction (6 diseases), radiology report generation, and 3D semantic segmentation (20 organs). We perform internal validation on a test set of 5,137 CTs, and external validation on 7,000 clinical CTs and on two public CT datasets (VerSe, TotalSegmentator). Beyond these clinically-relevant evaluations, we assess the efficacy of various network architectures and training strategies to depict that Merlin has favorable performance to existing task-specific baselines. We derive data scaling laws to empirically assess training data needs for requisite downstream task performance. Furthermore, unlike conventional VLMs that require hundreds of GPUs for training, we perform all training on a single GPU. This computationally efficient design can help democratize foundation model training, especially for health systems with compute constraints. We plan to release our trained models, code, and dataset, pending manual removal of all protected health information.

    View details for DOI 10.21203/rs.3.rs-4546309/v1

    View details for PubMedID 38978576

    View details for PubMedCentralID PMC11230513

  • Adapted large language models can outperform medical experts in clinical text summarization. Nature medicine Van Veen, D., Van Uden, C., Blankemeier, L., Delbrouck, J. B., Aali, A., Bluethgen, C., Pareek, A., Polacin, M., Reis, E. P., Seehofnerová, A., Rohatgi, N., Hosamani, P., Collins, W., Ahuja, N., Langlotz, C. P., Hom, J., Gatidis, S., Pauly, J., Chaudhari, A. S. 2024

    Abstract

    Analyzing vast textual data and summarizing key information from electronic health records imposes a substantial burden on how clinicians allocate their time. Although large language models (LLMs) have shown promise in natural language processing (NLP) tasks, their effectiveness on a diverse range of clinical summarization tasks remains unproven. Here we applied adaptation methods to eight LLMs, spanning four distinct clinical summarization tasks: radiology reports, patient questions, progress notes and doctor-patient dialogue. Quantitative assessments with syntactic, semantic and conceptual NLP metrics reveal trade-offs between models and adaptation methods. A clinical reader study with 10 physicians evaluated summary completeness, correctness and conciseness; in most cases, summaries from our best-adapted LLMs were deemed either equivalent (45%) or superior (36%) compared with summaries from medical experts. The ensuing safety analysis highlights challenges faced by both LLMs and medical experts, as we connect errors to potential medical harm and categorize types of fabricated information. Our research provides evidence of LLMs outperforming medical experts in clinical text summarization across multiple tasks. This suggests that integrating LLMs into clinical workflows could alleviate documentation burden, allowing clinicians to focus more on patient care.

    View details for DOI 10.1038/s41591-024-02855-5

    View details for PubMedID 38413730

    View details for PubMedCentralID 5593724

  • Reconsidering Conclusions of Bias Assessment in Medical Imaging Foundation Models. Radiology. Artificial intelligence Chaudhari, A. S., Bluethgen, C., Ouyang, D. 2023; 5 (6): e230432

    View details for DOI 10.1148/ryai.230432

    View details for PubMedID 38074780

    View details for PubMedCentralID PMC10698581

  • Clinical Text Summarization: Adapting Large Language Models Can Outperform Human Experts. Research square Veen, D. V., Uden, C. V., Blankemeier, L., Delbrouck, J. B., Aali, A., Bluethgen, C., Pareek, A., Polacin, M., Reis, E. P., Seehofnerova, A., Rohatgi, N., Hosamani, P., Collins, W., Ahuja, N., Langlotz, C., Hom, J., Gatidis, S., Pauly, J., Chaudhari, A. 2023

    Abstract

    Sifting through vast textual data and summarizing key information from electronic health records (EHR) imposes a substantial burden on how clinicians allocate their time. Although large language models (LLMs) have shown immense promise in natural language processing (NLP) tasks, their efficacy on a diverse range of clinical summarization tasks has not yet been rigorously demonstrated. In this work, we apply domain adaptation methods to eight LLMs, spanning six datasets and four distinct clinical summarization tasks: radiology reports, patient questions, progress notes, and doctor-patient dialogue. Our thorough quantitative assessment reveals trade-offs between models and adaptation methods in addition to instances where recent advances in LLMs may not improve results. Further, in a clinical reader study with ten physicians, we show that summaries from our best-adapted LLMs are preferable to human summaries in terms of completeness and correctness. Our ensuing qualitative analysis highlights challenges faced by both LLMs and human experts. Lastly, we correlate traditional quantitative NLP metrics with reader study scores to enhance our understanding of how these metrics align with physician preferences. Our research marks the first evidence of LLMs outperforming human experts in clinical text summarization across multiple tasks. This implies that integrating LLMs into clinical workflows could alleviate documentation burden, empowering clinicians to focus more on personalized patient care and the inherently human aspects of medicine.

    View details for DOI 10.21203/rs.3.rs-3483777/v1

    View details for PubMedID 37961377

    View details for PubMedCentralID PMC10635391

  • Large language models in radiology: fundamentals, applications, ethical considerations, risks, and future directions. Diagnostic and interventional radiology (Ankara, Turkey) Akinci D'Antonoli, T., Stanzione, A., Bluethgen, C., Vernuccio, F., Ugga, L., Klontzas, M. E., Cuocolo, R., Cannella, R., Koçak, B. 2023

    Abstract

    With the advent of large language models (LLMs), the artificial intelligence revolution in medicine and radiology is now more tangible than ever. Every day, an increasingly large number of articles are published that utilize LLMs in radiology. To adopt and safely implement this new technology in the field, radiologists should be familiar with its key concepts, understand at least the technical basics, and be aware of the potential risks and ethical considerations that come with it. In this review article, the authors provide an overview of the LLMs that might be relevant to the radiology community and include a brief discussion of their short history, technical basics, ChatGPT, prompt engineering, potential applications in medicine and radiology, advantages, disadvantages and risks, ethical and regulatory considerations, and future directions.

    View details for DOI 10.4274/dir.2023.232417

    View details for PubMedID 37789676

  • External validation, radiological evaluation, and development of deep learning automatic lung segmentation in contrast-enhanced chest CT. European radiology Dwivedi, K., Sharkey, M., Alabed, S., Langlotz, C. P., Swift, A. J., Bluethgen, C. 2023

    Abstract

    There is a need for CT pulmonary angiography (CTPA) lung segmentation models. Clinical translation requires radiological evaluation of model outputs, understanding of limitations, and identification of failure points. This multicentre study aims to develop an accurate CTPA lung segmentation model, with evaluation of outputs in two diverse patient cohorts with pulmonary hypertension (PH) and interstitial lung disease (ILD). This retrospective study develops an nnU-Net-based segmentation model using data from two specialist centres (UK and USA). Model was trained (n = 37), tested (n = 12), and clinically evaluated (n = 176) on a diverse 'real-world' cohort of 225 PH patients with volumetric CTPAs. Dice score coefficient (DSC) and normalised surface distance (NSD) were used for testing. Clinical evaluation of outputs was performed by two radiologists who assessed clinical significance of errors. External validation was performed on heterogenous contrast and non-contrast scans from 28 ILD patients. A total of 225 PH and 28 ILD patients with diverse demographic and clinical characteristics were evaluated. Mean accuracy, DSC, and NSD scores were 0.998 (95% CI 0.9976, 0.9989), 0.990 (0.9840, 0.9962), and 0.983 (0.9686, 0.9972) respectively. There were no segmentation failures. On radiological review, 82% and 71% of internal and external cases respectively had no errors. Eighteen percent and 25% respectively had clinically insignificant errors. Peripheral atelectasis and consolidation were common causes for suboptimal segmentation. One external case (0.5%) with patulous oesophagus had a clinically significant error. State-of-the-art CTPA lung segmentation model provides accurate outputs with minimal clinical errors on evaluation across two diverse cohorts with PH and ILD. Clinical translation of artificial intelligence models requires radiological review and understanding of model limitations. This study develops an externally validated state-of-the-art model with robust radiological review. Intended clinical use is in techniques such as lung volume or parenchymal disease quantification. • Accurate, externally validated CT pulmonary angiography (CTPA) lung segmentation model tested in two large heterogeneous clinical cohorts (pulmonary hypertension and interstitial lung disease). • No segmentation failures and robust review of model outputs by radiologists found 1 (0.5%) clinically significant segmentation error. • Intended clinical use of this model is a necessary step in techniques such as lung volume, parenchymal disease quantification, or pulmonary vessel analysis.

    View details for DOI 10.1007/s00330-023-10235-9

    View details for PubMedID 37775589

    View details for PubMedCentralID 6646484
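
    Segmentation overlap in the study above is reported with the Dice score coefficient (DSC) alongside the normalised surface distance. A minimal NumPy implementation of the DSC for binary masks is given below, with small toy arrays standing in for real lung segmentations.

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, truth: np.ndarray, eps: float = 1e-8) -> float:
    """Dice similarity coefficient between two binary masks: 2|A∩B| / (|A| + |B|)."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    return float((2.0 * intersection) / (pred.sum() + truth.sum() + eps))

# Hypothetical toy masks standing in for a predicted and a reference lung segmentation.
pred_mask = np.array([[0, 1, 1], [0, 1, 1], [0, 0, 1]])
ref_mask = np.array([[0, 1, 1], [0, 1, 0], [0, 0, 1]])
print(f"DSC = {dice_coefficient(pred_mask, ref_mask):.3f}")
```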

  • FDA-cleared artificial intelligence and machine learning-based medical devices and their 510(k) predicate networks. The Lancet. Digital health Muehlematter, U. J., Bluethgen, C., Vokinger, K. N. 2023; 5 (9): e618-e626

    Abstract

    The US Food and Drug Administration is clearing an increasing number of artificial intelligence and machine learning (AI/ML)-based medical devices through the 510(k) pathway. This pathway allows clearance if the device is substantially equivalent to a former cleared device (ie, predicate). We analysed the predicate networks of cleared AI/ML-based medical devices (cleared between 2019 and 2021), their underlying tasks, and recalls. More than a third of cleared AI/ML-based medical devices originated from non-AI/ML-based medical devices in the first generation. Devices with the longest time since the last predicate device with an AI/ML component were haematology (2001), radiology (2001), and cardiovascular devices (2008). Especially for devices in radiology, the AI/ML tasks changed frequently along the device's predicate network, raising safety concerns. To date, only a few recalls might have affected the AI/ML components. To improve patient care, a stronger focus should be placed on the distinctive characteristics of AI/ML when defining substantial equivalence between a new AI/ML-based medical device and predicate devices.

    View details for DOI 10.1016/S2589-7500(23)00126-7

    View details for PubMedID 37625896

    View details for Web of Science ID 001072842200001

  • Does GPT4 dream of counting electric nodules? EUROPEAN RADIOLOGY Bluthgen, C. 2023

    View details for DOI 10.1007/s00330-023-09671-4

    View details for Web of Science ID 000979229900002

    View details for PubMedID 37099177

  • Added value of ultra-short echo time and fast field echo using restricted echo-spacing MR imaging in the assessment of the osseous cervical spine RADIOLOGIA MEDICA Deininger-Czermak, E., Gascho, D., Franckenberg, S., Kalin, P., Bluthgen, C., Villefort, C., Thali, M. J., Guggenberger, R. 2023; 128 (2): 234-241

    Abstract

    To evaluate the added value of ultra-short echo time (UTE) and fast field echo resembling a CT using restricted echo-spacing (FRACTURE) MR sequences in the assessment of the osseous cervical spine using CT as reference. Twenty-seven subjects underwent postmortem CT and MRI within 48 h. Datasets were anonymized and analyzed retrospectively by two radiologists. Morphological cervical spine alterations were rated on CT, UTE and FRACTURE images. Afterward, neural foraminal stenosis was graded on standard MR and again after viewing additional UTE/FRACTURE sequences. To evaluate interreader and intermodality reliability, intra-class correlation coefficients (ICC) and for stenosis grading Wilcoxon-matched-pairs testing with multiple comparison correction were calculated. Moderate interreader reliability (ICC = 0.48-0.71) was observed concerning morphological findings on all modalities. Intermodality reliability was good between modalities regarding degenerative vertebral and joint alterations (ICC = 0.69-0.91). Compared to CT, neural stenosis grades were more often considered as nonsignificant on all analyzed MR sequences. Neural stenosis grading scores also differed significantly between the specific bone imaging sequences, UTE and FRACTURE, and standard MR sequences. However, no significant difference was observed between UTE and FRACTURE sequences. Compared to CT as reference, UTE or FRACTURE sequence added to standard MR sequences can deliver comparable information on osseous cervical spine status. Both led to changes in clinically significant stenosis gradings when added to standard MR, mainly reducing the severity of neural foramina stenosis.

    View details for DOI 10.1007/s11547-023-01589-7

    View details for Web of Science ID 000915315100002

    View details for PubMedID 36637741

    View details for PubMedCentralID PMC9938813

  • Fully automated breast segmentation on spiral breast computed tomography images JOURNAL OF APPLIED CLINICAL MEDICAL PHYSICS Shim, S., Cester, D., Ruby, L., Bluethgen, C., Marcon, M., Berger, N., Unkelbach, J., Boss, A. 2022; 23 (10): e13726

    Abstract

    The quantification of the amount of the glandular tissue and breast density is important to assess breast cancer risk. Novel photon-counting breast computed tomography (CT) technology has the potential to quantify them. For accurate analysis, a dedicated method to segment the breast components (the adipose and glandular tissue, skin, pectoralis muscle, skinfold section, rib, and implant) is required. We propose a fully automated breast segmentation method for breast CT images. The framework consists of four parts: (1) investigate, (2) segment the components excluding adipose and glandular tissue, (3) assess the breast density, and (4) iteratively segment the glandular tissue according to the estimated density. For the method, adapted seeded watershed and region growing algorithm were dedicatedly developed for the breast CT images and optimized on 68 breast images. The segmentation performance was qualitatively (five-point Likert scale) and quantitatively (Dice similarity coefficient [DSC] and difference coefficient [DC]) demonstrated according to human reading by experienced radiologists. The performance evaluation on each component and overall segmentation for 17 breast CT images resulted in DSCs ranging 0.90-0.97 and in DCs 0.01-0.08. The readers rated 4.5-4.8 (5 highest score) with an excellent inter-reader agreement. The breast density varied by 3.7%-7.1% when including mis-segmented muscle or skin. The automatic segmentation results coincided with the human expert's reading. The accurate segmentation is important to avoid the significant bias in breast density analysis. Our method enables accurate quantification of the breast density and amount of the glandular tissue that is directly related to breast cancer risk.

    View details for DOI 10.1002/acm2.13726

    View details for Web of Science ID 000837887200001

    View details for PubMedID 35946049

    View details for PubMedCentralID PMC9588268

  • Influence of CT Image Matrix Size and Kernel Type on the Assessment of HRCT in Patients with SSC-ILD DIAGNOSTICS Balmer, B. D., Bluethgen, C., Baessler, B., Martini, K., Huber, F. A., Ruby, L., Schoenenberger, A., Frauenfelder, T. 2022; 12 (7)

    Abstract

    Interstitial lung disease (ILD) is a frequent complication of systemic sclerosis (SSc), and its early detection and treatment may prevent deterioration of lung function. Different vendors have recently made larger image matrices available as a post-processing option for computed tomography (CT), which could facilitate the diagnosis of SSc-ILD. Therefore, the objective of this study was to assess the effect of matrix size on lung image quality in patients with SSc by comparing a 1024-pixel matrix to a standard 512-pixel matrix and applying different reconstruction kernels. Lung scans of 50 patients (mean age 54 years, range 23-85 years) with SSc were reconstructed with these two matrix sizes, after determining the most appropriate kernel in a first step. Four observers scored the images on a five-point Likert scale regarding image quality and detectability of clinically relevant findings. Among the eight tested kernels, the Br59 kernel (sharp) reached the highest score (19.48 ± 3.99), although differences did not reach statistical significance. The 1024-pixel matrix scored higher than the 512-pixel matrix HRCT overall (p = 0.01) and in the subcategories sharpness (p < 0.01), depiction of bronchioles (p < 0.01) and overall image impression (p < 0.01), and lower for the detection of ground-glass opacities (GGO) (p = 0.04). No significant differences were found for the detection of the extent of reticulations/bronchiectasis/fibrosis (p = 0.50) or for image noise (p = 0.09). Our results show that, with the use of a sharp kernel, the 1024-pixel matrix HRCT provides slightly better subjective image quality for assessing interstitial lung changes, whereas GGO are more visible on the 512-pixel matrix. However, it remains to be answered to what extent this is related to the improved representation of the smallest structures.

    View details for DOI 10.3390/diagnostics12071662

    View details for Web of Science ID 000831365500001

    View details for PubMedID 35885565

    View details for PubMedCentralID PMC9321522

  • Comparison of detection of trauma-related injuries using combined "all-in-one" fused images and conventionally reconstructed images in acute trauma CT EUROPEAN RADIOLOGY Higashigaito, K., Fischer, G., Jungblut, L., Bluthgen, C., Schwyzer, M., Eberhard, M., dos Santos, D., Baessler, B., Vuylsteke, P., Soons, J. A. M., Frauenfelder, T. 2022; 32 (6): 3903-3911

    Abstract

    To compare the accuracy of lesion detection of trauma-related injuries using combined "all-in-one" fused (AIO) and conventionally reconstructed images (CR) in acute trauma CT. In this retrospective study, trauma CT of 66 patients (median age 47 years, range 18-96 years; 20 female (30.3%)) were read using AIO and CR. Images were independently reviewed by 4 blinded radiologists (two residents and two consultants) for trauma-related injuries in 22 regions. Sub-analyses were performed to analyze the influence of experience (residents vs. consultants) and body region (chest, abdomen, skeletal structures) on lesion detection. The paired t-test was used to compare the accuracy of lesion detection, and the effect size was calculated (Cohen's d). A linear mixed-effects model with patients as the fixed effect and random forest models were used to investigate the effect of experience, reconstruction/image processing, and body region on lesion detection. Reading time of residents was significantly faster using AIO (AIO: 266 ± 72 s, CR: 318 ± 113 s; p < 0.001; d = 0.46), while no significant difference was observed in the accuracy of lesion detection (AIO: 93.5 ± 6.0%, CR: 94.6 ± 6.0%; p = 0.092; d = -0.21). Reading time of consultants showed no significant difference (AIO: 283 ± 82 s, CR: 274 ± 95 s; p = 0.067; d = 0.16). Accuracy was significantly higher using CR; however, the difference and effect size were very small (AIO: 95.1 ± 4.9%, CR: 97.3 ± 3.7%, p = 0.002; d = -0.39). The linear mixed-effects model showed only a minor effect of image processing/reconstruction on lesion detection. Residents at the emergency department might benefit from a faster reading time without sacrificing lesion detection rate when using AIO for trauma CT. • Image fusion techniques decrease the reading time of acute trauma CT without sacrificing diagnostic accuracy.
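
    A minimal sketch (not the study's code) of the paired comparison described above: a paired t-test with SciPy and Cohen's d for paired samples computed from the per-case differences; the reading-time arrays are hypothetical.

        import numpy as np
        from scipy import stats

        # Hypothetical per-case reading times (seconds) for the two image sets
        t_aio = np.array([250, 270, 260, 280, 255, 290, 265, 275], dtype=float)
        t_cr  = np.array([300, 320, 310, 340, 305, 335, 315, 325], dtype=float)

        t_stat, p_value = stats.ttest_rel(t_aio, t_cr)   # paired t-test

        diff = t_aio - t_cr
        cohens_d = diff.mean() / diff.std(ddof=1)        # effect size for paired samples

        print(f"t = {t_stat:.2f}, p = {p_value:.4f}, d = {cohens_d:.2f}")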

    View details for DOI 10.1007/s00330-021-08473-w

    View details for Web of Science ID 000741942900005

    View details for PubMedID 35020010

  • Sarcopenia, Precardial Adipose Tissue and High Tumor Volume as Outcome Predictors in Surgically Treated Pleural Mesothelioma DIAGNOSTICS Verhoek, O., Jungblut, L., Lauk, O., Bluethgen, C., Opitz, I., Frauenfelder, T., Martini, K. 2022; 12 (1)

    Abstract

    We evaluated the prognostic value of sarcopenia, low precardial adipose tissue (PAT), and high tumor volume for the outcome of surgically treated pleural mesothelioma (PM). From 2005 to 2020, consecutive surgically treated PM patients with a pre-operative computed tomography (CT) scan were retrospectively included. Sarcopenia was assessed by CT-based parameters measured at the level of the fifth thoracic vertebra (TH5), excluding fatty infiltration based on CT attenuation. The findings were stratified for gender, and a threshold of the 33rd percentile was set to define sarcopenia. Additionally, tumor volume as well as PAT were measured. The findings were correlated with progression-free survival and long-term mortality. Two hundred seventy-eight PM patients (252 male; 70.2 ± 9 years) were included. The mean progression-free survival was 18.6 ± 12.2 months, and the mean survival time was 23.3 ± 24 months. Progression was associated with chronic obstructive pulmonary disease (COPD) (p < 0.001), tumor stage (p = 0.001), and type of surgery (p = 0.026). Three-year mortality was associated with higher patient age (p = 0.005), presence of COPD (p < 0.001), higher tumor stage (p = 0.015), and higher tumor volume (p < 0.001). Kaplan-Meier statistics showed that sarcopenic patients have a higher three-year mortality (p = 0.002). While there was a negative correlation of progression-free survival and mortality with tumor volume (r = 0.281, p = 0.001 and r = -0.240, p < 0.001, respectively), a correlation with PAT could only be shown for epithelioid PM (p = 0.040). Sarcopenia as well as tumor volume are associated with long-term mortality in surgically treated PM patients. Further, while there was a negative correlation of progression-free survival and mortality with tumor volume, a correlation with PAT could only be shown for epithelioid PM.

    View details for DOI 10.3390/diagnostics12010099

    View details for Web of Science ID 000747318800001

    View details for PubMedID 35054268

    View details for PubMedCentralID PMC8774409

  • Simplified image acquisition and detection of ischemic and non-ischemic myocardial fibrosis with fixed short inversion time magnetic resonance late gadolinium enhancement BRITISH JOURNAL OF RADIOLOGY Polacin, M., Karolyi, M., Bluthgen, C., Pilz, N., Eberhard, M., Alkadhi, H., Kozerke, S., Manka, R. 2022; 95 (1133): 20210966

    Abstract

    Late gadolinium enhancement with fixed short inversion time (LGEshort) provides excellent tissue contrast with dark scar and bright blood pool and does not need prior myocardial nulling. We hypothesize better visibility of ischemic scars and equal visibility of non-ischemic LGE in LGEshort compared to the clinically established LGE (LGEstandard). LGEshort and LGEstandard were retrospectively evaluated in 179 patients (3043 segments) with suspected or known coronary artery disease by four blinded readers (reader A: most experienced, to reader D: least experienced). The amount of ischemic and non-ischemic LGE as well as the visibility (4: very good, to 1: poor) of ischemic LGE was visually assessed. All readers detected more infarcted segments in LGEshort compared to LGEstandard (378 segments reported as infarcted; A: p = 0.5, B: p = 0.8, C, D: p = 0.03). Scar visibility was scored higher in LGEshort by all readers (A, B: p = 0.03; C, D: p = 0.02), especially for subendocardial infarcts (A, B: p = 0.04; C, D: p = 0.02). Less experienced readers detected significantly more infarcted papillary muscles (C: p = 0.02, D: p = 0.03) in a shorter reading time in LGEshort (C: p = 0.04, D: p = 0.02). Non-ischemic LGE was equally visible in both sequences (A: p = 0.9, B: p = 0.8, C, D: p = 0.6). LGEshort detects more ischemic LGE with improved scar visibility compared to LGEstandard, independent of experience level. The visibility of non-ischemic LGE is equivalent to LGEstandard. Less experienced readers can diagnose ischemic and non-ischemic LGE faster in LGEshort. LGEshort, with its maximal operational simplicity, can be used for the visualization of all types of fibrosis, ischemic and non-ischemic, instead of LGEstandard, independent of experience level.

    View details for DOI 10.1259/bjr.20210966

    View details for Web of Science ID 000850694500016

    View details for PubMedID 35195448

  • Computed tomography-based radiomics decodes prognostic and molecular differences in interstitial lung disease related to systemic sclerosis. The European respiratory journal Schniering, J., Maciukiewicz, M., Gabrys, H. S., Brunner, M., Bluthgen, C., Meier, C., Braga-Lagache, S., Uldry, A., Heller, M., Guckenberger, M., Fretheim, H., Nakas, C. T., Hoffmann-Vold, A., Distler, O., Frauenfelder, T., Tanadini-Lang, S., Maurer, B. 2021

    Abstract

    BACKGROUND: Radiomic features calculated from routine medical images show great potential for personalized medicine in cancer. Patients with systemic sclerosis (SSc), a rare, multi-organ autoimmune disorder, have a similarly poor prognosis due to interstitial lung disease (ILD). OBJECTIVES: To explore computed tomography (CT)-based high-dimensional image analysis (radiomics) for disease characterisation, risk stratification, and relaying information on lung pathophysiology in SSc-ILD. METHODS: We investigated two independent, prospectively followed SSc-ILD cohorts (Zurich, derivation cohort, n=90; Oslo, validation cohort, n=66). For every subject, we defined 1355 robust radiomic features from standard-of-care CT images. We performed unsupervised clustering to identify and characterize imaging-based patient clusters. A clinically applicable prognostic quantitative radiomic risk score (qRISSc) for progression-free survival was derived from radiomic profiles using supervised analysis. The biological basis of qRISSc was assessed in a cross-species approach by correlation with lung proteomics, histological and gene expression data derived from mice with bleomycin-induced lung fibrosis. RESULTS: Radiomic profiling identified two clinically and prognostically distinct SSc-ILD patient clusters. To evaluate the clinical applicability, we derived and externally validated a binary, quantitative radiomic risk score composed of 26 features, qRISSc, that accurately predicted progression-free survival and significantly improved upon clinical risk stratification parameters in multivariable Cox regression analyses in the pooled cohorts. A high qRISSc score, which identifies patients at risk for progression, was reverse translatable from human to experimental ILD and correlated with fibrotic pathway activation. CONCLUSIONS: Radiomics-based risk stratification using routine CT images provides complementary phenotypic, clinical and prognostic information significantly impacting clinical decision-making in SSc-ILD.

    View details for DOI 10.1183/13993003.04503-2020

    View details for PubMedID 34649979

  • First Performance Evaluation of an Artificial Intelligence-Based Computer-Aided Detection System for Pulmonary Nodule Evaluation in Dual-Source Photon-Counting Detector CT at Different Low-Dose Levels. Investigative radiology Jungblut, L., Bluthgen, C., Polacin, M., Messerli, M., Schmidt, B., Euler, A., Alkadhi, H., Frauenfelder, T., Martini, K. 2021

    Abstract

    OBJECTIVE: The aim of this study was to evaluate the image quality (IQ) and performance of an artificial intelligence (AI)-based computer-aided detection (CAD) system in photon-counting detector computed tomography (PCD-CT) for pulmonary nodule evaluation at different low-dose levels. MATERIALS AND METHODS: An anthropomorphic chest phantom containing 14 pulmonary nodules of different sizes (range, 3-12 mm) was imaged on a PCD-CT and on a conventional energy-integrating detector CT (EID-CT). Scans were performed with each of the 3 vendor-specific scanning modes (QuantumPlus [Q+], Quantum [Q], and High Resolution [HR]) at decreasing matched radiation dose levels (volume computed tomography dose index ranging from 1.79 to 0.31 mGy) by adapting IQ levels from 30 to 5. Image noise was measured manually in the chest wall at 8 different locations. Subjective IQ was evaluated by 2 readers in consensus. Nodule detection and volumetry were performed using a commercially available AI-CAD system. RESULTS: Subjective IQ was superior in PCD-CT compared with EID-CT (P < 0.001), and objective image noise was similar in the Q+ and Q modes (P > 0.05) and superior in the HR mode (PCD 55.8 ± 11.7 HU vs EID 74.8 ± 5.4 HU; P = 0.01). The HR mode showed the lowest image noise values among the PCD modes (P = 0.01). Overall, the AI-CAD system delivered comparable results for lung nodule detection and volumetry between PCD-CT and dose-matched EID-CT (P = 0.08-1.00), with a mean sensitivity of 95% for PCD-CT and of 86% for dose-matched EID-CT at the lowest evaluated dose level (IQ5). The Q+ and Q modes showed higher false-positive rates than EID-CT at lower dose levels (IQ10 and IQ5). The HR mode showed a sensitivity of 100% with a false-positive rate of 1 even at the lowest evaluated dose level (IQ5; CTDIvol, 0.41 mGy). CONCLUSIONS: Photon-counting detector CT was superior to dose-matched EID-CT in subjective IQ while showing comparable or lower objective image noise. Fully automated AI-aided nodule detection and volumetry are feasible in PCD-CT, but attention has to be paid to false-positive findings.

    View details for DOI 10.1097/RLI.0000000000000814

    View details for PubMedID 34324462

  • Impact of Vessel Suppressed-CT on Diagnostic Accuracy in Detection of Pulmonary Metastasis and Reading Time ACADEMIC RADIOLOGY Martini, K., Bluethgen, C., Eberhard, M., Schoenenberger, A. N., De Martini, Huber, F. A., Barth, B. K., Euler, A., Frauenfelder, T. 2021; 28 (7): 988-994

    Abstract

    To assess whether vessel suppression (VS) improves nodule detection rate and interreader agreement, and reduces reading time, in oncologic chest computed tomography (CT). One hundred consecutive oncologic patients (65 male; median age 60 y) who underwent contrast-enhanced chest CT were retrospectively included. For all exams, additional VS series (ClearRead CT, Riverain Technologies, Miamisburg) were reconstructed. Two groups of three radiologists each, with matched experience, were defined. Each group evaluated the SD-CT as well as the VS-CT. Each reader marked the presence, size, and position of pulmonary nodules and documented reading time. In addition, for the VS-CT the presence of false-positive nodules had to be stated. Cohen's kappa (k) was used to calculate the interreader agreement between groups. Reading time was compared using the paired t-test. Nodule detection rate was significantly higher in VS-CT compared to SD-CT (+21%; p < 0.001). Interreader agreement was higher in the VS-CT (k = 0.431, moderate agreement) compared to the SD-CT (k = 0.209, fair agreement). Almost all VS-CT series had false-positive findings (97-99 out of 100). Average reading time was significantly shorter in the VS-CT compared to the SD-CT (154 ± 134 s vs. 194 ± 126 s; 21% reduction, p < 0.001). Vessel suppression increases nodule detection rate, improves interreader agreement, and reduces reading time in chest CT of oncologic patients. Due to false-positive results, a consensus reading with the SD-CT is essential.
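
    A minimal illustration (not the study's code) of the Cohen's kappa agreement statistic referenced above, using scikit-learn; the two rating vectors are hypothetical per-case nodule calls from two reader groups.

        from sklearn.metrics import cohen_kappa_score

        # Hypothetical per-case ratings (1 = nodule reported, 0 = no nodule)
        group_1 = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
        group_2 = [1, 0, 0, 1, 0, 1, 1, 0, 1, 1]

        kappa = cohen_kappa_score(group_1, group_2)
        print(f"Cohen's kappa = {kappa:.3f}")  # chance-corrected interreader agreement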

    View details for DOI 10.1016/j.acra.2020.01.014

    View details for Web of Science ID 000669231400018

    View details for PubMedID 32037256

  • Lung Nodules in Melanoma Patients: Morphologic Criteria to Differentiate Non-Metastatic and Metastatic Lesions DIAGNOSTICS Stadelmann, S., Bluethgen, C., Milanese, G., Nguyen-Kim, T., Maul, J., Dummer, R., Frauenfelder, T., Eberhard, M. 2021; 11 (5)

    Abstract

    Lung nodules are frequent findings in chest computed tomography (CT) in patients with metastatic melanoma. In this study, we assessed the frequency and compared morphologic differences of metastases and benign nodules. We retrospectively evaluated 85 patients with melanoma (AJCC stage III or IV). Inclusion criteria were ≤20 lung nodules and follow-up using CT ≥183 days after baseline. Lung nodules were evaluated for size and morphology. Nodules with significant growth, nodule regression in line with RECIST assessment or histologic confirmation were judged to be metastases. A total of 438 lung nodules were evaluated, of which 68% were metastases. At least one metastasis was found in 78% of patients. A 10 mm diameter cut-off (used for RECIST) showed a specificity of 95% and a sensitivity of 20% for diagnosing metastases. Central location (n = 122) was more common in metastatic nodules (p = 0.009). Subsolid morphology (n = 53) was more frequent (p < 0.001), and calcifications (n = 13) were solely found in non-metastatic lung nodules (p < 0.001). Our data show that lung nodules are prevalent in about two-thirds of melanoma patients (AJCC stage III/IV) and the majority are metastases. Even though we found a few morphologic indicators for metastatic or non-metastatic lung nodules, morphology has limited value to predict the presence of lung metastases.

    View details for DOI 10.3390/diagnostics11050837

    View details for Web of Science ID 000653820400001

    View details for PubMedID 34066913

    View details for PubMedCentralID PMC8148527

  • Computed tomography radiomics for the prediction of thymic epithelial tumor histology, TNM stage and myasthenia gravis. PloS one Bluthgen, C., Patella, M., Euler, A., Baessler, B., Martini, K., von Spiczak, J., Schneiter, D., Opitz, I., Frauenfelder, T. 2021; 16 (12): e0261401

    Abstract

    OBJECTIVES: To evaluate CT-derived radiomics for machine learning-based classification of thymic epithelial tumor (TET) stage (TNM classification), histology (WHO classification) and the presence of myasthenia gravis (MG). METHODS: Patients with histologically confirmed TET in the years 2000-2018 were retrospectively included, excluding patients with incompatible imaging or other tumors. CT scans were reformatted uniformly; gray values were normalized and discretized. Tumors were segmented manually; 15 scans were re-segmented after 2 weeks by two readers. 1316 radiomic features were calculated (pyRadiomics). Features with low intra-/inter-reader agreement (ICC < 0.75) were excluded. Repeated nested cross-validation was used for feature selection (Boruta algorithm), model training, and evaluation (out-of-fold predictions). Shapley additive explanation (SHAP) values were calculated to assess feature importance. RESULTS: 105 patients undergoing surgery for TET were identified. After applying exclusion criteria, 62 patients (28 female; mean age, 57 ± 14 years; range, 22-82 years) with 34 low-risk TET (LRT; WHO types A/AB/B1) and 28 high-risk TET (HRT; WHO B2/B3/C) in early stage (49, TNM stage I-II) or advanced stage (13, TNM III-IV) were included. 14 (23%) of the patients had MG. 334 (25%) features were excluded after intra-/inter-reader analysis. Discriminatory performance of the random forest classifiers was good for histology (AUC, 87.6%; 95% confidence interval, 76.3-94.3) and TNM stage (AUC, 83.8%; 95% CI, 66.9-93.4) but poor for the prediction of MG (AUC, 63.9%; 95% CI, 44.8-79.5). CONCLUSIONS: CT-derived radiomic features may be a useful imaging biomarker for TET histology and TNM stage.
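
    A minimal sketch (not the study's pipeline) of radiomic feature extraction with pyRadiomics as referenced above; the synthetic image, mask and bin width are illustrative placeholders, not the paper's data or settings.

        import numpy as np
        import SimpleITK as sitk
        from radiomics import featureextractor

        # Synthetic stand-ins for a CT volume and a manual tumor segmentation (hypothetical data)
        rng = np.random.default_rng(0)
        img = sitk.GetImageFromArray(rng.normal(0, 50, size=(32, 64, 64)).astype(np.float32))
        mask_arr = np.zeros((32, 64, 64), dtype=np.uint8); mask_arr[10:20, 20:40, 20:40] = 1
        mask = sitk.GetImageFromArray(mask_arr)

        # Gray-value discretization via binWidth; other settings left at defaults
        extractor = featureextractor.RadiomicsFeatureExtractor(binWidth=25)
        features = extractor.execute(img, mask)

        # Keep only the numeric radiomic features (drop the diagnostics metadata keys)
        radiomic = {k: v for k, v in features.items() if not k.startswith("diagnostics")}
        print(f"{len(radiomic)} radiomic features extracted")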

    View details for DOI 10.1371/journal.pone.0261401

    View details for PubMedID 34928978

  • Accuracy of Conventional and Machine Learning Enhanced Chest Radiography for the Assessment of COVID-19 Pneumonia: Intra-Individual Comparison with CT JOURNAL OF CLINICAL MEDICINE Martini, K., Bluethgen, C., Walter, J. E., Messerli, M., Nguyen-Kim, T., Frauenfelder, T. 2020; 9 (11)

    Abstract

    To evaluate the diagnostic accuracy of conventional radiography (CXR) and machine learning enhanced CXR (mlCXR) for the detection and quantification of disease extent in COVID-19 patients compared to chest CT. Real-time polymerase chain reaction (rt-PCR)-confirmed COVID-19 patients undergoing CXR from March to April 2020, together with COVID-19-negative patients as a control group, were retrospectively included. Two independent readers assessed CXR and mlCXR images for the presence, disease extent and type (consolidation vs. ground-glass opacities (GGOs)) of COVID-19 pneumonia. Further, readers had to assign confidence levels to their diagnosis. CT obtained ≤ 36 h from acquisition of CXR served as the standard of reference. Inter-reader agreement, as well as sensitivity for the detection and disease extent of COVID-19 pneumonia compared to CT, was calculated. The McNemar test was used to test for significant differences. Sixty patients (21 females; median age 61 years, range 38-81 years) were included. Inter-reader agreement improved from good to excellent when mlCXR instead of CXR was used (k = 0.831 vs. k = 0.742). Sensitivity for pneumonia detection improved from 79.5% to 92.3%, however at the cost of specificity (100% vs. 71.4%; p = 0.031). Overall, sensitivity for the detection of consolidation was higher than for GGO (37.5% vs. 70.4%, respectively). No differences could be found in disease extent estimation between mlCXR and CXR, even though the detection of GGO could be improved. Diagnostic confidence was better on mlCXR compared to CXR (p = 0.013). In line with the current literature, the sensitivity for detection and quantification of COVID-19 pneumonia was moderate with CXR and could be improved when mlCXR was used for image interpretation.
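
    A minimal sketch (not the study's code) of the McNemar test mentioned above, comparing paired detection results of two readings on the same cases with statsmodels; the 2x2 table values are hypothetical.

        import numpy as np
        from statsmodels.stats.contingency_tables import mcnemar

        # Hypothetical paired 2x2 table of per-patient detection:
        # rows = CXR (detected / missed), columns = mlCXR (detected / missed)
        table = np.array([[30, 1],
                          [6, 23]])

        result = mcnemar(table, exact=True)   # exact binomial test on the discordant pairs
        print(f"statistic = {result.statistic}, p = {result.pvalue:.4f}")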

    View details for DOI 10.3390/jcm9113576

    View details for Web of Science ID 000593269200001

    View details for PubMedID 33171999

    View details for PubMedCentralID PMC7694629

  • Sarcopenia as independent risk factor of postpneumonectomy respiratory failure, ARDS and mortality LUNG CANCER Martini, K., Chassagnon, G., Fournel, L., Prieto, M., Trieu-Nghi Hoang-Thi, Halm, N., Bobbio, A., Revel, M., Alifano, M. 2020; 149: 130-136

    Abstract

    Sarcopenia is associated with poor outcome in cancer patients. However, the methods to define sarcopenia are not entirely standardized. We compared several morphometric measurements of sarcopenia and their prognostic value in short-term outcome prediction after pneumonectomy. Consecutive lung cancer patients undergoing pneumonectomy from January 2007 to December 2015 and having a pre-operative computed tomography (CT) scan were retrospectively included. Sarcopenia was assessed by the following CT-based parameters measured at the level of the third lumbar vertebra: cross-sectional Total Psoas Area (TPA), cross-sectional Total Muscle Area (TMA), and Total Parietal Muscle Area (TPMA), defined as TMA without TPA. Measures were obtained for the entire muscle surface, as well as by excluding fatty infiltration based on CT attenuation. Findings were stratified for gender, and a threshold of the 33rd percentile was set to define sarcopenia. Acute Respiratory Failure (ARF), Acute Respiratory Distress Syndrome (ARDS), and 30-day mortality were assessed as parameters of short-term outcome. Two hundred thirty-four patients with pneumonectomy (right, n = 107; left, n = 127) were analysed. The postoperative mortality rate was 9.0% (21/234), 17.1% of patients (40/234) experienced ARF requiring re-intubation, and 10.3% (24/234) had ARDS. All parameters describing sarcopenia gave significant results; the best discriminating parameter was TMA after excluding fat (p < 0.001). While right-sided pneumonectomy and sarcopenia were independently associated with the three short-term outcome parameters, the Charlson Comorbidity Index only independently predicted ARF. Sarcopenia, defined as the sex-related 33rd percentile of fat-excluded TMA at the level of the third lumbar vertebra, is the most discriminating parameter to assess short-term outcome in patients undergoing pneumonectomy.
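
    A minimal sketch (not the study's code) of the kind of measurement described above: a fat-excluded muscle area from a labeled CT slice restricted to a skeletal-muscle attenuation window, followed by a sex-stratified 33rd-percentile sarcopenia flag. The HU window (-29 to 150 HU is a commonly used skeletal-muscle range), pixel spacing, and data are assumptions, not the paper's exact values.

        import numpy as np
        import pandas as pd

        def fat_excluded_muscle_area(hu_slice, muscle_mask, pixel_spacing_mm, hu_range=(-29, 150)):
            """Muscle area (cm^2) within the mask, keeping only pixels inside the HU window."""
            lo, hi = hu_range
            kept = muscle_mask & (hu_slice >= lo) & (hu_slice <= hi)
            return kept.sum() * (pixel_spacing_mm[0] * pixel_spacing_mm[1]) / 100.0

        # Hypothetical axial slice (HU values) and a rough muscle mask
        rng = np.random.default_rng(1)
        hu = rng.normal(20, 60, size=(512, 512))
        mask = np.zeros((512, 512), dtype=bool); mask[200:300, 100:400] = True
        area = fat_excluded_muscle_area(hu, mask, pixel_spacing_mm=(0.8, 0.8))
        print(f"fat-excluded muscle area: {area:.1f} cm^2")

        # Sex-stratified 33rd percentile threshold -> sarcopenia flag (hypothetical cohort)
        cohort = pd.DataFrame({"sex": ["m", "m", "m", "f", "f", "f"],
                               "tma_cm2": [120.0, 95.0, 140.0, 85.0, 70.0, 100.0]})
        thr = cohort.groupby("sex")["tma_cm2"].quantile(0.33)
        cohort["sarcopenic"] = cohort.apply(lambda r: r["tma_cm2"] < thr[r["sex"]], axis=1)
        print(cohort)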

    View details for DOI 10.1016/j.lungcan.2020.09.009

    View details for Web of Science ID 000579504300019

    View details for PubMedID 33011374

  • Brown fat does not cause cachexia in cancer patients: A large retrospective longitudinal FDG-PET/CT cohort study PLOS ONE Becker, A. S., Zellweger, C., Bacanovic, S., Franckenberg, S., Nagel, H. W., Frick, L., Schawkat, K., Eberhard, M., Bluethgen, C., Volbracht, J., Moos, R., Wolfrum, C., Burger, I. A. 2020; 15 (10): e0239990

    Abstract

    Brown adipose tissue (BAT) is a specialized form of adipose tissue, able to increase energy expenditure by heat generation in response to various stimuli. Recently, its pathological activation has been implicated in the pathogenesis of cancer cachexia. To establish a causal relationship, we retrospectively investigated the longitudinal changes in BAT and cancer in a large FDG-PET/CT cohort. We retrospectively analyzed 13,461 FDG-PET/CT examinations of n = 8,409 patients at our institution from the winter months of 2007-2015. We graded the activation strength of BAT based on the anatomical location of the most caudally activated BAT depot into three tiers, and the stage of the cancer into five general grades. We validated the cancer grading by an interreader analysis and correlation with histopathological stage. Ambient temperature data (seven-day average before the examination) were obtained from a meteorological station close to the hospital. Changes of BAT, cancer, body mass index (BMI) and temperature between the different examinations were examined with Spearman's test and a mixed linear model for correlation, and with a causal inference algorithm for causality. We found n = 283 patients with at least two examinations and active BAT in at least one of them. There was no significant interaction between the changes in BAT activation, cancer burden or BMI. Temperature changes exhibited a strong negative correlation with BAT activity (ϱ = -0.57, p < 0.00001). These results were confirmed with the mixed linear model. Causal inference revealed a link of Temperature ➜ BAT in all subjects and also of BMI ➜ BAT in subjects who had lost weight and increased cancer burden, but no role of cancer and no causal links of BAT ➜ BMI. Our data did not confirm the hypothesis that BAT plays a major role in cancer-mediated weight loss. Temperature changes are the main driver of incidental BAT activity on FDG-PET scans.
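
    A minimal illustration (not the study's code) of the Spearman rank correlation used above, here between a seven-day mean ambient temperature and a BAT activation grade; the values are hypothetical.

        from scipy import stats

        # Hypothetical paired observations: ambient temperature (°C) and BAT grade (0-3)
        temperature_c = [2.1, 5.4, 8.0, 11.3, 14.7, 18.2, 21.5, 24.9]
        bat_grade     = [3,   3,   2,   2,    1,    1,    0,    0]

        rho, p_value = stats.spearmanr(temperature_c, bat_grade)
        print(f"Spearman rho = {rho:.2f}, p = {p_value:.4f}")   # expect a strong negative correlation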

    View details for DOI 10.1371/journal.pone.0239990

    View details for Web of Science ID 000581809800092

    View details for PubMedID 33031379

    View details for PubMedCentralID PMC7544086

  • Applicability of radiomics in interstitial lung disease associated with systemic sclerosis: proof of concept EUROPEAN RADIOLOGY Martini, K., Baessler, B., Bogowicz, M., Bluthgen, C., Mannil, M., Tanadini-Lang, S., Schniering, J., Maurer, B., Frauenfelder, T. 2021; 31 (4): 1987-1998

    Abstract

    To retrospectively evaluate if texture-based radiomics features are able to detect interstitial lung disease (ILD) and to distinguish between the different disease stages in patients with systemic sclerosis (SSc) in comparison with mere visual analysis of high-resolution computed tomography (HRCT). Sixty patients (46 females, median age 56 years) with SSc who underwent HRCT of the thorax were retrospectively analyzed. Visual analysis was performed by two radiologists for the presence of ILD features. Gender, age, and pulmonary function (GAP) stage was calculated from clinical data (gender, age, pulmonary function test). Data augmentation was performed and the balanced dataset was split into a training (70%) and a testing dataset (30%). For selecting variables that allow classification of the GAP stage, single and multiple logistic regression models were fitted and compared by using the Akaike information criterion (AIC). Diagnostic accuracy was evaluated from the area under the curve (AUC) from receiver operating characteristic (ROC) analyses, and diagnostic sensitivity and specificity were calculated. Values for some radiomics features were significantly lower (p < 0.05) and those of other radiomics features were significantly higher (p = 0.001) in patients with GAP2 compared with those in patients with GAP1. The combination of two specific radiomics features in a multivariable model resulted in the lowest AIC of 10.73 with an AUC of 0.96, 84% sensitivity, and 99% specificity. Visual assessment of fibrosis was inferior in predicting individual GAP stages (AUC 0.86; 83% sensitivity; 74% specificity). The correlation of radiomics with GAP stage, but not with the visually defined features of ILD-HRCT, implies that radiomics might capture features indicating severity of SSc-ILD on HRCT which are not recognized by visual analysis. • Radiomics features can predict GAP stage with a sensitivity of 84% and a specificity of almost 100%. • Extent of fibrosis on HRCT and a combined model of different visual HRCT-ILD features perform worse in predicting GAP stage. • The correlation of radiomics with GAP stage, but not with the visually defined features of ILD-HRCT, implies that radiomics might capture features on HRCT which are not recognized by visual analysis.
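
    A minimal sketch (not the study's model) of fitting logistic regression models and comparing them by the Akaike information criterion with statsmodels, as described above; the feature values and labels are synthetic placeholders.

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(0)

        # Synthetic stand-ins: two radiomics features and a binary GAP stage label
        n = 60
        x1 = rng.normal(size=n)
        x2 = rng.normal(size=n)
        y = (0.8 * x1 + 1.2 * x2 + rng.normal(scale=1.0, size=n) > 0).astype(int)

        # Single-feature vs. two-feature model, compared by AIC (lower is better)
        m1 = sm.Logit(y, sm.add_constant(np.column_stack([x1]))).fit(disp=0)
        m2 = sm.Logit(y, sm.add_constant(np.column_stack([x1, x2]))).fit(disp=0)
        print(f"AIC single feature: {m1.aic:.2f}, AIC two features: {m2.aic:.2f}")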

    View details for DOI 10.1007/s00330-020-07293-8

    View details for Web of Science ID 000575746000001

    View details for PubMedID 33025174

    View details for PubMedCentralID PMC7979612

  • Comparison of the PI-RADS 2.1 scoring system to PI-RADS 2.0: Impact on diagnostic accuracy and inter-reader agreement PLOS ONE Hotker, A. M., Bluthgen, C., Rupp, N. J., Schneider, A. F., Eberli, D., Donati, O. F. 2020; 15 (10): e0239975

    Abstract

    To assess the value of the PI-RADS 2.1 scoring system in the detection of prostate cancer on multiparametric MRI in comparison to the standard PI-RADS 2.0 system and to assess its inter-reader variability. This IRB-approved study included 229 patients undergoing multiparametric prostate MRI prior to MRI-guided TRUS-based biopsy, who were retrospectively recruited from our prospectively maintained institutional database. Two readers with a high (reader 1, 6 years) and a low (reader 2, 2 years) level of expertise identified the lesion with the highest PI-RADS score for both version 2.0 and 2.1 for each patient. Inter-reader agreement was estimated, and diagnostic accuracy analysis was performed. Inter-reader agreement on PI-RADS scores was fair for both version 2.0 (kappa: 0.57) and 2.1 (kappa: 0.51). Detection rates for prostate cancer (PCa) and clinically significant prostate cancer (csPCa) were almost identical for both PI-RADS versions and higher for the more experienced reader (AUC, reader 1: PCa, 0.881-0.887, csPCa, 0.874-0.879; reader 2: PCa, 0.765, csPCa, 0.746-0.747; both p > 0.05), both when using a PI-RADS score of ≥4 and of ≥3 as the indicator of positivity for cancer. The new PI-RADS 2.1 scoring system showed comparable diagnostic performance and inter-reader variability compared to version 2.0. The changes introduced in version 2.1 seem only to take effect in a very small number of patients.

    View details for DOI 10.1371/journal.pone.0239975

    View details for Web of Science ID 000578470500043

    View details for PubMedID 33017413

    View details for PubMedCentralID PMC7535021

  • Patterns of organizing pneumonia and microinfarcts as surrogate for endothelial disruption and microangiopathic thromboembolic events in patients with coronavirus disease 2019 PLOS ONE Martini, K., Bluthgen, C., Walter, J., Nguyen-Kim, T., Thienemann, F., Frauenfelder, T. 2020; 15 (10): e0240078

    Abstract

    To evaluate chest computed tomography (CT) scans in coronavirus disease 2019 (COVID-19) patients for signs of organizing pneumonia (OP) and microinfarction as a surrogate for microscopic thromboembolic events. Real-time polymerase chain reaction (RT-PCR)-confirmed COVID-19 patients undergoing chest CT (non-enhanced, enhanced, pulmonary angiography [CT-PA]) from March-April 2020 were retrospectively included (COVID-19 cohort). The control groups comprised 175 patients from 2020 (cohort-2020) and 157 patients from 2019 (cohort-2019) undergoing CT-PA for pulmonary embolism (PE) during the respective time frame at our institution. Two independent readers assessed the presence and location of PE in all three cohorts. In COVID-19 patients, parenchymal changes typical of COVID-19 pneumonia, infarct pneumonia and OP were additionally assessed. Inter-reader agreement and the prevalence of PE in the different cohorts were calculated. Of 68 COVID-19 patients (42 female [61.8%], median age 59 years [range 32-89]) undergoing chest CT, 38 obtained CT-PA. Inter-reader agreement was good (k = 0.781). On CT-PA, 13.2% of COVID-19 patients presented with PE, whereas in the control groups the prevalence of PE was 9.1% and 8.9%, respectively (p = 0.452). Up to 50% of COVID-19 patients showed changes typical of OP. 21.1% of COVID-19 patients suspected of PE showed subpleural wedge-shaped consolidation resembling infarct pneumonia, while only 13.2% showed visible filling defects of the pulmonary artery branches on CT-PA. Despite the reported hypercoagulability in critically ill patients with COVID-19, we did not encounter a higher prevalence of PE in our patient cohort compared to the control cohorts. However, patients with suspected PE showed a higher prevalence of lung changes resembling patterns of infarct pneumonia or OP and CT signs of pulmonary artery hypertension.

    View details for DOI 10.1371/journal.pone.0240078

    View details for Web of Science ID 000578470500009

    View details for PubMedID 33017451

    View details for PubMedCentralID PMC7535037

  • Comparison of 3D and 2D late gadolinium enhancement magnetic resonance imaging in patients with acute and chronic myocarditis INTERNATIONAL JOURNAL OF CARDIOVASCULAR IMAGING Polacin, M., Kapos, I., Gastl, M., Bluethgen, C., Karolyi, M., von Spiczak, J., Eberhard, M., Baessler, B., Alkadhi, H., Kozerke, S., Manka, R. 2021; 37 (1): 305-313

    Abstract

    We compared a fast, single breath-hold three-dimensional LGE sequence (3D LGE) with an established two-dimensional multi-breath-hold sequence (2D LGE) and evaluated image quality and the amount of myocardial fibrosis in patients with acute and chronic myocarditis. 3D LGE and 2D LGE (both spatial resolution 1.5 × 1.5 mm2, slice thickness 8 mm, field of view 350 × 350 mm2) were acquired in 25 patients with acute myocarditis (mean age 40 ± 18 years, 7 female) and 27 patients with chronic myocarditis (mean age 44 ± 22 years, 9 female) on a 1.5 T MR system. Image quality was evaluated by two independent, blinded readers using a 5-point Likert scale. Total myocardial mass, fibrotic mass and total fibrotic tissue percentage were quantified for both sequences in both groups. There was no significant difference in image quality between 3D and 2D acquisitions in patients with acute (p = 0.8) and chronic (p = 0.5) myocarditis. No significant differences between 3D and 2D acquisitions could be shown for myocardial mass (acute p = 0.2; chronic p = 0.3), fibrous tissue mass (acute p = 0.7; chronic p = 0.1) and total fibrous percentage (acute p = 0.4 and chronic p = 0.2). Inter-observer agreement was substantial to almost perfect. Acquisition time was significantly shorter for 3D LGE (24 ± 5 s) compared to 2D LGE (350 ± 58 s, p < 0.001). In patients with acute and chronic myocarditis, 3D LGE imaging shows diagnostic quality equal to standard 2D LGE imaging but with significantly reduced acquisition time.

    View details for DOI 10.1007/s10554-020-01966-7

    View details for Web of Science ID 000559424600001

    View details for PubMedID 32793996

    View details for PubMedCentralID PMC7878221

  • Deep learning based detection of intracranial aneurysms on digital subtraction angiography: A feasibility study NEURORADIOLOGY JOURNAL Hainc, N., Mannil, M., Anagnostakou, V., Alkadhi, H., Bluthgen, C., Wacht, L., Bink, A., Husain, S., Kulcsar, Z., Winklhofer, S. 2020; 33 (4): 311-317

    Abstract

    Digital subtraction angiography is the gold standard for detecting and characterising aneurysms. Here, we assess the feasibility of commercial-grade deep learning software for the detection of intracranial aneurysms on whole-brain anteroposterior and lateral 2D digital subtraction angiography images. Seven hundred and six digital subtraction angiography images were included from a cohort of 240 patients (157 female, mean age 59 years, range 20-92; 83 male, mean age 55 years, range 19-83). Three hundred and thirty-five (47%) single-frame anteroposterior and lateral images of a digital subtraction angiography series of 187 aneurysms (41 ruptured, 146 unruptured; average size 7 ± 5.3 mm, range 1-5 mm; total 372 depicted aneurysms) and 371 (53%) aneurysm-negative study images were retrospectively analysed regarding the presence of intracranial aneurysms. The 2D data were split into testing and training sets in a ratio of 4:1 with 3D rotational digital subtraction angiography as the gold standard. Supervised deep learning was performed using commercial-grade machine learning software (Cognex, ViDi Suite 2.0). Monte Carlo cross-validation was performed. Intracranial aneurysms were detected with a sensitivity of 79%, a specificity of 79%, a precision of 0.75, an F1 score of 0.77, and a mean area under the curve of 0.76 (range 0.68-0.86) after Monte Carlo cross-validation, run 45 times. The commercial-grade deep learning software allows for the detection of intracranial aneurysms on whole-brain, 2D anteroposterior and lateral digital subtraction angiography images, with results comparable to more specifically engineered deep learning techniques.

    View details for DOI 10.1177/1971400920937647

    View details for Web of Science ID 000546402200001

    View details for PubMedID 32633602

    View details for PubMedCentralID PMC7416354

  • "IMAGES ARE MORE THAN PICTURES, THEY ARE DATA" [1] - EXPLORATION OF RADIOMICS ANALYSIS FOR SYSTEMIC SCLEROSIS-ASSOCIATED INTERSTITIAL LUNG DISEASE Schniering, J., Maciukiewicz, M., Gabrys, H., Brunner, M., Bluthgen, C., Distler, O., Guckenberger, M., Frauenfelder, T., Tanadini-Lang, S., Maurer, B. BMJ PUBLISHING GROUP. 2020: 1238-1239
  • Detection and localization of distal radius fractures: Deep learning system versus radiologists EUROPEAN JOURNAL OF RADIOLOGY Bluethgen, C., Becker, A. S., de Martini, I., Meier, A., Martini, K., Frauenfelder, T. 2020; 126: 108925

    Abstract

    To evaluate a deep learning-based image analysis software for the detection and localization of distal radius fractures. A deep learning system (DLS) was trained on 524 wrist radiographs (166 showing fractures). Performance was tested on an internal (100 radiographs, 42 showing fractures) and an external test set (200 radiographs, 100 showing fractures). Single and combined views of the radiographs were shown to the DLS and three readers. Readers were asked to indicate fracture location with regions of interest (ROI). The DLS yielded scores (range 0-1) and a heatmap. Detection performance was expressed as AUC, sensitivity and specificity at the optimal threshold and compared to the radiologists' performance. Heatmaps were compared to the radiologists' ROIs. The DLS showed excellent performance on the internal test set (AUC 0.93 (95% confidence interval (CI) 0.82-0.98) - 0.96 (0.87-1.00), sensitivity 0.81 (0.58-0.95) - 0.90 (0.70-0.99), specificity 0.86 (0.68-0.96) - 1.0 (0.88-1.0)). DLS performance decreased on the external test set (AUC 0.80 (0.71-0.88) - 0.89 (0.81-0.94), sensitivity 0.64 (0.49-0.77) - 0.92 (0.81-0.98), specificity 0.60 (0.45-0.74) - 0.90 (0.78-0.97)). The radiologists' performance was comparable on internal data (sensitivity 0.71 (0.48-0.89) - 0.95 (0.76-1.0), specificity 0.52 (0.32-0.71) - 0.97 (0.82-1.0)) and better on external data (sensitivity 0.88 (0.76-0.96) - 0.98 (0.89-1.0), specificity 0.66 (0.51-0.79) - 1.0 (0.93-1.0), p < 0.05). In over 90%, the areas of peak activation aligned with the radiologists' annotations. The DLS was able to detect and localize wrist fractures with a performance comparable to radiologists, using only a small dataset for training.
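
    A minimal sketch (not the study's evaluation code) of deriving AUC and the sensitivity/specificity at an optimal (Youden index) threshold from continuous DLS scores with scikit-learn; the labels and scores are synthetic.

        import numpy as np
        from sklearn.metrics import roc_curve, roc_auc_score

        rng = np.random.default_rng(42)

        # Synthetic ground truth (1 = fracture) and continuous DLS scores in [0, 1]
        y_true = rng.integers(0, 2, size=100)
        scores = np.clip(0.6 * y_true + rng.normal(0.2, 0.25, size=100), 0, 1)

        auc = roc_auc_score(y_true, scores)
        fpr, tpr, thresholds = roc_curve(y_true, scores)

        best = np.argmax(tpr - fpr)                 # Youden index J = sensitivity + specificity - 1
        sensitivity, specificity = tpr[best], 1 - fpr[best]
        print(f"AUC = {auc:.2f}, threshold = {thresholds[best]:.2f}, "
              f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")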

    View details for DOI 10.1016/j.ejrad.2020.108925

    View details for Web of Science ID 000525464400014

    View details for PubMedID 32193036

  • Vertical Off-Centering in Reduced Dose Chest-CT: Impact on Effective Dose and Image Noise Values ACADEMIC RADIOLOGY Eberhard, M., Bluethgen, C., Barth, B. K., Frauenfelder, T., Saltybaeva, N., Martini, K. 2020; 27 (4): 508-517

    Abstract

    To assess the effect of vertical off-centering in tube current modulation (TCM) on effective dose and image noise in reduced-dose (RD) chest CT. One hundred consecutive patients (36 female; mean age 56 years) were scanned on a 192-slice CT scanner with a standard-dose (ND) and an RD chest CT protocol using tube current modulation. Image noise was evaluated by placing circular regions of interest in the apical, middle, and lower lung regions. Two independent readers evaluated image quality. The study population was stratified according to patient position in the gantry: positioned in the gantry isocenter (i), higher than the gantry isocenter (ii), and lower than the gantry isocenter (iii). Pearson correlation was used to determine the correlation between effective radiation dose and vertical off-centering. Student's t-test was used to evaluate differences in image noise between groups (i-iii). Mean vertical off-centering was 10.6 mm below the gantry isocenter (range -45.0-27.9 mm). Effective radiation dose varied in a linear trend, with the highest doses noted below the gantry isocenter, and the lowest doses noted above the gantry isocenter (ND: r = -0.296; p = 0.003 - RD: r = -0.258; p = 0.010). The lowest image noise was observed where patients were positioned below the gantry isocenter, and the highest in patients positioned above (ND: 79.35 HU vs. 94.86 HU - RD: 143.44 HU vs. 160.13 HU). Subjective image quality was not significantly affected by patient position (p > 0.05). Overall, there was no over-proportional noise increase from the ND to the RD protocol in patients who were positioned off-center. Vertical off-centering influences effective radiation dose and image noise in ND and RD protocols. There is no over-proportional noise increase in RD compared to ND protocols when patients are positioned off-center.

    View details for DOI 10.1016/j.acra.2019.07.004

    View details for Web of Science ID 000520893600010

    View details for PubMedID 31358357

  • Deep learning for automatic quantification of lung abnormalities in COVID-19 patients: First experience and correlation with clinical parameters EUROPEAN JOURNAL OF RADIOLOGY OPEN Mergen, V., Kobe, A., Bluethgen, C., Euler, A., Flohr, T., Frauenfelder, T., Alkadhi, H., Eberhard, M. 2020; 7: 100272

    Abstract

    To demonstrate the first experience of a deep learning-based algorithm for automatic quantification of lung parenchymal abnormalities in chest CT of COVID-19 patients and to correlate quantitative results with clinical and laboratory parameters. We retrospectively included 60 consecutive patients (mean age, 61 ± 12 years; 18 females) with proven COVID-19 infection undergoing chest CT between March and May 2020. Clinical and laboratory data (within 24 h before/after chest CT) were recorded. Prototype software using a deep learning algorithm was applied for automatic segmentation and quantification of lung opacities. The percentage of opacity (PO, ground-glass and consolidations) and the percentage of high opacity (PHO, consolidations) were defined as 100 times the volume of segmented abnormalities divided by the volume of the lung mask. Automatic CT analysis of the lung was feasible in all patients (n = 60). The median time to accomplish automatic evaluation was 120 s (IQR: 118-128 s). In four cases (7%), manual corrections were necessary. Patients with a need for mechanical ventilation had a significantly higher PO (median 44%, IQR: 23-58% versus 13%, IQR: 10-24%; p = 0.001) and PHO (median: 11%, IQR: 6-21% versus 3%, IQR: 2-7%, p = 0.002) compared to those without. The PO and PHO moderately correlated with C-reactive protein (r = 0.49-0.60, both p < 0.001) and leucocyte count (r = 0.30-0.40, both p = 0.05). PO had a negative correlation with SO2 (r = -0.50, p = 0.001). Preliminary experience indicates the feasibility of a rapid, automatic quantification tool of lung parenchymal abnormalities in COVID-19 patients using deep learning, with results correlating with laboratory and clinical parameters.
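
    A minimal sketch (not the vendor software) of the percentage-of-opacity metrics defined above, computed from binary segmentation masks; the masks are hypothetical arrays.

        import numpy as np

        def percentage_of_opacity(abnormality_mask: np.ndarray, lung_mask: np.ndarray) -> float:
            """PO/PHO as defined above: 100 * abnormal volume / lung volume (voxel counts)."""
            lung_voxels = lung_mask.sum()
            if lung_voxels == 0:
                raise ValueError("empty lung mask")
            return 100.0 * np.logical_and(abnormality_mask, lung_mask).sum() / lung_voxels

        # Hypothetical 3D masks: lung, all opacities (GGO + consolidation), consolidation only
        lung = np.zeros((64, 64, 64), dtype=bool); lung[8:56, 8:56, 8:56] = True
        opacities = np.zeros_like(lung); opacities[20:40, 20:40, 20:40] = True
        consolidation = np.zeros_like(lung); consolidation[25:35, 25:35, 25:35] = True

        print(f"PO  = {percentage_of_opacity(opacities, lung):.1f}%")
        print(f"PHO = {percentage_of_opacity(consolidation, lung):.1f}%")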

    View details for DOI 10.1016/j.ejro.2020.100272

    View details for Web of Science ID 000600597400061

    View details for PubMedID 33043101

    View details for PubMedCentralID PMC7538094

  • Can Texture Analysis in Ultrashort Echo-Time MRI Distinguish Primary Graft Dysfunction From Acute Rejection in Lung Transplants? A Multidimensional Assessment in a Mouse Model JOURNAL OF MAGNETIC RESONANCE IMAGING Euler, A., Bluthgen, C., Wurnig, M. C., Jungraithmayr, W., Boss, A. 2020; 51 (1): 108-116

    Abstract

    Differentiation of early postoperative complications affects treatment options after lung transplantation. To assess if texture analysis in ultrashort echo-time (UTE) MRI allows distinction of primary graft dysfunction (PGD) from acute transplant rejection (ATR) in a mouse lung transplant model. Longitudinal. Single left lung transplantation was performed in two cohorts of six mice (strain C57BL/6) receiving six syngeneic (strain C57BL/6) and six allogeneic lung transplants (strain BALB/c (H-2Kd)). 4.7 T small-animal MRI/eight different UTE sequences (echo times: 50-5000 μs) at three different postoperative timepoints (1, 3, and 7 days after transplantation). Nineteen different first- and higher-order texture features were computed on multiple axial slices for each combination of UTE and timepoint (24 setups) in each mouse. Texture features were compared for transplanted (graft) and contralateral native lungs between and within the syngeneic and allogeneic cohorts. Histopathology served as a reference. Nonparametric tests and correlation matrix analysis were used. Pathology revealed PGD in the syngeneic and ATR in the allogeneic cohort. Skewness and low-gray-level run-length features were significantly different between PGD and ATR for all investigated setups (P < 0.03). These features were significantly different between graft and native lung in ATR for most setups (minimum of 20/24 setups; all P < 0.05). The number of significantly different features between PGD and ATR increased with elapsing postoperative time. Differences in significant features were highest for an echo time of 1500 μs. Our findings suggest that texture analysis in UTE-MRI might be a tool for the differentiation of PGD and ATR in the early postoperative phase after lung transplantation. Level of Evidence: 1. Technical Efficacy: Stage 3.

    View details for DOI 10.1002/jmri.26817

    View details for Web of Science ID 000530627200009

    View details for PubMedID 31150142

  • Dual-Energy Low-keV or Single-Energy Low-kV CT for Endoleak Detection? A 6-Reader Study in an Aortic Aneurysm Phantom INVESTIGATIVE RADIOLOGY Skawran, S., Angst, F., Bluthgen, C., Eberhard, M., Kalin, P., Kobe, A., Nagy, D., Szucs-Farkas, Z., Alkadhi, H., Euler, A. 2020; 55 (1): 45-52

    Abstract

    The aim of this study was to compare image quality, conspicuity, and endoleak detection between single-energy low-kV images (SEIs) and dual-energy low-keV virtual monoenergetic images (VMIs+) in computed tomography angiography of the aorta after endovascular repair. An abdominal aortic aneurysm phantom simulating 36 endoleaks (2 densities; diameters: 2, 4, and 6 mm) in a medium- and a large-sized patient was used. Each size was scanned using single-energy CT at 80 kVp (A) and 100 kVp (B), and dual-energy CT at 80/Sn150 kVp for the medium (C) and 90/Sn150 kVp for the large size (D). VMIs+ at 40 keV and 50 keV were reconstructed from protocols C and D. Radiation dose was 3 mGy for the medium and 6 mGy for the large size. Objective image quality and the normalized noise power spectrum were determined. Subjective image quality, conspicuity, and sensitivity for endoleaks were independently assessed by 6 radiologists. Sensitivity was compared using the Marascuilo procedure and the Fisher exact test. Conspicuities were compared using the Wilcoxon matched-pairs test, analysis of variance, and the Tukey test. The contrast-to-noise ratio of the aorta was significantly higher for VMI+ compared with SEI (P < 0.001). The noise power spectrum showed a higher noise magnitude and coarser texture in VMI+. Subjective image quality and overall conspicuity were lower for VMI+ compared with SEI (P < 0.05). Sensitivity for endoleaks was overall higher in the medium phantom for SEI (60.9% for A, 62.2% for B) compared with VMI+ (54.2% for C, 49.3% for D), with significant differences between protocols B and D (P < 0.05). In the large phantom, there was no significant difference in sensitivity among protocols (P = 0.79), with the highest rates for protocols B (31.4%) and C (31.7%). Our study indicates that low-keV VMI+ results in an improved contrast-to-noise ratio of the aorta, whereas noise properties, subjective image quality, conspicuity, and sensitivity for endoleaks were overall superior for SEI.
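
    A minimal sketch (not the study's code) of a contrast-to-noise ratio measurement as used above: mean aortic attenuation minus mean background attenuation, divided by background noise, from ROI samples; the HU values are hypothetical and the exact noise definition may differ from the paper's.

        import numpy as np

        def contrast_to_noise_ratio(roi_aorta_hu, roi_background_hu):
            """CNR = (mean(aorta) - mean(background)) / std(background)."""
            aorta = np.asarray(roi_aorta_hu, dtype=float)
            background = np.asarray(roi_background_hu, dtype=float)
            return (aorta.mean() - background.mean()) / background.std(ddof=1)

        # Hypothetical ROI samples (HU) from the contrast-filled aorta and adjacent soft tissue
        aorta_roi = [410, 395, 402, 388, 415, 399]
        background_roi = [55, 48, 60, 52, 47, 58]
        print(f"CNR = {contrast_to_noise_ratio(aorta_roi, background_roi):.1f}")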

    View details for DOI 10.1097/RLI.0000000000000606

    View details for Web of Science ID 000503082400007

    View details for PubMedID 31503078

  • Lung cancer screening with submillisievert chest CT: Potential pitfalls of pulmonary findings in different readers with various experience levels EUROPEAN JOURNAL OF RADIOLOGY Martini, K., Ottilinger, T., Serrallach, B., Markart, S., Glaser-Gallion, N., Bluthgen, C., Leschka, S., Bauer, R. W., Wildermuth, S., Messerli, M. 2019; 121: 108720

    Abstract

    To assess the interreader variability of submillisievert CT for lung cancer screening among radiologists with various experience levels. Six radiologists with different degrees of clinical experience in radiology (range, 1-15 years) rated 100 submillisievert chest CT studies as either a negative screening finding (no nodules, benign nodules, or nodules <5 mm), an indeterminate finding (nodules 5-10 mm), or a positive finding (nodules >10 mm). Each radiologist interpreted the scans in random order, and reading time was recorded. Interobserver agreement was assessed with a k statistic. Reasons for differences in nodule classification were analysed on a case-by-case basis. Reading time was correlated with reader experience using Pearson correlation (r). The overall interobserver agreement between all readers was moderate (k = 0.454; p < 0.001). In 57 patients, all radiologists agreed on the differentiation between a negative and an indeterminate/positive finding. In 64 cases, disagreement between readers led to different nodule classifications. In 8 cases, some readers rated the nodule as benign, whereas others scored the case as positive. Overall, disagreement in nodule classification was mostly due to failure to identify the target lesion (n = 40), different lesion measurement (n = 44) or different classification (n = 26). Mean overall reading time per scan was 2 min 2 s (range: 7 s-7 min 45 s) and correlated with reader experience (r = -0.824). Our study showed substantial interobserver variability in the detection and classification of pulmonary nodules on submillisievert CT. This highlights the importance of careful standardisation of screening programs, with the objective of harmonizing the efforts of the involved radiologists across different institutions by defining and assuring quality standards.
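
    A minimal illustration (not from the paper) of the three-tier screening classification described above, mapping the largest nodule diameter to negative / indeterminate / positive; the cut-offs follow the abstract, while the example diameters are hypothetical.

        from typing import Optional

        def classify_screening_finding(max_nodule_diameter_mm: Optional[float]) -> str:
            """Three-tier rating used above: <5 mm (or no nodule) negative, 5-10 mm indeterminate, >10 mm positive."""
            if max_nodule_diameter_mm is None or max_nodule_diameter_mm < 5:
                return "negative"
            if max_nodule_diameter_mm <= 10:
                return "indeterminate"
            return "positive"

        for d in (None, 3.2, 6.5, 14.0):   # hypothetical largest-nodule diameters (mm)
            print(d, "->", classify_screening_finding(d))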

    View details for DOI 10.1016/j.ejrad.2019.108720

    View details for Web of Science ID 000500465900027

    View details for PubMedID 31711024

  • Detection of tuberculosis patterns in digital photographs of chest X-ray images using Deep Learning: feasibility study INTERNATIONAL JOURNAL OF TUBERCULOSIS AND LUNG DISEASE Becker, A. S., Bluthgen, C., Van, V., Sekaggya-Wiltshire, C., Castelnuovo, B., Kambugu, A., Fehr, J., Frauenfelder, T. 2018; 22 (3): 328-+

    Abstract

    To evaluate the feasibility of Deep Learning-based detection and classification of pathological patterns in a set of digital photographs of chest X-ray (CXR) images of tuberculosis (TB) patients. In this prospective, observational study, patients with previously diagnosed TB were enrolled. Photographs of their CXRs were taken using a consumer-grade digital still camera. The images were stratified by pathological patterns into classes: cavity, consolidation, effusion, interstitial changes, miliary pattern or normal examination. Image analysis was performed with commercially available Deep Learning software in two steps. Pathological areas were first localised; detected areas were then classified. Detection was assessed using receiver operating characteristics (ROC) analysis, and classification using a confusion matrix. The study cohort was 138 patients with human immunodeficiency virus (HIV) and TB co-infection (median age 34 years, IQR 28-40); 54 patients were female. Localisation of pathological areas was excellent (area under the ROC curve 0.82). The software could perfectly distinguish pleural effusions from intraparenchymal changes. The most frequent misclassifications were consolidations as cavitations, and miliary patterns as interstitial patterns (and vice versa). Deep Learning analysis of CXR photographs is a promising tool. Further efforts are needed to build larger, high-quality data sets to achieve better diagnostic performance.

    View details for DOI 10.5588/ijtld.17.0520

    View details for Web of Science ID 000429790700016

    View details for PubMedID 29471912

  • Medicina ex Machina: Machine Learning in der Medizin. Praxis Becker, A. S., Bluthgen, C., Muhlematter, U., Boss, A. 2018; 107 (1): 19-23

    View details for DOI 10.1024/1661-8157/a002920

    View details for PubMedID 29295679

  • Economical Sponge Phantom for Teaching, Understanding, and Researching A- and B-Line Reverberation Artifacts in Lung Ultrasound JOURNAL OF ULTRASOUND IN MEDICINE Bluethgen, C., Sanabria, S., Frauenfelder, T., Klingmueller, V., Rominger, M. 2017; 36 (10): 2133-2142

    Abstract

    This project evaluated a low-cost sponge phantom setup for its capability to teach and study the A- and B-line reverberation artifacts known from lung ultrasound, and to numerically simulate sound wave interaction with the phantom using a finite-difference time-domain (FDTD) model. Both A- and B-line artifacts were reproducible on B-mode ultrasound imaging as well as in the FDTD-based simulation. The phantom was found to be an easy-to-set-up and economical tool for understanding, teaching, and researching the A- and B-line artifacts occurring in lung ultrasound. The FDTD method-based simulation was able to reproduce the artifacts and provides intuitive insight into the underlying physics.
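
    A toy sketch (not the study's simulation) of a 1D acoustic finite-difference time-domain scheme on a staggered pressure/velocity grid, illustrating how a strong impedance step (a tissue-air-like interface, as in the sponge phantom) produces repeated reverberation echoes at the source; grid size, material values and the pulse are illustrative assumptions.

        import numpy as np

        nx, nt = 400, 1600
        dx = 1e-4                        # 0.1 mm grid
        c = np.full(nx, 1500.0)          # speed of sound (m/s), water/tissue-like
        rho = np.full(nx, 1000.0)        # density (kg/m^3)
        c[200:], rho[200:] = 343.0, 1.2  # air-like half-space -> strong reflector
        dt = 0.5 * dx / c.max()          # Courant-stable time step

        kappa = rho * c**2               # bulk modulus
        p = np.zeros(nx)                 # pressure at integer grid points
        u = np.zeros(nx + 1)             # particle velocity at staggered points
        trace = np.zeros(nt)             # "received" signal at the transducer position

        src = 20
        for n in range(nt):
            t = n * dt
            p[src] += np.exp(-((t - 5e-7) / 2e-7) ** 2)           # short Gaussian pulse
            u[1:-1] -= dt / (rho[:-1] * dx) * (p[1:] - p[:-1])    # velocity update
            p -= dt * kappa / dx * (u[1:] - u[:-1])               # pressure update
            trace[n] = p[src]

        # Successive echoes in `trace` correspond to the repeated A-line-like reverberations.
        print("peak echo amplitude:", np.abs(trace[200:]).max())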

    View details for DOI 10.1002/jum.14266

    View details for Web of Science ID 000411063300017

    View details for PubMedID 28626903