Clinical Focus


  • Body MRI
  • Data-driven Medicine
  • Diagnostic Radiology

Academic Appointments


  • Clinical Assistant Professor, Radiology

Professional Education


  • Board Certification: American Board of Radiology, Diagnostic Radiology (2017)
  • Fellowship: Stanford University Body Imaging Fellowship (2017) CA
  • Residency: Stanford University Radiology Residency (2016) CA
  • Internship: Penn State Milton S Hershey Medical Center Surgery Residency (2012) PA
  • Medical Education: Penn State College of Medicine Registrar (2011) PA
  • Grad, Penn State University, Neural Engineering / Engineering Physics
  • BS, University of Pittsburgh, Mathematics / Computer Science

Current Research and Scholarly Interests


Machine learning in medicine

All Publications


  • Analysis of Validation Performance of a Machine Learning Classifier in Interstitial Lung Disease Cases Without Definite or Probable Usual Interstitial Pneumonia Pattern on CT Using Clinical and Pathology-Supported Diagnostic Labels. Journal of imaging informatics in medicine Chang, M., Reicher, J. J., Kalra, A., Muelly, M., Ahmad, Y. 2024

    Abstract

    We previously validated Fibresolve, a machine learning classifier system that non-invasively predicts idiopathic pulmonary fibrosis (IPF) diagnosis. The system incorporates an automated deep learning algorithm that analyzes chest computed tomography (CT) imaging to assess for features associated with idiopathic pulmonary fibrosis. Here, we assess performance in assessment of patterns beyond those that are characteristic features of usual interstitial pneumonia (UIP) pattern. The machine learning classifier was previously developed and validated using standard training, validation, and test sets, with clinical plus pathologically determined ground truth. The multi-site 295-patient validation dataset was used for focused subgroup analysis in this investigation to evaluate the classifier's performance range in cases with and without radiologic UIP and probable UIP designations. Radiologic assessment of specific features for UIP including the presence and distribution of reticulation, ground glass, bronchiectasis, and honeycombing was used for assignment of radiologic pattern. Output from the classifier was assessed within various UIP subgroups. The machine learning classifier was able to classify cases not meeting the criteria for UIP or probable UIP as IPF with estimated sensitivity of 56-65% and estimated specificity of 92-94%. Example cases demonstrated non-basilar-predominant as well as ground glass patterns that were indeterminate for UIP by subjective imaging criteria but for which the classifier system was able to correctly identify the case as IPF as confirmed by multidisciplinary discussion generally inclusive of histopathology. The machine learning classifier Fibresolve may be helpful in the diagnosis of IPF in cases without radiological UIP and probable UIP patterns.

    View details for DOI 10.1007/s10278-023-00914-w

    View details for PubMedID 38343230

  • Development and validation of a CT-based deep learning algorithm to augment non-invasive diagnosis of idiopathic pulmonary fibrosis. Respiratory medicine Maddali, M. V., Kalra, A., Muelly, M., Reicher, J. J. 2023: 107428

    Abstract

    Non-invasive diagnosis of idiopathic pulmonary fibrosis (IPF) involves identification of usual interstitial pneumonia (UIP) pattern by computed tomography (CT) and exclusion of other known etiologies of interstitial lung disease (ILD). However, uncertainty in identification of radiologic UIP pattern leads to the continued need for invasive surgical biopsy. We thus developed and validated a machine learning algorithm using CT scans alone to augment non-invasive diagnosis of IPF. The primary algorithm was a deep learning convolutional neural network (CNN) with model inputs of CT images only. The algorithm was trained to predict IPF among cases of ILD, with reference standard of multidisciplinary discussion (MDD) consensus diagnosis. The algorithm was trained using a multi-center dataset of more than 2000 cases of ILD. A US-based multi-site cohort (n = 295) was used for algorithm tuning, and external validation was performed with a separate dataset (n = 295) from European and South American sources. In the tuning set, the model achieved an area under the receiver operating characteristic curve (AUC) of 0.87 (CI: 0.83-0.92) in differentiating IPF from other ILDs. Sensitivity and specificity were 0.67 (0.57-0.76) and 0.90 (0.83-0.95), respectively. By contrast, pre-recorded assessment prior to MDD diagnosis had sensitivity of 0.31 (0.23-0.42) and specificity of 0.92 (0.87-0.95). In the external test set, c-statistic was also 0.87 (0.83-0.91). Model performance was consistent across a variety of CT scanner manufacturers and slice thickness. The presented deep learning algorithm demonstrated consistent performance in identifying IPF among cases of ILD using CT images alone and suggests generalization across CT manufacturers.

    View details for DOI 10.1016/j.rmed.2023.107428

    View details for PubMedID 37838076

  • Computer-Aided Pulmonary Fibrosis Detection Leveraging an Advanced Artificial Intelligence Triage and Notification Software. Journal of clinical medicine research Selvan, K. C., Kalra, A., Reicher, J., Muelly, M., Adegunsoye, A. 2023; 15 (8-9): 423-429

    Abstract

    Improvement in recognition and referral of pulmonary fibrosis (PF) is vital to improving patient outcomes within interstitial lung disease. We determined the performance metrics and processing time of an artificial intelligence triage and notification software, ScreenDx-LungFibrosis™, developed to improve detection of PF. ScreenDx-LungFibrosis™ was applied to chest computed tomography (CT) scans from multisource data. Device output (+/- PF) was compared to clinical diagnosis (+/- PF), and diagnostic performance was evaluated. Primary endpoints included device sensitivity and specificity > 80% and processing time < 4.5 min. Of 3,018 patients included, PF was present in 22.9%. ScreenDx-LungFibrosis™ detected PF with a sensitivity and specificity of 91.3% (95% confidence interval (CI): 89.0-93.3%) and 95.1% (95% CI: 94.2-96.0%), respectively. Mean processing time was 27.6 s (95% CI: 26.0-29.1 s). ScreenDx-LungFibrosis™ accurately and reliably identified PF with a rapid per-case processing time, underscoring its potential for transformative improvement in PF outcomes when routinely applied to chest CTs.

    View details for DOI 10.14740/jocmr5020

    View details for PubMedID 37822853

    View details for PubMedCentralID PMC10563821

  • Machine learning to distinguish lymphangioleiomyomatosis from other diffuse cystic lung diseases. Respiratory investigation Jonas, A., Muelly, M., Gupta, N., Reicher, J. J. 2022

    Abstract

    Patients with lymphangioleiomyomatosis (LAM) frequently experience delays in diagnosis, owing partly to the delayed characterization of imaging findings. This project aimed to develop a machine learning model to distinguish LAM from other diffuse cystic lung diseases (DCLDs). Computed tomography scans from patients with confirmed DCLDs were acquired from registry datasets and a recurrent convolutional neural network was trained for their classification. The final model provided sensitivity and specificity of 85% and 92%, respectively, for LAM, similar to the historical metrics of 88% and 97%, respectively, by experts. The proof-of-concept work holds promise as a clinically useful tool to assist in recognizing LAM.

    View details for DOI 10.1016/j.resinv.2022.01.001

    View details for PubMedID 35181263

  • Spotting brain bleeding after sparse training. NATURE BIOMEDICAL ENGINEERING Muelly, M. C., Peng, L. 2019; 3 (3): 161-162

    View details for DOI 10.1038/s41551-019-0368-5

    View details for Web of Science ID 000460576800002

    View details for PubMedID 30948816

  • Generative Modeling for Small-Data Object Detection Liu, L., Muelly, M., Deng, J., Pfister, T., Li, J. IEEE. 2019: 6072–80

  • View-Sharing Artifact Reduction With Retrospective Compressed Sensing Reconstruction in the Context of Contrast-Enhanced Liver MRI for Hepatocellular Carcinoma (HCC) Screening. Journal of magnetic resonance imaging : JMRI Shaikh, J., Stoddard, P. B., Levine, E. G., Roh, A. T., Saranathan, M., Chang, S. T., Muelly, M. C., Hargreaves, B. A., Vasanawala, S. S., Loening, A. M. 2018

    Abstract

    BACKGROUND: View-sharing (VS) increases spatiotemporal resolution in dynamic contrast-enhanced (DCE) MRI by sharing high-frequency k-space data across temporal phases. This temporal sharing results in respiratory motion within any phase to propagate artifacts across all shared phases. Compressed sensing (CS) eliminates the need for VS by recovering missing k-space data from pseudorandom undersampling, reducing temporal blurring while maintaining spatial resolution. PURPOSE: To evaluate a CS reconstruction algorithm on undersampled DCE-MRI data for image quality and hepatocellular carcinoma (HCC) detection. STUDY TYPE: Retrospective. SUBJECTS: Fifty consecutive patients undergoing MRI for HCC screening (29 males, 21 females, 52-72 years). FIELD STRENGTH/SEQUENCE: 3.0T MRI. Multiphase 3D-SPGR T1-weighted sequence undersampled in arterial phases with a complementary Poisson disc sampling pattern reconstructed with VS and CS algorithms. ASSESSMENT: VS and CS reconstructions evaluated by blinded assessments of image quality and anatomic delineation on Likert scales (1-4 and 1-5, respectively), and HCC detection by OPTN/UNOS criteria including a diagnostic confidence score (1-5). Blinded side-by-side reconstruction comparisons for lesion depiction and overall series preference (-3 to 3). STATISTICAL ANALYSIS: Two-tailed Wilcoxon signed rank tests for paired nonparametric analyses with Bonferroni-Holm multiple-comparison corrections. McNemar's test for differences in lesion detection frequency and transplantation eligibility. RESULTS: CS compared with VS demonstrated significantly improved contrast (mean 3.6 vs. 2.9, P<0.0001) and less motion artifact (mean 3.6 vs. 3.2, P=0.006). CS compared with VS demonstrated significantly improved delineations of liver margin (mean 4.5 vs. 3.8, P=0.0002), portal veins (mean 4.5 vs. 3.7, P<0.0001), and hepatic veins (mean 4.6 vs. 3.5, P<0.0001), but significantly decreased delineation of hepatic arteries (mean 3.2 vs. 3.7, P=0.004). No significant differences were seen in the other assessments. DATA CONCLUSION: Applying a CS reconstruction to data acquired for a VS reconstruction significantly reduces motion artifacts in a clinical DCE protocol for HCC screening. LEVEL OF EVIDENCE: 3. Technical Efficacy: Stage 2. J. Magn. Reson. Imaging 2018.

    View details for PubMedID 30390358

  • Sanity Checks for Saliency Maps Adebayo, J., Gilmer, J., Muelly, M., Goodfellow, I., Hardt, M., Kim, B., Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2018