Dr. Safwan Halabi is a Clinical Associate Professor of Radiology at the Stanford University School of Medicine and serves as the Medical Director for Radiology Informatics at Stanford Children's Health. He is board-certified in Radiology with a Certificate of Added Qualification in Pediatric Radiology, and he is also board-certified in Clinical Informatics. He clinically practices obstetric and pediatric imaging at Lucile Packard Children's Hospital. Dr. Halabi's clinical and administrative leadership roles are directed at improving quality of care, efficiency, and patient safety. He has also led strategic efforts to improve the enterprise imaging platforms at Stanford Children's Health. He is a strong advocate of patient-centric care and has helped guide policies for releasing radiology reports and images to patients. He has published in peer-reviewed journals on various clinical and informatics topics. His current academic and research interests include imaging informatics, deep/machine learning in imaging, artificial intelligence in medicine, clinical decision support, and patient-centric health care delivery. He is currently the Chair of the RSNA Informatics Data Science Committee and serves as a Board Member for the Society for Imaging Informatics in Medicine.
- Medical Informatics
- Fetal Imaging
- Pediatric Imaging
- Pediatric Radiology
Clinical Associate Professor, Radiology - Pediatric Radiology
Board Certification: American Board of Radiology, Diagnostic Radiology (2006)
Fellowship: Cincinnati Children's Hospital and Medical Center Radiology Fellowships (2007) OH
Board Certification: American Board of Preventive Medicine, Clinical Informatics (2014)
Board Certification: American Board of Radiology, Pediatric Radiology (2009)
Residency: Henry Ford Health System (2006) MI
Internship: Henry Ford Health System (2002) MI
Medical Education: University of Toledo College of Medicine (2001) OH
Validation of an Artificial Intelligence-based Algorithm for Skeletal Age Assessment
The purpose of this study is to understand the effects of using an artificial intelligence (AI) algorithm for skeletal age estimation as a computer-aided diagnosis (CADx) system. In this prospective, real-time study, the investigators will send de-identified hand radiographs to the AI algorithm and surface the algorithm's output to the radiologist, who will incorporate this information into their normal workflow to diagnose the patient's bone age. All radiologists involved in the study will be trained to recognize the surfaced prediction as the output of the AI algorithm. The radiologist's diagnosis will be final and considered independent of the algorithm's output.
Stanford is currently not accepting patients for this trial. For more information, please contact Safwan Halabi, M.D., (650) 721-2850.
- Deep learning augments liver stiffness classification in children. Pediatric radiology 2021
Decoding COVID-19 pneumonia: comparison of deep learning and radiomics CT image signatures.
European journal of nuclear medicine and molecular imaging
High-dimensional image features that underlie COVID-19 pneumonia remain opaque. We aim to compare feature engineering and deep learning methods to gain insights into the image features that drive CT-based COVID-19 pneumonia prediction, and to uncover CT image features significant for COVID-19 pneumonia from deep learning and radiomics frameworks. A total of 266 patients with COVID-19 and other viral pneumonia with clinical symptoms and CT signs similar to those of COVID-19 during the outbreak were retrospectively collected from three hospitals in China and the USA. All the pneumonia lesions on CT images were manually delineated by four radiologists. One hundred eighty-four patients (n = 93 COVID-19 positive; n = 91 COVID-19 negative; 24,216 pneumonia lesions from 12,001 CT image slices) from two hospitals in China served as the discovery cohort for model development. Thirty-two patients (17 COVID-19 positive, 15 COVID-19 negative; 7883 pneumonia lesions from 3799 CT image slices) from a US hospital served as the external validation cohort. A bi-directional adversarial network-based framework and the PyRadiomics package were used to extract deep learning and radiomics features, respectively. Linear and Lasso classifiers were used to develop models predictive of COVID-19 versus non-COVID-19 viral pneumonia. We extracted 120-dimensional deep learning image features and 120-dimensional radiomics features. Linear and Lasso classifiers identified 32 high-dimensional deep learning image features and 4 radiomics features associated with COVID-19 pneumonia diagnosis (P < 0.0001). Both models achieved sensitivity > 73% and specificity > 75% on the external validation cohort, with slightly superior performance for the radiomics Lasso classifier.
Human expert diagnostic performance improved (increases of 16.5% and 11.6% in sensitivity and specificity, respectively) when using a combined deep learning-radiomics model. We uncover specific deep learning and radiomics features to add insight into the interpretability of machine learning algorithms, and we compare deep learning and radiomics models for COVID-19 pneumonia that might serve to augment human diagnostic performance.
View details for DOI 10.1007/s00259-020-05075-4
View details for PubMedID 33094432
- Deep learning to automate Brasfield chest radiographic scoring for cystic fibrosis JOURNAL OF CYSTIC FIBROSIS 2020; 19 (1): 131–38
Sonographic Diagnosis of Velamentous and Marginal Placental Cord Insertion.
2020; 36 (3): 247–54
Routine second trimester ultrasound (US) examinations include an assessment of the umbilical cord given its vital role as a vascular conduit between the maternal placenta and fetus during fetal development. Placental cord insertion abnormalities can be identified during prenatal US screening and are increasingly recognized as independent risk factors for various complications during pregnancy and delivery. The purpose of this pictorial review is to illustrate examples of velamentous and marginal placental cord insertion with an emphasis on how to differentiate their morphology using color Doppler US.
View details for DOI 10.1097/RUQ.0000000000000437
View details for PubMedID 30870317
The Effect of Including Benchmark Prevalence Data of Common Imaging Findings in Spine Image Reports on Health Care Utilization Among Adults Undergoing Spine Imaging: A Stepped-Wedge Randomized Clinical Trial.
JAMA network open
2020; 3 (9): e2015713
Lumbar spine imaging frequently reveals findings that may seem alarming but are likely unrelated to pain. Prior work has suggested that inserting data on the prevalence of imaging findings among asymptomatic individuals into spine imaging reports may reduce unnecessary subsequent interventions. The objective was to evaluate the impact of including benchmark prevalence data in routine spinal imaging reports on subsequent spine-related health care utilization and opioid prescriptions. This stepped-wedge, pragmatic randomized clinical trial included 250 401 adult participants receiving care from 98 primary care clinics at 4 large health systems in the United States. Participants had imaging of their backs between October 2013 and September 2016 without having had spine imaging in the prior year. Data analysis was conducted from November 2018 to October 2019. Reports were either standard lumbar spine imaging reports (control group) or reports containing age-appropriate prevalence data for common imaging findings in individuals without back pain (intervention group). Health care utilization was measured in spine-related relative value units (RVUs) within 365 days of index imaging. The number of subsequent opioid prescriptions written by a primary care clinician was a secondary outcome, and prespecified subgroup analyses examined results by imaging modality. We enrolled 250 401 participants (of whom 238 886 [95.4%] met eligibility for this analysis, with 137 373 [57.5%] women and 105 497 [44.2%] aged >60 years) from 3278 primary care clinicians. A total of 117 455 patients (49.2%) were randomized to the control group, and 121 431 patients (50.8%) were randomized to the intervention group. There was no significant difference in cumulative spine-related RVUs comparing intervention and control conditions through 365 days.
The adjusted median (interquartile range) RVU for the control group was 3.56 (2.71-5.12) compared with 3.53 (2.68-5.08) for the intervention group (difference, -0.7%; 95% CI, -2.9% to 1.5%; P = .54). Rates of subsequent RVUs did not differ between groups by specific clinical findings in the report but did differ by type of index imaging (e.g., computed tomography: difference, -29.3%; 95% CI, -42.1% to -13.5%; magnetic resonance imaging: difference, -3.4%; 95% CI, -8.3% to 1.8%). We observed a small but significant decrease in the likelihood of opioid prescribing from a study clinician within 1 year of the intervention (odds ratio, 0.95; 95% CI, 0.91 to 1.00; P = .04). In this study, inserting benchmark prevalence information in lumbar spine imaging reports did not decrease subsequent spine-related RVUs but did reduce subsequent opioid prescriptions. The intervention text is simple, inexpensive, and easily implemented. ClinicalTrials.gov Identifier: NCT02015455.
View details for DOI 10.1001/jamanetworkopen.2020.15713
View details for PubMedID 32886121
In fetuses with congenital lung masses, decreased ventricular and atrioventricular valve dimensions are associated with lesion size and clinical outcome.
INTRODUCTION: The clinical importance of mass effect from congenital lung masses on the fetal heart is unknown. We aimed to report cardiac measurements in fetuses with congenital lung masses and to correlate lung mass severity/size with cardiac dimensions and clinical outcomes. METHODS: Cases were identified from our institutional database between 2009 and 2016. We recorded atrioventricular valve (AVVz) annulus dimensions and ventricular widths (VWz) converted into z-scores, the ratio of aortic to total cardiac output (AoCO), lesion side, and the congenital pulmonary airway malformation volume ratio (CVR). Respiratory intervention (RI) was defined as intubation, ECMO use, or surgical intervention prior to discharge. RESULTS: Fifty-two fetuses comprised the study cohort. Mean AVVz and VWz were below expected for gestational age. CVR correlated with ipsilateral AVVz (RS = -0.59, p < 0.001) and ipsilateral VWz (RS = -0.59, p < 0.001). Lower AVVz and AoCO, and higher CVR, were associated with RI. No patient had significant structural heart disease identified postnatally. CONCLUSION: In fetuses with left-sided lung masses, ipsilateral cardiac structures tend to be smaller, but in our cohort there were no patients with structural heart disease. However, smaller left-sided structures may contribute to the need for RI that affects a portion of these fetuses.
View details for DOI 10.1002/pd.5612
View details for PubMedID 31742724
Deep learning to automate Brasfield chest radiographic scoring for cystic fibrosis.
Journal of cystic fibrosis : official journal of the European Cystic Fibrosis Society
BACKGROUND: The aim of this study was to evaluate the hypothesis that a deep convolutional neural network (DCNN) model could facilitate automated Brasfield scoring of chest radiographs (CXRs) for patients with cystic fibrosis (CF), performing similarly to a pediatric radiologist. METHODS: All frontal/lateral chest radiographs (2058 exams) performed in CF patients at a single institution from January 2008-2018 were retrospectively identified, and ground-truth Brasfield scoring was performed by a board-certified pediatric radiologist. A total of 1858 exams (90.3%) were used to train and validate the DCNN model, while 200 exams (9.7%) were reserved for a test set. Five board-certified pediatric radiologists independently scored the test set according to the Brasfield method. DCNN model vs. radiologist performance was compared using Spearman correlation (rho) as well as mean difference (MD), mean absolute difference (MAD), and root mean squared error (RMSE) estimation. RESULTS: For the total Brasfield score, rho for the model-derived results computed pairwise with each radiologist's scores ranged from 0.79 to 0.83, compared with 0.85 to 0.90 for radiologist vs. radiologist scores. The MD between model estimates of the total Brasfield score and the average score of radiologists was -0.09. Based on MD, MAD, and RMSE, the model matched or exceeded radiologist performance for all subfeatures except air-trapping and large lesions. CONCLUSIONS: A DCNN model is promising for predicting CF Brasfield scores with accuracy similar to that of a pediatric radiologist.
View details for PubMedID 31056440
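The agreement statistics named in the Brasfield scoring abstract (mean difference, mean absolute difference, root mean squared error, and Spearman's rho) can be made concrete with a short Python sketch. It is not the authors' code: the function names and the four toy scores are hypothetical, and Spearman's rho is computed here as the Pearson correlation of average ranks, which is one standard way to handle tied scores.

```python
from math import sqrt
from statistics import mean


def agreement(model, reference):
    """Return (MD, MAD, RMSE) of model scores against reference scores."""
    if len(model) != len(reference) or not model:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [m - r for m, r in zip(model, reference)]
    md = mean(diffs)                          # signed bias of the model
    mad = mean(abs(d) for d in diffs)         # typical size of an error
    rmse = sqrt(mean(d * d for d in diffs))   # weights large errors more heavily
    return md, mad, rmse


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical total Brasfield scores for four exams (toy numbers, not study data).
model_scores = [20, 15, 22, 10]
radiologist_scores = [19, 15, 24, 11]
md, mad, rmse = agreement(model_scores, radiologist_scores)
rho = spearman_rho(model_scores, radiologist_scores)
```

With these toy scores the model's bias (MD) is -0.5 points, its MAD is 1 point, and its RMSE is about 1.22 points, while rho is 1.0 because both raters rank the four exams identically. A signed MD near zero, like the study's -0.09, says little about error size on its own, which is why MAD and RMSE are reported alongside it.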
- The RSNA Pediatric Bone Age Machine Learning Challenge RADIOLOGY 2019; 290 (2): 498–503
Human-machine partnership with artificial intelligence for chest radiograph diagnosis.
NPJ digital medicine
2019; 2: 111
Human-in-the-loop (HITL) AI may enable an ideal symbiosis of human experts and AI models, harnessing the advantages of both while overcoming their respective limitations. The purpose of this study was to investigate a novel collective intelligence technology designed to amplify the diagnostic accuracy of networked human groups by forming real-time systems modeled on biological swarms. Using small groups of radiologists, the swarm-based technology was applied to the diagnosis of pneumonia on chest radiographs and compared against human experts alone, as well as two state-of-the-art deep learning AI models. Our work demonstrates that both the swarm-based technology and deep-learning technology achieved higher diagnostic accuracy than the human experts alone. Our work further demonstrates that when used in combination, the swarm-based technology and deep-learning technology outperformed either method alone. The superior diagnostic accuracy of the combined HITL AI solution compared with radiologists and AI alone has broad implications for the surging clinical AI deployment and implementation strategies in future practice.
View details for DOI 10.1038/s41746-019-0189-7
View details for PubMedID 31754637
View details for PubMedCentralID PMC6861262
Obstetric and neonatal outcomes in pregnancies complicated by fetal lung masses: does final histology matter?
The journal of maternal-fetal & neonatal medicine : the official journal of the European Association of Perinatal Medicine, the Federation of Asia and Oceania Perinatal Societies, the International Society of Perinatal Obstetricians
Purpose: Fetal lung masses complicate approximately 1 in 2000 live births. Our aim was to determine whether obstetric and neonatal outcomes differ by final fetal lung mass histology. Materials and methods: A review of all pregnancies complicated by a prenatally diagnosed fetal lung mass between 2009 and 2017 at a single academic center was conducted. All cases included in the final analysis underwent surgical resection, and the histologic diagnosis was determined by a trained pathologist. Clinical data were obtained from review of stored electronic medical records, which contained linked maternal and neonatal records. Imaging records included both prenatal ultrasound and magnetic resonance imaging. Fisher's exact test was used for categorical variables, and the Kruskal-Wallis test was used for continuous variables. The level of significance was p < .05. Results: Of 61 pregnancies complicated by fetal lung mass during the study period, 45 cases underwent both prenatal care and postnatal resection. Final histology revealed 10 cases of congenital pulmonary airway malformation (CPAM) type 1, nine cases of CPAM type 2, and 16 cases of bronchopulmonary sequestration. There was no difference in initial, maximal, or final CPAM volume ratio between groups, with median final CPAM volume ratio of 0.6 for CPAM type 1, 0.7 for CPAM type 2, and 0.3 for bronchopulmonary sequestration (p = .12). There were no differences in any of the maternal or obstetric outcomes, including gestational age at delivery and mode of delivery, between the groups. The primary outcome of neonatal respiratory distress was not statistically different between groups (p = .66).
Median neonatal length of stay following delivery ranged from 3 to 4 days, and time to postnatal resection was also similar, with a median of 126 days for CPAM type 1, 122 days for CPAM type 2, and 132 days for bronchopulmonary sequestration (p = .76). Conclusions: In our cohort, there was no significant association between histologic lung mass subtypes and any obstetric or neonatal morbidity, including respiratory distress.
View details for DOI 10.1080/14767058.2019.1689559
View details for PubMedID 31722592
Improving Automated Pediatric Bone Age Estimation Using Ensembles of Models from the 2017 RSNA Machine Learning Challenge.
Radiology. Artificial intelligence
2019; 1 (6): e190053
To investigate improvements in performance for automatic bone age estimation that can be gained through model ensembling. A total of 48 submissions from the 2017 RSNA Pediatric Bone Age Machine Learning Challenge were used. Participants were provided with 12 611 pediatric hand radiographs with bone ages determined by a pediatric radiologist to develop models for bone age determination. The final results were determined using a test set of 200 radiographs labeled with the weighted average of six ratings. The mean pairwise model correlation and performance of all possible model combinations for ensembles of up to 10 models using the mean absolute deviation (MAD) were evaluated. A bootstrap analysis using the 200 test radiographs was conducted to estimate the true generalization MAD. The estimated generalization MAD of a single model was 4.55 months. The best-performing ensemble consisted of four models with an MAD of 3.79 months. The mean pairwise correlation of models within this ensemble was 0.47. In comparison, the lowest achievable MAD by combining the highest-ranking models based on individual scores was 3.93 months using eight models with a mean pairwise model correlation of 0.67. Combining less-correlated, high-performing models resulted in better performance than naively combining the top-performing models. Machine learning competitions within radiology should be encouraged to spur development of heterogeneous models whose predictions can be combined to achieve optimal performance. © RSNA, 2019. Supplemental material is available for this article. See also the commentary by Siegel in this issue.
View details for DOI 10.1148/ryai.2019190053
View details for PubMedID 32090207
View details for PubMedCentralID PMC6884060
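The ensembling idea in the bone age ensembling abstract (average several models' estimates, then compare subsets by MAD and by how correlated their members are) can be sketched in Python. This is an illustration under stated assumptions, not the challenge authors' code: the ensemble output is taken as the simple mean of member predictions, "pairwise model correlation" is assumed to mean the Pearson correlation of the members' errors, and all names and numbers are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mad(pred, truth):
    """Mean absolute deviation between predictions and reference bone ages."""
    return mean(abs(p - t) for p, t in zip(pred, truth))


def ensemble_mean(member_preds):
    """Assumed ensembling rule: average the members' estimates image by image."""
    return [mean(per_image) for per_image in zip(*member_preds)]


def best_ensemble(models, truth, max_size=10):
    """Score every subset of up to max_size models; return (names, MAD) of the best."""
    names = sorted(models)
    best = None
    for k in range(1, min(max_size, len(names)) + 1):
        for combo in combinations(names, k):
            score = mad(ensemble_mean([models[n] for n in combo]), truth)
            if best is None or score < best[1]:
                best = (combo, score)
    return best


def mean_pairwise_error_correlation(models, truth, combo):
    """Assumed metric: mean Pearson correlation between members' errors."""
    errors = {n: [p - t for p, t in zip(models[n], truth)] for n in combo}
    pairs = list(combinations(combo, 2))
    if not pairs:
        raise ValueError("need at least two models to correlate")
    return mean(pearson(errors[a], errors[b]) for a, b in pairs)


# Hypothetical bone ages in months (toy numbers, not challenge data).
truth = [100, 120, 140, 160]
models = {
    "A": [104, 116, 144, 156],  # errors +4, -4, +4, -4
    "B": [96, 124, 136, 164],   # errors -4, +4, -4, +4 (mirror image of A)
    "C": [103, 123, 137, 157],  # errors +3, +3, -3, -3 (best single model)
}
combo, score = best_ensemble(models, truth)
rho = mean_pairwise_error_correlation(models, truth, combo)
```

In this toy case the exhaustive search picks A and B even though each has an MAD of 4 months versus 3 months for C, because their errors are perfectly anti-correlated (mean pairwise error correlation of -1.0) and cancel when averaged. That mirrors the abstract's finding that less-correlated, high-performing models combine better than the top individual scorers. The number of subsets grows combinatorially with pool size, so this brute-force loop is only practical for small pools.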
- Obstetric and neonatal outcomes in pregnancies complicated by fetal lung masses: does final histology matter? MOSBY-ELSEVIER. 2019: S151
Deep Learning-Assisted Diagnosis of Cerebral Aneurysms Using the HeadXNet Model.
JAMA network open
2019; 2 (6): e195600
Deep learning has the potential to augment clinician performance in medical imaging interpretation and reduce time to diagnosis through automated segmentation. Few studies to date have explored this topic. The objective was to develop and apply a neural network segmentation model (the HeadXNet model) capable of generating precise voxel-by-voxel predictions of intracranial aneurysms on head computed tomographic angiography (CTA) imaging to augment clinicians' intracranial aneurysm diagnostic performance. In this diagnostic study, a 3-dimensional convolutional neural network architecture was developed using a training set of 611 head CTA examinations to generate aneurysm segmentations. Segmentation outputs from this support model on a test set of 115 examinations were provided to clinicians. Between August 13, 2018, and October 4, 2018, 8 clinicians diagnosed the presence of aneurysm on the test set, both with and without model augmentation, in a crossover design using randomized order and a 14-day washout period. Head and neck examinations performed between January 3, 2003, and May 31, 2017, at a single academic medical center were used to train, validate, and test the model. Examinations positive for aneurysm had at least 1 clinically significant, nonruptured intracranial aneurysm. Examinations with hemorrhage, ruptured aneurysm, posttraumatic or infectious pseudoaneurysm, arteriovenous malformation, surgical clips, coils, catheters, or other surgical hardware were excluded. All other CTA examinations were considered controls. Sensitivity, specificity, accuracy, time, and interrater agreement were measured. Metrics for clinician performance with and without model augmentation were compared. The data set contained 818 examinations from 662 unique patients, with 328 CTA examinations (40.1%) containing at least 1 intracranial aneurysm and 490 examinations (59.9%) without intracranial aneurysms. The 8 clinicians reading the test set ranged in experience from 2 to 12 years.
Augmenting clinicians with artificial intelligence-produced segmentation predictions resulted in clinicians achieving statistically significant improvements in sensitivity, accuracy, and interrater agreement when compared with no augmentation. The clinicians' mean sensitivity increased by 0.059 (95% CI, 0.028-0.091; adjusted P = .01), mean accuracy increased by 0.038 (95% CI, 0.014-0.062; adjusted P = .02), and mean interrater agreement (Fleiss κ) increased by 0.060, from 0.799 to 0.859 (adjusted P = .05). There was no statistically significant change in mean specificity (0.016; 95% CI, -0.010 to 0.041; adjusted P = .16) or time to diagnosis (5.71 seconds; 95% CI, -7.22 to 18.63 seconds; adjusted P = .19). The deep learning model successfully detected clinically significant intracranial aneurysms on CTA. This suggests that integration of an artificial intelligence-assisted diagnostic model may augment clinician performance with dependable and accurate predictions and thereby optimize patient care.
View details for DOI 10.1001/jamanetworkopen.2019.5600
View details for PubMedID 31173130
CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison
ASSOC ADVANCEMENT ARTIFICIAL INTELLIGENCE. 2019: 590–97
View details for Web of Science ID 000485292600073
- Erratum: Author Correction: Human-machine partnership with artificial intelligence for chest radiograph diagnosis. NPJ digital medicine 2019; 2: 129
The RSNA Pediatric Bone Age Machine Learning Challenge.
Purpose: The Radiological Society of North America (RSNA) Pediatric Bone Age Machine Learning Challenge was created to show an application of machine learning (ML) and artificial intelligence (AI) in medical imaging, promote collaboration to catalyze AI model creation, and identify innovators in medical imaging. Materials and Methods: The goal of this challenge was to solicit individuals and teams to create an algorithm or model using ML techniques that would accurately determine skeletal age in a curated data set of pediatric hand radiographs. The primary evaluation measure was the mean absolute distance (MAD) in months, calculated as the mean of the absolute differences between the model estimates and the reference-standard bone ages. Results: A data set consisting of 14 236 hand radiographs (12 611 training set, 1425 validation set, 200 test set) was made available to registered challenge participants. A total of 260 individuals or teams registered on the Challenge website. A total of 105 submissions were uploaded from 48 unique users during the training, validation, and test phases. Almost all methods used deep neural network techniques based on one or more convolutional neural networks (CNNs). The best five results based on MAD were 4.2, 4.4, 4.4, 4.5, and 4.5 months, respectively. Conclusion: The RSNA Pediatric Bone Age Machine Learning Challenge showed how a coordinated approach to solving a medical imaging problem can be successfully conducted. Future ML challenges will catalyze collaboration and development of ML tools and methods that can potentially improve diagnostic accuracy and patient care. © RSNA, 2018. Online supplemental material is available for this article.
View details for PubMedID 30480490
- Deep learning for chest radiograph diagnosis: A retrospective comparison of the CheXNeXt algorithm to practicing radiologists PLOS MEDICINE 2018; 15 (11)
Deep learning for chest radiograph diagnosis: A retrospective comparison of the CheXNeXt algorithm to practicing radiologists.
2018; 15 (11): e1002686
BACKGROUND: Chest radiograph interpretation is critical for the detection of thoracic diseases, including tuberculosis and lung cancer, which affect millions of people worldwide each year. This time-consuming task typically requires expert radiologists to read the images, leading to fatigue-based diagnostic error and lack of diagnostic expertise in areas of the world where radiologists are not available. Recently, deep learning approaches have been able to achieve expert-level performance in medical image interpretation tasks, powered by large network architectures and fueled by the emergence of large labeled datasets. The purpose of this study is to investigate the performance of a deep learning algorithm on the detection of pathologies in chest radiographs compared with practicing radiologists. METHODS AND FINDINGS: We developed CheXNeXt, a convolutional neural network to concurrently detect the presence of 14 different pathologies, including pneumonia, pleural effusion, pulmonary masses, and nodules in frontal-view chest radiographs. CheXNeXt was trained and internally validated on the ChestX-ray8 dataset, with a held-out validation set consisting of 420 images, sampled to contain at least 50 cases of each of the original pathology labels. On this validation set, the majority vote of a panel of 3 board-certified cardiothoracic specialist radiologists served as reference standard. We compared CheXNeXt's discriminative performance on the validation set to the performance of 9 radiologists using the area under the receiver operating characteristic curve (AUC). The radiologists included 6 board-certified radiologists (average experience 12 years, range 4-28 years) and 3 senior radiology residents, from 3 academic institutions. We found that CheXNeXt achieved radiologist-level performance on 11 pathologies and did not achieve radiologist-level performance on 3 pathologies.
The radiologists achieved statistically significantly higher AUC performance on cardiomegaly, emphysema, and hiatal hernia, with AUCs of 0.888 (95% confidence interval [CI] 0.863-0.910), 0.911 (95% CI 0.866-0.947), and 0.985 (95% CI 0.974-0.991), respectively, whereas CheXNeXt's AUCs were 0.831 (95% CI 0.790-0.870), 0.704 (95% CI 0.567-0.833), and 0.851 (95% CI 0.785-0.909), respectively. CheXNeXt performed better than radiologists in detecting atelectasis, with an AUC of 0.862 (95% CI 0.825-0.895), statistically significantly higher than radiologists' AUC of 0.808 (95% CI 0.777-0.838); there were no statistically significant differences in AUCs for the other 10 pathologies. The average time to interpret the 420 images in the validation set was substantially longer for the radiologists (240 minutes) than for CheXNeXt (1.5 minutes). The main limitations of our study are that neither CheXNeXt nor the radiologists were permitted to use patient history or review prior examinations and that evaluation was limited to a dataset from a single institution. CONCLUSIONS: In this study, we developed and validated a deep learning algorithm that classified clinically important abnormalities in chest radiographs at a performance level comparable to practicing radiologists. Once tested prospectively in clinical settings, the algorithm could have the potential to expand patient access to chest radiograph diagnostics.
View details for PubMedID 30457988
Deep-learning-assisted diagnosis for knee magnetic resonance imaging: Development and retrospective validation of MRNet.
2018; 15 (11): e1002699
BACKGROUND: Magnetic resonance imaging (MRI) of the knee is the preferred method for diagnosing knee injuries. However, interpretation of knee MRI is time-intensive and subject to diagnostic error and variability. An automated system for interpreting knee MRI could prioritize high-risk patients and assist clinicians in making diagnoses. Deep learning methods, in being able to automatically learn layers of features, are well suited for modeling the complex relationships between medical images and their interpretations. In this study we developed a deep learning model for detecting general abnormalities and specific diagnoses (anterior cruciate ligament [ACL] tears and meniscal tears) on knee MRI exams. We then measured the effect of providing the model's predictions to clinical experts during interpretation. METHODS AND FINDINGS: Our dataset consisted of 1,370 knee MRI exams performed at Stanford University Medical Center between January 1, 2001, and December 31, 2012 (mean age 38.0 years; 569 [41.5%] female patients). The majority vote of 3 musculoskeletal radiologists established reference standard labels on an internal validation set of 120 exams. We developed MRNet, a convolutional neural network for classifying MRI series and combined predictions from 3 series per exam using logistic regression. In detecting abnormalities, ACL tears, and meniscal tears, this model achieved area under the receiver operating characteristic curve (AUC) values of 0.937 (95% CI 0.895, 0.980), 0.965 (95% CI 0.938, 0.993), and 0.847 (95% CI 0.780, 0.914), respectively, on the internal validation set. We also obtained a public dataset of 917 exams with sagittal T1-weighted series and labels for ACL injury from Clinical Hospital Centre Rijeka, Croatia.
On the external validation set of 183 exams, the MRNet trained on Stanford sagittal T2-weighted series achieved an AUC of 0.824 (95% CI 0.757, 0.892) in the detection of ACL injuries with no additional training, while an MRNet trained on the rest of the external data achieved an AUC of 0.911 (95% CI 0.864, 0.958). We additionally measured the specificity, sensitivity, and accuracy of 9 clinical experts (7 board-certified general radiologists and 2 orthopedic surgeons) on the internal validation set both with and without model assistance. Using a 2-sided Pearson's chi-squared test with adjustment for multiple comparisons, we found no significant differences between the performance of the model and that of unassisted general radiologists in detecting abnormalities. General radiologists achieved significantly higher sensitivity in detecting ACL tears (p-value = 0.002; q-value = 0.019) and significantly higher specificity in detecting meniscal tears (p-value = 0.003; q-value = 0.019). Using a 1-tailed t test on the change in performance metrics, we found that providing model predictions significantly increased clinical experts' specificity in identifying ACL tears (p-value < 0.001; q-value = 0.006). The primary limitations of our study include lack of surgical ground truth and the small size of the panel of clinical experts. CONCLUSIONS: Our deep learning model can rapidly generate accurate clinical pathology classifications of knee MRI exams from both internal and external datasets. Moreover, our results support the assertion that deep learning models can improve the performance of clinical experts during medical imaging interpretation. Further research is needed to validate the model prospectively and to determine its utility in the clinical setting.
View details for PubMedID 30481176
- Deep-learning-assisted diagnosis for knee magnetic resonance imaging: Development and retrospective validation of MRNet PLOS MEDICINE 2018; 15 (11)
Migrating to the Modern PACS: Challenges and Opportunities.
Radiographics : a review publication of the Radiological Society of North America, Inc
2018; 38 (6): 1761–72
With progressive advancements in picture archiving and communication system (PACS) technology, radiology practices frequently look toward system upgrades and replacements to further improve efficiency and capabilities. The transition between PACS has the potential to derail the operations of a radiology department. Careful planning and attention to detail from radiology informatics leaders are imperative to ensure a smooth transition. This article is a review of the architecture of a modern PACS, highlighting areas of recent innovation. Key considerations for planning a PACS migration and important issues to consider in data migration, change management, and business continuity are discussed. Beyond the technical aspects of a PACS migration, the human factors to consider when managing the cultural change that accompanies a new informatics tool and the keys to success when managing technical failures are explored. Online supplemental material is available for this article. ©RSNA, 2018.
View details for PubMedID 30303805
Performance of a Deep-Learning Neural Network Model in Assessing Skeletal Maturity on Pediatric Hand Radiographs
2018; 287 (1): 313–22
Purpose: To compare the performance of a deep-learning bone age assessment model based on hand radiographs with that of expert radiologists and that of existing automated models. Materials and Methods: The institutional review board approved the study. A total of 14 036 clinical hand radiographs and corresponding reports were obtained from two children's hospitals to train and validate the model. For the first test set, composed of 200 examinations, the mean of bone age estimates from the clinical report and three additional human reviewers was used as the reference standard. Overall model performance was assessed by comparing the root mean square (RMS) and mean absolute difference (MAD) between the model estimates and the reference standard bone ages. Ninety-five percent limits of agreement were calculated in a pairwise fashion for all reviewers and the model. The RMS of a second test set composed of 913 examinations from the publicly available Digital Hand Atlas was compared with published reports of an existing automated model. Results: The mean difference between bone age estimates of the model and of the reviewers was 0 years, with a mean RMS and MAD of 0.63 and 0.50 years, respectively. The estimates of the model, the clinical report, and the three reviewers were within the 95% limits of agreement. RMS for the Digital Hand Atlas data set was 0.73 years, compared with 0.61 years for a previously reported model. Conclusion: A deep-learning convolutional neural network model can estimate skeletal maturity with accuracy similar to that of an expert radiologist and to that of existing automated models. © RSNA, 2017. An earlier incorrect version of this article appeared online. This article was corrected on January 19, 2018.
View details for PubMedID 29095675
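The bone age abstract defines three agreement measures:

- the root mean square (RMS) difference from a reference standard;
- the mean absolute difference (MAD) from that reference standard;
- pairwise 95% limits of agreement.

The reference standard is the mean of the clinical report and three reviewers. A minimal sketch of these calculations follows. It assumes Bland-Altman style limits (mean difference ± 1.96 SD of the paired differences), which the abstract does not spell out. The function names and the four-exam example are hypothetical and are not the study's data.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def reference_standard(reads: Sequence[Sequence[float]]) -> List[float]:
    """Per-exam reference bone age: mean of all human reads (report + reviewers)."""
    return [mean(exam_reads) for exam_reads in reads]


def rms_difference(est: Sequence[float], ref: Sequence[float]) -> float:
    """Root mean square of the paired differences, in years."""
    diffs = [e - r for e, r in zip(est, ref)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def mean_absolute_difference(est: Sequence[float], ref: Sequence[float]) -> float:
    """Mean of the absolute paired differences, in years."""
    return mean(abs(e - r) for e, r in zip(est, ref))


def limits_of_agreement(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Assumed Bland-Altman 95% limits: mean difference +/- 1.96 SD of differences."""
    diffs = [x - y for x, y in zip(a, b)]
    center = mean(diffs)
    spread = 1.96 * stdev(diffs)  # sample SD; needs at least two paired exams
    return center - spread, center + spread


if __name__ == "__main__":
    # Hypothetical reads (report + three reviewers) for four exams, in years.
    reads = [
        [10.0, 11.0, 10.0, 11.0],
        [12.0, 12.0, 12.5, 11.5],
        [8.0, 8.0, 8.0, 8.0],
        [14.0, 13.5, 14.5, 14.0],
    ]
    model = [10.0, 12.5, 8.0, 15.0]  # hypothetical model estimates
    ref = reference_standard(reads)
    print("RMS:", rms_difference(model, ref))
    print("MAD:", mean_absolute_difference(model, ref))
    print("95% LoA:", limits_of_agreement(model, ref))
```

RMS penalizes large misses more heavily than MAD does. Reporting both, as the abstract does, gives a sense of whether the error comes from a few outliers or from many small discrepancies.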
Translational Radiomics: Defining the Strategy Pipeline and Considerations for Application-Part 2: From Clinical Implementation to Enterprise
JOURNAL OF THE AMERICAN COLLEGE OF RADIOLOGY
2018; 15 (3): 543–49
Enterprise imaging has channeled various technological innovations to the field of clinical radiology, ranging from advanced imaging equipment and postacquisition iterative reconstruction tools to image analysis and computer-aided detection tools. More recently, the advancement in the field of quantitative image analysis coupled with machine learning-based data analytics, classification, and integration has ushered in the era of radiomics, a paradigm shift that holds tremendous potential in clinical decision support as well as drug discovery. However, there are important issues to consider to incorporate radiomics into a clinically applicable system and a commercially viable solution. In this two-part series, we offer insights into the development of the translational pipeline for radiomics from methodology to clinical implementation (Part 1) and from that point to enterprise development (Part 2). In Part 2 of this two-part series, we study the components of the strategy pipeline, from clinical implementation to building enterprise solutions.
View details for PubMedID 29366598
Data Science: Big Data, Machine Learning, and Artificial Intelligence
JOURNAL OF THE AMERICAN COLLEGE OF RADIOLOGY
2018; 15 (3): 497–98
View details for PubMedID 29502583
Translational Radiomics: Defining the Strategy Pipeline and Considerations for Application-Part 1: From Methodology to Clinical Implementation
JOURNAL OF THE AMERICAN COLLEGE OF RADIOLOGY
2018; 15 (3): 538–42
Enterprise imaging has channeled various technological innovations to the field of clinical radiology, ranging from advanced imaging equipment and postacquisition iterative reconstruction tools to image analysis and computer-aided detection tools. More recently, the advancements in the field of quantitative image analysis coupled with machine learning-based data analytics, classification, and integration have ushered us into the era of radiomics, which has tremendous potential in clinical decision support as well as drug discovery. There are important issues to consider to incorporate radiomics as a clinically applicable system and a commercially viable solution. In this two-part series, we offer insights into the development of the translational pipeline for radiomics from methodology to clinical implementation (Part 1) and from that to enterprise development (Part 2).
View details for PubMedID 29366600
Imaging before 24 weeks gestation can predict neonatal respiratory morbidity in pregnancies complicated by fetal lung masses
MOSBY-ELSEVIER. 2018: S287–S288
View details for Web of Science ID 000422946900478
Artificial Swarm Intelligence employed to Amplify Diagnostic Accuracy in Radiology
IEEE. 2018: 1186–91
View details for Web of Science ID 000461314200195
Evaluating the Effect of Unstructured Clinical Information on Clinical Decision Support Appropriateness Ratings.
Journal of the American College of Radiology
2017; 14 (6): 737-743
To determine the appropriateness rating (AR) of advanced inpatient imaging requests that were not rated by prospective, point-of-care clinical decision support (CDS) using computerized provider order entry. During 30-day baseline and intervention periods, CDS generated an AR for advanced inpatient imaging requests (nuclear medicine, CT, and MRI) using provider-selected structured indications from pull-down menus in the computerized provider order entry portal. The AR was only displayed during the intervention, and providers were required to acknowledge the AR to finalize the request. Subsequently, the unstructured free text information accompanying all requests was reviewed, and the AR was revised when possible. The percentage of unrated requests and the overall AR, before and after radiologist review, were compared between periods and by provider type. CDS software prospectively generated an AR for only 25.4% and 28.4% of baseline and intervention imaging requests, respectively; however, radiologist review generated an AR for 82.4% and 93.6% of the same requests. During the respective periods, the percentage of baseline and intervention imaging requests considered appropriate was 18.7% and 22.9% by prospective CDS software rating and increased to 82.4% and 88.7% with radiologist review. Despite limited effective use of CDS software, the percentage of requests containing additional, relevant clinical information increased, and the majority of requests had overall high appropriateness when reviewed by a radiologist. Additional work is needed to improve the amount and quality of clinical information available to CDS software and to facilitate the entry of this information by appropriate end users.
View details for DOI 10.1016/j.jacr.2017.02.003
View details for PubMedID 28434848
- Technical Challenges in the Clinical Application of Radiomics JCO CLINICAL CANCER INFORMATICS 2017; 1
Technical Challenges in the Clinical Application of Radiomics.
JCO clinical cancer informatics
2017; 1: 1–8
Radiomics is a quantitative approach to medical image analysis targeted at deciphering the morphologic and functional features of a lesion. Radiomic methods can be applied across various malignant conditions to identify tumor phenotype characteristics in the images that correlate with their likelihood of survival, as well as their association with the underlying biology. Identifying this set of characteristic features, called tumor signature, holds tremendous value in predicting the behavior and progression of cancer, which in turn has the potential to predict its response to various therapeutic options. We discuss the technical challenges encountered in the application of radiomics, in terms of methodology, workflow integration, and user experience, that need to be addressed to harness its true potential.
View details for PubMedID 30657374
Concierge and Second-Opinion Radiology: Review of Current Practices.
Current problems in diagnostic radiology
2016; 45 (2): 111-114
Radiology's core assets include the production, interpretation, and distribution of quality imaging studies. Second-opinion services and concierge practices in radiology aim to augment traditional services by providing patient-centered and physician-centered care, respectively. Patient centeredness enhances patients' understanding and comfort with their radiology tests and procedures and allows them to make better decisions about their health care. As the fee-for-service paradigm shifts to value-based care models, radiology practices have begun to diversify imaging service delivery and communication to coincide with the American College of Radiology Imaging 3.0 campaign. Physician-centered consultation allows for communication of evidence-based guidelines to assist referring physicians and other providers in making the most appropriate imaging or treatment decision for a specific clinical condition. There are disparate practice models and payment schema for the various second-opinion and concierge practices. This review article explores the current state and payment models of second-opinion and concierge practices in radiology. This review also includes a discussion on the benefits, roadblocks, and ethical issues that surround these novel types of practices.
View details for DOI 10.1067/j.cpradiol.2015.07.011
View details for PubMedID 26305521
- Large intra-thoracic desmoid tumor with airway compression: A case report and review of the literature JOURNAL OF PEDIATRIC SURGERY CASE REPORTS 2016; 5: 15–18
Systematic Literature Review of Imaging Features of Spinal Degeneration in Asymptomatic Populations
AMERICAN JOURNAL OF NEURORADIOLOGY
2015; 36 (4): 811-816
Degenerative changes are commonly found in spine imaging but often occur in pain-free individuals as well as those with back pain. We sought to estimate the prevalence, by age, of common degenerative spine conditions by performing a systematic review studying the prevalence of spine degeneration on imaging in asymptomatic individuals. We performed a systematic review of articles reporting the prevalence of imaging findings (CT or MR imaging) in asymptomatic individuals from published English literature through April 2014. Two reviewers evaluated each manuscript. We selected age groupings by decade (20, 30, 40, 50, 60, 70, 80 years), determining age-specific prevalence estimates. For each imaging finding, we fit a generalized linear mixed-effects model for the age-specific prevalence estimate clustering in the study, adjusting for the midpoint of the reported age interval. Thirty-three articles reporting imaging findings for 3110 asymptomatic individuals met our study inclusion criteria. The prevalence of disk degeneration in asymptomatic individuals increased from 37% of 20-year-old individuals to 96% of 80-year-old individuals. Disk bulge prevalence increased from 30% of those 20 years of age to 84% of those 80 years of age. Disk protrusion prevalence increased from 29% of those 20 years of age to 43% of those 80 years of age. The prevalence of annular fissure increased from 19% of those 20 years of age to 29% of those 80 years of age. Imaging findings of spine degeneration are present in high proportions of asymptomatic individuals, increasing with age. Many imaging-based degenerative features are likely part of normal aging and unassociated with pain. These imaging findings must be interpreted in the context of the patient's clinical condition.
View details for DOI 10.3174/ajnr.A4173
View details for PubMedID 25430861
The Effect of Clinical Decision Support for Advanced Inpatient Imaging
JOURNAL OF THE AMERICAN COLLEGE OF RADIOLOGY
2015; 12 (4): 358-363
To examine the effect of integrating point-of-care clinical decision support (CDS) using the ACR Appropriateness Criteria (AC) into an inpatient computerized provider order entry (CPOE) system for advanced imaging requests. Over 12 months, inpatient CPOE requests for nuclear medicine, CT, and MRI were processed by CDS to generate an AC score using provider-selected data from pull-down menus. During the second 6-month period, AC scores were displayed to ordering providers, and acknowledgement was required to finalize a request. Request AC scores and percentages of requests not scored by CDS were compared among primary care providers (PCPs) and specialists, and by years in practice of the responsible physician of record. CDS prospectively generated a score for 26.0% and 30.3% of baseline and intervention requests, respectively. The average AC score increased slightly for all requests (7.2 ± 1.6 versus 7.4 ± 1.5; P < .001), for PCPs (6.9 ± 1.9 versus 7.4 ± 1.6; P < .001), and minimally for specialists (7.3 ± 1.6 versus 7.4 ± 1.5; P < .001). The percentage of requests lacking sufficient structured clinical information to generate an AC score decreased for all requests (73.1% versus 68.9%; P < .001), for PCPs (78.0% versus 71.7%; P < .001), and for specialists (72.9% versus 69.1%; P < .001). Integrating CDS into inpatient CPOE slightly increased the overall AC score of advanced imaging requests as well as the provision of sufficient structured data to automatically generate AC scores. Both effects were more pronounced in PCPs compared with specialists.
View details for DOI 10.1016/j.jacr.2014.11.013
View details for Web of Science ID 000352181000011
View details for PubMedID 25622766
Improving the Application of Imaging Clinical Decision Support Tools: Making the Complex Simple
JOURNAL OF THE AMERICAN COLLEGE OF RADIOLOGY
2014; 11 (3): 257-261
With the promotion and incentivization of electronic health records and computerized order entry by CMS, there is a unique opportunity to catalyze the use of evidence-based guidelines with the inclusion of clinical decision support (CDS) tools. Imaging CDS tools have evolved from static paper algorithms, checklists, and scores to interactive systems that provide feedback and recommendations with the intent of directing health care providers to deliver best practices. Some of the major limitations of first-generation imaging CDS tools include a lack of comprehensive evidence-based guidelines, limited ability to input detailed patient conditions and symptoms, and time-intensive user interfaces. Next-generation imaging CDS tools will attempt to close the information and interface gaps to provide more meaningful guidance to health care providers and improve the delivery of best practices to patients.
View details for DOI 10.1016/j.jacr.2013.10.007
View details for Web of Science ID 000332354800015
View details for PubMedID 24589400