My current research focuses on deep neural networks that learn from multimodal clinical data, including images and clinical information. I would like to combine these primarily computer vision algorithms with large language models and EHR encoding models in order to integrate them into the clinical workflow, potentially as a virtual assistant.

Honors & Awards

  • DFG Walter Benjamin Award (fellowship), Deutsche Forschungsgemeinschaft (2023)
  • DAAD RISE worldwide (fellowship), German Academic Exchange Service (2019)
  • Merit Scholarship, Kurt Hahn Foundation (2013)
  • Athletic Scholarship, Mercersburg Academy (2010)

Professional Education

  • Dr. med., Technical University of Munich, Germany, Radiology (2022)
  • MD, Technical University of Munich, Germany, pre-clinical and clinical studies (2021)

All Publications

  • Multicenter US clinical experience with the Scepter Mini balloon catheter. Interventional neuroradiology : journal of peritherapeutic neuroradiology, surgical procedures and related neurosciences Salem, M. M., Kelmer, P., Sioutas, G. S., Ostmeier, S., Hoang, A., Cortez, G., El Naamani, K., Abbas, R., Hanel, R., Tanweer, O., Srinivasan, V. M., Jabbour, P., Kan, P., Jankowitz, B. T., Heit, J. J., Burkhardt, J. K. 2024: 15910199241246135


    Distal navigability and imprecise delivery of embolic agents are two limitations encountered during liquid embolization of cerebrospinal lesions. The dual-lumen Scepter Mini balloon (SMB) microcatheter was introduced to overcome these limitations of conventional microcatheters, with a few small single-center reports suggesting favorable results. A series of consecutive patients undergoing SMB-assisted endovascular embolization was extracted from prospectively maintained registries in seven North American centers (November 2019 to September 2022). Fifty-four patients undergoing 55 embolization procedures utilizing SMB were included (median age 58.5; 48.1% females). Cranial dural arteriovenous fistula embolization was the most common indication (54.5%), followed by cranial arteriovenous malformation (27.3%). Staged/pre-operative embolization was done in 36.4% of cases, and 83.6% of procedures used Onyx-18. Most procedures utilized a transarterial approach (89.1%), and SMB-induced arterial-flow arrest concurrently with transvenous embolization was used in 10.9% of procedures. Femoral access/triaxial setups were utilized in the majority of procedures (65.5% and 60%, respectively). The median diameter of the vessel where the balloon was inflated was 1.8 mm, with a median of 1.5 cc of injected embolic material per procedure. Technical failures occurred in 5.5% of cases, requiring aborting/replacement with other devices without clinical sequelae in any of the patients, and SMB-related procedural complications occurred in 3.6% without clinical sequelae. Radiographic imaging follow-up was available in 76.9% of the patients (median follow-up 3.8 months), with complete occlusion (100%) or >50% occlusion in 92.5% of the cases, and unplanned retreatments in 1.8%. The SMB microcatheter is a useful new adjunctive device for balloon-assisted embolization of cerebrospinal lesions with a high technical success rate, favorable outcomes, and a reasonable safety profile.

    View details for DOI 10.1177/15910199241246135

    View details for PubMedID 38613371

  • Random expert sampling for deep learning segmentation of acute ischemic stroke on non-contrast CT. Journal of neurointerventional surgery Ostmeier, S., Axelrod, B., Liu, Y., Yu, Y., Jiang, B., Yuen, N., Pulli, B., Verhaaren, B. F., Kaka, H., Wintermark, M., Michel, P., Mahammedi, A., Federau, C., Lansberg, M. G., Albers, G. W., Moseley, M. E., Zaharchuk, G., Heit, J. J. 2024


    Outlining acutely infarcted tissue on non-contrast CT is a challenging task for which human inter-reader agreement is limited. We explored two different methods for training a supervised deep learning algorithm: one that used a segmentation defined by majority vote among experts and another that trained randomly on separate individual expert segmentations. The data set consisted of 260 non-contrast CT studies in 233 patients with acute ischemic stroke recruited from the multicenter DEFUSE 3 (Endovascular Therapy Following Imaging Evaluation for Ischemic Stroke 3) trial. Additional external validation was performed using 33 patients with matched stroke onset times from the University Hospital Lausanne. A benchmark U-Net was trained on the reference annotations of three experienced neuroradiologists to segment ischemic brain tissue using majority vote and random expert sampling training schemes. The medians of volume, overlap, and distance segmentation metrics were determined for agreement in lesion segmentations between (1) three experts, (2) the majority model and each expert, and (3) the random model and each expert. The two-sided Wilcoxon signed rank test was used to compare performances (1) to (2) and (1) to (3). We further compared volumes with the 24 hour follow-up diffusion weighted imaging (DWI, final infarct core) and correlations with clinical outcome (modified Rankin Scale (mRS) at 90 days) with the Spearman method. The random model outperformed the inter-expert agreement ((1) to (2)) and the majority model ((1) to (3)) (dice 0.51±0.04 vs 0.36±0.05 (P<0.0001) vs 0.45±0.05 (P<0.0001)). The volume predicted by the random model correlated with clinical outcome (0.19, P<0.05), whereas the median expert volume and majority model volume did not. There was no significant difference when comparing the volume correlations between random model, median expert volume, and majority model to 24 hour follow-up DWI volume (P>0.05, n=51). The random model for ischemic injury delineation on non-contrast CT surpassed the inter-expert agreement ((1) to (2)) and the performance of the majority model ((1) to (3)). We showed that the volumetric measures of the random model were consistent with 24 hour follow-up DWI.

    View details for DOI 10.1136/jnis-2023-021283

    View details for PubMedID 38302420
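The two training schemes compared in the abstract above differ only in how the training target is built from the expert annotations. The following is a minimal sketch of that difference, assuming binary masks stored as NumPy arrays; the function names `majority_vote` and `random_expert` are hypothetical and are not taken from the paper's code.

```python
import numpy as np

def majority_vote(masks):
    """Fixed target: a voxel is foreground if more than half of the experts marked it."""
    stacked = np.stack(masks).astype(bool)
    return stacked.sum(axis=0) * 2 > stacked.shape[0]

def random_expert(masks, rng):
    """Sampled target: one expert's mask, drawn uniformly at random for each training step."""
    return masks[rng.integers(len(masks))].astype(bool)

# Three toy expert annotations of the same 2x3 slice.
expert_a = np.array([[1, 1, 0], [0, 0, 0]])
expert_b = np.array([[1, 0, 0], [0, 1, 0]])
expert_c = np.array([[1, 1, 0], [0, 1, 0]])
experts = [expert_a, expert_b, expert_c]

rng = np.random.default_rng(seed=0)
print(majority_vote(experts).astype(int))       # same target every epoch
print(random_expert(experts, rng).astype(int))  # target varies between draws
```

With majority vote the model always sees one consensus mask per study, whereas random sampling exposes it to each expert's individual reading over the course of training.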

  • Non-inferiority of deep learning ischemic stroke segmentation on non-contrast CT within 16-hours compared to expert neuroradiologists. Scientific reports Ostmeier, S., Axelrod, B., Verhaaren, B. F., Christensen, S., Mahammedi, A., Liu, Y., Pulli, B., Li, L., Zaharchuk, G., Heit, J. J. 2023; 13 (1): 16153


    We determined if a convolutional neural network (CNN) deep learning model can accurately segment acute ischemic changes on non-contrast CT compared to neuroradiologists. Non-contrast CT (NCCT) examinations from 232 acute ischemic stroke patients who were enrolled in the DEFUSE 3 trial were included in this study. Three experienced neuroradiologists independently segmented hypodensity that reflected the ischemic core on each scan. The neuroradiologist with the most experience (expert A) served as the ground truth for deep learning model training. Two additional neuroradiologists' (experts B and C) segmentations were used for data testing. The 232 studies were randomly split into training and test sets. The training set was further randomly divided into 5 folds with training and validation sets. A 3-dimensional CNN architecture was trained and optimized to predict the segmentations of expert A from NCCT. The performance of the model was assessed using a set of volume, overlap, and distance metrics using non-inferiority thresholds of 20%, 3 ml, and 3 mm, respectively. The optimized model trained on expert A was compared to test experts B and C. We used a one-sided Wilcoxon signed-rank test to test for the non-inferiority of the model-expert agreement compared to the inter-expert agreement. The final model for the ischemic core segmentation task reached 0.46 ± 0.09 Surface Dice at Tolerance 5mm and 0.47 ± 0.13 Dice when trained on expert A. Compared to the two test neuroradiologists the model-expert agreement was non-inferior to the inter-expert agreement, [Formula: see text]. Therefore, the CNN accurately delineates the hypodense ischemic core on NCCT in acute ischemic stroke patients with an accuracy comparable to neuroradiologists.

    View details for DOI 10.1038/s41598-023-42961-x

    View details for PubMedID 37752162

  • USE-Evaluator: Performance metrics for medical image segmentation models supervised by uncertain, small or empty reference annotations in neuroimaging. Medical image analysis Ostmeier, S., Axelrod, B., Isensee, F., Bertels, J., Mlynash, M., Christensen, S., Lansberg, M. G., Albers, G. W., Sheth, R., Verhaaren, B. F., Mahammedi, A., Li, L. J., Zaharchuk, G., Heit, J. J. 2023; 90: 102927


    Performance metrics for medical image segmentation models are used to measure the agreement between the reference annotation and the predicted segmentation. Usually, overlap metrics, such as the Dice, are used to evaluate the performance of these models in order for results to be comparable. However, there is a mismatch between the distributions of cases and the difficulty level of segmentation tasks in public data sets compared to clinical practice. Common metrics used to assess performance fail to capture the impact of this mismatch, particularly when dealing with datasets in clinical settings that involve challenging segmentation tasks, pathologies with low signal, and reference annotations that are uncertain, small, or empty. Limitations of common metrics may result in ineffective machine learning research in designing and optimizing models. To effectively evaluate the clinical value of such models, it is essential to consider factors such as the uncertainty associated with reference annotations, the ability to accurately measure performance regardless of the size of the reference annotation volume, and the classification of cases where reference annotations are empty. We study how uncertain, small, and empty reference annotations influence the value of metrics on a stroke in-house data set regardless of the model. We examine metric behavior on the predictions of a standard deep learning framework in order to identify suitable metrics in such a setting. We compare our results to the BRATS 2019 and Spinal Cord public data sets. We show how uncertain, small, or empty reference annotations require a rethinking of the evaluation. The evaluation code was released to encourage further analysis of this topic.

    View details for DOI 10.1016/

    View details for PubMedID 37672900
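To illustrate why small and empty reference annotations are awkward for overlap metrics, here is a minimal sketch of the Dice coefficient, assuming binary NumPy masks. It is not the released USE-Evaluator code; the helper `evaluate_case` and its dictionary keys are hypothetical, and treating empty-reference cases as a classification problem is one possible reading of the abstract.

```python
import numpy as np

def dice(reference, prediction):
    """Dice overlap; undefined (NaN) when both masks are empty."""
    ref = np.asarray(reference, dtype=bool)
    pred = np.asarray(prediction, dtype=bool)
    denom = ref.sum() + pred.sum()
    if denom == 0:
        return float("nan")
    return 2.0 * np.logical_and(ref, pred).sum() / denom

def evaluate_case(reference, prediction):
    """Score empty-reference cases as classification, all others by overlap."""
    ref = np.asarray(reference, dtype=bool)
    pred = np.asarray(prediction, dtype=bool)
    if not ref.any():
        return {"kind": "empty_reference", "false_positive": bool(pred.any())}
    return {"kind": "lesion", "dice": dice(ref, pred)}

# A one-voxel shift barely affects a large lesion but zeroes out a tiny one.
large_ref = np.zeros(20, dtype=bool); large_ref[0:10] = True
large_pred = np.zeros(20, dtype=bool); large_pred[1:11] = True
small_ref = np.zeros(20, dtype=bool); small_ref[5] = True
small_pred = np.zeros(20, dtype=bool); small_pred[6] = True

print(dice(large_ref, large_pred))
print(dice(small_ref, small_pred))
print(evaluate_case(np.zeros(20), small_pred))
```

The same localization error yields very different Dice values depending on lesion size, and a prediction on an empty reference scores zero no matter how small the false positive is.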

  • Functional Outcome Prediction in Acute Ischemic Stroke Using a Fused Imaging and Clinical Deep Learning Model. Stroke Liu, Y., Yu, Y., Ouyang, J., Jiang, B., Yang, G., Ostmeier, S., Wintermark, M., Michel, P., Liebeskind, D. S., Lansberg, M. G., Albers, G. W., Zaharchuk, G. 2023


    Predicting long-term clinical outcome based on the early acute ischemic stroke information is valuable for prognostication, resource management, clinical trials, and patient expectations. Current methods require subjective decisions about which imaging features to assess and may require time-consuming postprocessing. This study's goal was to predict ordinal 90-day modified Rankin Scale (mRS) score in acute ischemic stroke patients by fusing a Deep Learning model of diffusion-weighted imaging images and clinical information from the acute period. A total of 640 acute ischemic stroke patients who underwent magnetic resonance imaging within 1 to 7 days poststroke and had 90-day mRS follow-up data were randomly divided into 70% (n=448) for model training, 15% (n=96) for validation, and 15% (n=96) for internal testing. Additionally, external testing on a cohort from Lausanne University Hospital (n=280) was performed to further evaluate model generalization. Accuracy for ordinal mRS, accuracy within ±1 mRS category, mean absolute prediction error, and determination of unfavorable outcome (mRS score >2) were evaluated for clinical only, imaging only, and 2 fused clinical-imaging models. The fused models demonstrated superior performance in predicting ordinal mRS score and unfavorable outcome in both internal and external test cohorts when compared with the clinical and imaging models. For the internal test cohort, the top fused model had the highest area under the curve of 0.92 for unfavorable outcome prediction and the lowest mean absolute error (0.96 [95% CI, 0.77-1.16]), with the highest proportion of mRS score predictions within ±1 category (79% [95% CI, 71%-88%]). On the external Lausanne University Hospital cohort, the best fused model had an area under the curve of 0.90 for unfavorable outcome prediction and outperformed other models with a mean absolute error of 0.90 (95% CI, 0.79-1.01), and the highest percentage of mRS score predictions within ±1 category (83% [95% CI, 78%-87%]). A Deep Learning-based imaging model fused with clinical variables can be used to predict 90-day stroke outcome with reduced subjectivity and user burden.

    View details for DOI 10.1161/STROKEAHA.123.044072

    View details for PubMedID 37485663

  • Prediction of delayed cerebral ischemia after cerebral aneurysm rupture using explainable machine learning approach. Interventional neuroradiology : journal of peritherapeutic neuroradiology, surgical procedures and related neurosciences Taghavi, R. M., Zhu, G., Wintermark, M., Kuraitis, G. M., Sussman, E. S., Pulli, B., Biniam, B., Ostmeier, S., Steinberg, G. K., Heit, J. J. 2023: 15910199231170411


    Aneurysmal subarachnoid hemorrhage results in significant mortality and disability, which is worsened by the development of delayed cerebral ischemia. Tests to identify patients with delayed cerebral ischemia prospectively are of high interest. We created a machine learning system based on clinical variables to predict delayed cerebral ischemia in aneurysmal subarachnoid hemorrhage patients. We also determined which variables have the most impact on delayed cerebral ischemia prediction using the SHapley Additive exPlanations method. 500 aneurysmal subarachnoid hemorrhage patients were identified and 369 met inclusion criteria: 70 patients developed delayed cerebral ischemia (delayed cerebral ischemia+) and 299 did not (delayed cerebral ischemia-). The algorithm was trained based upon age, sex, hypertension (HTN), diabetes, hyperlipidemia, congestive heart failure, coronary artery disease, smoking history, family history of aneurysm, Fisher Grade, Hunt and Hess score, and external ventricular drain placement. Random Forest was selected for this project, and the prediction outcome of the algorithm was delayed cerebral ischemia+. SHapley Additive exPlanations was used to visualize each feature's contribution to the model prediction. The Random Forest machine learning algorithm predicted delayed cerebral ischemia: accuracy 80.65% (95% CI: 72.62-88.68), area under the curve 0.780 (95% CI: 0.696-0.864), sensitivity 12.5% (95% CI: -3.7 to 28.7), specificity 94.81% (95% CI: 89.85-99.77), PPV 33.3% (95% CI: -4.39 to 71.05), and NPV 84.1% (95% CI: 76.38-91.82). SHapley Additive exPlanations values demonstrated that age, external ventricular drain placement, Fisher Grade, Hunt and Hess score, and HTN had the highest predictive values for delayed cerebral ischemia. Lower age, absence of hypertension, higher Hunt and Hess score, higher Fisher Grade, and external ventricular drain placement increased risk of delayed cerebral ischemia. Machine learning models based upon clinical variables predict delayed cerebral ischemia with high specificity and good accuracy.

    View details for DOI 10.1177/15910199231170411

    View details for PubMedID 37070145
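The negative lower bounds reported above for sensitivity and PPV are what a normal-approximation (Wald) interval produces when counts are small. The sketch below shows the arithmetic. The abstract does not state the test-set counts, so the value of 2 true positives among 16 positive cases is an assumption, chosen only because it reproduces the reported 12.5% (-3.7 to 28.7). The Wilson score interval is included as a commonly used alternative that stays within 0 and 1; it is not part of the paper.

```python
import math

def wald_interval(successes, n, z=1.96):
    """Normal-approximation interval; can extend below 0 or above 1 for small n."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval; always stays within [0, 1]."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width

# Hypothetical counts (not stated in the abstract): 2 detected out of 16 positives.
wald_lo, wald_hi = wald_interval(2, 16)
wilson_lo, wilson_hi = wilson_interval(2, 16)
print(f"Wald:   {100 * wald_lo:.1f}% to {100 * wald_hi:.1f}%")
print(f"Wilson: {100 * wilson_lo:.1f}% to {100 * wilson_hi:.1f}%")
```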

  • Iodine concentration of healthy lymph nodes of the neck, axilla and groin in Dual Energy Computed Tomography Ostmeier, S. Technical University Munich. 2022
  • Iodine concentration of healthy lymph nodes of neck, axilla, and groin in dual-energy computed tomography ACTA RADIOLOGICA Sauter, A. P., Ostmeier, S., Nadjiri, J., Deniffel, D., Rummeny, E. J., Pfeiffer, D. 2020; 61 (11): 1505-1511


    Lymph nodes (LN) are examined in every computed tomography (CT) scan. Until now, evaluation has only been possible based on morphological criteria. With dual-energy CT (DECT) systems, iodine concentration (IC) can be measured, which could result in an improved diagnostic evaluation of LNs. To define standard values for IC of cervical, axillary, and inguinal LNs in DECT. Imaging data of 297 patients who received a DECT scan of the neck, thorax, abdomen-pelvis, or a combination of those in a portal-venous phase were retrospectively collected from the institutional PACS. Patients had no current history of malignancy, inflammation, or trauma in the examined region. For each examined region, the data of 99 patients were used. The IC of the three largest LNs, the main artery, the main vein, and a local muscle of the examined area was measured, respectively. Normalization of the IC of LNs to the artery, vein, muscle, or a combination of those did not lead to a decreased value range. The smallest range and confidence interval (CI) of IC was found when using absolute values of IC for each region. The resulting mean values (95% CI) for IC of LNs were 2.09 mg/mL (2.00-2.18 mg/mL) for neck, 1.24 mg/mL (1.16-1.33 mg/mL) for axilla, and 1.11 mg/mL (1.04-1.17 mg/mL) for groin. The present study suggests that standard values for IC of LNs in dual-layer CT could be used to differentiate between healthy and pathological lymph nodes, considering the used contrast injection protocol.

    View details for DOI 10.1177/0284185120903448

    View details for Web of Science ID 000514045200001

    View details for PubMedID 32064891