Bio


I am a Postdoctoral Scholar working with Prof. Akshay Chaudhari and Prof. David Larson in the Department of Radiology at Stanford, focusing on evaluating the robustness of large-scale AI models and identifying early disease biomarkers.

Until August 2023, I was a Postdoctoral Scholar in the Computational Neuroimage Science Laboratory (CNS Lab) with Prof. Kilian M. Pohl working on multi-modal machine learning models that can improve the understanding, diagnosis, and treatment of neuropsychiatric disorders.

Previously, I completed my PhD at the Chair for Computer Aided Medical Procedures at the Technical University of Munich under the supervision of Prof. Nassir Navab; my dissertation was titled "Learning Robust Representations for Medical Diagnosis". I am passionate about designing trustworthy deep learning methods for challenging applications.

Honors & Awards


  • Best Paper Award, PRedictive Intelligence in MEdicine (PRIME) Workshop, MICCAI (September 2022)
  • Best Paper Award, Uncertainty for Safe Utilization of Machine Learning in Medical Imaging (UNSURE) Workshop, MICCAI (September 2021)
  • Graduate Student Travel Award, Medical Image Computing and Computer Assisted Intervention (MICCAI) (October 2019)
  • Best Poster Award, International Conference on Information Processing in Medical Imaging (IPMI) (June 2019)

Boards, Advisory Committees, Professional Organizations


  • Public Relations Officer, MICCAI Student Board (2017 - 2020)

Professional Education


  • Ph.D., Technical University of Munich, "Learning Robust Representations for Medical Diagnosis" (2021)
  • M.Sc., Technical University of Munich, Informatics (2017)
  • B.Sc., Aristotle University of Thessaloniki, Informatics (2015)

Stanford Advisors


  • Prof. Akshay Chaudhari, Department of Radiology
  • Prof. David Larson, Department of Radiology

Current Research and Scholarly Interests


My research focuses on machine learning models that enhance the understanding, diagnosis, and treatment of clinical disorders. I am interested in multi-modal learning, which combines imaging data such as MRI and CT scans with non-imaging data such as electronic health records to create more holistic and accurate diagnostic models. I am also interested in the robustness of deep neural networks under domain shifts, investigating how models perform when the input data distribution changes. Finally, I am interested in identifying early disease biomarkers through AI model interpretability, to enable the early detection and targeted treatment of chronic disorders.

All Publications


  • Spectral Graph Sample Weighting for Interpretable Sub-cohort Analysis in Predictive Models for Neuroimaging. PRedictive Intelligence in MEdicine. PRIME (Workshop) Paschali, M., Jiang, Y. H., Siegel, S., Gonzalez, C., Pohl, K. M., Chaudhari, A., Zhao, Q. 2025; 15155: 24-34

    Abstract

    Recent advancements in medicine have confirmed that brain disorders often comprise multiple subtypes of mechanisms, developmental trajectories, or severity levels. Such heterogeneity is often associated with demographic aspects (e.g., sex) or disease-related contributors (e.g., genetics). Thus, the predictive power of machine learning models used for symptom prediction varies across subjects based on such factors. To model this heterogeneity, one can assign each training sample a factor-dependent weight, which modulates the subject's contribution to the overall objective loss function. To this end, we propose to model the subject weights as a linear combination of the eigenbases of a spectral population graph that captures the similarity of factors across subjects. In doing so, the learned weights smoothly vary across the graph, highlighting sub-cohorts with high and low predictability. Our proposed sample weighting scheme is evaluated on two tasks. First, we predict initiation of heavy alcohol drinking in young adulthood from imaging and neuropsychological measures from the National Consortium on Alcohol and NeuroDevelopment in Adolescence (NCANDA). Next, we detect Dementia vs. Mild Cognitive Impairment (MCI) using imaging and demographic measurements in subjects from the Alzheimer's Disease Neuroimaging Initiative (ADNI). Compared to existing sample weighting schemes, our sample weights improve interpretability and highlight sub-cohorts with distinct characteristics and varying model accuracy.

    View details for DOI 10.1007/978-3-031-74561-4_3

    View details for PubMedID 39525051

    View details for PubMedCentralID PMC11549025

  • Multi-domain predictors of grip strength differentiate individuals with and without alcohol use disorder. Addiction biology Paschali, M., Zhao, Q., Sassoon, S. A., Pfefferbaum, A., Sullivan, E. V., Pohl, K. M. 2024; 29 (11): e70007

    Abstract

    Grip strength is considered one of the simplest and most reliable indices of general health. Although motor ability and strength are commonly affected in people with alcohol use disorder (AUD), factors predictive of grip strength decline in AUD have not been investigated. Here, we employed a data-driven analysis predicting grip strength from measurements in 53 controls and 110 AUD participants, 53 of whom were comorbid with HIV infection. Controls and AUD participants were matched on sex, age, and body mass index. Measurements included commonly available metrics of brain structure, neuropsychological functioning, behavioural status, haematological and health status, and demographics. Based on 5-fold stratified cross-validation, a machine learning approach predicted grip strength separately for each cohort. The strongest (top 10%) predictors of grip were then tested against grip strength with correlational analysis. Leading grip strength predictors for both cohorts were cerebellar volume and mean corpuscular haemoglobin concentration. Predictors specific to controls were Backwards Digit Span, precentral gyrus volume, diastolic blood pressure, and mean platelet volume, which together significantly predicted grip strength (R2 = 0.255, p = 0.001). Unique predictors for AUD were comorbidity for HIV infection, social functioning, insular volume, and platelet count, which together significantly predicted grip strength (R2 = 0.162, p = 0.002). These cohort-specific predictors were doubly dissociated. Salient predictors of grip strength differed by diagnosis with only modest overlap. The constellation of cohort-specific predictive measurements of compromised grip strength provides insight into brain, behavioural, and physiological factors that may signal subtly affected yet treatable processes of physical decline and frailty.

    View details for DOI 10.1111/adb.70007

    View details for PubMedID 39532141

  • Merlin: A Vision Language Foundation Model for 3D Computed Tomography. Research square Blankemeier, L., Cohen, J. P., Kumar, A., Veen, D. V., Gardezi, S., Paschali, M., Chen, Z., Delbrouck, J. B., Reis, E., Truyts, C., Bluethgen, C., Jensen, M., Ostmeier, S., Varma, M., Valanarasu, J., Fang, Z., Huo, Z., Nabulsi, Z., Ardila, D., Weng, W. H., Junior, E. A., Ahuja, N., Fries, J., Shah, N., Johnston, A., Boutin, R., Wentland, A., Langlotz, C., Hom, J., Gatidis, S., Chaudhari, A. 2024

    Abstract

    Over 85 million computed tomography (CT) scans are performed annually in the US, of which approximately one quarter focus on the abdomen. Given the current shortage of both general and specialized radiologists, there is a large impetus to use artificial intelligence to alleviate the burden of interpreting these complex imaging studies while simultaneously using the images to extract novel physiological insights. Prior state-of-the-art approaches for automated medical image interpretation leverage vision language models (VLMs) that utilize both the image and the corresponding textual radiology reports. However, current medical VLMs are generally limited to 2D images and short reports. To overcome these shortcomings for abdominal CT interpretation, we introduce Merlin - a 3D VLM that leverages both structured electronic health records (EHR) and unstructured radiology reports for pretraining without requiring additional manual annotations. We train Merlin using a high-quality clinical dataset of paired CT scans (6+ million images from 15,331 CTs), EHR diagnosis codes (1.8+ million codes), and radiology reports (6+ million tokens). We comprehensively evaluate Merlin on 6 task types and 752 individual tasks. The non-adapted (off-the-shelf) tasks include zero-shot findings classification (31 findings), phenotype classification (692 phenotypes), and zero-shot cross-modal retrieval (image to findings and image to impressions), while model adapted tasks include 5-year chronic disease prediction (6 diseases), radiology report generation, and 3D semantic segmentation (20 organs). We perform internal validation on a test set of 5,137 CTs, and external validation on 7,000 clinical CTs and on two public CT datasets (VerSe, TotalSegmentator). Beyond these clinically-relevant evaluations, we assess the efficacy of various network architectures and training strategies to show that Merlin has favorable performance compared to existing task-specific baselines. We derive data scaling laws to empirically assess training data needs for requisite downstream task performance. Furthermore, unlike conventional VLMs that require hundreds of GPUs for training, we perform all training on a single GPU. This computationally efficient design can help democratize foundation model training, especially for health systems with compute constraints. We plan to release our trained models, code, and dataset, pending manual removal of all protected health information.

    View details for DOI 10.21203/rs.3.rs-4546309/v1

    View details for PubMedID 38978576

    View details for PubMedCentralID PMC11230513

  • Identifying high school risk factors that forecast heavy drinking onset in understudied young adults. Developmental cognitive neuroscience Zhao, Q., Paschali, M., Dehoney, J., Baker, F. C., de Zambotti, M., De Bellis, M. D., Goldston, D. B., Nooner, K. B., Clark, D. B., Luna, B., Nagel, B. J., Brown, S. A., Tapert, S. F., Eberson, S., Thompson, W. K., Pfefferbaum, A., Sullivan, E. V., Pohl, K. M. 2024; 68: 101413

    Abstract

    Heavy alcohol drinking is a major, preventable problem that adversely impacts the physical and mental health of US young adults. Studies seeking drinking risk factors typically focus on young adults who enrolled in 4-year residential college programs (4YCP) even though most high school graduates join the workforce, military, or community colleges. We examined 106 of these understudied young adults (USYA) and 453 4YCPs from the National Consortium on Alcohol and NeuroDevelopment in Adolescence (NCANDA) by longitudinally following their drinking patterns for 8 years from adolescence to young adulthood. All participants were no-to-low drinkers during high school. Whereas 4YCP individuals were more likely to initiate heavy drinking during college years, USYA participants did so later. Using mental health metrics recorded during high school, machine learning forecasted individual-level risk for initiating heavy drinking after leaving high school. The risk factors differed between demographically matched USYA and 4YCP individuals and between sexes. Predictors for USYA drinkers were sexual abuse, physical abuse for girls, and extraversion for boys, whereas 4YCP drinkers were predicted by the ability to recognize facial emotion and, for boys, greater openness. Thus, alcohol prevention programs need to give special consideration to those joining the workforce, military, or community colleges, who make up the majority of this age group.

    View details for DOI 10.1016/j.dcn.2024.101413

    View details for PubMedID 38943839

  • Multimodal graph attention network for COVID-19 outcome prediction. Scientific reports Keicher, M., Burwinkel, H., Bani-Harouni, D., Paschali, M., Czempiel, T., Burian, E., Makowski, M. R., Braren, R., Navab, N., Wendler, T. 2023; 13 (1): 19539

    Abstract

    When dealing with a newly emerging disease such as COVID-19, the impact of patient- and disease-specific factors (e.g., body weight or known co-morbidities) on the immediate course of the disease is largely unknown. An accurate prediction of the most likely individual disease progression can improve the planning of limited resources and finding the optimal treatment for patients. In the case of COVID-19, the need for intensive care unit (ICU) admission of pneumonia patients can often only be determined on short notice by acute indicators such as vital signs (e.g., breathing rate, blood oxygen levels), whereas statistical analysis and decision support systems that integrate all of the available data could enable an earlier prognosis. To this end, we propose a holistic, multimodal graph-based approach combining imaging and non-imaging information. Specifically, we introduce a multimodal similarity metric to build a population graph that shows a clustering of patients. For each patient in the graph, we extract radiomic features from a segmentation network that also serves as a latent image feature encoder. Together with clinical patient data like vital signs, demographics, and lab results, these modalities are combined into a multimodal representation of each patient. This feature extraction is trained end-to-end with an image-based Graph Attention Network to process the population graph and predict the COVID-19 patient outcomes: admission to ICU, need for ventilation, and mortality. To combine multiple modalities, radiomic features are extracted from chest CTs using a segmentation neural network. Results on a dataset collected in Klinikum rechts der Isar in Munich, Germany and the publicly available iCTCF dataset show that our approach outperforms single modality and non-graph baselines. Moreover, our clustering and graph attention increases understanding of the patient relationships within the population graph and provides insight into the network's decision-making process.

    View details for DOI 10.1038/s41598-023-46625-8

    View details for PubMedID 37945590

    View details for PubMedCentralID PMC7869614

  • Investigating pulse-echo sound speed estimation in breast ultrasound with deep learning. Ultrasonics Simson, W. A., Paschali, M., Sideri-Lampretsa, V., Navab, N., Dahl, J. J. 2023; 137: 107179

    Abstract

    Ultrasound is an adjunct tool to mammography that can quickly and safely aid physicians in diagnosing breast abnormalities. Clinical ultrasound often assumes a constant sound speed to form diagnostic B-mode images. However, the components of breast tissue, such as glandular tissue, fat, and lesions, differ in sound speed. Given a constant sound speed assumption, these differences can degrade the quality of reconstructed images via phase aberration. Sound speed images can be a powerful tool for improving image quality and identifying diseases if properly estimated. To this end, we propose a supervised deep-learning approach for sound speed estimation from analytic ultrasound signals. We develop a large-scale simulated ultrasound dataset that generates representative breast tissue samples by modeling breast gland, skin, and lesions with varying echogenicity and sound speed. We adopt a fully convolutional neural network architecture trained on a simulated dataset to produce an estimated sound speed map. The simulated tissue is interrogated with a plane wave transmit sequence, and the complex-value reconstructed images are used as input for the convolutional network. The network is trained on the sound speed distribution map of the simulated data, and the trained model can estimate sound speed given reconstructed pulse-echo signals. We further incorporate thermal noise augmentation during training to enhance model robustness to artifacts found in real ultrasound data. To highlight the ability of our model to provide accurate sound speed estimations, we evaluate it on simulated, phantom, and in-vivo breast ultrasound data.

    View details for DOI 10.1016/j.ultras.2023.107179

    View details for PubMedID 37939413

  • Interactive Segmentation for COVID-19 Infection Quantification on Longitudinal CT Scans. IEEE Access. Foo, M., Kim, S., Paschali, M., Goli, L., Burian, E., Makowski, M., Braren, R., Navab, N., Wendler, T. 2023; 11: 77596-77607
  • Self-supervised Learning for Physiologically-Based Pharmacokinetic Modeling in Dynamic PET. MICCAI. De Benetti, F., Simson, W., Paschali, M., Sari, H., Rominger, A., Shi, K., Navab, N., Wendler, T. Springer. 2023: 290-299
  • Bridging the Gap between Deep Learning and Hypothesis-Driven Analysis via Permutation Testing. PRedictive Intelligence in MEdicine. PRIME (Workshop) Paschali, M., Zhao, Q., Adeli, E., Pohl, K. M. 2022; 13564: 13-23

    Abstract

    A fundamental approach in neuroscience research is to test hypotheses based on neuropsychological and behavioral measures, i.e., whether certain factors (e.g., related to life events) are associated with an outcome (e.g., depression). In recent years, deep learning has become a potential alternative approach for conducting such analyses by predicting an outcome from a collection of factors and identifying the most "informative" ones driving the prediction. However, this approach has had limited impact as its findings are not linked to statistical significance of factors supporting hypotheses. In this article, we propose a flexible and scalable approach based on the concept of permutation testing that integrates hypothesis testing into the data-driven deep learning analysis. We apply our approach to the yearly self-reported assessments of 621 adolescent participants of the National Consortium of Alcohol and Neurodevelopment in Adolescence (NCANDA) to predict negative valence, a symptom of major depressive disorder according to the NIMH Research Domain Criteria (RDoC). Our method successfully identifies categories of risk factors that further explain the symptom.

    View details for DOI 10.1007/978-3-031-16919-9_2

    View details for PubMedID 36342897

    View details for PubMedCentralID PMC9632755

  • Detecting negative valence symptoms in adolescents based on longitudinal self-reports and behavioral assessments. Journal of affective disorders Paschali, M., Kiss, O., Zhao, Q., Adeli, E., Podhajsky, S., Muller-Oehring, E. M., Gotlib, I. H., Pohl, K. M., Baker, F. C. 2022

    Abstract

    BACKGROUND: Given the high prevalence of depressive symptoms reported by adolescents and associated risk of experiencing psychiatric disorders as adults, differentiating the trajectories of the symptoms related to negative valence at an individual level could be crucial in gaining a better understanding of their effects later in life. METHODS: A longitudinal deep learning framework is presented, identifying self-reported and behavioral measurements that detect the depressive symptoms associated with the Negative Valence System domain of the NIMH Research Domain Criteria (RDoC). RESULTS: Applied to the annual records of 621 participants (age range: 12 to 17 years) of the National Consortium on Alcohol and Neurodevelopment in Adolescence (NCANDA), the deep learning framework identifies predictors of negative valence symptoms, which include lower extraversion, poorer sleep quality, impaired executive control function and factors related to substance use. LIMITATIONS: The results rely mainly on self-reported measures and do not provide information about the underlying neural correlates. Also, a larger sample is required to understand the role of sex and other demographics related to the risk of experiencing symptoms of negative valence. CONCLUSIONS: These results provide new information about predictors of negative valence symptoms in individuals during adolescence that could be critical in understanding the development of depression and identifying targets for intervention. Importantly, findings can inform preventive and treatment approaches for depression in adolescents, focusing on a unique predictor set of modifiable modulators to include factors such as sleep hygiene training, cognitive-emotional therapy enhancing coping and controllability experience and/or substance use interventions.

    View details for DOI 10.1016/j.jad.2022.06.002

    View details for PubMedID 35688394

  • OperA: Attention-Regularized Transformers for Surgical Phase Recognition. MICCAI. Czempiel, T., Paschali, M., Ostler, D., Kim, S., Busam, B., Navab, N. Springer. 2021: 604-614
  • Rethinking Ultrasound Augmentation: A Physics-Inspired Approach. MICCAI. Tirindelli, M., Eilers, C., Simson, W., Paschali, M., Azampour, M., Navab, N. Springer. 2021: 690-700
  • Longitudinal Quantitative Assessment of COVID-19 Infection Progression from Chest CTs. MICCAI. Kim, S., Goli, L., Paschali, M., Khakzar, A., Keicher, M., Czempiel, T., Burian, E., Braren, R., Navab, N., Wendler, T. Springer. 2021: 273-282
  • Ultrasound-Guided Robotic Navigation with Deep Reinforcement Learning. Hase, H., Azampour, M., Tirindelli, M., Paschali, M., Simson, W., Fatemizadeh, E., Navab, N. IEEE. 2020: 5534-5541
  • Signal Clustering with Class-Independent Segmentation. Gasperini, S., Paschali, M., Hopke, C., Wittmann, D., Navab, N. IEEE. 2020: 3982-3986