Dr. Cyril Zakka completed his medical training at the American University of Beirut Medical Center (AUBMC), where he founded and directed the Artificial Intelligence in Medicine (AIM) program. As a postdoctoral fellow in the Department of Cardiothoracic Surgery at Stanford University, he has focused on building, training, and evaluating multimodal large language models (MLLMs) for clinical medicine, as well as foundation models for surgery and cardiac imaging. His work has been featured in highly regarded journals and conferences such as NEJM AI, ML4H, and AATS. Prior to his medical training, Dr. Zakka was heavily involved in software development, and he continues to build applications for the Apple ecosystem and contribute to open-source software.

Honors & Awards

  • Member, Gold Humanism Honor Society (GHHS) (2022-Present)

Professional Education

  • Bachelor of Science, Biology, Boston College (2018)
  • Doctor of Medicine, American University of Beirut (2022)

Patents

  • Jad Farid Assaf, Shady Awwad, Cyril Zakka. "United States Patent 63/395,935: Automated Detection of Keratorefractive Surgeries on Anterior Segment Optical Coherence Tomography (AS-OCT) Scans and Methods of Use", American University of Beirut, Aug 8, 2022

Research Interests

  • Data Sciences
  • Technology and Education

Current Research and Scholarly Interests

Cyril Zakka's research is primarily focused on building unsupervised deep learning representation learners for use in a variety of medical tasks, such as medical imaging (e.g., cardiac MRIs and echocardiograms) and autonomous robotic surgical systems. He is particularly interested in developing algorithms that augment operating physicians' capabilities in order to improve postoperative patient outcomes.

All Publications

  • Almanac - Retrieval-Augmented Language Models for Clinical Medicine. NEJM AI. Zakka, C., Shad, R., Chaurasia, A., Dalal, A. R., Kim, J. L., Moor, M., Fong, R., Phillips, C., Alexander, K., Ashley, E., Boyd, J., Boyd, K., Hirsch, K., Langlotz, C., Lee, R., Melia, J., Nelson, J., Sallam, K., Tullis, S., Vogelsong, M. A., Cunningham, J. P., Hiesinger, W. 2024; 1 (2)


    Large language models (LLMs) have recently shown impressive zero-shot capabilities, whereby they can use auxiliary data, without the availability of task-specific training examples, to complete a variety of natural language tasks, such as summarization, dialogue generation, and question answering. However, despite many promising applications of LLMs in clinical medicine, adoption of these models has been limited by their tendency to generate incorrect and sometimes even harmful statements. We tasked a panel of eight board-certified clinicians and two health care practitioners with evaluating Almanac, an LLM framework augmented with retrieval capabilities from curated medical resources for medical guideline and treatment recommendations. The panel compared responses from Almanac and standard LLMs (ChatGPT-4, Bing, and Bard) on a novel data set of 314 clinical questions spanning nine medical specialties. Almanac showed a significant improvement in performance compared with the standard LLMs across axes of factuality, completeness, user preference, and adversarial safety. Our results show the potential for LLMs with access to domain-specific corpora to be effective in clinical decision-making. The findings also underscore the importance of carefully testing LLMs before deployment to mitigate their shortcomings. (Funded by the National Institutes of Health, National Heart, Lung, and Blood Institute.)

    View details for DOI 10.1056/aioa2300068

    View details for PubMedID 38343631

    View details for PubMedCentralID PMC10857783
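    The retrieval-augmented pattern evaluated in this study can be sketched at toy scale. The snippet below is a minimal illustration, not the Almanac implementation: it uses a hypothetical two-passage corpus and simple bag-of-words cosine similarity in place of a learned retriever, then grounds the prompt in the retrieved text.

    ```python
    # Toy sketch of retrieval-augmented prompting (hypothetical corpus,
    # not the Almanac system): retrieve the most relevant passage from a
    # curated corpus and prepend it to the model prompt as grounding context.
    from collections import Counter
    import math

    def bow_cosine(a: str, b: str) -> float:
        """Cosine similarity between bag-of-words term-frequency vectors."""
        ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
        dot = sum(ca[t] * cb[t] for t in ca)
        na = math.sqrt(sum(v * v for v in ca.values()))
        nb = math.sqrt(sum(v * v for v in cb.values()))
        return dot / (na * nb) if na and nb else 0.0

    def retrieve(query: str, corpus: list, k: int = 1) -> list:
        """Return the k corpus passages most similar to the query."""
        return sorted(corpus, key=lambda p: bow_cosine(query, p), reverse=True)[:k]

    def build_prompt(query: str, corpus: list) -> str:
        """Ground the question in retrieved context before asking the model."""
        context = "\n".join(retrieve(query, corpus))
        return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context."
    ```

    A production retriever would use dense embeddings over curated medical sources; the bag-of-words scorer here only stands in for that component.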

  • Machine Learning Multicenter Risk Model to Predict Right Ventricular Failure After Mechanical Circulatory Support: The STOP-RVF Score. JAMA Cardiology. Taleb, I., Kyriakopoulos, C. P., Fong, R., Ijaz, N., Demertzis, Z., Sideris, K., Wever-Pinzon, O., Koliopoulou, A. G., Bonios, M. J., Shad, R., Peruri, A., Hanff, T. C., Dranow, E., Giannouchos, T. V., Krauspe, E., Zakka, C., Tang, D. G., Nemeh, H. W., Stehlik, J., Fang, J. C., Selzman, C. H., Alharethi, R., Caine, W. T., Cowger, J. A., Hiesinger, W., Shah, P., Drakos, S. G. 2024


    The existing models predicting right ventricular failure (RVF) after durable left ventricular assist device (LVAD) support might be limited, partly due to lack of external validation, marginal predictive power, and absence of intraoperative characteristics. This study aimed to derive and validate a risk model to predict RVF after LVAD implantation. This was a hybrid prospective-retrospective multicenter cohort study conducted from April 2008 to July 2019 of patients with advanced heart failure (HF) requiring continuous-flow LVAD. The derivation cohort included patients enrolled at 5 institutions. The external validation cohort included patients enrolled at a sixth institution within the same period. Study data were analyzed October 2022 to August 2023. Study participants underwent chronic continuous-flow LVAD support. The primary outcome was RVF incidence, defined as the need for an RV assist device or intravenous inotropes for greater than 14 days. Bootstrap imputation and adaptive least absolute shrinkage and selection operator variable selection techniques were used to derive a predictive model. An RVF risk calculator (STOP-RVF) was then developed and subsequently externally validated, which can provide personalized quantification of the risk for LVAD candidates. Its predictive accuracy was compared with previously published RVF scores. The derivation cohort included 798 patients (mean [SE] age, 56.1 [13.2] years; 668 male [83.7%]). The external validation cohort included 327 patients. RVF developed in 193 of 798 patients (24.2%) in the derivation cohort and 107 of 327 patients (32.7%) in the validation cohort. Preimplant variables associated with postoperative RVF included nonischemic cardiomyopathy, intra-aortic balloon pump, microaxial percutaneous left ventricular assist device/venoarterial extracorporeal membrane oxygenation, LVAD configuration, Interagency Registry for Mechanically Assisted Circulatory Support profiles 1 to 2, right atrial/pulmonary capillary wedge pressure ratio, use of angiotensin-converting enzyme inhibitors, platelet count, and serum sodium, albumin, and creatinine levels. Inclusion of intraoperative characteristics did not improve model performance. The calculator achieved a C statistic of 0.75 (95% CI, 0.71-0.79) in the derivation cohort and 0.73 (95% CI, 0.67-0.80) in the validation cohort. Cumulative survival was higher in patients composing the low-risk group (estimated <20% RVF risk) compared with those in the higher-risk groups. The STOP-RVF risk calculator exhibited a significantly better performance than commonly used risk scores proposed by Kormos et al (C statistic, 0.58; 95% CI, 0.53-0.63) and Drakos et al (C statistic, 0.62; 95% CI, 0.57-0.67). Implementing routine clinical data, this multicenter cohort study derived and validated the STOP-RVF calculator as a personalized risk assessment tool for the prediction of RVF and RVF-associated all-cause mortality.

    View details for DOI 10.1001/jamacardio.2023.5372

    View details for PubMedID 38294795
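    The C statistic reported above measures discrimination: for a binary outcome it is the proportion of (event, non-event) patient pairs in which the model assigns the event patient a higher predicted risk, with ties counted as half-concordant, and it equals the area under the ROC curve. A minimal sketch, using hypothetical risks and outcomes rather than the STOP-RVF data:

    ```python
    # Pairwise concordance (C statistic) for a binary outcome.
    # Hypothetical illustration only; not the published STOP-RVF model.
    def c_statistic(risks, outcomes):
        """Fraction of (event, non-event) pairs ranked correctly by risk."""
        events = [r for r, y in zip(risks, outcomes) if y == 1]
        nonevents = [r for r, y in zip(risks, outcomes) if y == 0]
        concordant = sum(
            1.0 if re > rn else 0.5 if re == rn else 0.0
            for re in events for rn in nonevents
        )
        return concordant / (len(events) * len(nonevents))

    # Example: one mis-ranked pair out of four gives C = 0.75.
    print(c_statistic([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))  # 0.75
    ```

    A C statistic of 0.5 corresponds to chance-level discrimination, which is why the 0.75 achieved by the calculator represents a meaningful gain over the 0.58 and 0.62 of the earlier scores.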

  • Deep Learning-Based Estimation of Implantable Collamer Lens Vault Using Optical Coherence Tomography. American Journal of Ophthalmology. Assaf, J. F., Reinstein, D. Z., Zakka, C., Arbelaez, J. G., Boufadel, P., Choufani, M., Archer, T., Ibrahim, P., Awwad, S. T. 2023; 253: 29-36


    PURPOSE: To develop and validate a deep learning neural network for automated measurement of implantable collamer lens (ICL) vault using anterior segment optical coherence tomography (AS-OCT). DESIGN: Cross-sectional retrospective study. METHODS: A total of 2647 AS-OCT scans were used from 139 eyes of 82 subjects who underwent ICL surgery in 3 different centers. Using transfer learning, a deep learning network was trained and validated for estimating the ICL vault on OCT. A trained operator separately reviewed all OCT scans and measured the central vault using a built-in caliper tool. The model was then separately tested on 191 scans. A Bland-Altman plot was constructed, and the mean absolute percentage error (MAPE), mean absolute error (MAE), root mean squared error (RMSE), Pearson correlation coefficient (r), and determination coefficient (R²) were calculated to evaluate the strength and validity of the model. RESULTS: On the test set, the model achieved a MAPE of 3.42%, an MAE of 15.82 µm, an RMSE of 18.85 µm, a Pearson correlation coefficient r of +0.98 (P < .00001), and a coefficient of determination R² of +0.96. There was no significant difference between the vaults of the test set labeled by the technician vs those estimated by the model (478 ± 95 µm vs 475 ± 97 µm, respectively; P = .064). CONCLUSIONS: Using transfer learning, our deep learning neural network was able to accurately compute the ICL vault from AS-OCT scans, overcoming the limitations of an imbalanced data set and limited training data. Such an algorithm can assist the postoperative assessment in ICL surgery.

    View details for DOI 10.1016/j.ajo.2023.04.008

    View details for PubMedID 37142173
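    The agreement metrics reported in this study (MAPE, MAE, RMSE, Pearson r, and R²) are all direct functions of the paired manual and predicted vault measurements. A minimal sketch, using hypothetical values rather than the study data:

    ```python
    # Regression agreement metrics between manual and model-predicted
    # measurements. Hypothetical illustration; not the published results.
    import math

    def regression_metrics(y_true, y_pred):
        n = len(y_true)
        errors = [p - t for t, p in zip(y_true, y_pred)]
        mae = sum(abs(e) for e in errors) / n
        # Percentage error is taken relative to the manual (reference) value.
        mape = 100 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n
        rmse = math.sqrt(sum(e * e for e in errors) / n)
        mt, mp = sum(y_true) / n, sum(y_pred) / n
        cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
        vt = sum((t - mt) ** 2 for t in y_true)
        vp = sum((p - mp) ** 2 for p in y_pred)
        r = cov / math.sqrt(vt * vp)          # Pearson correlation
        r2 = 1 - sum(e * e for e in errors) / vt  # coefficient of determination
        return {"MAE": mae, "MAPE": mape, "RMSE": rmse, "r": r, "R2": r2}
    ```

    Note that RMSE is always at least as large as MAE (it penalizes large errors more), consistent with the 18.85 µm vs 15.82 µm values reported above.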

  • Almanac: Retrieval-Augmented Language Models for Clinical Medicine. Research Square. Zakka, C., Chaurasia, A., Shad, R., Dalal, A. R., Kim, J. L., Moor, M., Alexander, K., Ashley, E., Boyd, J., Boyd, K., Hirsch, K., Langlotz, C., Nelson, J., Hiesinger, W. 2023


    Large language models have recently demonstrated impressive zero-shot capabilities in a variety of natural language tasks such as summarization, dialogue generation, and question answering. Despite many promising applications in clinical medicine, adoption of these models in real-world settings has been largely limited by their tendency to generate incorrect and sometimes even toxic statements. In this study, we develop Almanac, a large language model framework augmented with retrieval capabilities for medical guideline and treatment recommendations. Performance on a novel dataset of clinical scenarios (n = 130) evaluated by a panel of 5 board-certified and resident physicians demonstrates significant increases in factuality (mean of 18% at P < .05) across all specialties, with improvements in completeness and safety. Our results demonstrate the potential for large language models to be effective tools in the clinical decision-making process, while also emphasizing the importance of careful testing and deployment to mitigate their shortcomings.

    View details for DOI 10.21203/

    View details for PubMedID 37205549

    View details for PubMedCentralID PMC10187428