Shaimaa is a graduate of the Ph.D. program in the Department of Electrical Engineering at Stanford and is currently a postdoctoral researcher in the Gevaert lab at the Stanford Center for Biomedical Informatics Research (BMIR). She is interested in developing multimodal deep learning models from biomedical data, with a focus on genomic, radiology, and histopathology data, and in applying these models to problems in cancer and other diseases. Prior to Stanford, she received her B.Sc. (summa cum laude) from the American University in Cairo, where she studied Electronics Engineering and Computer Science. She obtained her M.S. degree in Electrical Engineering from Rensselaer Polytechnic Institute, where she worked in the Cognitive and Immersive Systems lab, advised by Professor Richard Radke.

Professional Education

  • Doctor of Philosophy, Stanford University, EE-PHD (2022)
  • PhD, Stanford University, Electrical Engineering (2021)

All Publications

  • Identifying key multifunctional components shared by critical cancer and normal liver pathways via SparseGMM. Cell reports methods Bakr, S., Brennan, K., Mukherjee, P., Argemi, J., Hernaez, M., Gevaert, O. 2023; 3 (1): 100392


    Despite the abundance of multimodal data, suitable statistical models that can improve our understanding of diseases with genetic underpinnings are challenging to develop. Here, we present SparseGMM, a statistical approach for gene regulatory network discovery. SparseGMM uses latent variable modeling with sparsity constraints to learn Gaussian mixtures from multiomic data. By combining coexpression patterns with a Bayesian framework, SparseGMM quantitatively measures confidence in regulators and uncertainty in target gene assignment by computing gene entropy. We apply SparseGMM to liver cancer and normal liver tissue data and evaluate discovered gene modules in an independent single-cell RNA sequencing (scRNA-seq) dataset. SparseGMM identifies PROCR as a regulator of angiogenesis and PDCD1LG2 and HNF4A as regulators of immune response and blood coagulation in cancer. Furthermore, we show that more genes have significantly higher entropy in cancer compared with normal liver. Among high-entropy genes are key multifunctional components shared by critical pathways, including p53 and estrogen signaling.

    View details for DOI 10.1016/j.crmeth.2022.100392

    View details for Web of Science ID 000925842300001

    View details for PubMedID 36814838

    View details for PubMedCentralID PMC9939431

  • Quantitative imaging feature pipeline: a web-based tool for utilizing, sharing, and building image-processing pipelines. Journal of medical imaging (Bellingham, Wash.) Mattonen, S. A., Gude, D., Echegaray, S., Bakr, S., Rubin, D. L., Napel, S. 2020; 7 (4): 042803


    Quantitative image features that can be computed from medical images are proving to be valuable biomarkers of underlying cancer biology that can be used for assessing treatment response and predicting clinical outcomes. However, validation and eventual clinical implementation of these tools is challenging due to the absence of shared software algorithms, architectures, and the tools required for computing, comparing, evaluating, and disseminating predictive models. Similarly, researchers need to have programming expertise in order to complete these tasks. The quantitative image feature pipeline (QIFP) is an open-source, web-based, graphical user interface (GUI) of configurable quantitative image-processing pipelines for both planar (two-dimensional) and volumetric (three-dimensional) medical images. This allows researchers and clinicians a GUI-driven approach to process and analyze images, without having to write any software code. The QIFP allows users to upload a repository of linked imaging, segmentation, and clinical data or access publicly available datasets (e.g., The Cancer Imaging Archive) through direct links. Researchers have access to a library of file conversion, segmentation, quantitative image feature extraction, and machine learning algorithms. An interface is also provided to allow users to upload their own algorithms in Docker containers. The QIFP gives researchers the tools and infrastructure for the assessment and development of new imaging biomarkers and the ability to use them for single and multicenter clinical and virtual clinical trials.

    View details for DOI 10.1117/1.JMI.7.4.042803

    View details for PubMedID 32206688

  • Interreader Variability in Semantic Annotation of Microvascular Invasion in Hepatocellular Carcinoma on Contrast-enhanced Triphasic CT Images. Radiology. Imaging cancer Bakr, S., Gevaert, O., Patel, B., Kesselman, A., Shah, R., Napel, S., Kothary, N. 2020; 2 (3): e190062


    Purpose: To evaluate interreader agreement in annotating semantic features on preoperative CT images to predict microvascular invasion (MVI) in patients with hepatocellular carcinoma (HCC). Materials and Methods: Preoperative, contrast material-enhanced triphasic CT studies from 89 patients (median age, 64 years; age range, 36-85 years; 70 men) who underwent hepatic resection between 2008 and 2017 for a solitary HCC were reviewed. Three radiologists annotated CT images obtained during the arterial and portal venous phases, independently and in consensus, with features associated with MVI reported by other investigators. The assessed factors were the presence or absence of discrete internal arteries, hypoattenuating halo, tumor-liver difference, peritumoral enhancement, and tumor margin. Testing also included previously proposed MVI signatures: radiogenomic venous invasion (RVI) and two-trait predictor of venous invasion (TTPVI), using single-reader and consensus annotations. Cohen (two-reader) and Fleiss (three-reader) kappa and the bootstrap method were used to analyze interreader agreement and differences in model performance, respectively. Results: Of HCCs assessed, 32.6% (29 of 89) had MVI at histopathologic findings. Two-reader agreement, as assessed by pairwise Cohen kappa statistics, varied as a function of feature and imaging phase, ranging from 0.02 to 0.6; three-reader Fleiss kappa varied from -0.17 to 0.56. For RVI and TTPVI, the best single-reader performance had sensitivity and specificity of 52% and 77% and 67% and 74%, respectively. In consensus, the sensitivity and specificity for the RVI and TTPVI signatures were 59% and 67% and 70% and 62%, respectively. Conclusion: Interreader variability in semantic feature annotation remains a challenge and affects the reproducibility of predictive models for preoperative detection of MVI in HCC. Supplemental material is available for this article. © RSNA, 2020.

    View details for DOI 10.1148/rycan.2020190062

    View details for PubMedID 32550600

  • Imaging-AMARETTO: An Imaging Genomics Software Tool to Interrogate Multiomics Networks for Relevance to Radiography and Histopathology Imaging Biomarkers of Clinical Outcomes. JCO clinical cancer informatics Gevaert, O., Nabian, M., Bakr, S., Everaert, C., Shinde, J., Manukyan, A., Liefeld, T., Tabor, T., Xu, J., Lupberger, J., Haas, B. J., Baumert, T. F., Hernaez, M., Reich, M., Quintana, F. J., Uhlmann, E. J., Krichevsky, A. M., Mesirov, J. P., Carey, V., Pochet, N. 2020; 4: 421–35


    The availability of increasing volumes of multiomics, imaging, and clinical data in complex diseases such as cancer opens opportunities for the formulation and development of computational imaging genomics methods that can link multiomics, imaging, and clinical data. Here, we present the Imaging-AMARETTO algorithms and software tools to systematically interrogate regulatory networks derived from multiomics data within and across related patient studies for their relevance to radiography and histopathology imaging features predicting clinical outcomes. To demonstrate its utility, we applied Imaging-AMARETTO to integrate three patient studies of brain tumors, specifically, multiomics with radiography imaging data from The Cancer Genome Atlas (TCGA) glioblastoma multiforme (GBM) and low-grade glioma (LGG) cohorts and transcriptomics with histopathology imaging data from the Ivy Glioblastoma Atlas Project (IvyGAP) GBM cohort. Our results show that Imaging-AMARETTO recapitulates known key drivers of tumor-associated microglia and macrophage mechanisms, mediated by STAT3, AHR, and CCR2, and neurodevelopmental and stemness mechanisms, mediated by OLIG2. Imaging-AMARETTO provides interpretation of their underlying molecular mechanisms in light of imaging biomarkers of clinical outcomes and uncovers novel master drivers, THBS1 and MAP2, that establish relationships across these distinct mechanisms. Our network-based imaging genomics tools serve as hypothesis generators that facilitate the interrogation of known and uncovering of novel hypotheses for follow-up with experimental validation studies. We anticipate that our Imaging-AMARETTO imaging genomics tools will be useful to the community of biomedical researchers for applications to similar studies of cancer and other complex diseases with available multiomics, imaging, and clinical data.

    View details for DOI 10.1200/CCI.19.00125

    View details for PubMedID 32383980

  • [18F] FDG Positron Emission Tomography (PET) Tumor and Penumbra Imaging Features Predict Recurrence in Non-Small Cell Lung Cancer. Tomography (Ann Arbor, Mich.) Mattonen, S. A., Davidzon, G. A., Bakr, S., Echegaray, S., Leung, A. N., Vasanawala, M., Horng, G., Napel, S., Nair, V. S. 2019; 5 (1): 145–53


    We identified computational imaging features on 18F-fluorodeoxyglucose positron emission tomography (PET) that predict recurrence/progression in non-small cell lung cancer (NSCLC). We retrospectively identified 291 patients with NSCLC from 2 prospectively acquired cohorts (training, n = 145; validation, n = 146). We contoured the metabolic tumor volume (MTV) on all pretreatment PET images and added a 3-dimensional penumbra region that extended outward 1 cm from the tumor surface. We generated 512 radiomics features, selected 435 features based on robustness to contour variations, and then applied randomized sparse regression (LASSO) to identify features that predicted time to recurrence in the training cohort. We built Cox proportional hazards models in the training cohort and independently evaluated the models in the validation cohort. Two features including stage and a MTV plus penumbra texture feature were selected by LASSO. Both features were significant univariate predictors, with stage being the best predictor (hazard ratio [HR] = 2.15 [95% confidence interval (CI): 1.56-2.95], P < .001). However, adding the MTV plus penumbra texture feature to stage significantly improved prediction (P = .006). This multivariate model was a significant predictor of time to recurrence in the training cohort (concordance = 0.74 [95% CI: 0.66-0.81], P < .001) that was validated in a separate validation cohort (concordance = 0.74 [95% CI: 0.67-0.81], P < .001). A combined radiomics and clinical model improved NSCLC recurrence prediction. FDG PET radiomic features may be useful biomarkers for lung cancer prognosis and add clinical utility for risk stratification.

    View details for PubMedID 30854452

  • A radiogenomic dataset of non-small cell lung cancer. Scientific data Bakr, S., Gevaert, O., Echegaray, S., Ayers, K., Zhou, M., Shafiq, M., Zheng, H., Benson, J. A., Zhang, W., Leung, A. N., Kadoch, M., Hoang, C. D., Shrager, J., Quon, A., Rubin, D. L., Plevritis, S. K., Napel, S. 2018; 5: 180202


    Medical image biomarkers of cancer promise improvements in patient care through advances in precision medicine. Compared to genomic biomarkers, image biomarkers provide the advantages of being non-invasive, and characterizing a heterogeneous tumor in its entirety, as opposed to limited tissue available via biopsy. We developed a unique radiogenomic dataset from a Non-Small Cell Lung Cancer (NSCLC) cohort of 211 subjects. The dataset comprises Computed Tomography (CT), Positron Emission Tomography (PET)/CT images, semantic annotations of the tumors as observed on the medical images using a controlled vocabulary, and segmentation maps of tumors in the CT scans. Imaging data are also paired with results of gene mutation analyses, gene expression microarrays and RNA sequencing data from samples of surgically excised tumor tissue, and clinical data, including survival outcomes. This dataset was created to facilitate the discovery of the underlying relationship between tumor molecular and medical image features, as well as the development and evaluation of prognostic medical image biomarkers.

    View details for PubMedID 30325352

  • GFPT2-expressing cancer-associated fibroblasts mediate metabolic reprogramming in human lung adenocarcinoma. Cancer research Zhang, W., Bouchard, G., Yu, A., Shafiq, M., Jamali, M., Shrager, J. B., Ayers, K., Bakr, S., Gentles, A. J., Diehn, M., Quon, A., West, R. B., Nair, V., van de Rijn, M., Napel, S., Plevritis, S. K. 2018; 78 (13): 3445–57


    Metabolic reprogramming of the tumor microenvironment is recognized as a cancer hallmark. To identify new molecular processes associated with tumor metabolism, we analyzed the transcriptome of bulk and flow-sorted human primary non-small cell lung cancer (NSCLC) together with 18FDG-positron emission tomography scans, which provide a clinical measure of glucose uptake. Tumors with higher glucose uptake were functionally enriched for molecular processes associated with invasion in adenocarcinoma (AD) and cell growth in squamous cell carcinoma (SCC). Next, we identified genes correlated to glucose uptake that were predominately overexpressed in a single cell-type comprising the tumor microenvironment. For SCC, most of these genes were expressed by malignant cells, whereas in AD they were predominately expressed by stromal cells, particularly cancer-associated fibroblasts (CAFs). Among these AD genes correlated to glucose uptake, we focused on Glutamine-Fructose-6-Phosphate Transaminase 2 (GFPT2), which codes for the Glutamine-Fructose-6-Phosphate Aminotransferase 2 (GFAT2), a rate-limiting enzyme of the hexosamine biosynthesis pathway (HBP), which is responsible for glycosylation. GFPT2 was predictive of glucose uptake independent of GLUT1, the primary glucose transporter, and was prognostically significant at both gene and protein level. We confirmed that normal fibroblasts transformed to CAF-like cells, following TGF-beta treatment, upregulated HBP genes, including GFPT2, with less change in genes driving glycolysis, pentose phosphate pathway and TCA cycle. Our work provides new evidence of histology-specific tumor-stromal properties associated with glucose uptake in NSCLC and identifies GFPT2 as a critical regulator of tumor metabolic reprogramming in AD.

    View details for PubMedID 29760045

  • Quantitative Image Feature Engine (QIFE): an Open-Source, Modular Engine for 3D Quantitative Feature Extraction from Volumetric Medical Images. Journal of digital imaging Echegaray, S., Bakr, S., Rubin, D. L., Napel, S. 2018; 31 (4): 403–14


    The aim of this study was to develop an open-source, modular, locally run or server-based system for 3D radiomics feature computation that can be used on any computer system and included in existing workflows for understanding associations and building predictive models between image features and clinical data, such as survival. The QIFE exploits various levels of parallelization for use on multiprocessor systems. It consists of a managing framework and four stages: input, pre-processing, feature computation, and output. Each stage contains one or more swappable components, allowing run-time customization. We benchmarked the engine using various levels of parallelization on a cohort of CT scans presenting 108 lung tumors. Two versions of the QIFE have been released: (1) the open-source MATLAB code posted to GitHub, and (2) a compiled version loaded in a Docker container, posted to DockerHub, which can be easily deployed on any computer. The QIFE processed 108 objects (tumors) in 2:12 (h/mm) using one core, and 1:04 (h/mm) using four cores with object-level parallelization. We developed the Quantitative Image Feature Engine (QIFE), an open-source feature-extraction framework that focuses on modularity, standards, parallelism, provenance, and integration. Researchers can easily integrate it with their existing segmentation and imaging workflows by creating input and output components that implement their existing interfaces. Computational efficiency can be improved by parallelizing execution at the cost of memory usage. Different parallelization levels provide different trade-offs, and the optimal setting will depend on the size and composition of the dataset to be processed.

    View details for PubMedID 28993897

  • Noninvasive radiomics signature based on quantitative analysis of computed tomography images as a surrogate for microvascular invasion in hepatocellular carcinoma: a pilot study. Journal of medical imaging (Bellingham, Wash.) Bakr, S., Echegaray, S., Shah, R., Kamaya, A., Louie, J., Napel, S., Kothary, N., Gevaert, O. 2017; 4 (4): 041303


    We explore noninvasive biomarkers of microvascular invasion (mVI) in patients with hepatocellular carcinoma (HCC) using quantitative and semantic image features extracted from contrast-enhanced, triphasic computed tomography (CT). Under institutional review board approval, we selected 28 treatment-naive HCC patients who underwent surgical resection. Four radiologists independently selected and delineated tumor margins on three axial CT images and extracted computational features capturing tumor shape, image intensities, and texture. We also computed two types of "delta features," defined as the absolute difference and the ratio computed from all pairs of imaging phases for each feature. 717 arterial, portal-venous, delayed single-phase, and delta-phase features were robust against interreader variability ([Formula: see text]). An enhanced cross-validation analysis showed that combining robust single-phase and delta features in the arterial and venous phases identified mVI (AUC [Formula: see text]). Compared to a previously reported semantic feature signature (AUC 0.47 to 0.58), these features in our cohort showed only slight to moderate agreement (Cohen's kappa range: 0.03 to 0.59). Though preliminary, quantitative analysis of image features in arterial and venous phases may be potential surrogate biomarkers for mVI in HCC. Further study in a larger cohort is warranted.

    View details for PubMedID 28840174