Honors & Awards


  • Wu Tsai Neuroscience Institute Postdoctoral Scholar Awards, Stanford University (2024-2026)

All Publications


  • Generating pregnant patient biological profiles by deconvoluting clinical records with electronic health record foundation models. Briefings in bioinformatics Seong, D., Mataraso, S., Espinosa, C., Berson, E., Reincke, S. M., Xue, L., Kashiwagi, C., Kim, Y., Shu, C. H., Chung, P., Ghanem, M., Xie, F., Wong, R. J., Angst, M. S., Gaudilliere, B., Shaw, G. M., Stevenson, D. K., Aghaeepour, N. 2024; 25 (6)

    Abstract

    Translational biology posits a strong bi-directional link between clinical phenotypes and a patient's biological profile. By leveraging this bi-directional link, we can efficiently deconvolute pre-existing clinical information into biological profiles. However, traditional computational tools are limited in their ability to resolve this link because of the relatively small sizes of paired clinical-biological datasets for training and the high dimensionality/sparsity of tabular clinical data. Here, we use state-of-the-art foundation models (FMs) for electronic health record (EHR) data to generate proteomics profiles of pregnant patients, thereby deconvoluting pre-existing clinical information into biological profiles without the cost and effort of running large-scale traditional omics studies. We show that FM-derived representations of a patient's EHR data coupled with a fully connected neural network prediction head can generate 206 blood protein expression levels. Interestingly, these proteins were enriched for developmental pathways, while proteins not able to be generated from EHR data were enriched for metabolic pathways. Finally, we show a proteomic signature of gestational diabetes that includes proteins with established and novel links to gestational diabetes. These results showcase the power of FM-derived EHR representations in efficiently generating biological states of pregnant patients. This capability can revolutionize disease understanding and therapeutic development, offering a cost-effective, time-efficient, and less invasive alternative to traditional methods of generating proteomics.

    View details for DOI 10.1093/bib/bbae574

    View details for PubMedID 39545787

    View details for PubMedCentralID PMC11565587

  • Unlocking human immune system complexity through AI. Nature methods Berson, E., Chung, P., Espinosa, C., Montine, T. J., Aghaeepour, N. 2024; 21 (8): 1400-1402

    View details for DOI 10.1038/s41592-024-02351-1

    View details for PubMedID 39122943

    View details for PubMedCentralID 9586871

  • Single-cell peripheral immunoprofiling of lewy body and Parkinson's disease in a multi-site cohort. Molecular neurodegeneration Phongpreecha, T., Mathi, K., Cholerton, B., Fox, E. J., Sigal, N., Espinosa, C., Reincke, M., Chung, P., Hwang, L. J., Gajera, C. R., Berson, E., Perna, A., Xie, F., Shu, C. H., Hazra, D., Channappa, D., Dunn, J. E., Kipp, L. B., Poston, K. L., Montine, K. S., Maecker, H. T., Aghaeepour, N., Montine, T. J. 2024; 19 (1): 59

    Abstract

    Multiple lines of evidence support peripheral organs in the initiation or progression of Lewy body disease (LBD), a spectrum of neurodegenerative diagnoses that include Parkinson's Disease (PD) without or with dementia (PDD) and dementia with Lewy bodies (DLB). However, the potential contribution of the peripheral immune response to LBD remains unclear. This study aims to characterize peripheral immune responses unique to participants with LBD at single-cell resolution to highlight potential biomarkers and increase mechanistic understanding of LBD pathogenesis in humans. In a case-control study, peripheral blood mononuclear cell (PBMC) samples from research participants were randomly sampled from multiple sites across the United States. The diagnosis groups comprise healthy controls (HC, n = 159), LBD (n = 110), Alzheimer's disease dementia (ADD, n = 97), other neurodegenerative disease controls (NDC, n = 19), and immune disease controls (IDC, n = 14). PBMCs were activated with three stimulants (LPS, IL-6, and IFNa) or remained at basal state, stained by 13 surface markers and 7 intracellular signal markers, and analyzed by flow cytometry, which generated 1,184 immune features after gating. The model classified LBD from HC with an AUROC of 0.87 ± 0.06 and AUPRC of 0.80 ± 0.06. Without retraining, the same model was able to distinguish LBD from ADD, NDC, and IDC. Model predictions were driven by pPLCγ2, p38, and pSTAT5 signals from specific cell populations under specific activation. The immune responses characteristic for LBD were not associated with other common medical conditions related to the risk of LBD or dementia, such as sleep disorders, hypertension, or diabetes. Quantification of PBMC immune response from multisite research participants yielded a unique pattern for LBD compared to HC, multiple related neurodegenerative diseases, and autoimmune diseases, thereby highlighting potential biomarkers and mechanisms of disease.

    View details for DOI 10.1186/s13024-024-00748-2

    View details for PubMedID 39090623

    View details for PubMedCentralID 9739123

  • Comprehensive overview of the anesthesiology research landscape: A machine Learning Analysis of 737 NIH-funded anesthesiology primary Investigator's publication trends. Heliyon Ghanem, M., Espinosa, C., Chung, P., Reincke, M., Harrison, N., Phongpreecha, T., Shome, S., Saarunya, G., Berson, E., James, T., Xie, F., Shu, C. H., Hazra, D., Mataraso, S., Kim, Y., Seong, D., Chakraborty, D., Studer, M., Xue, L., Marić, I., Chang, A. L., Tjoa, E., Gaudillière, B., Tawfik, V. L., Mackey, S., Aghaeepour, N. 2024; 10 (7): e29050

    Abstract

    Anesthesiology plays a crucial role in perioperative care, critical care, and pain management, impacting patient experiences and clinical outcomes. However, our understanding of the anesthesiology research landscape is limited. Accordingly, we initiated a data-driven analysis through topic modeling to uncover research trends, enabling informed decision-making and fostering progress within the field. The easyPubMed R package was used to collect 32,300 PubMed abstracts spanning from 2000 to 2022. These abstracts were authored by 737 Anesthesiology Principal Investigators (PIs) who were recipients of National Institutes of Health (NIH) funding from 2010 to 2022. Abstracts were preprocessed, vectorized, and analyzed with the state-of-the-art BERTopic algorithm to identify pillar topics and trending subtopics within anesthesiology research. Temporal trends were assessed using the Mann-Kendall test. The publishing journals with the most abstracts in this dataset were Anesthesia & Analgesia (1133), Anesthesiology (992), and Pain (671). Eight pillar topics were identified and categorized as basic or clinical sciences based on a hierarchical clustering analysis. Amongst the pillar topics, "Cells & Proteomics" had both the highest annual and total number of abstracts. Interestingly, there was an overall upward trend for all topics spanning the years 2000-2022. However, when focusing on the period from 2015 to 2022, topics "Cells & Proteomics" and "Pulmonology" exhibit a downward trajectory. Additionally, various subtopics were identified, with notable increasing trends in "Aneurysms", "Covid 19 Pandemic", and "Artificial intelligence & Machine Learning". Our work offers a comprehensive analysis of the anesthesiology research landscape by providing insights into pillar topics and trending subtopics. These findings contribute to a better understanding of anesthesiology research and can guide future directions.

    View details for DOI 10.1016/j.heliyon.2024.e29050

    View details for PubMedID 38623206

    View details for PubMedCentralID PMC11016610

  • Intra- and post-pandemic impact of the COVID-19 outbreak on Stanford Health Care. Academic pathology Phongpreecha, T., Berson, E., Xue, L., Shome, S., Saarunya, G., Fralick, J., Ruiz-Tagle, B. G., Foody, A., Chin, A. L., Lim, M., Arthofer, R., Albini, C., Montine, K., Folkins, A. K., Kong, C. S., Aghaeepour, N., Montine, T., Kerr, A. 2024; 11 (2): 100113

    Abstract

    Stanford Health Care, which provides about 7% of overall healthcare to approximately 9 million people in the San Francisco Bay Area, has undergone significant changes due to the opening of a second hospital in late 2019 and, more importantly, the COVID-19 pandemic. We examine the impact of these events on anatomic pathology (AP) cases, aiming to enhance operational efficiency in response to evolving healthcare demands. We extracted historical census, admission, lab tests, operation, and AP data since 2015. An approximately 45% increase in the volume of laboratory tests (P<0.0001) and a 17% increase in AP cases (P<0.0001) occurred post-pandemic. These increases were associated with progressively increasing (P<0.0001) hospital census. Census increase stemmed from higher admission through the emergency department (ED), and longer lengths of stay mostly for transfer patients, likely due to the greater capability of the new ED and changes in regional and local practice patterns post-pandemic. Higher census led to overcapacity, which has an inverted U relationship that peaked at 103% capacity for AP cases and 114% capacity for laboratory tests. Overcapacity led to a lower capability to perform clinical activities, particularly those related to surgical procedures. We conclude by suggesting parameters for optimal operations in the post-pandemic era.

    View details for DOI 10.1016/j.acpath.2024.100113

    View details for PubMedID 38562568

  • RETRACTED: 60 Predicting chorioamnionitis using AI-based methods: a retrospective cohort study. American journal of obstetrics and gynecology Waldrop, A. R., James, T. K., Suharwardy, S., Studer, M., Chang, A., Bernal, C. E., Xie, F., Shome, S., Hazra, D., Kim, Y., Clarke, G., Chakraborty, D., Mataraso, S., Berson, E., Xue, L., Payrovnaziri, S., Mohammadi, N., Haberkorn, W., Maric, I., El-Sayed, Y. Y., Carvalho, B., Aghaeepour, N. 2024; 230 (1S): S46

    Abstract

    This article has been retracted: please see Elsevier Policy on Article Withdrawal (https://www.elsevier.com/about/policies/article-withdrawal). This meeting abstract has been retracted at the request of the authors. The team determined further analysis is warranted before the formal presentation of the results.

    View details for DOI 10.1016/j.ajog.2023.11.081

    View details for PubMedID 38355237

  • Understanding the molecular basis of resilience to Alzheimer's disease. Frontiers in neuroscience Montine, K. S., Berson, E., Phongpreecha, T., Huang, Z., Aghaeepour, N., Zou, J. Y., MacCoss, M. J., Montine, T. J. 2023; 17: 1311157

    Abstract

    The cellular and molecular distinction between brain aging and neurodegenerative disease begins to blur in the oldest old. Approximately 15-25% of observations in humans do not fit predicted clinical manifestations, likely the result of suppressed damage despite usually adequate stressors and of resilience, the suppression of neurological dysfunction despite usually adequate degeneration. Factors during life may predict the clinico-pathologic state of resilience: cardiovascular health and mental health, more so than educational attainment, are predictive of a continuous measure of resilience to Alzheimer's disease (AD) and AD-related dementias (ADRDs). In resilience to AD alone (RAD), core features include synaptic and axonal processes, especially in the hippocampus. Future focus on larger and more diverse cohorts and additional regions offers emerging opportunities to understand this counterforce to neurodegeneration. The focus of this review is the molecular basis of resilience to AD.

    View details for DOI 10.3389/fnins.2023.1311157

    View details for PubMedID 38192507

    View details for PubMedCentralID PMC10773681

  • Quantitative estimate of cognitive resilience and its medical and genetic associations. Alzheimer's research & therapy Phongpreecha, T., Godrich, D., Berson, E., Espinosa, C., Kim, Y., Cholerton, B., Chang, A. L., Mataraso, S., Bukhari, S. A., Perna, A., Yakabi, K., Montine, K. S., Poston, K. L., Mormino, E., White, L., Beecham, G., Aghaeepour, N., Montine, T. J. 2023; 15 (1): 192

    Abstract

    We have proposed that cognitive resilience (CR) counteracts brain damage from Alzheimer's disease (AD) or AD-related dementias such that older individuals who harbor neurodegenerative disease burden sufficient to cause dementia remain cognitively normal. However, CR traditionally is considered a binary trait, capturing only the most extreme examples, and is often inconsistently defined. This study addressed existing discrepancies and shortcomings of the current CR definition by proposing a framework for defining CR as a continuous variable for each neuropsychological test. The linear equations clarified CR's relationship to closely related terms, including cognitive function, reserve, compensation, and damage. Primarily, resilience is defined as a function of cognitive performance and damage from neuropathologic damage. As such, the study utilized data from 844 individuals (age = 79 ± 12, 44% female) in the National Alzheimer's Coordinating Center cohort that met our inclusion criteria of comprehensive lesion rankings for 17 neuropathologic features and complete neuropsychological test results. Machine learning models and GWAS then were used to identify medical and genetic factors that are associated with CR. CR varied across five cognitive assessments and was greater in female participants, associated with longer survival, and weakly associated with educational attainment or APOE ε4 allele. In contrast, damage was strongly associated with APOE ε4 allele (P value < 0.0001). Major predictors of CR were cardiovascular health and social interactions, as well as the absence of behavioral symptoms. Our framework explicitly decoupled the effects of CR from neuropathologic damage. Characterizations and genetic association study of these two components suggest that the underlying CR mechanism has minimal overlap with the disease mechanism. Moreover, the identified medical features associated with CR suggest modifiable features to counteract clinical expression of damage and maintain cognitive function in older individuals.

    View details for DOI 10.1186/s13195-023-01329-z

    View details for PubMedID 37926851

    View details for PubMedCentralID 6410486

  • Deep representation learning identifies associations between physical activity and sleep patterns during pregnancy and prematurity. NPJ digital medicine Ravindra, N. G., Espinosa, C., Berson, E., Phongpreecha, T., Zhao, P., Becker, M., Chang, A. L., Shome, S., Marić, I., De Francesco, D., Mataraso, S., Saarunya, G., Thuraiappah, M., Xue, L., Gaudillière, B., Angst, M. S., Shaw, G. M., Herzog, E. D., Stevenson, D. K., England, S. K., Aghaeepour, N. 2023; 6 (1): 171

    Abstract

    Preterm birth (PTB) is the leading cause of infant mortality globally. Research has focused on developing predictive models for PTB without prioritizing cost-effective interventions. Physical activity and sleep present unique opportunities for interventions in low- and middle-income populations (LMICs). However, objective measurement of physical activity and sleep remains challenging and self-reported metrics suffer from low resolution and accuracy. In this study, we use physical activity data collected using a wearable device comprising over 181,944 h of data across N = 1083 patients. Using a new state-of-the-art deep learning time-series classification architecture, we develop a 'clock' of healthy dynamics during pregnancy by using gestational age (GA) as a surrogate for progression of pregnancy. We also develop novel interpretability algorithms that integrate unsupervised clustering, model error analysis, feature attribution, and automated actigraphy analysis, allowing for model interpretation with respect to sleep, activity, and clinical variables. Our model performs significantly better than 7 other machine learning and AI methods for modeling the progression of pregnancy. We found that deviations from a normal 'clock' of physical activity and sleep changes during pregnancy are strongly associated with pregnancy outcomes. When our model underestimates GA, there are 0.52 fewer preterm births than expected (P = 1.01e-67, permutation test) and when our model overestimates GA, there are 1.44 times (P = 2.82e-39, permutation test) more preterm births than expected. Model error is negatively correlated with interdaily stability (P = 0.043, Spearman's), indicating that our model assigns a more advanced GA when an individual's daily rhythms are less precise. Supporting this, our model attributes higher importance to sleep periods in predicting higher-than-actual GA, relative to lower-than-actual GA (P = 1.01e-21, Mann-Whitney U). Combining prediction and interpretability allows us to signal when activity behaviors alter the likelihood of preterm birth and advocates for the development of clinical decision support through passive monitoring and exercise habit and sleep recommendations, which can be easily implemented in LMICs.

    View details for DOI 10.1038/s41746-023-00911-x

    View details for PubMedID 37770643

    View details for PubMedCentralID 3796350

  • Cross-species comparative analysis of single presynapses. Scientific reports Berson, E., Gajera, C. R., Phongpreecha, T., Perna, A., Bukhari, S. A., Becker, M., Chang, A. L., De Francesco, D., Espinosa, C., Ravindra, N. G., Postupna, N., Latimer, C. S., Shively, C. A., Register, T. C., Craft, S., Montine, K. S., Fox, E. J., Keene, C. D., Bendall, S. C., Aghaeepour, N., Montine, T. J. 2023; 13 (1): 13849

    Abstract

    Comparing brain structure across species and regions enables key functional insights. Leveraging publicly available data from a novel mass cytometry-based method, synaptometry by time of flight (SynTOF), we applied an unsupervised machine learning approach to conduct a comparative study of presynapse molecular abundance across three species and three brain regions. We used neural networks and their attractive properties to model complex relationships among high dimensional data to develop a unified, unsupervised framework for comparing the profile of more than 4.5 million single presynapses among normal human, macaque, and mouse samples. An extensive validation showed the feasibility of performing cross-species comparison using SynTOF profiling. Integrative analysis of the abundance of 20 presynaptic proteins revealed near-complete separation between primates and mice involving synaptic pruning, cellular energy, lipid metabolism, and neurotransmission. In addition, our analysis revealed a strong overlap between the presynaptic composition of human and macaque in the cerebral cortex and neostriatum. Our unique approach illuminates species- and region-specific variation in presynapse molecular composition.

    View details for DOI 10.1038/s41598-023-40683-8

    View details for PubMedID 37620363

    View details for PubMedCentralID 3365257

  • Whole genome deconvolution unveils Alzheimer's resilient epigenetic signature. Nature communications Berson, E., Sreenivas, A., Phongpreecha, T., Perna, A., Grandi, F. C., Xue, L., Ravindra, N. G., Payrovnaziri, N., Mataraso, S., Kim, Y., Espinosa, C., Chang, A. L., Becker, M., Montine, K. S., Fox, E. J., Chang, H. Y., Corces, M. R., Aghaeepour, N., Montine, T. J. 2023; 14 (1): 4947

    Abstract

    Assay for Transposase Accessible Chromatin by sequencing (ATAC-seq) accurately depicts the chromatin regulatory state and altered mechanisms guiding gene expression in disease. However, bulk sequencing entangles information from different cell types and obscures cellular heterogeneity. To address this, we developed Cellformer, a deep learning method that deconvolutes bulk ATAC-seq into cell type-specific expression across the whole genome. Cellformer enables cost-effective cell type-specific open chromatin profiling in large cohorts. Applied to 191 bulk samples from 3 brain regions, Cellformer identifies cell type-specific gene regulatory mechanisms involved in resilience to Alzheimer's disease, an uncommon group of cognitively healthy individuals that harbor a high pathological load of Alzheimer's disease. Cell type-resolved chromatin profiling unveils cell type-specific pathways and nominates potential epigenetic mediators underlying resilience that may illuminate therapeutic opportunities to limit the cognitive impact of the disease. Cellformer is freely available to facilitate future investigations using high-throughput bulk ATAC-seq data.

    View details for DOI 10.1038/s41467-023-40611-4

    View details for PubMedID 37587197

    View details for PubMedCentralID 6071637

  • Multiomic signals associated with maternal epidemiological factors contributing to preterm birth in low- and middle-income countries. Science advances Espinosa, C. A., Khan, W., Khanam, R., Das, S., Khalid, J., Pervin, J., Kasaro, M. P., Contrepois, K., Chang, A. L., Phongpreecha, T., Michael, B., Ellenberger, M., Mehmood, U., Hotwani, A., Nizar, A., Kabir, F., Wong, R. J., Becker, M., Berson, E., Culos, A., De Francesco, D., Mataraso, S., Ravindra, N., Thuraiappah, M., Xenochristou, M., Stelzer, I. A., Marić, I., Dutta, A., Raqib, R., Ahmed, S., Rahman, S., Hasan, A. S., Ali, S. M., Juma, M. H., Rahman, M., Aktar, S., Deb, S., Price, J. T., Wise, P. H., Winn, V. D., Druzin, M. L., Gibbs, R. S., Darmstadt, G. L., Murray, J. C., Stringer, J. S., Gaudilliere, B., Snyder, M. P., Angst, M. S., Rahman, A., Baqui, A. H., Jehan, F., Nisar, M. I., Vwalika, B., Sazawal, S., Shaw, G. M., Stevenson, D. K., Aghaeepour, N. 2023; 9 (21): eade7692

    Abstract

    Preterm birth (PTB) is the leading cause of death in children under five, yet comprehensive studies are hindered by its multiple complex etiologies. Epidemiological associations between PTB and maternal characteristics have been previously described. This work used multiomic profiling and multivariate modeling to investigate the biological signatures of these characteristics. Maternal covariates were collected during pregnancy from 13,841 pregnant women across five sites. Plasma samples from 231 participants were analyzed to generate proteomic, metabolomic, and lipidomic datasets. Machine learning models showed robust performance for the prediction of PTB (AUROC = 0.70), time-to-delivery (r = 0.65), maternal age (r = 0.59), gravidity (r = 0.56), and BMI (r = 0.81). Time-to-delivery biological correlates included fetal-associated proteins (e.g., ALPP, AFP, and PGF) and immune proteins (e.g., PD-L1, CCL28, and LIFR). Maternal age negatively correlated with collagen COL9A1, gravidity with endothelial NOS and inflammatory chemokine CXCL13, and BMI with leptin and structural protein FABP4. These results provide an integrated view of epidemiological factors associated with PTB and identify biological signatures of clinical covariates affecting this disease.

    View details for DOI 10.1126/sciadv.ade7692

    View details for PubMedID 37224249

  • Large-scale correlation network construction for unraveling the coordination of complex biological systems. Nature computational science Becker, M., Nassar, H., Espinosa, C., Stelzer, I. A., Feyaerts, D., Berson, E., Bidoki, N. H., Chang, A. L., Saarunya, G., Culos, A., De Francesco, D., Fallahzadeh, R., Liu, Q., Kim, Y., Marić, I., Mataraso, S. J., Payrovnaziri, S. N., Phongpreecha, T., Ravindra, N. G., Stanley, N., Shome, S., Tan, Y., Thuraiappah, M., Xenochristou, M., Xue, L., Shaw, G., Stevenson, D., Angst, M. S., Gaudilliere, B., Aghaeepour, N. 2023; 3 (4): 346-359

    Abstract

    Advanced measurement and data storage technologies have enabled high-dimensional profiling of complex biological systems. For this, modern multiomics studies regularly produce datasets with hundreds of thousands of measurements per sample, enabling a new era of precision medicine. Correlation analysis is an important first step to gain deeper insights into the coordination and underlying processes of such complex systems. However, the construction of large correlation networks in modern high-dimensional datasets remains a major computational challenge owing to rapidly growing runtime and memory requirements. Here we address this challenge by introducing CorALS (Correlation Analysis of Large-scale (biological) Systems), an open-source framework for the construction and analysis of large-scale parametric as well as non-parametric correlation networks for high-dimensional biological data. It features off-the-shelf algorithms suitable for both personal and high-performance computers, enabling workflows and downstream analysis approaches. We illustrate the broad scope and potential of CorALS by exploring perspectives on complex biological processes in large-scale multiomics and single-cell studies.

    View details for DOI 10.1038/s43588-023-00429-y

    View details for PubMedID 38116462

    View details for PubMedCentralID PMC10727505

  • Data-driven longitudinal characterization of neonatal health and morbidity. Science translational medicine De Francesco, D., Reiss, J. D., Roger, J., Tang, A. S., Chang, A. L., Becker, M., Phongpreecha, T., Espinosa, C., Morin, S., Berson, E., Thuraiappah, M., Le, B. L., Ravindra, N. G., Payrovnaziri, S. N., Mataraso, S., Kim, Y., Xue, L., Rosenstein, M. G., Oskotsky, T., Marić, I., Gaudilliere, B., Carvalho, B., Bateman, B. T., Angst, M. S., Prince, L. S., Blumenfeld, Y. J., Benitz, W. E., Fuerch, J. H., Shaw, G. M., Sylvester, K. G., Stevenson, D. K., Sirota, M., Aghaeepour, N. 2023; 15 (683): eadc9854

    Abstract

    Although prematurity is the single largest cause of death in children under 5 years of age, the current definition of prematurity, based on gestational age, lacks the precision needed for guiding care decisions. Here, we propose a longitudinal risk assessment for adverse neonatal outcomes in newborns based on a deep learning model that uses electronic health records (EHRs) to predict a wide range of outcomes over a period starting shortly before conception and ending months after birth. By linking the EHRs of the Lucile Packard Children's Hospital and the Stanford Healthcare Adult Hospital, we developed a cohort of 22,104 mother-newborn dyads delivered between 2014 and 2018. Maternal and newborn EHRs were extracted and used to train a multi-input multitask deep learning model, featuring a long short-term memory neural network, to predict 24 different neonatal outcomes. An additional cohort of 10,250 mother-newborn dyads delivered at the same Stanford Hospitals from 2019 to September 2020 was used to validate the model. Areas under the receiver operating characteristic curve at delivery exceeded 0.9 for 10 of the 24 neonatal outcomes considered and were between 0.8 and 0.9 for 7 additional outcomes. Moreover, comprehensive association analysis identified multiple known associations between various maternal and neonatal features and specific neonatal outcomes. This study used linked EHRs from more than 30,000 mother-newborn dyads and would serve as a resource for the investigation and prediction of neonatal outcomes. An interactive website is available for independent investigators to leverage this unique dataset: https://maternal-child-health-associations.shinyapps.io/shiny_app/.

    View details for DOI 10.1126/scitranslmed.adc9854

    View details for PubMedID 36791208

  • Prediction of neuropathologic lesions from clinical data. Alzheimer's & dementia : the journal of the Alzheimer's Association Phongpreecha, T., Cholerton, B., Bhukari, S., Chang, A. L., De Francesco, D., Thuraiappah, M., Godrich, D., Perna, A., Becker, M. G., Ravindra, N. G., Espinosa, C., Kim, Y., Berson, E., Mataraso, S., Sha, S. J., Fox, E. J., Montine, K. S., Baker, L. D., Craft, S., White, L., Poston, K. L., Beecham, G., Aghaeepour, N., Montine, T. J. 2023

    Abstract

    Post-mortem analysis provides definitive diagnoses of neurodegenerative diseases; however, only a few can be diagnosed during life. This study employed statistical tools and machine learning to predict 17 neuropathologic lesions from a cohort of 6518 individuals using 381 clinical features (Table S1). The multisite data allowed validation of the model's robustness by splitting train/test sets by clinical sites. A similar study was performed for predicting Alzheimer's disease (AD) neuropathologic change without specific comorbidities. Prediction results show high performance for certain lesions that match or exceed that of research annotation. Neurodegenerative comorbidities in addition to AD neuropathologic change resulted in compounded, but disproportionate, effects across cognitive domains as the comorbidity number increased. Certain clinical features could be strongly associated with multiple neurodegenerative diseases, others were lesion-specific, and some were divergent between lesions. Our approach could benefit clinical research, and genetic and biomarker research by enriching cohorts for desired lesions.

    View details for DOI 10.1002/alz.12921

    View details for PubMedID 36681388

  • In-Silico Generation of High-Dimensional Immune Response Data in Patients using a Deep Neural Network. Cytometry. Part A : the journal of the International Society for Analytical Cytology Fallahzadeh, R., Bidoki, N. H., Stelzer, I. A., Becker, M., Marić, I., Chang, A. L., Culos, A., Phongpreecha, T., Xenochristou, M., De Francesco, D., Espinosa, C., Berson, E., Verdonk, F., Angst, M. S., Gaudilliere, B., Aghaeepour, N. 2022

    Abstract

    Technologies for single-cell profiling of the immune system have enabled researchers to extract rich interconnected networks of cellular abundance, phenotypical and functional cellular parameters. These studies can power machine learning approaches to understand the role of the immune system in various diseases. However, the performance of these approaches and the generalizability of the findings have been hindered by limited cohort sizes in translational studies, partially due to logistical demands and costs associated with longitudinal data collection in sufficiently large patient cohorts. An evolving challenge is the requirement for ever-increasing cohort sizes as the dimensionality of datasets grows. We propose a deep learning model derived from a novel pipeline of optimal temporal cell matching and overcomplete autoencoders that uses data from a small subset of patients to learn to forecast an entire patient's immune response in a high dimensional space from one timepoint to another. In our analysis of 1.08 million cells from patients pre- and post-surgical intervention, we demonstrate that the generated patient-specific data are qualitatively and quantitatively similar to real patient data by demonstrating fidelity, diversity, and usefulness.

    View details for DOI 10.1002/cyto.a.24709

    View details for PubMedID 36507780

  • Revealing the impact of lifestyle stressors on the risk of adverse pregnancy outcomes with multitask machine learning. Frontiers in pediatrics Becker, M., Dai, J., Chang, A. L., Feyaerts, D., Stelzer, I. A., Zhang, M., Berson, E., Saarunya, G., De Francesco, D., Espinosa, C., Kim, Y., Maric, I., Mataraso, S., Payrovnaziri, S. N., Phongpreecha, T., Ravindra, N. G., Shome, S., Tan, Y., Thuraiappah, M., Xue, L., Mayo, J. A., Quaintance, C. C., Laborde, A., King, L. S., Dhabhar, F. S., Gotlib, I. H., Wong, R. J., Angst, M. S., Shaw, G. M., Stevenson, D. K., Gaudilliere, B., Aghaeepour, N. 2022; 10: 933266

    Abstract

    Psychosocial and stress-related factors (PSFs), defined as internal or external stimuli that induce biological changes, are potentially modifiable factors and accessible targets for interventions that are associated with adverse pregnancy outcomes (APOs). Although individual APOs have been shown to be connected to PSFs, they are biologically interconnected, relatively infrequent, and therefore challenging to model. In this context, multi-task machine learning (MML) is an ideal tool for exploring the interconnectedness of APOs on the one hand and building on joint combinatorial outcomes to increase predictive power on the other hand. Additionally, by integrating single cell immunological profiling of underlying biological processes, the effects of stress-based therapeutics may be measurable, facilitating the development of precision medicine approaches. Objectives: The primary objectives were to jointly model multiple APOs and their connection to stress early in pregnancy, and to explore the underlying biology to guide development of accessible and measurable interventions. Materials and Methods: In a prospective cohort study, PSFs were assessed during the first trimester with an extensive self-filled questionnaire for 200 women. We used MML to simultaneously model and predict APOs (severe preeclampsia, superimposed preeclampsia, gestational diabetes and early gestational age) as well as several risk factors (BMI, diabetes, hypertension) for these patients based on PSFs. Strongly interrelated stressors were categorized to identify potential therapeutic targets. Furthermore, for a subset of 14 women, we modeled the connection of PSFs to the maternal immune system to APOs by building corresponding ML models based on an extensive single cell immune dataset generated by mass cytometry time of flight (CyTOF). Results: Jointly modeling APOs in a MML setting significantly increased modeling capabilities and yielded a highly predictive integrated model of APOs underscoring their interconnectedness. Most APOs were associated with mental health, life stress, and perceived health risks. Biologically, stressors were associated with specific immune characteristics revolving around CD4/CD8 T cells. Immune characteristics predicted based on stress were in turn found to be associated with APOs. Conclusions: Elucidating connections among stress, multiple APOs simultaneously, and immune characteristics has the potential to facilitate the implementation of ML-based, individualized, integrative models of pregnancy in clinical decision making. The modifiable nature of stressors may enable the development of accessible interventions, with success tracked through immune characteristics.

    View details for DOI 10.3389/fped.2022.933266

    View details for PubMedID 36582513