Samson Mataraso
Ph.D. Student in Biomedical Data Science, admitted Autumn 2020
All Publications
-
Whole genome deconvolution unveils Alzheimer's resilient epigenetic signature.
Nature communications
2023; 14 (1): 4947
Abstract
Assay for Transposase Accessible Chromatin by sequencing (ATAC-seq) accurately depicts the chromatin regulatory state and altered mechanisms guiding gene expression in disease. However, bulk sequencing entangles information from different cell types and obscures cellular heterogeneity. To address this, we developed Cellformer, a deep learning method that deconvolutes bulk ATAC-seq into cell type-specific expression across the whole genome. Cellformer enables cost-effective cell type-specific open chromatin profiling in large cohorts. Applied to 191 bulk samples from 3 brain regions, Cellformer identifies cell type-specific gene regulatory mechanisms involved in resilience to Alzheimer's disease, an uncommon group of cognitively healthy individuals that harbor a high pathological load of Alzheimer's disease. Cell type-resolved chromatin profiling unveils cell type-specific pathways and nominates potential epigenetic mediators underlying resilience that may illuminate therapeutic opportunities to limit the cognitive impact of the disease. Cellformer is freely available to facilitate future investigations using high-throughput bulk ATAC-seq data.
View details for DOI 10.1038/s41467-023-40611-4
View details for PubMedID 37587197
View details for PubMedCentralID 6071637
-
Multiomic signals associated with maternal epidemiological factors contributing to preterm birth in low- and middle-income countries.
Science advances
2023; 9 (21): eade7692
Abstract
Preterm birth (PTB) is the leading cause of death in children under five, yet comprehensive studies are hindered by its multiple complex etiologies. Epidemiological associations between PTB and maternal characteristics have been previously described. This work used multiomic profiling and multivariate modeling to investigate the biological signatures of these characteristics. Maternal covariates were collected during pregnancy from 13,841 pregnant women across five sites. Plasma samples from 231 participants were analyzed to generate proteomic, metabolomic, and lipidomic datasets. Machine learning models showed robust performance for the prediction of PTB (AUROC = 0.70), time-to-delivery (r = 0.65), maternal age (r = 0.59), gravidity (r = 0.56), and BMI (r = 0.81). Time-to-delivery biological correlates included fetal-associated proteins (e.g., ALPP, AFP, and PGF) and immune proteins (e.g., PD-L1, CCL28, and LIFR). Maternal age negatively correlated with collagen COL9A1, gravidity with endothelial NOS and inflammatory chemokine CXCL13, and BMI with leptin and structural protein FABP4. These results provide an integrated view of epidemiological factors associated with PTB and identify biological signatures of clinical covariates affecting this disease.
View details for DOI 10.1126/sciadv.ade7692
View details for PubMedID 37224249
-
Large-scale correlation network construction for unraveling the coordination of complex biological systems.
Nature computational science
2023
View details for DOI 10.1038/s43588-023-00429-y
View details for Web of Science ID 000968297800002
-
Data-driven longitudinal characterization of neonatal health and morbidity.
Science translational medicine
2023; 15 (683): eadc9854
Abstract
Although prematurity is the single largest cause of death in children under 5 years of age, the current definition of prematurity, based on gestational age, lacks the precision needed for guiding care decisions. Here, we propose a longitudinal risk assessment for adverse neonatal outcomes in newborns based on a deep learning model that uses electronic health records (EHRs) to predict a wide range of outcomes over a period starting shortly before conception and ending months after birth. By linking the EHRs of the Lucile Packard Children's Hospital and the Stanford Healthcare Adult Hospital, we developed a cohort of 22,104 mother-newborn dyads delivered between 2014 and 2018. Maternal and newborn EHRs were extracted and used to train a multi-input multitask deep learning model, featuring a long short-term memory neural network, to predict 24 different neonatal outcomes. An additional cohort of 10,250 mother-newborn dyads delivered at the same Stanford Hospitals from 2019 to September 2020 was used to validate the model. Areas under the receiver operating characteristic curve at delivery exceeded 0.9 for 10 of the 24 neonatal outcomes considered and were between 0.8 and 0.9 for 7 additional outcomes. Moreover, comprehensive association analysis identified multiple known associations between various maternal and neonatal features and specific neonatal outcomes. This study used linked EHRs from more than 30,000 mother-newborn dyads and would serve as a resource for the investigation and prediction of neonatal outcomes. An interactive website is available for independent investigators to leverage this unique dataset: https://maternal-child-health-associations.shinyapps.io/shiny_app/.
View details for DOI 10.1126/scitranslmed.adc9854
View details for PubMedID 36791208
-
Prediction of neuropathologic lesions from clinical data.
Alzheimer's & dementia : the journal of the Alzheimer's Association
2023
Abstract
Post-mortem analysis provides definitive diagnoses of neurodegenerative diseases; however, only a few can be diagnosed during life. This study employed statistical tools and machine learning to predict 17 neuropathologic lesions from a cohort of 6518 individuals using 381 clinical features (Table S1). The multisite data allowed validation of the model's robustness by splitting train/test sets by clinical sites. A similar study was performed for predicting Alzheimer's disease (AD) neuropathologic change without specific comorbidities. Prediction results show high performance for certain lesions that match or exceed that of research annotation. Neurodegenerative comorbidities in addition to AD neuropathologic change resulted in compounded, but disproportionate, effects across cognitive domains as the comorbidity number increased. Certain clinical features could be strongly associated with multiple neurodegenerative diseases, others were lesion-specific, and some were divergent between lesions. Our approach could benefit clinical research, and genetic and biomarker research by enriching cohorts for desired lesions.
View details for DOI 10.1002/alz.12921
View details for PubMedID 36681388
-
Revealing the impact of lifestyle stressors on the risk of adverse pregnancy outcomes with multitask machine learning.
Frontiers in pediatrics
2022; 10: 933266
Abstract
Psychosocial and stress-related factors (PSFs), defined as internal or external stimuli that induce biological changes, are potentially modifiable factors and accessible targets for interventions that are associated with adverse pregnancy outcomes (APOs). Although individual APOs have been shown to be connected to PSFs, they are biologically interconnected, relatively infrequent, and therefore challenging to model. In this context, multi-task machine learning (MML) is an ideal tool for exploring the interconnectedness of APOs on the one hand and building on joint combinatorial outcomes to increase predictive power on the other hand. Additionally, by integrating single cell immunological profiling of underlying biological processes, the effects of stress-based therapeutics may be measurable, facilitating the development of precision medicine approaches. Objectives: The primary objectives were to jointly model multiple APOs and their connection to stress early in pregnancy, and to explore the underlying biology to guide development of accessible and measurable interventions. Materials and Methods: In a prospective cohort study, PSFs were assessed during the first trimester with an extensive self-filled questionnaire for 200 women. We used MML to simultaneously model and predict APOs (severe preeclampsia, superimposed preeclampsia, gestational diabetes and early gestational age) as well as several risk factors (BMI, diabetes, hypertension) for these patients based on PSFs. Strongly interrelated stressors were categorized to identify potential therapeutic targets. Furthermore, for a subset of 14 women, we modeled the connection of PSFs to the maternal immune system to APOs by building corresponding ML models based on an extensive single cell immune dataset generated by mass cytometry time of flight (CyTOF). Results: Jointly modeling APOs in a MML setting significantly increased modeling capabilities and yielded a highly predictive integrated model of APOs underscoring their interconnectedness. Most APOs were associated with mental health, life stress, and perceived health risks. Biologically, stressors were associated with specific immune characteristics revolving around CD4/CD8 T cells. Immune characteristics predicted based on stress were in turn found to be associated with APOs. Conclusions: Elucidating connections among stress, multiple APOs simultaneously, and immune characteristics has the potential to facilitate the implementation of ML-based, individualized, integrative models of pregnancy in clinical decision making. The modifiable nature of stressors may enable the development of accessible interventions, with success tracked through immune characteristics.
View details for DOI 10.3389/fped.2022.933266
View details for PubMedID 36582513
-
A longitudinal big data approach for precision health.
Nature medicine
2019; 25 (5): 792-+
View details for DOI 10.1038/s41591-019-0414-6
View details for Web of Science ID 000468247800023
-
Effect of a machine learning-based severe sepsis prediction algorithm on patient survival and hospital length of stay: a randomised clinical trial.
BMJ open respiratory research
2017; 4 (1): e000234
Abstract
Several methods have been developed to electronically monitor patients for severe sepsis, but few provide predictive capabilities to enable early intervention; furthermore, no severe sepsis prediction systems have been previously validated in a randomised study. We tested the use of a machine learning-based severe sepsis prediction system for reductions in average length of stay and in-hospital mortality rate. We conducted a randomised controlled clinical trial at two medical-surgical intensive care units at the University of California, San Francisco Medical Center, evaluating the primary outcome of average length of stay, and secondary outcome of in-hospital mortality rate from December 2016 to February 2017. Adult patients (18+) admitted to participating units were eligible for this factorial, open-label study. Enrolled patients were assigned to a trial arm by a random allocation sequence. In the control group, only the current severe sepsis detector was used; in the experimental group, the machine learning algorithm (MLA) was also used. On receiving an alert, the care team evaluated the patient and initiated the severe sepsis bundle, if appropriate. Although participants were randomly assigned to a trial arm, group assignments were automatically revealed for any patients who received MLA alerts. Outcomes from 75 patients in the control and 67 patients in the experimental group were analysed. Average length of stay decreased from 13.0 days in the control to 10.3 days in the experimental group (p=0.042). In-hospital mortality decreased by 12.4 percentage points when using the MLA (p=0.018), a relative reduction of 58.0%. No adverse events were reported during this trial. The MLA was associated with improved patient outcomes. This is the first randomised controlled trial of a sepsis surveillance system to demonstrate statistically significant differences in length of stay and in-hospital mortality. Trial registration: NCT03015454.
View details for DOI 10.1136/bmjresp-2017-000234
View details for PubMedID 29435343
View details for PubMedCentralID PMC5687546
-
Cost and mortality impact of an algorithm-driven sepsis prediction system.
Journal of medical economics
2017; 20 (6): 646-651
Abstract
The aim was to compute the financial and mortality impact of InSight, an algorithm-driven biomarker that forecasts the onset of sepsis with minimal use of electronic health record data. This study compares InSight with existing sepsis screening tools and computes the differential life and cost savings associated with its use in the inpatient setting. To do so, mortality reduction is obtained from an increase in the number of sepsis cases correctly identified by InSight. Early sepsis detection by InSight is also associated with a reduction in length-of-stay, from which cost savings are directly computed. InSight identifies more true positive cases of severe sepsis, with fewer false alarms, than comparable methods. For an individual ICU with 50 beds, for example, it is determined that InSight annually saves 75 additional lives and reduces sepsis-related costs by $560,000. InSight performance results are derived from analysis of a single-center cohort. Mortality reduction results rely on a simplified use case, which fixes prediction times at 0, 1, and 2 h before sepsis onset, likely leading to under-estimates of lives saved. The corresponding cost reduction numbers are based on national averages for daily patient length-of-stay cost. InSight has the potential to reduce sepsis-related deaths and to lead to substantial cost savings for healthcare facilities.
View details for DOI 10.1080/13696998.2017.1307203
View details for Web of Science ID 000401763300012
View details for PubMedID 28294646