Bio
Amir Bahmani is a Genetics Instructor and Director of Stanford's Deep Data Research Center (https://deepdata.stanford.edu) at the Stanford School of Medicine. He has worked on distributed and parallel computing applications since 2008. Amir is currently an active researcher in the VA Million Veteran Program (MVP), Human Tumor Atlas Network (HTAN), Human BioMolecular Atlas Program (HuBMAP), Stanford Metabolic Health Center (MHC), Integrated Personal Omics Profiling (iPOP), and Bridge to Artificial Intelligence (Bridge2AI).
His team has designed and developed several notable cloud-scale frameworks, including the Personal Health Dashboard (PHD), cloud-based cost-saving platforms such as Hummingbird and Swarm, and the MyPHD platform, which now has over 12,000 participants and hosts more than 37 studies. His team also created Stanford Data Ocean (SDO), an innovative platform for educating engineers and biologists. SDO is the first serverless multi-omics and wearables data platform used for education and training.
Since 2017, he has trained more than 30 graduate interns (engineers and designers) from outside the School of Medicine, engaging them in the field of medicine. His course has been offered to physicians, biologists, engineers, and designers, earning him recognition as the recipient of Stanford’s 2024 Walter J. Gores Award for Excellence in Teaching. In 2023, he received the Terman Mentorship Award for mentoring Terman Fellow Ryan Park (top 1%), who transitioned to a Genetics PhD program inspired by Amir’s course. Committed to accessibility in education, Amir created a first-of-its-kind scholarship for under-resourced communities at Stanford, providing his course free of charge—along with Genetics certificates—to over 4,500 students from under-resourced backgrounds across 104 countries and all 50 U.S. states.
Academic Appointments
- Instructor, Genetics
- Member, Stanford Cancer Institute
- Member, Wu Tsai Neurosciences Institute
Honors & Awards
- Anthem Awards for SDO: Community Voice, Silver in Product, Bronze in AI & Emerging Company, The Webby Awards (2024)
- Don Norman Design Award for Stanford Data Ocean (SDO), Don Norman Design Award (DNDA) (2024)
- Walter J. Gores Award for Excellence in Teaching, Stanford University (2024)
- Alumni 'Rising Star' Award, North Carolina State University (2022)
- Graduate Student Leadership Award, North Carolina State University (2016)
Boards, Advisory Committees, Professional Organizations
- Director, Deep Data Research Computing Center (2021 - Present)
Professional Education
- Ph.D., North Carolina State University, Computer Science (2017)
- M.S., North Carolina State University, Computer Science (2014)
2023-24 Courses
- Cloud Computing for Biology and Healthcare
BIOMEDIN 222, CS 273C, GENE 222 (Spr)
Prior Year Courses
2022-23 Courses
- Cloud Computing for Biology and Healthcare
BIOMEDIN 222, CS 273C, GENE 222 (Spr)
All Publications
-
Achieving inclusive healthcare through integrating education and research with AI and personalized curricula.
Communications medicine
2025; 5 (1): 356
Abstract
Precision medicine promises significant health benefits but faces challenges such as complex data management and analytics, interdisciplinary collaboration, and education of researchers, healthcare professionals, and participants. Addressing these needs requires the integration of computational experts, engineers, designers, and healthcare professionals to develop user-friendly systems and shared terminologies. The widespread adoption of large language models (LLMs) such as Generative Pretrained Transformer (GPT) and Claude highlights the importance of making complex data accessible to non-specialists. We evaluated the Stanford Data Ocean (SDO) precision medicine training program's learning outcomes, AI Tutor performance, and learner satisfaction by assessing self-rated competency on key learning objectives through pre- and post-learning surveys, along with formative and summative assessment completion rates. We also analyzed AI Tutor accuracy and learners' self-reported satisfaction, and post-program academic and career impacts. Additionally, we demonstrated the capabilities of the AI Data Visualization tool. SDO demonstrates the ability to improve learning outcomes for learners from broad educational and socioeconomic backgrounds with the support of the AI Tutor. The AI Data Visualization tool enables learners to interpret multi-omics and wearable data and replicate research findings. SDO strives to mitigate challenges in precision medicine through a scalable, cloud-based platform that supports data management for various data types, advanced research, and personalized learning. SDO provides AI Tutors and AI-powered data visualization tools to enhance educational and research outcomes and make data analysis accessible to users from broad educational backgrounds.
By extending engagement and cutting-edge research capabilities globally, SDO particularly benefits economically disadvantaged and historically marginalized communities, fostering interdisciplinary biomedical research and bridging the gap between education and practical application in the biomedical field.
View details for DOI 10.1038/s43856-025-01034-y
View details for PubMedID 40819118
View details for PubMedCentralID 9108683
-
AI-READI: rethinking AI data collection, preparation and sharing in diabetes research and beyond
NATURE METABOLISM
2024
View details for DOI 10.1038/s42255-024-01165-x
View details for Web of Science ID 001350162600001
View details for PubMedID 39516364
View details for PubMedCentralID 4792175
-
Real-time alerting system for COVID-19 and other stress events using wearable data.
Nature medicine
2021
Abstract
Early detection of infectious diseases is crucial for reducing transmission and facilitating early intervention. In this study, we built a real-time smartwatch-based alerting system that detects aberrant physiological and activity signals (heart rates and steps) associated with the onset of early infection and implemented this system in a prospective study. In a cohort of 3,318 participants, of whom 84 were infected with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), this system generated alerts for pre-symptomatic and asymptomatic SARS-CoV-2 infection in 67 (80%) of the infected individuals. Pre-symptomatic signals were observed at a median of 3 days before symptom onset. Examination of detailed survey responses provided by the participants revealed that other respiratory infections as well as events not associated with infection, such as stress, alcohol consumption and travel, could also trigger alerts, albeit at a much lower mean frequency (1.15 alert days per person compared to 3.42 alert days per person for coronavirus disease 2019 cases). Thus, analysis of smartwatch signals by an online detection algorithm provides advance warning of SARS-CoV-2 infection in a high percentage of cases. This study shows that a real-time alerting system can be used for early detection of infection and other stressors and employed on an open-source platform that is scalable to millions of users.
View details for DOI 10.1038/s41591-021-01593-2
View details for PubMedID 34845389
-
A scalable, secure, and interoperable platform for deep data-driven health management.
Nature communications
2021; 12 (1): 5757
Abstract
The large amount of biomedical data derived from wearable sensors, electronic health records, and molecular profiling (e.g., genomics data) is rapidly transforming our healthcare systems. The increasing scale and scope of biomedical data not only is generating enormous opportunities for improving health outcomes but also raises new challenges ranging from data acquisition and storage to data analysis and utilization. To meet these challenges, we developed the Personal Health Dashboard (PHD), which utilizes state-of-the-art security and scalability technologies to provide an end-to-end solution for big biomedical data analytics. The PHD platform is an open-source software framework that can be easily configured and deployed to any big data health project to store, organize, and process complex biomedical data sets, support real-time data analysis at both the individual level and the cohort level, and ensure participant privacy at every step. In addition to presenting the system, we illustrate the use of the PHD framework for large-scale applications in emerging multi-omics disease studies, such as collecting and visualization of diverse data types (wearable, clinical, omics) at a personal level, investigation of insulin resistance, and an infrastructure for the detection of presymptomatic COVID-19.
View details for DOI 10.1038/s41467-021-26040-1
View details for PubMedID 34599181
-
Hummingbird: Efficient Performance Prediction for Executing Genomic Applications in the Cloud.
Bioinformatics (Oxford, England)
2021
Abstract
MOTIVATION: A major drawback of executing genomic applications on cloud computing facilities is the lack of tools to predict which instance type is the most appropriate, often resulting in an over- or under-matching of resources. Determining the right configuration before actually running the applications will save money and time. Here, we introduce Hummingbird, a tool for predicting performance of computing instances with varying memory and CPU on multiple cloud platforms. RESULTS: Our experiments on three major genomic data pipelines, including GATK HaplotypeCaller, GATK MuTect2, and ENCODE ATAC-seq, showed that Hummingbird was able to address applications in command line specified in JSON format or workflow description language (WDL) format, and accurately predicted the fastest, the cheapest, and the most cost-efficient compute instances in an economic manner. AVAILABILITY: Hummingbird is available as an open source tool at: https://github.com/StanfordBioinformatics/Hummingbird. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
View details for DOI 10.1093/bioinformatics/btab161
View details for PubMedID 33693476
-
Swarm: A federated cloud framework for large-scale variant analysis.
PLoS computational biology
2021; 17 (5): e1008977
Abstract
Genomic data analysis across multiple cloud platforms is an ongoing challenge, especially when large amounts of data are involved. Here, we present Swarm, a framework for federated computation that promotes minimal data motion and facilitates crosstalk between genomic datasets stored on various cloud platforms. We demonstrate its utility via common inquiries of genomic variants across BigQuery in the Google Cloud Platform (GCP), Athena in the Amazon Web Services (AWS), Apache Presto and MySQL. Compared to single-cloud platforms, the Swarm framework significantly reduced computational costs, run-time delays and risks of security breach and privacy violation.
View details for DOI 10.1371/journal.pcbi.1008977
View details for PubMedID 33979321
-
Advances and prospects for the Human BioMolecular Atlas Program (HuBMAP).
Nature cell biology
2023
Abstract
The Human BioMolecular Atlas Program (HuBMAP) aims to create a multi-scale spatial atlas of the healthy human body at single-cell resolution by applying advanced technologies and disseminating resources to the community. As the HuBMAP moves past its first phase, creating ontologies, protocols and pipelines, this Perspective introduces the production phase: the generation of reference spatial maps of functional tissue units across many organs from diverse populations and the creation of mapping tools and infrastructure to advance biomedical research.
View details for DOI 10.1038/s41556-023-01194-w
View details for PubMedID 37468756
View details for PubMedCentralID 8238499
-
A method for intelligent allocation of diagnostic testing by leveraging data from commercial wearable devices: a case study on COVID-19.
NPJ digital medicine
2022; 5 (1): 130
Abstract
Mass surveillance testing can help control outbreaks of infectious diseases such as COVID-19. However, diagnostic test shortages are prevalent globally and continue to occur in the US with the onset of new COVID-19 variants and emerging diseases like monkeypox, demonstrating an unprecedented need for improving our current methods for mass surveillance testing. By targeting surveillance testing toward individuals who are most likely to be infected and, thus, increasing the testing positivity rate (i.e., percent positive in the surveillance group), fewer tests are needed to capture the same number of positive cases. Here, we developed an Intelligent Testing Allocation (ITA) method by leveraging data from the CovIdentify study (6765 participants) and the MyPHD study (8580 participants), including smartwatch data from 1265 individuals of whom 126 tested positive for COVID-19. Our rigorous model and parameter search uncovered the optimal time periods and aggregate metrics for monitoring continuous digital biomarkers to increase the positivity rate of COVID-19 diagnostic testing. We found that resting heart rate (RHR) features distinguished between COVID-19-positive and -negative cases earlier in the course of the infection than steps features, as early as 10 and 5 days prior to the diagnostic test, respectively. We also found that including steps features increased the area under the receiver operating characteristic curve (AUC-ROC) by 7-11% when compared with RHR features alone, while including RHR features improved the AUC of the ITA model's precision-recall curve (AUC-PR) by 38-50% when compared with steps features alone. The best AUC-ROC (0.73±0.14 and 0.77 on the cross-validated training set and independent test set, respectively) and AUC-PR (0.55±0.21 and 0.24) were achieved by using data from a single device type (Fitbit) with high-resolution (minute-level) data. 
Finally, we show that ITA generates up to a 6.5-fold increase in the positivity rate in the cross-validated training set and up to a 4.5-fold increase in the positivity rate in the independent test set, including both symptomatic and asymptomatic (up to 27%) individuals. Our findings suggest that, if deployed on a large scale and without needing self-reported symptoms, the ITA method could improve the allocation of diagnostic testing resources and reduce the burden of test shortages.
View details for DOI 10.1038/s41746-022-00672-z
View details for PubMedID 36050372
-
A Method for Intelligent Allocation of Diagnostic Testing by Leveraging Data from Commercial Wearable Devices: A Case Study on COVID-19.
Research square
2022
Abstract
Mass surveillance testing can help control outbreaks of infectious diseases such as COVID-19. However, diagnostic test shortages are prevalent globally and continue to occur in the US with the onset of new COVID-19 variants, demonstrating an unprecedented need for improving our current methods for mass surveillance testing. By targeting surveillance testing towards individuals who are most likely to be infected and, thus, increasing testing positivity rate (i.e., percent positive in the surveillance group), fewer tests are needed to capture the same number of positive cases. Here, we developed an Intelligent Testing Allocation (ITA) method by leveraging data from the CovIdentify study (6,765 participants) and the MyPHD study (8,580 participants), including smartwatch data from 1,265 individuals of whom 126 tested positive for COVID-19. Our rigorous model and parameter search uncovered the optimal time periods and aggregate metrics for monitoring continuous digital biomarkers to increase the positivity rate of COVID-19 diagnostic testing. We found that resting heart rate features distinguished between COVID-19 positive and negative cases earlier in the course of the infection than steps features, as early as ten and five days prior to the diagnostic test, respectively. We also found that including steps features increased the area under the receiver operating characteristic curve (AUC-ROC) by 7-11% when compared with RHR features alone, while including RHR features improved the AUC of the ITA model's precision-recall curve (AUC-PR) by 38-50% when compared with steps features alone. The best AUC-ROC (0.73 ± 0.14 and 0.77 on the cross-validated training set and independent test set, respectively) and AUC-PR (0.55 ± 0.21 and 0.24) were achieved by using data from a single device type (Fitbit) with high-resolution (minute-level) data. 
Finally, we show that ITA generates up to a 6.5-fold increase in the positivity rate in the cross-validated training set and up to a 3-fold increase in the positivity rate in the independent test set, including both symptomatic and asymptomatic (up to 27%) individuals. Our findings suggest that, if deployed on a large scale and without needing self-reported symptoms, the ITA method could improve allocation of diagnostic testing resources and reduce the burden of test shortages.
View details for DOI 10.21203/rs.3.rs-1490524/v1
View details for PubMedID 35378754
View details for PubMedCentralID PMC8978951
-
Five-year pediatric use of a digital wearable fitness device: lessons from a pilot case study.
JAMIA open
2021; 4 (3): ooab054
Abstract
Objectives: Wearable fitness devices are increasingly being used by the general population, with many new applications being proposed for healthy adults as well as for adults with chronic diseases. Fewer, if any, studies of these devices have been conducted in healthy adolescents and teenagers, especially over a long period of time. The goal of this work was to document the successes and challenges involved in 5 years of a wearable fitness device use in a pediatric case study. Materials and methods: Comparison of 5 years of step counts and minutes asleep from a teenaged girl and her father. Results: At 60 months, this may be the longest reported pediatric study involving a wearable fitness device, and the first simultaneously involving a parent and a child. We find step counts to be significantly higher for both the adult and teen on school/work days, along with less sleep. The teen walked significantly less towards the end of the 5-year study. Surprisingly, many of the adult's and teen's sleeping and step counts were correlated, possibly due to coordinated behaviors. Discussion: We end with several recommendations for pediatricians and device manufacturers, including the need for constant adjustments of stride length and calorie counts as teens are growing. Conclusion: With periodic adjustments for growth, this pilot study shows these devices can be used for more accurate and consistent measurements in adolescents and teenagers over longer periods of time, to potentially promote healthy behaviors.
View details for DOI 10.1093/jamiaopen/ooab054
View details for PubMedID 34350390
-
Wearable sensors enable personalized predictions of clinical laboratory measurements.
Nature medicine
2021
Abstract
Vital signs, including heart rate and body temperature, are useful in detecting or monitoring medical conditions, but are typically measured in the clinic and require follow-up laboratory testing for more definitive diagnoses. Here we examined whether vital signs as measured by consumer wearable devices (that is, continuously monitored heart rate, body temperature, electrodermal activity and movement) can predict clinical laboratory test results using machine learning models, including random forest and Lasso models. Our results demonstrate that vital sign data collected from wearables give a more consistent and precise depiction of resting heart rate than do measurements taken in the clinic. Vital sign data collected from wearables can also predict several clinical laboratory measurements with lower prediction error than predictions made using clinically obtained vital sign measurements. The length of time over which vital signs are monitored and the proximity of the monitoring period to the date of prediction play a critical role in the performance of the machine learning models. These results demonstrate the value of commercial wearable devices for continuous and longitudinal assessment of physiological measurements that today can be measured only with clinical laboratory tests.
View details for DOI 10.1038/s41591-021-01339-0
View details for PubMedID 34031607
-
Early Detection of SARS-CoV-2 and other Infections in Solid Organ Transplant Recipients and Household Members using Wearable Devices.
Transplant international : official journal of the European Society for Organ Transplantation
2021
Abstract
The increasing global prevalence of SARS-CoV-2 and the resulting COVID-19 disease pandemic pose significant concerns for clinical management of solid organ transplant recipients (SOTR). Wearable devices that can measure physiologic changes in biometrics including heart rate, heart rate variability, body temperature, respiratory, activity (such as steps taken per day) and sleep patterns and blood oxygen saturation, show utility for the early detection of infection before clinical presentation of symptoms. Recent algorithms developed using preliminary wearable datasets show that SARS-CoV-2 is detectable before clinical symptoms in >80% of adults. Early detection of SARS-CoV-2, influenza, and other pathogens in SOTR, and their household members, could facilitate early interventions such as self-isolation and early clinical management of relevant infection(s). Ongoing studies testing the utility of wearable devices such as smartwatches for early detection of SARS-CoV-2 and other infections in the general population are reviewed here, along with the practical challenges to implementing these processes at scale in pediatric and adult SOTR, and their household members. The resources and logistics, including transplant specific analyses pipelines to account for confounders such as polypharmacy and comorbidities, required in studies of pediatric and adult SOTR for the robust early detection of SARS-CoV-2 and other infections are also reviewed.
View details for DOI 10.1111/tri.13860
View details for PubMedID 33735480
-
Pre-symptomatic detection of COVID-19 from smartwatch data.
Nature biomedical engineering
2020
Abstract
Consumer wearable devices that continuously measure vital signs have been used to monitor the onset of infectious disease. Here, we show that data from consumer smartwatches can be used for the pre-symptomatic detection of coronavirus disease 2019 (COVID-19). We analysed physiological and activity data from 32 individuals infected with COVID-19, identified from a cohort of nearly 5,300 participants, and found that 26 of them (81%) had alterations in their heart rate, number of daily steps or time asleep. Of the 25 cases of COVID-19 with detected physiological alterations for which we had symptom information, 22 were detected before (or at) symptom onset, with four cases detected at least nine days earlier. Using retrospective smartwatch data, we show that 63% of the COVID-19 cases could have been detected before symptom onset in real time via a two-tiered warning system based on the occurrence of extreme elevations in resting heart rate relative to the individual baseline. Our findings suggest that activity tracking and health monitoring via consumer wearable devices may be used for the large-scale, real-time detection of respiratory infections, often pre-symptomatically.
View details for DOI 10.1038/s41551-020-00640-6
View details for PubMedID 33208926
-
The Human Tumor Atlas Network: Charting Tumor Transitions across Space and Time at Single-Cell Resolution.
Cell
2020; 181 (2): 236–49
Abstract
Crucial transitions in cancer, including tumor initiation, local expansion, metastasis, and therapeutic resistance, involve complex interactions between cells within the dynamic tumor ecosystem. Transformative single-cell genomics technologies and spatial multiplex in situ methods now provide an opportunity to interrogate this complexity at unprecedented resolution. The Human Tumor Atlas Network (HTAN), part of the National Cancer Institute (NCI) Cancer Moonshot Initiative, will establish a clinical, experimental, computational, and organizational framework to generate informative and accessible three-dimensional atlases of cancer transitions for a diverse set of tumor types. This effort complements both ongoing efforts to map healthy organs and previous large-scale cancer genomics approaches focused on bulk sequencing at a single point in time. Generating single-cell, multiparametric, longitudinal atlases and integrating them with clinical outcomes should help identify novel predictive biomarkers and features as well as therapeutically relevant cell types, cell states, and cellular interactions across transitions. The resulting tumor atlases should have a profound impact on our understanding of cancer biology and have the potential to improve cancer detection, prevention, and therapeutic discovery for better precision-medicine treatments of cancer patients and those at risk for cancer.
View details for DOI 10.1016/j.cell.2020.03.053
View details for PubMedID 32302568
-
Molecular Choreography of Acute Exercise.
Cell
2020; 181 (5): 1112–30.e16
Abstract
Acute physical activity leads to several changes in metabolic, cardiovascular, and immune pathways. Although studies have examined selected changes in these pathways, the system-wide molecular response to an acute bout of exercise has not been fully characterized. We performed longitudinal multi-omic profiling of plasma and peripheral blood mononuclear cells including metabolome, lipidome, immunome, proteome, and transcriptome from 36 well-characterized volunteers, before and after a controlled bout of symptom-limited exercise. Time-series analysis revealed thousands of molecular changes and an orchestrated choreography of biological processes involving energy metabolism, oxidative stress, inflammation, tissue repair, and growth factor response, as well as regulatory pathways. Most of these processes were dampened and some were reversed in insulin-resistant participants. Finally, we discovered biological pathways involved in cardiopulmonary exercise response and developed prediction models revealing potential resting blood-based biomarkers of peak oxygen consumption.
View details for DOI 10.1016/j.cell.2020.04.043
View details for PubMedID 32470399
-
VCFC: Structural and Semantic Compression and Indexing of Genetic Variant Data
edited by Park, T., Cho, Y. R., Hu, Yoo, Woo, H. G., Wang, J., Facelli, J., Nam, S., Kang, M.
IEEE COMPUTER SOC. 2020: 200-203
View details for DOI 10.1109/BIBM49941.2020.9313221
View details for Web of Science ID 000659487100035
-
The human body at cellular resolution: the NIH Human Biomolecular Atlas Program
NATURE
2019; 574 (7777): 187–92
Abstract
Transformative technologies are enabling the construction of three-dimensional maps of tissues with unprecedented spatial and molecular resolution. Over the next seven years, the NIH Common Fund Human Biomolecular Atlas Program (HuBMAP) intends to develop a widely accessible framework for comprehensively mapping the human body at single-cell resolution by supporting technology development, data acquisition, and detailed spatial mapping. HuBMAP will integrate its efforts with other funding agencies, programs, consortia, and the biomedical research community at large towards the shared vision of a comprehensive, accessible three-dimensional molecular and cellular atlas of the human body, in health and under various disease conditions.
View details for DOI 10.1038/s41586-019-1629-x
View details for Web of Science ID 000489784200035
View details for PubMedID 31597973
View details for PubMedCentralID PMC6800388
-
Longitudinal multi-omics of host-microbe dynamics in prediabetes.
Nature
2019; 569 (7758): 663–71
Abstract
Type 2 diabetes mellitus (T2D) is a growing health problem, but little is known about its early disease stages, its effects on biological processes or the transition to clinical T2D. To understand the earliest stages of T2D better, we obtained samples from 106 healthy individuals and individuals with prediabetes over approximately four years and performed deep profiling of transcriptomes, metabolomes, cytokines, and proteomes, as well as changes in the microbiome. This rich longitudinal data set revealed many insights: first, healthy profiles are distinct among individuals while displaying diverse patterns of intra- and/or inter-personal variability. Second, extensive host and microbial changes occur during respiratory viral infections and immunization, and immunization triggers potentially protective responses that are distinct from responses to respiratory viral infections. Moreover, during respiratory viral infections, insulin-resistant participants respond differently than insulin-sensitive participants. Third, global co-association analyses among the thousands of profiled molecules reveal specific host-microbe interactions that differ between insulin-resistant and insulin-sensitive individuals. Last, we identified early personal molecular signatures in one individual that preceded the onset of T2D, including the inflammation markers interleukin-1 receptor agonist (IL-1RA) and high-sensitivity C-reactive protein (CRP) paired with xenobiotic-induced immune signalling. Our study reveals insights into pathways and responses that differ between glucose-dysregulated and healthy individuals during health and disease and provides an open-access data resource to enable further research into healthy, prediabetic and T2D states.
View details for DOI 10.1038/s41586-019-1236-x
View details for PubMedID 31142858
https://orcid.org/0000-0003-4533-9334