Bio


Dr. Ng is a postdoctoral fellow at the Stanford Center for Biomedical Informatics Research, mentored by Dr. Tina Hernandez-Boussard. Her research aims to illuminate the evolving ethical and practical challenges with emerging technologies used for health purposes. Prior to joining Stanford, Dr. Ng facilitated mobile- and internet-based health research initiatives with the Health eHeart Study and the Eureka Digital Research Platform and developed research study prototypes that used blockchain technology for health data exchange. Her current work focuses on discerning key challenges that exist at each stage of the AI life cycle and generating informed guidance to drive the responsible and equitable use of AI for patient care.

Professional Education


  • DrPH, University of California, Berkeley, Public Health with DE in Science and Technology Studies
  • MPH, University of California, Los Angeles, Epidemiology
  • BS, University of California, Los Angeles, Neuroscience with minor in Asian American Studies

All Publications


  • Sequence modeling and design from molecular to genome scale with Evo. Science (New York, N.Y.) Nguyen, E., Poli, M., Durrant, M. G., Kang, B., Katrekar, D., Li, D. B., Bartie, L. J., Thomas, A. W., King, S. H., Brixi, G., Sullivan, J., Ng, M. Y., Lewis, A., Lou, A., Ermon, S., Baccus, S. A., Hernandez-Boussard, T., Re, C., Hsu, P. D., Hie, B. L. 2024; 386 (6723): eado9336

    Abstract

    The genome is a sequence that encodes the DNA, RNA, and proteins that orchestrate an organism's function. We present Evo, a long-context genomic foundation model with a frontier architecture trained on millions of prokaryotic and phage genomes, and report scaling laws on DNA to complement observations in language and vision. Evo generalizes across DNA, RNA, and proteins, enabling zero-shot function prediction competitive with domain-specific language models and the generation of functional CRISPR-Cas and transposon systems, representing the first examples of protein-RNA and protein-DNA codesign with a language model. Evo also learns how small mutations affect whole-organism fitness and generates megabase-scale sequences with plausible genomic architecture. These prediction and generation capabilities span molecular to genomic scales of complexity, advancing our understanding and control of biology.

    View details for DOI 10.1126/science.ado9336

    View details for PubMedID 39541441

  • Scaling equitable artificial intelligence in healthcare with machine learning operations. BMJ health & care informatics Ng, M. Y., Youssef, A., Pillai, M., Shah, V., Hernandez-Boussard, T. 2024; 31 (1)

    View details for DOI 10.1136/bmjhci-2024-101101

    View details for PubMedID 39496359

  • AI and biosecurity: The need for governance. Science (New York, N.Y.) Bloomfield, D., Pannu, J., Zhu, A. W., Ng, M. Y., Lewis, A., Bendavid, E., Asch, S. M., Hernandez-Boussard, T., Cicero, A., Inglesby, T. 2024; 385 (6711): 831-833

    Abstract

    Governments should evaluate advanced models and if needed impose safety measures.

    View details for DOI 10.1126/science.adq1977

    View details for PubMedID 39172825

  • Perceptions of Data Set Experts on Important Characteristics of Health Data Sets Ready for Machine Learning: A Qualitative Study. JAMA network open Ng, M. Y., Youssef, A., Miner, A. S., Sarellano, D., Long, J., Larson, D. B., Hernandez-Boussard, T., Langlotz, C. P. 2023; 6 (12): e2345892

    Abstract

    The lack of data quality frameworks to guide the development of artificial intelligence (AI)-ready data sets limits their usefulness for machine learning (ML) research in health care and hinders the diagnostic excellence of developed clinical AI applications for patient care. To discern what constitutes high-quality and useful data sets for health and biomedical ML research purposes according to subject matter experts. This qualitative study interviewed data set experts, particularly those who are creators and ML researchers. Semistructured interviews were conducted in English and remotely through a secure video conferencing platform between August 23, 2022, and January 5, 2023. A total of 93 experts were invited to participate. Twenty experts were enrolled and interviewed. Using purposive sampling, experts were affiliated with a diverse representation of 16 health data sets/databases across organizational sectors. Content analysis was used to evaluate survey information and thematic analysis was used to analyze interview data. Data set experts' perceptions on what makes data sets AI ready. Participants included 20 data set experts (11 [55%] men; mean [SD] age, 42 [11] years), of whom all were health data set creators, and 18 of the 20 were also ML researchers. Themes (3 main and 11 subthemes) were identified and integrated into an AI-readiness framework to show their association within the health data ecosystem. Participants partially determined the AI readiness of data sets using priority appraisal elements of accuracy, completeness, consistency, and fitness. Ethical acquisition and societal impact emerged as appraisal considerations in that participant samples have not been described to date in prior data quality frameworks. Factors that drive creation of high-quality health data sets and mitigate risks associated with data reuse in ML research were also relevant to AI readiness. The state of data availability, data quality standards, documentation, team science, and incentivization were associated with elements of AI readiness and the overall perception of data set usefulness. In this qualitative study of data set experts, participants contributed to the development of a grounded framework for AI data set quality. Data set AI readiness required the concerted appraisal of many elements and the balancing of transparency and ethical reflection against pragmatic constraints. The movement toward more reliable, relevant, and ethical AI and ML applications for patient care will inevitably require strategic updates to data set creation practices.

    View details for DOI 10.1001/jamanetworkopen.2023.45892

    View details for PubMedID 38039004

  • The AI life cycle: a holistic approach to creating ethical AI for health decisions. Nature medicine Ng, M. Y., Kapur, S., Blizinsky, K. D., Hernandez-Boussard, T. 2022

    View details for DOI 10.1038/s41591-022-01993-y

    View details for PubMedID 36163298

  • Development of Secure Infrastructure for Advancing Generative AI Research in Healthcare at an Academic Medical Center. Research square Ng, M. Y., Helzer, J., Pfeffer, M. A., Seto, T., Hernandez-Boussard, T. 2024

    Abstract

    The increasing interest in leveraging generative AI models in healthcare necessitates secure infrastructure at academic medical centers. Without an all-encompassing secure system, researchers may create their own insecure microprocesses, risking the exposure of protected health information (PHI) to the public internet or its inadvertent incorporation into AI model training. To address these challenges, our institution implemented a secure pathway to the Azure OpenAI Service using our own private OpenAI instance which we fully control to facilitate high-throughput, secure LLM queries. This pathway ensures data privacy while allowing researchers to harness the capabilities of LLMs for diverse healthcare applications. Our approach supports compliant, efficient, and innovative AI research in healthcare. This paper discusses the implementation, advantages, and use cases of this secure infrastructure, underscoring the critical need for centralized, secure AI solutions in academic medical environments.

    View details for DOI 10.21203/rs.3.rs-5095287/v1

    View details for PubMedID 39399679

  • Email-Based Recruitment Into the Health eHeart Study: Cohort Analysis of Invited Eligible Patients. Journal of medical Internet research Ng, M. Y., Olgin, J. E., Marcus, G. M., Lyles, C. R., Pletcher, M. J. 2023; 25: e51238

    Abstract

    Web- or app-based digital health studies allow for more efficient collection of health data for research. However, remote recruitment into digital health studies can enroll nonrepresentative study samples, hindering the robustness and generalizability of findings. Through the comprehensive evaluation of an email-based campaign on recruitment into the Health eHeart Study, we aim to uncover key sociodemographic and clinical factors that contribute to enrollment. This study sought to understand the factors related to participation, specifically regarding enrollment, in the Health eHeart Study as a result of a large-scale remote email recruitment campaign. We conducted a cohort analysis on all invited University of California, San Francisco (UCSF) patients to identify sociodemographic and clinical predictors of enrollment into the Health eHeart Study. The primary outcome was enrollment, defined by account registration and consent into the Health eHeart Study. The email recruitment campaign was carried out from August 2015 to February 2016, with electronic health record data extracted between September 2019 and December 2019. The email recruitment campaign delivered at least 1 email invitation to 93.5% (193,606/206,983) of all invited patients and yielded a 3.6% (7012/193,606) registration rate among contacted patients and an 84.1% (5899/7012) consent rate among registered patients. Adjusted multivariate logistic regression models analyzed independent sociodemographic and clinical predictors of (1) registration among contacted participants and (2) consent among registered participants. Odds of registration were higher among patients who are older, women, non-Hispanic White, active patients with commercial insurance or Medicare, with a higher comorbidity burden, with congestive heart failure, and randomized to receive up to 2 recruitment emails. The odds of registration were lower among those with medical conditions such as dementia, chronic pulmonary disease, moderate or severe liver disease, paraplegia or hemiplegia, renal disease, or cancer. Odds of subsequent consent after initial registration were different, with an inverse trend of being lower among patients who are older and women. The odds of consent were also lower among those with peripheral vascular disease. However, the odds of consent remained higher among patients who were non-Hispanic White and those with commercial insurance. This study provides important insights into the potential returns on participant enrollment when digital health study teams invest resources in using email for recruitment. The findings show that participant enrollment was driven more strongly by sociodemographic factors than clinical factors. Overall, email is an extremely efficient means of recruiting participants from a large list into the Health eHeart Study. Despite some improvements in representation, the formulation of truly diverse studies will require additional resources and strategies to overcome persistent participation barriers.

    View details for DOI 10.2196/51238

    View details for PubMedID 38133910

  • Organizational Factors in Clinical Data Sharing for Artificial Intelligence in Health Care. JAMA network open Youssef, A., Ng, M. Y., Long, J., Hernandez-Boussard, T., Shah, N., Miner, A., Larson, D., Langlotz, C. P. 2023; 6 (12): e2348422

    Abstract

    Limited sharing of data sets that accurately represent disease and patient diversity limits the generalizability of artificial intelligence (AI) algorithms in health care. To explore the factors associated with organizational motivation to share health data for AI development. This qualitative study investigated organizational readiness for sharing health data across the academic, governmental, nonprofit, and private sectors. Using a multiple case studies approach, 27 semistructured interviews were conducted with leaders in data-sharing roles from August 29, 2022, to January 9, 2023. The interviews were conducted in the English language using a video conferencing platform. Using a purposive and nonprobabilistic sampling strategy, 78 individuals across 52 unique organizations were identified. Of these, 35 participants were enrolled. Participant recruitment concluded after 27 interviews, as theoretical saturation was reached and no additional themes emerged. Concepts defining organizational readiness for data sharing and the association between data-sharing factors and organizational behavior were mapped through iterative qualitative analysis to establish a framework defining organizational readiness for sharing clinical data for AI development. Interviews included 27 leaders from 18 organizations (academia: 10, government: 7, nonprofit: 8, and private: 2). Organizational readiness for data sharing centered around 2 main constructs: motivation and capabilities. Motivation related to the alignment of an organization's values with data-sharing priorities and was associated with its engagement in data-sharing efforts. However, organizational motivation could be modulated by extrinsic incentives for financial or reputational gains. Organizational capabilities comprised infrastructure, people, expertise, and access to data. Cross-sector collaboration was a key strategy to mitigate barriers to access health data. This qualitative study identified sector-specific factors that may affect the data-sharing behaviors of health organizations. External incentives may bolster cross-sector collaborations by helping overcome barriers to accessing health data for AI development. The findings suggest that tailored incentives may boost organizational motivation and facilitate sustainable flow of health data for AI development.

    View details for DOI 10.1001/jamanetworkopen.2023.48422

    View details for PubMedID 38113040

  • Usability, inclusivity, and content evaluation of COVID-19 contact tracing apps in the United States. Journal of the American Medical Informatics Association Blacklow, S. O., Lisker, S., Ng, M. Y., Sarkar, U., Lyles, C. 2021; 28 (9): 1982-1989

    Abstract

    We evaluated the usability of mobile COVID-19 contact tracing apps, especially for individuals with barriers to communication and limited digital literacy skills. We searched the Apple App Store, Google Play, peer-reviewed literature, and lay press to find contact tracing apps in the United States. We evaluated apps with a framework focused on user characteristics and user interface. Of the final 26 apps, 77% were on both iPhone and Android. 69% exceeded 9th grade readability, and 65% were available only in English. Only 12% had inclusive illustrations (different genders, skin tones, physical abilities). 92% alerted users of an exposure, 42% linked to a testing site, and 62% linked to a public health website within 3 clicks. Most apps alert users of COVID-19 exposure but require high English reading levels and are not fully inclusive of the U.S. population, which may limit their reach as public health tools.

    View details for DOI 10.1093/jamia/ocab093

    View details for Web of Science ID 000692577000022

    View details for PubMedID 34022053

    View details for PubMedCentralID PMC8194594

  • Smartphone-Based Geofencing to Ascertain Hospitalizations. Circulation: Cardiovascular Quality and Outcomes Nguyen, K. T., Olgin, J. E., Pletcher, M. J., Ng, M., Kaye, L., Moturu, S., Gladstone, R. A., Malladi, C., Fann, A. H., Maguire, C., Bettencourt, L., Christensen, M. A., Marcus, G. M. 2017; 10 (3)

    Abstract

    Ascertainment of hospitalizations is critical to assess quality of care and the effectiveness and adverse effects of various therapies. Smartphones, mobile geolocators that are ubiquitous, have not been leveraged to ascertain hospitalizations. Therefore, we evaluated the use of smartphone-based geofencing to track hospitalizations. Participants aged ≥18 years installed a mobile application programmed to geofence all hospitals using global positioning systems and cell phone tower triangulation and to trigger a smartphone-based questionnaire when located in a hospital for ≥4 hours. An in-person study included consecutive consenting patients scheduled for electrophysiology and cardiac catheterization procedures. A remote arm invited Health eHeart Study participants who consented and engaged with the study via the internet only. The accuracy of application-detected hospitalizations was confirmed by medical record review as the reference standard. Of 22 eligible in-person patients, 17 hospitalizations were detected (sensitivity 77%; 95% confidence interval, 55%-92%). The length of stay according to the application was positively correlated with the length of stay ascertained via the electronic medical record (r=0.53; P=0.03). In the remote arm, the application was downloaded by 3443 participants residing in all 50 US states; 243 hospital visits at 119 different hospitals were detected through the application. The positive predictive value for an application-reported hospitalization was 65% (95% confidence interval, 57%-72%). Mobile application-based ascertainment of hospitalizations can be achieved with modest accuracy. This first proof of concept may ultimately be applicable to geofencing other types of prespecified locations to facilitate healthcare research and patient care.

    View details for DOI 10.1161/CIRCOUTCOMES.116.003326

    View details for Web of Science ID 000397591500005

    View details for PubMedID 28325751

    View details for PubMedCentralID PMC5363280

  • Continuous daily assessment of multiple sclerosis disability using remote step count monitoring. Journal of Neurology Block, V. J., Lizee, A., Crabtree-Hartman, E., Bevan, C. J., Graves, J. S., Bove, R., Green, A. J., Nourbakhsh, B., Tremblay, M., Gourraud, P., Ng, M. Y., Pletcher, M. J., Olgin, J. E., Marcus, G. M., Allen, D. D., Cree, B. C., Gelfand, J. M. 2017; 264 (2): 316-326

    Abstract

    Disability measures in multiple sclerosis (MS) rely heavily on ambulatory function, and current metrics fail to capture potentially important variability in walking behavior. We sought to determine whether remote step count monitoring using a consumer-friendly accelerometer (Fitbit Flex) can enhance MS disability assessment. 99 adults with relapsing or progressive MS able to walk ≥2-min were prospectively recruited. At 4 weeks, study retention was 97% and median Fitbit use was 97% of days. Substudy validation resulted in high interclass correlations between Fitbit, ActiGraph and manual step count tally during a 2-minute walk test, and between Fitbit and ActiGraph (ICC = 0.76) during 7-day home monitoring. Over 4 weeks of continuous monitoring, daily steps were lower in progressive versus relapsing MS (mean difference 2546 steps, p < 0.01). Lower average daily step count was associated with greater disability on the Expanded Disability Status Scale (EDSS) (p < 0.001). Within each EDSS category, substantial variability in step count was apparent (i.e., EDSS = 6.0 range 1097-7152). Step count demonstrated moderate-strong correlations with other walking measures. Lower average daily step count is associated with greater MS disability and captures important variability in real-world walking activity otherwise masked by standard disability scales, including the EDSS. These results support remote step count monitoring as an exploratory outcome in MS trials.

    View details for DOI 10.1007/s00415-016-8334-6

    View details for Web of Science ID 000393902500012

    View details for PubMedID 27896433

    View details for PubMedCentralID PMC5292081

  • IRF8 acts in lineage-committed rather than oligopotent progenitors to control neutrophil vs monocyte production. Blood Yanez, A., Ng, M. Y., Hassanzadeh-Kiabi, N., Goodridge, H. S. 2015; 125 (9): 1452-1459

    Abstract

    Interferon regulatory factor 8 (IRF8) is a key regulator of myelopoiesis in mice and humans. IRF8-deficient mice exhibit increased neutrophil numbers but defective monocyte and dendritic cell (DC) production. It has therefore been hypothesized that IRF8 regulates granulocyte vs monocyte/DC lineage commitment by oligopotent progenitors. Alternatively, IRF8 could control the differentiation of lineage-committed progenitors. In this study, we defined the role of IRF8 in lineage commitment and neutrophil vs monocyte differentiation using a novel sorting strategy that for the first time allows us to separate oligopotent granulocyte-monocyte progenitors (GMPs) and their lineage-committed progeny: granulocyte progenitors (GPs) and monocyte progenitors (MPs). We show that IRF8 is highly expressed by both GPs and MPs, but not GMPs, and is not required for GP or MP production by GMPs. In fact, IRF8-deficient mice have more GPs and MPs. This is not due to IRF8-mediated suppression of GP and MP production by GMPs, but rather to selective effects in GPs and MPs. We identify roles for IRF8 in regulating progenitor survival and differentiation and preventing leukemic cell accumulation. Thus, IRF8 does not regulate granulocytic vs monocytic fate in GMPs, but instead acts downstream of lineage commitment to selectively control neutrophil and monocyte production.

    View details for DOI 10.1182/blood-2014-09-600833

    View details for Web of Science ID 000350820900018

    View details for PubMedID 25597637

  • Detection of a TLR2 agonist by hematopoietic stem and progenitor cells impacts the function of the macrophages they produce. European Journal of Immunology Yanez, A., Hassanzadeh-Kiabi, N., Ng, M. Y., Megias, J., Subramanian, A., Liu, G. Y., Underhill, D. M., Luisa Gil, M., Goodridge, H. S. 2013; 43 (8): 2114-2125

  • Preferential Biological Processes in the Human Limbus by Differential Gene Profiling. PLOS ONE Nakatsu, M. N., Vartanyan, L., Vu, D. M., Ng, M. Y., Li, X., Deng, S. X. 2013; 8 (4): e61833

    Abstract

    Corneal epithelial stem cells or limbal stem cells (LSCs) are responsible for the maintenance of the corneal epithelium in humans. The exact location of LSCs is still under debate, but the increasing need for identifying the biological processes in the limbus, where LSCs are located, is of great importance in the regulation of LSCs. In our current study we identified 146 preferentially expressed genes in the human limbus in direct comparison to that in the cornea and conjunctiva. The expression of newly identified limbal transcripts endomucin, fibromodulin, paired-like homeodomain 2 (PITX2) and axin-2 were validated using qRT-PCR. Further protein analysis on the newly identified limbal transcripts showed protein localization of PITX2 in the basal and suprabasal layer of the limbal epithelium and very low expression in the cornea and conjunctiva. Two other limbal transcripts, frizzled-7 and tenascin-C, were expressed in the basal epithelial layer of the limbus. Gene ontology and network analysis of the overexpressed limbal genes revealed cell-cell adhesion, Wnt and TGF-β/BMP signaling components among other developmental processes in the limbus. These results could aid in a better understanding of the regulatory elements in the LSC microenvironment.

    View details for DOI 10.1371/journal.pone.0061833

    View details for Web of Science ID 000317911500070

    View details for PubMedID 23630617

    View details for PubMedCentralID PMC3632514

  • Wnt/beta-Catenin Signaling Regulates Proliferation of Human Cornea Epithelial Stem/Progenitor Cells. Investigative Ophthalmology & Visual Science Nakatsu, M. N., Ding, Z., Ng, M. Y., Truong, T. T., Yu, F., Deng, S. X. 2011; 52 (7): 4734-4741

    Abstract

    To investigate the expression and role of the Wnt signaling pathway in human limbal stem cells (LSCs). Total RNA was isolated from the human limbus and central cornea. Limbal or cornea-specific transcripts were identified through quantitative real-time PCR. Protein expression of Wnt molecules was confirmed by immunohistochemistry on human ocular tissue. Activation of Wnt signaling using lithium chloride was achieved in vitro and its effects on LSC differentiation and proliferation were evaluated. Expression of Wnt2, Wnt6, Wnt11, Wnt16b, and four Wnt inhibitors were specific to the limbal region, whereas Wnt3, Wnt7a, Wnt7b, and Wnt10a were upregulated in the central cornea. Nuclear localization of β-catenin was observed in a very small subset of basal epithelial cells only at the limbus. Activation of Wnt/β-catenin signaling increased the proliferation and colony-forming efficiency of primary human LSCs. The stem cell phenotype was maintained, as shown by higher expression levels of putative corneal epithelial stem cell markers, ATP-binding cassette family G2 and ΔNp63α, and low expression levels of mature cornea epithelial cell marker, cytokeratin 12. These findings demonstrate for the first time that Wnt signaling is present in the ocular surface epithelium and plays an important role in the regulation of LSC proliferation. Modulation of Wnt signaling could be of clinical application to increase the efficiency of ex vivo expansion of corneal epithelial stem/progenitor cells for transplantation.

    View details for DOI 10.1167/iovs.10-6486

    View details for Web of Science ID 000293332500107

    View details for PubMedID 21357396

    View details for PubMedCentralID PMC3175950