All Publications


  • Labeling messages as AI-generated does not reduce their persuasive effects. PNAS nexus Gallegos, I. O., Shani, C., Shi, W., Bianchi, F., Gainsburg, I., Jurafsky, D., Willer, R. 2026; 5 (2): pgag008

    Abstract

    As generative AI enables the creation and dissemination of information at massive scale and speed, it is increasingly important to understand how people perceive AI-generated content. One prominent policy proposal requires explicitly labeling AI-generated content to increase transparency and encourage critical thinking about the information, but prior research has not yet tested the effects of such labels. To address this gap, we conducted a survey experiment (N = 1,601) on a diverse sample of Americans, presenting participants with an AI-generated message about several public policies (e.g. allowing colleges to pay student-athletes), randomly assigning whether participants were told the message was generated by (i) an expert AI model, (ii) a human policy expert, or (iii) no label. We found that messages were generally persuasive, influencing participants' views of the policies by 9.74 percentage points on average. However, while 92.0% of participants assigned to the AI and human label conditions believed the authorship labels, labels had no significant effects on participants' attitude change toward the policies, judgments of message accuracy, nor intentions to share the message with others. These patterns were robust across a variety of participant characteristics, including prior knowledge of the policy, prior experience with AI, political party, education level, and age. Given current levels of trust in AI content, these results imply that, while authorship labels would likely enhance transparency, they are unlikely to substantially affect the persuasiveness of the labeled content, highlighting the need for alternative strategies to address challenges posed by AI-generated information.

    View details for DOI 10.1093/pnasnexus/pgag008

    View details for PubMedID 41675465

    View details for PubMedCentralID PMC12887897

  • Enabling disaggregation of Asian American subgroups: a dataset of Wikidata names for disparity estimation. Scientific data Lin, Q., Ouyang, D., Guage, C., Gallegos, I. O., Goldin, J., Ho, D. E. 2025; 12 (1): 580

    Abstract

    Decades of research and advocacy have underscored the imperative of surfacing - as the first step towards mitigating - racial disparities, including among subgroups historically bundled into aggregated categories. Recent U.S. federal regulations have required increasingly disaggregated race reporting, but major implementation barriers mean that, in practice, reported race data continues to remain inadequate. While imputation methods have enabled disparity assessments in many research and policy settings lacking reported race, the leading name algorithms cannot recover disaggregated categories, given the same lack of disaggregated data from administrative sources to inform algorithm design. Leveraging a Wikidata sample of over 300,000 individuals from six Asian countries, we extract frequencies of 25,876 first names and 18,703 surnames which can be used as proxies for U.S. name-race distributions among six major Asian subgroups: Asian Indian, Chinese, Filipino, Japanese, Korean, and Vietnamese. We show that these data, when combined with public geography-race distributions to predict subgroup membership, outperform existing deterministic name lists in key prediction settings, and enable critical Asian disparity assessments.

    View details for DOI 10.1038/s41597-025-04753-y

    View details for PubMedID 40188111

  • Self-Debiasing Large Language Models: Zero-Shot Recognition and Reduction of Stereotypes Gallegos, I. O., Aponte, R., Rossi, R. A., Barrow, J., Tanjim, M., Yu, T., Deilamsalehy, H., Zhang, R., Kim, S., Dernoncourt, F., Lipka, N., Owens, D., Gu, J. edited by Chiruzzo, L., Ritter, A., Wang, L. ASSOC COMPUTATIONAL LINGUISTICS-ACL. 2025: 873-888
  • Bias and Fairness in Large Language Models: A Survey COMPUTATIONAL LINGUISTICS Gallegos, I. O., Rossi, R. A., Barrow, J., Tanjim, M., Kim, S., Dernoncourt, F., Yu, T., Zhang, R., Ahmed, N. K. 2024; 50 (3): 1097-1179
  • Abdominal CT metrics in 17,646 patients reveal associations between myopenia, myosteatosis, and medical phenotypes: a phenome-wide association study EBIOMEDICINE Chaves, J., Lenchik, L., Gallegos, I. O., Blankemeier, L., Liang, T., Rubin, D. L., Willis, M. H., Chaudhari, A. S., Boutin, R. D. 2024; 103
  • Abdominal CT metrics in 17,646 patients reveal associations between myopenia, myosteatosis, and medical phenotypes: a phenome-wide association study. EBioMedicine Zambrano Chaves, J. M., Lenchik, L., Gallegos, I. O., Blankemeier, L., Rubin, D. L., Willis, M. H., Chaudhari, A. S., Boutin, R. D. 2024; 103: 105116

    Abstract

    BACKGROUND: Deep learning facilitates large-scale automated imaging evaluation of body composition. However, associations of body composition biomarkers with medical phenotypes have been underexplored. Phenome-wide association study (PheWAS) techniques search for medical phenotypes associated with biomarkers. A PheWAS integrating large-scale analysis of imaging biomarkers and electronic health record (EHR) data could discover previously unreported associations and validate expected associations. Here we use PheWAS methodology to determine the association of abdominal CT-based skeletal muscle metrics with medical phenotypes in a large North American cohort. METHODS: An automated deep learning pipeline was used to measure skeletal muscle index (SMI; biomarker of myopenia) and skeletal muscle density (SMD; biomarker of myosteatosis) from abdominal CT scans of adults between 2012 and 2018. A PheWAS was performed with logistic regression using patient sex and age as covariates to assess for associations between CT-derived muscle metrics and 611 common EHR-derived medical phenotypes. PheWAS P values were considered significant at a Bonferroni corrected threshold (alpha=0.05/1222). FINDINGS: 17,646 adults (mean age, 56 years ± 19 [SD]; 57.5% women) were included. CT-derived SMI was significantly associated with 268 medical phenotypes; SMD with 340 medical phenotypes. Previously unreported associations with the highest magnitude of significance included higher SMI with decreased cardiac dysrhythmias (OR [95% CI], 0.59 [0.55-0.64]; P<0.0001), decreased epilepsy (OR, 0.59 [0.50-0.70]; P<0.0001), and increased elevated prostate-specific antigen (OR, 1.84 [1.47-2.31]; P<0.0001), and higher SMD with decreased decubitus ulcers (OR, 0.36 [0.31-0.42]; P<0.0001), sleep disorders (OR, 0.39 [0.32-0.47]; P<0.0001), and osteomyelitis (OR, 0.43 [0.36-0.52]; P<0.0001). INTERPRETATION: PheWAS methodology reveals previously unreported associations between CT-derived biomarkers of myopenia and myosteatosis and EHR medical phenotypes. The high-throughput PheWAS technique applied on a population scale can generate research hypotheses related to myopenia and myosteatosis and can be adapted to research possible associations of other imaging biomarkers with hundreds of EHR medical phenotypes. FUNDING: National Institutes of Health, Stanford AIMI-HAI pilot grant, Stanford Precision Health and Integrated Diagnostics, Stanford Cardiovascular Institute, Stanford Center for Digital Health, and Stanford Knight-Hennessy Scholars.

    View details for DOI 10.1016/j.ebiom.2024.105116

    View details for PubMedID 38636199

  • How Redundant are Redundant Encodings? Blindness in the Wild and Racial Disparity when Race is Unobserved Cheng, L., Gallegos, I. O., Ouyang, D., Goldin, J., Ho, D. E. ASSOC COMPUTING MACHINERY. 2023: 667-686
  • Opportunistic Incidence Prediction of Multiple Chronic Diseases from Abdominal CT Imaging Using Multi-task Learning Blankemeier, L., Gallegos, I., Chaves, J., Maron, D., Sandhu, A., Rodriguez, F., Rubin, D., Patel, B., Willis, M., Boutin, R., Chaudhari, A. S. edited by Wang, L., Dou, Q., Fletcher, P. T., Speidel, S., Li, S. SPRINGER INTERNATIONAL PUBLISHING AG. 2022: 309-318