Professional Education


  • Doctor of Philosophy (Ph.D.), Stanford University, Computer Science (2025)
  • B.S.E., Princeton University, Computer Science (2018)
  • Minor, Princeton University, Statistics and Machine Learning (2018)

All Publications


  • Incorporating area-level social drivers of health in predictive algorithms using electronic health record data. Journal of the American Medical Informatics Association (JAMIA). Foryciarz, A., Gladish, N., Rehkopf, D. H., Rose, S. 2025

    Abstract

    The inclusion of social drivers of health (SDOH) into predictive algorithms of health outcomes has potential for improving algorithm interpretation, performance, generalizability, and transportability. However, there are limitations in the availability, understanding, and quality of SDOH variables, as well as a lack of guidance on how to incorporate them into algorithms when appropriate to do so. As such, few published algorithms include SDOH, and there is substantial methodological variability among those that do. We argue that practitioners should consider the use of social indices and factors, a class of area-level measurements, given their accessibility, transparency, and quality. We illustrate the process of using such indices in predictive algorithms, which includes the selection of appropriate indices for the outcome, measurement time, and geographic level, in a demonstrative example with the Kidney Failure Risk Equation. Identifying settings where incorporating SDOH may be beneficial and incorporating them rigorously can help validate algorithms and assess generalizability.

    DOI: 10.1093/jamia/ocaf009 | PubMedID: 39832294
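
    The abstract above describes joining area-level indices to patient-level EHR features by geography. A minimal sketch of that kind of join, in Python with pandas, is below; the file names, the column names (zip, adi_score), and the choice of index are hypothetical illustrations, not taken from the paper.

      import pandas as pd

      # Patient-level EHR features; assumes each record carries a geographic unit (here, ZIP)
      patients = pd.read_csv("cohort.csv")

      # Area-level social index, one row per geographic unit (e.g., an area deprivation index)
      area_index = pd.read_csv("area_index.csv")  # assumed columns: zip, adi_score

      # Left-join so every patient keeps a row even when the index is unavailable
      features = patients.merge(area_index[["zip", "adi_score"]], on="zip", how="left")

      # Flag missingness explicitly rather than silently imputing area-level values
      features["adi_missing"] = features["adi_score"].isna().astype(int)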

  • Clinical utility gains from incorporating comorbidity and geographic location information into risk estimation equations for atherosclerotic cardiovascular disease. Journal of the American Medical Informatics Association (JAMIA). Xu, Y., Foryciarz, A., Steinberg, E., Shah, N. H. 2023

    Abstract

    There are over 363 customized risk models of the American College of Cardiology and the American Heart Association (ACC/AHA) pooled cohort equations (PCE) in the literature, but their gains in clinical utility are rarely evaluated. We build new risk models for patients with specific comorbidities and geographic locations and evaluate whether performance improvements translate to gains in clinical utility. We retrain a baseline PCE using the ACC/AHA PCE variables and revise it to incorporate subject-level information on geographic location and two comorbidity conditions. We apply fixed effects, random effects, and extreme gradient boosting (XGB) models to handle the correlation and heterogeneity induced by locations. Models are trained using 2,464,522 claims records from Optum's Clinformatics® Data Mart and validated in the hold-out set (N = 1,056,224). We evaluate the models' performance overall and across subgroups defined by the presence or absence of chronic kidney disease (CKD) or rheumatoid arthritis (RA) and by geographic location. We evaluate the models' expected utility using net benefit and their statistical properties using several discrimination and calibration metrics. The revised fixed effects and XGB models yielded improved discrimination, compared to the baseline PCE, overall and in all comorbidity subgroups. XGB also improved calibration for the subgroups with CKD or RA. However, the gains in net benefit are negligible, especially under low exchange rates. Common approaches to revising risk calculators by incorporating extra information or applying flexible models may enhance statistical performance; however, such improvement does not necessarily translate to higher clinical utility. We therefore recommend that future work quantify the consequences of using risk calculators to guide clinical decisions.

    DOI: 10.1093/jamia/ocad017 | PubMedID: 36795076
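
    As a rough illustration of the net benefit metric used in the utility evaluation above: net benefit counts true positives per person and subtracts false positives per person, weighted by the odds of the decision threshold. The sketch below is a generic implementation with placeholder names, not the authors' code.

      import numpy as np

      def net_benefit(y, p, pt):
          """Net benefit of treating patients with predicted risk >= pt."""
          y = np.asarray(y)
          treat = np.asarray(p) >= pt
          n = len(y)
          tp = np.sum(treat & (y == 1)) / n   # true positives per person
          fp = np.sum(treat & (y == 0)) / n   # false positives per person
          return tp - fp * pt / (1 - pt)      # false positives weighted by threshold odds

      # e.g., net_benefit(y_test, risk_scores, pt=0.075) at the 7.5% ASCVD statin threshold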

  • Evaluating algorithmic fairness in the presence of clinical guidelines: the case of atherosclerotic cardiovascular disease risk estimation. BMJ Health & Care Informatics. Foryciarz, A., Pfohl, S. R., Patel, B., Shah, N. 2022; 29 (1)

    Abstract

    OBJECTIVES: The American College of Cardiology and the American Heart Association guidelines on primary prevention of atherosclerotic cardiovascular disease (ASCVD) recommend using 10-year ASCVD risk estimation models to initiate statin treatment. For guideline-concordant decision-making, risk estimates need to be calibrated. However, existing models are often miscalibrated for race-, ethnicity- and sex-based subgroups. This study evaluates two algorithmic fairness approaches to adjust the risk estimators (group recalibration and equalised odds) for their compatibility with the assumptions underpinning the guidelines' decision rules. METHODS: Using an updated pooled cohorts data set, we derive unconstrained, group-recalibrated and equalised odds-constrained versions of the 10-year ASCVD risk estimators, and compare their calibration at guideline-concordant decision thresholds. RESULTS: We find that, compared with the unconstrained model, group recalibration improves calibration at one of the relevant thresholds for each group, but exacerbates differences in false positive and false negative rates between groups. An equalised odds constraint, meant to equalise error rates across groups, does so by miscalibrating the model overall and at relevant decision thresholds. DISCUSSION: Because of this induced miscalibration, decisions guided by risk estimators learned with an equalised odds fairness constraint are not concordant with existing guidelines. Conversely, recalibrating the model separately for each group can increase guideline compatibility, while increasing intergroup differences in error rates. As such, comparisons of error rates across groups can be misleading when guidelines recommend treating at fixed decision thresholds. CONCLUSION: The illustrated trade-offs between satisfying a fairness criterion and retaining guideline compatibility underscore the need to evaluate models in the context of downstream interventions.

    DOI: 10.1136/bmjhci-2021-100460 | PubMedID: 35396247
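
    A minimal sketch of the group recalibration approach compared in this paper, assuming Platt-style logistic recalibration of a base model's risk scores fit separately within each subgroup; the variable names are placeholders and the paper's exact recalibration procedure may differ.

      import numpy as np
      from sklearn.linear_model import LogisticRegression

      def recalibrate_by_group(scores, y, groups):
          """Refit a one-feature logistic recalibration model within each group."""
          scores = np.clip(scores, 1e-6, 1 - 1e-6)
          logit = np.log(scores / (1 - scores)).reshape(-1, 1)  # log-odds as the feature
          calibrated = np.empty_like(scores, dtype=float)
          for g in np.unique(groups):
              mask = groups == g
              model = LogisticRegression().fit(logit[mask], y[mask])
              calibrated[mask] = model.predict_proba(logit[mask])[:, 1]
          return calibrated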

  • A comparison of approaches to improve worst-case predictive model performance over patient subpopulations. Scientific Reports. Pfohl, S. R., Zhang, H., Xu, Y., Foryciarz, A., Ghassemi, M., Shah, N. H. 2022; 12 (1): 3254

    Abstract

    Predictive models for clinical outcomes that are accurate on average in a patient population may underperform drastically for some subpopulations, potentially introducing or reinforcing inequities in care access and quality. Model training approaches that aim to maximize worst-case model performance across subpopulations, such as distributionally robust optimization (DRO), attempt to address this problem without introducing additional harms. We conduct a large-scale empirical study of DRO and several variations of standard learning procedures to identify approaches for model development and selection that consistently improve disaggregated and worst-case performance over subpopulations compared to standard approaches for learning predictive models from electronic health records data. In the course of our evaluation, we introduce an extension to DRO approaches that allows for specification of the metric used to assess worst-case performance. We conduct the analysis for models that predict in-hospital mortality, prolonged length of stay, and 30-day readmission for inpatient admissions, and predict in-hospital mortality using intensive care data. We find that, with relatively few exceptions, no approach performs better, for each patient subpopulation examined, than standard learning procedures using the entire training dataset. These results imply that when it is of interest to improve model performance for patient subpopulations beyond what can be achieved with standard practices, it may be necessary to do so via data collection techniques that increase the effective sample size or reduce the level of noise in the prediction problem.

    DOI: 10.1038/s41598-022-07167-7 | PubMedID: 35228563
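
    A schematic of the worst-case objective underlying the group DRO approaches discussed above, simplified to a plain maximum over per-group mean losses; the paper evaluates several DRO variants, including an extension that lets the worst-case metric be specified, so this PyTorch-style sketch is an illustration rather than the study's method. The names model, x, y, and group_ids are placeholders.

      import torch

      def worst_group_loss(per_example_loss, group_ids, num_groups):
          """Maximum of the per-group mean losses; minimizing it targets the worst group."""
          group_means = [per_example_loss[group_ids == g].mean()
                         for g in range(num_groups)
                         if (group_ids == g).any()]
          return torch.stack(group_means).max()

      # Inside a training step:
      # per_example = torch.nn.functional.binary_cross_entropy_with_logits(
      #     model(x).squeeze(-1), y.float(), reduction="none")
      # worst_group_loss(per_example, group_ids, num_groups).backward()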

  • Net benefit, calibration, threshold selection, and training objectives for algorithmic fairness in healthcare. Association for Computing Machinery (ACM). Pfohl, S. R., Xu, Y., Foryciarz, A., Ignatiadis, N., Genkins, J., Shah, N. H. 2022: 1039-1052
  • An empirical characterization of fair machine learning for clinical risk prediction. Journal of Biomedical Informatics. Pfohl, S. R., Foryciarz, A., Shah, N. H. 2020: 103621

    Abstract

    The use of machine learning to guide clinical decision making has the potential to worsen existing health disparities. Several recent works frame the problem as one of algorithmic fairness, a framework that has attracted considerable attention and criticism. However, the appropriateness of this framework is unclear due to both ethical and technical considerations, the latter of which include trade-offs between measures of fairness and model performance that are not well understood for predictive models of clinical outcomes. To inform the ongoing debate, we conduct an empirical study to characterize the impact of penalizing group fairness violations on an array of measures of model performance and group fairness. We repeat the analysis across multiple observational healthcare databases, clinical outcomes, and sensitive attributes. We find that procedures that penalize differences between the distributions of predictions across groups induce nearly universal degradation of multiple performance metrics within groups. On examining the secondary impact of these procedures, we observe heterogeneity in their effect on measures of fairness in calibration and ranking across experimental conditions. Beyond the reported trade-offs, we emphasize that analyses of algorithmic fairness in healthcare lack the contextual grounding and causal awareness necessary to reason about the mechanisms that lead to health disparities, as well as about the potential of algorithmic fairness methods to counteract those mechanisms. In light of these limitations, we encourage researchers building predictive models for clinical use to step outside the algorithmic fairness frame and engage critically with the broader sociotechnical context surrounding the use of machine learning in healthcare.

    DOI: 10.1016/j.jbi.2020.103621 | PubMedID: 33220494
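
    A toy version of the penalized objective the abstract refers to: a standard loss plus a term that penalizes differences between the distributions of predictions across groups, here reduced to pairwise squared gaps between group mean predictions. The penalties studied in the paper are distributional and more elaborate; all names below are illustrative.

      import torch

      def fairness_penalized_loss(logits, y, group_ids, lam=1.0):
          """Cross-entropy plus a penalty on gaps between group mean predictions."""
          bce = torch.nn.functional.binary_cross_entropy_with_logits(logits, y.float())
          probs = torch.sigmoid(logits)
          means = [probs[group_ids == g].mean() for g in torch.unique(group_ids)]
          penalty = sum((a - b) ** 2 for a in means for b in means)  # pairwise squared gaps
          return bce + lam * penalty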

  • Where do Algorithmic Accountability and Explainability Frameworks Take Us in the Real World? Szymielewicz, K., Bacciarelli, A., Hidvegi, F., Foryciarz, A., Penicaud, S., Spielkamp, M. Association for Computing Machinery (ACM). 2020: 689
  • Zero-Shot Transfer Learning with Synthesized Data for Multi-Domain Dialogue State Tracking. Campagna, G., Foryciarz, A., Moradshahi, M., Lam, M. S. Association for Computational Linguistics (ACL). 2020: 122–132