Bio


Xiao Wu is a Data Science Fellow at Stanford Data Science, where he works with Professor Trevor Hastie in the Department of Statistics. His research interests lie in developing statistical and causal inference methods to address methodological needs in climate and health research. The key goal of his research is to provide scientific evidence on the health impacts of environmental factors in an age of rapidly changing climate.

Before coming to Stanford, he completed his Ph.D. training in the Department of Biostatistics at Harvard University, where he was advised by Dr. Francesca Dominici and Dr. Danielle Braun. His dissertation focuses on developing robust and interpretable causal inference methods to handle error-prone, continuous, and time-series exposures. He is also working on collaborative projects to design Bayesian clinical trials, meta-analyses, and real-world evidence studies.

Professional Education


  • Ph.D., Harvard University, Biostatistics (2021)
  • M.S., Harvard T.H. Chan School of Public Health, Biostatistics (2017)
  • B.S., Peking University, Mathematics (2015)
  • LL.B., Peking University, Laws (2015)

All Publications


  • Air pollution and the pandemic: Long-term PM2.5 exposure and disease severity in COVID-19 patients. Respirology (Carlton, Vic.) Mendy, A., Wu, X., Keller, J. L., Fassler, C. S., Apewokin, S., Mersha, T. B., Xie, C., Pinney, S. M. 2021

    Abstract

    BACKGROUND AND OBJECTIVE: Ecological studies have suggested an association between exposure to particulate matter ≤2.5 μm (PM2.5) and coronavirus disease 2019 (COVID-19) severity. However, these findings are yet to be validated in individual-level studies. We aimed to determine the association of long-term PM2.5 exposure with hospitalization among individual patients infected with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). METHODS: We estimated the 10-year (2009-2018) PM2.5 exposure at the residential zip code of COVID-19 patients diagnosed at the University of Cincinnati healthcare system between 13 March 2020 and 30 September 2020. Logistic regression was used to determine the odds ratio (OR) and 95% CI for COVID-19 hospitalizations associated with PM2.5, adjusting for socioeconomic characteristics and comorbidities. RESULTS: Among the 14,783 COVID-19 patients included in our study, 13.6% were hospitalized; the geometric mean (SD) PM2.5 was 10.48 (1.12) μg/m3. In adjusted analysis, a 1 μg/m3 increase in 10-year annual average PM2.5 was associated with 18% higher odds of hospitalization (OR: 1.18, 95% CI: 1.11-1.26). Likewise, a 1 μg/m3 increase in PM2.5 estimated for the year 2018 was associated with 14% higher odds of hospitalization (OR: 1.14, 95% CI: 1.08-1.21). CONCLUSION: Long-term PM2.5 exposure is associated with increased hospitalization in COVID-19. Therefore, more stringent COVID-19 prevention measures may be needed in areas with higher PM2.5 exposure to reduce the disease morbidity and healthcare burden.

    View details for DOI 10.1111/resp.14140

    View details for PubMedID 34459069

  • Heat warnings, mortality, and hospital admissions among older adults in the United States. Environment international Weinberger, K. R., Wu, X., Sun, S., Spangler, K. R., Nori-Sarma, A., Schwartz, J., Requia, W., Sabath, B. M., Braun, D., Zanobetti, A., Dominici, F., Wellenius, G. A. 2021; 157: 106834

    Abstract

    BACKGROUND: Heat warnings are issued in advance of forecast extreme heat events, yet little evidence is available regarding their effectiveness in reducing heat-related illness and death. We estimated the association of heat warnings and advisories (collectively, "alerts") issued by the United States National Weather Service with all-cause mortality and cause-specific hospitalizations among Medicare beneficiaries aged 65 years and older in 2,817 counties, 2006-2016. METHODS: In each county, we compared days with heat alerts to days without heat alerts, matched on daily maximum heat index and month. We used conditional Poisson regression models stratified on county, adjusting for year, day of week, federal holidays, and lagged daily maximum heat index. RESULTS: We identified a matched non-heat alert day for 92,029 heat alert days in 2,817 counties, or 54.6% of all heat alert days during the study period. Contrary to expectations, heat alerts were not associated with lower risk of mortality (RR: 1.005 [95% CI: 0.997, 1.013]). However, heat alerts were associated with higher risk of hospitalization for fluid and electrolyte disorders (RR: 1.040 [95% CI: 1.015, 1.065]) and heat stroke (RR: 1.094 [95% CI: 1.038, 1.152]). Results were similar in sensitivity analyses additionally adjusting for same-day heat index, ozone, and PM2.5. CONCLUSIONS: Our results suggest that heat alerts are not associated with lower risk of mortality but may be associated with higher rates of hospitalization for fluid and electrolyte disorders and heat stroke, potentially suggesting that heat alerts lead more individuals to seek or access care.

    View details for DOI 10.1016/j.envint.2021.106834

    View details for PubMedID 34461376

  • County-level exposures to greenness and associations with COVID-19 incidence and mortality in the United States ENVIRONMENTAL RESEARCH Klompmaker, J. O., Hart, J. E., Holland, I., Sabath, M., Wu, X., Laden, F., Dominici, F., James, P. 2021; 199: 111331

    Abstract

    COVID-19 is an infectious disease that has killed more than 555,000 people in the US. During a time of social distancing measures and increasing social isolation, green spaces may be a crucial factor to maintain a physically and socially active lifestyle while not increasing risk of infection. We evaluated whether greenness was related to COVID-19 incidence and mortality in the US. We downloaded data on COVID-19 cases and deaths for each US county up through June 7, 2020, from Johns Hopkins University, Center for Systems Science and Engineering Coronavirus Resource Center. We used April-May 2020 Normalized Difference Vegetation Index (NDVI) data to represent the greenness exposure during the initial COVID-19 outbreak in the US. We fitted negative binomial mixed models to evaluate associations of NDVI with COVID-19 incidence and mortality, adjusting for potential confounders such as county-level demographics, epidemic stage, and other environmental factors. We evaluated whether the associations were modified by population density, proportion of Black residents, median home value, and issuance of stay-at-home orders. An increase of 0.1 in NDVI was associated with a 6% (95% Confidence Interval: 3%, 10%) decrease in COVID-19 incidence rate after adjustment for potential confounders. Associations with COVID-19 incidence were stronger in counties with high population density and in counties with stay-at-home orders. Greenness was not associated with COVID-19 mortality in all counties; however, it was protective in counties with higher population density. Exposures to NDVI were associated with reduced county-level incidence of COVID-19 in the US as well as reduced county-level COVID-19 mortality rates in densely populated counties.

    View details for DOI 10.1016/j.envres.2021.111331

    View details for Web of Science ID 000663722200007

    View details for PubMedID 34004166

    View details for PubMedCentralID PMC8123933

  • Long-term exposure to fine particulate matter and hospitalization in COVID-19 patients RESPIRATORY MEDICINE Mendy, A., Wu, X., Keller, J. L., Fassler, C. S., Apewokin, S., Mersha, T. B., Xie, C., Pinney, S. M. 2021; 178: 106313

    Abstract

    Ecological evidence suggests that exposure to air pollution affects coronavirus disease 2019 (COVID-19) outcomes. However, no individual-level study has confirmed the association to date. We identified COVID-19 patients diagnosed at the University of Cincinnati hospitals and clinics and estimated particulate matter ≤2.5 μm (PM2.5) exposure over a 10-year period (2008-2017) at their residential zip codes. We used logistic regression to evaluate the association between PM2.5 exposure and hospitalizations for COVID-19, adjusting for socioeconomic characteristics and comorbidities. Among the 1128 patients included in our study, the mean (standard deviation) PM2.5 was 11.34 (0.70) μg/m3 for the 10-year average exposure and 13.83 (1.03) μg/m3 for the 10-year maximal exposures. The association between long-term PM2.5 exposure and hospitalization for COVID-19 was contingent upon having pre-existing asthma or chronic obstructive pulmonary disease (COPD) (P-interaction = 0.030 for average PM2.5 and P-interaction = 0.001 for maximal PM2.5). In COVID-19 patients with asthma or COPD, the odds of hospitalization were 62% higher with a 1 μg/m3 increment in 10-year average PM2.5 (odds ratio [OR]: 1.62, 95% confidence interval [CI]: 1.00-2.64) and 65% higher with a 1 μg/m3 increase in 10-year maximal PM2.5 levels (OR: 1.65, 95% CI: 1.16-2.35). However, among COVID-19 patients without asthma or COPD, PM2.5 exposure was not associated with higher hospitalizations (OR: 0.84, 95% CI: 0.65-1.09 for average PM2.5 and OR: 0.78, 95% CI: 0.65-0.95 for maximal PM2.5). Long-term exposure to PM2.5 is associated with higher odds of hospitalization in COVID-19 patients with pre-existing asthma or COPD.

    View details for DOI 10.1016/j.rmed.2021.106313

    View details for Web of Science ID 000623791800005

    View details for PubMedID 33550152

    View details for PubMedCentralID PMC7835077

  • Air pollution and cardiovascular disease hospitalization - Are associations modified by greenness, temperature and humidity? Environment international Klompmaker, J. O., Hart, J. E., James, P., Sabath, M. B., Wu, X., Zanobetti, A., Dominici, F., Laden, F. 2021; 156: 106715

    Abstract

    Studies have observed associations between long-term air pollution and cardiovascular disease hospitalization. Little is known, however, about effect modification of these associations by greenness, temperature and humidity. We constructed an open cohort consisting of all fee-for-service Medicare beneficiaries, aged ≥ 65, living in the contiguous US from 2000 through 2016 (~63 million individuals). We assigned annual average PM2.5, NO2 and ozone zip code concentrations. Cox-equivalent Poisson models were used to estimate associations with first cardiovascular disease (CVD), coronary heart disease (CHD) and cerebrovascular disease (CBV) hospitalization. PM2.5 and NO2 were both positively associated with CVD, CHD and CBV hospitalization, after adjustment for potential confounders. Associations were substantially stronger at the lower end of the exposure distributions. For CVD hospitalization, the hazard ratio (HR) of PM2.5 was 1.041 (1.038, 1.045) per IQR increase (4.0 µg/m3) in the full study population and 1.327 (1.305, 1.350) per IQR increase for a subgroup with annual exposures always below 10 µg/m3 PM2.5. Ozone was only positively associated with CVD, CHD and CBV hospitalization for the low-exposure subgroup (<40 ppb). Associations of PM2.5 were stronger in areas with higher greenness, lower ozone and Ox, lower summer and winter temperature and lower summer and winter specific humidity. PM2.5 and NO2 were positively associated with CVD, CHD and CBV hospitalization. Associations were more pronounced at low exposure levels. Associations of PM2.5 were stronger with higher greenness, lower ozone and Ox, lower temperature and lower specific humidity.

    View details for DOI 10.1016/j.envint.2021.106715

    View details for PubMedID 34218186

  • Long-term effects of PM2.5 on neurological disorders in the American Medicare population: a longitudinal cohort study LANCET PLANETARY HEALTH Shi, L., Wu, X., Yazdi, M., Braun, D., Abu Awad, Y., Wei, Y., Liu, P., Di, Q., Wang, Y., Schwartz, J., Dominici, F., Kioumourtzoglou, M., Zanobetti, A. 2020; 4 (12): E557-E565

    Abstract

    Accumulating evidence links fine particulate matter (PM2·5) to premature mortality, cardiovascular disease, and respiratory disease. However, less is known about the influence of PM2·5 on neurological disorders. We aimed to investigate the effect of long-term PM2·5 exposure on development of Parkinson's disease or Alzheimer's disease and related dementias. We did a longitudinal cohort study in which we constructed a population-based nationwide open cohort including all fee-for-service Medicare beneficiaries (aged ≥65 years) in the contiguous United States (2000-16) with no exclusions. We assigned PM2·5 postal code (ie, ZIP code) concentrations based on mean annual predictions from a high-resolution model. To accommodate our very large dataset, we applied Cox-equivalent Poisson models with parallel computing to estimate hazard ratios (HRs) for first hospital admission for Parkinson's disease or Alzheimer's disease and related dementias, adjusting for potential confounders in the health models. Between Jan 1, 2000, and Dec 31, 2016, of 63 038 019 individuals who were aged 65 years or older during the study period, we identified 1·0 million cases of Parkinson's disease and 3·4 million cases of Alzheimer's disease and related dementias based on primary and secondary diagnosis billing codes. For each 5 μg/m3 increase in annual PM2·5 concentrations, the HR was 1·13 (95% CI 1·12-1·14) for first hospital admission for Parkinson's disease and 1·13 (1·12-1·14) for first hospital admission for Alzheimer's disease and related dementias. For both outcomes, there was strong evidence of linearity at PM2·5 concentrations less than 16 μg/m3 (95th percentile of the PM2·5 distribution), followed by a plateaued association with increasingly larger confidence bands. We provide evidence that exposure to annual mean PM2·5 in the USA is significantly associated with an increased hazard of first hospital admission with Parkinson's disease and Alzheimer's disease and related dementias. For the ageing American population, improving air quality to reduce PM2·5 concentrations to less than current national standards could yield substantial health benefits by reducing the burden of neurological disorders. Funding: The Health Effects Institute, The National Institute of Environmental Health Sciences, The National Institute on Aging, and the HERCULES Center.

    View details for DOI 10.1016/S2542-5196(20)30227-8

    View details for Web of Science ID 000596950500009

    View details for PubMedID 33091388

    View details for PubMedCentralID PMC7720425

  • Air pollution and COVID-19 mortality in the United States: Strengths and limitations of an ecological regression analysis SCIENCE ADVANCES Wu, X., Nethery, R. C., Sabath, M. B., Braun, D., Dominici, F. 2020; 6 (45)

    Abstract

    Assessing whether long-term exposure to air pollution increases the severity of COVID-19 health outcomes, including death, is an important public health objective. Limitations in COVID-19 data availability and quality remain obstacles to conducting conclusive studies on this topic. At present, publicly available COVID-19 outcome data for representative populations are available only as area-level counts. Therefore, studies of long-term exposure to air pollution and COVID-19 outcomes using these data must use an ecological regression analysis, which precludes controlling for individual-level COVID-19 risk factors. We describe these challenges in the context of one of the first preliminary investigations of this question in the United States, where we found that higher historical PM2.5 exposures are positively associated with higher county-level COVID-19 mortality rates after accounting for many area-level confounders. Motivated by this study, we lay the groundwork for future research on this important topic, describe the challenges, and outline promising directions and opportunities.

    View details for DOI 10.1126/sciadv.abd4049

    View details for Web of Science ID 000587544300044

    View details for PubMedID 33148655

    View details for PubMedCentralID PMC7673673

  • Causal Effects of Air Pollution on Mortality Rate in Massachusetts AMERICAN JOURNAL OF EPIDEMIOLOGY Wei, Y., Wang, Y., Wu, X., Di, Q., Shi, L., Koutrakis, P., Zanobetti, A., Dominici, F., Schwartz, J. D. 2020; 189 (11): 1316-1323

    Abstract

    Air pollution epidemiology studies have primarily investigated long- and short-term exposures separately, have used multiplicative models, and have been associational studies. Implementing a generalized propensity score adjustment approach with 3.8 billion person-days of follow-up, we simultaneously assessed causal associations of long-term (1-year moving average) and short-term (2-day moving average) exposure to particulate matter with an aerodynamic diameter less than or equal to 2.5 μm (PM2.5), ozone, and nitrogen dioxide with all-cause mortality on an additive scale among Medicare beneficiaries in Massachusetts (2000-2012). We found that long- and short-term PM2.5, ozone, and nitrogen dioxide exposures were all associated with increased mortality risk. Specifically, per 10 million person-days, each 1-μg/m3 increase in long- and short-term PM2.5 exposure was associated with 35.4 (95% confidence interval (CI): 33.4, 37.6) and 3.04 (95% CI: 2.17, 3.94) excess deaths, respectively; each 1-part per billion (ppb) increase in long- and short-term ozone exposure was associated with 2.35 (95% CI: 1.08, 3.61) and 2.41 (95% CI: 1.81, 2.91) excess deaths, respectively; and each 1-ppb increase in long- and short-term nitrogen dioxide exposure was associated with 3.24 (95% CI: 2.75, 3.77) and 5.60 (95% CI: 5.24, 5.98) excess deaths, respectively. Mortality associated with long-term PM2.5 and ozone exposure increased substantially at low levels. The findings suggested that air pollution was causally associated with mortality, even at levels below national standards.

    View details for DOI 10.1093/aje/kwaa098

    View details for Web of Science ID 000592576800015

    View details for PubMedID 32558888

    View details for PubMedCentralID PMC7604530

  • Evaluating the impact of long-term exposure to fine particulate matter on mortality among the elderly SCIENCE ADVANCES Wu, X., Braun, D., Schwartz, J., Kioumourtzoglou, M. A., Dominici, F. 2020; 6 (29): eaba5692

    Abstract

    Many studies link long-term fine particle (PM2.5) exposure to mortality, even at levels below current U.S. air quality standards (12 micrograms per cubic meter). These findings have been disputed with claims that the use of traditional statistical approaches does not guarantee causality. Leveraging 16 years of data (68.5 million Medicare enrollees), we provide strong evidence of the causal link between long-term PM2.5 exposure and mortality under a set of causal inference assumptions. Using five distinct approaches, we found that a decrease in PM2.5 (by 10 micrograms per cubic meter) leads to a statistically significant 6 to 7% decrease in mortality risk. Based on these models, lowering the air quality standard to 10 micrograms per cubic meter would save 143,257 lives (95% confidence interval, 115,581 to 170,645) in one decade. Our study provides the most comprehensive evidence to date of the link between long-term PM2.5 exposure and mortality, even at levels below current standards.

    View details for DOI 10.1126/sciadv.aba5692

    View details for Web of Science ID 000552227800017

    View details for PubMedID 32832626

    View details for PubMedCentralID PMC7439614

  • Propensity score analysis for time-dependent exposure ANNALS OF TRANSLATIONAL MEDICINE Zhang, Z., Li, X., Wu, X., Qiu, H., Shi, H., AME Big-Data Clinical Trial Collab 2020; 8 (5): 246

    Abstract

    Propensity score analysis (PSA) is widely used in medical literature to account for confounders. Conventionally, the propensity score (PS) is calculated by a binary logistic regression model using time-fixed covariates. In the presence of time-varying treatment or exposure, the conventional method may cause bias because subjects with early and late exposure are treated as the same. In effect, subjects who are treated later can be different from those who are treated early. Thus, the conventional PSA must be modified to address this bias. In this paper, we illustrate how to perform analysis in the presence of time-dependent exposure. We conduct a simulation study with a known treatment effect. In the simulation study, we find that the PSA method that directly adjusts for the PS, estimated by either a binary logistic regression model or a Cox regression model using time-fixed covariates, still introduces significant bias. On the other hand, time-dependent PS matching can help to achieve a result approaching the true effect. After time-dependent PS matching, the matched cohort can be analyzed with a conventional Cox regression model or a conditional logistic regression (CLR) model with time strata. The performance is comparable to the correctly specified Cox regression model with time-varying covariates (i.e., adjusting the exposure in a multivariable model as a time-varying covariate). We further develop a function called TDPSM() for time-dependent PS matching and apply it to a real-world dataset.

    View details for DOI 10.21037/atm.2020.01.33

    View details for Web of Science ID 000520850100098

    View details for PubMedID 32309393

    View details for PubMedCentralID PMC7154493

  • Optimizing interim analysis timing for Bayesian adaptive commensurate designs STATISTICS IN MEDICINE Wu, X., Xu, Y., Carlin, B. P. 2020; 39 (4): 424-437

    Abstract

    In developing products for rare diseases, statistical challenges arise due to the limited number of patients available for participation in drug trials and other clinical research. Bayesian adaptive clinical trial designs offer the possibility of increased statistical efficiency, reduced development cost and ethical hazard prevention via their incorporation of evidence from external sources (historical data, expert opinions, and real-world evidence), and flexibility in the specification of interim looks. In this paper, we propose a novel Bayesian adaptive commensurate design that borrows adaptively from historical information and also uses a particular payoff function to optimize the timing of the study's interim analysis. The trial payoff is a function of how many samples can be saved via early stopping and the probability of making correct early decisions for either futility or efficacy. We calibrate our Bayesian algorithm to have acceptable long-run frequentist properties (Type I error and power) via simulation at the design stage. We illustrate our approach using a pediatric trial design setting testing the effect of a new drug for a rare genetic disease. The optimIA R package available at https://github.com/wxwx1993/Bayesian_IA_Timing provides an easy-to-use implementation of our approach.

    View details for DOI 10.1002/sim.8414

    View details for Web of Science ID 000500370700001

    View details for PubMedID 31799737

  • CAUSAL INFERENCE IN THE CONTEXT OF AN ERROR PRONE EXPOSURE: AIR POLLUTION AND MORTALITY ANNALS OF APPLIED STATISTICS Wu, X., Braun, D., Kioumourtzoglou, M., Choirat, C., Di, Q., Dominici, F. 2019; 13 (1): 520-547

    Abstract

    We propose a new approach for estimating causal effects when the exposure is measured with error and confounding adjustment is performed via a generalized propensity score (GPS). Using validation data, we propose a regression calibration (RC)-based adjustment for a continuous error-prone exposure combined with GPS to adjust for confounding (RC-GPS). The outcome analysis is conducted after transforming the corrected continuous exposure into a categorical exposure. We consider confounding adjustment in the context of GPS subclassification, inverse probability treatment weighting (IPTW) and matching. In simulations with varying degrees of exposure error and confounding bias, RC-GPS eliminates bias from exposure error and confounding compared to standard approaches that rely on the error-prone exposure. We applied RC-GPS to a rich data platform to estimate the causal effect of long-term exposure to fine particles (PM2.5) on mortality in New England for the period from 2000 to 2012. The main study consists of 2202 zip codes covered by 217,660 1 km × 1 km grid cells with yearly mortality rates, yearly PM2.5 averages estimated from a spatio-temporal model (error-prone exposure) and several potential confounders. The internal validation study includes a subset of 83 1 km × 1 km grid cells within 75 zip codes from the main study with error-free yearly PM2.5 exposures obtained from monitor stations. Under assumptions of noninterference and weak unconfoundedness, using matching we found that exposure to moderate levels of PM2.5 (8 < PM2.5 ≤ 10 μg/m3) causes a 2.8% (95% CI: 0.6%, 3.6%) increase in all-cause mortality compared to low exposure (PM2.5 ≤ 8 μg/m3).

    View details for DOI 10.1214/18-AOAS1206

    View details for Web of Science ID 000464000700021

    View details for PubMedID 31649797

    View details for PubMedCentralID PMC6812524

  • Cross-sectional design with a short-term follow up for prognostic imaging biomarkers Computational Statistics & Data Analysis Won, J., Wu, X., Li, S. H., Lu, Y. 2017