I received my PhD from Dalian University of Technology (China), where I worked in the DUTIR team (information retrieval, natural language processing, and data mining). My research has focused on literature-based discovery: mining new knowledge from the biomedical literature. In the Boussard Lab, my research aims to establish novel strategies for analyzing Electronic Health Records to improve clinical decisions. Specifically, I am working on the Cerebrospinal Fluid (CSF) Leak project, which aims to improve the diagnosis of CSF leaks through artificial intelligence methods.

Stanford Advisors

  • Lei Xing, Postdoctoral Faculty Sponsor

All Publications

  • Learning from Past Respiratory Failure Patients to Triage COVID-19 Patient Ventilator Needs: A Multi-Institutional Study. Journal of biomedical informatics Carmichael, H., Coquet, J., Sun, R., Sang, S., Groat, D., Asch, S. M., Bledsoe, J., Peltan, I. D., Jacobs, J. R., Hernandez-Boussard, T. 2021: 103802


    BACKGROUND: Unlike well-established diseases that base clinical care on randomized trials, past experience, and training, prognosis in COVID-19 relies on a weaker foundation. Knowledge from other respiratory failure diseases may inform clinical decisions in this novel disease. The objective was to predict invasive mechanical ventilation (IMV) within 48 hours in patients hospitalized with COVID-19 using COVID-like diseases (CLD).

    METHODS: This retrospective multicenter study trained machine learning (ML) models on patients hospitalized with CLD to predict IMV within 48 hours in COVID-19 patients. CLD patients were identified using diagnosis codes for bacterial pneumonia, viral pneumonia, influenza, unspecified pneumonia, and acute respiratory distress syndrome (ARDS), 2008-2019. A total of 16 cohorts were constructed, covering the combinations of the four diseases plus an exploratory ARDS cohort, to determine the most appropriate training cohort. Candidate predictors included demographic and clinical parameters previously associated with poor COVID-19 outcomes. Model development included logistic regression and three tree-based algorithms: a decision tree and the ensemble methods AdaBoost and XGBoost. ML models were trained on CLD patients at Stanford Hospital Alliance (SHA) and validated on COVID-19 patients hospitalized at SHA and Intermountain Healthcare, March 2020-July 2020.

    RESULTS: CLD training data were obtained from SHA (n=14,030), and validation data included 444 adult hospitalized COVID-19 patients from SHA (n=185) and Intermountain (n=259). XGBoost was the top-performing ML model; among the 16 CLD training cohorts, the best model achieved an area under the curve (AUC) of 0.883 in the validation set. In COVID-19 patients, the prediction models showed moderate discrimination, with the best models achieving an AUC of 0.77 at SHA and 0.65 at Intermountain. The model trained on all pneumonia and influenza cohorts had the best overall performance (SHA: positive predictive value (PPV) 0.29, negative predictive value (NPV) 0.97, positive likelihood ratio (PLR) 10.7; Intermountain: PPV 0.23, NPV 0.97, PLR 10.3). We identified important factors associated with IMV that are not traditionally considered for respiratory diseases.

    CONCLUSIONS: Prediction models for 48-hour IMV derived from CLD demonstrated high specificity in patients hospitalized with COVID-19 and can be used as a triage tool at the point of care. Novel predictors of IMV identified in COVID-19 are often overlooked in clinical practice. Lessons learned from our approach may assist other research institutes seeking to build artificial intelligence technologies for novel or rare diseases with limited data for training and validation.

    DOI: 10.1016/j.jbi.2021.103802

    PubMed ID: 33965640

  • Learning from past respiratory infections to predict COVID-19 Outcomes: A retrospective study. Journal of medical Internet research Sang, S., Sun, R., Coquet, J., Carmichael, H., Seto, T., Hernandez-Boussard, T. 2021


    In the clinical care of well-established diseases, randomized trials, literature, and research are supplemented by clinical judgment to understand disease prognosis and inform treatment choices. In the void created by a lack of clinical experience with COVID-19, artificial intelligence (AI) may be an important tool to bolster clinical judgment and decision making. However, a lack of clinical data restricts the design and development of such AI tools, particularly in preparation for an impending crisis or pandemic.

    This study aimed to develop and test the feasibility of a "patients-like-me" framework to predict COVID-19 patient deterioration using a retrospective cohort of similar respiratory diseases.

    Our framework used COVID-like cohorts to design and train AI models that were then validated on the COVID-19 population. The COVID-like cohorts included patients diagnosed with bacterial pneumonia, viral pneumonia, unspecified pneumonia, influenza, and acute respiratory distress syndrome (ARDS) at an academic medical center, 2008-2019. Fifteen training cohorts were created from different combinations of the COVID-like cohorts, with the ARDS cohort used for exploratory purposes. Two machine learning (ML) models were developed: one to predict invasive mechanical ventilation (IMV) within 48 hours for each hospitalized day, and one to predict all-cause mortality at the time of admission. Model performance was assessed using the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). We established model interpretability by calculating SHapley Additive exPlanations (SHAP) scores to identify important features.

    Compared with the COVID-like cohorts (n=16,509), the hospitalized COVID-19 patients (n=159) were significantly younger, with a higher proportion of Hispanic ethnicity, a lower proportion with a smoking history, and fewer comorbidities (P<0.001). COVID-19 patients had a lower IMV rate (15.1 vs 23.2, P=0.016) and a shorter time to IMV (2.9 vs 4.1, P<0.001) than the COVID-like patients. In the COVID-like training data, the top models achieved excellent performance (AUC > 0.90). In validation on the COVID-19 cohort, the best-performing model for predicting IMV was the XGBoost model (AUC: 0.826) trained on the viral pneumonia cohort. Similarly, the XGBoost model trained on all four COVID-like cohorts without ARDS achieved the best performance (AUC: 0.928) in predicting mortality. Important predictors included demographic information (age), vital signs (oxygen saturation), and laboratory values (white blood cell count, cardiac troponin, albumin, etc.). Our models suffered from class imbalance, which resulted in high negative predictive values and low positive predictive values.

    We provided a feasible framework for modeling patient deterioration using existing data and AI technology to address data limitations during the onset of a novel, rapidly changing pandemic.

    DOI: 10.2196/23026

    PubMed ID: 33534724
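The first publication above reports 16 training cohorts built from four COVID-like diseases plus an exploratory ARDS cohort. Assuming that count comes from every non-empty combination of the four diseases (2^4 - 1 = 15) plus ARDS on its own, a minimal Python sketch of the enumeration could look like this. The helper name `build_cohort_definitions` is hypothetical, and selecting patients by diagnosis code is not modeled.

```python
from itertools import combinations

# The four COVID-like disease groups named in the first abstract.
DISEASES = ["bacterial pneumonia", "viral pneumonia", "influenza", "unspecified pneumonia"]


def build_cohort_definitions(diseases, exploratory=("ARDS",)):
    """Return every non-empty combination of `diseases`, plus exploratory cohorts.

    Hypothetical helper: each cohort is a frozenset of diagnosis groups.
    Selecting patients by diagnosis code is out of scope for this sketch.
    """
    cohorts = []
    for k in range(1, len(diseases) + 1):
        for combo in combinations(diseases, k):
            cohorts.append(frozenset(combo))
    # Assumption: the exploratory ARDS cohort is used on its own, not combined.
    cohorts.extend(frozenset([name]) for name in exploratory)
    return cohorts


cohorts = build_cohort_definitions(DISEASES)
print(len(cohorts))  # 16 (15 disease combinations + 1 ARDS cohort)
```

Representing each cohort as a frozenset keeps the definitions order-independent, so each combination can be used as a dictionary key when tracking per-cohort model results.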
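Both abstracts above report threshold metrics (PPV, NPV, and, in the first, the positive likelihood ratio) and note that class imbalance produced high NPV and low PPV. The Python sketch below uses hypothetical counts, not data from either study, to show how these metrics follow from a confusion matrix. The `ConfusionCounts` type and `threshold_metrics` function are illustrative names, not part of any published code.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts at one decision threshold (hypothetical helper type)."""
    tp: int  # predicted IMV, patient received IMV
    fp: int  # predicted IMV, patient did not receive IMV
    tn: int  # predicted no IMV, patient did not receive IMV
    fn: int  # predicted no IMV, patient received IMV


def threshold_metrics(c: ConfusionCounts) -> dict:
    """Compute standard threshold-based metrics from confusion-matrix counts."""
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    false_positive_rate = c.fp / (c.fp + c.tn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        # Positive likelihood ratio = sensitivity / (1 - specificity).
        "plr": sensitivity / false_positive_rate,
    }


# Hypothetical scenarios: same sensitivity (0.50) and specificity (0.95),
# but different outcome prevalence (4% vs 50% of 1,000 patients).
rare = threshold_metrics(ConfusionCounts(tp=20, fp=48, tn=912, fn=20))
balanced = threshold_metrics(ConfusionCounts(tp=250, fp=25, tn=475, fn=250))

for name, m in [("4% prevalence", rare), ("50% prevalence", balanced)]:
    print(f"{name}: PPV={m['ppv']:.3f} NPV={m['npv']:.3f} PLR={m['plr']:.1f}")
# 4% prevalence: PPV=0.294 NPV=0.979 PLR=10.0
# 50% prevalence: PPV=0.909 NPV=0.655 PLR=10.0
```

With identical sensitivity and specificity, the rare-outcome scenario yields a high NPV and a low PPV, the pattern both abstracts attribute to class imbalance, while the PLR stays the same because it depends only on sensitivity and specificity, not on prevalence.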