Feng Xie is currently a postdoctoral scholar at Stanford University School of Medicine. He recently graduated with a joint Ph.D. from Duke University and the National University of Singapore, and he obtained his bachelor’s degree from Tsinghua University, Beijing, China, in 2017. During his Ph.D. studies, he applied interpretable machine learning tools in acute and emergency care settings and published six first-author research papers in high-impact journals. Notably, he developed AutoScore, a novel informatics framework that automatically generates interpretable clinical scores from electronic health records. This open-source R package has been used by local and international researchers and is downloaded about 400 times per month from CRAN, and the first AutoScore paper, published in 2020, has garnered around 40 citations. His research interests include machine learning, clinical informatics and decision-making, predictive models, electronic health records, and risk stratification in acute care settings.
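The basic idea of a point-based clinical score, such as those produced by AutoScore's score derivation module (named in the JMIR Medical Informatics paper listed under All Publications), can be illustrated with a minimal Python sketch. It assumes a logistic regression has already been fitted on categorized variables, shifts each variable so that its lowest-risk category is worth 0 points, divides by the smallest positive coefficient, and rounds to integers. The variable names, categories, coefficients, and the functions `derive_points` and `total_score` are all hypothetical; this is a simplified illustration, not the AutoScore package's API or its exact algorithm.

```python
# Hypothetical sketch of turning logistic-regression coefficients for
# categorized variables into integer points. All variable names,
# categories, and coefficient values are invented for illustration.


def derive_points(coefs):
    """Map {variable: {category: coefficient}} to integer points per category."""
    # Shift each variable so that its lowest coefficient (lowest-risk
    # category) becomes 0, making every shifted value non-negative.
    shifted = {
        var: {cat: b - min(cats.values()) for cat, b in cats.items()}
        for var, cats in coefs.items()
    }
    # Treat the smallest positive shifted coefficient as one point.
    unit = min(b for cats in shifted.values() for b in cats.values() if b > 0)
    # Scale every coefficient by that unit and round to the nearest integer.
    return {
        var: {cat: round(b / unit) for cat, b in cats.items()}
        for var, cats in shifted.items()
    }


def total_score(points, patient):
    """Sum the points of the category each variable takes for one patient."""
    return sum(points[var][cat] for var, cat in patient.items())


# Invented log-odds coefficients (reference category = 0.0).
coefs = {
    "age": {"<50": 0.0, "50-75": 0.40, ">75": 0.95},
    "pulse": {"<100": 0.0, ">=100": 0.30},
    "creatinine": {"normal": 0.0, "high": 0.62},
}

points = derive_points(coefs)
print(points)
print(total_score(points, {"age": ">75", "pulse": ">=100", "creatinine": "high"}))
```

With these toy coefficients, one point corresponds to a log-odds increase of 0.30, so a patient aged over 75 with a pulse of at least 100 and high creatinine totals 3 + 1 + 2 = 6 points.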

Professional Education

  • Doctor of Philosophy, National University of Singapore / Duke University (2022)
  • Bachelor of Science, Tsinghua University (2017)

Research Interests

  • Data Sciences
  • Research Methods

All Publications

  • Development and validation of an interpretable clinical score for early identification of acute kidney injury at the emergency department SCIENTIFIC REPORTS Ang, Y., Li, S., Ong, M., Xie, F., Teo, S., Choong, L., Koniman, R., Chakraborty, B., Ho, A., Liu, N. 2022; 12 (1): 7111


    Acute kidney injury (AKI) in hospitalised patients is a common syndrome associated with poorer patient outcomes. Clinical risk scores can be used for the early identification of patients at risk of AKI. We conducted a retrospective study using electronic health records of Singapore General Hospital emergency department patients who were admitted from 2008 to 2016. The primary outcome was inpatient AKI of any stage within 7 days of admission, based on the Kidney Disease: Improving Global Outcomes (KDIGO) 2012 guidelines. The machine learning-based framework AutoScore was used to generate clinical scores from the study sample, which was randomly divided into training, validation and testing cohorts. Model performance was evaluated using the area under the curve (AUC). Among the 119,468 admissions, 10,693 (9.0%) developed AKI: 8491 (79.4%) were stage 1, 906 (8.5%) stage 2 and 1296 (12.1%) stage 3. The AKI Risk Score (AKI-RiSc) was the sum of the integer scores of 6 variables: serum creatinine, serum bicarbonate, pulse, systolic blood pressure, diastolic blood pressure, and age. The AUC of AKI-RiSc was 0.730 (95% CI 0.714-0.747), outperforming an existing AKI Prediction Score model, which achieved an AUC of 0.665 (95% CI 0.646-0.679) on the testing cohort. At a cut-off of 4 points, AKI-RiSc had a sensitivity of 82.6% and a specificity of 46.7%. AKI-RiSc is a simple clinical score that can be easily implemented in practice for early identification of AKI and could potentially be applied in international settings.

    View details for DOI 10.1038/s41598-022-11129-4

    View details for Web of Science ID 000789854100016

    View details for PubMedID 35501411

    View details for PubMedCentralID PMC9061747

  • AutoScore-Imbalance: An interpretable machine learning tool for development of clinical scores with rare events data JOURNAL OF BIOMEDICAL INFORMATICS Yuan, H., Xie, F., Ong, M., Ning, Y., Chee, M., Saffari, S., Abdullah, H., Goldstein, B., Chakraborty, B., Liu, N. 2022; 129: 104072


    Medical decision-making impacts both individual and public health. Clinical scores are commonly used among various decision-making models to determine the degree of disease deterioration at the bedside. AutoScore was proposed as a useful clinical score generator based on machine learning and a generalized linear model. However, its current framework still leaves room for improvement when handling unbalanced data on rare events.

    Using machine intelligence approaches, we developed AutoScore-Imbalance, which comprises three components: training dataset optimization, sample weight optimization, and adjusted AutoScore. Baseline techniques for performance comparison included the original AutoScore, full logistic regression, stepwise logistic regression, least absolute shrinkage and selection operator (LASSO), full random forest, and random forest with a reduced number of variables. These models were evaluated using the area under the receiver operating characteristic curve (AUC) and balanced accuracy (i.e., the mean of sensitivity and specificity). Using a publicly accessible dataset from Beth Israel Deaconess Medical Center, we assessed the proposed model and the baseline approaches for predicting inpatient mortality.

    AutoScore-Imbalance outperformed the baselines in terms of both AUC and balanced accuracy. The nine-variable AutoScore-Imbalance sub-model achieved the highest AUC of 0.786 (0.732-0.839), while the eleven-variable original AutoScore obtained an AUC of 0.723 (0.663-0.783) and logistic regression with 21 variables obtained an AUC of 0.743 (0.685-0.801). The AutoScore-Imbalance sub-model using a down-sampling algorithm yielded an AUC of 0.771 (0.718-0.823) with only five variables, demonstrating a good balance between performance and variable sparsity. Furthermore, AutoScore-Imbalance obtained the highest balanced accuracy of 0.757 (0.702-0.805), compared with 0.698 (0.643-0.753) for the original AutoScore and a maximum of 0.720 (0.664-0.769) among the other baseline models.

    We have developed an interpretable tool to handle clinical data imbalance, presented its structure, and demonstrated its superiority over the baselines. AutoScore-Imbalance can be applied to highly unbalanced datasets to gain further insight into rare medical events and to facilitate real-world clinical decision-making.

    View details for DOI 10.1016/j.jbi.2022.104072

    View details for Web of Science ID 000794840600004

    View details for PubMedID 35421602

  • Development and validation of an interpretable machine learning scoring tool for estimating time to emergency readmissions ECLINICALMEDICINE Xie, F., Liu, N., Yan, L., Ning, Y., Lim, K., Gong, C., Kwan, Y., Ho, A., Low, L., Chakraborty, B., Ong, M. 2022; 45: 101315


    Emergency readmission poses an additional burden on both patients and healthcare systems. Risk stratification is the first step of transitional care interventions aimed at reducing readmission. To accurately predict the short- and intermediate-term risks of readmission and to provide information for further temporal risk stratification, we developed and validated an interpretable machine learning risk scoring system.

    In this retrospective study, all emergency admission episodes from January 1st 2009 to December 31st 2016 at a tertiary hospital in Singapore were assessed. The primary outcome was time to emergency readmission within 90 days after discharge. The Score for Emergency ReAdmission Prediction (SERAP) tool was derived via an interpretable machine learning-based system for time-to-event outcomes. SERAP is a six-variable survival score that takes into account the number of emergency admissions in the previous year, age, history of malignancy, history of renal diseases, and serum creatinine and serum albumin levels during the index admission.

    A total of 293,589 ED admission episodes were included in the whole cohort: 203,748 in the training cohort, 50,937 in the validation cohort, and 38,904 in the testing cohort. Readmission within 90 days was documented in 80,213 (27.3%) episodes, with a median time to emergency readmission of 22 days (interquartile range: 8-47). In the whole cohort, the observed readmission rates were 6.7% at 7 days, 10.6% at 14 days, 13.6% at 21 days, 16.4% at 30 days, and 23.0% at 60 days. In the testing cohort, SERAP achieved an integrated area under the curve of 0.737 (95% confidence interval: 0.730-0.743). For 30-day readmission prediction specifically, SERAP outperformed the LACE index (Length of stay, Acuity of admission, Charlson comorbidity index, and Emergency department visits in past six months) and the HOSPITAL score (Hemoglobin at discharge, discharge from an Oncology service, Sodium level at discharge, Procedure during the index admission, Index Type of admission, number of Admissions during the last 12 months, and Length of stay). Beyond 30-day readmission, SERAP can predict readmission rates at any time point during the 90-day period.

    SERAP achieved better risk prediction performance than existing scores and generated accurate information about time to emergency readmission for further temporal risk stratification and clinical decision-making. External validation studies are needed to evaluate SERAP in different settings and assess its real-world performance.

    This study was supported by the Singapore National Medical Research Council under the PULSES Center Grant, and Duke-NUS Medical School.

    View details for DOI 10.1016/j.eclinm.2022.101315

    View details for Web of Science ID 000823395500019

    View details for PubMedID 35284804

    View details for PubMedCentralID PMC8904223

  • Leveraging Large-Scale Electronic Health Records and Interpretable Machine Learning for Clinical Decision Making at the Emergency Department: Protocol for System Development and Validation JMIR RESEARCH PROTOCOLS Liu, N., Xie, F., Siddiqui, F., Ho, A., Chakraborty, B., Nadarajan, G., Tan, K., Ong, M. 2022; 11 (3): e34201


    There is a growing global demand for emergency department (ED) services. An increase in ED visits has resulted in overcrowding and longer waiting times. The triage process plays a crucial role in assessing and stratifying patients' risks and ensuring that critically ill patients promptly receive appropriate priority and emergency treatment. A substantial amount of research has been conducted on the use of machine learning tools to construct triage and risk prediction models; however, the black-box nature of these models has limited their clinical application and interpretation.

    In this study, we plan to develop an innovative, dynamic, and interpretable System for Emergency Risk Triage (SERT) for risk stratification in the ED by leveraging large-scale electronic health records (EHRs) and machine learning.

    To achieve this objective, we will conduct a retrospective, single-center study based on a large, longitudinal data set obtained from the EHRs of the largest tertiary hospital in Singapore. Study outcomes include adverse events experienced by patients, such as the need for intensive care unit admission and inpatient death. Using candidate variables preidentified from expert opinion and the relevant literature, we will apply the interpretable machine learning-based AutoScore framework to develop 3 SERT scores, intended for use at different times in the ED: on arrival, during the ED stay, and at admission. Furthermore, we will compare the novel SERT scores with established clinical scores and previously described black-box machine learning models as baselines. Receiver operating characteristic analysis will be conducted on the testing cohorts for performance evaluation.

    The study is ongoing. The extracted data indicate approximately 1.8 million ED visits by over 810,000 unique patients. Modelling results are expected to be published in 2022.

    The SERT scoring system proposed in this study will be unique and innovative because of its dynamic nature and modelling transparency. If successfully validated, our proposed solution will establish a standard for data processing and modelling by taking advantage of large-scale EHRs and interpretable machine learning tools.

    International Registered Report Identifier (IRRID): DERR1-10.2196/34201.

    View details for DOI 10.2196/34201

    View details for Web of Science ID 000779979500009

    View details for PubMedID 35333179

  • Deep learning for temporal data representation in electronic health records: A systematic review of challenges and methodologies JOURNAL OF BIOMEDICAL INFORMATICS Xie, F., Yuan, H., Ning, Y., Ong, M., Feng, M., Hsu, W., Chakraborty, B., Liu, N. 2022; 126: 103980


    Temporal electronic health records (EHRs) contain a wealth of information for secondary uses, such as clinical event prediction and chronic disease management. However, temporal data representation poses several challenges. We therefore sought to identify these challenges and evaluate novel methodologies for addressing them through a systematic examination of deep learning solutions.

    We searched five databases (PubMed, Embase, the Institute of Electrical and Electronics Engineers [IEEE] Xplore Digital Library, the Association for Computing Machinery [ACM] Digital Library, and Web of Science), complemented by hand-searching the proceedings of several prestigious computer science conferences. We sought articles published from January 1, 2010, to August 30, 2020, that reported deep learning methodologies for temporal data representation in structured EHR data. We summarized and analyzed the selected articles from three perspectives: nature of time series, methodology, and model implementation.

    We included 98 articles related to temporal data representation using deep learning. Four major challenges were identified: data irregularity, heterogeneity, sparsity, and model opacity. We then studied how deep learning techniques were applied to address these challenges. Finally, we discuss some open challenges arising from deep learning.

    Temporal EHR data present several major challenges for clinical prediction modeling and data utilization. To some extent, current deep learning solutions can address these challenges. Future studies may consider designing comprehensive and integrated solutions. Moreover, researchers should incorporate clinical domain knowledge into study designs and enhance model interpretability to facilitate clinical implementation.

    View details for DOI 10.1016/j.jbi.2021.103980

    View details for Web of Science ID 000767887400004

    View details for PubMedID 34974189

  • AutoScore-Survival: Developing interpretable machine learning-based time-to-event scores with right-censored survival data JOURNAL OF BIOMEDICAL INFORMATICS Xie, F., Ning, Y., Yuan, H., Goldstein, B., Ong, M., Liu, N., Chakraborty, B. 2022; 125: 103959


    Scoring systems are highly interpretable and widely used to evaluate time-to-event outcomes in healthcare research. However, existing time-to-event scores are predominantly created ad hoc from a few variables manually selected based on clinicians' knowledge, suggesting an unmet need for a robust and efficient generic score-generating method.

    AutoScore was previously developed as an interpretable machine learning score generator, integrating the strong discriminability of machine learning with the accessibility of point-based scores. We have further extended it to time-to-event outcomes and developed AutoScore-Survival for generating time-to-event scores with right-censored survival data. Random survival forest provided an efficient solution for selecting variables, and Cox regression was used for score weighting. We implemented our proposed method as an R package. We illustrated our method in a study of 90-day survival prediction for patients in intensive care units and compared its performance with other survival models, the random survival forest, and two traditional clinical scores.

    The AutoScore-Survival-derived scoring system was more parsimonious than survival models built using traditional variable selection methods (e.g., the penalized likelihood approach and stepwise variable selection), and its performance was comparable to that of survival models using the same set of variables. While AutoScore-Survival achieved a comparable integrated area under the curve of 0.782 (95% CI: 0.767-0.794), the integer-valued time-to-event scores it generates are favorable in clinical applications because they are easier to compute and interpret.

    Our proposed AutoScore-Survival provides a robust and easy-to-use machine learning-based clinical score generator for studies of time-to-event outcomes. It gives a systematic guideline to facilitate the future development of time-to-event scores for clinical applications.

    View details for DOI 10.1016/j.jbi.2021.103959

    View details for Web of Science ID 000735573800005

    View details for PubMedID 34826628

  • Development and Assessment of an Interpretable Machine Learning Triage Tool for Estimating Mortality After Emergency Admissions JAMA NETWORK OPEN Xie, F., Ong, M., Liew, J., Tan, K., Ho, A., Nadarajan, G., Low, L., Kwan, Y., Goldstein, B., Matchar, D., Chakraborty, B., Liu, N. 2021; 4 (8): e2118467


    Triage in the emergency department (ED) is a complex clinical judgment based on the tacit understanding of the patient's likelihood of survival, availability of medical resources, and local practices. Although a scoring tool could be valuable in risk stratification, currently available scores have demonstrated limitations.

    The objectives were to develop an interpretable machine learning tool based on a parsimonious list of variables available at ED triage, to provide a simple, early, and accurate estimate of patients' risk of death, and to evaluate the tool's predictive accuracy compared with several established clinical scores.

    This single-site, retrospective cohort study assessed all patients who visited the ED between January 1, 2009, and December 31, 2016, and were subsequently admitted to a tertiary hospital in Singapore. The Score for Emergency Risk Prediction (SERP) tool was derived using a machine learning framework. To estimate mortality outcomes after emergency admissions, SERP was compared with several triage systems, including the Patient Acuity Category Scale, Modified Early Warning Score, National Early Warning Score, Cardiac Arrest Risk Triage, Rapid Acute Physiology Score, and Rapid Emergency Medicine Score. The initial analyses were completed in October 2020, and additional analyses were conducted in May 2021.

    Three SERP scores, namely SERP-2d, SERP-7d, and SERP-30d, were developed using the primary outcomes of interest of 2-, 7-, and 30-day mortality, respectively. Secondary outcomes included 3-day mortality and inpatient mortality. The SERP's predictive power was measured using the area under the curve in the receiver operating characteristic analysis.

    The study included 224 666 ED episodes in the model training cohort (mean [SD] patient age, 63.60 [16.90] years; 113 426 [50.5%] female), 56 167 episodes in the validation cohort (mean [SD] patient age, 63.58 [16.87] years; 28 427 [50.6%] female), and 42 676 episodes in the testing cohort (mean [SD] patient age, 64.85 [16.80] years; 21 556 [50.5%] female). The mortality rates in the training cohort were 0.8% at 2 days, 2.2% at 7 days, and 5.9% at 30 days. In the testing cohort, SERP-30d achieved areas under the curve of 0.821 (95% CI, 0.796-0.847) for 2-day mortality, 0.826 (95% CI, 0.811-0.841) for 7-day mortality, and 0.823 (95% CI, 0.814-0.832) for 30-day mortality, outperforming several benchmark scores.

    In this retrospective cohort study, SERP had better prediction performance than existing triage scores while remaining easy to implement and ascertain in the ED. It has the potential to be widely applied and validated in different circumstances and health care settings.

    View details for DOI 10.1001/jamanetworkopen.2021.18467

    View details for Web of Science ID 000689731500001

    View details for PubMedID 34448870

    View details for PubMedCentralID PMC8397930

  • AutoScore: A Machine Learning-Based Automatic Clinical Score Generator and Its Application to Mortality Prediction Using Electronic Health Records JMIR MEDICAL INFORMATICS Xie, F., Chakraborty, B., Ong, M., Goldstein, B., Liu, N. 2020; 8 (10): e21798


    Risk scores can be useful in clinical risk stratification and accurate allocation of medical resources, helping health providers improve patient care. Point-based scores are more understandable and explainable than other complex models and are now widely used in clinical decision making. However, developing risk scoring models is nontrivial and has not yet been systematically presented, with few studies investigating methods of clinical score generation using electronic health records.

    This study aimed to propose AutoScore, a machine learning-based automatic clinical score generator consisting of 6 modules for developing interpretable point-based scores. Future users can employ the AutoScore framework to create clinical scores effortlessly in various clinical applications.

    The proposed AutoScore framework comprises 6 modules: variable ranking, variable transformation, score derivation, model selection, score fine-tuning, and model evaluation. To demonstrate the performance of AutoScore, we used data from the Beth Israel Deaconess Medical Center to build a scoring model for mortality prediction and then compared it with other baseline models using receiver operating characteristic analysis. A software package in R 3.5.3 (R Foundation) was also developed to demonstrate the implementation of AutoScore.

    Applied to a data set of 44,918 individual intensive care admission episodes, the AutoScore-created scoring models performed comparably to other standard methods (ie, logistic regression, stepwise regression, least absolute shrinkage and selection operator, and random forest) in terms of predictive accuracy and model calibration, but required fewer predictors and offered high interpretability and accessibility. The nine-variable, AutoScore-created, point-based scoring model achieved an area under the curve (AUC) of 0.780 (95% CI 0.764-0.798), whereas the logistic regression model with 24 variables had an AUC of 0.778 (95% CI 0.760-0.795). Moreover, by integrating all necessary modules, the AutoScore framework supports automation across the clinical research continuum.

    We developed an easy-to-use, machine learning-based automatic clinical score generator, AutoScore; systematically presented its structure; and demonstrated its superiority (predictive performance and interpretability) over other conventional methods using a benchmark database. AutoScore has the potential to serve as a scoring tool in various medical applications.

    View details for DOI 10.2196/21798

    View details for Web of Science ID 000587474400023

    View details for PubMedID 33084589

    View details for PubMedCentralID PMC7641783

  • Heart rate n-variability (HRnV) and its application to risk stratification of chest pain patients in the emergency department BMC CARDIOVASCULAR DISORDERS Liu, N., Guo, D., Koh, Z. X., Ho, A. F., Xie, F., Tagami, T., Sakamoto, J. T., Pek, P. P., Chakraborty, B., Lim, S. H., Tan, J. W., Ong, M. E. 2020; 20 (1): 168


    Chest pain is one of the most common complaints among patients presenting to the emergency department (ED). Causes of chest pain can be benign or life-threatening, making accurate risk stratification a critical issue in the ED. In addition to the use of established clinical scores, prior studies have attempted to create predictive models with heart rate variability (HRV). In this study, we proposed heart rate n-variability (HRnV), an alternative representation of beat-to-beat variation in the electrocardiogram (ECG), and investigated its association with major adverse cardiac events (MACE) in ED patients with chest pain.

    We conducted a retrospective analysis of data collected from the ED of a tertiary hospital in Singapore between September 2010 and July 2015. Patients older than 20 years who presented to the ED with a chief complaint of chest pain were recruited by convenience sampling. Five- to six-minute single-lead ECGs, demographics, medical history, troponin, and other required variables were collected. We developed the HRnV-Calc software to calculate HRnV parameters. The primary outcome was 30-day MACE, which included all-cause death, acute myocardial infarction, and revascularization. Univariable and multivariable logistic regression analyses were conducted to investigate the association between individual risk factors and the outcome. Receiver operating characteristic (ROC) analysis was performed to compare the HRnV model (based on leave-one-out cross-validation) against other clinical scores in predicting 30-day MACE.

    A total of 795 patients were included in the analysis, of whom 247 (31%) had MACE within 30 days. The MACE group was older and had a higher proportion of male patients. Twenty-one conventional HRV and 115 HRnV parameters were calculated. In univariable analysis, 11 HRV and 48 HRnV parameters were significantly associated with 30-day MACE. Multivariable stepwise logistic regression identified 16 predictors strongly associated with MACE, comprising one HRV parameter, seven HRnV parameters, troponin, ST segment changes, and several other factors. The HRnV model outperformed several clinical scores in the ROC analysis.

    The novel HRnV representation demonstrated its value in augmenting HRV and traditional risk factors for designing a robust risk stratification tool for patients with chest pain in the ED.

    View details for DOI 10.1186/s12872-020-01455-8

    View details for PubMedID 32276602

  • Novel model for predicting inpatient mortality after emergency admission to hospital in Singapore: retrospective observational study BMJ OPEN Xie, F., Liu, N., Wu, S., Ang, Y., Low, L., Ho, A., Lam, S., Matchar, D., Ong, M., Chakraborty, B. 2019; 9 (9): e031382


    This study aimed to identify risk factors for inpatient mortality after emergency admission and to create a novel model predicting inpatient mortality risk.

    This was a retrospective observational study using data extracted from electronic health records (EHRs). The data were randomly split into a derivation set and a validation set, and stepwise model selection was employed. We compared our model with an existing clinical score, the Cardiac Arrest Risk Triage (CART) score. The setting was a single tertiary hospital in Singapore, and participants were all adult hospitalised patients admitted via the emergency department (ED) from 1 January 2008 to 31 October 2017 (n=433 187 admission episodes). The primary outcome of interest was inpatient mortality following the admission episode. Performance was measured by the area under the curve (AUC) of the receiver operating characteristic curve, with sensitivity and specificity at optimised cut-offs.

    Inpatient mortality was observed in 15 758 (3.64%) of the episodes. Nineteen variables were significant predictors and were included in our final regression model. Our predictive model outperformed the CART score in terms of predictive power: the AUCs of the CART score and our final model were 0.705 (95% CI 0.697 to 0.714) and 0.817 (95% CI 0.810 to 0.824), respectively.

    We developed and validated a model for inpatient mortality using EHR data collected in the ED. Our model was more accurate than the CART score. If implemented in the hospital, our model could potentially predict imminent adverse events and help institute appropriate clinical management.

    View details for DOI 10.1136/bmjopen-2019-031382

    View details for Web of Science ID 000497787600368

    View details for PubMedID 31558458

    View details for PubMedCentralID PMC6773418
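Several abstracts above report discrimination as an AUC and operating characteristics at a cut-off (for example, AKI-RiSc at 4 points), and the AutoScore-Imbalance abstract defines balanced accuracy as the mean of sensitivity and specificity. The following Python sketch shows one way to compute these metrics for an integer score. The scores and outcomes are invented toy values, the function names are hypothetical, and the AUC uses the pairwise (Mann-Whitney) formulation rather than any code from the papers.

```python
# Hypothetical sketch: evaluating an integer risk score against binary
# outcomes (1 = event, 0 = no event). All data below are invented.


def confusion_at_cutoff(scores, outcomes, cutoff):
    """Classify score >= cutoff as positive; return (tp, fp, tn, fn)."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, outcomes):
        predicted = s >= cutoff
        if predicted and y:
            tp += 1
        elif predicted and not y:
            fp += 1
        elif not predicted and not y:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sens_spec_bal(scores, outcomes, cutoff):
    """Sensitivity, specificity, and balanced accuracy at a cut-off.

    Assumes at least one event and one non-event (else division by zero).
    """
    tp, fp, tn, fn = confusion_at_cutoff(scores, outcomes, cutoff)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Balanced accuracy: the mean of sensitivity and specificity.
    return sensitivity, specificity, (sensitivity + specificity) / 2


def auc(scores, outcomes):
    """ROC AUC as the probability that a random event scores higher than
    a random non-event, counting ties as one half."""
    pos = [s for s, y in zip(scores, outcomes) if y]
    neg = [s for s, y in zip(scores, outcomes) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


scores = [1, 2, 3, 4, 5, 6, 2, 7, 4, 0]
outcomes = [0, 0, 0, 1, 0, 1, 0, 1, 0, 0]

sensitivity, specificity, balanced = sens_spec_bal(scores, outcomes, cutoff=4)
print(f"sensitivity={sensitivity:.3f} specificity={specificity:.3f} balanced={balanced:.3f}")
print(f"AUC={auc(scores, outcomes):.3f}")
```

The pairwise AUC loop is quadratic in the number of patients and is meant only to make the definition concrete; for cohorts of the size reported above, a sort-based or library implementation would be preferable.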