Chenxi Sun
Postdoctoral Scholar, Neurology and Neurological Sciences
Current Research and Scholarly Interests
Artificial intelligence for time-series data; clinical EEG foundation models; machine learning for neurophysiology
All Publications
-
A Review of Deep Learning Methods for Irregularly Sampled Medical Time Series Data
Health Data Science
2026
View details for DOI 10.34133/hds.0456
-
An Electrocardiogram Foundation Model Built on over 10 Million Recordings.
NEJM AI
2025; 2 (7)
Abstract
Artificial intelligence (AI) has demonstrated significant potential in electrocardiogram (ECG) analysis and cardiovascular disease assessment. Recently, foundation models have played a remarkable role in advancing medical AI, bringing benefits such as efficient disease diagnosis and cross-domain knowledge transfer. The development of an ECG foundation model holds the promise of elevating AI-ECG research to new heights. However, building such a model poses several challenges, including insufficient database sample sizes and inadequate generalization across multiple domains. In addition, there is a notable performance gap between single-lead and multilead ECG analysis. We propose a general-purpose ECG foundation model (ECGFounder), which leverages real-world ECG annotations from cardiologists to broaden the diagnostic capabilities of ECG analysis. ECGFounder was built on 10,771,552 ECGs from 1,818,247 unique subjects with 150 label categories from the Harvard-Emory ECG Database, enabling comprehensive cardiovascular disease diagnosis. The model is designed to be both an effective out-of-the-box solution and easily fine-tunable for downstream tasks, maximizing usability. Importantly, we extended its application to reduced-lead ECGs, particularly single-lead ECGs. ECGFounder is therefore applicable to various downstream tasks in mobile and remote monitoring scenarios. Experimental results demonstrate that ECGFounder achieves expert-level performance on internal validation sets, with area under the receiver operating characteristic curve (AUROC) exceeding 0.95 for 80 diagnoses. It also shows strong classification performance and generalization across various diagnoses on external validation sets. When fine-tuned, ECGFounder outperforms baseline models in demographic analysis, clinical event detection, and cross-modality cardiac rhythm diagnosis, surpassing baseline methods by 3 to 5 points in the AUROC. The ECG foundation model offers an effective solution, allowing it to generalize across a wide range of tasks. By enhancing existing cardiovascular diagnostics and facilitating integration with cloud-based systems, which analyze ECG data uploaded from wearable devices, it significantly contributes to the advancement of the cardiovascular AI community and enables management of cardiac conditions. (Funded by the National Science Foundation and others.)
View details for DOI 10.1056/aioa2401033
View details for PubMedID 40771651
View details for PubMedCentralID PMC12327759
-
Harvard Electroencephalography Database: A comprehensive clinical electroencephalographic resource from four Boston hospitals.
Epilepsia
2025
Abstract
This article presents the Harvard Electroencephalography Database (HEEDB), a large-scale, deidentified, and standardized electroencephalographic (EEG) resource supporting artificial intelligence-driven and reproducible research in epilepsy and broader clinical neuroscience. HEEDB aggregates more than 280 000 EEG recordings from more than 108 000 patients across four Harvard-affiliated hospitals. Data are harmonized using the Brain Imaging Data Structure and hosted on the Brain Data Science Platform. EEG data are linked with clinical notes, International Classification of Diseases, 10th Revision codes, medications, and EEG reports. Deidentification follows Health Insurance Portability and Accountability Act Safe Harbor standards. The database includes routine, epilepsy monitoring unit, and intensive care unit EEGs across all age groups, with 73% linked to deidentified clinical reports and 96% of those matched to recordings. Findings are extracted using expert curation, regular expressions, and medical natural language processing models. Auxiliary data include diagnoses, medications, and hospital course, supporting multimodal analysis. HEEDB fills a critical gap in EEG data availability for epilepsy research. By enabling large-scale, privacy-compliant, and clinically relevant analysis, it accelerates the development of diagnostic tools, improves training datasets for machine learning, and promotes data-sharing in alignment with FAIR (Findable, Accessible, Interoperable, Reusable) and National Institutes of Health data policies.
View details for DOI 10.1111/epi.18487
View details for PubMedID 40464151
-
Expert-Level Detection of Epilepsy Markers in EEG on Short and Long Timescales
The New England Journal of Medicine AI
2025
View details for DOI 10.1056/AIoa2401221
-
A Ranking-Based Cross-Entropy Loss for Early Classification of Time Series.
IEEE transactions on neural networks and learning systems
2024; 35 (8): 11194-11203
Abstract
Early classification tasks aim to classify time series before observing full data. It is critical in time-sensitive applications such as early sepsis diagnosis in the intensive care unit (ICU). Early diagnosis can provide more opportunities for doctors to rescue lives. However, there are two conflicting goals in the early classification task: accuracy and earliness. Most existing methods try to find a balance between them by weighing one goal against the other. But we argue that a powerful early classifier should always make highly accurate predictions at any moment. The main obstacle is that the key features suitable for classification are not obvious in the early stage, resulting in the excessive overlap of time series distributions in different time stages. The indistinguishable distributions make it difficult for classifiers to recognize. To solve this problem, this article proposes a novel ranking-based cross-entropy (RCE) loss to jointly learn the feature of classes and the order of earliness from time series data. In this way, RCE can help classifier to generate probability distributions of time series in different stages with more distinguishable boundary. Thus, the classification accuracy at each time step is finally improved. Besides, for the applicability of the method, we also accelerate the training process by focusing the learning process on high-ranking samples. Experiments on three real-world datasets show that our method can perform classification more accurately than all baselines at all moments.
View details for DOI 10.1109/TNNLS.2023.3250203
View details for PubMedID 37028352
-
Curriculum Design Helps Spiking Neural Networks to Classify Time Series
arXiv
2024
-
Review of Data-centric Time Series Analysis from Sample, Feature, and Period
arXiv
2024
-
TEST: Text prototype aligned embedding to activate LLM's ability for time series
The Twelfth International Conference on Learning Representations (ICLR 2024)
2024: 28
-
Time pattern reconstruction for classification of irregularly sampled time series
PATTERN RECOGNITION
2024; 147
View details for DOI 10.1016/j.patcog.2023.110075
View details for Web of Science ID 001105411400001
-
A multi-model architecture based on deep learning for aircraft load prediction
COMMUNICATIONS ENGINEERING
2023; 2 (1)
View details for DOI 10.1038/s44172-023-00100-4
View details for Web of Science ID 001478269300001
-
Estimating causal effects of physical disability and number of comorbid chronic diseases on risk of depressive symptoms in an elderly Chinese population: a machine learning analysis of cross-sectional baseline data from the China longitudinal ageing social survey.
BMJ open
2023; 13 (7): e069298
Abstract
This study aimed to explore the causal effects of physical disability and number of comorbid chronic diseases on depressive symptoms in an elderly Chinese population. Cross-sectional, baseline data were obtained from the China Longitudinal Ageing Social Survey, a stratified, multistage, probabilistic sampling survey conducted in 2014 that covers 28 of 31 provincial areas in China. The causal effects of physical disability and number of comorbid chronic diseases on depressive symptoms were analysed using the conditional average treatment effect method of machine learning. The causal effects model's adjustment was made for age, gender, residence, marital status, educational level, ethnicity, wealth quantile and other factors. Assessment of the causal effects of physical disability and number of comorbid chronic diseases on depressive symptoms. 7496 subjects who were 60 years of age or older and who answered the questions on depressive symptoms and other independent variables of interest in a survey conducted in 2014 were included in this study. Physical disability and number of comorbid chronic diseases had causal effects on depressive symptoms. Among the subjects who had one or more functional limitations, the probability of depressive symptoms increased by 22% (95% CI 19% to 24%). For the subjects who had one chronic disease and those who had two or more chronic diseases, the possibility of depressive symptoms increased by 13% (95% CI 10% to 15%) and 20% (95% CI 18% to 22%), respectively. This study provides evidence that the presence of one or more functional limitations affects the occurrence of depressive symptoms among elderly people. The findings of our study are of value in developing programmes that are designed to identify elderly individuals who have physical disabilities or comorbid chronic diseases to provide early intervention.
View details for DOI 10.1136/bmjopen-2022-069298
View details for PubMedID 37407052
View details for PubMedCentralID PMC10335586
-
SPL-LDP: a label distribution propagation method for semi-supervised partial label learning
APPLIED INTELLIGENCE
2023; 53 (18): 20785-20796
View details for DOI 10.1007/s10489-023-04548-x
View details for Web of Science ID 000971606000001
-
Adaptive model training strategy for continuous classification of time series.
Applied intelligence (Dordrecht, Netherlands)
2023: 1-19
Abstract
The classification of time series is essential in many real-world applications like healthcare. The class of a time series is usually labeled at the final time, but more and more time-sensitive applications require classifying time series continuously. For example, the outcome of a critical patient is only determined at the end, but he should be diagnosed at all times for timely treatment. For this demand, we propose a new concept, Continuous Classification of Time Series (CCTS). Different from the existing single-shot classification, the key of CCTS is to model multiple distributions simultaneously due to the dynamic evolution of time series. But the deep learning model will encounter intertwined problems of catastrophic forgetting and over-fitting when learning multi-distribution. In this work, we found that the well-designed distribution division and replay strategies in the model training process can help to solve the problems. We propose a novel Adaptive model training strategy for CCTS (ACCTS). Its adaptability represents two aspects: (1) Adaptive multi-distribution extraction policy. Instead of the fixed rules and the prior knowledge, ACCTS extracts data distributions adaptive to the time series evolution and the model change; (2) Adaptive importance-based replay policy. Instead of reviewing all old distributions, ACCTS only replays important samples adaptive to their contribution to the model. Experiments on four real-world datasets show that our method outperforms all baselines.
View details for DOI 10.1007/s10489-022-04433-z
View details for PubMedID 36819946
View details for PubMedCentralID PMC9922045
-
Continuous diagnosis and prognosis by controlling the update process of deep neural networks.
Patterns (New York, N.Y.)
2023; 4 (2): 100687
Abstract
Continuous diagnosis and prognosis are essential for critical patients. They can provide more opportunities for timely treatment and rational allocation. Although deep-learning techniques have demonstrated superiority in many medical tasks, they frequently forget, overfit, and produce results too late when performing continuous diagnosis and prognosis. In this work, we summarize the four requirements; propose a concept, continuous classification of time series (CCTS); and design a training method for deep learning, restricted update strategy (RU). The RU outperforms all baselines and achieves average accuracies of 90%, 97%, and 85% on continuous sepsis prognosis, COVID-19 mortality prediction, and eight disease classifications, respectively. The RU can also endow deep learning with interpretability, exploring disease mechanisms through staging and biomarker discovery. We find four sepsis stages, three COVID-19 stages, and their respective biomarkers. Further, our approach is data and model agnostic. It can be applied to other diseases and even in other fields.
View details for DOI 10.1016/j.patter.2023.100687
View details for PubMedID 36873902
View details for PubMedCentralID PMC9982300
-
Curricular and Cyclical Loss for Time Series Learning Strategy
arXiv
2023
-
A systematic review of deep learning methods for modeling electrocardiograms during sleep.
Physiological measurement
2022; 43 (8)
Abstract
Sleep is one of the most important human physiological activities, and plays an essential role in human health. Polysomnography (PSG) is the gold standard for measuring sleep quality and disorders, but it is time-consuming, labor-intensive, and prone to errors. Current research has confirmed the correlations between sleep and the respiratory/circulatory system. Electrocardiography (ECG) is convenient to perform, and ECG data are rich in breathing information. Therefore, sleep research based on ECG data has become popular. Currently, deep learning (DL) methods have achieved promising results on predictive health care tasks using ECG signals. Therefore, in this review, we systematically identify recent research studies and analyze them from the perspectives of data, model, and task. We discuss the shortcomings, summarize the findings, and highlight the potential opportunities. For sleep-related tasks, many ECG-based DL methods produce more accurate results than traditional approaches by combining multiple signal features and model structures. Methods that are more interpretable, scalable, and transferable will become ubiquitous in the daily practice of medicine and ambient-assisted-living applications. This paper is the first systematic review of ECG-based DL methods for sleep tasks.
View details for DOI 10.1088/1361-6579/ac826e
View details for PubMedID 35853448
-
DLSA: Semi-supervised partial label learning via dependence-maximized label set assignment
INFORMATION SCIENCES
2022; 609: 1169-1180
View details for DOI 10.1016/j.ins.2022.07.114
View details for Web of Science ID 000848146300013
-
Hypergraph Contrastive Learning for Electronic Health Records
edited by Banerjee, A., Zhou, Z. H., Papalexakis, E. E., Riondato, M.
SIAM. 2022: 127-135
View details for Web of Science ID 001281343300007
-
Deep Ordinal Neural Network for Length of Stay Estimation in the Intensive Care Units
ASSOC COMPUTING MACHINERY. 2022: 3843-3847
View details for DOI 10.1145/3511808.3557578
View details for Web of Science ID 001074639603084
-
Confidence-Guided Learning Process for Continuous Classification of Time Series
ASSOC COMPUTING MACHINERY. 2022: 4525-4529
View details for DOI 10.1145/3511808.3557565
View details for Web of Science ID 001074639604111
-
GRP-FED: Addressing Client Imbalance in Federated Learning via Global-Regularized Personalization
Proceedings of the 2022 SIAM International Conference on Data Mining (SDM 2022)
2022
View details for DOI 10.1137/1.9781611977172.51
-
Hypergraph Contrastive Learning for Electronic Health Records
Proceedings of the 2022 SIAM International Conference on Data Mining (SDM 2022)
2022
View details for DOI 10.1137/1.9781611977172.15
-
Hypergraph Structure Learning for Hypergraph Neural Networks
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence (IJCAI 2022)
2022
View details for DOI 10.24963/ijcai.2022/267
-
A Systematic Review of Echo State Networks From Design to Application
IEEE Transactions on Artificial Intelligence
2022
View details for DOI 10.1109/TAI.2022.3225780
-
Hypergraph Structure Learning for Hypergraph Neural Networks
edited by DeRaedt, L.
IJCAI-INT JOINT CONF ARTIF INTELL. 2022: 1923-1929
View details for Web of Science ID 001202342302008
-
GRP-FED: Addressing Client Imbalance in Federated Learning via Global-Regularized Personalization
edited by Banerjee, A., Zhou, Z. H., Papalexakis, E. E., Riondato, M.
SIAM. 2022: 451-458
View details for Web of Science ID 001281343300047
-
Classifying vaguely labeled data based on evidential fusion
INFORMATION SCIENCES
2022; 583: 159-173
View details for DOI 10.1016/j.ins.2021.11.005
View details for Web of Science ID 000727727800007
-
Interpretable time-aware and co-occurrence-aware network for medical prediction.
BMC medical informatics and decision making
2021; 21 (1): 305
Abstract
Disease prediction based on electronic health records (EHRs) is essential for personalized healthcare. But it's hard due to the special data structure and the interpretability requirement of methods. The structure of EHR is hierarchical: each patient has a sequence of admissions, and each admission has some co-occurrence diagnoses. However, the existing methods only partially model these characteristics and lack the interpretation for non-specialists. This work proposes a time-aware and co-occurrence-aware deep learning network (TCoN), which is not only suitable for EHR data structure but also interpretable: the co-occurrence-aware self-attention (CS-attention) mechanism and time-aware gated recurrent unit (T-GRU) can model multilevel relations; the interpretation path and the diagnosis graph can make the result interpretable. The method is tested on a real-world dataset for mortality prediction, readmission prediction, disease prediction, and next diagnoses prediction. Experimental results show that TCoN is better than baselines with 2.01% higher accuracy. Meanwhile, the method can give the interpretation of causal relationships and the diagnosis graph of each patient. This work proposes a novel model, TCoN. It is an interpretable and effective deep learning method, that can model the hierarchical medical structure and predict medical events. The experiments show that it outperforms all state-of-the-art methods. Future work can apply the graph embedding technology based on more knowledge data such as doctor notes.
View details for DOI 10.1186/s12911-021-01662-z
View details for PubMedID 34727940
View details for PubMedCentralID PMC8561378
-
Personalized vital signs control based on continuous action-space reinforcement learning with supervised experience
BIOMEDICAL SIGNAL PROCESSING AND CONTROL
2021; 69
View details for DOI 10.1016/j.bspc.2021.102847
View details for Web of Science ID 000685910600005
-
Predicting COVID-19 disease progression and patient outcomes based on temporal deep learning.
BMC medical informatics and decision making
2021; 21 (1): 45
Abstract
The coronavirus disease 2019 (COVID-19) pandemic has caused health concerns worldwide since December 2019. From the beginning of infection, patients will progress through different symptom stages, such as fever, dyspnea or even death. Identifying disease progression and predicting patient outcome at an early stage helps target treatment and resource allocation. However, there is no clear COVID-19 stage definition, and few studies have addressed characterizing COVID-19 progression, making the need for this study evident. We proposed a temporal deep learning method, based on a time-aware long short-term memory (T-LSTM) neural network and used an online open dataset, including blood samples of 485 patients from Wuhan, China, to train the model. Our method can grasp the dynamic relations in irregularly sampled time series, which is ignored by existing works. Specifically, our method predicted the outcome of COVID-19 patients by considering both the biomarkers and the irregular time intervals. Then, we used the patient representations, extracted from T-LSTM units, to subtype the patient stages and describe the disease progression of COVID-19. Using our method, the accuracy of the outcome of prediction results was more than 90% at 12 days and 98, 95 and 93% at 3, 6, and 9 days, respectively. Most importantly, we found 4 stages of COVID-19 progression with different patient statuses and mortality risks. We ranked 40 biomarkers related to disease and gave the reference values of them for each stage. Top 5 is Lymph, LDH, hs-CRP, Indirect Bilirubin, Creatinine. Besides, we have found 3 complications - myocardial injury, liver function injury and renal function injury. Predicting which of the 4 stages the patient is currently in can help doctors better assess and cure the patient. To combat the COVID-19 epidemic, this paper aims to help clinicians better assess and treat infected patients, provide relevant researchers with potential disease progression patterns, and enable more effective use of medical resources. Our method predicted patient outcomes with high accuracy and identified a four-stage disease progression. We hope that the obtained results and patterns will aid in fighting the disease.
View details for DOI 10.1186/s12911-020-01359-9
View details for PubMedID 33557818
View details for PubMedCentralID PMC7869774
-
Practical Lessons on 12-Lead ECG Classification: Meta-Analysis of Methods From PhysioNet/Computing in Cardiology Challenge 2020.
Frontiers in physiology
2021; 12: 811661
Abstract
Cardiovascular diseases (CVDs) are one of the most fatal disease groups worldwide. Electrocardiogram (ECG) is a widely used tool for automatically detecting cardiac abnormalities, thereby helping to control and manage CVDs. To encourage more multidisciplinary researches, PhysioNet/Computing in Cardiology Challenge 2020 (Challenge 2020) provided a public platform involving multi-center databases and automatic evaluations for ECG classification tasks. As a result, 41 teams successfully submitted their solutions and were qualified for rankings. Although Challenge 2020 was a success, there has been no in-depth methodological meta-analysis of these solutions, making it difficult for researchers to benefit from the solutions and results. In this study, we aim to systematically review the 41 solutions in terms of data processing, feature engineering, model architecture, and training strategy. For each perspective, we visualize and statistically analyze the effectiveness of the common techniques, and discuss the methodological advantages and disadvantages. Finally, we summarize five practical lessons based on the aforementioned analysis: (1) Data augmentation should be employed and adapted to specific scenarios; (2) Combining different features can improve performance; (3) A hybrid design of different types of deep neural networks (DNNs) is better than using a single type; (4) The use of end-to-end architectures should depend on the task being solved; (5) Multiple models are better than one. We expect that our meta-analysis will help accelerate the research related to ECG classification based on machine-learning models.
View details for DOI 10.3389/fphys.2021.811661
View details for PubMedID 35095568
View details for PubMedCentralID PMC8795785
-
TE-ESN: Time Encoding Echo State Network for Prediction Based on Irregularly Sampled Time Series Data
edited by Zhou, Z. H.
IJCAI-INT JOINT CONF ARTIF INTELL. 2021: 3010-3016
View details for Web of Science ID 001202335503012
-
TE-ESN: Time Encoding Echo State Network for Prediction Based on Irregularly Sampled Time Series Data
Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI 2021)
2021
View details for DOI 10.24963/ijcai.2021/414
https://orcid.org/0000-0002-1762-0877