Hejie Cui
Postdoctoral Scholar, Biomedical Informatics
Bio
Dr. Hejie Cui is a postdoctoral researcher at the Stanford Center for Biomedical Informatics Research, Stanford University. Her research focuses on the intersection of machine learning, data mining, and biomedical informatics; at Stanford, she works on large language model (LLM) evaluation and post-training for healthcare. She has authored and co-authored publications in top computer science and interdisciplinary venues, including NeurIPS, KDD, AAAI, CIKM, TMI, and MICCAI. Her work advances the application of artificial intelligence in healthcare and improves the understanding of complex biomedical data. Dr. Cui was selected as a Rising Star in EECS in 2023 and has received numerous other awards, including a CRA-WP Grad Cohort for Women award (2021), a student travel grant for MICCAI'22, an NSF travel grant for CIKM'22, and the AI4Science travel award at NeurIPS'22. She holds a Ph.D. in Computer Science from Emory University (2024) and a B.Eng. in Computer Science and Engineering from Tongji University (2019). During her graduate studies, she gained industry experience through internships at Microsoft Research and Amazon Science.
Honors & Awards
- Rising Star in EECS, EECS Rising Stars Committee (11/2023)
- Laney-EDGE Graduate School Diverse Scholars in the Sciences, Laney Graduate School, Emory University (08/2023)
- Laney Graduate Student Council Research Grant, Emory University (11/2022)
- Award for CRA-WP Grad Cohort for Women, Computing Research Association (04/2021)
- Mitacs Globalink Research Award (GRA), Mitacs Canada (05/2018)
Professional Education
- PhD, Emory University, Computer Science (2024)
- BEng, Tongji University, Computer Science and Engineering (2019)
All Publications
- BrainGB: A Benchmark for Brain Network Analysis With Graph Neural Networks
IEEE Transactions on Medical Imaging
2023; 42 (2): 493-506
Abstract
Mapping the connectome of the human brain using structural or functional connectivity has become one of the most pervasive paradigms for neuroimaging analysis. Recently, Graph Neural Networks (GNNs) motivated by geometric deep learning have attracted broad interest due to their established power for modeling complex networked data. Despite their superior performance in many fields, there has not yet been a systematic study of how to design effective GNNs for brain network analysis. To bridge this gap, we present BrainGB, a benchmark for brain network analysis with GNNs. BrainGB standardizes the process by (1) summarizing brain network construction pipelines for both functional and structural neuroimaging modalities and (2) modularizing the implementation of GNN designs. We conduct extensive experiments on datasets across cohorts and modalities and recommend a set of general recipes for effective GNN designs on brain networks. To support open and reproducible research on GNN-based brain network analysis, we host the BrainGB website at https://braingb.us with models, tutorials, examples, as well as an out-of-the-box Python package. We hope that this work will provide useful empirical evidence and offer insights for future research in this novel and promising direction.
View details for DOI 10.1109/TMI.2022.3218745
View details for Web of Science ID 000934156000015
View details for PubMedID 36318557
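The BrainGB pipeline summarized in the abstract above starts from a per-subject connectivity matrix and feeds it to a modular GNN. The following is a minimal sketch of that preprocessing step only, assuming a correlation-matrix input, using each ROI's connection profile as its node features and a top-k thresholding rule; both choices and the `top_k` value are illustrative, not BrainGB's only options.

```python
import numpy as np

def connectivity_to_graph(corr, top_k=10):
    """Turn an ROI-by-ROI correlation matrix into a sparse graph.

    Node features are each ROI's full connection profile (its row of the
    matrix); edges keep only the top_k strongest absolute correlations
    per node. Illustrative defaults, not a definitive recipe.
    """
    n_roi = corr.shape[0]
    node_features = corr.copy()                     # (n_roi, n_roi) connection profiles
    edges = []
    for i in range(n_roi):
        strength = np.abs(corr[i])
        strength[i] = 0.0                           # ignore self-loops
        neighbors = np.argsort(strength)[-top_k:]   # strongest partners of ROI i
        edges.extend((i, j) for j in neighbors)
    edge_index = np.array(edges).T                  # (2, n_edges), PyG-style layout
    return node_features, edge_index

# Toy usage with a random "fMRI" time series.
rng = np.random.default_rng(0)
ts = rng.standard_normal((200, 100))                # 200 time points, 100 ROIs
corr = np.corrcoef(ts.T)
x, edge_index = connectivity_to_graph(corr, top_k=8)
print(x.shape, edge_index.shape)
```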
- Neighborhood-Regularized Self-Training for Learning with Few Labels
ASSOC ADVANCEMENT ARTIFICIAL INTELLIGENCE. 2023: 10611-10619
Abstract
Training deep neural networks (DNNs) with limited supervision has been a popular research topic as it can significantly alleviate the annotation burden. Self-training has been successfully applied in semi-supervised learning tasks, but one drawback of self-training is that it is vulnerable to label noise from incorrect pseudo labels. Inspired by the fact that samples with similar labels tend to share similar representations, we develop a neighborhood-based sample selection approach to tackle the issue of noisy pseudo labels. We further stabilize self-training by aggregating the predictions from different rounds during sample selection. Experiments on eight tasks show that our proposed method outperforms the strongest self-training baseline with average performance gains of 1.83% and 2.51% on text and graph datasets, respectively. Our further analysis demonstrates that our proposed data selection strategy reduces the noise of pseudo labels by 36.8% and saves 57.3% of the time when compared with the best baseline. Our code and appendices will be uploaded to https://github.com/ritaranx/NeST.
View details for Web of Science ID 001243747800035
View details for PubMedID 38333625
View details for PubMedCentralID PMC10851329
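A rough sketch of the neighborhood-based selection idea from the abstract above: keep a pseudo-labeled sample only if most of its nearest neighbors in embedding space carry the same pseudo label. The scoring rule, `k`, and the `agreement` threshold are assumptions for illustration, not the paper's exact criterion.

```python
import numpy as np

def select_pseudo_labels(embeddings, pseudo_labels, k=10, agreement=0.7):
    """Keep pseudo-labeled samples whose k nearest neighbors (in embedding
    space) mostly share the same pseudo label; a simple proxy for the
    neighborhood-regularized selection idea."""
    n = embeddings.shape[0]
    # Pairwise Euclidean distances (fine for small n; use an ANN index at scale).
    d = np.linalg.norm(embeddings[:, None, :] - embeddings[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    keep = []
    for i in range(n):
        nn = np.argsort(d[i])[:k]
        agree = np.mean(pseudo_labels[nn] == pseudo_labels[i])
        if agree >= agreement:
            keep.append(i)
    return np.array(keep)

rng = np.random.default_rng(1)
emb = rng.standard_normal((500, 32))
labels = rng.integers(0, 3, size=500)
kept = select_pseudo_labels(emb, labels, k=10, agreement=0.5)
print(f"kept {len(kept)} / 500 pseudo-labeled samples")
```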
- R-Mixup: Riemannian Mixup for Biological Networks
ASSOC COMPUTING MACHINERY. 2023: 1073-1085
Abstract
Biological networks are commonly used in biomedical and healthcare domains to effectively model the structure of complex biological systems with interactions linking biological entities. However, due to their characteristics of high dimensionality and low sample size, directly applying deep learning models to biological networks usually faces severe overfitting. In this work, we propose R-Mixup, a Mixup-based data augmentation technique that suits the symmetric positive definite (SPD) property of adjacency matrices from biological networks while maintaining training efficiency. The interpolation process in R-Mixup leverages the log-Euclidean distance metric from the Riemannian manifold, effectively addressing the swelling effect and arbitrarily incorrect label issues of vanilla Mixup. We demonstrate the effectiveness of R-Mixup with five real-world biological network datasets on both regression and classification tasks. In addition, we derive a commonly ignored necessary condition for identifying the SPD matrices of biological networks and empirically study its influence on model performance. The code implementation can be found in Appendix E.
View details for DOI 10.1145/3580305.3599483
View details for Web of Science ID 001118896301013
View details for PubMedID 38343707
View details for PubMedCentralID PMC10853987
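A minimal sketch of the log-Euclidean interpolation the R-Mixup abstract describes: two SPD matrices are mixed as exp(lam * log(X1) + (1 - lam) * log(X2)), with labels mixed linearly and lam drawn from a Beta distribution as in vanilla Mixup. The Beta parameter and label-mixing convention are assumptions carried over from standard Mixup, not details taken from the paper.

```python
import numpy as np
from scipy.linalg import logm, expm

def log_euclidean_mixup(x1, x2, y1, y2, alpha=0.2, rng=None):
    """Interpolate two SPD matrices on the log-Euclidean manifold:
    exp(lam * log(X1) + (1 - lam) * log(X2)), with labels mixed linearly.
    A sketch of the interpolation step only; training details are omitted."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    x_mix = expm(lam * logm(x1) + (1.0 - lam) * logm(x2))
    y_mix = lam * y1 + (1.0 - lam) * y2
    return x_mix.real, y_mix

# Toy SPD inputs built as A @ A.T + eps * I.
rng = np.random.default_rng(2)
def random_spd(n):
    a = rng.standard_normal((n, n))
    return a @ a.T + 1e-3 * np.eye(n)

x_mix, y_mix = log_euclidean_mixup(random_spd(8), random_spd(8), 0.0, 1.0, rng=rng)
print(x_mix.shape, y_mix)
```

Interpolating in log-space keeps the mixed matrix SPD and avoids the determinant "swelling" that linear averaging of SPD matrices produces.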
- Open Visual Knowledge Extraction via Relation-Oriented Multimodality Model Prompting
NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2023
View details for Web of Science ID 001227224005020
- Interpretable Graph Neural Networks for Connectome-Based Brain Disorder Analysis
SPRINGER INTERNATIONAL PUBLISHING AG. 2022: 375-385
View details for DOI 10.1007/978-3-031-16452-1_36
View details for Web of Science ID 000867418200036
- How Can Graph Neural Networks Help Document Retrieval: A Case Study on CORD19 with Concept Map Generation
SPRINGER INTERNATIONAL PUBLISHING AG. 2022: 75-83
View details for DOI 10.1007/978-3-030-99739-7_9
View details for Web of Science ID 000787788000009
- On Positional and Structural Node Features for Graph Neural Networks on Non-attributed Graphs
ASSOC COMPUTING MACHINERY. 2022: 3898-3902
View details for DOI 10.1145/3511808.3557661
View details for Web of Science ID 001074639603094
- Pulmonary Vessel Segmentation Based on Orthogonal Fused U-Net++ of Chest CT Images
SPRINGER INTERNATIONAL PUBLISHING AG. 2019: 293-300
View details for DOI 10.1007/978-3-030-32226-7_33
View details for Web of Science ID 000548737100033
- BrainSTEAM: A Practical Pipeline for Connectome-based fMRI Analysis towards Subject Classification
WORLD SCIENTIFIC PUBL CO PTE LTD. 2024: 53-64
Abstract
Functional brain networks represent dynamic and complex interactions among anatomical regions of interest (ROIs), providing crucial clinical insights for neural pattern discovery and disorder diagnosis. In recent years, graph neural networks (GNNs) have shown great success and effectiveness in analyzing structured network data. However, the high complexity of data acquisition limits the amount of neuroimaging data available for training, and GNNs, like all deep learning models, suffer from overfitting. Moreover, their capability to capture useful neural patterns for downstream prediction is also adversely affected. To address this challenge, this study proposes BrainSTEAM, an integrated framework featuring a spatio-temporal module that consists of an EdgeConv GNN model, an autoencoder network, and a Mixup strategy. In particular, the spatio-temporal module dynamically segments the time-series signals of the ROI features for each subject into chunked sequences. We leverage each sequence to construct correlation networks, thereby increasing the training data. Additionally, we employ the EdgeConv GNN to capture ROI connectivity structures, an autoencoder for data denoising, and Mixup for enhancing model training through linear data augmentation. We evaluate our framework on two real-world neuroimaging datasets, ABIDE for autism prediction and HCP for gender prediction. Extensive experiments demonstrate the superiority and robustness of BrainSTEAM when compared to a variety of existing models, showcasing the strong potential of our proposed mechanisms in generalizing to other studies for connectome-based fMRI analysis.
View details for Web of Science ID 001258333100005
View details for PubMedID 38160269
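A minimal sketch of the chunking-for-augmentation idea in the BrainSTEAM abstract: split each subject's ROI time series into windows and build one correlation network per window. The `window` and `stride` values are placeholders, not the paper's settings.

```python
import numpy as np

def chunked_correlation_networks(timeseries, window=64, stride=32):
    """Split an (n_timepoints, n_roi) signal into overlapping windows and
    build one ROI-ROI correlation network per window, so a single subject
    yields several training graphs."""
    t, _ = timeseries.shape
    networks = []
    for start in range(0, t - window + 1, stride):
        chunk = timeseries[start:start + window]        # (window, n_roi)
        networks.append(np.corrcoef(chunk.T))           # (n_roi, n_roi)
    return np.stack(networks)

rng = np.random.default_rng(3)
ts = rng.standard_normal((256, 100))                    # one subject: 256 TRs, 100 ROIs
nets = chunked_correlation_networks(ts, window=64, stride=32)
print(nets.shape)                                       # (n_windows, 100, 100)
```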
- Federated Learning for Cross-Institution Brain Network Analysis
SPIE-INT SOC OPTICAL ENGINEERING. 2024
View details for DOI 10.1117/12.3005883
View details for Web of Science ID 001208134600016
- FedBrain: Federated Training of Graph Neural Networks for Connectome-based Brain Imaging Analysis
WORLD SCIENTIFIC PUBL CO PTE LTD. 2024: 214-225
Abstract
Recent advancements in neuroimaging techniques have sparked a growing interest in understanding the complex interactions between anatomical regions of interest (ROIs), which form brain networks that play a crucial role in various clinical tasks, such as neural pattern discovery and disorder diagnosis. In recent years, graph neural networks (GNNs) have emerged as powerful tools for analyzing network data. However, due to the complexity of data acquisition and regulatory restrictions, brain network studies remain limited in scale and are often confined to local institutions. These limitations make it difficult for GNN models to capture useful neural circuitry patterns and deliver robust downstream performance. As a distributed machine learning paradigm, federated learning (FL) offers a promising solution to resource limitations and privacy concerns by enabling collaborative learning across local institutions (i.e., clients) without data sharing. While data heterogeneity issues have been extensively studied in recent FL literature, cross-institutional brain network analysis presents unique data heterogeneity challenges, namely inconsistent ROI parcellation systems and varying predictive neural circuitry patterns across local neuroimaging studies. To this end, we propose FedBrain, a GNN-based personalized FL framework that takes into account the unique properties of brain network data. Specifically, we present a federated atlas mapping mechanism to overcome the feature and structure heterogeneity of brain networks arising from different ROI atlas systems, and a clustering approach guided by clinical prior knowledge to address varying predictive neural circuitry patterns across different patient groups, neuroimaging modalities, and clinical outcomes. Compared to existing FL strategies, our approach demonstrates superior and more consistent performance, showcasing its strong potential and generalizability in cross-institutional connectome-based brain imaging analysis. The implementation is available here.
View details for Web of Science ID 001258333100016
View details for PubMedID 38160281
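For readers unfamiliar with the FL paradigm the FedBrain abstract builds on, here is a sketch of plain FedAvg-style weight averaging across clients. This shows only the generic federated-aggregation pattern; it is not FedBrain's atlas-mapping or clinically guided clustering mechanism, and the parameter names are illustrative.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Weighted average of client model parameters (plain FedAvg):
    each client contributes in proportion to its local dataset size,
    and no raw data leaves the client."""
    total = sum(client_sizes)
    keys = client_weights[0].keys()
    return {
        k: sum(w[k] * (n / total) for w, n in zip(client_weights, client_sizes))
        for k in keys
    }

# Three toy clients, each holding a dict of numpy parameter arrays.
rng = np.random.default_rng(4)
clients = [{"linear.weight": rng.standard_normal((4, 4)),
            "linear.bias": rng.standard_normal(4)} for _ in range(3)]
global_weights = fedavg(clients, client_sizes=[120, 80, 200])
print({k: v.shape for k, v in global_weights.items()})
```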
- Open Visual Knowledge Extraction via Relation-Oriented Multimodality Model Prompting
Advances in Neural Information Processing Systems
2023; 36: 23499-23519
Abstract
Images contain rich relational knowledge that can help machines understand the world. Existing methods for visual knowledge extraction often rely on a pre-defined format (e.g., sub-verb-obj tuples) or vocabulary (e.g., relation types), restricting the expressiveness of the extracted knowledge. In this work, we present a first exploration of a new paradigm of open visual knowledge extraction. To achieve this, we present OpenVik, which consists of an open relational region detector to detect regions potentially containing relational knowledge and a visual knowledge generator that generates format-free knowledge by prompting a large multimodality model with the detected region of interest. We also explore two data enhancement techniques for diversifying the generated format-free visual knowledge. Extensive knowledge quality evaluations highlight the correctness and uniqueness of the extracted open visual knowledge by OpenVik. Moreover, integrating our extracted knowledge across various visual reasoning applications shows consistent improvements, indicating the real-world applicability of OpenVik.
View details for PubMedID 39130613
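The OpenVik abstract describes pairing detected relational regions with open-ended prompts to a multimodal model. The sketch below only shows that region-cropping and prompt-pairing setup; OpenVik's detector and generator are not public APIs that can be called here, so the boxes and prompt text are hypothetical and the generation step is omitted.

```python
from PIL import Image

def build_region_prompts(image, boxes,
                         prompt="Describe the relation between the objects in this region:"):
    """Crop candidate relational regions and pair each crop with an
    open-ended, relation-oriented text prompt for a multimodal model.
    The actual generation call is intentionally left out."""
    crops = [image.crop(box) for box in boxes]          # box = (left, top, right, bottom)
    return [(crop, prompt) for crop in crops]

# Toy usage with a blank image and two hypothetical region proposals.
img = Image.new("RGB", (640, 480))
pairs = build_region_prompts(img, boxes=[(10, 10, 200, 200), (300, 100, 600, 400)])
print(len(pairs), pairs[0][0].size)
```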
- Transformer-Based Hierarchical Clustering for Brain Network Analysis
IEEE. 2023
View details for DOI 10.1109/ISBI53787.2023.10230606
View details for Web of Science ID 001062050500283
- PTGB: Pre-Train Graph Neural Networks for Brain Network Analysis
JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2023: 526-544
View details for Web of Science ID 001221739300030
- Dynamic Brain Transformer with Multi-Level Attention for Functional Brain Network Analysis
IEEE. 2023
View details for DOI 10.1109/BHI58575.2023.10313480
View details for Web of Science ID 001107519300049
- Deep DAG Learning of Effective Brain Connectivity for fMRI Analysis
IEEE. 2023
Abstract
Functional magnetic resonance imaging (fMRI) has become one of the most common imaging modalities for brain function analysis. Recently, graph neural networks (GNNs) have been adopted for fMRI analysis with superior performance. Unfortunately, traditional functional brain networks are mainly constructed based on similarities among regions of interest (ROIs), which are noisy and can lead to inferior results for GNN models. To better adapt GNNs for fMRI analysis, we propose DABNet, a Deep DAG learning framework based on Brain Networks for fMRI analysis. DABNet adopts a brain network generator module, which harnesses the DAG learning approach to transform raw time series into effective brain connectivities. Experiments on two fMRI datasets demonstrate the efficacy of DABNet. The generated brain networks also highlight the prediction-related brain regions and thus provide interpretations for predictions.
View details for DOI 10.1109/ISBI53787.2023.10230429
View details for Web of Science ID 001062050500107
View details for PubMedID 38868456
View details for PubMedCentralID PMC11168307
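Continuous DAG-learning approaches of the kind the DABNet abstract refers to typically rely on a differentiable acyclicity constraint. Below is the standard NOTEARS-style penalty h(A) = tr(exp(A * A)) - d as a worked example; it illustrates the generic constraint family, not necessarily DABNet's exact objective.

```python
import numpy as np
from scipy.linalg import expm

def acyclicity_penalty(adj):
    """NOTEARS-style differentiable acyclicity measure
    h(A) = tr(exp(A * A)) - d, which equals zero iff A encodes a DAG.
    DAG learners optimize edge weights subject to h(A) = 0."""
    d = adj.shape[0]
    return np.trace(expm(adj * adj)) - d

dag = np.array([[0.0, 0.8, 0.0],
                [0.0, 0.0, 0.5],
                [0.0, 0.0, 0.0]])        # upper-triangular => acyclic
cyclic = dag + dag.T                     # adds reverse edges => cycles
print(acyclicity_penalty(dag))           # ~0.0
print(acyclicity_penalty(cyclic))        # > 0
```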
- Joint Embedding of Structural and Functional Brain Networks with Graph Neural Networks for Mental Illness Diagnosis
Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)
2022; 2022: 272-276
Abstract
Multimodal brain networks characterize complex connectivities among different brain regions from both structural and functional aspects and provide a new means for mental disease analysis. Recently, Graph Neural Networks (GNNs) have become the de facto model for analyzing graph-structured data. However, how to employ GNNs to extract effective representations from brain networks in multiple modalities remains rarely explored. Moreover, as brain networks provide no initial node features, how to design informative node attributes and leverage edge weights for GNN learning remains unsolved. To this end, we develop a novel multiview GNN for multimodal brain networks. In particular, we treat each modality as a view of the brain networks and employ contrastive learning for multimodal fusion. Then, we propose a GNN model that takes advantage of the message passing scheme by propagating messages based on degree statistics and brain region connectivities. Extensive experiments on two real-world disease datasets (HIV and Bipolar) demonstrate the effectiveness of our proposed method over state-of-the-art baselines.
View details for DOI 10.1109/EMBC48229.2022.9871118
View details for PubMedID 36085703
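The contrastive multimodal fusion step in the abstract above can be illustrated with a generic InfoNCE-style objective between structural-view and functional-view embeddings of the same subjects. This is a common cross-view loss in the spirit of that step, not the paper's exact formulation; the temperature and embedding sizes are assumptions.

```python
import torch
import torch.nn.functional as F

def cross_view_contrastive_loss(z_struct, z_func, temperature=0.1):
    """InfoNCE-style loss that pulls together the structural and functional
    embeddings of the same subject and pushes apart those of different
    subjects (matching pairs sit on the diagonal of the similarity matrix)."""
    z1 = F.normalize(z_struct, dim=-1)
    z2 = F.normalize(z_func, dim=-1)
    logits = z1 @ z2.t() / temperature          # (batch, batch) similarities
    targets = torch.arange(z1.size(0))
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

z_s = torch.randn(16, 64)                       # structural-view embeddings
z_f = torch.randn(16, 64)                       # functional-view embeddings
print(cross_view_contrastive_loss(z_s, z_f).item())
```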
- Brain Network Transformer
NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2022
View details for Web of Science ID 001213927501046
- Data-Efficient Brain Connectome Analysis via Multi-Task Meta-Learning
ASSOC COMPUTING MACHINERY. 2022: 4743-4751
View details for DOI 10.1145/3534678.3542680
View details for Web of Science ID 001119000304076
- FBNetGen: Task-aware GNN-based fMRI Analysis via Functional Brain Network Generation
JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2022: 618-637
Abstract
Functional magnetic resonance imaging (fMRI) is one of the most common imaging modalities used to investigate brain functions. Recent studies in neuroscience stress the great potential of functional brain networks constructed from fMRI data for clinical predictions. Traditional functional brain networks, however, are noisy, unaware of downstream prediction tasks, and incompatible with deep graph neural network (GNN) models. In order to fully unleash the power of GNNs in network-based fMRI analysis, we develop FBNETGEN, a task-aware and interpretable fMRI analysis framework via deep brain network generation. In particular, we formulate (1) prominent region of interest (ROI) feature extraction, (2) brain network generation, and (3) clinical prediction with GNNs in an end-to-end trainable model under the guidance of particular prediction tasks. Within this process, the key novel component is the graph generator, which learns to transform raw time-series features into task-oriented brain networks. Our learnable graphs also provide unique interpretations by highlighting prediction-related brain regions. Comprehensive experiments on two datasets, i.e., the recently released and currently largest publicly available fMRI dataset Adolescent Brain Cognitive Development (ABCD) and the widely used fMRI dataset PNC, prove the superior effectiveness and interpretability of FBNETGEN. The implementation is available at https://github.com/Wayfear/FBNETGEN.
View details for Web of Science ID 001227587200039
View details for PubMedID 37377881
View details for PubMedCentralID PMC10296778
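A rough sketch of the "graph generator" role described in the FBNetGen abstract: encode each ROI's raw time series into an embedding and form a learnable brain network from pairwise similarities, which can then feed a GNN and be trained end to end. The encoder architecture, hidden size, and softmax normalization are assumptions for illustration; the released model (see the GitHub link above) is more involved.

```python
import torch
import torch.nn as nn

class GraphGenerator(nn.Module):
    """Map raw ROI time series to node embeddings and build a learnable
    brain network from their pairwise similarities."""

    def __init__(self, n_timepoints, hidden=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_timepoints, hidden), nn.ReLU(), nn.Linear(hidden, hidden)
        )

    def forward(self, timeseries):
        # timeseries: (batch, n_roi, n_timepoints)
        h = self.encoder(timeseries)                         # (batch, n_roi, hidden)
        adj = torch.softmax(h @ h.transpose(1, 2), dim=-1)   # row-normalized network
        return adj

gen = GraphGenerator(n_timepoints=200)
adj = gen(torch.randn(4, 100, 200))                          # 4 subjects, 100 ROIs
print(adj.shape)                                             # (4, 100, 100)
```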
- Zero-Shot Scene Graph Relation Prediction Through Commonsense Knowledge Integration
SPRINGER INTERNATIONAL PUBLISHING AG. 2021: 466-482
View details for DOI 10.1007/978-3-030-86520-7_29
View details for Web of Science ID 000713032300029