Bio
I am a Research Scientist at CNSlab, advised by Professor Kilian. I received my PhD from the University of Oulu, Finland, where I was advised by Academy Professor Guoying Zhao. During my PhD studies, I had the opportunity to visit Harvard Medical School and ETH Zurich. Before that, I received a B.E. degree from UESTC, China, and a Master's degree from Xiamen University, China. My research interests include machine learning, geometric neural networks, and medical image analysis, with a special emphasis on neuroscience.
Honors & Awards
-
Finnish AI Dissertation Award 2022, Finnish AI Society (2023)
-
ISMRM Magna Cum Laude Merit Award, ISMRM (2022)
-
Grade "Excellent" for doctoral thesis defense, University of Oulu (2022)
-
Best Conference Paper Award (Finland Section), IEEE (2020)
-
2nd Place in Light-weight Action Recognition, ECCV (2020)
All Publications
-
Metadata-conditioned generative models to synthesize anatomically-plausible 3D brain MRIs.
Medical image analysis
2024; 98: 103325
Abstract
Recent advances in generative models have paved the way for enhanced generation of natural and medical images, including synthetic brain MRIs. However, the mainstay of current AI research focuses on optimizing synthetic MRIs with respect to visual quality (such as signal-to-noise ratio) while lacking insights into their relevance to neuroscience. To generate high-quality T1-weighted MRIs relevant for neuroscience discovery, we present a two-stage Diffusion Probabilistic Model (called BrainSynth) to synthesize high-resolution MRIs conditionally-dependent on metadata (such as age and sex). We then propose a novel procedure to assess the quality of BrainSynth according to how well its synthetic MRIs capture macrostructural properties of brain regions and how accurately they encode the effects of age and sex. Results indicate that more than half of the brain regions in our synthetic MRIs are anatomically plausible, i.e., the effect size between real and synthetic MRIs is small relative to biological factors such as age and sex. Moreover, the anatomical plausibility varies across cortical regions according to their geometric complexity. As is, the MRIs generated by BrainSynth significantly improve the training of a predictive model to identify accelerated aging effects in an independent study. These results indicate that our model accurately captures the brain's anatomical information and thus could enrich the data of underrepresented samples in a study. The code of BrainSynth will be released as part of the MONAI project at https://github.com/Project-MONAI/GenerativeModels.
View details for DOI 10.1016/j.media.2024.103325
View details for PubMedID 39208560
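The abstract above describes conditioning the diffusion model on metadata such as age and sex. Below is a minimal, hypothetical sketch of one way such conditioning can be wired, by fusing metadata embeddings with the timestep embedding; it is not the released BrainSynth code, and all module names, dimensions, and the binary sex coding are illustrative assumptions (the actual implementation is part of the MONAI GenerativeModels project linked above).
```python
# Minimal sketch (not the released BrainSynth code): conditioning a diffusion
# denoiser on metadata such as age and sex by fusing them with the timestep
# embedding. Module names and sizes are illustrative assumptions.
import torch
import torch.nn as nn

class MetadataConditioning(nn.Module):
    """Maps (timestep, age, sex) to a single conditioning vector."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.t_embed = nn.Sequential(nn.Linear(1, dim), nn.SiLU(), nn.Linear(dim, dim))
        self.age_embed = nn.Sequential(nn.Linear(1, dim), nn.SiLU(), nn.Linear(dim, dim))
        self.sex_embed = nn.Embedding(2, dim)  # assumed binary coding of sex

    def forward(self, t, age, sex):
        # t, age: (B, 1) float tensors; sex: (B,) long tensor
        return self.t_embed(t) + self.age_embed(age) + self.sex_embed(sex)

cond = MetadataConditioning()
t = torch.rand(4, 1)             # diffusion timestep, normalized to [0, 1]
age = torch.rand(4, 1)           # age, normalized
sex = torch.randint(0, 2, (4,))  # binary sex label
print(cond(t, age, sex).shape)   # torch.Size([4, 128])
```
The resulting vector would then be injected into the denoising UNet in the same way a plain timestep embedding is.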
-
MedSyn: Text-guided Anatomy-aware Synthesis of High-Fidelity 3D CT Images.
IEEE transactions on medical imaging
2024; PP
Abstract
This paper introduces an innovative methodology for producing high-quality 3D lung CT images guided by textual information. While diffusion-based generative models are increasingly used in medical imaging, current state-of-the-art approaches are limited to low-resolution outputs and underutilize the abundant information in radiology reports. Radiology reports can enhance the generation process by providing additional guidance and offering fine-grained control over the synthesis of images. Nevertheless, expanding text-guided generation to high-resolution 3D images poses significant challenges in memory usage and in preserving anatomical detail. To address the memory issue, we introduce a hierarchical scheme that uses a modified UNet architecture. We start by synthesizing low-resolution images conditioned on the text, which serve as a foundation for subsequent generators that complete the volumetric data. To ensure the anatomical plausibility of the generated samples, we provide further guidance by generating vascular, airway, and lobular segmentation masks in conjunction with the CT images. The model demonstrates the capability to use textual input and segmentation tasks to generate synthesized images. Algorithmic comparative assessments and blind evaluations conducted by 10 board-certified radiologists indicate that our approach exhibits superior performance compared to the most advanced models based on GAN and diffusion techniques, especially in accurately retaining crucial anatomical features such as fissure lines and airways. This study focuses on two main objectives: (1) the development of a method for creating images based on textual prompts and anatomical components, and (2) the capability to generate new images conditioned on anatomical elements. The advancements in image generation can be applied to enhance numerous downstream tasks.
View details for DOI 10.1109/TMI.2024.3415032
View details for PubMedID 38900619
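As a rough illustration of the hierarchical scheme described above, the sketch below generates a coarse volume conditioned on a text embedding and then refines an upsampled copy of it. It is not the authors' MedSyn code; the module names, resolutions, and text-embedding size are assumptions.
```python
# Minimal sketch of a two-stage, text-conditioned 3D cascade (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class LowResGenerator(nn.Module):
    def __init__(self, text_dim=256):
        super().__init__()
        self.proj = nn.Linear(text_dim, 8 * 8 * 8)
        self.net = nn.Sequential(nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(),
                                 nn.Conv3d(16, 1, 3, padding=1))

    def forward(self, text_emb):
        x = self.proj(text_emb).view(-1, 1, 8, 8, 8)
        return self.net(x)                      # coarse (8^3) volume

class SuperResStage(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(),
                                 nn.Conv3d(16, 1, 3, padding=1))

    def forward(self, low_res):
        up = F.interpolate(low_res, scale_factor=4, mode="trilinear", align_corners=False)
        return self.net(up)                     # refined (32^3) volume

text_emb = torch.randn(2, 256)                  # stand-in for a report-encoder output
coarse = LowResGenerator()(text_emb)
fine = SuperResStage()(coarse)
print(coarse.shape, fine.shape)                 # (2,1,8,8,8) (2,1,32,32,32)
```
In the paper the stages are diffusion models and the refinement also predicts vessel, airway, and lobe masks; the sketch only conveys the coarse-to-fine structure.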
-
Generating Realistic Brain MRIs via a Conditional Diffusion Probabilistic Model.
Medical Image Computing and Computer-Assisted Intervention (MICCAI)
2023; 14227: 14-24
Abstract
As acquiring MRIs is expensive, neuroscience studies struggle to attain a sufficient number of them for properly training deep learning models. This challenge could be reduced by MRI synthesis, for which Generative Adversarial Networks (GANs) are popular. GANs, however, are commonly unstable and struggle with creating diverse and high-quality data. A more stable alternative is Diffusion Probabilistic Models (DPMs) with a fine-grained training strategy. To overcome their need for extensive computational resources, we propose a conditional DPM (cDPM) with a memory-efficient process that generates realistic-looking brain MRIs. To this end, we train a 2D cDPM to generate an MRI subvolume conditioned on another subset of slices from the same MRI. By generating slices using arbitrary combinations between condition and target slices, the model only requires limited computational resources to learn interdependencies between slices even if they are spatially far apart. After having learned these dependencies via an attention network, a new anatomy-consistent 3D brain MRI is generated by repeatedly applying the cDPM. Our experiments demonstrate that our method can generate high-quality 3D MRIs that share a similar distribution to real MRIs while still diversifying the training set. The code is available at https://github.com/xiaoiker/mask3DMRI_diffusion and also will be released as part of MONAI, at https://github.com/Project-MONAI/GenerativeModels.
View details for DOI 10.1007/978-3-031-43993-3_2
View details for PubMedID 38169668
View details for PubMedCentralID PMC10758344
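The key memory trick in the abstract above is that the model is trained on small sub-volumes: an arbitrary subset of slices serves as the condition and another as the target. A minimal sketch of that slice-sampling step is shown below (illustrative only; shapes and slice counts are assumptions, and the released code linked above contains the actual cDPM).
```python
# Illustrative slice sampling for condition/target sub-volumes (not the cDPM code).
import torch

def sample_condition_target(volume, n_cond=4, n_target=4):
    """volume: (D, H, W) MRI; returns condition slices, target slices, and their indices."""
    d = volume.shape[0]
    idx = torch.randperm(d)[: n_cond + n_target]          # arbitrary slice positions
    cond_idx = idx[:n_cond].sort().values
    target_idx = idx[n_cond:].sort().values
    return volume[cond_idx], volume[target_idx], cond_idx, target_idx

mri = torch.randn(128, 160, 160)                           # toy volume
cond, target, ci, ti = sample_condition_target(mri)
print(cond.shape, target.shape, ci.tolist(), ti.tolist())
```
Because condition and target slices may be far apart, the attention network has to learn long-range interdependencies, which is what allows the full 3D volume to be assembled slice-group by slice-group at inference time.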
-
TOPLight: Lightweight Neural Networks with Task-Oriented Pretraining for Visible-Infrared Recognition
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2023: 3541-3550
View details for DOI 10.1109/CVPR52729.2023.00345
View details for Web of Science ID 001058542603080
-
Hyperbolic Deep Neural Networks: A Survey.
IEEE transactions on pattern analysis and machine intelligence
2022; 44 (12): 10023-10044
Abstract
Recently, hyperbolic deep neural networks (HDNNs) have been gaining momentum as deep representations in hyperbolic space provide high-fidelity embeddings with few dimensions, especially for data possessing hierarchical structure. Such hyperbolic neural architectures have quickly been extended to different scientific fields, including natural language processing, single-cell RNA-sequence analysis, graph embedding, financial analysis, and computer vision. The promising results demonstrate their superior capability, significant model compactness, and substantially better physical interpretability than their counterparts in Euclidean space. To stimulate future research, this paper presents a comprehensive review of the literature on the neural components used in the construction of HDNNs, as well as the generalization of leading deep approaches to hyperbolic space. It also presents current applications across various tasks, together with insightful observations, open questions, and promising future directions.
View details for DOI 10.1109/TPAMI.2021.3136921
View details for PubMedID 34932472
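The survey builds on a small set of Poincaré-ball operations that replace their Euclidean counterparts. The sketch below implements three standard ones (exponential map at the origin, Möbius addition, and geodesic distance) as plain PyTorch functions; it follows the textbook formulas rather than any specific library covered in the survey.
```python
# Standard Poincaré-ball operations (curvature -c), written as a self-contained sketch.
import torch

def expmap0(v, c=1.0, eps=1e-6):
    """Exponential map at the origin: maps a tangent vector onto the ball."""
    norm = v.norm(dim=-1, keepdim=True).clamp_min(eps)
    return torch.tanh(c ** 0.5 * norm) * v / (c ** 0.5 * norm)

def mobius_add(x, y, c=1.0):
    """Möbius addition, the hyperbolic analogue of vector addition."""
    xy = (x * y).sum(-1, keepdim=True)
    x2 = (x * x).sum(-1, keepdim=True)
    y2 = (y * y).sum(-1, keepdim=True)
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    den = 1 + 2 * c * xy + c ** 2 * x2 * y2
    return num / den

def poincare_dist(x, y, c=1.0, eps=1e-6):
    """Geodesic distance between two points on the ball."""
    diff = mobius_add(-x, y, c).norm(dim=-1)
    return 2 / c ** 0.5 * torch.atanh((c ** 0.5 * diff).clamp(max=1 - eps))

a = expmap0(torch.randn(5, 3) * 0.1)
b = expmap0(torch.randn(5, 3) * 0.1)
print(poincare_dist(a, b))
```
Hyperbolic layers reviewed in the survey are typically built by composing these primitives, e.g. a "hyperbolic linear layer" applies a Euclidean map in tangent space and projects back with the exponential map.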
-
Learning Optimal K-space Acquisition and Reconstruction using Physics-Informed Neural Networks
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2022: 20762-20771
View details for DOI 10.1109/CVPR52688.2022.02013
View details for Web of Science ID 000870783006057
-
Tripool: Graph triplet pooling for 3D skeleton-based action recognition
PATTERN RECOGNITION
2021; 115
View details for DOI 10.1016/j.patcog.2021.107921
View details for Web of Science ID 000639744500007
-
Revealing the Invisible with Model and Data Shrinking for Composite-database Micro-expression Recognition.
IEEE Transactions on Image Processing
2020; PP
Abstract
Composite-database micro-expression recognition is attracting increasing attention as it is more practical for real-world applications. Though the composite database provides more sample diversity for learning good representation models, the important subtle dynamics are prone to disappearing under domain shift, which greatly degrades model performance, especially for deep models. In this paper, we analyze the influence of learning complexity, including input complexity and model complexity, and discover that lower-resolution input data and a shallower-architecture model help ease the degradation of deep models in the composite-database task. Based on this, we propose a recurrent convolutional network (RCN) to explore the shallower architecture and lower-resolution input data, shrinking model and input complexities simultaneously. Furthermore, we develop three parameter-free modules (i.e., wide expansion, shortcut connection and attention unit) to integrate with RCN without increasing any learnable parameters. These three modules can enhance representation ability from various perspectives while preserving a not-very-deep architecture for lower-resolution data. Besides, the three modules can further be combined by an automatic strategy (a neural architecture search strategy), and the searched architecture becomes more robust. Extensive experiments on the MEGC2019 dataset (composed of the existing SMIC, CASME II and SAMM datasets) have verified the influence of learning complexity and shown that RCNs with the three modules and the searched combination outperform the state-of-the-art approaches.
View details for DOI 10.1109/TIP.2020.3018222
View details for PubMedID 32845838
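For readers unfamiliar with the term, a "parameter-free" module adds no learnable weights. The sketch below shows one generic example of a parameter-free spatial attention unit; it illustrates the idea only and is not necessarily the exact attention unit used in the paper.
```python
# Generic parameter-free spatial attention (illustrative, not the paper's module).
import torch

def parameter_free_spatial_attention(x):
    """x: (B, C, H, W). Re-weights spatial positions using channel statistics only."""
    attn = torch.sigmoid(x.mean(dim=1, keepdim=True))  # (B, 1, H, W), no parameters
    return x * attn

feat = torch.randn(2, 16, 28, 28)
print(parameter_free_spatial_attention(feat).shape)    # torch.Size([2, 16, 28, 28])
```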
-
Learning Graph Convolutional Network for Skeleton-Based Human Action Recognition by Neural Searching
AAAI Conference on Artificial Intelligence (AAAI). 2020: 2669-2676
View details for Web of Science ID 000667722802090
-
Mix Dimension in Poincare Geometry for 3D Skeleton-based Action Recognition
ACM International Conference on Multimedia (ACM MM). 2020: 1432-1440
View details for DOI 10.1145/3394171.3413910
View details for Web of Science ID 000810735001055
-
Remote Heart Rate Measurement from Highly Compressed Facial Videos: an End-to-end Deep Learning Solution with Video Enhancement
IEEE/CVF International Conference on Computer Vision (ICCV). 2019: 151-160
View details for DOI 10.1109/ICCV.2019.00024
View details for Web of Science ID 000531438100016
-
HRCUNet: Hierarchical Region Contrastive Learning for Segmentation of Breast Tumors in DCE-MRI
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE
2024
View details for DOI 10.1002/cpe.8319
View details for Web of Science ID 001354204600001
-
Large Language Models in Healthcare and Medical Domain: A Review
INFORMATICS-BASEL
2024; 11 (3)
View details for DOI 10.3390/informatics11030057
View details for Web of Science ID 001323615500001
-
Geometric Graph Representation With Learnable Graph Structure and Adaptive AU Constraint for Micro-Expression Recognition
IEEE TRANSACTIONS ON AFFECTIVE COMPUTING
2024; 15 (3): 1343-1357
View details for DOI 10.1109/TAFFC.2023.3340016
View details for Web of Science ID 001308401200070
-
Rethinking Few-Shot Class-Incremental Learning With Open-Set Hypothesis in Hyperbolic Geometry
IEEE TRANSACTIONS ON MULTIMEDIA
2024; 26: 5897-5910
View details for DOI 10.1109/TMM.2023.3340550
View details for Web of Science ID 001197874100001
-
Data Leakage and Evaluation Issues in Micro-Expression Analysis
IEEE TRANSACTIONS ON AFFECTIVE COMPUTING
2024; 15 (1): 186-197
View details for DOI 10.1109/TAFFC.2023.3265063
View details for Web of Science ID 001178971100005
-
LSOR: Longitudinally-Consistent Self-Organized Representation Learning.
Medical Image Computing and Computer-Assisted Intervention (MICCAI)
2023; 14220: 279-289
Abstract
Interpretability is a key issue when applying deep learning models to longitudinal brain MRIs. One way to address this issue is by visualizing the high-dimensional latent spaces generated by deep learning via self-organizing maps (SOM). SOM separates the latent space into clusters and then maps the cluster centers to a discrete (typically 2D) grid preserving the high-dimensional relationship between clusters. However, learning SOM in a high-dimensional latent space tends to be unstable, especially in a self-supervision setting. Furthermore, the learned SOM grid does not necessarily capture clinically interesting information, such as brain age. To resolve these issues, we propose the first self-supervised SOM approach that derives a high-dimensional, interpretable representation stratified by brain age solely based on longitudinal brain MRIs (i.e., without demographic or cognitive information). Called Longitudinally-consistent Self-Organized Representation learning (LSOR), the method is stable during training as it relies on soft clustering (vs. the hard cluster assignments used by existing SOM). Furthermore, our approach generates a latent space stratified according to brain age by aligning trajectories inferred from longitudinal MRIs to the reference vector associated with the corresponding SOM cluster. When applied to longitudinal MRIs of the Alzheimer's Disease Neuroimaging Initiative (ADNI, N=632), LSOR generates an interpretable latent space and achieves comparable or higher accuracy than the state-of-the-art representations with respect to the downstream tasks of classification (static vs. progressive mild cognitive impairment) and regression (determining ADAS-Cog score of all subjects). The code is available at https://github.com/ouyangjiahong/longitudinal-som-single-modality.
View details for DOI 10.1007/978-3-031-43907-0_27
View details for PubMedID 37961067
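The stability argument above rests on replacing hard SOM cluster assignments with soft ones. A minimal, illustrative sketch of such a soft assignment step is shown below (the temperature, sizes, and function names are assumptions; the released code linked above contains the actual LSOR implementation).
```python
# Soft assignment of latent vectors to SOM nodes (illustrative sketch).
import torch

def soft_som_assignment(z, centers, temperature=0.1):
    """z: (B, D) latent vectors; centers: (K, D) SOM node vectors.
    Returns soft assignment weights (B, K) instead of a hard argmin."""
    dist = torch.cdist(z, centers)                  # (B, K) Euclidean distances
    return torch.softmax(-dist / temperature, dim=-1)

z = torch.randn(8, 64)
centers = torch.randn(25, 64)      # e.g. a 5 x 5 SOM grid flattened to 25 nodes
w = soft_som_assignment(z, centers)
print(w.shape, w.sum(-1))           # (8, 25), rows sum to 1
```
Because every node receives a (small) gradient at every step, training does not collapse the way hard winner-take-all updates can in a high-dimensional, self-supervised latent space.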
-
Efficient Hyperbolic Perceptron for Image Classification
ELECTRONICS
2023; 12 (19)
View details for DOI 10.3390/electronics12194027
View details for Web of Science ID 001119440600001
-
Hyperbolic Uncertainty Aware Semantic Segmentation
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS
2023
View details for DOI 10.1109/TITS.2023.3312290
View details for Web of Science ID 001071996700001
-
Imputing Brain Measurements Across Data Sets via Graph Neural Networks.
PRedictive Intelligence in MEdicine (PRIME) Workshop
2023; 14277: 172-183
Abstract
Publicly available data sets of structural MRIs might not contain specific measurements of brain Regions of Interest (ROIs) that are important for training machine learning models. For example, the curvature scores computed by Freesurfer are not released by the Adolescent Brain Cognitive Development (ABCD) Study. One can address this issue by simply reapplying Freesurfer to the data set. However, this approach is generally computationally and labor intensive (e.g., requiring quality control). An alternative is to impute the missing measurements via a deep learning approach. However, the state-of-the-art is designed to estimate randomly missing values rather than entire measurements. We therefore propose to re-frame the imputation problem as a prediction task on another (public) data set that contains the missing measurements and shares some ROI measurements with the data sets of interest. A deep learning model is then trained to predict the missing measurements from the shared ones and is afterwards applied to the other data sets. Our proposed algorithm models the dependencies between ROI measurements via a graph neural network (GNN) and accounts for demographic differences in brain measurements (e.g. sex) by feeding the graph encoding into a parallel architecture. The architecture simultaneously optimizes a graph decoder to impute values and a classifier to predict demographic factors. We test the approach, called Demographic Aware Graph-based Imputation (DAGI), on imputing the missing Freesurfer measurements of ABCD (N=3760; minimum age 12 years) by training the predictor on those publicly released by the National Consortium on Alcohol and Neurodevelopment in Adolescence (NCANDA, N=540). 5-fold cross-validation on NCANDA reveals that the imputed scores are more accurate than those generated by linear regressors and deep learning models. Adding the imputed scores to a classifier trained to identify sex also results in higher accuracy than using only the Freesurfer scores provided by ABCD.
View details for DOI 10.1007/978-3-031-46005-0_15
View details for PubMedID 37946742
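As a rough illustration of the parallel architecture described above, the sketch below encodes ROI measurements with a simple message-passing layer and feeds the pooled encoding to both an imputation head and a sex classifier. It is not the authors' DAGI code; the layer, graph, and dimensions are illustrative assumptions.
```python
# Illustrative GNN encoder with parallel imputation and classification heads.
import torch
import torch.nn as nn

class SimpleGraphLayer(nn.Module):
    """One round of adjacency-weighted message passing (a stand-in for a GNN layer)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x: (B, N, F) node features for N ROIs; adj: (N, N) normalized adjacency
        return torch.relu(self.lin(torch.einsum("ij,bjf->bif", adj, x)))

class ImputeAndClassify(nn.Module):
    def __init__(self, in_dim=1, hid=32, n_missing=4):
        super().__init__()
        self.encoder = SimpleGraphLayer(in_dim, hid)
        self.decoder = nn.Linear(hid, n_missing)   # predicts the missing ROI scores
        self.classifier = nn.Linear(hid, 2)        # predicts sex from the same encoding

    def forward(self, x, adj):
        h = self.encoder(x, adj).mean(dim=1)       # pool over ROI nodes
        return self.decoder(h), self.classifier(h)

n_rois = 10
x = torch.randn(3, n_rois, 1)                      # shared ROI measurements
adj = torch.eye(n_rois)                            # toy adjacency
imputed, sex_logits = ImputeAndClassify()(x, adj)
print(imputed.shape, sex_logits.shape)             # (3, 4) (3, 2)
```
Training both heads on the public data set (NCANDA in the paper) is what lets the imputation remain sensitive to demographic differences when applied to ABCD.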
-
Modality Unifying Network for Visible-Infrared Person Re-Identification
IEEE/CVF International Conference on Computer Vision (ICCV). 2023: 11151-11161
View details for DOI 10.1109/ICCV51070.2023.01027
View details for Web of Science ID 001169499003057