All Publications


  • Metadata-conditioned generative models to synthesize anatomically-plausible 3D brain MRIs. Medical image analysis Peng, W., Bosschieter, T., Ouyang, J., Paul, R., Sullivan, E. V., Pfefferbaum, A., Adeli, E., Zhao, Q., Pohl, K. M. 2024; 98: 103325

    Abstract

    Recent advances in generative models have paved the way for enhanced generation of natural and medical images, including synthetic brain MRIs. However, the mainstay of current AI research focuses on optimizing synthetic MRIs with respect to visual quality (such as signal-to-noise ratio) while lacking insights into their relevance to neuroscience. To generate high-quality T1-weighted MRIs relevant for neuroscience discovery, we present a two-stage Diffusion Probabilistic Model (called BrainSynth) to synthesize high-resolution MRIs conditioned on metadata (such as age and sex). We then propose a novel procedure to assess the quality of BrainSynth according to how well its synthetic MRIs capture macrostructural properties of brain regions and how accurately they encode the effects of age and sex. Results indicate that more than half of the brain regions in our synthetic MRIs are anatomically plausible, i.e., the effect size between real and synthetic MRIs is small relative to biological factors such as age and sex. Moreover, the anatomical plausibility varies across cortical regions according to their geometric complexity. As is, the MRIs generated by BrainSynth significantly improve the training of a predictive model to identify accelerated aging effects in an independent study. These results indicate that our model accurately captures the brain's anatomical information and thus could enrich the data of underrepresented samples in a study. The code of BrainSynth will be released as part of the MONAI project at https://github.com/Project-MONAI/GenerativeModels.

    View details for DOI 10.1016/j.media.2024.103325

    View details for PubMedID 39208560
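
    A minimal sketch of the metadata conditioning described above: age and sex are embedded and fused with the diffusion timestep embedding before it enters the denoising network. This is an illustrative PyTorch fragment with assumed module names and sizes, not the released MONAI implementation.

      import torch
      import torch.nn as nn

      class MetadataConditioning(nn.Module):
          """Fuse age and sex with the DPM timestep embedding (illustrative)."""
          def __init__(self, embed_dim=128):
              super().__init__()
              self.age_mlp = nn.Sequential(nn.Linear(1, embed_dim), nn.SiLU(),
                                           nn.Linear(embed_dim, embed_dim))
              self.sex_embed = nn.Embedding(2, embed_dim)  # assumed coding: 0/1

          def forward(self, t_embed, age, sex):
              # age: (B, 1), normalized; sex: (B,) integer codes
              cond = self.age_mlp(age) + self.sex_embed(sex)
              return t_embed + cond  # conditioned embedding fed to the UNet blocks

      t_embed = torch.randn(4, 128)   # timestep embedding from the diffusion model
      age = torch.rand(4, 1)          # ages scaled to [0, 1]
      sex = torch.randint(0, 2, (4,))
      print(MetadataConditioning()(t_embed, age, sex).shape)  # torch.Size([4, 128])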

  • MedSyn: Text-guided Anatomy-aware Synthesis of High-Fidelity 3D CT Images. IEEE transactions on medical imaging Xu, Y., Sun, L., Peng, W., Jia, S., Morrison, K., Perer, A., Zandifar, A., Visweswaran, S., Eslami, M., Batmanghelich, K. 2024; PP

    Abstract

    This paper introduces an innovative methodology for producing high-quality 3D lung CT images guided by textual information. While diffusion-based generative models are increasingly used in medical imaging, current state-of-the-art approaches are limited to low-resolution outputs and underutilize the abundant information in radiology reports. Radiology reports can enhance the generation process by providing additional guidance and offering fine-grained control over the synthesis of images. Nevertheless, expanding text-guided generation to high-resolution 3D images poses significant challenges in memory usage and in preserving anatomical detail. To address the memory issue, we introduce a hierarchical scheme that uses a modified UNet architecture. We start by synthesizing low-resolution images conditioned on the text, which serve as a foundation for subsequent generators that complete the volumetric data. To ensure the anatomical plausibility of the generated samples, we provide further guidance by generating vascular, airway, and lobular segmentation masks in conjunction with the CT images. The model demonstrates the capability to use both textual input and segmentation tasks to generate synthesized images. Algorithmic comparative assessments and blind evaluations conducted by 10 board-certified radiologists indicate that our approach exhibits superior performance compared to the most advanced models based on GAN and diffusion techniques, especially in accurately retaining crucial anatomical features such as fissure lines and airways. This study focuses on two main objectives: (1) the development of a method for creating images based on textual prompts and anatomical components, and (2) the capability to generate new images conditioned on anatomical elements. These advancements in image generation can be applied to enhance numerous downstream tasks.

    View details for DOI 10.1109/TMI.2024.3415032

    View details for PubMedID 38900619
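
    The hierarchical scheme lends itself to a toy sketch: a first stage maps a text embedding to a coarse volume, and a second stage upsamples and refines it toward full resolution. The single convolutions below stand in for the paper's modified UNets; every name and shape is an assumption.

      import torch
      import torch.nn as nn
      import torch.nn.functional as F

      class LowResStage(nn.Module):
          """Text embedding -> coarse 8x8x8 volume (illustrative)."""
          def __init__(self, text_dim=256):
              super().__init__()
              self.proj = nn.Linear(text_dim, 8 * 8 * 8)
              self.refine = nn.Conv3d(1, 1, 3, padding=1)

          def forward(self, text_emb):
              x = self.proj(text_emb).view(-1, 1, 8, 8, 8)
              return self.refine(x)

      class SuperResStage(nn.Module):
          """Upsample the coarse volume and refine it (illustrative)."""
          def __init__(self):
              super().__init__()
              self.refine = nn.Conv3d(1, 1, 3, padding=1)

          def forward(self, low_res):
              x = F.interpolate(low_res, scale_factor=4, mode="trilinear",
                                align_corners=False)
              return self.refine(x)

      text_emb = torch.randn(2, 256)          # e.g., an encoded radiology report
      low = LowResStage()(text_emb)
      full = SuperResStage()(low)
      print(low.shape, full.shape)            # (2,1,8,8,8) and (2,1,32,32,32)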

  • Generating Realistic Brain MRIs via a Conditional Diffusion Probabilistic Model. Medical image computing and computer-assisted intervention : MICCAI ... International Conference on Medical Image Computing and Computer-Assisted Intervention Peng, W., Adeli, E., Bosschieter, T., Hyun Park, S., Zhao, Q., Pohl, K. M. 2023; 14227: 14-24

    Abstract

    As acquiring MRIs is expensive, neuroscience studies struggle to attain a sufficient number of them for properly training deep learning models. This challenge could be reduced by MRI synthesis, for which Generative Adversarial Networks (GANs) are popular. GANs, however, are commonly unstable and struggle with creating diverse and high-quality data. A more stable alternative is Diffusion Probabilistic Models (DPMs) with a fine-grained training strategy. To overcome their need for extensive computational resources, we propose a conditional DPM (cDPM) with a memory-efficient process that generates realistic-looking brain MRIs. To this end, we train a 2D cDPM to generate an MRI subvolume conditioned on another subset of slices from the same MRI. By generating slices using arbitrary combinations of condition and target slices, the model requires only limited computational resources to learn interdependencies between slices, even if they are spatially far apart. After having learned these dependencies via an attention network, a new anatomy-consistent 3D brain MRI is generated by repeatedly applying the cDPM. Our experiments demonstrate that our method can generate high-quality 3D MRIs that share a distribution similar to that of real MRIs while still diversifying the training set. The code is available at https://github.com/xiaoiker/mask3DMRI_diffusion and will also be released as part of MONAI, at https://github.com/Project-MONAI/GenerativeModels.

    View details for DOI 10.1007/978-3-031-43993-3_2

    View details for PubMedID 38169668

    View details for PubMedCentralID PMC10758344
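
    The condition/target slice split at the core of the cDPM can be pictured in a few lines: a random subset of slices is held out as the condition, and the remaining slices become the generation targets. This is purely illustrative; the actual model is in the repositories linked above.

      import torch

      def split_slices(volume, n_cond):
          # volume: (D, H, W); pick n_cond random slices as the condition set
          perm = torch.randperm(volume.shape[0])
          cond_idx = perm[:n_cond].sort().values
          target_idx = perm[n_cond:].sort().values
          return volume[cond_idx], volume[target_idx]

      vol = torch.randn(32, 64, 64)       # toy MRI volume
      cond, target = split_slices(vol, n_cond=8)
      print(cond.shape, target.shape)     # (8, 64, 64) and (24, 64, 64)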

  • Hyperbolic Deep Neural Networks: A Survey. IEEE transactions on pattern analysis and machine intelligence Peng, W., Varanka, T., Mostafa, A., Shi, H., Zhao, G. 2022; 44 (12): 10023-10044

    Abstract

    Recently, hyperbolic deep neural networks (HDNNs) have been gaining momentum, as deep representations in hyperbolic space provide high-fidelity embeddings with few dimensions, especially for data possessing hierarchical structure. Such hyperbolic neural architectures have quickly been extended to different scientific fields, including natural language processing, single-cell RNA-sequence analysis, graph embedding, financial analysis, and computer vision. The promising results demonstrate their superior capability, significant model compactness, and substantially better physical interpretability than their counterparts in Euclidean space. To stimulate future research, this paper presents a comprehensive review of the literature on the neural components used in the construction of HDNNs, as well as the generalization of leading deep approaches to hyperbolic space. It also presents current applications across various tasks, together with insightful observations, open questions, and promising future directions.

    View details for DOI 10.1109/TPAMI.2021.3136921

    View details for PubMedID 34932472
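
    Two of the building blocks the survey reviews, Möbius addition and the exponential map at the origin of the Poincaré ball, can be written down directly. The formulas below follow the standard hyperbolic-network literature for curvature c; the snippet is a sketch, not code from the survey.

      import torch

      def mobius_add(x, y, c=1.0):
          # Möbius addition on the Poincaré ball with curvature c
          xy = (x * y).sum(-1, keepdim=True)
          x2 = (x * x).sum(-1, keepdim=True)
          y2 = (y * y).sum(-1, keepdim=True)
          num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
          den = 1 + 2 * c * xy + c ** 2 * x2 * y2
          return num / den

      def expmap0(v, c=1.0, eps=1e-7):
          # map a tangent vector at the origin onto the ball
          norm = v.norm(dim=-1, keepdim=True).clamp_min(eps)
          return torch.tanh(c ** 0.5 * norm) * v / (c ** 0.5 * norm)

      x = expmap0(torch.randn(3, 5))
      y = expmap0(torch.randn(3, 5))
      print(mobius_add(x, y).norm(dim=-1))  # norms stay below 1 (inside the ball)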

  • Tripool: Graph triplet pooling for 3D skeleton-based action recognition. PATTERN RECOGNITION Peng, W., Hong, X., Zhao, G. 2021; 115
  • Revealing the Invisible with Model and Data Shrinking for Composite-database Micro-expression Recognition. IEEE transactions on image processing : a publication of the IEEE Signal Processing Society Xia, Z., Peng, W., Khor, H. Q., Feng, X., Zhao, G. 2020; PP

    Abstract

    Composite-database micro-expression recognition is attracting increasing attention as it is more practical for real-world applications. Though a composite database provides more sample diversity for learning good representation models, the important subtle dynamics are prone to disappearing under the domain shift, such that models, especially deep ones, greatly degrade in performance. In this paper, we analyze the influence of learning complexity, including input complexity and model complexity, and discover that lower-resolution input data and a shallower-architecture model help ease the degradation of deep models in the composite-database task. Based on this, we propose a recurrent convolutional network (RCN) to explore the shallower architecture and lower-resolution input data, shrinking model and input complexities simultaneously. Furthermore, we develop three parameter-free modules (i.e., wide expansion, shortcut connection, and attention unit) that integrate with RCN without adding any learnable parameters. These three modules can enhance the representation ability from various perspectives while preserving a not-very-deep architecture for lower-resolution data. Moreover, the three modules can be further combined by an automatic strategy (a neural architecture search strategy), and the searched architecture becomes more robust. Extensive experiments on the MEGC2019 dataset (composed of the existing SMIC, CASME II, and SAMM datasets) have verified the influence of learning complexity and shown that RCNs with the three modules and the searched combination outperform state-of-the-art approaches.

    View details for DOI 10.1109/TIP.2020.3018222

    View details for PubMedID 32845838
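
    A hedged sketch of the parameter-free idea: modules that reroute or reweight activations without adding learnable weights. The attention unit below uses only pooling and a sigmoid; it illustrates the principle rather than the paper's exact design.

      import torch

      def parameter_free_attention(x):
          # x: (B, C, H, W); channel weights from global average pooling only
          w = torch.sigmoid(x.mean(dim=(2, 3), keepdim=True))
          return x * w

      def shortcut(x, fx):
          # identity shortcut: adds no learnable parameters either
          return x + fx

      x = torch.randn(2, 16, 28, 28)              # low-resolution feature map
      out = shortcut(x, parameter_free_attention(x))
      print(out.shape)                            # torch.Size([2, 16, 28, 28])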

  • HRCUNet: Hierarchical Region Contrastive Learning for Segmentation of Breast Tumors in DCE-MRI. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE He, J., Luo, Z., Peng, W., Su, S., Zhao, X., Zhang, G., Li, S. 2024

    View details for DOI 10.1002/cpe.8319

    View details for Web of Science ID 001354204600001

  • Large Language Models in Healthcare and Medical Domain: A Review. INFORMATICS-BASEL Nazi, Z., Peng, W. 2024; 11 (3)
  • Geometric Graph Representation With Learnable Graph Structure and Adaptive AU Constraint for Micro-Expression Recognition. IEEE TRANSACTIONS ON AFFECTIVE COMPUTING Wei, J., Peng, W., Lu, G., Li, Y., Yan, J., Zhao, G. 2024; 15 (3): 1343-1357
  • Rethinking Few-Shot Class-Incremental Learning With Open-Set Hypothesis in Hyperbolic Geometry. IEEE TRANSACTIONS ON MULTIMEDIA Cui, Y., Yu, Z., Peng, W., Tian, Q., Liu, L. 2024; 26: 5897-5910
  • Data Leakage and Evaluation Issues in Micro-Expression Analysis. IEEE TRANSACTIONS ON AFFECTIVE COMPUTING Varanka, T., Li, Y., Peng, W., Zhao, G. 2024; 15 (1): 186-197
  • LSOR: Longitudinally-Consistent Self-Organized Representation Learning. Medical image computing and computer-assisted intervention : MICCAI ... International Conference on Medical Image Computing and Computer-Assisted Intervention Ouyang, J., Zhao, Q., Adeli, E., Peng, W., Zaharchuk, G., Pohl, K. M. 2023; 14220: 279-289

    Abstract

    Interpretability is a key issue when applying deep learning models to longitudinal brain MRIs. One way to address this issue is by visualizing the high-dimensional latent spaces generated by deep learning via self-organizing maps (SOM). SOM separates the latent space into clusters and then maps the cluster centers to a discrete (typically 2D) grid while preserving the high-dimensional relationship between clusters. However, learning SOM in a high-dimensional latent space tends to be unstable, especially in a self-supervised setting. Furthermore, the learned SOM grid does not necessarily capture clinically interesting information, such as brain age. To resolve these issues, we propose the first self-supervised SOM approach that derives a high-dimensional, interpretable representation stratified by brain age solely based on longitudinal brain MRIs (i.e., without demographic or cognitive information). Called Longitudinally-consistent Self-Organized Representation learning (LSOR), the method is stable during training as it relies on soft clustering (vs. the hard cluster assignments used by existing SOMs). Furthermore, our approach generates a latent space stratified according to brain age by aligning trajectories inferred from longitudinal MRIs to the reference vector associated with the corresponding SOM cluster. When applied to longitudinal MRIs of the Alzheimer's Disease Neuroimaging Initiative (ADNI, N=632), LSOR generates an interpretable latent space and achieves comparable or higher accuracy than state-of-the-art representations with respect to the downstream tasks of classification (static vs. progressive mild cognitive impairment) and regression (determining the ADAS-Cog score of all subjects). The code is available at https://github.com/ouyangjiahong/longitudinal-som-single-modality.

    View details for DOI 10.1007/978-3-031-43907-0_27

    View details for PubMedID 37961067
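
    The soft clustering that stabilizes LSOR's training can be sketched as a softmax over negative squared distances to the SOM grid centers, replacing a hard winner-take-all assignment. The temperature and sizes are illustrative; see the linked repository for the actual method.

      import torch

      def soft_som_assignment(z, centers, tau=0.1):
          # z: (B, D) latents; centers: (K, D) flattened SOM grid centers
          d2 = torch.cdist(z, centers) ** 2           # (B, K) squared distances
          return torch.softmax(-d2 / tau, dim=-1)     # soft assignment weights

      z = torch.randn(8, 32)
      centers = torch.randn(4 * 4, 32)                # a 4x4 SOM grid, flattened
      w = soft_som_assignment(z, centers)
      print(w.shape, w.sum(-1))                       # (8, 16); each row sums to 1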

  • Efficient Hyperbolic Perceptron for Image Classification. ELECTRONICS Ahsan, A., Tang, S., Peng, W. 2023; 12 (19)
  • Hyperbolic Uncertainty Aware Semantic Segmentation. IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS Chen, B., Peng, W., Cao, X., Roning, J. 2023
  • Imputing Brain Measurements Across Data Sets via Graph Neural Networks. PRedictive Intelligence in MEdicine. PRIME (Workshop) Wang, Y., Peng, W., Tapert, S. F., Zhao, Q., Pohl, K. M. 2023; 14277: 172-183

    Abstract

    Publicly available data sets of structural MRIs might not contain specific measurements of brain Regions of Interest (ROIs) that are important for training machine learning models. For example, the curvature scores computed by Freesurfer are not released by the Adolescent Brain Cognitive Development (ABCD) Study. One can address this issue by simply reapplying Freesurfer to the data set. However, this approach is generally computationally and labor intensive (e.g., requiring quality control). An alternative is to impute the missing measurements via a deep learning approach. However, state-of-the-art approaches are designed to estimate randomly missing values rather than entire measurements. We therefore propose to re-frame the imputation problem as a prediction task on another (public) data set that contains the missing measurements and shares some ROI measurements with the data sets of interest. A deep learning model is then trained to predict the missing measurements from the shared ones and afterwards is applied to the other data sets. Our proposed algorithm models the dependencies between ROI measurements via a graph neural network (GNN) and accounts for demographic differences in brain measurements (e.g., sex) by feeding the graph encoding into a parallel architecture. The architecture simultaneously optimizes a graph decoder that imputes values and a classifier that predicts demographic factors. We test the approach, called Demographic Aware Graph-based Imputation (DAGI), on imputing the missing Freesurfer measurements of ABCD (N=3760; minimum age 12 years) by training the predictor on those publicly released by the National Consortium on Alcohol and Neurodevelopment in Adolescence (NCANDA, N=540). 5-fold cross-validation on NCANDA reveals that the imputed scores are more accurate than those generated by linear regressors and deep learning models. Adding them to a classifier trained to identify sex also results in higher accuracy than using only the Freesurfer scores provided by ABCD.

    View details for DOI 10.1007/978-3-031-46005-0_15

    View details for PubMedID 37946742
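
    The parallel architecture can be pictured as a graph encoder over ROI measurements feeding two heads at once: a decoder that imputes the missing scores and a classifier for a demographic factor such as sex. The hand-rolled message-passing step and all sizes below are assumptions, not the released DAGI code.

      import torch
      import torch.nn as nn

      class ROIGraphImputer(nn.Module):
          def __init__(self, n_rois, hidden=32):
              super().__init__()
              self.encode = nn.Linear(1, hidden)
              self.decode = nn.Linear(hidden, 1)               # imputed score per ROI
              self.classify = nn.Linear(n_rois * hidden, 2)    # e.g., sex logits

          def forward(self, x, adj):
              # x: (B, n_rois) shared measurements; adj: (n_rois, n_rois), normalized
              h = self.encode(x.unsqueeze(-1))                 # (B, n_rois, hidden)
              h = torch.relu(torch.einsum("ij,bjd->bid", adj, h))  # one message pass
              return self.decode(h).squeeze(-1), self.classify(h.flatten(1))

      n_rois = 10
      adj = torch.eye(n_rois)                                  # toy ROI graph
      scores, logits = ROIGraphImputer(n_rois)(torch.randn(4, n_rois), adj)
      print(scores.shape, logits.shape)                        # (4, 10) and (4, 2)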

  • Modality Unifying Network for Visible-Infrared Person Re-Identification. Yu, H., Cheng, X., Peng, W., Liu, W., Zhao, G. IEEE COMPUTER SOC. 2023: 11151-11161