I am a postdoctoral research fellow at Stanford University, working with Prof. Lei Xing. Before that, I obtained my Ph.D. degree in July 2019 from the Department of Computer Science and Engineering, The Chinese University of Hong Kong, supervised by Prof. Pheng-Ann Heng and Prof. Chi-Wing Fu. Previously, I received my B.Eng. degree from the Department of Computer Science and Technology at Zhejiang University in 2015, under the supervision of Prof. Deng Cai.

My research lies at the intersection of medical image analysis and artificial intelligence. I am dedicated to designing data-efficient learning methods for biomedical image analysis. I also have expertise in deep learning for 3D vision.

Professional Education

  • BEng, Zhejiang University, Computer Science (2015)
  • PhD, The Chinese University of Hong Kong, Computer Science and Engineering (2019)

All Publications

  • Modularized Data-Driven Reconstruction Framework for Non-ideal Focal Spot Effect Elimination in Computed Tomography. Medical physics Zhang, Z., Yu, L., Zhao, W., Xing, L. 2021


    PURPOSE: High-performance computed tomography (CT) plays a vital role in clinical decision making. However, the performance of CT imaging is adversely affected by the non-ideal focal spot size of the X-ray source or degraded by an enlarged focal spot size due to aging. In this work, we aim to develop a deep learning-based strategy to mitigate the problem so that high spatial resolution CT images can be obtained even in the case of a non-ideal X-ray source. METHODS: To reconstruct high-quality CT images from blurred sinograms via joint image and sinogram learning, a cross-domain hybrid model is formulated via deep learning into a modularized data-driven reconstruction (MDR) framework. The proposed MDR framework comprises several blocks, and all the blocks share the same network architecture and network parameters. In essence, each block utilizes two sub-models to generate an estimated blur kernel and a high-quality CT image simultaneously. In this way, our framework generates not only a final high-quality CT image but also a series of intermediate images with gradually improved anatomical details, enhancing the visual perception for clinicians through the dynamic process. We used simulated training datasets to train our model in an end-to-end manner and tested our model on both simulated and realistic experimental datasets. RESULTS: On the simulated testing datasets, our approach increases the information fidelity criterion (IFC) by up to 34.2%, the universal quality index (UQI) by up to 20.3%, the signal-to-noise ratio (SNR) by up to 6.7%, and reduces the root mean square error (RMSE) by up to 10.5% as compared with FBP. Compared with the iterative deconvolution method (NSM), MDR increases IFC by up to 24.7%, UQI by up to 16.7%, SNR by up to 6.0%, and reduces RMSE by up to 9.4%. In the modulation transfer function (MTF) experiment, our method improves the MTF50% by 34.5% and MTF10% by 18.7% as compared with FBP. Similarly, our method improves MTF50% by 14.3% and MTF10% by 0.9% as compared with NSM. Also, our method shows better imaging results at the edges of bony structures and other tiny structures in the experiments using a phantom consisting of ham and a bottle of peanuts. CONCLUSIONS: A modularized data-driven CT reconstruction framework is established to mitigate the blurring effect caused by a non-ideal X-ray source with a relatively large focal spot. The proposed method enables us to obtain high-resolution images with a less ideal X-ray source.

    View details for DOI 10.1002/mp.14785

    View details for PubMedID 33595900

  • Deep Sinogram Completion With Image Prior for Metal Artifact Reduction in CT Images IEEE TRANSACTIONS ON MEDICAL IMAGING Yu, L., Zhang, Z., Li, X., Xing, L. 2021; 40 (1): 228–38


    Computed tomography (CT) has been widely used for medical diagnosis, assessment, and therapy planning and guidance. In reality, CT images may be affected adversely in the presence of metallic objects, which could lead to severe metal artifacts and influence clinical diagnosis or dose calculation in radiation therapy. In this article, we propose a generalizable framework for metal artifact reduction (MAR) by simultaneously leveraging the advantages of image domain and sinogram domain-based MAR techniques. We formulate our framework as a sinogram completion problem and train a neural network (SinoNet) to restore the metal-affected projections. To improve the continuity of the completed projections at the boundary of metal trace and thus alleviate new artifacts in the reconstructed CT images, we train another neural network (PriorNet) to generate a good prior image to guide sinogram learning, and further design a novel residual sinogram learning strategy to effectively utilize the prior image information for better sinogram completion. The two networks are jointly trained in an end-to-end fashion with a differentiable forward projection (FP) operation so that the prior image generation and deep sinogram completion procedures can benefit from each other. Finally, the artifact-reduced CT images are reconstructed using the filtered backward projection (FBP) from the completed sinogram. Extensive experiments on simulated and real artifacts data demonstrate that our method produces superior artifact-reduced results while preserving the anatomical structures and outperforms other MAR methods.

    View details for DOI 10.1109/TMI.2020.3025064

    View details for Web of Science ID 000604883800020

    View details for PubMedID 32956044

  • Self-Supervised Feature Learning via Exploiting Multi-Modal Data for Retinal Disease Diagnosis IEEE TRANSACTIONS ON MEDICAL IMAGING Li, X., Jia, M., Islam, M., Yu, L., Xing, L. 2020; 39 (12): 4023–33


    The automatic diagnosis of various retinal diseases from fundus images is important to support clinical decision-making. However, developing such automatic solutions is challenging due to the requirement of a large amount of human-annotated data. Recently, unsupervised/self-supervised feature learning techniques have received a lot of attention, as they do not need massive annotations. Most current self-supervised methods are analyzed with a single imaging modality, and no existing method utilizes multi-modal images for better results. Considering that the diagnosis of various vitreoretinal diseases can greatly benefit from another imaging modality, e.g., fundus fluorescein angiography (FFA), this paper presents a novel self-supervised feature learning method by effectively exploiting multi-modal data for retinal disease diagnosis. To achieve this, we first synthesize the corresponding FFA modality and then formulate a patient feature-based softmax embedding objective. Our objective learns both modality-invariant features and patient-similarity features. Through this mechanism, the neural network captures the semantically shared information across different modalities and the apparent visual similarity between patients. We evaluate our method on two public benchmark datasets for retinal disease diagnosis. The experimental results demonstrate that our method clearly outperforms other self-supervised feature learning methods and is comparable to the supervised baseline. Our code is available at GitHub.

    View details for DOI 10.1109/TMI.2020.3008871

    View details for Web of Science ID 000595547500024

    View details for PubMedID 32746140

  • DoFE: Domain-Oriented Feature Embedding for Generalizable Fundus Image Segmentation on Unseen Datasets IEEE TRANSACTIONS ON MEDICAL IMAGING Wang, S., Yu, L., Li, K., Yang, X., Fu, C., Heng, P. 2020; 39 (12): 4237–48


    Deep convolutional neural networks have significantly boosted the performance of fundus image segmentation when test datasets have the same distribution as the training datasets. However, in clinical practice, medical images often exhibit variations in appearance for various reasons, e.g., different scanner vendors and image quality. These distribution discrepancies could lead the deep networks to over-fit on the training datasets and lack generalization ability on the unseen test datasets. To alleviate this issue, we present a novel Domain-oriented Feature Embedding (DoFE) framework to improve the generalization ability of CNNs on unseen target domains by exploring the knowledge from multiple source domains. Our DoFE framework dynamically enriches the image features with additional domain prior knowledge learned from multi-source domains to make the semantic features more discriminative. Specifically, we introduce a Domain Knowledge Pool to learn and memorize the prior information extracted from multi-source domains. Then the original image features are augmented with domain-oriented aggregated features, which are induced from the knowledge pool based on the similarity between the input image and multi-source domain images. We further design a novel domain code prediction branch to infer this similarity and employ an attention-guided mechanism to dynamically combine the aggregated features with the semantic features. We comprehensively evaluate our DoFE framework on two fundus image segmentation tasks, including the optic cup and disc segmentation and vessel segmentation. Our DoFE framework generates satisfying segmentation results on unseen datasets and surpasses other domain generalization and network regularization methods.

    View details for DOI 10.1109/TMI.2020.3015224

    View details for Web of Science ID 000595547500042

    View details for PubMedID 32776876

  • Deep Mining External Imperfect Data for Chest X-Ray Disease Screening. IEEE transactions on medical imaging Luo, L., Yu, L., Chen, H., Liu, Q., Wang, X., Xu, J., Heng, P. 2020; 39 (11): 3583–94


    Deep learning approaches have demonstrated remarkable progress in automatic Chest X-ray analysis. The data-driven nature of deep models requires training data to cover a large distribution. Therefore, it is essential to integrate knowledge from multiple datasets, especially for medical images. However, learning a disease classification model with extra Chest X-ray (CXR) data remains challenging. Recent research has demonstrated that a performance bottleneck exists in joint training on different CXR datasets, and few efforts have been made to address the obstacle. In this paper, we argue that incorporating an external CXR dataset leads to imperfect training data, which raises the challenges. Specifically, the imperfection is twofold: domain discrepancy, as the image appearances vary across datasets; and label discrepancy, as different datasets are partially labeled. To this end, we formulate the multi-label thoracic disease classification problem as weighted independent binary tasks according to the categories. For common categories shared across domains, we adopt task-specific adversarial training to alleviate the feature differences. For categories existing in a single dataset, we present uncertainty-aware temporal ensembling of model predictions to further mine the information from the missing labels. In this way, our framework simultaneously models and tackles the domain and label discrepancies, enabling superior knowledge mining ability. We conduct extensive experiments on three datasets with more than 360,000 Chest X-ray images. Our method outperforms other competing models and sets state-of-the-art performance on the official NIH test set with 0.8349 AUC, demonstrating its effectiveness in utilizing the external dataset to improve the internal classification.

    View details for DOI 10.1109/TMI.2020.3000949

    View details for PubMedID 32746106

  • Semi-Supervised Medical Image Classification With Relation-Driven Self-Ensembling Model IEEE TRANSACTIONS ON MEDICAL IMAGING Liu, Q., Yu, L., Luo, L., Dou, Q., Heng, P. 2020; 39 (11): 3429–40


    Training deep neural networks usually requires a large amount of labeled data to obtain good performance. However, in medical image analysis, obtaining high-quality labels for the data is laborious and expensive, as accurately annotating medical images demands expertise knowledge of the clinicians. In this paper, we present a novel relation-driven semi-supervised framework for medical image classification. It is a consistency-based method which exploits the unlabeled data by encouraging the prediction consistency of given input under perturbations, and leverages a self-ensembling model to produce high-quality consistency targets for the unlabeled data. Considering that human diagnosis often refers to previous analogous cases to make reliable decisions, we introduce a novel sample relation consistency (SRC) paradigm to effectively exploit unlabeled data by modeling the relationship information among different samples. Superior to existing consistency-based methods which simply enforce consistency of individual predictions, our framework explicitly enforces the consistency of semantic relation among different samples under perturbations, encouraging the model to explore extra semantic information from unlabeled data. We have conducted extensive experiments to evaluate our method on two public benchmark medical image classification datasets, i.e., skin lesion diagnosis with ISIC 2018 challenge and thorax disease classification with ChestX-ray14. Our method outperforms many state-of-the-art semi-supervised learning methods on both single-label and multi-label image classification scenarios.

    View details for DOI 10.1109/TMI.2020.2995518

    View details for Web of Science ID 000586352000016

    View details for PubMedID 32746096

  • Transformation-Consistent Self-Ensembling Model for Semisupervised Medical Image Segmentation. IEEE transactions on neural networks and learning systems Li, X., Yu, L., Chen, H., Fu, C. W., Xing, L., Heng, P. A. 2020; PP


    A common shortfall of supervised deep learning for medical imaging is the lack of labeled data, which is often expensive and time consuming to collect. This article presents a new semisupervised method for medical image segmentation, where the network is optimized by a weighted combination of a common supervised loss only for the labeled inputs and a regularization loss for both the labeled and unlabeled data. To utilize the unlabeled data, our method encourages consistent predictions of the network-in-training for the same input under different perturbations. With the semisupervised segmentation tasks, we introduce a transformation-consistent strategy in the self-ensembling model to enhance the regularization effect for pixel-level predictions. To further improve the regularization effects, we extend the transformation in a more generalized form including scaling and optimize the consistency loss with a teacher model, which is an averaging of the student model weights. We extensively validated the proposed semisupervised method on three typical yet challenging medical image segmentation tasks: 1) skin lesion segmentation from dermoscopy images in the International Skin Imaging Collaboration (ISIC) 2017 data set; 2) optic disk (OD) segmentation from fundus images in the Retinal Fundus Glaucoma Challenge (REFUGE) data set; and 3) liver segmentation from volumetric CT scans in the Liver Tumor Segmentation Challenge (LiTS) data set. Compared with state-of-the-art, our method shows superior performance on the challenging 2-D/3-D medical images, demonstrating the effectiveness of our semisupervised method for medical image segmentation.

    View details for DOI 10.1109/TNNLS.2020.2995319

    View details for PubMedID 32479407
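The abstract above describes a teacher model formed by averaging the student model weights. A minimal sketch of one common way to do this, an exponential moving average (EMA), is given below. The function name `ema_update`, the `decay` value, and the use of plain Python lists for weights are illustrative assumptions and are not taken from the paper.

```python
def ema_update(teacher, student, decay=0.99):
    """Blend student weights into the teacher with an exponential moving average.

    Assumed rule (not confirmed by the abstract):
        new_teacher = decay * teacher + (1 - decay) * student
    """
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


# Toy example: the teacher starts at zero and drifts toward fixed student weights.
teacher = [0.0, 0.0]
student = [1.0, 2.0]
for _ in range(3):
    teacher = ema_update(teacher, student, decay=0.5)
```

In a real training loop the student weights would change at every step, and the decay would usually be much closer to 1 so that the teacher changes slowly.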

  • MS-Net: Multi-Site Network for Improving Prostate Segmentation with Heterogeneous MRI Data. IEEE transactions on medical imaging Liu, Q., Dou, Q., Yu, L., Heng, P. A. 2020


    Automated prostate segmentation in MRI is highly demanded for computer-assisted diagnosis. Recently, a variety of deep learning methods have achieved remarkable progress in this task, usually relying on large amounts of training data. Due to the nature of scarcity for medical images, it is important to effectively aggregate data from multiple sites for robust model training, to alleviate the insufficiency of single-site samples. However, the prostate MRIs from different sites present heterogeneity due to the differences in scanners and imaging protocols, raising challenges for effective ways of aggregating multi-site data for network training. In this paper, we propose a novel multisite network (MS-Net) for improving prostate segmentation by learning robust representations, leveraging multiple sources of data. To compensate for the inter-site heterogeneity of different MRI datasets, we develop Domain-Specific Batch Normalization layers in the network backbone, enabling the network to estimate statistics and perform feature normalization for each site separately. Considering the difficulty of capturing the shared knowledge from multiple datasets, a novel learning paradigm, i.e., Multi-site-guided Knowledge Transfer, is proposed to enhance the kernels to extract more generic representations from multi-site data. Extensive experiments on three heterogeneous prostate MRI datasets demonstrate that our MS-Net improves the performance across all datasets consistently, and outperforms state-of-the-art methods for multi-site learning.

    View details for DOI 10.1109/TMI.2020.2974574

    View details for PubMedID 32078543
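The MS-Net abstract above mentions Domain-Specific Batch Normalization layers that estimate statistics and normalize features for each site separately. The sketch below illustrates only that core idea on a flat list of feature values. The function name `normalize_per_site` and the toy data are hypothetical, and the sketch omits the learnable scale and shift parameters and the running statistics that a real batch normalization layer would maintain.

```python
from statistics import fmean, pvariance


def normalize_per_site(values, sites, eps=1e-5):
    """Normalize each value using the mean and variance of its own site only."""
    groups = {}
    for value, site in zip(values, sites):
        groups.setdefault(site, []).append(value)
    # One (mean, variance) pair per site, instead of one pair for the whole batch.
    stats = {site: (fmean(g), pvariance(g)) for site, g in groups.items()}
    return [
        (value - stats[site][0]) / (stats[site][1] + eps) ** 0.5
        for value, site in zip(values, sites)
    ]


# Two sites with very different intensity ranges end up on a comparable scale.
values = [1.0, 3.0, 10.0, 30.0]
sites = ["A", "A", "B", "B"]
normalized = normalize_per_site(values, sites)
```

Normalizing with pooled statistics instead would leave site B's values dominating the batch, which is the kind of inter-site heterogeneity the abstract aims to compensate for.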

  • Automatic intraprostatic lesion segmentation in multiparametric magnetic resonance images with proposed multiple branch Unet. Medical physics Chen, Y., Xing, L., Yu, L., Bagshaw, H. P., Buyyounouski, M. K., Han, B. 2020


    Contouring intraprostatic lesions is a prerequisite for dose-escalating these lesions in radiotherapy to improve the local cancer control. In this study, a deep learning-based approach was developed for automatic intraprostatic lesion segmentation in multiparametric magnetic resonance imaging (mpMRI) images contributing to the clinical practice. mpMRI images from 136 patient cases were collected from our institution, and all these cases contained suspicious lesions with Prostate Imaging Reporting and Data System (PI-RADS) score ≥ 4. The contours of the lesion and prostate were manually created on axial T2-weighted (T2W), apparent diffusion coefficient (ADC) and high b-value diffusion-weighted imaging (DWI) images to provide the ground truth data. Then a multiple branch UNet (MB-UNet) was proposed for the segmentation of indistinct targets in multi-modality MRI images. An encoder module was designed with three branches for the three MRI modalities separately, to fully extract the high-level features provided by different MRI modalities; an input module was added by using three sub-branches for three consecutive image slices, to consider the contour consistency among different image slices; a deep supervision strategy was also integrated into the network to speed up the convergence of the network and improve the performance. The probability maps of the background, normal prostate and lesion were output by the network to generate the segmentation of the lesion, and the performance was evaluated using the Dice similarity coefficient (DSC) as the main metric. A total of 162 lesions were contoured on 652 image slices, with 119 lesions in the peripheral zone, 38 in the transition zone, 4 in the central zone and 1 in the anterior fibromuscular stroma. All prostates were also contoured on 1,264 image slices. As for the segmentation of lesions in the testing set, MB-UNet achieved a per case DSC of 0.6333, specificity of 0.9993, sensitivity of 0.7056; and global DSC of 0.7205, specificity of 0.9993, sensitivity of 0.7409. All three deep learning strategies adopted in this study contributed to the performance improvement of the MB-UNet. Missing the DWI modality degraded the segmentation performance more markedly than missing either of the other two modalities. A deep learning-based approach with the proposed MB-UNet was developed to automatically segment suspicious lesions in mpMRI images. This study makes it feasible to adopt intraprostatic lesion boosting in clinical practice to achieve better outcomes.

    View details for DOI 10.1002/mp.14517

    View details for PubMedID 33012016
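The abstract above reports both a per case and a global Dice similarity coefficient (DSC). The sketch below shows the standard DSC formula, 2|A∩B| / (|A| + |B|), on flat binary masks. The reading of "per case" as the mean of per-case scores and "global" as one score over all cases pooled together is an assumption made for illustration, as are the function names and the toy masks. The abstract does not define either term.

```python
def dice(pred, truth):
    """Dice similarity coefficient between two flat binary masks (lists of 0/1)."""
    intersection = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def per_case_dice(cases):
    """Assumed meaning of 'per case': average the DSC computed for each case."""
    return sum(dice(pred, truth) for pred, truth in cases) / len(cases)


def global_dice(cases):
    """Assumed meaning of 'global': pool all pixels from all cases, then compute one DSC."""
    pooled_pred = [v for pred, _ in cases for v in pred]
    pooled_truth = [v for _, truth in cases for v in truth]
    return dice(pooled_pred, pooled_truth)


# Toy data: (prediction, ground truth) pairs for two hypothetical cases.
cases = [
    ([1, 1, 0, 0], [1, 0, 1, 0]),
    ([1, 1, 1, 1], [1, 1, 1, 1]),
]
```

Under these assumptions the two numbers can differ because pooling weights large lesions more heavily, whereas per-case averaging gives every case equal weight.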

  • RMDL: Recalibrated multi-instance deep learning for whole slide gastric image classification. Medical image analysis Wang, S., Zhu, Y., Yu, L., Chen, H., Lin, H., Wan, X., Fan, X., Heng, P. A. 2019; 58: 101549


    The whole slide histopathology images (WSIs) play a critical role in gastric cancer diagnosis. However, due to the large scale of WSIs and various sizes of the abnormal area, how to select informative regions and analyze them are quite challenging during the automatic diagnosis process. The multi-instance learning based on the most discriminative instances can be of great benefit for whole slide gastric image diagnosis. In this paper, we design a recalibrated multi-instance deep learning method (RMDL) to address this challenging problem. We first select the discriminative instances, and then utilize these instances to diagnose diseases based on the proposed RMDL approach. The designed RMDL network is capable of capturing instance-wise dependencies and recalibrating instance features according to the importance coefficient learned from the fused features. Furthermore, we build a large whole-slide gastric histopathology image dataset with detailed pixel-level annotations. Experimental results on the constructed gastric dataset demonstrate the significant improvement on the accuracy of our proposed framework compared with other state-of-the-art multi-instance learning methods. Moreover, our method is general and can be extended to other diagnosis tasks of different cancer types based on WSIs.

    View details for DOI 10.1016/

    View details for PubMedID 31499320

  • CANet: Cross-disease Attention Network for Joint Diabetic Retinopathy and Diabetic Macular Edema Grading. IEEE transactions on medical imaging Li, X., Hu, X., Yu, L., Zhu, L., Fu, C. W., Heng, P. A. 2019


    Diabetic retinopathy (DR) and diabetic macular edema (DME) are the leading causes of permanent blindness in the working-age population. Automatic grading of DR and DME helps ophthalmologists design tailored treatments for patients, and thus is of vital importance in clinical practice. However, prior works grade either DR or DME, and ignore the correlation between DR and its complication, i.e., DME. Moreover, location information, e.g., macula and soft and hard exudate annotations, is widely used as a prior for grading. Such annotations are costly to obtain, hence it is desirable to develop automatic grading methods with only image-level supervision. In this paper, we present a novel cross-disease attention network (CANet) to jointly grade DR and DME by exploring the internal relationship between the diseases with only image-level supervision. Our key contributions include the disease-specific attention module to selectively learn useful features for individual diseases, and the disease-dependent attention module to further capture the internal relationship between the two diseases. We integrate these two attention modules in a deep network to produce disease-specific and disease-dependent features, and to maximize the overall performance jointly for grading DR and DME. We evaluate our network on two public benchmark datasets, i.e., ISBI 2018 IDRiD challenge dataset and Messidor dataset. Our method achieves the best result on the ISBI 2018 IDRiD challenge dataset and outperforms other methods on the Messidor dataset. Our code is publicly available at

    View details for DOI 10.1109/TMI.2019.2951844

    View details for PubMedID 31714219

  • Towards Automated Semantic Segmentation in Prenatal Volumetric Ultrasound IEEE TRANSACTIONS ON MEDICAL IMAGING Yang, X., Yu, L., Li, S., Wen, H., Luo, D., Bian, C., Qin, J., Ni, D., Heng, P. 2019; 38 (1): 180–93


    Volumetric ultrasound is rapidly emerging as a viable imaging modality for routine prenatal examinations. Biometrics obtained from the volumetric segmentation shed light on the reformation of precise maternal and fetal health monitoring. However, the poor image quality, low contrast, boundary ambiguity, and complex anatomical shapes conspire toward a great lack of efficient tools for the segmentation. This makes 3-D ultrasound difficult to interpret and hinders the widespread use of 3-D ultrasound in obstetrics. In this paper, we are looking at the problem of semantic segmentation in prenatal ultrasound volumes. Our contribution is threefold: 1) we propose the first fully automatic framework to simultaneously segment multiple anatomical structures with intensive clinical interest, including fetus, gestational sac, and placenta, which remains a rarely studied and arduous challenge; 2) we propose a composite architecture for dense labeling, in which a customized 3-D fully convolutional network explores spatial intensity concurrency for initial labeling, while a multi-directional recurrent neural network (RNN) encodes spatial sequentiality to combat boundary ambiguity for significant refinement; and 3) we introduce a hierarchical deep supervision mechanism to boost the information flow within RNN and fit the latent sequence hierarchy in fine scales, and further improve the segmentation results. Extensively verified on in-house large data sets, our method illustrates a superior segmentation performance, decent agreement with expert measurements and high reproducibility against scanning variations, and thus is promising in advancing the prenatal ultrasound examinations.

    View details for DOI 10.1109/TMI.2018.2858779

    View details for Web of Science ID 000455110500018

    View details for PubMedID 30040635

  • Patch-based Output Space Adversarial Learning for Joint Optic Disc and Cup Segmentation. IEEE transactions on medical imaging Wang, S., Yu, L., Yang, X., Fu, C. W., Heng, P. A. 2019


    Glaucoma is a leading cause of irreversible blindness. Accurate segmentation of the optic disc (OD) and cup (OC) from fundus images is beneficial to glaucoma screening and diagnosis. Recently, convolutional neural networks have demonstrated promising progress in joint OD and OC segmentation. However, affected by the domain shift among different datasets, deep networks are severely hindered in generalizing across different scanners and institutions. In this paper, we present a novel patch-based Output Space Adversarial Learning framework (pOSAL) to jointly and robustly segment the OD and OC from different fundus image datasets. We first devise a lightweight and efficient segmentation network as a backbone. Considering the specific morphology of OD and OC, a novel morphology-aware segmentation loss is proposed to guide the network to generate accurate and smooth segmentation. Our pOSAL framework then exploits unsupervised domain adaptation to address the domain shift challenge by encouraging the segmentation in the target domain to be similar to the source ones. Since the whole-segmentation-based adversarial loss is insufficient to drive the network to capture segmentation details, we further design the pOSAL in a patch-based fashion to enable fine-grained discrimination on local segmentation details. We extensively evaluate our pOSAL framework and demonstrate its effectiveness in improving the segmentation performance on three public retinal fundus image datasets, i.e., Drishti-GS, RIM-ONE-r3, and REFUGE. Furthermore, our pOSAL framework achieved the first place in the OD and OC segmentation tasks in MICCAI 2018 Retinal Fundus Glaucoma Challenge.

    View details for DOI 10.1109/TMI.2019.2899910

    View details for PubMedID 30794170

  • SV-RCNet: Workflow Recognition From Surgical Videos Using Recurrent Convolutional Network IEEE TRANSACTIONS ON MEDICAL IMAGING Jin, Y., Dou, Q., Chen, H., Yu, L., Qin, J., Fu, C., Heng, P. 2018; 37 (5): 1114–26


    We propose an analysis of surgical videos that is based on a novel recurrent convolutional network (SV-RCNet), specifically for automatic workflow recognition from surgical videos online, which is a key component for developing context-aware computer-assisted intervention systems. Different from previous methods which harness visual and temporal information separately, the proposed SV-RCNet seamlessly integrates a convolutional neural network (CNN) and a recurrent neural network (RNN) to form a novel recurrent convolutional architecture in order to take full advantage of the complementary information of visual and temporal features learned from surgical videos. We effectively train the SV-RCNet in an end-to-end manner so that the visual representations and sequential dynamics can be jointly optimized in the learning process. In order to produce more discriminative spatio-temporal features, we exploit a deep residual network (ResNet) and a long short-term memory (LSTM) network to extract visual features and temporal dependencies, respectively, and integrate them into the SV-RCNet. Moreover, based on the phase transition-sensitive predictions from the SV-RCNet, we propose a simple yet effective inference scheme, namely the prior knowledge inference (PKI), by leveraging the natural characteristic of surgical video. Such a strategy further improves the consistency of results and largely boosts the recognition performance. Extensive experiments have been conducted with the MICCAI 2016 Modeling and Monitoring of Computer Assisted Interventions Workflow Challenge dataset and Cholec80 dataset to validate SV-RCNet. Our approach not only achieves superior performance on these two datasets but also outperforms the state-of-the-art methods by a significant margin.

    View details for DOI 10.1109/TMI.2017.2787657

    View details for Web of Science ID 000431544500004

    View details for PubMedID 29727275

  • VoxResNet: Deep voxelwise residual networks for brain segmentation from 3D MR images NEUROIMAGE Chen, H., Dou, Q., Yu, L., Qin, J., Heng, P. 2018; 170: 446–55


    Segmentation of key brain tissues from 3D medical images is of great significance for brain disease diagnosis, progression assessment and monitoring of neurologic conditions. While manual segmentation is time-consuming, laborious, and subjective, automated segmentation is quite challenging due to the complicated anatomical environment of brain and the large variations of brain tissues. We propose a novel voxelwise residual network (VoxResNet) with a set of effective training schemes to cope with this challenging problem. The main merit of residual learning is that it can alleviate the degradation problem when training a deep network so that the performance gains achieved by increasing the network depth can be fully leveraged. With this technique, our VoxResNet is built with 25 layers, and hence can generate more representative features to deal with the large variations of brain tissues than its rivals using hand-crafted features or shallower networks. In order to effectively train such a deep network with limited training data for brain segmentation, we seamlessly integrate multi-modality and multi-level contextual information into our network, so that the complementary information of different modalities can be harnessed and features of different scales can be exploited. Furthermore, an auto-context version of the VoxResNet is proposed by combining the low-level image appearance features, implicit shape information, and high-level context together for further improving the segmentation performance. Extensive experiments on the well-known benchmark (i.e., MRBrainS) of brain segmentation from 3D magnetic resonance (MR) images corroborated the efficacy of the proposed VoxResNet. Our method achieved the first place in the challenge out of 37 competitors including several state-of-the-art brain segmentation methods. Our method is inherently general and can be readily applied as a powerful tool to many brain-related studies, where accurate segmentation of brain structures is critical.

    View details for DOI 10.1016/j.neuroimage.2017.04.041

    View details for Web of Science ID 000429940900036

    View details for PubMedID 28445774
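    The residual learning idea mentioned in the abstract above can be illustrated with a minimal sketch. This is not the VoxResNet architecture: it only shows the generic skip connection y = x + F(x) on a plain list of numbers. The helper names (`residual_block`, `zero_residual`, `scaled_relu`) are hypothetical and chosen for illustration.

```python
from typing import Callable, List

Vector = List[float]


def residual_block(x: Vector, residual_fn: Callable[[Vector], Vector]) -> Vector:
    """Return x + F(x): the skip connection adds the input back to the residual branch."""
    fx = residual_fn(x)
    if len(fx) != len(x):
        raise ValueError("residual branch must preserve the feature shape")
    return [xi + fi for xi, fi in zip(x, fx)]


def zero_residual(x: Vector) -> Vector:
    """Hypothetical residual branch whose output is all zeros (e.g. zero weights)."""
    return [0.0 for _ in x]


def scaled_relu(x: Vector) -> Vector:
    """Hypothetical residual branch: ReLU followed by a fixed 0.5 scaling."""
    return [0.5 * max(0.0, xi) for xi in x]


features = [1.0, -2.0, 3.0]

# With a zero residual branch the block reduces to the identity mapping.
identity_out = residual_block(features, zero_residual)

# Stacking an extra "do nothing" block on top leaves the earlier result unchanged.
stacked_out = residual_block(residual_block(features, scaled_relu), zero_residual)
```

    Because a block with a zero residual branch passes its input through unchanged, adding such blocks cannot make the representation worse, which is one common intuition for why residual connections ease the training of deeper networks.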

  • PU-Net: Point Cloud Upsampling Network Yu, L., Li, X., Fu, C., Cohen-Or, D., Heng, P. IEEE. 2018: 2790–99
  • 3D deeply supervised network for automated segmentation of volumetric medical images Dou, Q., Yu, L., Chen, H., Jin, Y., Yang, X., Qin, J., Heng, P. ELSEVIER. 2017: 40–54


    While deep convolutional neural networks (CNNs) have achieved remarkable success in 2D medical image segmentation, it is still a difficult task for CNNs to segment important organs or structures from 3D medical images owing to several mutually affected challenges, including the complicated anatomical environments in volumetric images, optimization difficulties of 3D networks and inadequacy of training samples. In this paper, we present a novel and efficient 3D fully convolutional network equipped with a 3D deep supervision mechanism to comprehensively address these challenges; we call it 3D DSN. Our proposed 3D DSN is capable of conducting volume-to-volume learning and inference, which can eliminate redundant computations and alleviate the risk of over-fitting on limited training data. More importantly, the 3D deep supervision mechanism can effectively cope with the optimization problem of gradients vanishing or exploding when training a 3D deep model, accelerating the convergence speed and simultaneously improving the discrimination capability. Such a mechanism is developed by deriving an objective function that directly guides the training of both lower and upper layers in the network, so that the adverse effects of unstable gradient changes can be counteracted during the training procedure. We also employ a fully connected conditional random field model as a post-processing step to refine the segmentation results. We have extensively validated the proposed 3D DSN on two typical yet challenging volumetric medical image segmentation tasks: (i) liver segmentation from 3D CT scans and (ii) whole heart and great vessels segmentation from 3D MR images, by participating in two grand challenges held in conjunction with MICCAI. We achieved segmentation results competitive with state-of-the-art approaches in both challenges at a much faster speed, corroborating the effectiveness of our proposed 3D DSN.

    View details for DOI 10.1016/

    View details for Web of Science ID 000408073800005

    View details for PubMedID 28526212
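    The abstract above describes an objective function that guides both lower and upper layers. A minimal sketch of one common way to express such deep supervision follows: a main loss plus weighted auxiliary losses computed from intermediate-layer predictions. The weighting scheme, the function names, and the toy probabilities are assumptions for illustration and are not taken from the paper.

```python
import math
from typing import Sequence


def cross_entropy(probs: Sequence[float], label: int) -> float:
    """Negative log-likelihood of the true class for a single voxel."""
    return -math.log(max(probs[label], 1e-12))


def deeply_supervised_loss(
    main_probs: Sequence[float],
    aux_probs: Sequence[Sequence[float]],
    label: int,
    aux_weights: Sequence[float],
) -> float:
    """Main-output loss plus a weighted sum of auxiliary (intermediate-layer) losses."""
    if len(aux_probs) != len(aux_weights):
        raise ValueError("one weight is required per auxiliary output")
    loss = cross_entropy(main_probs, label)
    for weight, probs in zip(aux_weights, aux_probs):
        loss += weight * cross_entropy(probs, label)
    return loss


# Toy two-class example for one voxel whose true label is class 1.
main_output = [0.2, 0.8]                 # prediction from the last layer
aux_outputs = [[0.4, 0.6], [0.5, 0.5]]   # predictions from two hypothetical earlier layers
total = deeply_supervised_loss(main_output, aux_outputs, label=1, aux_weights=[0.3, 0.3])
```

    In this formulation the auxiliary terms contribute gradient signal directly at the earlier layers, which is the usual motivation for deep supervision when gradients from the final output alone are unstable.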

  • Multilevel Contextual 3-D CNNs for False Positive Reduction in Pulmonary Nodule Detection IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING Dou, Q., Chen, H., Yu, L., Qin, J., Heng, P. 2017; 64 (7): 1558–67


    False positive reduction is one of the most crucial components in an automated pulmonary nodule detection system, which plays an important role in lung cancer diagnosis and early treatment. The objective of this paper is to effectively address the challenges in this task and therefore to accurately discriminate the true nodules from a large number of candidates. We propose a novel method employing three-dimensional (3-D) convolutional neural networks (CNNs) for false positive reduction in automated pulmonary nodule detection from volumetric computed tomography (CT) scans. Compared with its 2-D counterparts, the 3-D CNNs can encode richer spatial information and extract more representative features via their hierarchical architecture trained with 3-D samples. More importantly, we further propose a simple yet effective strategy to encode multilevel contextual information to meet the challenges coming with the large variations and hard mimics of pulmonary nodules. The proposed framework has been extensively validated in the LUNA16 challenge held in conjunction with ISBI 2016, where we achieved the highest competition performance metric (CPM) score in the false positive reduction track. Experimental results demonstrated the importance and effectiveness of integrating multilevel contextual information into 3-D CNN framework for automated pulmonary nodule detection in volumetric CT data. While our method is tailored for pulmonary nodule detection, the proposed framework is general and can be easily extended to many other 3-D object detection tasks from volumetric medical images, where the targeting objects have large variations and are accompanied by a number of hard mimics.

    View details for DOI 10.1109/TBME.2016.2613502

    View details for Web of Science ID 000404025100014

    View details for PubMedID 28113302

  • Comparative Validation of Polyp Detection Methods in Video Colonoscopy: Results From the MICCAI 2015 Endoscopic Vision Challenge IEEE TRANSACTIONS ON MEDICAL IMAGING Bernal, J., Tajbakhsh, N., Sanchez, F., Matuszewski, B. J., Chen, H., Yu, L., Angermann, Q., Romain, O., Rustad, B., Balasingham, I., Pogorelov, K., Choi, S., Debard, Q., Maier-Hein, L., Speidel, S., Stoyanov, D., Brandao, P., Cordova, H., Sanchez-Montes, C., Gurudu, S. R., Fernandez-Esparrach, G., Dray, X., Liang, J., Histace, A. 2017; 36 (6): 1231–49


    Colonoscopy is the gold standard for colon cancer screening though some polyps are still missed, thus preventing early disease detection and treatment. Several computational systems have been proposed to assist polyp detection during colonoscopy but so far without consistent evaluation. The lack of publicly available annotated databases has made it difficult to compare methods and to assess if they achieve performance levels acceptable for clinical use. The Automatic Polyp Detection sub-challenge, conducted as part of the Endoscopic Vision Challenge at the international conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) in 2015, was an effort to address this need. In this paper, we report the results of this comparative evaluation of polyp detection methods, as well as describe additional experiments to further explore differences between methods. We define performance metrics and provide evaluation databases that allow comparison of multiple methodologies. Results show that convolutional neural networks are the state of the art. Nevertheless, it is also demonstrated that combining different methodologies can lead to an improved overall performance.

    View details for DOI 10.1109/TMI.2017.2664042

    View details for Web of Science ID 000402722500003

    View details for PubMedID 28182555

  • Automated Melanoma Recognition in Dermoscopy Images via Very Deep Residual Networks IEEE TRANSACTIONS ON MEDICAL IMAGING Yu, L., Chen, H., Dou, Q., Qin, J., Heng, P. 2017; 36 (4): 994–1004


    Automated melanoma recognition in dermoscopy images is a very challenging task due to the low contrast of skin lesions, the huge intraclass variation of melanomas, the high degree of visual similarity between melanoma and non-melanoma lesions, and the existence of many artifacts in the image. In order to meet these challenges, we propose a novel method for melanoma recognition by leveraging very deep convolutional neural networks (CNNs). Compared with existing methods employing either low-level hand-crafted features or CNNs with shallower architectures, our substantially deeper networks (more than 50 layers) can acquire richer and more discriminative features for more accurate recognition. To take full advantage of very deep networks, we propose a set of schemes to ensure effective training and learning under limited training data. First, we apply residual learning to cope with the degradation and overfitting problems when a network goes deeper. This technique can ensure that our networks benefit from the performance gains achieved by increasing network depth. Then, we construct a fully convolutional residual network (FCRN) for accurate skin lesion segmentation, and further enhance its capability by incorporating a multi-scale contextual information integration scheme. Finally, we seamlessly integrate the proposed FCRN (for segmentation) and other very deep residual networks (for classification) to form a two-stage framework. This framework enables the classification network to extract more representative and specific features based on segmented results instead of the whole dermoscopy images, further alleviating the insufficiency of training data. The proposed framework is extensively evaluated on the ISBI 2016 Skin Lesion Analysis Towards Melanoma Detection Challenge dataset. Experimental results demonstrate the significant performance gains of the proposed framework, ranking first in classification and second in segmentation among 25 teams and 28 teams, respectively. This study corroborates that very deep CNNs with effective training mechanisms can be employed to solve complicated medical image analysis tasks, even with limited training data.

    View details for DOI 10.1109/TMI.2016.2642839

    View details for Web of Science ID 000400868100012

    View details for PubMedID 28026754

  • DCAN: Deep contour-aware networks for object instance segmentation from histology images MEDICAL IMAGE ANALYSIS Chen, H., Qi, X., Yu, L., Dou, Q., Qin, J., Heng, P. 2017; 36: 135–46


    In histopathological image analysis, the morphology of histological structures, such as glands and nuclei, has been routinely adopted by pathologists to assess the malignancy degree of adenocarcinomas. Accurate detection and segmentation of these objects of interest from histology images is an essential prerequisite to obtain reliable morphological statistics for quantitative diagnosis. While manual annotation is error-prone, time-consuming and operator-dependent, automated detection and segmentation of objects of interest from histology images can be very challenging due to the large appearance variation, existence of strong mimics, and serious degeneration of histological structures. In order to meet these challenges, we propose a novel deep contour-aware network (DCAN) under a unified multi-task learning framework for more accurate detection and segmentation. In the proposed network, multi-level contextual features are explored based on an end-to-end fully convolutional network (FCN) to deal with the large appearance variation. We further propose to employ an auxiliary supervision mechanism to overcome the problem of vanishing gradients when training such a deep network. More importantly, our network can not only output accurate probability maps of histological objects, but also depict clear contours simultaneously for separating clustered object instances, which further boosts the segmentation performance. Our method ranked first in two histological object segmentation challenges, including the 2015 MICCAI Gland Segmentation Challenge and the 2015 MICCAI Nuclei Segmentation Challenge. Extensive experiments on these two challenging datasets demonstrate the superior performance of our method, surpassing all the other methods by a significant margin.

    View details for DOI 10.1016/

    View details for Web of Science ID 000393247900012

    View details for PubMedID 27898306
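    The abstract above notes that predicted contours help separate clustered object instances. The sketch below illustrates one plausible way to combine an object probability map with a contour probability map: keep pixels that look like object and not like contour, then label connected components. The thresholds, function names, and the toy maps are assumptions for illustration, not the paper's exact procedure.

```python
from typing import List

Grid = List[List[float]]


def fuse_object_and_contour(
    prob: Grid, contour: Grid, t_obj: float = 0.5, t_contour: float = 0.5
) -> List[List[bool]]:
    """Keep a pixel only if it is likely object AND unlikely to be a contour."""
    return [
        [p >= t_obj and c < t_contour for p, c in zip(prob_row, contour_row)]
        for prob_row, contour_row in zip(prob, contour)
    ]


def label_instances(mask: List[List[bool]]) -> List[List[int]]:
    """Label 4-connected foreground components with 1, 2, ... via flood fill."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or labels[r][c]:
                continue
            next_label += 1
            labels[r][c] = next_label
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not labels[ny][nx]:
                        labels[ny][nx] = next_label
                        stack.append((ny, nx))
    return labels


# Toy maps: two touching objects with a high-contour column between them.
object_prob = [[0.9] * 5, [0.9] * 5]
contour_prob = [[0.1, 0.1, 0.9, 0.1, 0.1], [0.1, 0.1, 0.9, 0.1, 0.1]]
no_contour = [[0.0] * 5, [0.0] * 5]

separated = label_instances(fuse_object_and_contour(object_prob, contour_prob))
merged = label_instances(fuse_object_and_contour(object_prob, no_contour))
```

    In this toy case the object map alone yields a single merged region, while subtracting the contour column splits it into two labelled instances.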

  • Integrating Online and Offline Three-Dimensional Deep Learning for Automated Polyp Detection in Colonoscopy Videos IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS Yu, L., Chen, H., Dou, Q., Qin, J., Heng, P. 2017; 21 (1): 65–75


    Automated polyp detection in colonoscopy videos has been demonstrated to be a promising way for colorectal cancer prevention and diagnosis. Traditional manual screening is time consuming, operator dependent, and error prone; hence, an automated detection approach is in high demand in clinical practice. However, automated polyp detection is very challenging due to high intraclass variations in polyp size, color, shape, and texture, and low interclass variations between polyps and hard mimics. In this paper, we propose a novel offline and online three-dimensional (3-D) deep learning integration framework by leveraging the 3-D fully convolutional network (3D-FCN) to tackle this challenging problem. Compared with the previous methods employing hand-crafted features or 2-D convolutional neural networks, the 3D-FCN is capable of learning more representative spatio-temporal features from colonoscopy videos, and hence has more powerful discrimination capability. More importantly, we propose a novel online learning scheme to deal with the problem of limited training data by harnessing the specific information of an input video in the learning process. We integrate offline and online learning to effectively reduce the number of false positives generated by the offline network and further improve the detection performance. Extensive experiments on the dataset of the MICCAI 2015 Challenge on Polyp Detection demonstrated the better performance of our method when compared with other competitors.

    View details for DOI 10.1109/JBHI.2016.2637004

    View details for Web of Science ID 000395538500008

    View details for PubMedID 28114049

  • 3D U-net with Multi-level Deep Supervision: Fully Automatic Segmentation of Proximal Femur in 3D MR Images Zeng, G., Yang, X., Li, J., Yu, L., Heng, P., Zheng, G., Wang, Q., Shi, Y., Suk, H. I., Suzuki, K. SPRINGER INTERNATIONAL PUBLISHING AG. 2017: 274–82
  • Deep Cascaded Networks for Sparsely Distributed Object Detection from Medical Images DEEP LEARNING FOR MEDICAL IMAGE ANALYSIS Chen, H., Dou, Q., Yu, L., Qin, J., Zhao, L., Mok, V. T., Wang, D., Shi, L., Heng, P., Zhou, S. K., Greenspan, H., Shen, D. 2017: 133–54
  • AGNet: Attention-Guided Network for Surgical Tool Presence Detection Hu, X., Yu, L., Chen, H., Qin, J., Heng, P., Cardoso, M. J., Arbel, T. SPRINGER INTERNATIONAL PUBLISHING AG. 2017: 186–94
  • 3D FractalNet: Dense Volumetric Segmentation for Cardiovascular MRI Volumes Yu, L., Yang, X., Qin, J., Heng, P., Zuluaga, M. A., Bhatia, K., Kainz, B., Moghari, M. H., Pace, D. F. SPRINGER INTERNATIONAL PUBLISHING AG. 2017: 103–10
  • Fine-Grained Recurrent Neural Networks for Automatic Prostate Segmentation in Ultrasound Images Yang, X., Yu, L., Wu, L., Wang, Y., Ni, D., Qin, J., Heng, P., AAAI ASSOC ADVANCEMENT ARTIFICIAL INTELLIGENCE. 2017: 1633–39
  • Volumetric ConvNets with Mixed Residual Connections for Automated Prostate Segmentation from 3D MR Images Yu, L., Yang, X., Chen, H., Qin, J., Heng, P., AAAI ASSOC ADVANCEMENT ARTIFICIAL INTELLIGENCE. 2017: 66–72
  • Automatic Detection of Cerebral Microbleeds From MR Images via 3D Convolutional Neural Networks IEEE TRANSACTIONS ON MEDICAL IMAGING Dou, Q., Chen, H., Yu, L., Zhao, L., Qin, J., Wang, D., Mok, V. T., Shi, L., Heng, P. 2016; 35 (5): 1182–95


    Cerebral microbleeds (CMBs) are small haemorrhages near blood vessels. They have been recognized as important diagnostic biomarkers for many cerebrovascular diseases and cognitive dysfunctions. In current clinical routine, CMBs are manually labelled by radiologists but this procedure is laborious, time-consuming, and error prone. In this paper, we propose a novel automatic method to detect CMBs from magnetic resonance (MR) images by exploiting the 3D convolutional neural network (CNN). Compared with previous methods that employed either low-level hand-crafted descriptors or 2D CNNs, our method can take full advantage of spatial contextual information in MR volumes to extract more representative high-level features for CMBs, and hence achieve a much better detection accuracy. To further improve the detection performance while reducing the computational cost, we propose a cascaded framework under 3D CNNs for the task of CMB detection. We first exploit a 3D fully convolutional network (FCN) strategy to retrieve the candidates with high probabilities of being CMBs, and then apply a well-trained 3D CNN discrimination model to distinguish CMBs from hard mimics. Compared with the traditional sliding window strategy, the proposed 3D FCN strategy can remove massive redundant computations and dramatically speed up the detection process. We constructed a large dataset with 320 volumetric MR scans and performed extensive experiments to validate the proposed method, which achieved a high sensitivity of 93.16% with an average number of 2.74 false positives per subject, outperforming previous methods using low-level descriptors or 2D CNNs by a significant margin. The proposed method, in principle, can be adapted to other biomarker detection tasks from volumetric medical data.

    View details for DOI 10.1109/TMI.2016.2528129

    View details for Web of Science ID 000375550500004

    View details for PubMedID 26886975
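    The two-stage cascade described in the abstract above (candidate retrieval, then discrimination) can be sketched generically as follows. The score map, the stand-in discriminator, and both thresholds are hypothetical placeholders; in the paper both stages are 3D networks, which this sketch does not implement.

```python
from typing import Callable, Dict, List, Tuple

Voxel = Tuple[int, int, int]


def screen_candidates(score_map: Dict[Voxel, float], threshold: float) -> List[Voxel]:
    """Stage 1: keep voxel locations whose screening score passes the threshold."""
    return sorted(v for v, score in score_map.items() if score >= threshold)


def cascade_detect(
    score_map: Dict[Voxel, float],
    screen_threshold: float,
    discriminator: Callable[[Voxel], float],
    accept_threshold: float,
) -> Tuple[List[Voxel], List[Voxel]]:
    """Run the cheap screening stage, then the discriminator only on the survivors."""
    candidates = screen_candidates(score_map, screen_threshold)
    detections = [v for v in candidates if discriminator(v) >= accept_threshold]
    return candidates, detections


# Hypothetical stage-1 scores for four locations in a volume.
stage1_scores = {(0, 0, 0): 0.05, (1, 2, 3): 0.70, (4, 4, 4): 0.90, (2, 2, 2): 0.10}

# Hypothetical stage-2 scores; (1, 2, 3) plays the role of a hard mimic.
stage2_scores = {(1, 2, 3): 0.20, (4, 4, 4): 0.95}


def toy_discriminator(voxel: Voxel) -> float:
    """Stand-in for a trained second-stage model: looks up a fixed score."""
    return stage2_scores.get(voxel, 0.0)


candidates, detections = cascade_detect(stage1_scores, 0.5, toy_discriminator, 0.5)
```

    Only the two screened candidates reach the second stage here, which mirrors the computational saving the abstract attributes to the cascade.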

  • DCAN: Deep Contour-Aware Networks for Accurate Gland Segmentation Chen, H., Qi, X., Yu, L., Heng, P. IEEE. 2016: 2487–96
  • Automatic Cerebral Microbleeds Detection from MR Images via Independent Subspace Analysis Based Hierarchical Features Dou, Q., Chen, H., Yu, L., Shi, L., Wang, D., Mok, V. T., Heng, P. IEEE. 2015: 7933–36


    With the development of susceptibility weighted imaging (SWI) technology, cerebral microbleed (CMB) detection is increasingly essential in cerebrovascular disease diagnosis and cognitive impairment assessment. Clinical CMB detection is based on manual rating, which is subjective and time-consuming with limited reproducibility. In this paper, we propose a computer-aided system for automatic detection of CMBs from brain SWI images. Our approach detects the CMBs in three stages: (i) candidate screening based on intensity values; (ii) compact 3D hierarchical feature extraction via a stacked convolutional Independent Subspace Analysis (ISA) network; and (iii) false positive candidate removal with a support vector machine (SVM) classifier based on the representation features learned from ISA. Experimental results on 19 subjects (161 CMBs) achieve a high sensitivity of 89.44% with an average of 7.7 and 0.9 false positives per subject and per CMB, respectively, which validate the efficacy of our approach.

    View details for Web of Science ID 000371717208052

    View details for PubMedID 26738132
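    The abstract above reports sensitivity together with false positives per subject and per CMB. A minimal sketch of how such metrics are typically computed from per-subject counts is given below. The function name and the toy counts are hypothetical and do not reproduce the paper's data.

```python
from typing import Dict, Sequence, Tuple

# Each entry is (true positives, false negatives, false positives) for one subject.
SubjectCounts = Tuple[int, int, int]


def detection_metrics(per_subject: Sequence[SubjectCounts]) -> Dict[str, float]:
    """Pool counts over subjects and derive sensitivity and false-positive rates."""
    if not per_subject:
        raise ValueError("at least one subject is required")
    tp = sum(counts[0] for counts in per_subject)
    fn = sum(counts[1] for counts in per_subject)
    fp = sum(counts[2] for counts in per_subject)
    lesions = tp + fn  # total number of ground-truth lesions
    if lesions == 0:
        raise ValueError("sensitivity is undefined without ground-truth lesions")
    return {
        "sensitivity": tp / lesions,
        "fp_per_subject": fp / len(per_subject),
        "fp_per_lesion": fp / lesions,
    }


# Toy counts for three hypothetical subjects.
toy_counts = [(8, 1, 5), (4, 0, 9), (6, 1, 7)]
metrics = detection_metrics(toy_counts)
```

    Pooling counts before dividing, as done here, is one convention; averaging per-subject sensitivities is another and can give a different number, so the convention should be stated when reporting results.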