I am a postdoctoral research fellow at Stanford University, working with Prof. Lei Xing. Before that, I obtained my Ph.D. degree in July 2019 from the Department of Computer Science and Engineering, The Chinese University of Hong Kong, supervised by Prof. Pheng-Ann Heng and Prof. Chi-Wing Fu. Previously, I received my B.Eng. degree from the Department of Computer Science and Technology at Zhejiang University in 2015, under the supervision of Prof. Deng Cai.
My research lies at the intersection of medical image analysis and artificial intelligence. I am dedicated to designing data-efficient learning methods for biomedical image analysis. I also have expertise in deep learning for 3D vision.
BEng, Zhejiang University, Computer Science (2015)
PhD, The Chinese University of Hong Kong, Computer Science and Engineering (2019)
RMDL: Recalibrated multi-instance deep learning for whole slide gastric image classification.
Medical image analysis
2019; 58: 101549
Whole slide histopathology images (WSIs) play a critical role in gastric cancer diagnosis. However, due to the large scale of WSIs and the varying sizes of abnormal areas, selecting informative regions and analyzing them is quite challenging during the automatic diagnosis process. Multi-instance learning based on the most discriminative instances can be of great benefit for whole slide gastric image diagnosis. In this paper, we design a recalibrated multi-instance deep learning method (RMDL) to address this challenging problem. We first select the discriminative instances, and then utilize these instances to diagnose diseases based on the proposed RMDL approach. The designed RMDL network is capable of capturing instance-wise dependencies and recalibrating instance features according to the importance coefficient learned from the fused features. Furthermore, we build a large whole-slide gastric histopathology image dataset with detailed pixel-level annotations. Experimental results on the constructed gastric dataset demonstrate a significant improvement in the accuracy of our proposed framework compared with other state-of-the-art multi-instance learning methods. Moreover, our method is general and can be extended to other diagnosis tasks of different cancer types based on WSIs.
View details for DOI 10.1016/j.media.2019.101549
View details for PubMedID 31499320
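The recalibration idea in the RMDL abstract can be illustrated with a minimal sketch. This is not the paper's network: the fused feature (an element-wise mean) and the importance coefficients (a softmax over instance-to-fused dot products) are assumptions chosen for illustration, whereas the abstract describes coefficients that are learned.

```python
import math


def recalibrate_instances(features):
    """Reweight each instance feature vector by an importance coefficient.

    Illustrative assumptions (not from the paper): the fused feature is the
    element-wise mean of all instances, and the coefficients are a softmax
    over the dot product between each instance and the fused feature.
    """
    n = len(features)
    dim = len(features[0])
    # Fuse all instance features into one summary vector.
    fused = [sum(f[d] for f in features) / n for d in range(dim)]
    # Score each instance by its agreement with the fused feature.
    scores = [sum(a * b for a, b in zip(f, fused)) for f in features]
    # Numerically stable softmax turns scores into coefficients summing to 1.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    coeffs = [e / total for e in exps]
    # Recalibrate: scale every instance feature by its coefficient.
    recalibrated = [[c * x for x in f] for c, f in zip(coeffs, features)]
    return coeffs, recalibrated


# Three hypothetical 2-D instance features.
coeffs, recalibrated = recalibrate_instances([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
```

In this toy input the first two instances agree with the fused feature more than the third, so they receive larger coefficients.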
CANet: Cross-disease Attention Network for Joint Diabetic Retinopathy and Diabetic Macular Edema Grading.
IEEE transactions on medical imaging
Diabetic retinopathy (DR) and diabetic macular edema (DME) are the leading causes of permanent blindness in the working-age population. Automatic grading of DR and DME helps ophthalmologists design tailored treatments for patients, and is thus of vital importance in clinical practice. However, prior works grade either DR or DME, and ignore the correlation between DR and its complication, i.e., DME. Moreover, location information, e.g., macula and soft and hard exudate annotations, is widely used as a prior for grading. Such annotations are costly to obtain, hence it is desirable to develop automatic grading methods with only image-level supervision. In this paper, we present a novel cross-disease attention network (CANet) to jointly grade DR and DME by exploring the internal relationship between the diseases with only image-level supervision. Our key contributions include the disease-specific attention module to selectively learn useful features for individual diseases, and the disease-dependent attention module to further capture the internal relationship between the two diseases. We integrate these two attention modules in a deep network to produce disease-specific and disease-dependent features, and to maximize the overall performance jointly for grading DR and DME. We evaluate our network on two public benchmark datasets, i.e., ISBI 2018 IDRiD challenge dataset and Messidor dataset. Our method achieves the best result on the ISBI 2018 IDRiD challenge dataset and outperforms other methods on the Messidor dataset. Our code is publicly available at https://github.com/xmengli999/CANet.
View details for DOI 10.1109/TMI.2019.2951844
View details for PubMedID 31714219
Towards Automated Semantic Segmentation in Prenatal Volumetric Ultrasound
IEEE TRANSACTIONS ON MEDICAL IMAGING
2019; 38 (1): 180–93
Volumetric ultrasound is rapidly emerging as a viable imaging modality for routine prenatal examinations. Biometrics obtained from the volumetric segmentation shed light on the reformation of precise maternal and fetal health monitoring. However, the poor image quality, low contrast, boundary ambiguity, and complex anatomical shapes conspire toward a great lack of efficient tools for the segmentation. This makes 3-D ultrasound difficult to interpret and hinders the widespread adoption of 3-D ultrasound in obstetrics. In this paper, we look at the problem of semantic segmentation in prenatal ultrasound volumes. Our contribution is threefold: 1) we propose the first fully automatic framework to simultaneously segment multiple anatomical structures with intensive clinical interest, including fetus, gestational sac, and placenta, which remains a rarely studied and arduous challenge; 2) we propose a composite architecture for dense labeling, in which a customized 3-D fully convolutional network explores spatial intensity concurrency for initial labeling, while a multi-directional recurrent neural network (RNN) encodes spatial sequentiality to combat boundary ambiguity for significant refinement; and 3) we introduce a hierarchical deep supervision mechanism to boost the information flow within the RNN and fit the latent sequence hierarchy in fine scales, and further improve the segmentation results. Extensively verified on large in-house data sets, our method shows superior segmentation performance, decent agreement with expert measurements, and high reproducibility against scanning variations, and is thus promising for advancing prenatal ultrasound examinations.
View details for DOI 10.1109/TMI.2018.2858779
View details for Web of Science ID 000455110500018
View details for PubMedID 30040635
Patch-based Output Space Adversarial Learning for Joint Optic Disc and Cup Segmentation.
IEEE transactions on medical imaging
Glaucoma is a leading cause of irreversible blindness. Accurate segmentation of the optic disc (OD) and cup (OC) from fundus images is beneficial to glaucoma screening and diagnosis. Recently, convolutional neural networks have demonstrated promising progress in joint OD and OC segmentation. However, affected by the domain shift among different datasets, deep networks are severely hindered in generalizing across different scanners and institutions. In this paper, we present a novel patch-based Output Space Adversarial Learning framework (pOSAL) to jointly and robustly segment the OD and OC from different fundus image datasets. We first devise a lightweight and efficient segmentation network as a backbone. Considering the specific morphology of OD and OC, a novel morphology-aware segmentation loss is proposed to guide the network to generate accurate and smooth segmentation. Our pOSAL framework then exploits unsupervised domain adaptation to address the domain shift challenge by encouraging the segmentation in the target domain to be similar to the source ones. Since the whole-segmentation-based adversarial loss is insufficient to drive the network to capture segmentation details, we further design the pOSAL in a patch-based fashion to enable fine-grained discrimination on local segmentation details. We extensively evaluate our pOSAL framework and demonstrate its effectiveness in improving the segmentation performance on three public retinal fundus image datasets, i.e., Drishti-GS, RIM-ONE-r3, and REFUGE. Furthermore, our pOSAL framework achieved the first place in the OD and OC segmentation tasks in MICCAI 2018 Retinal Fundus Glaucoma Challenge.
View details for DOI 10.1109/TMI.2019.2899910
View details for PubMedID 30794170
SV-RCNet: Workflow Recognition From Surgical Videos Using Recurrent Convolutional Network
IEEE TRANSACTIONS ON MEDICAL IMAGING
2018; 37 (5): 1114–26
We propose an analysis of surgical videos that is based on a novel recurrent convolutional network (SV-RCNet), specifically for automatic online workflow recognition from surgical videos, which is a key component for developing context-aware computer-assisted intervention systems. Different from previous methods which harness visual and temporal information separately, the proposed SV-RCNet seamlessly integrates a convolutional neural network (CNN) and a recurrent neural network (RNN) to form a novel recurrent convolutional architecture in order to take full advantage of the complementary information of visual and temporal features learned from surgical videos. We effectively train the SV-RCNet in an end-to-end manner so that the visual representations and sequential dynamics can be jointly optimized in the learning process. In order to produce more discriminative spatio-temporal features, we exploit a deep residual network (ResNet) and a long short-term memory (LSTM) network to extract visual features and temporal dependencies, respectively, and integrate them into the SV-RCNet. Moreover, based on the phase transition-sensitive predictions from the SV-RCNet, we propose a simple yet effective inference scheme, namely the prior knowledge inference (PKI), by leveraging the natural characteristic of surgical video. Such a strategy further improves the consistency of results and largely boosts the recognition performance. Extensive experiments have been conducted with the MICCAI 2016 Modeling and Monitoring of Computer Assisted Interventions Workflow Challenge dataset and Cholec80 dataset to validate SV-RCNet. Our approach not only achieves superior performance on these two datasets but also outperforms the state-of-the-art methods by a significant margin.
View details for DOI 10.1109/TMI.2017.2787657
View details for Web of Science ID 000431544500004
View details for PubMedID 29727275
VoxResNet: Deep voxelwise residual networks for brain segmentation from 3D MR images
NEUROIMAGE
2018; 170: 446–55
Segmentation of key brain tissues from 3D medical images is of great significance for brain disease diagnosis, progression assessment and monitoring of neurologic conditions. While manual segmentation is time-consuming, laborious, and subjective, automated segmentation is quite challenging due to the complicated anatomical environment of brain and the large variations of brain tissues. We propose a novel voxelwise residual network (VoxResNet) with a set of effective training schemes to cope with this challenging problem. The main merit of residual learning is that it can alleviate the degradation problem when training a deep network so that the performance gains achieved by increasing the network depth can be fully leveraged. With this technique, our VoxResNet is built with 25 layers, and hence can generate more representative features to deal with the large variations of brain tissues than its rivals using hand-crafted features or shallower networks. In order to effectively train such a deep network with limited training data for brain segmentation, we seamlessly integrate multi-modality and multi-level contextual information into our network, so that the complementary information of different modalities can be harnessed and features of different scales can be exploited. Furthermore, an auto-context version of the VoxResNet is proposed by combining the low-level image appearance features, implicit shape information, and high-level context together for further improving the segmentation performance. Extensive experiments on the well-known benchmark (i.e., MRBrainS) of brain segmentation from 3D magnetic resonance (MR) images corroborated the efficacy of the proposed VoxResNet. Our method achieved the first place in the challenge out of 37 competitors including several state-of-the-art brain segmentation methods. 
Our method is inherently general and can be readily applied as a powerful tool to many brain-related studies, where accurate segmentation of brain structures is critical.
View details for DOI 10.1016/j.neuroimage.2017.04.041
View details for Web of Science ID 000429940900036
View details for PubMedID 28445774
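The residual learning mentioned in the VoxResNet abstract can be sketched in a few lines. This is a toy illustration on plain lists, not the VoxResNet architecture: `residual_block`, `stack_blocks`, and the stand-in transforms are hypothetical names, and the depth of 10 is arbitrary.

```python
def residual_block(x, transform):
    """Return x + F(x): the block only has to learn the residual F."""
    fx = transform(x)
    return [xi + fi for xi, fi in zip(x, fx)]


def stack_blocks(x, transforms):
    """Apply a sequence of residual blocks one after another."""
    for transform in transforms:
        x = residual_block(x, transform)
    return x


def zero_residual(v):
    """A stand-in for a layer whose learned residual is exactly zero."""
    return [0.0 for _ in v]


def small_residual(v):
    """A stand-in for a layer that learned a small correction."""
    return [0.1 * a for a in v]


# With zero residuals, a deep stack passes its input through unchanged, which
# is why added depth need not degrade what shallower layers already provide.
unchanged = stack_blocks([1.0, -2.0, 3.0], [zero_residual] * 10)
corrected = residual_block([1.0, 2.0], small_residual)
```

The identity shortcut is the point of the sketch: extra blocks can fall back to doing nothing, so performance gains from depth are easier to realize.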
PU-Net: Point Cloud Upsampling Network
IEEE. 2018: 2790–99
3D deeply supervised network for automated segmentation of volumetric medical images
ELSEVIER. 2017: 40–54
While deep convolutional neural networks (CNNs) have achieved remarkable success in 2D medical image segmentation, it is still a difficult task for CNNs to segment important organs or structures from 3D medical images owing to several mutually affected challenges, including the complicated anatomical environments in volumetric images, optimization difficulties of 3D networks and inadequacy of training samples. In this paper, we present a novel and efficient 3D fully convolutional network equipped with a 3D deep supervision mechanism to comprehensively address these challenges; we call it 3D DSN. Our proposed 3D DSN is capable of conducting volume-to-volume learning and inference, which can eliminate redundant computations and alleviate the risk of over-fitting on limited training data. More importantly, the 3D deep supervision mechanism can effectively cope with the optimization problem of gradients vanishing or exploding when training a 3D deep model, accelerating the convergence speed and simultaneously improving the discrimination capability. Such a mechanism is developed by deriving an objective function that directly guides the training of both lower and upper layers in the network, so that the adverse effects of unstable gradient changes can be counteracted during the training procedure. We also employ a fully connected conditional random field model as a post-processing step to refine the segmentation results. We have extensively validated the proposed 3D DSN on two typical yet challenging volumetric medical image segmentation tasks: (i) liver segmentation from 3D CT scans and (ii) whole heart and great vessels segmentation from 3D MR images, by participating in two grand challenges held in conjunction with MICCAI. We have achieved segmentation results competitive with state-of-the-art approaches in both challenges at a much faster speed, corroborating the effectiveness of our proposed 3D DSN.
View details for DOI 10.1016/j.media.2017.05.001
View details for Web of Science ID 000408073800005
View details for PubMedID 28526212
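The 3D DSN abstract describes an objective that directly guides both lower and upper layers. One common way to write such an objective, sketched here as an assumption rather than the paper's exact formulation, is a main loss plus weighted auxiliary losses from intermediate layers. The function name, loss values, and weights below are hypothetical.

```python
def deeply_supervised_loss(main_loss, auxiliary_losses, auxiliary_weights):
    """Combine the final-layer loss with weighted losses from earlier layers.

    Each auxiliary loss would come from a prediction made at an intermediate
    layer, so gradients reach the lower layers directly.
    """
    if len(auxiliary_losses) != len(auxiliary_weights):
        raise ValueError("need exactly one weight per auxiliary loss")
    weighted = sum(w * l for w, l in zip(auxiliary_weights, auxiliary_losses))
    return main_loss + weighted


# Hypothetical loss values for one training step with two auxiliary branches.
total = deeply_supervised_loss(0.50, [0.80, 0.65], [0.3, 0.4])
```

Here the total is 0.50 + 0.3 × 0.80 + 0.4 × 0.65 = 1.0; in practice the auxiliary weights are hyperparameters.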
Multilevel Contextual 3-D CNNs for False Positive Reduction in Pulmonary Nodule Detection
IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING
2017; 64 (7): 1558–67
False positive reduction is one of the most crucial components in an automated pulmonary nodule detection system, which plays an important role in lung cancer diagnosis and early treatment. The objective of this paper is to effectively address the challenges in this task and therefore to accurately discriminate the true nodules from a large number of candidates. We propose a novel method employing three-dimensional (3-D) convolutional neural networks (CNNs) for false positive reduction in automated pulmonary nodule detection from volumetric computed tomography (CT) scans. Compared with its 2-D counterparts, the 3-D CNNs can encode richer spatial information and extract more representative features via their hierarchical architecture trained with 3-D samples. More importantly, we further propose a simple yet effective strategy to encode multilevel contextual information to meet the challenges coming with the large variations and hard mimics of pulmonary nodules. The proposed framework has been extensively validated in the LUNA16 challenge held in conjunction with ISBI 2016, where we achieved the highest competition performance metric (CPM) score in the false positive reduction track. Experimental results demonstrated the importance and effectiveness of integrating multilevel contextual information into 3-D CNN framework for automated pulmonary nodule detection in volumetric CT data. While our method is tailored for pulmonary nodule detection, the proposed framework is general and can be easily extended to many other 3-D object detection tasks from volumetric medical images, where the targeting objects have large variations and are accompanied by a number of hard mimics.
View details for DOI 10.1109/TBME.2016.2613502
View details for Web of Science ID 000404025100014
View details for PubMedID 28113302
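For readers unfamiliar with the competition performance metric (CPM) named above, the sketch below shows how it is computed under the LUNA16 definition as we understand it: the average sensitivity at seven operating points of 1/8, 1/4, 1/2, 1, 2, 4, and 8 false positives per scan. The sensitivity values are made up for illustration and are not results from the paper.

```python
# Operating points (false positives per scan) averaged by the CPM.
CPM_FP_RATES = (0.125, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0)


def competition_performance_metric(sensitivity_at_fp_rate):
    """Average the sensitivities read off an FROC curve at the seven rates."""
    values = [sensitivity_at_fp_rate[rate] for rate in CPM_FP_RATES]
    return sum(values) / len(values)


# Hypothetical FROC readings, NOT results reported in the paper.
hypothetical_froc = {
    0.125: 0.60, 0.25: 0.70, 0.5: 0.80, 1.0: 0.85,
    2.0: 0.90, 4.0: 0.92, 8.0: 0.95,
}
cpm = competition_performance_metric(hypothetical_froc)
```

Because the low false positive rates carry the same weight as the high ones, a system must stay sensitive even when very few false alarms are allowed.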
Comparative Validation of Polyp Detection Methods in Video Colonoscopy: Results From the MICCAI 2015 Endoscopic Vision Challenge
IEEE TRANSACTIONS ON MEDICAL IMAGING
2017; 36 (6): 1231–49
Colonoscopy is the gold standard for colon cancer screening though some polyps are still missed, thus preventing early disease detection and treatment. Several computational systems have been proposed to assist polyp detection during colonoscopy but so far without consistent evaluation. The lack of publicly available annotated databases has made it difficult to compare methods and to assess if they achieve performance levels acceptable for clinical use. The Automatic Polyp Detection sub-challenge, conducted as part of the Endoscopic Vision Challenge (http://endovis.grand-challenge.org) at the international conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) in 2015, was an effort to address this need. In this paper, we report the results of this comparative evaluation of polyp detection methods, as well as describe additional experiments to further explore differences between methods. We define performance metrics and provide evaluation databases that allow comparison of multiple methodologies. Results show that convolutional neural networks are the state of the art. Nevertheless, it is also demonstrated that combining different methodologies can lead to an improved overall performance.
View details for DOI 10.1109/TMI.2017.2664042
View details for Web of Science ID 000402722500003
View details for PubMedID 28182555
Automated Melanoma Recognition in Dermoscopy Images via Very Deep Residual Networks
IEEE TRANSACTIONS ON MEDICAL IMAGING
2017; 36 (4): 994–1004
Automated melanoma recognition in dermoscopy images is a very challenging task due to the low contrast of skin lesions, the huge intraclass variation of melanomas, the high degree of visual similarity between melanoma and non-melanoma lesions, and the existence of many artifacts in the image. In order to meet these challenges, we propose a novel method for melanoma recognition by leveraging very deep convolutional neural networks (CNNs). Compared with existing methods employing either low-level hand-crafted features or CNNs with shallower architectures, our substantially deeper networks (more than 50 layers) can acquire richer and more discriminative features for more accurate recognition. To take full advantage of very deep networks, we propose a set of schemes to ensure effective training and learning under limited training data. First, we apply the residual learning to cope with the degradation and overfitting problems when a network goes deeper. This technique can ensure that our networks benefit from the performance gains achieved by increasing network depth. Then, we construct a fully convolutional residual network (FCRN) for accurate skin lesion segmentation, and further enhance its capability by incorporating a multi-scale contextual information integration scheme. Finally, we seamlessly integrate the proposed FCRN (for segmentation) and other very deep residual networks (for classification) to form a two-stage framework. This framework enables the classification network to extract more representative and specific features based on segmented results instead of the whole dermoscopy images, further alleviating the insufficiency of training data. The proposed framework is extensively evaluated on ISBI 2016 Skin Lesion Analysis Towards Melanoma Detection Challenge dataset. 
Experimental results demonstrate the significant performance gains of the proposed framework, which ranked first in classification and second in segmentation among 25 teams and 28 teams, respectively. This study corroborates that very deep CNNs with effective training mechanisms can be employed to solve complicated medical image analysis tasks, even with limited training data.
View details for DOI 10.1109/TMI.2016.2642839
View details for Web of Science ID 000400868100012
View details for PubMedID 28026754
DCAN: Deep contour-aware networks for object instance segmentation from histology images
MEDICAL IMAGE ANALYSIS
2017; 36: 135–46
In histopathological image analysis, the morphology of histological structures, such as glands and nuclei, has been routinely adopted by pathologists to assess the malignancy degree of adenocarcinomas. Accurate detection and segmentation of these objects of interest from histology images is an essential prerequisite to obtain reliable morphological statistics for quantitative diagnosis. While manual annotation is error-prone, time-consuming and operator-dependent, automated detection and segmentation of objects of interest from histology images can be very challenging due to the large appearance variation, existence of strong mimics, and serious degeneration of histological structures. In order to meet these challenges, we propose a novel deep contour-aware network (DCAN) under a unified multi-task learning framework for more accurate detection and segmentation. In the proposed network, multi-level contextual features are explored based on an end-to-end fully convolutional network (FCN) to deal with the large appearance variation. We further propose to employ an auxiliary supervision mechanism to overcome the problem of vanishing gradients when training such a deep network. More importantly, our network can not only output accurate probability maps of histological objects, but also depict clear contours simultaneously for separating clustered object instances, which further boosts the segmentation performance. Our method ranked first in two histological object segmentation challenges, including 2015 MICCAI Gland Segmentation Challenge and 2015 MICCAI Nuclei Segmentation Challenge. Extensive experiments on these two challenging datasets demonstrate the superior performance of our method, surpassing all the other methods by a significant margin.
View details for DOI 10.1016/j.media.2016.11.004
View details for Web of Science ID 000393247900012
View details for PubMedID 27898306
Integrating Online and Offline Three-Dimensional Deep Learning for Automated Polyp Detection in Colonoscopy Videos
IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS
2017; 21 (1): 65–75
Automated polyp detection in colonoscopy videos has been demonstrated to be a promising way for colorectal cancer prevention and diagnosis. Traditional manual screening is time consuming, operator dependent, and error prone; hence, an automated detection approach is in high demand in clinical practice. However, automated polyp detection is very challenging due to high intraclass variations in polyp size, color, shape, and texture, and low interclass variations between polyps and hard mimics. In this paper, we propose a novel offline and online three-dimensional (3-D) deep learning integration framework by leveraging the 3-D fully convolutional network (3D-FCN) to tackle this challenging problem. Compared with previous methods employing hand-crafted features or 2-D convolutional neural networks, the 3D-FCN is capable of learning more representative spatio-temporal features from colonoscopy videos, and hence has more powerful discrimination capability. More importantly, we propose a novel online learning scheme to deal with the problem of limited training data by harnessing the specific information of an input video in the learning process. We integrate offline and online learning to effectively reduce the number of false positives generated by the offline network and further improve the detection performance. Extensive experiments on the dataset of the MICCAI 2015 Challenge on Polyp Detection demonstrated the better performance of our method compared with other competitors.
View details for DOI 10.1109/JBHI.2016.2637004
View details for Web of Science ID 000395538500008
View details for PubMedID 28114049
3D U-net with Multi-level Deep Supervision: Fully Automatic Segmentation of Proximal Femur in 3D MR Images
SPRINGER INTERNATIONAL PUBLISHING AG. 2017: 274–82
Deep Cascaded Networks for Sparsely Distributed Object Detection from Medical Images
DEEP LEARNING FOR MEDICAL IMAGE ANALYSIS. 2017: 133–54
AGNet: Attention-Guided Network for Surgical Tool Presence Detection
SPRINGER INTERNATIONAL PUBLISHING AG. 2017: 186–94
3D FractalNet: Dense Volumetric Segmentation for Cardiovascular MRI Volumes
SPRINGER INTERNATIONAL PUBLISHING AG. 2017: 103–10
Fine-Grained Recurrent Neural Networks for Automatic Prostate Segmentation in Ultrasound Images
ASSOC ADVANCEMENT ARTIFICIAL INTELLIGENCE. 2017: 1633–39
View details for Web of Science ID 000485630701094
Volumetric ConvNets with Mixed Residual Connections for Automated Prostate Segmentation from 3D MR Images
ASSOC ADVANCEMENT ARTIFICIAL INTELLIGENCE. 2017: 66–72
View details for Web of Science ID 000485630700010
Automatic Detection of Cerebral Microbleeds From MR Images via 3D Convolutional Neural Networks
IEEE TRANSACTIONS ON MEDICAL IMAGING
2016; 35 (5): 1182–95
Cerebral microbleeds (CMBs) are small haemorrhages nearby blood vessels. They have been recognized as important diagnostic biomarkers for many cerebrovascular diseases and cognitive dysfunctions. In current clinical routine, CMBs are manually labelled by radiologists but this procedure is laborious, time-consuming, and error prone. In this paper, we propose a novel automatic method to detect CMBs from magnetic resonance (MR) images by exploiting the 3D convolutional neural network (CNN). Compared with previous methods that employed either low-level hand-crafted descriptors or 2D CNNs, our method can take full advantage of spatial contextual information in MR volumes to extract more representative high-level features for CMBs, and hence achieve a much better detection accuracy. To further improve the detection performance while reducing the computational cost, we propose a cascaded framework under 3D CNNs for the task of CMB detection. We first exploit a 3D fully convolutional network (FCN) strategy to retrieve the candidates with high probabilities of being CMBs, and then apply a well-trained 3D CNN discrimination model to distinguish CMBs from hard mimics. Compared with traditional sliding window strategy, the proposed 3D FCN strategy can remove massive redundant computations and dramatically speed up the detection process. We constructed a large dataset with 320 volumetric MR scans and performed extensive experiments to validate the proposed method, which achieved a high sensitivity of 93.16% with an average number of 2.74 false positives per subject, outperforming previous methods using low-level descriptors or 2D CNNs by a significant margin. The proposed method, in principle, can be adapted to other biomarker detection tasks from volumetric medical data.
View details for DOI 10.1109/TMI.2016.2528129
View details for Web of Science ID 000375550500004
View details for PubMedID 26886975
DCAN: Deep Contour-Aware Networks for Accurate Gland Segmentation
IEEE. 2016: 2487–96
Automatic Cerebral Microbleeds Detection from MR Images via Independent Subspace Analysis Based Hierarchical Features
IEEE. 2015: 7933–36
With the development of susceptibility weighted imaging (SWI) technology, cerebral microbleed (CMB) detection is increasingly essential in cerebrovascular disease diagnosis and cognitive impairment assessment. Clinical CMB detection is based on manual rating, which is subjective and time-consuming with limited reproducibility. In this paper, we propose a computer-aided system for automatic detection of CMBs from brain SWI images. Our approach detects the CMBs in three stages: (i) candidate screening based on intensity values; (ii) compact 3D hierarchical feature extraction via a stacked convolutional Independent Subspace Analysis (ISA) network; and (iii) false positive candidate removal with a support vector machine (SVM) classifier based on the representation features learned from ISA. Experimental results on 19 subjects (161 CMBs) show a high sensitivity of 89.44% with an average of 7.7 and 0.9 false positives per subject and per CMB, respectively, which validates the efficacy of our approach.
View details for Web of Science ID 000371717208052
View details for PubMedID 26738132
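The figures reported in the abstract above can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the abstract (19 subjects, 161 CMBs, 89.44% sensitivity, 7.7 false positives per subject); the detected count of 144 and the approximate false positive total are inferred here and are not stated in the source.

```python
subjects = 19
true_cmbs = 161
reported_sensitivity = 0.8944
reported_fp_per_subject = 7.7

# Inferred, not stated in the abstract: number of CMBs correctly detected.
detected = round(reported_sensitivity * true_cmbs)

# Recomputing the sensitivity from the inferred count reproduces the
# reported percentage.
sensitivity_pct = round(100 * detected / true_cmbs, 2)

# The per-subject rate is rounded, so this total is only approximate.
approx_total_fp = reported_fp_per_subject * subjects

# Dividing by the number of CMBs gives the reported per-CMB rate.
fp_per_cmb = round(approx_total_fp / true_cmbs, 1)

print(detected, sensitivity_pct, fp_per_cmb)
```

The two false positive rates describe the same errors with different denominators: per subject divides by 19, per CMB divides by 161.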
Automatic Detection of Cerebral Microbleeds via Deep Learning Based 3D Feature Representation
IEEE. 2015: 764–67
View details for Web of Science ID 000380546000183