VoxResNet: Deep voxelwise residual networks for brain segmentation from 3D MR images
Introduction
Brain parcellation from volumetric medical images, especially 3D magnetic resonance (MR) images, is a prerequisite for quantifying structural volumes. It is of great significance for the diagnosis, progression assessment, and treatment of a wide range of neurodegenerative diseases such as dementia and Alzheimer's disease (Petrella et al., 2003, Giorgio and De Stefano, 2013). In particular, the segmentation of brain tissue into white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF) is essential for measuring and visualizing anatomical structures (Wright et al., 2014), analyzing brain changes (Zhang et al., 2015), conducting large-scale studies with images acquired at all ages (Moeskops et al., 2016, Thambisetty et al., 2010), and planning surgery and image-guided interventions (Despotović et al., 2015).
However, manual segmentation of brain structures from 3D images is an extremely laborious and time-consuming task that requires sophisticated knowledge of brain anatomy and is difficult, if not impossible, to perform at a large scale. Furthermore, manual segmentation suffers from low reproducibility, being prone to errors caused by inter- and intra-operator variability. Automated segmentation methods are therefore highly desirable in practice for providing consistent measurements and quantitative analyses. Nevertheless, automatic brain segmentation remains challenging due to the low contrast of anatomical structures in some modalities, the large intra-class variations of these structures across subjects (Moeskops et al., 2016) or caused by various lesions (Menze et al., 2015, Maier et al., 2017), the confounding appearance of different anatomical regions across classes, etc. Fig. 1 illustrates the appearance of key brain structures, including WM, GM, and CSF, in different image modalities; these images exemplify the challenges mentioned above. In addition, the acquisition protocol can significantly affect image quality, which further challenges automated segmentation methods (Despotović et al., 2015).
In the past decade, many automated methods have been developed for brain segmentation. Broadly speaking, they can be categorized into three classes. (1) Machine learning based methods with hand-crafted features. These methods employ different classifiers with various hand-crafted features, such as support vector machines (SVM) with spatial and intensity features (van Opbroek et al., 2013, Moeskops and Benders, 2015, Moeskops et al., 2015), Gaussian mixture models (GMM) with intensity features (Ashburner and Friston, 2005, Rajchl et al., 2013, Prakash and Kumari), and random forests (RF) with 3D Haar-like features (Wang et al., 2015) or appearance and spatial features (Pereira et al., 2016). Their main limitation is that hand-crafted features usually offer limited representation capability for accurate recognition, given the large variations of brain structures. (2) Deep learning based methods with features learned end-to-end. These methods learn feature representations in a data-driven way, such as 3D convolutional neural networks (Çiçek et al., 2016), parallelized long short-term memory (LSTM) networks (Stollenga et al., 2015), convolutional neural networks with multiple pathways (de Brebisson and Montana, 2015) or multiple patch and kernel sizes (Moeskops et al., 2016), and 2D fully convolutional networks (FCN) (Nie et al., 2016). Such methods can achieve more accurate segmentation results without explicitly designing sophisticated input features. Nevertheless, more elegant architectures such as residual networks are required to further advance performance, and the complementary information of different modalities as well as multi-level contextual features should be exploited to enhance the discriminative power of the learned features. (3) Multi-atlas registration based methods (Klein and Hirsch, 2005, Aljabar et al., 2009, Artaechevarria et al., 2009, Sarikaya et al., 2013, Habas et al., 2010, Shi et al., 2010). For example, multi-atlas label fusion (MALF) makes use of multiple reference atlases and has achieved good performance in brain segmentation tasks (Aljabar et al., 2009, Lötjönen et al., 2010, Wang et al., 2013, Heckemann et al., 2006). However, current MALF methods often rely on a single image modality, or treat each modality equally when multiple modalities are available. Furthermore, they are usually computationally expensive, which makes them impractical for applications requiring fast processing, and errors originating from the registration step can degrade the accuracy of the fused results.
In recent years, deep learning, especially deep convolutional neural networks (CNNs), has emerged as one of the most prominent approaches for image recognition problems in both natural image processing (Krizhevsky et al., 2012, Simonyan and Zisserman, 2014, Long et al., 2015, Szegedy et al., 2015, Chen et al., 2015, Ji et al., 2013) and medical image analysis (Prasoon et al., 2013, Ronneberger et al., 2015, Chen et al., 2015a, Shin et al., 2016, Nogues et al., 2016, Li et al., 2014, Zheng et al., 2015, Chen et al., 2017). Although significant improvements over methods employing hand-crafted features have been achieved in many applications, most of these studies focused on 2D images. In medical image computing, however, volumetric data account for a large portion of imaging modalities, such as 3D computed tomography (CT), 3D MR imaging, and 3D ultrasound. Developing an effective 3D neural network is quite challenging due not only to the higher dimensionality but also to the more complicated anatomical structures captured in volumetric data compared with 2D images.
To the best of our knowledge, there are currently two main types of CNNs developed for volumetric image processing. The first type employs modified variants of 2D CNNs that take a single slice (Lee et al., 2011), aggregated adjacent slices (Chen et al., 2015, Zhang et al., 2015), or orthogonal planes (i.e., axial, coronal, and sagittal) (Prasoon et al., 2013, Roth et al., 2014) as input to approximate three-dimensional spatial information. Although promising preliminary performance has been reported, these methods cannot sufficiently exploit the 3D contextual information, which greatly limits their capability to segment objects from volumetric data accurately. The second type employs genuinely 3D CNNs to detect or segment objects from volumetric data and has demonstrated compelling performance (Chen et al., 2016, Çiçek et al., 2016, Dou et al., 2016a, Dou et al., 2016b, Dou et al., Kamnitsas et al., 2017, Merkow et al., 2015, Milletari et al., 2016, Yu et al., 2017). Nevertheless, these methods may suffer from limited representation capability when using relatively shallow networks. On the other hand, training deeper networks to capture more representative features raises the degradation problem: the performance of the network saturates and then degrades rapidly if the depth is simply increased without effective training schemes (He et al., 2016a).
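To make the distinction between the two types concrete, the following minimal sketch contrasts slice-wise 2D convolution, which sees no inter-slice context, with a 3D convolution whose kernel also spans the depth axis. PyTorch and all tensor sizes are purely illustrative assumptions here, not the paper's original implementation:

```python
import torch
import torch.nn as nn

# Toy MR volume: (batch, channels, depth, height, width); sizes are arbitrary.
volume = torch.randn(1, 1, 32, 128, 128)

# 2D variant: process each axial slice independently -> no inter-slice context.
conv2d = nn.Conv2d(1, 16, kernel_size=3, padding=1)
slices = volume.squeeze(1).unbind(dim=1)              # 32 slices of (1, 128, 128)
per_slice = [conv2d(s.unsqueeze(1)) for s in slices]  # each: (1, 16, 128, 128)

# True 3D convolution: the kernel spans the depth axis as well, so the
# learned features integrate volumetric context directly.
conv3d = nn.Conv3d(1, 16, kernel_size=3, padding=1)
features3d = conv3d(volume)                           # (1, 16, 32, 128, 128)
```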
Recently, deep residual learning with substantially increased depth advanced the state-of-the-art performance on 2D image recognition tasks (He et al., 2016a, He et al., 2016b, Lequan et al., 2016). Instead of simply stacking layers, it alleviates the optimization degradation issue by reformulating the desired mapping as a residual function with respect to the layer input, realized through skip connections between layers of the network. Such skip connections allow derivatives to be passed backwards through the network while bypassing some layers (i.e., not passing through all non-linearities). In this paper, we propose a novel voxelwise residual network (VoxResNet) to cope with the challenging problem of segmenting key brain tissues from 3D MR images by introducing residual learning to volumetric data processing. As mentioned previously, the main merit of residual learning is that it alleviates the degradation problem when training a deeper network, so that the performance gains achievable by increasing network depth can be fully leveraged. With this technique, our VoxResNet is built with 25 layers and hence can generate more powerful features to deal with the large variations of brain tissues than competitors using either hand-crafted features or shallower networks. In order to effectively train such a deep network for brain segmentation, we seamlessly integrate multi-modality and multi-level contextual information into our network, so that the complementary information of different modalities can be harnessed and features of different scales can be exploited. Furthermore, an auto-context version of VoxResNet is proposed that combines low-level image appearance features, implicit shape information, and high-level context to further improve the segmentation performance. Auto-context is a well-known and effective algorithm for image segmentation that integrates a large number of low-level appearance features with context and implicit shape information (Tu, 2008, Tu and Bai, 2010). Extensive experiments on the well-known MRBrainS benchmark for brain segmentation from 3D MR images corroborated the efficacy of the proposed VoxResNet: our method achieved the first place in the challenge out of 37 competitors, including several state-of-the-art brain segmentation methods. Our main contributions can be summarized as follows:
1) We propose a novel deep voxelwise residual network, referred to as VoxResNet, which borrows the spirit of deep residual learning from 2D image recognition tasks and extends it into a 3D variant to fully explore the volumetric spatial information for accurate segmentation of brain structures from 3D MR images (a minimal sketch of such a 3D residual unit follows this list).
2) To tackle the large variation of brain structures, we validate the efficacy and necessity of complementary information from multiple imaging modalities and multi-level contextual feature representations by integrating them within our unified deep learning framework.
3) An auto-context version of VoxResNet is proposed by seamlessly integrating low-level image appearance features, implicit shape information, and high-level context for further improving the volumetric segmentation performance. Extensive experiments on a well-known benchmark dataset corroborated the efficacy of our method, which outperforms other state-of-the-art methods by a large margin.
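The sketch below makes the residual formulation concrete with a 3D pre-activation residual unit in the spirit of VoxResNet. PyTorch is assumed for illustration only, and the channel counts, layer ordering, and input sizes are illustrative assumptions rather than the paper's exact configuration. The skip connection computes y = x + F(x), so gradients can bypass the stacked non-linearities, and the co-registered modalities enter as input channels so that the very first convolution already fuses them:

```python
import torch
import torch.nn as nn

class VoxRes(nn.Module):
    """3D residual unit with pre-activation (BN-ReLU-Conv) ordering: y = x + F(x)."""
    def __init__(self, channels):
        super().__init__()
        self.residual = nn.Sequential(
            nn.BatchNorm3d(channels), nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm3d(channels), nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        # Identity skip connection: gradients flow through the addition
        # unchanged, alleviating the degradation problem in deep networks.
        return x + self.residual(x)

# Multi-modality fusion by channel concatenation: e.g., three co-registered
# MR sequences stacked as three input channels (an illustrative choice).
net = nn.Sequential(nn.Conv3d(3, 32, kernel_size=3, padding=1),
                    VoxRes(32), VoxRes(32))
out = net(torch.randn(1, 3, 24, 64, 64))  # -> (1, 32, 24, 64, 64)
```

Under the same scheme, the auto-context variant described above would concatenate the probability maps produced by a first-pass network as additional input channels, letting the second network exploit implicit shape and high-level context information alongside the image appearance.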
The remainder of this paper is organized as follows. In Section 2, we first describe the experimental datasets, then elaborate on deep residual learning for effective feature representations and detail the proposed VoxResNet for volumetric brain segmentation. We report the experiments and results in Section 3. We further discuss and analyze our study in Section 4. Finally, conclusions are drawn in Section 5.
Data acquisition and pre-processing
We validated our method on the 2013 MICCAI MRBrainS challenge, a well-known benchmark for evaluating brain segmentation algorithms. The target of the MRBrainS challenge is to segment the brain into four classes, i.e., WM, GM, CSF, and background. The datasets were acquired at UMC Utrecht from patients with diabetes and matched controls with varying degrees of atrophy and white matter lesions (Mendrik et al., 2015). Multi-sequence 3 T MRI brain scans, including T1, T1-IR, and T2-FLAIR sequences, were provided for each subject.
Evaluation metrics
The evaluation metrics of the MRBrainS challenge consist of three measures: the Dice coefficient (DC), the 95th percentile of the Hausdorff distance (HD), and the absolute volume difference (AVD), each calculated per tissue type (i.e., GM, WM, and CSF) (Mendrik et al., 2015). The Dice coefficient measures the spatial overlap between the segmentation result and the ground truth, with a larger value denoting higher segmentation accuracy. It is defined as
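$$\mathrm{DC}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|} \times 100\%,$$

where A denotes the automated segmentation and B the manual reference. This is the standard Dice definition, with the percentage scaling assumed from the challenge's reporting convention; it equals 100% for a perfect overlap and 0% for disjoint segmentations.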
Discussion
We proposed a deep voxelwise residual network for brain segmentation from 3D MR images. The deep residual learning technique was originally developed for recognition in 2D images. In this paper, we generalize it with 3D convolutions and develop a set of effective training schemes for handling the segmentation of 3D brain MR images. The proposed VoxResNet can fully explore the spatial contextual information and generate more distinctive features to achieve much better performance compared to previous methods using hand-crafted features or shallower networks.
Conclusions
In this paper, we developed a novel 3D residual network, named VoxResNet, and analyzed its capability in automatically segmenting brain structures from 3D MR images. Our method extended 2D residual learning into a 3D variant for solving challenging segmentation tasks on volumetric data with a deeper network than previous competitors. Both multi-modality and multi-level contextual information were elegantly integrated into our end-to-end network to improve the segmentation performance.
Acknowledgments
The work described in this paper was supported by the Hong Kong Research Grants Council, General Research Fund (Project No. 14203115).
References

- Aljabar, P., et al., 2009. Multi-atlas based segmentation of brain images: atlas selection and its effect on accuracy. NeuroImage.
- Ashburner, J., Friston, K.J., 2005. Unified segmentation. NeuroImage.
- Chen, H., et al., 2017. DCAN: deep contour-aware networks for object instance segmentation from histology images. Med. Image Anal.
- Heckemann, R.A., et al., 2006. Automatic anatomical brain MRI segmentation combining label propagation and decision fusion. NeuroImage.
- Išgum, I., et al., 2015. Evaluation of automatic neonatal brain segmentation algorithms: the NeoBrainS12 challenge. Med. Image Anal.
- Kamnitsas, K., et al., 2017. Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. Med. Image Anal.
- Klein, A., Hirsch, J., 2005. Mindboggle: a scatterbrained approach to automate brain labeling. NeuroImage.
- Lötjönen, J., et al., 2010. Fast and robust multi-atlas segmentation of brain magnetic resonance images. NeuroImage.
- Maier, O., et al., 2017. ISLES 2015: a public evaluation benchmark for ischemic stroke lesion segmentation from multispectral MRI. Med. Image Anal.
- Moeskops, P., et al., 2015. Automatic segmentation of MR brain images of preterm infants using supervised classification. NeuroImage.