Supervoxel classification forests for estimating pairwise image correspondences

doi:10.1016/j.patcog.2016.09.026

Pattern Recognition

Volume 63, March 2017, Pages 561-569

https://doi.org/10.1016/j.patcog.2016.09.026

Highlights

• A method for using random forests to estimate image correspondences is proposed.
• The method does not rely on the availability of manual label annotations.
• Labels for training are obtained via the use of supervoxels.
• The method is efficient and effective at providing an estimate of image correspondences.

Abstract

This article presents a general method for estimating pairwise image correspondences, which is a fundamental problem in image analysis. The method consists of over-segmenting a pair of images into supervoxels. A forest classifier is then trained on one of the images, the source, by using supervoxel indices as voxel-wise class labels. Applying the forest to the other image, the target, yields a supervoxel labelling, which is then regularised using majority voting within the boundaries of the target's supervoxels. This yields semi-dense correspondences in a fully automatic, unsupervised, efficient and robust manner. The advantage of our approach is that no prior information or manual annotations are required, making it suitable as a general initialisation component for various medical imaging tasks that require coarse correspondences, such as atlas/patch-based segmentation, registration, and atlas construction. We demonstrate the effectiveness of our approach in two different applications: a) initialisation of longitudinal registration on spine CT data of 96 patients, and b) atlas-based image segmentation using 150 abdominal CT images. Comparison to state-of-the-art methods demonstrates the potential of supervoxel classification forests for estimating image correspondences.

Introduction

Establishing correspondences between images is a fundamental and important problem in many medical image analysis tasks. To this end, dedicated image registration techniques have been developed and successfully employed in fully automated analysis pipelines [1]. Many of these techniques work best when applied on particular types of images, such as brain scans, where simple initialisation strategies work well. In general settings, however, the to-be-registered images might capture very different fields of view, as is often the case in pre- and post-operative abdominal scans. In such settings, estimating an initial alignment can be quite challenging if no prior information is available. It can be beneficial to utilise anatomy recognition and landmark detection methods, which provide spatial priors for registration [2]. However, this requires an annotated image dataset for training; obtaining a large number of manually annotated images can be tedious, costly, and time-consuming.

We propose a general method for estimating initial pairwise correspondences between images, which does not require any prior information or manual annotations. To do so, we employ random classification forests [3], but, in contrast to previous work, class labels for training are generated automatically. Our method consists of over-segmenting a pair of images into supervoxels. We then train a forest classifier on one of the images – the source image – by using its supervoxel indices as voxel-wise class labels. Applying the forest to the other image – the target image – yields a supervoxel label prediction for each of its voxels. Majority voting is then carried out within the supervoxels of the target image, where each voxel casts a vote as to what the final supervoxel label should be. The final labelling yields correspondences between the supervoxels of the two images. Supervoxels are an ideal representation for semi-densely distributed correspondences, relaxing the one-to-one matching assumption between images. Having a set of initial correspondences between two images, on a supervoxel level, can help solve the initialisation problem for many image analysis tasks such as atlas/patch-based segmentation [4], [5], registration, and atlas construction.
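The majority-voting step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `majority_vote`, the flat-list data layout, and the tie-breaking rule are assumptions made for illustration, and the per-voxel predictions are hard-coded rather than produced by a trained forest.

```python
from collections import Counter, defaultdict


def majority_vote(predicted_source_labels, target_supervoxel_ids):
    """Assign each target supervoxel the source label most of its voxels voted for.

    Both arguments are flat, equally long sequences with one entry per target voxel.
    """
    if len(predicted_source_labels) != len(target_supervoxel_ids):
        raise ValueError("expected one prediction per target voxel")

    # Collect the votes cast by the voxels of each target supervoxel.
    votes = defaultdict(Counter)
    for source_label, target_id in zip(predicted_source_labels, target_supervoxel_ids):
        votes[target_id][source_label] += 1

    # Keep the most frequent source label; ties go to the label seen first
    # (an arbitrary choice made for this sketch).
    return {tid: counter.most_common(1)[0][0] for tid, counter in votes.items()}


# Toy example: six target voxels grouped into two target supervoxels (ids 0 and 1).
predicted = [7, 7, 3, 5, 5, 5]  # hypothetical per-voxel forest output (source supervoxel indices)
target_sv = [0, 0, 0, 1, 1, 1]  # hypothetical target over-segmentation
correspondences = majority_vote(predicted, target_sv)
print(correspondences)  # {0: 7, 1: 5}
```

In this toy case the single dissenting vote in target supervoxel 0 is overruled, which is the regularising effect the voting step is meant to provide.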

The main advantage of a supervoxel classification forest (SVF) is that it does not rely on any prior manual annotations, making it possible to train a forest on an unlabelled image. Using supervoxels that follow boundaries makes it possible to perform matching between regions that have different shapes and avoids the constraints of rectangular-shaped patches, which tend to contain elements from multiple anatomical regions.

Random forests [3], as a supervised machine learning technique, have found many successful applications in medical image analysis [6], [7], [8], [9]; this is mainly due to their accuracy, robustness, and scalability. They rely on the availability of labelled images, which is in contrast to the approach taken in this paper: the labels for training are generated automatically. While, traditionally, forests are trained on a dataset containing many images, the idea of encoding a single labelled image (or “atlas”) as a forest [9] has been proposed recently in the context of multi-atlas label propagation. This has inspired our idea of using the atlas-forest approach to encode a single source image into a collection of homogeneous regions, obtained automatically via supervoxelisation. Those supervoxel/region-based labels can then be used to predict matching regions in another target image. Supervoxels – and their 2D counterpart, superpixels – have found many applications in computer vision [10], [11]. They allow the grouping of voxels into locally consistent regions that have similar appearance characteristics, thereby reducing redundancy and computational complexity. Supervoxels are mainly used within segmentation pipelines. We are not aware of previous work that has used supervoxels as label entities in classification forests, in particular, with the aim of establishing image correspondences.

In [2], random classification forests are used to provide spatial priors to initialise image registration and it has been shown that those priors yield improved registration of spine CT images. Their method relies on the availability of annotated images. Our method can be used for the similar task of providing priors for registration, except that there is no need for annotated images for training.

Random forests have been used to train on unlabelled datasets before, mainly in the context of density estimation [8] and clustering [3], [12]. For density estimation, the forest, also called a density forest, is trained on unlabelled data by assuming multi-variate Gaussian distributions over feature responses at the split nodes. For clustering, the forest is used to extract a similarity measure between points, where two points are considered similar if they both end up in the same leaf node of a tree. The predictions of all trees are then aggregated to get a similarity measure between points. To train the forest to cluster unlabelled data, two dummy labels are introduced: class label 1 is assigned to the unlabelled observed data and class label 2 is assigned to a synthetic dataset. The forest is then trained to distinguish between the observed unlabelled dataset and the synthetic dataset.
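The leaf-based similarity used for clustering can be sketched as follows. This is an illustration under stated assumptions: the leaf indices are hard-coded hypothetical values rather than the output of a trained forest, and aggregation across trees is taken to be a simple average, which the text above does not specify.

```python
def leaf_similarity(leaf_ids_a, leaf_ids_b):
    """Fraction of trees in which two points end up in the same leaf node."""
    if len(leaf_ids_a) != len(leaf_ids_b) or not leaf_ids_a:
        raise ValueError("expected one leaf index per tree for both points")
    matches = sum(a == b for a, b in zip(leaf_ids_a, leaf_ids_b))
    return matches / len(leaf_ids_a)


# Hypothetical leaf indices reached by three points in a forest of four trees.
leaves = {
    "p": [2, 5, 1, 7],
    "q": [2, 5, 3, 7],
    "r": [4, 0, 3, 6],
}

sim_pq = leaf_similarity(leaves["p"], leaves["q"])  # same leaf in 3 of 4 trees
sim_pr = leaf_similarity(leaves["p"], leaves["r"])  # no shared leaf
sim_qr = leaf_similarity(leaves["q"], leaves["r"])  # same leaf in 1 of 4 trees
print(sim_pq, sim_pr, sim_qr)  # 0.75 0.0 0.25
```

A pairwise matrix of such values could then be passed to any clustering method that accepts a precomputed similarity.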

Section snippets

Problem formulation

The aim of our method is to estimate correspondences between a set of image regions, i.e. supervoxels. Let $I_i$ be an image that is over-segmented into distinct regions that are represented by an indexed family of sets $SV^{i} = \{sv_{k}^{i}\}_{k \in C^{i}}$. The image, therefore, consists of $|SV^{i}|$ supervoxels, with the index set $C^{i} = \{1, \dots, |SV^{i}|\}$ denoting the distinct indices/labels of the supervoxels. Each supervoxel $sv_{k}^{i} = \{v_{l}^{i}\}_{l=1}^{|sv_{k}^{i}|}$, in turn, is a set of voxels $v_{l}^{i}$. With $N^{i}$ representing the total number of voxels in the
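As a rough illustration of this notation, the sketch below builds the indexed family of voxel sets from a flattened supervoxel label map. The function name `supervoxel_family`, the toy label map, and the use of flat voxel indices are assumptions for illustration; the sketch also assumes that the regions are non-overlapping and cover every voxel of the label map.

```python
from collections import defaultdict


def supervoxel_family(label_map):
    """Group flat voxel indices by supervoxel label, giving {k: set of voxel indices}."""
    family = defaultdict(set)
    for voxel_index, label in enumerate(label_map):
        family[label].add(voxel_index)
    return dict(family)


# Hypothetical flattened label map of an 8-voxel image with three supervoxels.
label_map = [1, 1, 2, 3, 3, 3, 2, 1]

SV = supervoxel_family(label_map)  # the indexed family of voxel sets
C = sorted(SV)                     # the index set of supervoxel labels
print(C)      # [1, 2, 3]
print(SV[3])  # {3, 4, 5}

# Because every voxel carries exactly one label, the supervoxel sizes
# add up to the number of voxels in the label map.
total_voxels = sum(len(voxels) for voxels in SV.values())
print(total_voxels == len(label_map))  # True
```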

Experiments and results

We evaluate our proposed method on two different datasets. Dense ground-truth one-to-one correspondences between images are hard to obtain; there are datasets available that have sparse correspondences, such as spine CT images, for which the locations of the vertebra centroids are available in the form of manual annotations. We use a publicly available spine CT dataset to quantitatively evaluate our method. In addition, we test our proposed method in a simple multi-atlas label propagation (MALP)

Discussion and conclusion

In this paper, we propose a method for estimating correspondences between images on a supervoxel level using random classification forests. The advantage of our approach is that it does not rely on the availability of prior organ annotations. Training a random forest using automatically generated supervoxels as class labels allows training on unlabelled images. Qualitative evaluations of the estimated correspondences, in a registration initialisation setting and in a simple multi-atlas

Fahdi Kanavati received his M.Sc. in Advanced Computing, with distinction, in 2013, from Imperial College London, United Kingdom. He is currently a PhD student in the biomedical image analysis group, BioMedIA, at Imperial College London. His research interests include medical image analysis, computer vision, and machine learning.

References (21)

B. Zitova et al., Image registration methods: a survey, Image Vis. Comput. (2003)
R.A. Heckemann et al., Automatic anatomical brain MRI segmentation combining label propagation and decision fusion, NeuroImage (2006)
P. Coupé et al., Patch-based segmentation using expert priors: application to hippocampus and ventricle segmentation, NeuroImage (2011)
T. Tong et al., Discriminative dictionary learning for abdominal multi-organ segmentation, Med. Image Anal. (2015)
B. Glocker, D. Zikic, D.R. Haynor, Robust Registration of Longitudinal Spine CT, in: Medical Image Computing and...
L. Breiman, Random forests, Machine learning, 2001, 5-32ISSN...
A. Criminisi, J. Shotton, D. Robertson, E. Konukoglu, Regression forests for efficient anatomy detection and...
A. Montillo, J. Shotton, J. Winn, J. E. Iglesias, D. Metaxas, A. Criminisi, Entangled decision forests and their...
A. Criminisi et al., Decision forests for classification, regression, density estimation, manifold learning and semi-supervised learning, Learning (2011)
D. Zikic, B. Glocker, A. Criminisi, Encoding atlases by randomized classification forests for efficient multi-atlas...

There are more references available in the full text version of this article.

Cited by (29)

Dense correspondence of deformable volumetric images via deep spectral embedding and descriptor learning
2022, Medical Image Analysis
Deformable image correspondence plays an essential role in a variety of medical image analysis tasks. Most existing deep learning-based registration and correspondence techniques exploit metric space alignments in the spatial domain and learn a nonlinear voxel-wise mapping function between volumetric images and displacement fields, agnostic to intrinsic structure correspondence. When confronted with high-frequency perturbations of patients’ poses and anatomical structural variations, they relied on prior rigid and affine transformations, as well as additional segmentation masks and landmark annotations for reliable registration. This paper presents a data-driven spectral mapping-based correspondence framework to handle the intrinsic correspondence of anatomical structures. At the core of our approach lies a deep convolutional framework that approximates spectral bases and optimizes volumetric descriptors. The multi-path graph convolutional network-based spectral embedding approximation module relieves the computationally expensive eigendecomposition-based embedding of volumetric images. The deep descriptor learning module surpasses the prior hand-crafted descriptors and the descriptor selection. We showcase the efficacy of the core modules, i.e., the spectral embedding approximation and descriptor learning, for volumetric image correspondence and the atlas-based registration on two volumetric image datasets. The proposed method achieves comparable correspondence accuracy with the state-of-the-art deep registration models, resilient to pose and shape perturbations.
From complex to neural networks
2021, Big Data in Psychiatry and Neurology
Quantitative neuroscience is trying to exploit the increasing number of large data sharing initiatives; therefore, Big Data analytics can play a pivotal role. So far, especially for neuroimaging, two different strategies have been largely explored: voxel-based and region of interest-based approaches. A common idea is that through quantitative features extracted by brain models it is possible to learn specific patterns, pathological or physiological, especially with the use of artificial intelligence techniques borrowed from Big Data analytics expertise. However, these approaches can suffer from several limitations. This is why a third option has gained popularity: complex networks. In this chapter we discuss how brain models can be suitably designed with complex network theory and how this approach can suitably feed learning algorithms, especially deep learning ones. Accordingly, it is possible to design quantitative evaluation frameworks for several purposes, such as early diagnosis support systems or fully automated age prediction models.
Multi-scale superpatch matching using dual superpixel descriptors
2020, Pattern Recognition Letters
Citation excerpt:
Therefore, a process applied at such over-segmentation scale can be close to the optimal pixel-wise result. Several works have used superpixels in non-local frameworks, e.g., [12,29], or in unsupervised learning-based superpixel matching approaches using random forests [6,16]. Nevertheless, the geometrical irregularity of such decompositions [11] (i.e., in terms of shape, adjacency or contour smoothness) can become an issue, since neighborhood information is crucial to compute accurate matches in terms of context.
Over-segmentation into superpixels is a very effective dimensionality reduction strategy, enabling fast dense image processing. The main issue of this approach is the inherent irregularity of the image decomposition compared to standard hierarchical multi-resolution schemes, especially when searching for similar neighboring patterns. Several works have attempted to overcome this issue by taking the region irregularity into account in their comparison model. Nevertheless, they remain sub-optimal for providing robust and accurate superpixel neighborhood descriptors, since they only compute features within each region, poorly capturing contour information at superpixel borders. In this work, we address these limitations by introducing the dual superpatch, a novel superpixel neighborhood descriptor. This structure contains features computed in reduced superpixel regions, as well as at the interfaces of multiple superpixels to explicitly capture contour structure information. A fast multi-scale non-local matching framework is also introduced for the search of similar descriptors at different resolution levels in an image dataset. The proposed dual superpatch captures similar structured patterns at different scales more accurately, and we demonstrate the robustness and performance of this new strategy on matching and supervised labeling applications.
Unsupervised learning-based long-term superpixel tracking
2019, Image and Vision Computing
Citation excerpt:
In summary, two main contributions are proposed towards accurate long-term superpixel tracking. First, unsupervised learning-based superpixel matching is generalized and adapted from medical image processing [16,17] to computer vision in order to find associations along video sequences between consecutive and distant images decomposed into superpixels (Section 2). The approach is carried out using classifiers such as k-nearest neighbors (kNN) or RF [18], incorporates new forward-backward consistency constraints and fully exploits dedicated context-rich features we extended from greyscale [26,16,17] to multi-channel to incorporate neighborhood information on RGB frames.
Finding correspondences between structural entities decomposing images is of high interest for computer vision applications. In particular, we analyze how to accurately track superpixels - visual primitives generated by aggregating adjacent pixels sharing similar characteristics - over extended time periods relying on unsupervised learning and temporal integration. A two-step video processing pipeline dedicated to long-term superpixel tracking is proposed. First, unsupervised learning-based superpixel matching provides correspondences between consecutive and distant frames using new context-rich features extended from greyscale to multi-channel and forward-backward consistency constraints. Resulting elementary matches are then combined along multi-step paths running through the whole sequence with various inter-frame distances. This produces a large set of candidate long-term superpixel pairings upon which majority voting is performed. Video object tracking experiments demonstrate the accuracy of our elementary estimator against state-of-the-art methods and prove the ability of multi-step integration to provide accurate long-term superpixel matches compared to usual direct and sequential integration.
SQL: Superpixels via quaternary labeling
2019, Pattern Recognition
Citation excerpt:
A variety of computer vision and pattern recognition problems have benefited from above advantages [4]: feature extraction [5], clustering [6], classification [7], segmentation [8–10], saliency detection [11], contour detection [12], stereo computation [13–15], objectness measure [16], proposal generation [17], object localization [18] and object tracking [19–21] to name a few. They also cover some domain specific applications such as remotely sensed image analysis [22,23] and medical image analysis [24,25]. Few approaches produce superpixels that conform to a regular lattice [26–28].
This paper formulates superpixel segmentation as a pixel labeling problem and proposes a quaternary labeling algorithm to generate superpixel lattice. It is achieved by seaming overlapped patches regularly placed on the image plane. Patch seaming is formulated as a pixel labeling problem, where each label indexes one patch. Once the optimal seaming is completed, all pixels covered by one retained patch constitute one superpixel. Further, four kinds of patches are distinguished and assembled into four layers correspondingly, and the patch indexes are mapped to the quaternary layer indexes. It significantly reduces the number of labels and greatly improves labelling efficiency. Furthermore, an objective function is developed to achieve optimal segmentation. Lattice structure is guaranteed by fixing patch centers to be superpixel centers, compact superpixels are assured by horizontal and vertical constraints enforced on the smooth terms, and coherent superpixels are achieved by iteratively refining the data terms. Extensive experiments on BSDS data set demonstrate that SQL algorithm significantly improves labeling efficiency, outperforms the other superpixel lattice methods, and is competitive with state-of-the-art methods without lattice guarantee. Superpixel lattice allows contextual relationships among superpixels to be easily modeled by either MRFs or CNN.
Random forests in medical image computing
2019, Handbook of Medical Image Computing and Computer Assisted Intervention
The Random Forests algorithm had a substantial impact on medical image computing over the last decade. This chapter presents basic algorithmic details, some variations proposed in the recent years and applications in medical image computing. Arguably, Random Forests' main impact was on the analysis tasks that required understanding spatial context within the images. We take a specific angle and view Random Forests as a machine learning tool that can integrate contextual information. We position the algorithm and its contributions within the larger field from this respect. Lastly, we briefly discuss how Random Forests and deep learning methods relate to each other and how they differ.
