Elsevier

Pattern Recognition

Volume 63, March 2017, Pages 561-569
Supervoxel classification forests for estimating pairwise image correspondences

https://doi.org/10.1016/j.patcog.2016.09.026

Highlights

  • A method for using random forests to estimate image correspondences is proposed.

  • The method does not rely on the availability of manual label annotations.

  • Labels for training are obtained via the use of supervoxels.

  • The method is efficient and effective at estimating image correspondences.

Abstract

This article presents a general method for estimating pairwise image correspondences, which is a fundamental problem in image analysis. The method consists of over-segmenting a pair of images into supervoxels. A forest classifier is then trained on one of the images, the source, by using supervoxel indices as voxel-wise class labels. Applying the forest to the other image, the target, yields a supervoxel labelling, which is then regularised using majority voting within the boundaries of the target's supervoxels. This yields semi-dense correspondences in a fully automatic, unsupervised, efficient and robust manner. The advantage of our approach is that no prior information or manual annotations are required, making it suitable as a general initialisation component for various medical imaging tasks that require coarse correspondences, such as atlas/patch-based segmentation, registration, and atlas construction. We demonstrate the effectiveness of our approach in two different applications: a) initialisation of longitudinal registration on spine CT data of 96 patients, and b) atlas-based image segmentation using 150 abdominal CT images. Comparisons with state-of-the-art methods demonstrate the potential of supervoxel classification forests for estimating image correspondences.

Introduction

Establishing correspondences between images is a fundamental and important problem in many medical image analysis tasks. To this end, dedicated image registration techniques have been developed and successfully employed in fully automated analysis pipelines [1]. Many of these techniques work best when applied on particular types of images, such as brain scans, where simple initialisation strategies work well. In general settings, however, the to-be-registered images might capture very different fields of view, as is often the case in pre- and post-operative abdominal scans. In such settings, estimating an initial alignment can be quite challenging if no prior information is available. It can be beneficial to utilise anatomy recognition and landmark detection methods, which provide spatial priors for registration [2]. However, this requires an annotated image dataset for training; obtaining a large number of manually annotated images can be tedious, costly, and time-consuming.

We propose a general method for estimating initial pairwise correspondences between images, which does not require any prior information or manual annotations. To do so, we employ random classification forests [3], but, in contrast to previous work, class labels for training are generated automatically. Our method consists of over-segmenting a pair of images into supervoxels. We then train a forest classifier on one of the images, the source image, by using its supervoxel indices as voxel-wise class labels. Applying the forest to the other image, the target image, yields a supervoxel label prediction for each of its voxels. Majority voting is then carried out within the supervoxels of the target image, where each voxel casts a vote as to what the final supervoxel label should be. The final labelling yields correspondences between the supervoxels of the two images. Supervoxels are an ideal representation for semi-densely distributed correspondences, relaxing the one-to-one matching assumption between images. Having a set of initial correspondences between two images, on a supervoxel level, can help solve the initialisation problem for many image analysis tasks such as atlas/patch-based segmentation [4], [5], registration, and atlas construction.
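The majority-voting step described above can be illustrated with a minimal sketch. It assumes the voxel-wise forest predictions are already available as a flat list; the function name `majority_vote` and the toy inputs are hypothetical and not taken from the paper, and the tie-breaking rule (first label encountered wins) is an assumption of this sketch.

```python
from collections import Counter, defaultdict


def majority_vote(target_supervoxel_ids, predicted_source_labels):
    """Map each target supervoxel to the source label most of its voxels predict."""
    votes = defaultdict(Counter)
    for sv_id, label in zip(target_supervoxel_ids, predicted_source_labels):
        votes[sv_id][label] += 1
    # most_common(1) returns [(label, count)]; ties go to the label seen first.
    return {sv_id: counts.most_common(1)[0][0] for sv_id, counts in votes.items()}


# Toy data: six target voxels grouped into two target supervoxels (0 and 1),
# with hypothetical voxel-wise predictions of source supervoxel indices.
target_supervoxel_ids = [0, 0, 0, 1, 1, 1]
predicted_source_labels = [4, 4, 7, 2, 9, 2]

correspondences = majority_vote(target_supervoxel_ids, predicted_source_labels)
print(correspondences)  # {0: 4, 1: 2}
```

In this toy case, target supervoxel 0 is matched to source supervoxel 4 and target supervoxel 1 to source supervoxel 2; the isolated predictions 7 and 9 are outvoted, which is the regularising effect of voting within supervoxel boundaries.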

The main advantage of a supervoxel classification forest (SVF) is that it does not rely on any prior manual annotations, making it possible to train a forest on an unlabelled image. Using supervoxels that follow boundaries makes it possible to perform matching between regions that have different shapes and avoids the constraints of rectangular-shaped patches, which tend to contain elements from multiple anatomical regions.

Random forests [3], as a supervised machine learning technique, have found many successful applications in medical image analysis [6], [7], [8], [9]; this is mainly due to their accuracy, robustness, and scalability. They rely on the availability of labelled images, which is in contrast to the approach taken in this paper: the labels for training are generated automatically. While, traditionally, forests are trained on a dataset containing many images, the idea of encoding a single labelled image (or “atlas”) as a forest [9] has been proposed recently in the context of multi-atlas label propagation. This has inspired our idea of using the atlas-forest approach to encode a single source image into a collection of homogeneous regions, obtained automatically via supervoxelisation. Those supervoxel/region-based labels can then be used to predict matching regions in another target image. Supervoxels – and their 2D counterpart, superpixels – have found many applications in computer vision [10], [11]. They allow the grouping of voxels into locally consistent regions that have similar appearance characteristics, thereby reducing redundancy and computational complexity. Supervoxels are mainly used within segmentation pipelines. We are not aware of previous work that has used supervoxels as label entities in classification forests, in particular, with the aim of establishing image correspondences.

In [2], random classification forests are used to provide spatial priors to initialise image registration and it has been shown that those priors yield improved registration of spine CT images. Their method relies on the availability of annotated images. Our method can be used for the similar task of providing priors for registration, except that there is no need for annotated images for training.

Random forests have been trained on unlabelled datasets before, mainly in the context of density estimation [8] and clustering [3], [12]. For density estimation, the forest, also called a density forest, is trained on unlabelled data by assuming multi-variate Gaussian distributions over feature responses at the split nodes. For clustering, the forest is used to extract a similarity measure between points, where two points are considered similar if they both end up in the same leaf node of a tree. The predictions of all trees are then aggregated to get a similarity measure between points. To train the forest to cluster unlabelled data, two dummy labels are introduced: class label 1 is assigned to the observed unlabelled data and class label 2 is assigned to a synthetic dataset. The forest is then trained to distinguish between the observed unlabelled dataset and the synthetic dataset.
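The leaf-based similarity used for clustering can be sketched as follows. This is an illustration only: it assumes the leaf reached by every point in every tree has already been recorded, and it aggregates over trees by taking the fraction of trees in which two points share a leaf, which is one common choice rather than necessarily the one used in [3], [12]. The name `forest_similarity` and the leaf assignments are hypothetical.

```python
def forest_similarity(leaf_ids_per_tree, a, b):
    """Fraction of trees in which points a and b fall into the same leaf."""
    same_leaf = sum(1 for leaves in leaf_ids_per_tree if leaves[a] == leaves[b])
    return same_leaf / len(leaf_ids_per_tree)


# Hypothetical leaf assignments: one row per tree, one entry per data point.
leaf_ids_per_tree = [
    [0, 0, 1, 1],  # tree 1
    [2, 2, 2, 3],  # tree 2
    [5, 4, 4, 4],  # tree 3
    [1, 1, 0, 0],  # tree 4
]

print(forest_similarity(leaf_ids_per_tree, 0, 1))  # 0.75
print(forest_similarity(leaf_ids_per_tree, 0, 3))  # 0.0
```

Points 0 and 1 share a leaf in three of the four trees, so they are considered similar; points 0 and 3 never share a leaf.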

Section snippets

Problem formulation

The aim of our method is to estimate correspondences between a set of image regions, i.e. supervoxels. Let $I_i$ be an image that is over-segmented into distinct regions that are represented by an indexed family of sets $SV_i = \{sv_k^i\}_{k \in C_i}$. The image, therefore, consists of $|SV_i|$ supervoxels, with the index set $C_i = \{1, \dots, |SV_i|\}$ denoting the distinct indices/labels of the supervoxels. Each supervoxel $sv_k^i = \{v_l^i\}_{l=1}^{|sv_k^i|}$, in turn, is a set of voxels $v_l^i$. With $N_i$ representing the total number of voxels in the

Experiments and results

We evaluate our proposed method on two different datasets. Dense ground-truth one-to-one correspondences between images are hard to obtain; there are datasets available that have sparse correspondences, such as spine CT images, for which the locations of the vertebra centroids are available in the form of manual annotations. We use a publicly available spine CT dataset to quantitatively evaluate our method. In addition, we test our proposed method in a simple multi-atlas label propagation (MALP)

Discussion and conclusion

In this paper, we propose a method for estimating correspondences between images on a supervoxel level using random classification forests. The advantage of our approach is that it does not rely on the availability of prior organ annotations. Training a random forest using automatically generated supervoxels as class labels allows training on unlabelled images. Qualitative evaluations of the estimated correspondences, in a registration initialisation setting and in a simple multi-atlas

Fahdi Kanavati received his M.Sc. in Advanced Computing, with distinction, in 2013, from Imperial College London, United Kingdom. He is currently a PhD student in the biomedical image analysis group, BioMedIA, at Imperial College London. His research interests include medical image analysis, computer vision, and machine learning.
