Scale Invariant Feature Transform for n-Dimensional Images (n-SIFT)

This document describes the implementation of several features previously developed [2], extending the 2D scale invariant feature transform (SIFT) [4, 5] to images of arbitrary dimensionality, such as 3D medical image volumes and time series, using ITK. Specifically, we provide a scale invariant implementation of a weighted histogram of gradient feature, a rotationally invariant version of the histogram feature, and the n-SIFT feature.


Implementation
Scale Invariant Features
The following section provides an overview of the method. For further details, the reader is directed to our previous work [2].
The three features share the same basic framework: first, extrema of the difference of Gaussian scale space are identified. At these points, features are computed using the gradients around the points. Once features are computed for two images, we can compare feature vectors and match points that are sufficiently similar.

Salient Point Detection
To sample the difference of Gaussian space in a scale-invariant manner, we Gaussian filter the image multiple times at sigma = σ, kσ, k^2σ, .... For each pair of "adjacent" Gaussian filtered images (e.g. k^jσ and k^(j+1)σ), we compute the difference of Gaussian by taking the difference of the images. This generates a series of difference of Gaussian images.
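As an illustration of this sampling, the following standalone C++ sketch (plain standard library, not the ITK-based implementation described below; the kernel truncation radius and parameter values are illustrative choices) builds the geometric series of sigmas and computes a difference of Gaussians for a 1D signal:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Sigmas sampled geometrically: sigma, k*sigma, k^2*sigma, ...
std::vector<double> scaleSigmas(double sigma0, double k, int numScales) {
  std::vector<double> sigmas(numScales);
  for (int j = 0; j < numScales; ++j)
    sigmas[j] = sigma0 * std::pow(k, j);
  return sigmas;
}

// Smooth a 1D signal with a truncated Gaussian kernel (edge samples clamped).
std::vector<double> gaussianSmooth(const std::vector<double>& f, double sigma) {
  int radius = static_cast<int>(std::ceil(3.0 * sigma));
  std::vector<double> out(f.size(), 0.0);
  for (int i = 0; i < static_cast<int>(f.size()); ++i) {
    double sum = 0.0, wsum = 0.0;
    for (int d = -radius; d <= radius; ++d) {
      int j = std::min(std::max(i + d, 0), static_cast<int>(f.size()) - 1);
      double w = std::exp(-(d * d) / (2.0 * sigma * sigma));
      sum += w * f[j];
      wsum += w;
    }
    out[i] = sum / wsum;
  }
  return out;
}

// Difference of two Gaussian-smoothed copies of the signal: the "adjacent"
// pair in the scale series, smoothed at sigmaA and sigmaB (sigmaB > sigmaA).
std::vector<double> differenceOfGaussians(const std::vector<double>& f,
                                          double sigmaA, double sigmaB) {
  std::vector<double> a = gaussianSmooth(f, sigmaA);
  std::vector<double> b = gaussianSmooth(f, sigmaB);
  for (size_t i = 0; i < a.size(); ++i) a[i] = b[i] - a[i];
  return a;
}
```

In the actual implementation the smoothing and subtraction are performed by ITK filters, as described under Implementation Details.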
Salient points are extracted by locating local extrema in these images. A voxel is considered an extremum if it is the local maximum or minimum of all voxels orthogonally and diagonally adjacent to it in one of the difference of Gaussian images, as well as of the corresponding voxels in the preceding and successive images of the difference of Gaussian series.
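The extremum test can be sketched as follows, shown in 2D for brevity (the actual code works in n dimensions): a pixel of the middle difference of Gaussian image is compared against its orthogonal and diagonal neighbours there and against the corresponding 3×3 windows of the preceding and successive images.

```cpp
#include <vector>

using Image = std::vector<std::vector<double>>;

// True when pixel (r, c) of `mid` is a strict local maximum or minimum over
// all neighbours in the three adjacent difference of Gaussian images.
// Assumes (r, c) is not on the image border.
bool isExtremum(const Image& prev, const Image& mid, const Image& next,
                int r, int c) {
  double v = mid[r][c];
  bool isMax = true, isMin = true;
  const Image* imgs[3] = {&prev, &mid, &next};
  for (int s = 0; s < 3; ++s)
    for (int dr = -1; dr <= 1; ++dr)
      for (int dc = -1; dc <= 1; ++dc) {
        if (s == 1 && dr == 0 && dc == 0) continue;  // skip the pixel itself
        double u = (*imgs[s])[r + dr][c + dc];
        if (u >= v) isMax = false;
        if (u <= v) isMin = false;
      }
  return isMax || isMin;
}
```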
Scale invariance is achieved by taking the Gaussian filtered image with sigma = σ and downsizing all dimensions by one half. Extrema in the difference of Gaussian space are then examined for this downsized image.
To continue sampling scale space, we can repeat the process, testing the quarter-sized filtered version of the original image, and so on.

Feature Generation
At each of the extrema located, we examine the gradients near the point to compute a distinctive feature vector. The global histogram feature generates a single multi-dimensional histogram to summarise the local gradients. The bins of the histogram cover the hyperspherical coordinates of the gradients. For each gradient of the image, the value placed into the histogram is weighted by the magnitude of the gradient as well as by a Gaussian centred on the feature point. The histogram bins form a feature vector, which is normalised.
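A minimal 2D sketch of this weighting scheme (in 2D the hyperspherical coordinate reduces to a single angle; the bin count and Gaussian sigma are illustrative choices, not the values used by the implementation):

```cpp
#include <cmath>
#include <vector>

// Each gradient (gx[i], gy[i]) at distance dist[i] from the feature point
// votes into an angular bin, weighted by its magnitude and by a Gaussian
// centred on the feature point; the filled histogram is then normalised.
std::vector<double> gradientHistogram(
    const std::vector<double>& gx, const std::vector<double>& gy,
    const std::vector<double>& dist, int numBins, double sigma) {
  const double pi = std::acos(-1.0);
  std::vector<double> hist(numBins, 0.0);
  for (size_t i = 0; i < gx.size(); ++i) {
    double mag = std::hypot(gx[i], gy[i]);   // gradient magnitude
    double ang = std::atan2(gy[i], gx[i]);   // orientation in [-pi, pi]
    int bin = static_cast<int>(((ang + pi) / (2.0 * pi)) * numBins) % numBins;
    double w = std::exp(-(dist[i] * dist[i]) / (2.0 * sigma * sigma));
    hist[bin] += mag * w;
  }
  double norm = 0.0;
  for (size_t b = 0; b < hist.size(); ++b) norm += hist[b] * hist[b];
  norm = std::sqrt(norm);
  if (norm > 0.0)
    for (size_t b = 0; b < hist.size(); ++b) hist[b] /= norm;
  return hist;
}
```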
The reoriented global histogram feature incorporates more robustness to rotation. After the histogram is computed, we rotate the bins such that the highest magnitude bin is always in the first position of the vector.
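The reorientation step amounts to a cyclic shift of the bins, which can be sketched with standard algorithms:

```cpp
#include <algorithm>
#include <vector>

// Cyclically rotate the bins so that the highest-magnitude bin moves to the
// first position; a cyclic shift of orientations (an in-plane rotation of
// the gradients) then leaves the reoriented feature vector unchanged.
void reorientHistogram(std::vector<double>& hist) {
  std::vector<double>::iterator maxIt =
      std::max_element(hist.begin(), hist.end());
  std::rotate(hist.begin(), maxIt, hist.end());
}
```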
The n-SIFT feature uses 4^n multidimensional histograms to summarise hypercubic regions around the feature voxel. We divide the 16^n-voxel hypercubic region around the feature voxel into 4^n subregions, each of 4^n voxels. Each subregion is summarised by a multidimensional histogram of the hyperspherical coordinates of its gradients, weighted by magnitude and by a Gaussian centred on the feature point. The vectors for these histograms are then concatenated to form the final n-SIFT feature.
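The subregion layout can be sketched as follows: the 16^n hypercube has 4 subregions of side 4 along each axis, giving 4^n subregions in total. (The indexing convention below is illustrative; the implementation's ordering may differ.)

```cpp
#include <cmath>
#include <vector>

// Flat index of the subregion containing an n-D offset in [0, 16)^n,
// where each axis is split into 4 subregions of 4 voxels.
int subregionIndex(const std::vector<int>& offset) {
  int idx = 0;
  for (size_t d = 0; d < offset.size(); ++d)
    idx = idx * 4 + offset[d] / 4;   // 4 subregions along each axis
  return idx;
}

// Total number of subregions (and histograms) for dimensionality n: 4^n.
int numSubregions(int n) {
  return static_cast<int>(std::pow(4.0, n) + 0.5);
}
```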

Feature Matching
To match points from two images, we use the ℓ2 (Euclidean) distance between the feature vectors. To minimise the number of spurious matches, we only accept a match if the ratio of the distance for the best match to that of the second best match is below a threshold, thus ensuring the second best match is substantially worse than the best match. We also only accept the match if it is a reciprocal best match: a feature x matches a feature y if and only if x is the best match for y and y is the best match for x.
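The matching strategy can be sketched in plain C++ as follows (the 0.8 ratio threshold is an illustrative choice, not the value used by the implementation):

```cpp
#include <cmath>
#include <utility>
#include <vector>

using Feature = std::vector<double>;

// Euclidean distance between two feature vectors of equal length.
double l2Distance(const Feature& a, const Feature& b) {
  double s = 0.0;
  for (size_t i = 0; i < a.size(); ++i) {
    double d = a[i] - b[i];
    s += d * d;
  }
  return std::sqrt(s);
}

// Index of the nearest feature in `set` to `f`; also reports the
// best/second-best distance ratio used to reject ambiguous matches.
int nearest(const Feature& f, const std::vector<Feature>& set, double& ratio) {
  int best = -1;
  double d1 = 1e300, d2 = 1e300;
  for (size_t i = 0; i < set.size(); ++i) {
    double d = l2Distance(f, set[i]);
    if (d < d1) { d2 = d1; d1 = d; best = static_cast<int>(i); }
    else if (d < d2) { d2 = d; }
  }
  ratio = (d2 > 0.0) ? d1 / d2 : 1.0;
  return best;
}

// Match features of A against B: keep (i, j) only when the ratio test
// passes in both directions and the pair is a reciprocal best match.
std::vector<std::pair<int, int>> matchFeatures(
    const std::vector<Feature>& A, const std::vector<Feature>& B,
    double maxRatio = 0.8) {
  std::vector<std::pair<int, int>> matches;
  for (size_t i = 0; i < A.size(); ++i) {
    double rAB, rBA;
    int j = nearest(A[i], B, rAB);
    if (j < 0 || rAB >= maxRatio) continue;
    int back = nearest(B[j], A, rBA);
    if (back == static_cast<int>(i) && rBA < maxRatio)
      matches.push_back(std::make_pair(static_cast<int>(i), j));
  }
  return matches;
}
```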

Implementation Details
The images are loaded using itk::ImageFileReader and itk::ImageSeriesReader for 3D and 4D images respectively. Gaussian filtering is accomplished using itk::DiscreteGaussianImageFilter, and the difference between images is subsequently computed using itk::SubtractImageFilter. The keypoint-feature pairs are represented as an itk::PointSet, with the features represented using an itk::Array<float>.
To generate synthetic test images, we manipulate the original image using itk::ScalableAffineTransform to generate scaled and rotated versions of the original image.
The code for this implementation conforms as much as possible to the coding style outlined in the Coding Style Guide by the Insight Software Consortium (June 2004; http://www.insightsoftwareconsortium.org/documents/policies/Style.pdf). The code was documented using doxygen (http://www.cs.sfu.ca/~hamarneh/software/doxygen/nsift/output/html/).

Using the Compiled Program
The included source compiles separate programs for 3D and 4D data, and for each of the feature types. 3Dhistkeys, 3Drhistkeys and 3Dsiftkeys generate features for the histogram feature, the reoriented histogram feature and the n-SIFT feature respectively, for 3-dimensional images; the corresponding 4D programs operate on 4-dimensional images.

Synthetic Performance Tests
When performing a synthetic image test, the user specifies the original test image, as well as scaling (--scale) and rotation (--rotate). Rotation is by default centred around the origin, but a flag (--rotate-mid) can instead force rotation around the centre of the image. The program will then compute features for the original and the synthetically modified image, and return a list of corresponding voxel locations. As well, since the ground truth is known in this case, it computes the number of matches that are within 1.5, 3.0 and 7.5 voxels of the corresponding location. For example, the following command will test rotation by 10 degrees about the centre of the image.
3Dsiftkeys --synthetic --rotate-mid --rotate 10 brainweb165a10f17.mha

The output reports the number of extrema found when testing each set of difference of Gaussian images. For example:

Searching for Extrema in DoG Image 0-1 ( 90 108 90 ) Scale 0.5
Acc. Num Max: 29
Acc. Num Min: 25
Acc. Num Reject: 0

reports, for the initial pyramid level (0) and the first set of three difference of Gaussian images, the presence of 29 maxima and 25 minima, with no extrema rejected due to the voxel intensity being too low. ( 90 108 90 ) reports the dimensions of the image being tested: 90 × 108 × 90 voxels.
At the end of the run, summary statistics are given to help evaluate the performance of the feature for keypoint matching. First, the total number of keypoints found in each image is given. Here, we report 219 matches generated. Of these, 190 of the 219 matched points are within 1.5 voxels of the "true" match. We also report the results at the error levels of 3.0 voxels and 7.5 voxels.
We can then examine the performance of the various feature types under angular rotation. We continue to test the image file brainweb165a10f17.mha, and summarise the results after differing amounts of rotation and for different feature types. Table 1 reports the number of points correctly matched for the three feature types after 6, 10, 20 and 40 degree rotations, and Table 2 reports the fraction of points correctly matched, at 1.5 voxel error.

Software Requirements
The following open source packages were used:
• ITK
• CMake 2.4
• gcc 3.4.4
No further dependencies were used. The software was developed and compiled under the Cygwin development environment for 32-bit Windows XP. Assuming the presence of CMake and ITK, it should be sufficient to invoke cmake .; make to build the example executables.
To link the code into your own program, simply include itkScaleInvariantFeatureImageFilter.h. You will need to load the images for analysis. Then, using an appropriate instance of itk::ScaleInvariantFeatureImageFilter for the image type and dimensionality, you can call the method getSiftFeatures to generate the set of points with the scale invariant features, and call MatchKeypointsFeatures to match two sets of points with features.
By default, the class will generate histogram features. To generate reoriented histogram features, define REORIENT when compiling. To generate n-SIFT features, define SIFT_FEATURE when compiling. See CMakeLists.txt for an example of how to rebuild the same program for each feature type using the preprocessor declarations.

Conclusion
We have provided a general implementation of the algorithms previously described [2] using ITK, and the example program shows how it can be leveraged to analyse the 3D and 4D (3D time series) images that can be loaded by ITK. The software can also be applied to non-medical n-dimensional images in general. Future work includes adapting the rotational invariance via reorientation to the n-SIFT feature, unifying the code to deal more elegantly with dimensionality and feature types, optimising the code for speed by taking advantage of more specialised ITK filters, and providing a data structure for reporting matches.

Table 1 :
Number of points correctly matched for the global histogram feature, the reoriented global histogram feature and the n-SIFT feature, with error at most 1.5 voxels

Table 2 :
Fraction of points correctly matched for the global histogram feature, the reoriented global histogram feature and the n-SIFT feature, with error at most 1.5 voxels