Joint alignment of multispectral images via semidefinite programming

: In this paper, we introduce a novel feature-point-matching based framework for achieving an optimized joint-alignment of sequential images from multispectral imaging (MSI). It solves a low-rank and semideﬁnite matrix that stores all pairwise-image feature-mappings by minimizing the total amount of point-to-point matching cost via a convex optimization of a semideﬁnite programming formulation. This unique strategy takes a complete consideration of the information aggregated by all point-matching costs and enables the entire set of pairwise-image feature-mappings to be solved simultaneously and near-optimally. Our framework is capable of running in an automatic or interactive fashion, oﬀering an eﬀective tool for eliminating spatial misalignments introduced into sequential MSI images during the imaging process. Our experimental results obtained from a database of 28 sequences of MSI images of human eye demonstrate the superior performances of our approach to the state-of-the-art techniques. Our framework is potentially invaluable in a large variety of practical applications of MSI images.


Introduction
Multispectral imaging (MSI) refers to imaging systems that penetrate certain features of physical object within the field of view by using a number of spectral bands [1][2][3]. It has been used in various applications such as but not limited to airborne mapping, astronomical imaging, dentistry, dermatology, histopathology and ophthalmology [4]. Taking MSI in ocular imaging for example, the inner limiting membrane all the way to the choroid can be dissected and visualized by taking a sequence of spectral images via MSI systems like the Retinal Health Assessment (RHA, Annidis Health System Corp.) [5][6][7]. MSI provides a collection of noninvasive en face views of the retina and choroid, that collect and combine spectral information to highlight anatomic and metabolic signatures for diagnosing and managing diseases. Interpretation of MSI images based on human visual inspection remains the reference standard in MSI applications. However, computer based algorithms are attracting more and more research interests due to the ability to provide objective and automatic image assessments.
One grand challenge in analyzing MSI images comes from the image-to-image spatial misalignment which is caused by object movement during the imaging process. For example, we need an analysis of multiple MSI slices from different spectral bands to assess the retinal pigment epithelium (RPE) disruption because it is characterized by a slight structural variation of pigment pattern in red and infrared spectral slices in contrast to normal structures in shorter wavelengths [4]. However, it is not a trivial in practice due to the misalignment between these spectral slices mainly because of the fact that the acquisition time of eye MSI images is commonly longer than the natural time scale of eye's saccadic movement [8]. Similar to the research in other fields [9][10][11][12][13][14][15][16], this spatial misalignment may introduce troubles into not only automatic but also manual interpretation and quantification of the anatomic/metabolic variations between spectral slices.
There exist a variety of techniques for retinal image registration [17][18][19] in the field of eye and generic image registration [10] in a broader field. A detailed review of all these techniques is beyond the scope of this paper and we only provide a summary on current group image registration methods as they are closely related to the proposed technique in this paper. There are various group image registration approaches, including the well-known congealing framework and its extensions [20][21][22] which focus on aligning a number of binary images of handwritten digits, 2D face images or 3D medical images via a sequential optimization process to gradually reduce the entropy of image intensity's distribution. Other approaches try to improve alignment performances by taking advantage of shape matching [23], sparse learning [24,25], image's Markov property [14,26], pairwise registration based on normalized cross-correlation [27], robust matching based on gradient descent optimization [28], image contours [29] and joint alignment by enforcing temporal smoothness on motion [30].
Theoretically speaking, most of the existing image registration techniques can be used for estimating and eliminating the spatial misaligment between MSI images of eye with certain adaptions. However, they are limited according to our experiments because of several basic reasons. First, many of them rely on independent processes of matching pairwise images, in which the joint information of MSI images can't be taken into account. Second, the point matches may become unreliable when the predefined matching cost is corrupted by the anatomic/metabolic variations of retina across spectral slices. Third, not all feature-mappings of pairwise images are solvable considering the fact that different spectral slices (penetrating different light-absorbing species/chromophores) may be associated with different anatomical structures. Finally, it is not a trivial to accomplish a semi-automated alignment which is of great importance in practice especially for difficult images.
In this paper, we propose a novel framework for achieving a joint discrete alignment of sequential MSI images. It generates near-optimal feature matches between images by jointly aligning all image pairs via a convex optimization process. Given a collection of sequential MSI images, we extract a set of salient feature points from each of them. Then, the proposed framework concerns a recovery of a globally consistent feature matches by taking advantage of the aggregated information from the point-to-point matching cost of all image pairs. Specifically, we encode the feature mapping across each pair of images via a binary matrix and leverage a stacked matrix to store all these binary matrices. Inspired by a recent finding that enforcing global consistency (composite feature maps along cycles are identity maps) on the resulted maps is equivalent to imposing a low-rank constraint upon this stacked matrix [31], we formulate the joint alignment problem as a semidefinte program (SDP) and solve it via a convex optimization to obtain a semidefinte and low-rank stacked matrix.
The proposed approach is effective in dealing with the above challenges in MSI image registration due to several of its advantages over the existing image alignment techniques. First, pairwise image matching and joint refinement of these matches are solved all at once, resulting in a near-optimal solution. Second, expert-specified feature-point correspondences can be seamlessly incorporated into the framework without any change of the optimization technique, which allows human expert to use his knowledge to guide matching in difficult cases. Third, our approach is robust to not only the corruptions in point matches but also the incompleteness of pairwise maps. Experiments of this paper are carried out by using a database consisting of 28 MSI image sequences of human eye and the results not only demonstrate our approach's superior performances to state-of-the-art but also confirm its practical applicability.

Problem formulation
Given a collection of MSI images {I 1 , I 2 , · · · , I N }, we extract m i feature points from image I i and then obtain N feature sets denoted by {S 1 , S 2 , · · · , S N }. For any two feature point sets S i and S j , we define an attributed graph G i j = S i ∪ S j , E i j , C i j where any k ∈ S i and l ∈ S j correspond to an edge e = kl ∈ E i j = S i × S j and are attributed to a cost c ∈ C i j ⊂ R m i ×m j . For S i and S j , we define a partial map φ i j ⊂ S i × S j to encode point-to-point maps between these two sets of feature points, with which each element of S i is paired with at most one element of S j and vice versa. The ultimate goal of this paper is to propose a tractable algorithm for computing optimal pairwise partial maps Φ = φ i j |1 ≤ i, j ≤ N by taking advantage of the aggregated information contained in all attributed graphs G = G i j |1 ≤ i, j ≤ N .

Matrix representation
We encode each partial map φ i j as a binary matrix X i j ∈ {0, 1} m i ×m j in which the elements associated with φ i j take value 1 and the left are 0. When φ i j is a one-to-one map, X i j satisfies the following constraints where 1 represents a vertical vector for which all elements are ones. We leverage an M × M block matrix X ∈ {0, 1} M×M to store the entire collection of m i denoting the total number of features points extracted from the entire collection of MSI images: where the diagonal blocks are identity matrices representing self-maps. Obviously, X i j = X T ji when φ i j and φ ji are one-to-one maps and X in Eq. (2) is hence a symmetric matrix. In addition, when Φ is cycle-consistent (composite maps along cycles, e.g. φ ki • φ jk • φ i j for a 3-circle, are identity maps), X is low-rank and positive semidefinite (denoted by X 0), as detailed in [31]. Similarly, we build a M × M block matrix C by stacking all cost matrices C i j |1 ≤ i, j ≤ N as shown below

Semidefinte programming objective function
Joint MSI image matching can be treated as searching combinatorially maps φ i j |1 ≤ i, j ≤ N by minimizing the total matching cost computed by summing the edge cost of all matched points, which amounts to a minimization of the following objective function with the following constraints X 0 and together with the condition in Eq. (1). In Eq. (4), Tr[] denotes trace of matrix. In difficult cases, expert-specified feature correspondences between MSI images are required to be incorporated into the algorithm for producing accurate results. In this way, human's knowledge can be exploited to guide the matching process. Suppose the manually-provided feature correspondences are associated with a set of elements in X, denoted by their rows and columns: where X(u, v) means the element of X as specified by (u, v).

Optimization via convex relaxation
Eq. (4) with the constraints in Eqs. (1) and (5)-(8) formulates a standard form of semidefinite programming (SDP). A direct optimization of Eq. (4) is NP-hard because the variables contained in X are binary. To deal with this, we relax these variables to take real values in [0 1]. Then, Eq. (4) becomes a convex optimization problem which can offer a global optimum within polynomial time. We employ the alternating direction of multiplier method (ADMM) algorithm [32] which solves a convex optimization problem by breaking it into smaller easier-to-handle pieces.
The solutions of ADMM are continuous values within [0 1] and we round them by using the simple greedy algorithm in [31]. The basic idea is to resolve a binary version of X by minimizing its differences with the continuous version via a linear programming.

Matching cost
Feature points used for matching MSI images are specified as the SIFT [33] key locations due to its prestige of being stable. SIFT detects maxima and minima of the image convoluted with a Gaussian at different image scale followed by a postprocessing of discarding low contrast candidate points and edge response points. A robust cross correlation is used to measure the matching cost between two points [28,30], which is defined as below when the kth point of I i is matched to the lth point of I j : where w ∈ [0, 1] adjusts the contributions of the two terms, I i (k) and I i (k) represent a square patch (e.g. 15 × 15) surrounding point k in I i and its gradient space I i , respectively. In Eq. (9), I j (l) and I j (l) have a similar meaning with I i (k) and I i (k), respectively, but are defined on point l in I j . γ(·, ·) in Eq. (9) is the normalized cross correlation [34]. The robust function ρ() is defined as the Geman-McClure function where a is a parameter which can be determined automatically from the given image and is used for adjusting the stringency in rejecting outliers [35].

Data
We evaluate our algorithm with a validation database comprising 28 MSI image sequences acquired by using an Annidis RHA TM instrument (Annidis Health Systems Corp., Ottawa, Canada). These images are of oculus dexter (OD) and oculus sinister (OS) from 9 patients diagnosed with hypertensive retinopathy and 8 healthy subjects. They are provided in the format of dicom with a bit depth of 16 and in size of 2048 × 2048. Each sequence bears at least 8 images captured with wavelengths of amber, green, infrared, red and yellow, as shown in an exampling sequence in Fig. 1. An ophthalmologist manually pick 15 points for each sequence and marked them in all the MSI images of this sequence by using MRIcron [36]. These manual marks are treated as the ground truth for validating our algorithm.

Evaluation
We evaluate several unique characteristics of our framework, including the power of joint alignment in contrast to the pairwise fashion, its noise resistance, the benefits from the introduced interactive mechanism and its superiority of performances to the state-of-the-art techniques, respectively. In our experiments, we set empirically the only parameter w (in Eq. (9)) of our framework as w = 0.8.

Power of joint alignment
To validate the power of joint alignment in our approach, we exploit the performance curves used in [31] for assessing the matching errors. Specifically, we calculate for a manually-marked point in a MSI image the distance between its matched point in the other image and the ground-truth correspondence. We treat a match as correct if this distance divided by an approximated value (1035 pixels in our experiments) of the radius of retina is below a threshold. Then, we plot a curve (analogous to the precision-recall curve) showing the ratio (a value in [0 1]) of the manually-marked points with correct matches over the threshold values. For points bearing no corresponded point in the other image, interpolation is conducted to estimate the corresponding location. A curve closer to the upper-left corner indicates that the algorithm performs better. In our experiment, we mix the manually specified marks with the detected SIFT key locations and use them by computing their matching cost as defined in Sec. 2.5.
For each image sequence, we choose randomly a portion (with a percentage 0%, 25%, 50%, 75% and 100%, respectively) of the images and process them with our joint alignment scheme. For the matches of each of the left images to other images, we exploit pairwise alignment which can be carried out by setting N = 2 in our framework as explained in Sec. 2. For the percentages of 25%, 50% and 75%, the random portion choosing process is repeated for 10 times and the resulted ratios are averaged. Final results for each test are computed by averaging over correct-correspondence ratios of all image pairs. As shown in Fig. 2, we can see that our algorithm produces improved accuracies when more images are aligned in a joint way. It means that more images should be included in joint alignment when using the proposed approach.

Noise resistance and benefits from manual matches
To validate the noise resistance property of our approach and the benefit of providing manual matches, we test our approach only with the feature points marked manually by the ophthalmologist. We add Gaussian noise with zero mean and various variances (0, 0001, 0.05, 0.1, 0.15 and 0.2, respectively) in the matching costs computed for these points. Also, we randomly choose different percentages (0%, 25%, 50%, 75% and 95%, respectively) of the pairs of manually-matched points and fix them (via Eq. (8)) when resolving the joint alignment by our approach. The randomly choosing process is repeated for 10 times and the ratios of correct matches are averaged as the final results. The threshold of matching distance is set to 0.03 in our experiments.
We have two findings from the results in Fig. 3. First, the algorithm performs better when  more point correspondences are added into our algorithm as a constraint known a priori. For example, the average improvement over all noise levels of the case using 75% of manual correspondences compared with no manual correspondences is 26.2% for all images. We also find that this improvement is even larger (with a value of 37.4%) for the images from patients. They demonstrate the benefit of manual correspondences in improving matching performances especially for difficult images (e.g. images from patients). Second, our algorithm deteriorates with corruptions added in the matching cost. However, we find that more than half of the features points are still matched correctly even when heavy noise (with a Gaussian variance of 0.2) is added and no manually-set correspondences are provided.

Comparisons with state-of-the-art techniques
We run our approach in both a pairwise and joint alignment fashion and compare the performance with two state-of-the-art techniques in sense of matching sequential MSI images in a fullyautomatic mode, including the quadratic programming based group registration in [30] and the pairwise robust matching in [28]. In more detail, the automatically-detected SIFT key locations are used for matching the images and the manually-specified marks are employed for validating the matching accuracy. We estimate the corresponding location in one image of a manual key location in the other image by an interpolation process which is carried out using the matched SIFT key locations. Correct-match ratios are then computed over all manual marks (with a matching distance threshold of 0.03 as explained above).
From the values of correct-match ratio on the healthy subjects and patients in Table 1 and from the displays of matching results in Fig. 4, we have at least two findings. First, joint alignment outperforms pairwise matching and at the same time the joint alignment function of the proposed SDP based scheme is superior to the one of the quadratic programming (QP) [30]. As a major reason, the proposed approach exploits a binary matrix representation (as explained in Sec. 2.2) to resolve all point correspondences jointly and near-optimally in contrast to the temporal smoothness constraint applied to the resulted matches by QP [30]. Second, more accurate matches are obtained on healthy subjects than patients for all algorithms probably because of the affects of image deterioration caused by retinal degeneration.  Table 1. Ratios of correct matches (with a distance threshold 0.03) of our approach in a pairwise way ("SDP-Pairwise"), our approach of joint alignment ("SDP-Joint"), the quadratic programming based group registration in [30] ("QP-Group") and the pairwise robust matching in [28] ("Robust"), performed on the healthy subjects and patients, respectively.
We use a Dell PC with 2.39 GHz Intel Core 2 CPU and the optimization process can be finished within 120 seconds when matching simultaneously 10 MSI images for each of which about 420 landmark points are detected. In our experiments, the stoppage criterion of ADMM is set as the condition that the Frobenius norm of the difference between X's estimations in neighboring iterations exceeds 10 −4 , which costs about 350-500 iterations for a typical MSI sequence.

Discussion and future work
Misalignment of sequential images from multispectral imaging (MSI) needs to be eliminated in order for manual/automatic assessment of them in practical application of MSI. To accomplish this, image alignment (registration/matching) plays an important role by providing spatial Fig. 4. Comparisons of matched feature points on the images in Fig. 1 by our algorithm in a pairwise-matching fashion (upper-left), our algorithm in a joint alignment way (upper-right), the technique in [30] (lower-left) and the one in [28] (lower-right). From top to down of each panel, a local area of the "Amber" (MSI=590), "Green" (MSI=550) and "Red" (MSI=620) images is shown, respectively. correspondences between images. A naive approach is to compute pairwise matching across all image pairs in isolation, i.e. computing feature point correspondences between a pair of images independently from the operations on other image pairs. Many off-the-shelf algorithms can accomplish this with minor adaptions to MSI images. However, a joint alignment is capable of generating better pairwise matches by resolving all pairwise matches simultaneously, as shown by our previous work which is carried out by enforcing spatial smoothness on transformations fields between temporally neighboring image pairs [30].
In this paper, we have presented a novel approach for joint alignment of sequential MSI images by formulating the problem as a convex optimization framework. We encode pairwise image matching map via a binary matrix and store the binary matrices of all image pairs in a stacked matrix. As shown by recent work in [31], enforcing consistency on the resulted pairwise matching maps is equivalent to imposing low-rankness upon this stacked matrix. By leveraging this important finding, we express joint alignment of sequential images as a semidefinite programming (SDP) and solve a low-rank semidefinite matrix via convex optimization together with two additional processes of relaxation and rounding. This new approach recovers globally consistent feature matches by taking advantage of the aggregated information from the point-to-point matching cost. The power of our approach in contrast to state-of-the-art (e.g. our previous work in [30]) can be reflected by its four characteristics. First, pairwise matching and joint refinement of pairwise matches are solved all at once by minimizing the total feature point matching cost and at the same time exploiting the mutual relations between pairwise maps. Second, expert-specified feature point correspondences can be easily incorporated in our framework as additional equal constraints of the SDP. This allows an interactive aligning function while keeping the complete optimization process unchanged. Third, our approach performs better in resisting both the corruptions in pairwise matches and the incompleteness of pairwise maps. Finally, our main part of our algorithm is parameter free (the only parameter in our framework is w in Eq. (9)) and computationally feasible.
We have validated the proposed approach with a database of 28 sequences of MSI images of human eyes with and without retinal degenerations. Experimental results show our approach's advantages over pairwise alignments and the state-of-the-art techniques in sense of accuracy and robustness to both map's corruptions and incompleteness.
Our approach can run in both a fully-automatic way and an interactive fashion, which is invaluable in ophthalmologist's clinical practice. Fully-automatic algorithm is extremely useful in ophthalmologist's clinical practice because it can reduce labor cost and generate objective results. However, its performance may deteriorate significantly in practice especially for various difficult cases where the automatically-detected feature points become unstable. In this situation, manual intervention becomes necessary and our approach provides an easy solution for that.
In addition to MSI image alignment, our joint alignment framework can find applications in other domains, e.g. aligning MSI images used in other fields such as dentistry, astronomy and dermatology (as explained in Sec. 1), or conducting object recognition, segmentation [37], image retrieval and structure from motion [38].
We have several works to address in future. First, in this paper, we assume that each feature point of an image with fewer points detected can be matched to one point in the other image. However, it doesn't always hold in practice especially when certain contents in one image are missing in the other. We would deal with this in our future work. Second, we expect X in our objective function to be of low rank, however, we don't include an explicit low rank constraint in the optimization process. Although it is proved that the enforcement of the positive-semidefinite property on X (by Eq. (6)) is equivalent to a constraint of low rank [31], we think it is very interesting to investigate the performance of an explicit constraint. Third, we will also validate our approach by using other format of images, e.g. RGB vs near-infrared images, flash vs no-flash images and depth vs color images. Finally, we would like to try other matching cost measurements [39][40][41] or optimization