Groupwise registration of sequential images from multispectral imaging (MSI) of the retina and choroid

: Multispectral Imaging (MSI) produces a sequence of discrete spectral slices that penetrate di ﬀ erent light-absorbing species or chromophores and is a noninvasive technology useful for the early detection of various retinal, optic nerve and choroidal diseases. However, eye movement during the image acquisition process may introduce spatial misalignment between MSI images. This potentially causes trouble in the manual / automatic interpretation of MSI, but still remains an unresolved problem to this date. To deal with this MSI misalignment problem, we present a method on the groupwise registration of sequential images from MSI of the retina and choroid. The advantage of our algorithm is at least threefold: 1) simultaneous estimation of landmark correspondences and a parametric motion model via quadratic programming, 2) enforcement of temporal smoothness on the estimated motion, and 3) inclusion of a robust matching cost function. As validated in our experiments with a database of 22 MSI sequences, our algorithm outperforms two state-of-the-art registration techniques proposed originally in other domains. Our algorithm is potentially invaluable in ophthalmologists’ clinical practice regarding various eye diseases.


Introduction
Multispectral Imaging (MSI) captures image data at specific frequencies across the electromagnetic spectrum and is a non-contact, optical technology used in various applications, including but not limited to remote sensing, aerial survey, astronomy, dentistry, dermatology and histopathology [1][2][3]. An emerging and advanced application of MSI is its use to dissect and visualize the inner limiting membrane all the way to the choroid of the ocular by producing a series of spectral images, e.g., the Retinal Health Assessment (RHA, Annidis Health System Corp.) [4][5][6]. MSI allows the clinician to differentiate and follow a wide variety of complex conditions and disorders of the eye [4][5][6][7]. Compared with conventional retinal fundus photography, which is limited to parameters of the visible spectrum, MSI is more sensitive in detecting features that are reflected in longer range of the spectrum, e.g., deep retinal layers and the choroid. With the ability to complement the clinician's knowledge of retinal anatomy and alterations in physiology, MSI provides a novel way to examine the retina [4] and may help to diagnose retinal conditions earlier than conventional fundoscopy [5,8].
MSI of the ocular employs an extensive range of discrete monochromatic LED-sourced wavelengths (ranging from 550nm to 950nm), with each one capable of creating a discrete spectral slice by penetrating/fluorescing different light-absorbing species or chromophores throughout the retina and choroid, as shown in Fig. 1. Each spectral slice represents a cumulative en face spectral view of the fundus and the clinicians' diagnosis is commonly carried out by examining multiple slices. For example, early changes of retinal pigment epithelium (RPE) of RPE disruption are observed as normal structures in short wavelengths and a slight structural variation in pigment pattern in the red (620nm, 660nm, 690nm and 740nm) and infrared (760nm, 780nm, 810nm and 850nm) spectral slices [9]. Manual assessment of the variation of retinal/choroidal structures and functions remains the reference standard for MSI based diagnosis. However, automatic and quantitative analysis of MSI images is becoming more and more demanded because it can help to deal with the problems of the manual works, e.g., being tedious, labor-intensive, subjective and error-prone.
One main drawback of most MSI systems lies in the possible significant movement between frames in the spectral slice stack (as shown in Fig. 2), which is caused by the fact that the acquisition time of a complete set of spectral slices is often longer than the natural timescale of eye's saccadic movement [10]. Although there exist a lot of efforts to increase the image acquisition speed, current MSI technologies (e.g., the RHA) still require seconds for taking a complete set of spectral slices. The movements between spectral slices might introduce troubles into manual/automatic interpretation and evaluation of MSI. Theoretically speaking, the movements can be handled by registering the images, however, this is not a trivial task in practice due to three major hardships. First of all, the anatomical and pathological retinal features may appear significantly different between different spectral slices. This results from the fact that different wavelengths are used for imaging different anatomic components of the retina and choroid. For example, shorter wavelengths are used to reflect anterior retinal features while longer ones are employed for deeper retinal features and the choroid. Besides this, different tissues may have different light absorbtion and reflection properties. For example, blood vessels appear as having more well-defined features in spectral slices corresponding to shorter wavelengths than longer wavelengths. The second hardship in registering MSI spectral slices is attributed to the illumination inhomogeneity caused by the curvature of the retina. As a result, the incident light intensity at a given spatial location may change significantly between images because of eye movement, introducing difficulties in interpreting the image data [11]. The third hardship lies in the requirement of registering a sequence of MSI images instead of the common registration tasks for which a pair of images are concerned. Moreover, MSI image sequence is different from a generic image group investigated by most groupwise image registration algorithms in the fact that eye motion changes smoothly during MSI imaging process. There has been very little work on groupwise registration of MSI sequential images. Moreover, an examination of groupwise registration techniques designed for other images (e.g. MRI brain images [12] or aerial images [13] as to be summarized in Sec. 2) shows that they can't be readily applied to resolving the problem of groupwise MSI registration. State-of-the-art techniques of groupwise image registration are mostly proposed for aligning intra-modality images. For these algorithms, image acquisition properties are consistent across different images, being different from MSI images that are acquired using different LED-source wavelengths in order to penetrate different species/chromophores as mentioned above. There exist techniques for jointly registering inter-modality images, however, most of them assume that motions obtained from different pairs of images are irrelevant. This simple assumption prevents these techniques from dealing with the smooth change of eye motion captured during MSI imaging process.
The goal of this work is to provide a tool for achieving a groupwise registration of sequential MSI images of the retina and choroid. We present a novel landmark matching approach which is validated to be capable of dealing with the challenges in registering MSI images (as shown in Fig. 2). Our method is superior to state-of-the-art groupwise registration algorithms due to several major novelties of significance. First, our approach measures similarity between landmark points by using a robust cross correlation function. Second, it establishes dense correspondences between MSI images by simultaneously building connections between landmark points across different images and estimating a predefined motion model. Third, a constraint of temporal smoothness is enforced when estimating the motions across the image sequence, which complies better with the circumstances of retinal/choroidal MSI imaging in practice. Finally, our optimization is accomplished by resolving all unknowns using a combinatorial optimization technique -quadratic programming [14].
The remainder of this article is structured as follows. In Sec. 2, we present a review of the works related to groupwise image registration. In Sec. 3, we describe the details of our approach. Experimental results are provided in Sec. 4 and we conclude in Sec. 5.  Fig. 1, respectively. Bottom row, left to right: patches in the images created by overlaying MSI-550 and MSI-850 before and after applying our algorithm, respectively. Misalignment is eliminated in the right image in contrast to the left one.

Related Work
Image registration has been extensively involved in a wide variety of applications in fields of medical image analysis and computer vision. The corpus of relevant previous work is pretty rich and varied, which can be basically categorized as pixel/voxel-wise based techniques, landmark/feature based methods or their combinations [15][16][17]. They often bear a high degree of domain specificity and few papers can be found specifically for MSI applications. Most existing image registration techniques register images in a pair-wise manner, either aligning an image to another, or an image to a map. Groupwise registration operates on image sets instead, i.e. aligning a group of images simultaneously. Detailed review to all pairwise image registration techniques is beyond the scope of this paper and in this paper we only provide a brief review to methods of groupwise image registration.
Groupwise registration is an important tool in fields of both medical image analysis and computer vision. It can be used for establishing anatomical correspondences among subjects in a population in order for characterizing anatomical shape differences within/across populations [18,19]. It can also achieve building connections between similar points/regions across different images in order for representation, 3D modeling, synthesis, morphing, browsing of for example human faces [20,21].
Most of the existing groupwise registration techniques focus on aligning a collection of intramodality images. The well-known congealing framework and its extensions [12,22,23] aim to align a large number of binary images from a database of handwritten digits, 2D face images or 3D medical images by summing up the entropy of pixel/voxel stacks over the image. Congealing warps each image via a parametric transformation and takes advantage of sequential optimization to gradually lower the entropy of the intensity distribution of the image. It identifies one image as a template and align all other images to it with a pair-wise approach. However, it leads to an undesired introduction of bias with respect to the a priori chosen template. The point set based approach proposed in [24] attempts to create correspondences between groups of point sets by leveraging mean shape and its transformations to all training shapes. However, it requires the shape of objects of interests to be known in advance. The methods in [25,26] exploit sparse learning and focus on face alignment. The techniques in [19,27] assume a Markov property of the image and accumulate pairwise image matching in order for achieving a groupwise registration.
Groupwise registration of inter-modality images has recently attracted intensive research interests and various methods were proposed. For example, the method proposed in [13] focuses on aerial images and formulates group registration as a function of pairwise registration, in which image similarity is characterized as normalized cross-correlation of high-pass filtered images. The technique in [28] is designed for aligning multi-modal/multi-spectral natural images by using a robust matching cost function and a gradient descent optimization technique.
Our approach (to be detailed in Sec. 3) is distinguished from existing groupwise registration algorithms due to several of its characteristics, including a simultaneous estimation of landmark correspondences and motion model, an inclusion of temporal smoothness on the estimated motion, a robust matching cost function and an optimization with quadratic programming. These enable our approach to produce extremely accurate estimation of motion between MSI images.

Our approach
We believe that a successful technique for achieving an accurate group registration of sequential MSI images depends on several key factors: robustness to appearance variations between MSI images, superior optimization for obtaining more accurate estimations and ability to enforce temporal smoothness on the estimated motions. To target these, we developed a landmark matching based groupwise image registration algorithm. It exploits a robust matching cost function and a motion model designed specifically for eye motion. Moreover, it simultaneously optimize landmark correspondences and motion model parameters by leveraging a quadratic programming technique. Besides these, it achieves groupwise image registration by combining a series of pairwise image registrations, for which motion differences between temporally neighboring pairs are penalized.
In the rest of this section, we start with a description of our overall formulation, followed with an elaboration on the landmark detection, robust matching cost, motion model and optimization strategy, respectively.

Problem formulation
Given a collection of MSI images {I 1 , I 2 , · · · , I N }, we aim to find spatial correspondences jointly over the entire image set. We carry out the task by establishing connections between landmark points across pair of images {(i, j), 1 ≤ i ≤ N, 1 ≤ j ≤ N } while enforcing a parametric model (denoted by its parameters Θ i , j ) which represents the motion between I i and I j . Our groupwise registration algorithm is then formulated as an optimization problem which is cast as minimizing the total matching cost while obliging a parametric motion model In Eq. (1), C i , j and E i , j denote the cost matrix and the correspondence matrix for matching landmark points of I i to the ones of I j , respectively. For C i , j and E i , j , rows correspond to the landmark points of I i while columns are associated with the ones of I j . E i , j is binary, and for which, each row contains exactly one element, meaning that I i 's point specified by the row number of E i , j matches I j 's point specified by the column number of E i , j . These can be stated alternatively as the following two constraints: where N i and N j denote the number of landmark points of I i and I j , respectively, and 1 is a vertical vector of ones. In Eq. (1), tr () means the trace of a matrix and tr (C T i , j E i , j ) amounts to calculate the total cost when matching I i to I j .
In Eq. (2), the spatial transformation from I i to I j is expressed as a motion model. X j is a N j × 2 matrix formulated by the coordinates of all N j landmark points of I j , with each row composed by the coordinates [x, y] of the corresponding landmark point. χ i is a matrix created using coordinates of the landmark points of I i , for which each row corresponds to a landmark point and consists of components built with exponentiation and multiplication of this point's coordinate x and y. Θ i , j is a matrix containing parameters of the motion model, containing two columns for x and y coordinates, respectively. The specific expressions of all components in χ i , the number of columns of χ i and size of Θ i , j are determined by the motion model (to be described in Sec. 3.3) we will choose.
Unknowns in our groupwise registration algorithm as defined in Eqs.
In order to resolve these unknowns, we need to specify the landmark detection technique, the matching cost, an appropriate motion model for MSI of the retina and choroid and an optimization technique for computing the unknown parameters, which will be detailed in the follows.

Landmark detection and matching cost
The landmark points used for matching MSI images in our groupwise registration algorithm are specified as the key locations of SIFT [29] due to their prestigious benefit of being stable.
These key locations are defined as maxima and minima of the result of difference of Gaussian function applied to different image scales, for which low contrast candidate points and edge response points are discarded in a postprocessing.
Determining the matching cost between landmark points in different images is vital in establishing correspondences but not a trivial due to the inconsistency of intensity, gradient or even structure across MSI images. Measures being widely used in state-of-the-art multi-modal image registration techniques include mutual information [30], cross correlation of image gradient [31], cross correlation of Laplacian energy map [32] and normalized cross correlation of image color and image gradient [28]. Inspired by the work in [28], we employ a robust cross correlation. Taking the kth point of I i being matched to the lth point of I j as an example, our matching cost is defined as where I i (k) and I i (k) means the patch (15 × 15 window in our experiments) centered at landmark point k in I i and its gradient space I i , respectively, and I j (l) and I j (l) have similar definitions. γ(·, ·) denotes the normalized cross correlation [33]. ρ() represents a robust function and we define it as the Geman-McClure function where a is a parameter in charge of adjusting the stringency of rejecting outliers and can be determined by the given image as explained in [34]. In Eq. (1), w ∈ [0, 1] adjusts the contributions of the two terms defined respectively on image intensity and image gradient and its value can be empirically specified (e.g. w = 0.5 as used in all our experiments).

Motion model
We choose the 12-parameter inter-image transformation model in [35] as our parametric motion model for representing the spatial relation between MSI images. It was derived by modeling the retina as a rigid quadratic surface and has been validated to outperform other popular motion models (e.g. rigid and affine) for retinal image analysis. With this model, the kth row of χ i in Eq.
(2) corresponds to the kth landmark point of I i (with coordinate [x ik y ik ]) and is written as a horizontal vector where χ i ( j, :) means the jth row of χ i . With the above definition of χ i , Θ i , j in Eq. (2) is a 6 × 2 matrix for which the two columns correspond to motion parameters regarding X j 's coordinates x and y, respectively.

Optimization
In order to resolve the unknowns {E i , j } and {Θ i , j } in our groupwise registration scheme, We need to optimize Eq. (1) based on the constraints in Eqs.
(2)-(4). However, this optimization problem is very difficult because of at least three challenges. The first one lies in the requirement of binary value on {E i , j }, which makes the optimization problem to be computationally intractable. The second one results from how to enforce motion's temporal smoothness. The final one is attributed to the simultaneous optimization of {E i , j } and {Θ i , j }. We will below provide our solutions to these challenges.

Penalizing to-centroid spatial deviation
As shown in our previous work [36,37], the hardship caused by the binary matrix {E i , j } can be circumvented with a minimization of the so called "to-centroid spatial deviation" if we relax the requirement in a fashion that the matrix's elements can take a continuous value within [0 1]. This relaxation can be mathematically expressed as the combination of Eq. (4) with the following equation This relaxation allows more than one landmark points in I j matched to each of the point in I i and is equivalent to the widely used "soft-assignment" strategy [38]. However, it may cause ambiguities in the matching process because of the non-sparse correspondence matrix it may produce. To deal with this, we combine the strategy of minimizing the to-centroid spatial deviation which measures the spatial concentration of the matched landmark points. We firs defineX for which each row ofX i , j represents the coordinates of the centroid of the matched points in I j to the corresponding point in I i . The to-centroid spatial deviation matrix is then written as where the size of D i , j is N j × N i guaranteed by the length of 1. In Eq. (10),X i , j (:, 1) and X i , j (:, 2) represent the first and second column ofX i , j , respectively. X j (:, 1) and X j (:, 2) are defined similarly.
Penalizing the to-centroid spatial deviation when aligning I j to I i amounts to minimizing the summation over all deviations, which is expressed as Regarding our groupwise registration task, it is written as Eq. (1) in our optimization problem becomes where λ balances the contributions of the two terms.
For the proposed optimization scheme as shown in Eqs. (13), (8), (4) and (2), image alignment is carried out within each pair of MSI images, and in this process, different pairs are processed independently.

Enforcing temporal smoothness on motion
As described in Sec. 1, MSI generates a set of spectral slices sequentially within a limited time period of image acquisition. Therefore, eye motion changes smoothly during the process of image acquisition. In other words, we should assume that eye motions are similar between temporally neighboring slices. This is in contrast to the unpredictable relations between motions in generic groupwise image registration problems, e.g. brain image registration [12] for which images are maybe acquired at different patient visits.
To enforce this temporal consistency of motion, we penalize Θs of the temporally neighboring pairs and obtain the following minimization function

Formulating optimization as quadratic programming
As we discovered, the minimization in Eq. (14) constrained by the functions in Eqs. (8), (4) and (2) can be resolved via a quadratic programming (QP) [14]. We first vectorize the matrices E, C and D and denote the resulted vertical vectors as e, c, and d, respectively. We also write θ = Θ(:, 1) Θ(:, 2) . We then reformulate Eq. (4) as where P e i , j is a binary matrix in size of Similarly, we can take advantage of X j (:, 2) to form Q 2 i , j and we have We also re-write Eq. (9) by replacing X j in Eq. (2) withX i , j as Combining Eqs. (17) and (18), we have Our optimization problem is finally expressed as the following QP For Eq. (20), and −e i , j ≤ 0.
In this quadratic programming formulation, e i , j and θ i , j are unknowns which are solved simultaneously.
2. Updating the to-centroid deviation matrix using Eq. (10) with current estimation of the correspondence matrix.
5. Go to step 2 until the stopping criterion is satisfied.
Our algorithm is unique in group-wisely registering sequential MSI images because of at least six of its properties. First, correspondences of landmark points and the pre-defined transformation model are estimated simultaneously in step 1 and step 4. Second, the transformation model can help to guide the estimation of correspondences and resist noise/outliers better. Third, the motions estimated by our algorithm are guaranteed to change smoothly. Fourth, gradual increase of λ in step 4 approaches a binary correspondence matrix with a less ambiguity in the assignment. Fifth, step 1 can offer a very nice initialization without any manual intervention. Finally, step 1 and step 4 can both result in an optimal solution based on the QP technique.
In our groupwise registration scheme, the entire set of images are assessed via a series of pairwise registrations which are connected by the constraints on motion's temporal smoothness. With this strategy, information from pair-wise registration is propagated globally across the entire set of images being registered, resulting in an integration of available information in a meaningful manner which leads to a globally good solution by virtue of local constraints [21].

Data
Our data set for validating the proposed algorithm comprises 22 sequences of MSI images acquired by using an Annidis RHA TM instrument (Annidis Health Systems Corp., Ottawa, Canada). These images were of oculus dexter (OD) and oculus sinister (OS) from 8 healthy subjects and 3 patients which were diagnosed with hypertensive retinopathy. These images are grayscale in a bit depth of 16, provided in format of dicom. All images are in size of 2048 × 2048. Each sequence has at least 8 images captured with wavelengths of amber, green, infrared, red and yellow. An ophthalmologist manually picked 15 points for each sequence and marked them in all the MSI images of the sequence by using MRIcron [39]. These manual marks are treated as the ground truth for validating our algorithm and their locations were compared with the ones transformed by using the estimated motion model in our experiments.

Implementation details
We stopped our algorithm after the steps 2-5 of our algorithm are repeated for 3 times, and from our experiments, we found that an accurate estimation can be resulted for most cases. We performed a set of experiments to empirically choose an optimal value for the parameter τ in Eq. (20). As a postprocessing, each floating-landmark is finally assigned to the referencelandmark with the largest correspondence likelihood value. We used the the open-source Computational Geometry Algorithms Library (CGAL) (downloaded from http://www.cgal.org/) for solving the QPs. We used a Dell PC with 2.39 GHz Intel Core 2 CPU and found that the optimization process can be finished within 19 seconds to group-wisely match 10 MSI images, for each of which about 200 landmark points are detected.

Evaluation
In our experiments, for each of the manually marked points and each pair of MSI images in a sequence, we computed points' distance (in pixels) to their location transformed by the estimated motion model. The final error measurement was treated as the the averaged distance value over all manually marked points and all image pairs.
We first ran our algorithm on our data set by taking advantage of various cost functions including sum of squared difference (SSD), mutual information (MI) and our proposed robust cross correlation (RCC). At the same time, we changed τ's value in Eq. (20). From the results shown in Fig. 3, we can see that our cost function outperforms SSD and MI obviously. Moreover, our algorithm performs best when τ = 0.2, which will be used in our following experiments. Comparing the best resolutions at τ = 0.2 with the ones at τ = 0, we can also see that the enforcement of temporal smoothness on the estimated motions enables our algorithm to perform better than without this constraint.
We then evaluated our algorithm by comparing its performance with state-of-the-art image registration algorithms. We chose the EM like technique of Chui & Rangarajan [40], which performs as repeating the estimations of correspondences and motion model in contrast to our simultaneous estimation strategy. In order for comparison, we only employed the EM strategy for optimization and kept other components consistent with our algorithm, e.g. the landmark points, the matching cost and the motion model. We also chose the groupwise registration approach of Wells et al. [12] by incorporating the mutual information measurement [41] as the matching cost function. This technique resolves a pixel-wise registration by estimating an affine motion model followed by a free-form B-spline deformation model based on a gradient descent optimization strategy. For the B-spline, we set the distance between control points as 30 pixels. We compared the errors obtained from the healthy subjects, the patients and the complete data set, respectively.  From the results in Table 1, we can see that our algorithm generated accuracies superior significantly to the algorithms of Chui & Rangarajan and Wells et al. From this, we have validated several of our claims mentioned before. First, our strategy of simultaneous estimation on correspondence and motion model via quadratic programming is better than the EM technique which iteratively estimates these two unknowns. Second, our algorithm which is designed specifically for registration of MSI images outperforms generic image registration approaches (e.g the wellknown method of Well et al.) possibly because of the integration of our motion model, cost function, temporal smoothness constraint on motion and global optimization technique.
From Fig. 4 and Fig. 5, similar findings as mentioned above can be concluded as well. Moreover, we can see from

Conclusion and future work
In this paper, we provide a solution to an unresolved but important problem of aligning a sequence of images from multispectral imaging (MSI) of the retina and choroid in an automatic and joint fashion. We have four significant technical contributions . First, we leverage the quadratic programming (QP) to simultaneously estimate landmark correspondences and a predefined motion model, which is validated to help produce better accuracies due to the optimal optimization of QP and the mutual benefits of correspondences and motion model during the optimization process. Second, we enforce a temporal smoothness on the estimated motion,  [40], Wells et al. [12] and ours, respectively. resulting in a better consistency with the practical MSI acquisition. Third, our groupwise registration algorithm is able to achieve an optimization globally over the entire image set, which is accomplished by propagating information obtained from local pairwise image registration. Finally, the confounding variations caused by anatomical/pathological differences or illumination inhomogeneity in MSI images are handled by matching key points only and defining a robust matching cost function. Our algorithm is potentially invaluable in clinical practices of ophthalmologists and being able to extend to other multi-model/multi-spectral images.
Our future work would include not only further validations with more MSI images but also various clinical studies for examining various eye diseases by using our algoritm. In addition, we are interested in research on detection of landmark points specifically for retina in MSI images in order for better robustness and accuracy.   [40], Wells et al. [12] and ours, respectively.