A Multiobjective Approach to Homography Estimation

In several machine vision problems, a relevant issue is the estimation of homographies between two different perspectives whose point correspondences contain a large proportion of outliers. A common method for this estimation is random sample consensus (RANSAC), whose goal is to maximize the number of matching points given a permissible error (Pe) according to a candidate model. However, these objectives are in conflict: a low Pe value increases the accuracy of the model but degrades its generalization ability, that is, the number of matching points that tolerate noisy data, whereas a high Pe value improves the noise tolerance of the model but drives the process toward false detections. This work treats the estimation process as a multiobjective optimization problem that maximizes the number of matching points while simultaneously minimizing Pe. To solve the multiobjective formulation, two evolutionary algorithms have been explored: the Nondominated Sorting Genetic Algorithm II (NSGA-II) and the Nondominated Sorting Differential Evolution (NSDE). Results using established quality measures between original and transformed images over a well-known image benchmark show that the proposal outperforms the RANSAC algorithm.


Introduction
A homography is a transformation that maps points of interest between image planes, accounting for movements such as translation, rotation, skewing, scaling, and projection, all of them contained in a single invertible matrix. In general terms, these displacements can be classified into three cases: (1) an object moving in front of a static camera, (2) a static scene captured by a moving camera, and (3) multiple cameras viewing the scene from different viewpoints. In any of these cases, homographies simplify the use of image sequences to construct panoramic views [1][2][3], to increase the resolution of low-quality imagery [4][5][6], to remove camera motion when studying the movement of an object in a video [7], and to control the position of robots [8][9][10], among other uses [11][12][13].
In a modeling problem based on a set of experimental data, there are two types of data: those that fit a model with a certain probability (known as inliers) and those that are not related to the model (i.e., outliers). Several algorithms specialize in solving this classification problem; one such technique is Random Sample Consensus (RANSAC) [14].
In this algorithm, minimal subsets of the experimental data are taken at random; from each subset a model is proposed and evaluated according to a permissible error (Pe) to determine how well the model fits the data [15]. This process is repeated for a predefined number of iterations, and the model with the maximum number of inliers is retained.
Although RANSAC is a robust and simple algorithm, it has some drawbacks [16][17][18]; two of them stem from the strong dependency between the number of matching points (model quality) and the permissible error. In this work, these disadvantages are regarded as a multiobjective optimization problem. On the one hand, due to the random nature of RANSAC, increasing the number of inliers requires more iterations in order to discard data that do not fit the proposed model. On the other hand, the number of matching points depends, in a conflicting way, on the permissible error (Pe). A low Pe value increases the accuracy of the model but degrades its generalization ability to tolerate noisy data (number of matching points). By contrast, a high Pe value improves the noise tolerance of the model but drives the process toward false detections. The main source of error in the model estimation procedure arises from setting the Pe value without considering the relationship between the dataset and the model.
Computational Intelligence and Neuroscience
To make the RANSAC algorithm more efficient, several improvements have been suggested. For instance, the MLESAC algorithm [19] assumes that the inliers in the images follow a Gaussian distribution whereas the outliers are uniformly distributed; accordingly, hypotheses are scored by maximizing the likelihood instead of by the inlier-counting vote of the original RANSAC. The SIMFIT method [20] predicts the permissible error by iteratively re-estimating its value until the model fits the experimental data. Other variants of the original RANSAC are the projection-pursuit method, the Two-Step Scale Estimator, and CC-RANSAC [15,21,22]; all of them focus on maximizing the number of inliers by performing more searches over the data, which makes the complete process more computationally expensive. In this sense, an algorithm that attempts to reduce the computational cost is the one proposed in [17], where the number of inliers is maximized by a metaheuristic technique called Harmony Search.
Despite these improvements, the search strategy used in the cited works (with the exception of [17]) remains close to a random walk, and therefore these approaches are computationally expensive. Moreover, in all cases only one objective function is considered, usually related to the number of matching points, while the permissible error is neglected. Accordingly, and in order to overcome the typical RANSAC problems, we propose to regard the RANSAC operation as a multiobjective problem solved by an evolutionary algorithm. From this point of view, at each iteration new candidate solutions are built by evolutionary operators that take into account the quality of the previously generated models, rather than purely at random, which significantly reduces the number of iterations. Likewise, new objective functions can be added to incorporate other elements that allow an accurate evaluation of the quality of a candidate model.
When an optimization problem involves more than one objective function, the task of finding one or more optimal solutions is known as multiobjective optimization (MO) [23]. Under MO, different solutions produce conflicting scenarios among the objectives [24]. Unlike single-objective optimization, in MO it is usually difficult to find one optimal solution. Instead, algorithms for multiobjective problems try to find a family of points known as the Pareto optimal set [25]. These points satisfy that no other feasible solution strictly improves one component of the objective function vector without worsening at least one of the remaining ones. Evolutionary algorithms (EAs) are considered the most adequate methods for solving complex MO problems, mainly because they are often capable of maintaining good diversity [26], can be extended to multiple populations [27], and can handle a variety of problems, such as discrete ones [28]. Several variants of nondominated sorting, as well as new methods, have been proposed in recent years to solve problems such as feature selection [29] and community detection [30], among others [24,31]; however, the Nondominated Sorting Genetic Algorithm II (NSGA-II) [32] and the Nondominated Sorting Differential Evolution (NSDE) [31] remain among the most representative.
In this paper, the estimation process is considered as a multiobjective problem where the number of matching points and the permissible error (Pe) are simultaneously optimized. In order to solve the multiobjective formulation, two different evolutionary algorithms have been explored: the Nondominated Sorting Genetic Algorithm II (NSGA-II) and the Nondominated Sorting Differential Evolution (NSDE).
Results using established quality measures between original and transformed images over a well-known image benchmark show that the proposal outperforms the RANSAC algorithm on the problem being assessed, giving good results even with high outlier levels.
The remainder of the paper is organized as follows: Section 2 explains the problem of image homography considering multiple views. Section 3 introduces the fundamentals of the RANSAC method. Section 4 briefly explains the evolutionary approaches that are used in this paper in order to solve the multiobjective problem while Section 5 presents the proposed method. Section 6 exhibits the experimental set and its performance. Finally, Section 7 establishes some final conclusions.

Homography between Images
When two images of the same scene are taken from different perspectives, one problem consists in finding a transformation that matches the pixels belonging to both images. This is known as the image matching problem. The geometric transformation is sought by using corresponding points from image pairs [33,34], which make it possible to form feature vectors, also called image descriptors. Although such descriptors are not completely reliable and can produce erroneous matches, in this paper they are used to find the geometric relation between images through a homography, as explained in the following paragraphs.
Consider a set of points such that $\mathbf{x} = (x, y, 1)$ and $\mathbf{x}' = (x', y', 1)$ are corresponding positions in a given image pair. Through a plane, a homography $\mathbf{H}$ establishes a geometric relation between two images taken from different perspectives, as shown in Figure 1; this allows points on the plane to be projected onto the pair of images through $\mathbf{x}' = \mathbf{H}\mathbf{x}$ or $\mathbf{x} = \mathbf{H}^{-1}\mathbf{x}'$. To find the homography between an image pair, only a set of four point matches is required to build a linear system, which must then be solved [35]. To evaluate the quality of a candidate homography, the distance between the point positions of the first image and those of the second image must be calculated; this distance is called the Mismatch Error and is defined by

$$e^2 = e_1^2 + e_2^2, \qquad (1)$$

where $e_1 = d(\mathbf{x}, \mathbf{H}^{-1}\mathbf{x}')$ and $e_2 = d(\mathbf{x}', \mathbf{H}\mathbf{x})$ are the respective errors measured in each image.

Figure 1: Homography from a plane between two views.

Pseudocode 1
(1) for $i = 1$ through the maximum number of iterations
    (2) Randomly select a subsample $S_i \subset G \subseteq U$, and assemble a sample $S_i$
    (3) According to $S_i$, compose the candidate hypothesis $h_i$
    (4) Calculate the degree of agreement $D(U, h_i)$
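The four-point construction and the Mismatch Error can be illustrated with a short, dependency-free sketch. It is a minimal version under our own assumptions, not the authors' implementation: the homography is normalized so that its bottom-right entry equals 1, the resulting 8 × 8 linear system is solved by Gaussian elimination, the distance $d(\cdot,\cdot)$ is Euclidean, the two image errors are combined as a sum of squares, and all function names (`homography_from_4`, `project`, `mismatch_error`, and so on) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]


def solve_linear(a: Matrix, b: List[float]) -> List[float]:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate sample (e.g., three collinear points)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def homography_from_4(src: Sequence[Point], dst: Sequence[Point]) -> Matrix:
    """Homography H (with H[2][2] fixed to 1) mapping four points src onto dst."""
    a, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -x * xp, -y * xp])
        b.append(xp)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -x * yp, -y * yp])
        b.append(yp)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def project(h: Matrix, p: Point) -> Point:
    """Map a point with H in homogeneous coordinates and dehomogenize."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def invert3(h: Matrix) -> Matrix:
    """Inverse of a 3x3 matrix via its adjugate."""
    (a, b, c), (d, e, f), (g, k, l) = h
    det = a * (e * l - f * k) - b * (d * l - f * g) + c * (d * k - e * g)
    return [[(e * l - f * k) / det, (c * k - b * l) / det, (b * f - c * e) / det],
            [(f * g - d * l) / det, (a * l - c * g) / det, (c * d - a * f) / det],
            [(d * k - e * g) / det, (b * g - a * k) / det, (a * e - b * d) / det]]


def mismatch_error(h: Matrix, p: Point, pp: Point) -> float:
    """Squared symmetric error e1^2 + e2^2 for the correspondence (p, pp)."""
    e1 = math.dist(p, project(invert3(h), pp))  # error measured in the first image
    e2 = math.dist(pp, project(h, p))           # error measured in the second image
    return e1 ** 2 + e2 ** 2
```

Fixing the bottom-right entry to 1 keeps the system square and avoids an SVD, at the cost of failing for the rare homographies whose true bottom-right entry is zero.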
Consider the example shown in Figure 2, where five correspondences $U = \{(\mathbf{x}_1, \mathbf{x}'_1), \ldots, (\mathbf{x}_5, \mathbf{x}'_5)\}$ are depicted; for the points $(\mathbf{x}_3, \mathbf{x}'_3)$, the error $(e_1, e_2)$ is considerable, and therefore the quality of the candidate homography will be ranked with a low value.

Random Sample Consensus (RANSAC) Algorithm
To find correspondences between images through a geometric transformation (homography), and thus to increase the number of correct matches (inliers), a robust approach such as RANSAC is necessary. In contrast to inliers, outliers are points that are inconsistent with the candidate homography.
The idea behind the algorithm consists in discovering the best hypothesis from a set of hypotheses generated from the source data, which is usually corrupted with noise. Each candidate hypothesis $h_i$ is constructed from a sample $S_i$ whose size $m$ is the minimum required for model estimation. Since $m = 4$ in this paper, a very large number of different samples $S_i$ can be drawn from the complete source data $U$, and therefore an exhaustive search would be computationally expensive.
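To give a sense of scale, the sketch below counts the distinct four-point samples and evaluates the classical RANSAC iteration estimate $k = \log(1 - p) / \log(1 - w^m)$, where $w$ is the inlier ratio and $p$ the desired probability of drawing at least one outlier-free sample. This estimate is a standard textbook result rather than part of this paper, it assumes independent uniform sampling, and the function names are hypothetical.

```python
import math


def num_minimal_samples(n: int, m: int = 4) -> int:
    """Number of distinct minimal samples of size m drawn from n correspondences."""
    return math.comb(n, m)


def ransac_iterations(inlier_ratio: float, confidence: float = 0.99, m: int = 4) -> int:
    """Iterations needed so that, with probability `confidence`, at least one
    minimal sample contains only inliers (independent uniform sampling assumed)."""
    p_clean = inlier_ratio ** m  # probability that a single sample is outlier-free
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_clean))


print(num_minimal_samples(500))  # 2573031125 possible four-point samples
print(ransac_iterations(0.5))    # 72 iterations when half of the data are outliers
```

With an inlier ratio of 0.05 (95% outliers), the same estimate grows to several hundred thousand iterations, which illustrates why purely random sampling becomes costly at high outlier levels.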
Pseudocode 1 shows the basic pseudocode of the family of RANSAC algorithms.
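A subsequent step consists in finding the best candidate hypothesis $h_B$ among all the constructed and evaluated hypotheses, according to

$$h_B = \arg\max_{i} D(U, h_i). \qquad (2)$$

The degree of agreement is directly related to the number of inliers, and it is calculated by

$$D(U, h_i) = \sum_{j=1}^{N} \theta\left(e_j^2(h_i)\right), \qquad (3)$$

$$\theta\left(e_j^2(h_i)\right) = \begin{cases} 1, & e_j^2(h_i) \le \text{Pe}, \\ 0, & e_j^2(h_i) > \text{Pe}, \end{cases} \qquad (4)$$

where Pe is a permissible error, $N$ is the number of elements contained in the source data $U$, and $e_j^2(h_i)$ is the quadratic error (the Mismatch Error) produced by the $j$th datum under the hypothesis $h_i$; in other words, it represents the error produced by the $j$th correspondence.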
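The best-hypothesis rule and the degree of agreement translate directly into code. The sketch below is illustrative only: hypotheses and data are kept abstract, the squared-error function is supplied by the caller (for homographies it would compute the Mismatch Error), and all names are hypothetical.

```python
from typing import Callable, Sequence, TypeVar

Hyp = TypeVar("Hyp")
Datum = TypeVar("Datum")


def theta(sq_error: float, pe: float) -> int:
    """Inlier indicator: 1 if the squared error is within Pe, else 0."""
    return 1 if sq_error <= pe else 0


def degree_of_agreement(h: Hyp, data: Sequence[Datum],
                        sq_error: Callable[[Hyp, Datum], float], pe: float) -> int:
    """D(U, h): number of data whose squared error under h is at most Pe."""
    return sum(theta(sq_error(h, u), pe) for u in data)


def best_hypothesis(hyps: Sequence[Hyp], data: Sequence[Datum],
                    sq_error: Callable[[Hyp, Datum], float], pe: float) -> Hyp:
    """h_B: the hypothesis with the largest degree of agreement."""
    return max(hyps, key=lambda h: degree_of_agreement(h, data, sq_error, pe))


# Toy usage: constant "models" on 1-D data with squared error (h - u)^2.
data = [1.0, 1.1, 0.9, 5.0]


def sq(h: float, u: float) -> float:
    return (h - u) ** 2


print(degree_of_agreement(1.0, data, sq, pe=0.05))     # 3
print(best_hypothesis([1.0, 5.0], data, sq, pe=0.05))  # 1.0
```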
In the original RANSAC algorithm, the best hypothesis is the one with the maximum number of inliers. A point whose error $e_j^2(h_i)$ does not exceed the permissible error Pe is considered an inlier of the candidate hypothesis $h_i$; otherwise, it is considered an outlier. In the worst case, RANSAC has to search the entire source data $U$ at least once; in this situation, the algorithm behaves like a random walk. Several strategies could improve this kind of search, such as evolutionary algorithms (EAs) [36]. These techniques balance the exploitation and exploration of the search space judiciously, since new candidate solutions carry information about the most promising regions of the search space visited in previous generations.
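The purely random loop of Pseudocode 1, combined with inlier counting, can be sketched as a generic routine that accepts any minimal solver. For brevity, the usage example swaps the homography (minimal sample of four matches) for a 2-D line (minimal sample of two points); this substitution, the fixed iteration budget, and all names are our own assumptions.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

Datum = TypeVar("Datum")
Model = TypeVar("Model")


def ransac(data: Sequence[Datum],
           fit_minimal: Callable[[List[Datum]], Optional[Model]],
           sq_error: Callable[[Model, Datum], float],
           m: int, pe: float, iterations: int,
           rng: random.Random) -> Tuple[Optional[Model], int]:
    """Keep the hypothesis with the most inliers (squared error <= pe)."""
    best_model, best_score = None, -1
    pool = list(data)
    for _ in range(iterations):
        sample = rng.sample(pool, m)  # purely random minimal sample
        model = fit_minimal(sample)
        if model is None:  # degenerate sample: skip it
            continue
        score = sum(1 for d in pool if sq_error(model, d) <= pe)
        if score > best_score:
            best_model, best_score = model, score
    return best_model, best_score


def fit_line(pts: List[Tuple[float, float]]) -> Optional[Tuple[float, float, float]]:
    """Line a*x + b*y + c = 0 through two points, normalized so a^2 + b^2 = 1."""
    (x1, y1), (x2, y2) = pts
    a, b, c = y1 - y2, x2 - x1, x1 * y2 - x2 * y1
    norm = math.hypot(a, b)
    return None if norm == 0 else (a / norm, b / norm, c / norm)


def line_sq_error(line: Tuple[float, float, float], p: Tuple[float, float]) -> float:
    a, b, c = line
    return (a * p[0] + b * p[1] + c) ** 2  # squared point-to-line distance


inliers = [(float(x), 2.0 * x + 1.0) for x in range(20)]
outliers = [(5.0, 50.0), (10.0, -30.0), (15.0, 80.0), (3.0, -20.0), (18.0, 0.0)]
model, score = ransac(inliers + outliers, fit_line, line_sq_error,
                      m=2, pe=0.01, iterations=200, rng=random.Random(0))
print(score)  # 20: every point on y = 2x + 1 is counted as an inlier
```

Each iteration ignores what previous iterations learned, which is precisely the random-walk behavior that the evolutionary formulation aims to avoid.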
This work proposes treating the estimation process as a multiobjective problem by simultaneously optimizing both the number of matching points and the permissible error (Pe). To solve the multiobjective formulation, two evolutionary algorithms have been explored: the Nondominated Sorting Genetic Algorithm II (NSGA-II) and the Nondominated Sorting Differential Evolution (NSDE). With this formulation, the proposed method adopts a sampling strategy different from that of RANSAC to generate putative solutions. Under the new mechanism, at every iteration new candidate solutions are generated based on the quality of previously found solutions, avoiding the random walk that characterizes the RANSAC search.

Multiobjective Evolutionary Algorithms
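An MO problem can be stated as minimizing or maximizing the function [37]

$$\begin{aligned} \text{Minimize/Maximize} \quad & f_m(\mathbf{x}), && m = 1, 2, \ldots, M, \\ \text{subject to} \quad & g_j(\mathbf{x}) \ge 0, && j = 1, 2, \ldots, J, \\ & h_k(\mathbf{x}) = 0, && k = 1, 2, \ldots, K, \\ & x_i^{(L)} \le x_i \le x_i^{(U)}, && i = 1, 2, \ldots, n, \end{aligned} \qquad (5)$$

where a solution $\mathbf{x}$ is a vector of $n$ decision variables $\mathbf{x} = (x_1, x_2, \ldots, x_n)$. The last set of constraints is called variable bounds; it restricts each decision variable $x_i$ to take a value within a lower bound $x_i^{(L)}$ and an upper bound $x_i^{(U)}$, and these limits constitute a decision space $\mathcal{D}$. There are $J$ inequality and $K$ equality constraints associated with the problem. To cover both the minimization and the maximization of objective functions, the operator $\triangleleft$ is used between two solutions $\mathbf{u}$ and $\mathbf{k}$: $\mathbf{u} \triangleleft \mathbf{k}$ denotes that solution $\mathbf{u}$ is better than solution $\mathbf{k}$, whereas $\mathbf{u} \trianglelefteq \mathbf{k}$ implies that solution $\mathbf{u}$ is better than or equal to solution $\mathbf{k}$.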
Different from single objective optimization, in the case of multiobjective optimization, it is usually difficult to find one optimal solution. Instead, algorithms for optimizing multiobjective problems attempt to find a group of points known as the Pareto optimal set [38]. These points verify that there is no other feasible solution which strictly improves one component of the objective function vector without worsening at least one of the remaining ones. A more formal definition of Pareto optimality or Pareto efficiency is the following.
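Definition 1. If, given a solution $\mathbf{u}$, there exists another solution $\mathbf{k}$ such that $f_i(\mathbf{u}) \trianglelefteq f_i(\mathbf{k})$ for all $i = 1, \ldots, M$ and $f_i(\mathbf{u}) \triangleleft f_i(\mathbf{k})$ for at least one $i \in \{1, \ldots, M\}$, then one will say that solution $\mathbf{u}$ dominates solution $\mathbf{k}$ (denoted by $\mathbf{u} \prec \mathbf{k}$), and solution $\mathbf{k}$ will never be sensibly selected as the solution to the problem. If $f_i(\mathbf{u}) \trianglelefteq f_i(\mathbf{k})$ for all $i$, one will say that solution $\mathbf{u}$ weakly dominates solution $\mathbf{k}$, denoted by $\mathbf{u} \preceq \mathbf{k}$.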

Definition 2.
A solution $\mathbf{u} \in \mathcal{D}$ is considered to be part of the Pareto optimal set if and only if $\nexists\, \mathbf{k} \in \mathcal{D}$ such that $\mathbf{k} \prec \mathbf{u}$.
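Definitions 1 and 2 can be checked directly in code. The sketch below assumes that every objective is minimized, so a maximized objective such as the number of inliers is negated before comparison; it uses a brute-force quadratic filter, and the names are hypothetical.

```python
from typing import List, Sequence, Tuple

Objectives = Tuple[float, ...]


def dominates(u: Sequence[float], k: Sequence[float]) -> bool:
    """u dominates k (minimization): no worse in every objective, better in at least one."""
    return (all(a <= b for a, b in zip(u, k))
            and any(a < b for a, b in zip(u, k)))


def weakly_dominates(u: Sequence[float], k: Sequence[float]) -> bool:
    """u weakly dominates k: no worse in every objective."""
    return all(a <= b for a, b in zip(u, k))


def pareto_set(points: Sequence[Objectives]) -> List[Objectives]:
    """Keep the points that no other point dominates (Definition 2)."""
    return [u for u in points if not any(dominates(k, u) for k in points)]


# Objective vectors (-inliers, Pe): maximizing inliers becomes minimizing -inliers.
candidates = [(-10, 5.0), (-8, 2.0), (-8, 3.0), (-12, 9.0)]
print(pareto_set(candidates))  # [(-10, 5.0), (-8, 2.0), (-12, 9.0)]
```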
Evolutionary algorithms (EAs) are considered the most adequate methods for solving complex MO problems, and several of them have been proposed for this purpose; among them, the Nondominated Sorting Genetic Algorithm II (NSGA-II) and the Nondominated Sorting Differential Evolution (NSDE) are some of the most representative.

Nondominated Sorting Genetic Algorithm II (NSGA-II).
NSGA-II, introduced by Deb et al. [32], is one of the most widely applied GA-based algorithms for solving multiobjective optimization problems. NSGA-II starts by randomly generating an initial ($t = 0$) parent population $P_t$ of size $N_p$. During consecutive generations ($t = 1, \ldots, \text{maxIterations}$), the objective values of $P_t$ are evaluated. Then, the population is ranked by the nondominated sorting procedure to create Pareto fronts $F_j$. Each individual of the population under evaluation obtains a rank equal to its nondomination level (1 is the best level, 2 is the next-best level, and so on): the first front contains the individuals with the best rank, the second front contains the individuals with the second rank, and so on. Next, the crowding distance between the members of each front is calculated by a linear distance criterion. Since a binary tournament selection operator based on a crowded-comparison operator is used, both the rank and the crowding distance of each member of the population must be calculated. With this selection operator, two members are drawn from the population; if they share the same rank, the member with the larger crowding distance is selected; otherwise, the member with the lower rank is chosen. Then, an offspring population $Q_t$ of size $N_p$ is created using random selection, simulated binary crossover [19], and polynomial mutation [20], and it is combined with the current population to form a population $R_t = P_t \cup Q_t$ of size $2N_p$.
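The ranking and tournament steps described above can be sketched as follows, using the fast nondominated sorting procedure of Deb et al. and the crowded-comparison rule. All objectives are assumed to be minimized, crowding distances are taken as given (they are computed in a separate step), and the function names are hypothetical.

```python
from typing import List, Sequence


def dominates(u: Sequence[float], k: Sequence[float]) -> bool:
    """Pareto dominance for minimization."""
    return all(a <= b for a, b in zip(u, k)) and any(a < b for a, b in zip(u, k))


def nondominated_fronts(objs: Sequence[Sequence[float]]) -> List[List[int]]:
    """Return fronts F_1, F_2, ... as lists of indexes (rank 1 first)."""
    n = len(objs)
    dominated_sets: List[List[int]] = [[] for _ in range(n)]  # solutions p dominates
    dom_count = [0] * n                                       # how many solutions dominate p
    fronts: List[List[int]] = [[]]
    for p in range(n):
        for q in range(n):
            if dominates(objs[p], objs[q]):
                dominated_sets[p].append(q)
            elif dominates(objs[q], objs[p]):
                dom_count[p] += 1
        if dom_count[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        next_front: List[int] = []
        for p in fronts[i]:
            for q in dominated_sets[p]:
                dom_count[q] -= 1
                if dom_count[q] == 0:  # q is dominated only by earlier fronts
                    next_front.append(q)
        i += 1
        fronts.append(next_front)
    return fronts[:-1]  # drop the trailing empty front


def crowded_tournament(i: int, j: int, rank: Sequence[int],
                       crowding: Sequence[float]) -> int:
    """Binary tournament: lower rank wins; ties are broken by larger crowding distance."""
    if rank[i] != rank[j]:
        return i if rank[i] < rank[j] else j
    return i if crowding[i] >= crowding[j] else j


objs = [(1, 5), (2, 3), (3, 1), (2, 4), (4, 4)]
print(nondominated_fronts(objs))  # [[0, 1, 2], [3], [4]]
```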

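Simulated Binary Crossover.
This operator simulates the behavior of single-point crossover on binary strings. Given the parents $\mathbf{x}^{(1,t)}$ and $\mathbf{x}^{(2,t)}$, the $i$th component ($i = 1, 2, \ldots, n$) of the offspring individuals is generated as follows:

$$\begin{aligned} x_i^{(1,t+1)} &= 0.5\left[(1 + \beta_i)\, x_i^{(1,t)} + (1 - \beta_i)\, x_i^{(2,t)}\right], \\ x_i^{(2,t+1)} &= 0.5\left[(1 - \beta_i)\, x_i^{(1,t)} + (1 + \beta_i)\, x_i^{(2,t)}\right], \\ \beta_i &= \begin{cases} (2u_i)^{1/(\eta_c + 1)}, & u_i \le 0.5, \\ \left(\dfrac{1}{2(1 - u_i)}\right)^{1/(\eta_c + 1)}, & \text{otherwise}, \end{cases} \end{aligned} \qquad (6)$$

where $u_i$ is a random number in $[0, 1]$. The parameter $\eta_c$ determines the separation between the offspring individuals in comparison to their parents.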
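The simulated binary crossover rule can be sketched per variable as below. This is a minimal version under our own assumptions: every variable is crossed, variable bounds are not enforced, the random source is passed in explicitly, and the name `sbx_pair` is hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def sbx_pair(p1: Sequence[float], p2: Sequence[float], eta_c: float,
             rng: random.Random) -> Tuple[List[float], List[float]]:
    """Simulated binary crossover of two real-coded parents."""
    c1: List[float] = []
    c2: List[float] = []
    for x1, x2 in zip(p1, p2):
        u = rng.random()  # u lies in [0, 1), so 1 - u > 0 below
        if u <= 0.5:
            beta = (2.0 * u) ** (1.0 / (eta_c + 1.0))
        else:
            beta = (1.0 / (2.0 * (1.0 - u))) ** (1.0 / (eta_c + 1.0))
        c1.append(0.5 * ((1.0 + beta) * x1 + (1.0 - beta) * x2))
        c2.append(0.5 * ((1.0 - beta) * x1 + (1.0 + beta) * x2))
    return c1, c2


rng = random.Random(42)
child1, child2 = sbx_pair([1.0, 10.0, 3.0], [2.0, 12.0, 3.0], eta_c=20.0, rng=rng)
```

A larger `eta_c` concentrates the spread factor near 1 and keeps the children closer to their parents. The operator also preserves the midpoint of each pair of components: `child1[i] + child2[i]` equals the sum of the parents' components.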

Polynomial Mutation.
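This operator employs a polynomial distribution in the following way:

$$x_i^{(t+1)} = x_i^{(t)} + \left(x_i^{(U)} - x_i^{(L)}\right)\delta_i, \qquad \delta_i = \begin{cases} (2r_i)^{1/(\eta_m + 1)} - 1, & r_i < 0.5, \\ 1 - \left[2(1 - r_i)\right]^{1/(\eta_m + 1)}, & r_i \ge 0.5, \end{cases} \qquad (7)$$

where $r_i$ is a random number in $[0, 1]$, $x_i^{(L)}$ and $x_i^{(U)}$ are the lower and upper bounds, respectively, of the decision variable, and $\eta_m$ represents the distribution index.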
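A minimal sketch of polynomial mutation follows. The per-variable mutation probability (defaulting to $1/n$, a common choice), the clipping of mutated values to the variable bounds, and the function name are our own assumptions rather than details given in the text.

```python
import random
from typing import List, Optional, Sequence


def polynomial_mutation(x: Sequence[float], low: Sequence[float], high: Sequence[float],
                        eta_m: float, rng: random.Random,
                        p_m: Optional[float] = None) -> List[float]:
    """Polynomial mutation of a real-coded individual."""
    prob = 1.0 / len(x) if p_m is None else p_m
    y: List[float] = []
    for xi, lo, hi in zip(x, low, high):
        if rng.random() < prob:
            r = rng.random()
            if r < 0.5:
                delta = (2.0 * r) ** (1.0 / (eta_m + 1.0)) - 1.0
            else:
                delta = 1.0 - (2.0 * (1.0 - r)) ** (1.0 / (eta_m + 1.0))
            xi = min(max(xi + (hi - lo) * delta, lo), hi)  # keep within bounds
        y.append(xi)
    return y


rng = random.Random(7)
mutant = polynomial_mutation([0.5, 0.2, 0.9], [0.0] * 3, [1.0] * 3,
                             eta_m=20.0, rng=rng, p_m=1.0)
```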

Nondominated Sorting Differential Evolution (NSDE).
The NSDE [31] algorithm is an extension of the original differential evolution (DE) [39] method for solving multiobjective problems. NSDE works in a similar way to DE except in the selection operation which is modified in order to be coherent with the nondominated criterion.
The algorithm begins by initializing a population of $N_p$ $D$-dimensional individuals whose parameter values are randomly distributed between the prespecified lower initial parameter bound $x_{j,\text{low}}$ and the upper initial parameter bound $x_{j,\text{high}}$. To generate a trial individual (solution), the DE algorithm first mutates the current individual $\mathbf{x}_{i,t}$ of the population by adding the scaled difference of two vectors from the current population:

$$\mathbf{k}_{i,t} = \mathbf{x}_{i,t} + F\left(\mathbf{x}_{r_1,t} - \mathbf{x}_{r_2,t}\right), \qquad r_1, r_2 \in \{1, 2, \ldots, N_p\}, \qquad (8)$$

with $\mathbf{k}_{i,t}$ being the mutant individual. The indexes $r_1$ and $r_2$ are randomly selected with the condition that they are different from each other and from the individual index (i.e., $r_1 \ne r_2 \ne i$). The mutation scale factor $F$ is a positive real number, typically less than one. To increase the diversity of the parameters, the crossover operation is applied between the mutant individual $\mathbf{k}_{i,t}$ and the original individual $\mathbf{x}_{i,t}$. The result is the trial individual $\mathbf{u}_{i,t}$, computed element by element as follows:

$$u_{j,i,t} = \begin{cases} k_{j,i,t}, & \text{if } \operatorname{rand}(0,1) \le CR \text{ or } j = j_{\text{rand}}, \\ x_{j,i,t}, & \text{otherwise}, \end{cases} \qquad (9)$$

where $j_{\text{rand}} \in \{1, 2, \ldots, D\}$. The subscripts $j$ and $i$ are the parameter and individual indexes, respectively. The crossover parameter $CR$ ($0.0 \le CR \le 1.0$) controls the fraction of parameters that the mutant individual contributes to the final trial individual. In addition, the trial individual always inherits the mutant parameter at the randomly chosen index $j_{\text{rand}}$, ensuring that the trial individual differs from $\mathbf{x}_{i,t}$ in at least one parameter. Finally, a nondominated selection is used to build the Pareto optimal front: if the trial individual $\mathbf{u}_{i,t}$ dominates the target individual $\mathbf{x}_{i,t}$, the trial individual is copied into the population for the next generation; otherwise, the target individual $\mathbf{x}_{i,t}$ is copied:

$$\mathbf{x}_{i,t+1} = \begin{cases} \mathbf{u}_{i,t}, & \text{if } \mathbf{u}_{i,t} \prec \mathbf{x}_{i,t}, \\ \mathbf{x}_{i,t}, & \text{otherwise}. \end{cases} \qquad (10)$$
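One NSDE generation, combining the difference-vector mutation, the binomial crossover, and the dominance-based replacement described above, can be sketched as below. Our own assumptions: all objectives are minimized, every trial individual is built from the population of the previous generation (synchronous update), the population has at least three members, and the helper names and toy objective are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def dominates(u: Sequence[float], k: Sequence[float]) -> bool:
    """Pareto dominance for minimization."""
    return all(a <= b for a, b in zip(u, k)) and any(a < b for a, b in zip(u, k))


def nsde_generation(pop: List[List[float]],
                    objective: Callable[[Sequence[float]], Sequence[float]],
                    f: float, cr: float, rng: random.Random) -> List[List[float]]:
    """Return the next population after one mutation/crossover/selection pass."""
    n, d = len(pop), len(pop[0])
    next_pop: List[List[float]] = []
    for i, x in enumerate(pop):
        r1, r2 = rng.sample([j for j in range(n) if j != i], 2)  # r1 != r2 != i
        mutant = [x[j] + f * (pop[r1][j] - pop[r2][j]) for j in range(d)]
        j_rand = rng.randrange(d)  # guarantees at least one mutant parameter
        trial = [mutant[j] if (rng.random() <= cr or j == j_rand) else x[j]
                 for j in range(d)]
        # Replace the target only when the trial dominates it.
        next_pop.append(trial if dominates(objective(trial), objective(x)) else x)
    return next_pop


def two_objectives(v: Sequence[float]) -> Sequence[float]:
    """Toy bi-objective problem: squared distances to (0, 0) and to (1, 0)."""
    return (v[0] ** 2 + v[1] ** 2, (v[0] - 1.0) ** 2 + v[1] ** 2)


rng = random.Random(3)
population = [[rng.uniform(-2, 2), rng.uniform(-2, 2)] for _ in range(10)]
for _ in range(50):
    population = nsde_generation(population, two_objectives, f=0.5, cr=0.9, rng=rng)
```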

Individual Representation.
In the estimation process, each candidate homography $\mathbf{H}$ is calculated from four different point correspondences. The candidate homography is then evaluated over the entire dataset $U$, dividing its elements into inliers and outliers according to a permissible error (Pe). To construct a candidate solution or individual $\mathbf{s}_i$, four indexes $a$, $b$, $c$, and $d$ are selected from the set $\{1, 2, \ldots, N\}$ of correspondences. The homography $\mathbf{H}$ between the two views is then computed by solving the linear system produced by the four point matches $(\mathbf{x}_a, \mathbf{x}'_a)$, $(\mathbf{x}_b, \mathbf{x}'_b)$, $(\mathbf{x}_c, \mathbf{x}'_c)$, and $(\mathbf{x}_d, \mathbf{x}'_d)$. Additionally, the permissible error Pe associated with the individual $\mathbf{s}_i$ is incorporated as a decision variable. Thus, in the proposed algorithm, an individual or candidate solution $\mathbf{s}_i$ is coded as a vector of five decision variables, $\mathbf{s}_i = \{s_1, s_2, s_3, s_4, s_5\}$, defined by

$$s_1 = a, \quad s_2 = b, \quad s_3 = c, \quad s_4 = d, \quad s_5 = \text{Pe}. \qquad (11)$$

In our approach, the candidate solution $\mathbf{s}_i$ plays the same role as the hypothesis $h_i$ in the original RANSAC algorithm.

Multiobjective Problem Formulation.
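In the proposed approach, the estimation process is treated as a multiobjective problem in which the number of matching points and the permissible error are optimized simultaneously. Under these circumstances, the multiobjective problem can be defined as follows:

$$\begin{aligned} \text{Maximize} \quad & f_1(\mathbf{s}_i) = \sum_{j=1}^{N} \theta\left(e_j^2(\mathbf{s}_i)\right), \\ \text{Minimize} \quad & f_2(\mathbf{s}_i) = s_5 = \text{Pe}, \\ \text{subject to} \quad & 1 \le s_k \le N, \quad k = 1, \ldots, 4, \\ & 0 \le s_5 \le \text{Max}, \end{aligned} \qquad \theta\left(e_j^2(\mathbf{s}_i)\right) = \begin{cases} 0, & e_j^2(\mathbf{s}_i) > \text{Pe}, \\ 1, & e_j^2(\mathbf{s}_i) \le \text{Pe}, \end{cases} \qquad (12)$$

where Max represents the maximum admissible error produced by a candidate homography. Although Max could be set to any large value, a sufficiently small value significantly narrows the search for the Pareto front. In this work, Max has been set to 25.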
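Evaluating one individual under the multiobjective formulation can be sketched as follows. The text does not specify how the real-valued variables $s_1$ to $s_4$ become correspondence indexes, so the rounding and clipping used here are assumptions, as are the rejection of repeated indexes, the negation of $f_1$ (so that both objectives are minimized), and the callback `sq_errors_for`, which is assumed to return the squared errors of all $N$ correspondences under the homography built from the four selected matches. All names are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

MAX_ERROR = 25.0  # value of Max used in this work


def decode(s: Sequence[float], n: int) -> Tuple[List[int], float]:
    """Map s = (s1, ..., s5) to four 0-based correspondence indexes and Pe."""
    idx = [min(max(int(round(v)), 1), n) - 1 for v in s[:4]]  # clip to [1, N]
    return idx, s[4]


def evaluate(s: Sequence[float], n: int,
             sq_errors_for: Callable[[List[int]], Sequence[float]]
             ) -> Optional[Tuple[float, float]]:
    """Return (-f1, f2) for minimization, or None if s is infeasible."""
    idx, pe = decode(s, n)
    if len(set(idx)) < 4 or not 0.0 <= pe <= MAX_ERROR:
        return None  # repeated matches or Pe outside [0, Max]
    errors = sq_errors_for(idx)             # e_j^2(s_i) for j = 1..N
    f1 = sum(1 for e in errors if e <= pe)  # number of inliers (theta summed)
    f2 = pe
    return (-f1, f2)


def fake_errors(idx: List[int]) -> Sequence[float]:
    """Hypothetical stand-in for the mismatch errors of five correspondences."""
    return [0.5, 3.0, 10.0, 30.0, 1.0]


print(evaluate([1.2, 2.0, 3.4, 4.6, 5.0], 5, fake_errors))  # (-3, 5.0)
```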

Computational Procedures.
In order to solve the multiobjective formulation, two different evolutionary algorithms have been explored: the Nondominated Sorting Genetic Algorithm II (NSGA-II) and the Nondominated Sorting Differential Evolution (NSDE). In this section, the computational procedure of both methods is described when they face the multiobjective problem described in (12).
(2) Produce an offspring population $Q_t$ from $P_t$ by using simulated binary crossover and polynomial mutation.
(4) Perform a nondominated sorting of $R_t$ and identify the different fronts $F_j$, $j = 1, 2, \ldots$
(6) Perform $P_{t+1} = P_{t+1} \cup F_j$ and increment $j$ ($j = j + 1$).
(9) For each objective function $m = 1, 2, \ldots, M$, sort the set $F_j$ in descending order of $f_m$; thus, $I^m = \operatorname{sort}(f_m, >)$ contains the sorted elements of the objective function $f_m$.
(10) For $m = 1, 2, \ldots, M$, assign a large distance to the boundary elements of $I^m$ ($d_{I^m_1} = d_{I^m_l} = \infty$). For all other elements $k = 2, 3, \ldots, l - 1$, assign a distance calculated as follows:

$$d_{I^m_k} = d_{I^m_k} + \frac{\left|f_m^{(I^m_{k+1})} - f_m^{(I^m_{k-1})}\right|}{f_m^{\max} - f_m^{\min}}, \qquad (13)$$

where $I^m_k$ represents the $k$th element of the sorted set $I^m$, $l = |F_j|$, and $f_m^{\max}$ and $f_m^{\min}$ are the maximum and minimum values of $f_m$.
(11) Select the $(N_p - |P_{t+1}|)$ elements of $F_j$ with the largest distances and include them in $P_{t+1}$.
(12) If the maximum number of iterations has been reached, the process is completed; otherwise, go back to step (2).
(13) The final population $P_{t+1}$ contains the Pareto optimal set.
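The crowding-distance computation and the truncation step listed above can be sketched as follows. For simplicity, the sketch sorts each objective in ascending order and uses the absolute gap between neighbours, which yields the same distances as a descending sort; the names are hypothetical.

```python
import math
from typing import List, Sequence


def crowding_distance(front: Sequence[Sequence[float]]) -> List[float]:
    """Crowding distance of each member of one front."""
    size = len(front)
    dist = [0.0] * size
    if size == 0:
        return dist
    for m in range(len(front[0])):
        order = sorted(range(size), key=lambda i: front[i][m])
        f_min, f_max = front[order[0]][m], front[order[-1]][m]
        dist[order[0]] = dist[order[-1]] = math.inf  # boundary elements
        if f_max == f_min:
            continue  # all values equal: no finite contribution from this objective
        for k in range(1, size - 1):
            gap = abs(front[order[k + 1]][m] - front[order[k - 1]][m])
            dist[order[k]] += gap / (f_max - f_min)
    return dist


def select_least_crowded(front: Sequence[Sequence[float]], slots: int) -> List[int]:
    """Indexes of the `slots` members with the largest crowding distance."""
    dist = crowding_distance(front)
    return sorted(range(len(front)), key=lambda i: dist[i], reverse=True)[:slots]


front = [(0.0, 4.0), (1.0, 3.0), (2.0, 2.0), (4.0, 0.0)]
print(crowding_distance(front))        # [inf, 1.0, 1.5, inf]
print(select_least_crowded(front, 3))  # [0, 3, 2]
```

Because Python's sort is stable even with `reverse=True`, ties between boundary members are resolved by their original order in the front.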

Experimental Results
This section presents several experiments performed over a collection of real images. The results show the performance of NSGA-II and NSDE solving the estimation problem as a multiobjective optimization task, in comparison with RANSAC. In the experiments, two performance indexes are considered: the mean squared error (MSE) and the Peak Signal-to-Noise Ratio (PSNR). These indexes allow the accuracy of the estimation to be assessed appropriately.
The homography estimation problem consists in finding a geometric transformation that maps points of a first view ($\mathbf{x}$) to a second view ($\mathbf{x}'$) taken from a different point of view. This projective transformation $\mathbf{H}$ relates corresponding points of the plane projected into the first and second views by $\mathbf{x}' = \mathbf{H}\mathbf{x}$ or $\mathbf{x} = \mathbf{H}^{-1}\mathbf{x}'$. To calculate the MSE and the PSNR, two images are defined: the estimated image (EI) and the actual image (AI). The EI is produced by mapping the pixels of the first view with the estimated homography $\mathbf{H}$ ($\text{EI} = \mathbf{H}\mathbf{x}$). The actual image (AI), on the other hand, corresponds to the second-view image.
The mean squared error (MSE) evaluates the squared differences between the pixels of EI and AI. Considering that $M_1 \times M_2$ represents the image dimensions, the MSE can be computed as follows:

$$\text{MSE} = \frac{1}{M_1 M_2} \sum_{i=1}^{M_1} \sum_{j=1}^{M_2} \left[\text{EI}(i, j) - \text{AI}(i, j)\right]^2. \qquad (14)$$

The Peak Signal-to-Noise Ratio (PSNR) is commonly used to measure the quality of reconstruction of an image that undergoes some process. The signal in this case is the original data (AI), and the noise is the error introduced by the transformation $\mathbf{H}$ (EI). When comparing images, PSNR is an approximation to human perception of reconstruction quality; a higher PSNR value generally indicates a higher-quality estimation. PSNR is usually defined via the mean squared error. Given an $M_1 \times M_2$ image, the PSNR is defined as

$$\text{PSNR} = 10 \log_{10}\left(\frac{\text{MAX}^2}{\text{MSE}}\right), \qquad (15)$$

where MAX is the maximum possible pixel value in the image. The images used in the experiments are taken from [40], which contains several two-view images of different objects with a resolution of 640 × 480 pixels. The set of images used in the experimental test is presented in Figure 3.
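The MSE and PSNR definitions can be computed on grayscale images stored as nested lists, as in the sketch below. We assume 8-bit images (MAX = 255) of equal size and ignore how pixels not covered by the warped view are handled, which the text does not specify; the function names are hypothetical.

```python
import math
from typing import Sequence


def mse(ei: Sequence[Sequence[float]], ai: Sequence[Sequence[float]]) -> float:
    """Mean squared error between the estimated image and the actual image."""
    rows, cols = len(ai), len(ai[0])
    total = sum((ei[i][j] - ai[i][j]) ** 2 for i in range(rows) for j in range(cols))
    return total / (rows * cols)


def psnr(ei: Sequence[Sequence[float]], ai: Sequence[Sequence[float]],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in decibels; infinite for identical images."""
    error = mse(ei, ai)
    return math.inf if error == 0 else 10.0 * math.log10(max_value ** 2 / error)


actual = [[0, 0], [0, 0]]
estimated = [[10, 0], [0, 0]]
print(mse(estimated, actual))             # 25.0
print(round(psnr(estimated, actual), 2))  # 34.15
```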
For the test, both algorithms, NSGA-II and NSDE, have been configured with a population of 200 individuals and 200 iterations. To conduct a fair comparison between RANSAC and the multiobjective approaches, RANSAC has been run for 40,000 iterations; this number of calculations (200 × 200) corresponds to the maximum number of evaluations performed by NSGA-II and NSDE during their execution. Figure 4 shows the Pareto optimal sets obtained by NSGA-II and NSDE during the estimation process, with Figures 5(a) and 5(b) as the first and second views, respectively. In Figure 4, the best RANSAC estimations have also been included, only as a reference to validate the performance of the multiobjective approaches.
To illustrate the obtained results, Figure 5 shows the estimations produced by all methods in terms of their resulting estimated images. The single estimation generated by RANSAC is shown in Figures 5(c) and 5(d). For the multiobjective approaches, three solutions from the Pareto optimal set have been selected: the boundary solution for $f_1$, the boundary solution for $f_2$, and the median solution. These solutions are presented in Figures 5(e)-5(f) and 5(g)-5(h) for NSGA-II and NSDE, respectively. The reported results only include the boundary solutions obtained for $f_1$ and $f_2$. The best result in each experiment is highlighted in Table 1.
To evaluate the robustness of both algorithms, a set of outliers was added by generating random point correspondences within the image limits. In the test, the fraction of outliers varies from 85% to 95%. Figure 6 shows the estimations produced by all methods in terms of their resulting estimated images for the image pair in Figures 3(k) and 3(l). Table 2 presents the performance of RANSAC, NSGA-II, and NSDE in terms of the mean squared error (MSE) over the three image pairs. The results are averages over 30 independent executions.
Table 2 shows that, as the number of outliers increases, the performance of every algorithm decreases. However, the NSDE algorithm obtains the best performance in almost every case, even when the fraction of outliers reaches 95%. The approach has been experimentally tested on a set of benchmark experiments, and its efficiency has been evaluated in terms of the mean squared error (MSE) and the Peak Signal-to-Noise Ratio (PSNR). Experimental results on real images provide evidence of the strong performance of the proposed approach in comparison with the classical RANSAC.
The Wilcoxon signed-rank test [41] is a nonparametric test used to compare paired experimental data and to determine whether there is a significant difference between them. Applying the test to the data in Table 2 shows that NSGA-II and NSDE differ significantly at the 5% significance level, so NSDE can be considered to give better results than NSGA-II when both algorithms are applied to the homography problem. Likewise, the same test was used to compare NSGA-II and RANSAC, indicating that the former performs better than the latter.
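For readers who want to reproduce this kind of comparison, the sketch below computes the Wilcoxon signed-rank statistic for paired results (for example, the MSE of two algorithms over the same runs) and a two-sided p-value from the normal approximation. It is our own minimal version: zero differences are discarded, tied absolute differences receive average ranks, no tie or continuity correction is applied, exact tables or a statistics library are preferable for small samples, and the function name and toy data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def wilcoxon_signed_rank(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (W, two-sided p) for paired samples a and b (normal approximation)."""
    diffs = [x - y for x, y in zip(a, b) if x != y]  # drop zero differences
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:  # assign average ranks to tied absolute differences
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    w_plus = sum((r for r, d in zip(ranks, diffs) if d > 0), 0.0)
    w_minus = sum((r for r, d in zip(ranks, diffs) if d < 0), 0.0)
    w = min(w_plus, w_minus)
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w - mean) / sd  # w <= mean, so z <= 0
    p = 2.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # 2 * Phi(z)
    return w, min(p, 1.0)


# Toy, made-up paired errors (not results from this paper).
errors_a = [1.0, 1.2, 0.9, 1.1, 1.3, 1.0, 0.8, 1.2, 1.1, 1.0]
errors_b = [2.0, 2.1, 1.7, 2.4, 2.2, 1.9, 1.8, 2.5, 2.0, 2.3]
w, p = wilcoxon_signed_rank(errors_a, errors_b)
print(w, p < 0.05)  # 0.0 True: errors_a is lower in every pair
```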

Conclusions
In this work, the use of two multiobjective evolutionary algorithms in conjunction with point correspondences is proposed to estimate homographies between image pairs. Under this approach, the estimation process is treated as a multiobjective problem in which the number of matching points ($f_1$) and the permissible error ($f_2$) are optimized simultaneously. Under such circumstances, the approach is able to find the best balance between both objectives.
A close inspection of the standard deviations in Table 1 reveals that NSGA-II exhibits a large dispersion in its solutions, which is most pronounced in the MSE index. This inconsistency reflects the inability of NSGA-II to produce similar solutions across its executions. In contrast, NSDE produces better solutions than NSGA-II in terms of both accuracy (MSE) and consistency. Moreover, since a higher PSNR value indicates a higher-quality estimation, the results produced by the NSDE algorithm exhibit the best performance.
After several tests, a well-known statistical test applied to the set of experimental results showed that NSDE gives better results in solving the image matching problem presented.