Article

Error Analysis and Condition Estimation of the Pyramidal Form of the Lucas-Kanade Method in Optical Flow

Department of Computer Science, The University of Sheffield, Regent Court, 211 Portobello, Sheffield S1 4DP, UK
Electronics 2024, 13(5), 812; https://doi.org/10.3390/electronics13050812
Submission received: 12 October 2023 / Revised: 4 February 2024 / Accepted: 15 February 2024 / Published: 20 February 2024
(This article belongs to the Section Computer Science & Engineering)

Abstract

Optical flow is the apparent motion of the brightness patterns in an image. The pyramidal form of the Lucas-Kanade (LK) method is frequently used for its computation, but experiments have shown that the method has deficiencies. Problems arise because of numerical issues in the least squares (LS) problem $\min_x \|Ax - b\|_2^2$, where $A \in \mathbb{R}^{m \times 2}$ and $m \gg 2$, which must be solved many times. Numerical properties of the solution $x_0 = A^\dagger b = (A^T A)^{-1} A^T b$ of the LS problem are considered and it is shown that the property $m \gg 2$ has implications for the error and stability of $x_0$. In particular, it can be assumed that $b$ has components that lie in the column space (range) $R(A)$ of $A$, and in the space that is orthogonal to $R(A)$, from which it follows that the upper bound of the condition number of $x_0$ is inversely proportional to $\cos\theta$, where $\theta$ is the angle between $b$ and its component that lies in $R(A)$. It is shown that the maximum values of this condition number, other condition numbers and the errors in the solutions of the LS problems increase as the pyramid is descended from the top level (coarsest image) to the base (finest image), such that the optical flow computed at the base of the pyramid may be computationally unreliable. The extension of these results to the problem of total least squares is addressed by considering the stability of the optical flow vectors when there are errors in $A$ and $b$. Examples of the computation of the optical flow demonstrate the theoretical results, and the implications of these results for extended forms of the LK method are discussed.

1. Introduction

Optical flow is the apparent motion of the brightness patterns in an image. This is different from the motion field, which is the projection onto two dimensions of motion in a three-dimensional environment. Both these motions are usually termed optical flow even though they are not necessarily the same ([§2] [1]). There are several methods for the computation of optical flow and it is assumed in all the methods that it is locally smooth. The methods can be classified as global, for example, the Horn-Schunck method [2], or local, for example, the Lucas-Kanade (LK) method [3], and a review of these methods, and other methods, is in [4,5,6,7,8]. Local methods are more robust to noise but they do not yield a dense optical flow field, whereas global methods are more sensitive to noise but they yield a dense optical flow field. The computation of the optical flow has been the focus of much research, particularly for images that are corrupted by significant noise, images in which two or more objects move, or images in which there are temporal changes in illumination [9].
Comparisons of the different methods for the computation of optical flow are in [1,10]. The conclusions from these comparisons are empirical and based on quantitative measures of the errors in the computed results, but numerical and linear algebraic considerations of the methods are not included in these comparisons. These issues are addressed in this paper by error analysis and condition estimation of the pyramidal form of the LK method. This method requires that many least squares (LS) problems be solved at each level of the pyramid, and it is shown that, even in the absence of occlusions, motion discontinuities and other problematic phenomena, the maximum values of the errors and condition numbers of the solutions of the LS problems (the optical flow) increase as the pyramid is descended from the top level to the bottom level.
Kearney, Thompson and Boley [11] consider condition estimation and error analysis for the computation of the optical flow from two points with pixel coordinates $p_i = (x_i, y_i)$ and $p_j = (x_j, y_j)$ at time $t$, where $p_i$ is near $p_j$ such that a first order Taylor approximation is appropriate. This approximation yields
$$G\omega = b, \qquad (1)$$
where
$$G = \begin{bmatrix} I_x^{(i)} & I_y^{(i)} \\ I_x^{(j)} & I_y^{(j)} \end{bmatrix}, \qquad \omega = \begin{bmatrix} u \\ v \end{bmatrix}, \qquad b = -\begin{bmatrix} I_t^{(i)} \\ I_t^{(j)} \end{bmatrix},$$
$$I_x^{(i)} = \frac{\partial I^{(i)}}{\partial x}, \qquad I_y^{(i)} = \frac{\partial I^{(i)}}{\partial y}, \qquad I_t^{(i)} = \frac{\partial I^{(i)}}{\partial t},$$
$\omega$ is the optical flow and $I^{(i)} = I^{(i)}(x, y, t)$ is the intensity of the $i$th pixel. The work in this paper follows the analysis in [11] because issues of computational linear algebra, specifically condition estimation and error analysis, are addressed. There are, however, differences because an arbitrary number of points, rather than two points, is considered in this paper since the LK method yields an overdetermined set of equations, rather than the problem (1), for which the coefficient matrix is square and of order two. It is therefore necessary to compute $x_0$, the optical flow,
$$x_0 = \arg\min_x \|Ax - b\|_2, \qquad A \in \mathbb{R}^{m \times n}, \qquad (2)$$
where $\|\cdot\| = \|\cdot\|_2$, $m > n$ and $\operatorname{rank} A = n$, for each window at each level of the pyramid. The optical flow problem requires that $n = 2$, but it is assumed in the theoretical analysis that $n$, where $n \le m$, is arbitrary.
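A minimal numerical sketch of the LS problem (2), as it arises in one window, is given below. The function name and the synthetic window data are illustrative assumptions, not values from the paper; the point is only that the solution and the singular values needed later are available from a single SVD-based solve.

```python
import numpy as np

def solve_window_ls(A, b):
    """Solve the least squares problem (2), min ||Ax - b||_2, for one window.

    A : (m, 2) matrix of spatial intensity gradients.
    b : (m,) vector of negated temporal intensity gradients.
    Returns the optical flow estimate x0 = A^+ b and the singular values of A.
    """
    # numpy.linalg.lstsq uses the SVD, which is numerically preferable to
    # forming the normal equations (A^T A) x = A^T b explicitly.
    x0, residual, rank, sigma = np.linalg.lstsq(A, b, rcond=None)
    return x0, sigma

# Illustrative data: a 7x7 window gives m = 49 rows.
rng = np.random.default_rng(0)
A = rng.standard_normal((49, 2))
b = rng.standard_normal(49)
x0, sigma = solve_window_ls(A, b)
print(x0, sigma[0] / sigma[-1])  # flow estimate and kappa(A) = sigma_1 / sigma_2
```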
The first and second columns of $A$ contain, respectively, the intensity gradients with respect to $x$ and $y$, and $b$ contains the negative of the temporal intensity gradients. Errors are therefore present in $A$ and $b$, such that it is necessary to solve the total least squares (TLS) problem ([§6.3] [12]),
$$\left(x_{tls}, \delta A_{tls}, \delta b_{tls}\right) = \arg\min_{x, \delta A, \delta b} \left\| \begin{bmatrix} \delta A & \delta b \end{bmatrix} \right\|_F \quad \text{subject to} \quad \left(A + \delta A\right) x = b + \delta b, \qquad (3)$$
where $\|\cdot\|_F$ denotes the Frobenius norm. The method of TLS has been used for the computation of optical flow [13], but an estimate of its numerical stability, which is a measure of its sensitivity to noise, has not been established. This measure is called the condition number, and this paper addresses this issue by considering the condition number of the computed optical flow. This condition number must be derived for the solution $x_{tls}$ of the TLS problem (3), but it is shown that it is first necessary to consider the condition number of the solution $x_0$ of the LS problem (2).
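The sketch below shows one standard SVD-based construction of the TLS solution of (3) (see [12]); it is given only for orientation, and the assumption that the smallest singular value of the augmented matrix is simple and the synthetic data are illustrative, not part of the paper's analysis.

```python
import numpy as np

def tls_solution(A, b):
    """Total least squares solution of (A + dA) x = b + db, per (3).

    Classical SVD construction: the TLS solution is obtained from the right
    singular vector of [A  b] associated with its smallest singular value,
    provided the last component of that vector is non-zero.
    """
    m, n = A.shape
    C = np.column_stack([A, b])      # augmented matrix [A  b]
    _, _, Vt = np.linalg.svd(C)
    v = Vt[-1, :]                    # right singular vector for sigma_min
    if abs(v[n]) < 1e-12:
        raise ValueError("TLS solution does not exist (nongeneric problem)")
    return -v[:n] / v[n]

# Illustrative comparison of the LS and TLS flow estimates for noisy data.
rng = np.random.default_rng(1)
A = rng.standard_normal((49, 2))
x_true = np.array([0.5, -0.25])
b = A @ x_true + 0.01 * rng.standard_normal(49)
print(np.linalg.lstsq(A, b, rcond=None)[0], tls_solution(A, b))
```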
The results of this paper are summarised:
  • It is shown that the maximum value of the error in the solution $x_0$ of (2) increases as the pyramid is descended from the top level to the lowest level.
  • The condition number $\kappa(A) = \|A\| \, \|A^\dagger\|$ of $A$, where $A^\dagger = (A^T A)^{-1} A^T$, is a function of $A$ only, but the solution $x_0$ of the LS problem is a function of $A$ and $b$. A refined condition number, called the effective condition number $\eta(A, b)$, that is a function of $A$ and $b$, is therefore developed.
  • An upper bound of $\eta(A, b)$ is $\kappa(A) / \cos\theta$, where $\theta$ is the angle between $b$ and its component that lies in the column space (range) $R(A)$ of $A$. It follows that even if $A$ is well conditioned, the solution $x_0$ may be ill conditioned if $\theta$ is large.
  • The dimensions of the coefficient matrix $A$ satisfy $m \gg n$ and it can therefore be assumed that a significant component of $b$ lies in the space that is orthogonal to $R(A)$. It is shown that the error in $x_0$ is therefore large and that there is a simple relationship between this error and $\cos\theta$. It is concluded that the angle $\theta$ arises in the error and condition numbers of $x_0$, and it is therefore an important measure to consider when assessing the computational reliability of the optical flow.
  • It is shown that the maximum values of the condition numbers and errors in the solutions of the LS problems (one LS problem for each window at each level of the pyramid) increase as the pyramid is descended from the coarsest image to the finest image. This shows that the advantage of the pyramidal implementation of the LK method, namely the ability to cope with large displacements between successive images, must be balanced against the numerical measures that quantify the robustness and reliability of the solutions of the many LS problems that are solved in the LK method.
  • The condition number $\kappa(A) / \cos\theta$ of $x_0$ is limited because it quantifies the effect on $x_0$ of errors in $b$, but errors in $A$ are not considered. A condition number $\rho(A, b)$ of $x_0$ that includes errors in $A$ and $b$ is developed because it is more appropriate for the LK method. This is a non-linear condition number, in contrast to the linear condition number $\kappa(A) / \cos\theta$, and it is shown that the dominant term in the expression for $\rho(A, b)$ is $\kappa(A) \, \eta(A, b)$.
The method of least median of squares (LMedS) is used in [14] to solve simultaneously the problems of temporal variation of illumination and motion discontinuities induced by the relative motion of two or more objects. The LMedS method requires the solution of the LS problem (2), and the pyramidal form of the LK method is also used in feature tracking [15]. More generally, the theory and results in this paper are applicable to all problems in which the LK method is used. There are several forms of the LK method and they differ in the expressions used for the calculation of the derivatives of the intensity. For example, one form of the LK method requires that $A$ and $b$ in (2) are premultiplied by a diagonal matrix, such that a larger weight is assigned to pixels in the centre of each window, and a smaller weight is assigned to pixels at its borders ([§4.3.2] [16]).
The LK method is described in Section 2 and the error in the solution x 0 of the LS problem is considered in Section 3. It is shown that there is a simple relationship between cos θ and the error, and an expression for cos θ is developed in Section 4. Linear and non-linear condition numbers that consider errors in b, and errors in A and b, respectively, of the optical flow are considered in Section 5. Examples that show the errors and condition numbers of the optical flow at each level of the pyramid are in Section 6. The paper is summarised in Section 7 and it is shown that the numerical problems with the solution of the LS problem that are addressed in this paper provide the mathematical motivation for the generalised form of the LK method that includes warping functions [17].

2. The Lucas-Kanade Method

The LK method is derived from the assumption of constant intensity $I(x, y, t)$ as a pixel at position $(x, y)$ at time $t$ moves to position $(x + \delta x, y + \delta y)$ at time $t + \delta t$, such that
$$I(x, y, t) = I(x + \delta x, y + \delta y, t + \delta t), \qquad (4)$$
to first order. It therefore follows that
$$I_x \, \delta x + I_y \, \delta y + I_t \, \delta t = 0,$$
and thus if
$$u = \frac{\partial x}{\partial t} \qquad \text{and} \qquad v = \frac{\partial y}{\partial t},$$
then
$$I_x u + I_y v + I_t = 0, \qquad (5)$$
where
$$I_x = \frac{\partial I}{\partial x}, \qquad I_y = \frac{\partial I}{\partial y} \qquad \text{and} \qquad I_t = \frac{\partial I}{\partial t}.$$
Equation (5) is one equation in two unknowns (the components $u$ and $v$ of the optical flow), and it therefore possesses an infinite number of solutions. This problem of a non-unique solution is addressed by considering a square window in each image at times $t$ and $t + \delta t$, and if each window contains $p$ points, then the application of (5) to each of these points yields,
$$I_x^{(i)} u + I_y^{(i)} v + I_t^{(i)} = 0, \qquad i = 1, \ldots, p,$$
which can be combined into one equation,
$$A x = b, \qquad A \in \mathbb{R}^{p \times 2}, \quad x \in \mathbb{R}^2 \quad \text{and} \quad b \in \mathbb{R}^p. \qquad (6)$$
Equation (6) is solved in the LS sense (2), which yields the solution,
$$x_0 = A^\dagger b, \qquad A^\dagger = \left(A^T A\right)^{-1} A^T, \qquad (7)$$
where $x_0 = [u \;\; v]^T$ is the optical flow. It follows from (6) that the motion in each window is a translation only, where $u$ and $v$ represent its horizontal and vertical components, respectively.
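A short numerical sketch of the construction of $A$ and $b$ for one window, and of the solution (7), is given below. The derivative scheme (central differences for the spatial gradients and a frame difference for the temporal gradient), the synthetic images and the function name are illustrative assumptions; the paper does not prescribe a particular implementation of the derivatives.

```python
import numpy as np

def lk_window_flow(I1, I2, top, left, w):
    """Estimate the translation (u, v) for one w-by-w window, per (6) and (7).

    I1, I2 : consecutive greyscale frames as 2-D float arrays.
    The spatial gradients use central differences and the temporal gradient
    is the frame difference; this choice is only illustrative.
    """
    Iy, Ix = np.gradient(I1)             # spatial derivatives of the first frame
    It = I2 - I1                         # temporal derivative
    sl = (slice(top, top + w), slice(left, left + w))
    A = np.column_stack([Ix[sl].ravel(), Iy[sl].ravel()])   # A in R^{p x 2}, p = w*w
    b = -It[sl].ravel()                                      # negated temporal gradients
    x0, *_ = np.linalg.lstsq(A, b, rcond=None)               # x0 = [u, v]^T
    return x0

rng = np.random.default_rng(2)
I1 = rng.random((64, 64))
I2 = np.roll(I1, shift=(0, 1), axis=(0, 1))   # a one-pixel horizontal shift
print(lk_window_flow(I1, I2, top=20, left=20, w=15))
```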
The solution $x_0$ assumes there are errors in $b$ only, and that $A$ does not contain errors. The qualifications that follow from this assumption must be addressed because $A$ and $b$ contain, respectively, the spatial and temporal derivatives of the intensity $I(x, y, t)$. The inclusion of errors in $A$ and $b$ requires that the TLS problem be considered ([§6.3] [12]), and the effect of perturbations in $A$ and $b$ on the optical flow $x_0$ is considered by the development of a non-linear condition number of $x_0$ in Section 5.2.
The computation of the translation $x_0$ between two images of a dynamic scene occurs in several applications, including remote sensing and medical imaging, and it has been shown that noise and the first order approximation (4) introduce bias in the estimate of $x_0$ [11,18,19,20]. Robinson and Milanfar [20] use prior knowledge of the image and the translation to design gradient-estimation filters, but the bias due to noise is not considered, and thus the LK method yields poor results for images whose signal-to-noise ratio is less than about 25 dB. Pham et al. [19] reduce the bias by iteratively computing the translation and upsampling the matrices that store the components $u$ and $v$ of the optical flow, which is the pyramid procedure proposed by Lucas and Kanade [3], and refined by Baker and Matthews [17]. Pham et al. [19] report that a pyramid with three levels yields an optical flow field with a very small bias.
The first order approximation (4) requires that
$$\delta x, \; \delta y, \; \delta t \ll 1, \qquad (8)$$
and if this condition is not satisfied, a pyramid is formed in which the images at the $i$th level are formed by passing the images at the $(i-1)$th level through a low pass filter and then downsampling the filtered images. The coefficients of the low pass filter are [21,22],
$$\frac{1}{16} \begin{bmatrix} 1 \\ 4 \\ 6 \\ 4 \\ 1 \end{bmatrix} \frac{1}{16} \begin{bmatrix} 1 & 4 & 6 & 4 & 1 \end{bmatrix}, \qquad (9)$$
which is approximately equal to a Gaussian filter with standard deviation $\sigma = 1$.
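The sketch below illustrates one level of this downsampling: the image is smoothed with the separable kernel of (9) and every second row and column is then discarded. The border handling (symmetric padding) and the synthetic image sizes are illustrative assumptions, not specified in the paper.

```python
import numpy as np

K = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0   # 1-D kernel of the filter (9)

def pyr_down(image):
    """One level of the image pyramid: low-pass filter with the separable
    kernel (9), then discard every second row and column."""
    def filt(arr, axis):
        pad = [(0, 0), (0, 0)]
        pad[axis] = (2, 2)
        padded = np.pad(arr, pad, mode="symmetric")   # border handling is an assumption
        return np.apply_along_axis(lambda r: np.convolve(r, K, mode="valid"), axis, padded)
    smoothed = filt(filt(image, 0), 1)
    return smoothed[::2, ::2]

levels = [np.random.default_rng(3).random((384, 576))]
for _ in range(3):                  # a four-level pyramid, as in Example 2
    levels.append(pyr_down(levels[-1]))
print([im.shape for im in levels])  # (384, 576), (192, 288), (96, 144), (48, 72)
```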
The pyramidal implementation is shown in Figure 1 for two images $F$ and $G$ that are taken at times $t$ and $t + \delta t$ respectively. The optical flow is computed at the top of the pyramid, level 4 (the coarsest image) in Figure 1, which is the initial condition for the computation of the optical flow at level 3 of the pyramid. It follows from (8) that (5) yields good estimates of $u$ and $v$ if the translation between the images at times $t$ and $t + \delta t$ is small. Only the increments between successive levels of the pyramid are therefore calculated, after warping the two images based on the flow field estimate from the coarse level. This operation yields two matrices, one for each component of the optical flow, and each matrix is upsampled to yield matrices of twice the number of rows and columns, and then passed through a modified form of the filter (9),
$$\frac{1}{8} \begin{bmatrix} 1 \\ 4 \\ 6 \\ 4 \\ 1 \end{bmatrix} \frac{1}{8} \begin{bmatrix} 1 & 4 & 6 & 4 & 1 \end{bmatrix}. \qquad (10)$$
This process is continued to the base of the pyramid, level 1, and the optical flow at this level is the desired optical flow.
The difference in the scale factors for the filters for downsampling the images and upsampling the optical flow field arises because all the pixels in the images that are downsampled are included in the computation, but rows and columns of zeros are added to the matrices that store the components $u$ and $v$ before they are upsampled, such that a matrix of order $M \times N$ is increased to a matrix of order $2M \times 2N$. The computation of all the pixel values in these larger matrices cannot include the added rows and columns of zeros, and thus only one quarter of the pixels in the matrices of order $2M \times 2N$ are included in the computation.
The matrices of the components of the optical flow must be upsampled by the filter (10) and then multiplied by two in order to preserve the values of its components between successive levels of the pyramid. In particular, two adjacent pixels have unit separation before upsampling, but their separation increases to two after upsampling. The preservation of the values of the components of the optical flow in the upsampled images therefore requires that they be multiplied by two.
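A sketch of this upsampling step is shown below: zeros are interleaved between the samples of one flow component, the result is smoothed with the kernel (10), and the values are doubled because the pixel separation doubles between levels. The border handling and the constant test field are illustrative assumptions.

```python
import numpy as np

K8 = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 8.0    # 1-D kernel of the filter (10)

def upsample_flow(component):
    """Upsample one flow component (u or v) from an M x N grid to 2M x 2N.

    Zeros are interleaved between samples, the result is smoothed with the
    separable kernel (10), and the values are doubled to preserve the flow
    when the pixel separation doubles between pyramid levels."""
    M, N = component.shape
    up = np.zeros((2 * M, 2 * N))
    up[::2, ::2] = component          # only a quarter of the entries are non-zero
    def filt(arr, axis):
        pad = [(0, 0), (0, 0)]
        pad[axis] = (2, 2)
        padded = np.pad(arr, pad, mode="symmetric")
        return np.apply_along_axis(lambda r: np.convolve(r, K8, mode="valid"), axis, padded)
    return 2.0 * filt(filt(up, 0), 1)

u_coarse = np.full((96, 144), 0.5)
print(upsample_flow(u_coarse).shape)   # (192, 288)
```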
The discussion above shows that the pyramidal form of the LK method requires that the LS problem be solved many times at each level of the pyramid, and thus the error in its solution must be considered. This issue is addressed in Section 3, and it is shown in Section 4 that this error is closely related to the value of the angle $\theta$ between $b$ and its component that lies in the column space $R(A)$ of $A$. The computed optical flow must be numerically stable and it is therefore necessary to consider its numerical condition. This issue is addressed in Section 5, and linear and non-linear condition numbers are considered.

3. The Error in the Solution of the LS Problem

This section considers the error $e$ in the solution of the LS problem (7),
$$e = \frac{\|b - A x_0\|}{\|b\|} = \frac{\left\|\left(I - A A^\dagger\right) b\right\|}{\|b\|} = \frac{\left\|\left(I - \Sigma \Sigma^\dagger\right) c\right\|}{\|c\|}, \qquad (11)$$
where the singular value decomposition of $A$ is $U \Sigma V^T$, $U$ and $V$ are orthogonal matrices of orders $m$ and $n$ respectively,
$$\Sigma = \begin{bmatrix} \Sigma_1 \\ 0 \end{bmatrix} \in \mathbb{R}^{m \times n}, \qquad \Sigma_1 = \operatorname{diag}\left(\sigma_1, \sigma_2, \ldots, \sigma_n\right),$$
the singular values $\sigma_i$ satisfy $\sigma_i \ge \sigma_{i+1}$, $i = 1, \ldots, n - 1$, and
$$c = U^T b. \qquad (12)$$
The matrix $U$ is partitioned into two matrices $U_1$ and $U_2$,
$$U = \begin{bmatrix} U_1 & U_2 \end{bmatrix}, \qquad U_1 \in \mathbb{R}^{m \times n}, \qquad U_2 \in \mathbb{R}^{m \times (m-n)},$$
and it follows from the orthogonal property of U that
$$U_1^T U_1 = I_n, \qquad U_2^T U_2 = I_{m-n}, \qquad U_1^T U_2 = 0_{n, m-n}, \qquad U_2^T U_1 = 0_{m-n, n}, \qquad (13)$$
and
$$U_1 U_1^T + U_2 U_2^T = I_m. \qquad (14)$$
It follows from (12) that c can be written as
$$c = \begin{bmatrix} \bar{c}_1 \\ \bar{c}_2 \end{bmatrix} = \begin{bmatrix} U_1^T \\ U_2^T \end{bmatrix} b, \qquad \bar{c}_1 \in \mathbb{R}^n, \qquad \bar{c}_2 \in \mathbb{R}^{m-n},$$
where $\bar{c}_1$ and $\bar{c}_2$ are the components of $b$ that lie in, respectively, the spaces spanned by the columns of $U_1$ and $U_2$. It follows from (11) that
$$e^2 = \frac{\sum_{i=n+1}^{m} c_i^2}{\sum_{i=1}^{m} c_i^2} = \frac{\|\bar{c}_2\|^2}{\|c\|^2} = \frac{\|\bar{c}_2\|^2}{\|\bar{c}_1\|^2 + \|\bar{c}_2\|^2} = 1 - \cos^2\theta, \qquad (15)$$
where
$$\cos\theta = \frac{\|\bar{c}_1\|}{\|c\|}, \qquad (16)$$
and thus the error in $x_0$ is a measure of the proportion of $b$ that lies in the space that is orthogonal to $R(A)$, and a large value of $\theta$ leads to a large error in $x_0$. The significance of $\theta$ is considered in Section 4 and it is shown that it allows a geometric interpretation of the conditions that define the minimum and maximum values of $e$.
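A minimal numerical sketch of the quantities in (11)–(16) is given below; it checks the relation $e^2 = 1 - \cos^2\theta$ for a synthetic window, and the data are illustrative rather than taken from the examples of Section 6.

```python
import numpy as np

def error_and_angle(A, b):
    """Compute the relative LS residual e of (11) and cos(theta) of (16)
    from the SVD of A, with c = U^T b partitioned as in the text."""
    m, n = A.shape
    U, S, Vt = np.linalg.svd(A)          # full SVD: U is m x m
    c = U.T @ b
    c1, c2 = c[:n], c[n:]                # components in R(A) and its orthogonal complement
    cos_theta = np.linalg.norm(c1) / np.linalg.norm(c)
    e = np.linalg.norm(c2) / np.linalg.norm(c)
    return e, cos_theta

rng = np.random.default_rng(4)
A = rng.standard_normal((49, 2))
b = rng.standard_normal(49)
e, ct = error_and_angle(A, b)
print(np.isclose(e**2, 1.0 - ct**2))     # relation (15)
```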

4. The Angle between b and R ( A )

It is shown in this section that $\theta$, where $\cos\theta$ is defined in (16), is the angle between $b$ and its component that lies in the column space $R(A)$ of $A$. This interpretation follows from the decomposition of $b$ into a component $r_1 \in R(A)$ and a component $r_2 \in R(A)^\perp$,
$$b = r_1 + r_2.$$
Since $r_1 \in R(A)$ and $r_2 \in R(A)^\perp$, there exist vectors $t_1$ and $t_2$ such that
$$r_1 = U_1 t_1 \qquad \text{and} \qquad r_2 = U_2 t_2,$$
because the columns of $U_1$ and $U_2$ form orthonormal bases for $R(A)$ and $R(A)^\perp$, respectively, and thus
$$b = U_1 t_1 + U_2 t_2.$$
It follows from (13) that premultiplication of this equation by $U_1^T$ and then $U_2^T$ yields
$$b = U_1 U_1^T b + U_2 U_2^T b,$$
where the first and second terms on the right are, respectively, the orthogonal projections of $b$ onto $R(A)$ and $R(A)^\perp$. It follows from (13) that the second term is orthogonal to the first term, which confirms that $r_1$ is orthogonal to $r_2$. Also, it follows from (14) that
$$\left\|U_1 U_1^T b\right\|^2 + \left\|U_2 U_2^T b\right\|^2 = \|b\|^2,$$
where
$$\left\|U_1 U_1^T b\right\| = \left\|U_1^T b\right\| = \left\|\bar{c}_1\right\|,$$
and thus $\cos\theta$ is equal to the ratio of the magnitude of the orthogonal projection of $b$ onto $R(A)$ to the magnitude of $b$. Equation (15) shows that there is a simple relationship between the error in $x_0$ and the angle between $b$ and its component that lies in $R(A)$. The limiting cases are:
  • $b \in R(A)$: It follows that $\cos\theta = 1$ and $U_2^T b = 0$, and thus $e = 0$.
  • $b \in R(A)^\perp$: It follows that $\cos\theta = 0$ and $U_1^T b = 0$, and thus $e = 1$.
The ratio $\cos\theta$ provides a geometric interpretation of the error in the solution of the LS problem, and it is shown in Section 5 that $\cos\theta$ also arises in the expressions for the condition numbers of $x_0$.

5. The Numerical Condition of the Optical Flow

This section considers the numerical condition of the optical flow $x_0$, the solution of the LS problem. The condition number of $A$, $\kappa(A) = \sigma_1 / \sigma_2$, where $\sigma_1 > \sigma_2$, is frequently used as a measure of the stability of $x_0$, but it requires qualification:
  • The condition number $\kappa(A)$ is a function of $A$ and it is independent of $b$, but $x_0 = A^\dagger b$ is a function of $A$ and $b$. This independence of $b$ follows because $\kappa(A)$ is the upper bound of the ratio of the relative error in $x_0 = A^\dagger b$ to the relative error in $b$ with respect to all vectors $b \in R(A)$ and perturbations $\delta b \in \mathbb{R}^m$. The vector $b$ is specified in a given problem, and thus a better measure of the stability of $x_0$ is an upper bound of the ratio $\Delta$ of the relative error in $x_0$ to the relative error in $b$ with respect to all perturbations $\delta b$, for the given vector $b$. The condition number $\kappa(A)$ may therefore overestimate the true value of $\Delta$ by many orders of magnitude if $b \in R(A)$.
  • The property $b \in R(A)$ is not, in general, satisfied in the LK method, and thus a measure of the stability of $x_0$ with respect to perturbations in $b$ that is appropriate when $b \notin R(A)$ must be developed.
Although these points show that $\kappa(A)$ as a measure of the stability of $x_0$ has disadvantages, it has been used to assess the computational reliability of $x_0$ ([§II.A] [14]), ([§4.3.2] [16]), ([§2] [10]). These disadvantages of $\kappa(A)$ are overcome by the effective condition number of $x_0$, which is a function of $A$ and $b$, and therefore a more accurate measure of the stability of $x_0$. This condition number is considered in Section 5.1.
The effective condition number is a linear condition number because it is assumed that the entries of $A$ (the spatial derivatives of the intensity) are not subject to error, and that only the entries of $b$ (the temporal derivatives of the intensity) are subject to error. A more realistic condition number of the optical flow requires that errors in the spatial and temporal derivatives of the intensity be considered. This inclusion leads to a non-linear extension of the effective condition number, and it is considered in Section 5.2.

5.1. The Effective Condition Number

This section considers the effective condition number of the solution $x_0$ of the LS problem, which must be solved many times in the LK method. It is assumed for generality that the coefficient matrix $A$ is of order $m \times n$ where $m \ge n$, rather than $m \times 2$.
The relative errors of $x_0$ and $b$ are, respectively,
$$\Delta x_0 = \frac{\|\delta x_0\|}{\|x_0\|} \qquad \text{and} \qquad \Delta b = \frac{\|\delta b\|}{\|b\|}, \qquad (17)$$
and the effective condition number of $x_0$ is defined as the maximum value of the ratio of $\Delta x_0$ to $\Delta b$ for the given vector $b$, with respect to all perturbations $\delta b$ in $b$,
$$\eta(A, b) = \max_{\delta b \in \mathbb{R}^m} \frac{\Delta x_0}{\Delta b}.$$
Theorem 1.
The effective condition number $\eta(A, b)$ of $x_0$ is
$$\eta(A, b) = \frac{\left\|A^\dagger\right\| \|b\|}{\|x_0\|} = \frac{\left\|A^\dagger\right\| \|b\|}{\left\|A^\dagger b\right\|}, \qquad (18)$$
and it follows from (12) that
$$\eta(A, b) = \frac{\|c\|}{\sigma_n \left\|\Sigma^\dagger c\right\|}. \qquad (19)$$
Proof. 
It follows from (7) and (17) that
$$\|\delta x_0\| = \left\|A^\dagger \delta b\right\| \le \left\|A^\dagger\right\| \|\delta b\| = \left\|A^\dagger\right\| \|b\| \, \Delta b. \qquad (20)$$
The division of this inequality by $\|x_0\| = \left\|A^\dagger b\right\|$ yields (18), from which (19) follows. □
The condition that must be satisfied for $\kappa(A)$ to be a measure of the stability of $x_0$ follows from (20) because $b - A x_0 = 0$ if and only if $b \in R(A)$. This condition is necessarily satisfied by all vectors $b$ if $A$ is square, but by some vectors $b$, not all vectors $b$, if $m > n$. If $b \in R(A)$, then $b = A x_0$ and thus $\|b\| \le \|A\| \|x_0\|$, and (20) yields
$$\|\delta x_0\| \le \left\|A^\dagger\right\| \|b\| \, \Delta b \le \left\|A^\dagger\right\| \|A\| \|x_0\| \, \Delta b.$$
This leads to the definition of the condition number $\kappa(A)$ of $A \in \mathbb{R}^{m \times n}$, $m \ge n$, as the upper bound of the ratio of the relative error of $x_0$ to the relative error of $b$ if and only if $b \in R(A)$,
$$\kappa(A) = \max_{\delta b \in \mathbb{R}^m, \; b \in R(A)} \frac{\Delta x_0}{\Delta b} = \left\|A^\dagger\right\| \|A\| = \frac{\sigma_1}{\sigma_n}.$$
Furthermore, it follows from (12) and (19) that
$$\max_{b \in R(A)} \eta(A, b) = \max_{b \in R(A)} \frac{\|c\|}{\sigma_n \left\|\Sigma^\dagger c\right\|} = \frac{\sigma_1}{\sigma_n},$$
and thus $\kappa(A)$ is the maximum value of $\eta(A, b)$ with respect to all vectors $b \in R(A)$.
The superiority of the effective condition number $\eta(A, b)$ with respect to the condition number $\kappa(A)$ as a measure of the stability of $x_0$ follows because $\eta(A, b)$ is a function of $A$ and $b$, but $\kappa(A)$ is a function of $A$ only. The example of regression in ([§3] [23]) shows the difference between these condition numbers. The dependence of $\eta(A, b)$ on $A$ and $b$ is clearly advantageous, but $\eta(A, b)$ must be used with care because it follows from (18) that the denominator contains the term $\|x_0\|$. It therefore follows that $\eta(A, b)$ is ill conditioned if $x_0$ is ill conditioned, and it is well conditioned if $x_0$ is well conditioned. It is shown in ([§4] [24]) that $\eta(A, b)$, and therefore $x_0$, are ill conditioned if the discrete Picard condition is satisfied ([§4.5] [25]). This condition states that if the constants $c_i$ decay to zero faster than the singular values $\sigma_i$ decay to zero, that is,
$$\frac{c_i}{\sigma_i} \to 0 \qquad \text{as} \qquad i \to n, \qquad (21)$$
then $x_0$ is ill conditioned. This condition can be derived from $x_0$,
$$x_0 = A^\dagger b = \sum_{i=1}^{n} \frac{c_i}{\sigma_i} v_i, \qquad (22)$$
where $v_i$ is the $i$th column of $V$. If $b$ is perturbed to $b + \delta b$, then $x_0$ is perturbed to
$$x_0 + \delta x_0 = \sum_{i=1}^{n} \frac{c_i + \delta c_i}{\sigma_i} v_i, \qquad (23)$$
and there exists a constant $s$ such that the perturbations $|\delta c_i| \approx \epsilon$, $i = 1, \ldots, n$, satisfy
$$|c_i| > \epsilon, \quad i = 1, \ldots, s, \qquad |c_i| < \epsilon, \quad i = s + 1, \ldots, n.$$
These inequalities follow because the perturbations $\delta c_i$ are approximately constant but the coefficients $c_i$ decay to zero because the discrete Picard condition (21) is satisfied, and thus from (23),
$$x_0 + \delta x_0 = \sum_{i=1}^{n} \frac{c_i + \delta c_i}{\sigma_i} v_i \approx \sum_{i=1}^{s} \frac{c_i}{\sigma_i} v_i + \sum_{i=s+1}^{n} \frac{\delta c_i}{\sigma_i} v_i.$$
Since the discrete Picard condition is satisfied, $x_0$ can be approximated with a small error by only considering the first $s$ singular values in (22),
$$x_0 = A^\dagger b \approx \sum_{i=1}^{s} \frac{c_i}{\sigma_i} v_i,$$
and thus
$$\delta x_0 \approx \sum_{i=s+1}^{n} \frac{\delta c_i}{\sigma_i} v_i.$$
It follows that
$$\|\delta x_0\| \approx \epsilon \left( \sum_{i=s+1}^{n} \frac{1}{\sigma_i^2} \right)^{\frac{1}{2}} \ge \frac{\epsilon}{\sigma_n},$$
and thus the norm of the perturbation $\delta x_0$ is approximately proportional to the magnitude of the noise and inversely proportional to the smallest singular value of $A$, from which it follows that $x_0$ is ill conditioned if the discrete Picard condition (21) is satisfied. It is seen that $x_0$ is ill conditioned if it is dominated by the large singular values of $A$, in which case $\delta x_0$ is dominated by the small singular values of $A$.
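A small numerical experiment makes this amplification concrete. The construction below (geometrically decaying singular values and right-hand side coefficients that decay faster) is an illustrative assumption used only to satisfy the discrete Picard condition (21); it is not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 50, 10

# Construct A = U Sigma V^T with singular values decaying geometrically, and a
# right-hand side whose coefficients c_i decay faster than sigma_i, so that the
# discrete Picard condition (21) holds.
U, _ = np.linalg.qr(rng.standard_normal((m, m)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = 0.5 ** np.arange(n)                    # sigma_i = 2^{-(i-1)}
c = np.zeros(m)
c[:n] = 0.25 ** np.arange(n)                   # c_i / sigma_i -> 0 as i -> n
A = U[:, :n] @ np.diag(sigma) @ V.T
b = U @ c

x0 = np.linalg.lstsq(A, b, rcond=None)[0]
db = 1e-6 * rng.standard_normal(m)             # perturbation of size epsilon ~ 1e-6
dx = np.linalg.lstsq(A, b + db, rcond=None)[0] - x0

# The relative error in x0 is far larger than the relative error in b, as
# predicted by the bound ||dx0|| ~ epsilon / sigma_n.
print(np.linalg.norm(dx) / np.linalg.norm(x0),
      np.linalg.norm(db) / np.linalg.norm(b))
```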
The satisfaction of the discrete Picard condition is necessary for the application of Tikhonov regularisation because it guarantees that the regularisation error is small and the regularised form of $x_0$ is numerically stable ([§5] [23]). Tikhonov regularisation is used for the removal of blur from images because experiments have shown that many images satisfy the discrete Picard condition [26]. This condition forms the prior information that guarantees that Tikhonov regularisation is effective for image deblurring, and more generally, the ill conditioned property of $\eta(A, b)$ limits its practical use if prior information is not available. The condition number $\eta(A, b)$ is, however, an important condition measure because, as shown above, it defines the conditions between $A$ and $b$ for which $x_0$ is well conditioned, and the conditions for which $x_0$ is ill conditioned.
It has been shown that $\eta(A, b)$ is ill conditioned if $x_0$ is ill conditioned, but its upper bound is numerically stable, and an expression for this bound is developed in Theorem 2. This bound includes the term $\cos\theta$ that is defined in (16), and thus the error in the solution $x_0$ of the LS problem is related to the numerical condition of $x_0$.
Theorem 2.
If $A \in \mathbb{R}^{m \times n}$, $m \ge n$, then the effective condition number (19) of the solution $x_0$ of the LS problem is
$$\eta(A, b) = \frac{1}{\sigma_n} \left( \frac{\sum_{i=1}^{m} c_i^2}{\sum_{i=1}^{n} \left( \frac{c_i}{\sigma_i} \right)^2} \right)^{\frac{1}{2}} = \frac{\sigma_1}{\sigma_n} \left( \frac{\sum_{i=1}^{m} c_i^2}{\sum_{i=1}^{n} \left( \frac{\sigma_1}{\sigma_i} \right)^2 c_i^2} \right)^{\frac{1}{2}},$$
where $c = \left\{ c_i \right\}_{i=1}^{m}$ is defined in (12), and thus $\eta(A, b)$ satisfies
$$\eta(A, b) \le \begin{cases} \kappa(A) & \text{if } m = n, \\[4pt] \dfrac{\kappa(A)}{\cos\theta} & \text{if } m > n. \end{cases} \qquad (24)$$
Proof. 
If $m = n$, (19) yields
$$\eta(A, b) = \frac{\sigma_1}{\sigma_n} \left( \frac{\sum_{i=1}^{n} c_i^2}{\sum_{i=1}^{n} \left( \frac{\sigma_1}{\sigma_i} \right)^2 c_i^2} \right)^{\frac{1}{2}} \le \frac{\sigma_1}{\sigma_n} = \kappa(A),$$
and the result (24) for m = n follows.
Consider now the situation $m > n$,
$$\eta(A, b) = \frac{\sigma_1}{\sigma_n} \left( \frac{\sum_{i=1}^{m} c_i^2}{\sum_{i=1}^{n} \left( \frac{\sigma_1}{\sigma_i} \right)^2 c_i^2} \right)^{\frac{1}{2}} \le \frac{\sigma_1}{\sigma_n} \left( \frac{\sum_{i=1}^{m} c_i^2}{\sum_{i=1}^{n} c_i^2} \right)^{\frac{1}{2}} = \kappa(A) \, \frac{\|c\|}{\left\|\bar{c}_1\right\|},$$
and the result (24) for m > n follows from (16). □
Equation (24) shows that the condition number of $x_0$ increases rapidly as $\cos\theta \to 0$. For example, if a window of order $7 \times 7$ is used, then $A \in \mathbb{R}^{49 \times 2}$ and the error in $x_0$ is equal to zero if and only if $b$ lies in the two-dimensional subspace $R(A)$ of $\mathbb{R}^{49}$ spanned by the columns of $A$, and the error in $x_0$ is due to the component of $b$ that lies in the space that is orthogonal to $R(A)$. The condition number of $x_0$ is large if $1 / \cos\theta \gg 1$, even if $\sigma_1 / \sigma_2 = O(1)$.
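The sketch below evaluates $\kappa(A)$, $\eta(A, b)$ of (19) and the bound $\kappa(A)/\cos\theta$ of (24) for a synthetic window in which most of $b$ is removed from $R(A)$; the construction of $b$ is an illustrative assumption that mimics the situation just described.

```python
import numpy as np

def condition_numbers(A, b):
    """kappa(A), the effective condition number eta(A,b) of (19), and the
    upper bound kappa(A)/cos(theta) of (24)."""
    m, n = A.shape
    U, S, Vt = np.linalg.svd(A)
    c = U.T @ b
    kappa = S[0] / S[-1]
    eta = np.linalg.norm(c) / (S[-1] * np.linalg.norm(c[:n] / S))
    cos_theta = np.linalg.norm(c[:n]) / np.linalg.norm(c)
    return kappa, eta, kappa / cos_theta

rng = np.random.default_rng(6)
A = rng.standard_normal((49, 2))
# Make b nearly orthogonal to R(A): x0 is then ill conditioned even though
# kappa(A) is modest, as discussed after Theorem 2.
Q, _ = np.linalg.qr(A)
b = rng.standard_normal(49)
b = b - 0.999 * Q @ (Q.T @ b)
kappa, eta, bound = condition_numbers(A, b)
print(kappa, eta, bound, eta <= bound + 1e-9)
```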
Example 1.
Consider the matrix $A$ and vector $b$,
$$A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ -1 & -1 \end{bmatrix} \qquad \text{and} \qquad b = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}.$$
The singular values of $A$ are $\sigma_1 = \sqrt{3}$ and $\sigma_2 = 1$, and thus $\kappa(A) = \sqrt{3}$. It may therefore be thought that the solution $x_0$ of the LS problem is well conditioned, but this is incorrect because $b$ is orthogonal to $R(A)$,
$$b^T A = \begin{bmatrix} 0 & 0 \end{bmatrix}.$$
It follows that $\cos\theta = 0$ and $x_0 = \begin{bmatrix} 0 & 0 \end{bmatrix}^T$.
The left singular matrix $U$ of $A = U \Sigma V^T$, and the matrices $U_1$ and $U_2$, are
$$U = \begin{bmatrix} \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{6}} & -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} \\ -\frac{2}{\sqrt{6}} & 0 & \frac{1}{\sqrt{3}} \end{bmatrix}, \qquad U_1 = \begin{bmatrix} \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{6}} & -\frac{1}{\sqrt{2}} \\ -\frac{2}{\sqrt{6}} & 0 \end{bmatrix} \qquad \text{and} \qquad U_2 = \begin{bmatrix} \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \end{bmatrix},$$
and thus
$$U_1^T b = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \qquad \text{and} \qquad U_1 U_1^T b = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix},$$
which confirm that $b$ is orthogonal to $R(A)$.
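A numerical check of Example 1 is given below. The sign pattern of $A$ is the one used above (the extraction of the original lost the minus signs; the signs shown are those that make $b$ orthogonal to $R(A)$ as the example states), and the check confirms the singular values, the orthogonality $b^T A = 0$ and the solution $x_0 = 0$.

```python
import numpy as np

# Numerical check of Example 1 with the sign pattern used above.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [-1.0, -1.0]])
b = np.array([1.0, 1.0, 1.0])

U, S, Vt = np.linalg.svd(A)
print(S)                                     # approximately [sqrt(3), 1]
print(b @ A)                                 # [0, 0]: b is orthogonal to R(A)
print(np.linalg.lstsq(A, b, rcond=None)[0])  # x0 = [0, 0]
print(np.linalg.norm(U[:, :2].T @ b))        # ||U1^T b|| = 0, so cos(theta) = 0
```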

5.2. A Non-Linear Condition Number

This section extends the linear effective condition number considered in Section 5.1 by including perturbations in the coefficient matrix $A$, such that the perturbed solution of the LS problem is
$$x_0 + \delta x_0 = \left(A + \delta A\right)^\dagger (b + \delta b). \qquad (25)$$
An expression for the condition number of $x_0$ when $A$ and $b$ are perturbed is developed in Theorem 3.
Theorem 3.
The condition number of $x_0$ when $A$ and $b$ are perturbed by, respectively, $\delta A$ and $\delta b$ such that
$$\|\delta b\| \le \epsilon \|b\| \qquad \text{and} \qquad \|\delta A\| \le \epsilon \|A\|, \qquad (26)$$
is
$$\rho(A, b) = \max_{\|\delta b\| \le \epsilon \|b\|, \; \|\delta A\| \le \epsilon \|A\|} \frac{1}{\epsilon} \frac{\|\delta x_0\|}{\|x_0\|} = \frac{\left\|A^\dagger\right\| \|b\| + \|A\| \left\|\left(A^T A\right)^{-1}\right\| \left\|\left(I - A A^\dagger\right) b\right\|}{\|x_0\|} + \left\|A^\dagger\right\| \|A\| = \eta(A, b) + \kappa(A) + \frac{\kappa(A) \left\|\left(I - A A^\dagger\right) b\right\|}{\sigma_n \|x_0\|}. \qquad (27)$$
Proof. 
The pseudo-inverse of $A + \delta A$ is
$$\left(A + \delta A\right)^\dagger = \left(\left(A + \delta A\right)^T \left(A + \delta A\right)\right)^{-1} \left(A + \delta A\right)^T,$$
where, to first order,
$$\left(\left(A + \delta A\right)^T \left(A + \delta A\right)\right)^{-1} = \left(I + A^\dagger \delta A + \left(A^T A\right)^{-1} \delta A^T A\right)^{-1} \left(A^T A\right)^{-1} = \left(I - A^\dagger \delta A - \left(A^T A\right)^{-1} \delta A^T A\right) \left(A^T A\right)^{-1},$$
and thus
$$\left(A + \delta A\right)^\dagger = \left(I - A^\dagger \delta A - \left(A^T A\right)^{-1} \delta A^T A\right) \left(A^T A\right)^{-1} \left(A + \delta A\right)^T = A^\dagger + \left(A^T A\right)^{-1} \delta A^T - A^\dagger \delta A \, A^\dagger - \left(A^T A\right)^{-1} \delta A^T A A^\dagger,$$
to first order. It therefore follows from (25) that
$$x_0 + \delta x_0 = \left(A^\dagger + \left(A^T A\right)^{-1} \delta A^T - A^\dagger \delta A \, A^\dagger - \left(A^T A\right)^{-1} \delta A^T A A^\dagger\right) (b + \delta b),$$
and thus
$$\delta x_0 = A^\dagger \delta b - A^\dagger \delta A \, x_0 + \left(A^T A\right)^{-1} \delta A^T \left(I - A A^\dagger\right) b.$$
Hence,
$$\|\delta x_0\| \le \left\|A^\dagger\right\| \|\delta b\| + \left\|A^\dagger\right\| \|\delta A\| \|x_0\| + \left\|\left(A^T A\right)^{-1}\right\| \|\delta A\| \left\|\left(I - A A^\dagger\right) b\right\|,$$
and the substitution of (18) and (26) into this expression yields (27). □
An upper bound for the last term in (27) allows $\rho(A, b)$ to be expressed in terms of $\eta(A, b)$ and $\kappa(A)$. In particular,
$$\left\|\left(I - A A^\dagger\right) b\right\| = \left\| \begin{bmatrix} 0 & 0 \\ 0 & I_{m-n} \end{bmatrix} U^T b \right\| \le \|b\|,$$
and thus (27) becomes
$$\rho(A, b) = \eta(A, b) + \kappa(A) + \frac{\kappa(A) \left\|\left(I - A A^\dagger\right) b\right\|}{\sigma_n \|x_0\|} \le \eta(A, b) + \kappa(A) + \frac{\kappa(A) \|b\|}{\sigma_n \left\|A^\dagger b\right\|} = \eta(A, b) + \kappa(A) + \kappa(A) \, \eta(A, b), \qquad (28)$$
and thus the inclusion of errors in the spatial derivatives of the intensity is significant because of the term $\kappa(A) \, \eta(A, b)$.
It was stated in Section 5.1 that the effective condition number $\eta(A, b)$ is ill conditioned if the discrete Picard condition is satisfied, in which case (27) and its upper bound (28) cannot be computed reliably. This problem is addressed by using the upper bound (24) for the value of $\eta(A, b)$ in (28). The condition number (27) is, however, useful because it shows that the inclusion of perturbations in the spatial derivatives of the intensity causes a significant increase in the condition number of the optical flow.
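A sketch of how $\rho(A, b)$ in (27) and its upper bound (28) could be evaluated from the SVD of $A$ is given below; the synthetic window data are an illustrative assumption.

```python
import numpy as np

def rho(A, b):
    """The non-linear condition number of (27) and its upper bound (28),
    eta + kappa + kappa*eta, evaluated directly from the SVD of A."""
    m, n = A.shape
    U, S, Vt = np.linalg.svd(A)
    c = U.T @ b
    x0 = Vt.T @ (c[:n] / S)                      # x0 = A^+ b
    kappa = S[0] / S[-1]
    eta = np.linalg.norm(c) / (S[-1] * np.linalg.norm(c[:n] / S))
    residual = np.linalg.norm(c[n:])             # ||(I - A A^+) b||
    rho_val = eta + kappa + kappa * residual / (S[-1] * np.linalg.norm(x0))
    return rho_val, eta + kappa + kappa * eta    # value and upper bound (28)

rng = np.random.default_rng(7)
A = rng.standard_normal((49, 2))
b = rng.standard_normal(49)
print(rho(A, b))
```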

6. Examples

This section contains two examples that show the errors and condition numbers of the optical flow at each level of the pyramid. The Middlebury data set [1] is used in Examples 2 and 3, and the following numerical measures are considered:
  • The condition number $\kappa(A)$, the upper bound of the effective condition number, $\kappa(A) / \cos\theta$, and the non-linear condition number $\rho(A, b)$.
  • The angle $\theta$ between $b$ and its component that lies in $R(A)$.
  • The error $e$ in the solution $x_0$ of the LS problem.
The examples use the images shown in Figure 2 [1].
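A sketch of how these measures could be gathered for every window of one pyramid level is shown below. The synthetic images, the non-overlapping window layout and the handling of degenerate (flat) windows are illustrative assumptions; in the experiments reported here, the Middlebury frames at each pyramid level would take the place of the synthetic data.

```python
import numpy as np

def window_measures(I1, I2, w):
    """Compute log10 kappa(A), log10(kappa/cos theta), theta (degrees) and
    log10 e for every w-by-w window of one pyramid level; a sketch that
    reuses the quantities defined in Sections 3-5."""
    Iy, Ix = np.gradient(I1)
    It = I2 - I1
    stats = []
    for top in range(0, I1.shape[0] - w + 1, w):
        for left in range(0, I1.shape[1] - w + 1, w):
            sl = (slice(top, top + w), slice(left, left + w))
            A = np.column_stack([Ix[sl].ravel(), Iy[sl].ravel()])
            b = -It[sl].ravel()
            U, S, Vt = np.linalg.svd(A)
            c = U.T @ b
            if S[-1] < 1e-12 or np.linalg.norm(c) < 1e-12:
                continue                          # skip degenerate (flat) windows
            cos_t = np.linalg.norm(c[:2]) / np.linalg.norm(c)
            e = np.linalg.norm(c[2:]) / np.linalg.norm(c)
            stats.append((np.log10(S[0] / S[-1]),
                          np.log10(S[0] / S[-1] / max(cos_t, 1e-12)),
                          np.degrees(np.arccos(min(cos_t, 1.0))),
                          np.log10(max(e, 1e-300))))
    return np.array(stats)

rng = np.random.default_rng(8)
I1 = rng.random((96, 144))
I2 = np.roll(I1, 1, axis=1)
print(window_measures(I1, I2, w=15).mean(axis=0))   # means of the four measures
```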
Example 2.
The five measures listed above (the three condition numbers, the angle $\theta$ and the error $e$ in $x_0$) were computed in each window for windows of side lengths $2s + 1$, $s = 2, 3, \ldots, 9$, at each level of a pyramid that has four levels, for the images in Figure 2. The mean and standard deviation of $\log_{10} \kappa(A)$, the upper bound $\log_{10} \left(\kappa(A) / \cos\theta\right)$ of the effective condition number, and the non-linear condition number $\log_{10} \rho(A, b)$, at each level of the pyramid and for each window size, are shown in Figure 3. The figures in the left column show that, as expected, the mean of $\rho(A, b)$ at each of the 32 points in the grid is larger than the corresponding values of $\kappa(A)$ and $\kappa(A) / \cos\theta$. Also, the means of $\rho(A, b)$ and $\kappa(A) / \cos\theta$ increase, but not monotonically, as the pyramid is descended from the coarsest image (level 4) to the finest image (level 1), and they decrease as the window size increases, at each level of the pyramid. The variation of the mean of each condition number, Figure 3 (left), is similar to the variation of its standard deviation, Figure 3 (right).
The mean and standard deviation of the angle $\theta$, where $\cos\theta$ is defined in (16), and the error $\log_{10} e$, where $e$ is defined in (11), are shown in Figure 4. It is seen that the mean of $\theta$ is approximately constant for windows of side length 7 pixels or more at levels 2, 3 and 4 of the pyramid. Figure 4 (left) shows that the variations of the means of $\theta$ and $\log_{10} e$ are very similar, but Figure 4 (right) shows that, for each measure, the standard deviation is significant, which implies that there is considerable variation of each measure.
Example 3.
The LK method with a pyramid of three levels was applied to the images in Figure 2, with a window of size $15 \times 15$. The condition number $\log_{10} \kappa(A)$, the upper bound $\log_{10} \left(\kappa(A) / \cos\theta\right)$ of the effective condition number, the non-linear condition number $\log_{10} \rho(A, b)$, the angle $\theta$ and the error $\log_{10} e$ were computed for every LS problem in the windows for each level of the pyramid. Figure 5 shows the axes and size of the image at each level of the pyramid for Figure 6, Figure 7, Figure 8, Figure 9 and Figure 10. These figures show that the quantities above increase, and the standard deviation of each of these quantities increases, as the pyramid is descended. The figures are therefore consistent with Figure 3 and Figure 4, and it follows that the computational reliability of the optical flow decreases as the pyramid is descended.
Figure 9 shows that the values of the local maxima of the angle $\theta$ increase significantly as the pyramid is descended. In particular, there are many LS problems for which $\theta$ is about 80 degrees, from which it follows that the errors in the solutions of these problems are large. This result is consistent with the plots of the condition numbers in Figure 7 and Figure 8. Furthermore, it follows from (15) that large values of $\theta$ are associated with large errors $e$, which is confirmed in Figure 10.

7. Summary and Conclusions

The work in this paper considers the numerical properties of the solution of the LS problem because of their consequences for the computed optical flow. These consequences are compounded by the large number of LS problems that must be solved in the LK method, where the solutions of the LS problems at level $l$ of the pyramid form the data, after upsampling, for the LS problems at level $l - 1$ of the pyramid. It has been shown that the large values of the condition numbers $\eta(A, b)$ and $\rho(A, b)$, and error $e$, of the solution $x_0$ of the LS problem arise because a significant component of $b$ lies in the space that is orthogonal to $R(A)$, and thus the terms $1 / \cos\theta$ and $\cos\theta$ in, respectively, the expressions for the condition numbers and error are significant. It has been shown that the error and condition numbers of the LS problem increase as the pyramid is descended from the top level (coarsest image) to the bottom level (finest image), and thus the fidelity of the computed optical flow decreases as the pyramid is descended. There have been several studies that consider the errors associated with the LK method, but the new aspect of the work described in this paper is the extensive use of computational linear algebra, specifically, condition estimation and error analysis of the solution of the LS problem.
The large values of the condition numbers and errors follow from the assumption that the optical flow in each window at each level of the pyramid is constant, that is, the motion in each window can be represented by a translation. It follows that an improved form of the LK method requires that a richer class of motions be allowed in each window, and this is achieved by the introduction of warping functions in the LK method [17]. These warping functions are defined by parametric models, for example, affine and projective models, and the values of the parameters of these models are computed in the optimisation procedure in an extended form of the LK method.

Funding

This research received no external funding.

Data Availability Statement

The data used in this paper are taken from the Middlebury data set [1], which is in the public domain and frequently used in computer vision.

Conflicts of Interest

The author declares that he has no relationships or interests that have a direct or potential influence on, or impart bias to, the work.

References

  1. Baker, S.; Scharstein, D.; Lewis, J.; Roth, S.; Black, M.; Szeliski, R. A database and evaluation methodology for optical flow. Int. J. Computer Vision 2011, 92, 1–31. [Google Scholar] [CrossRef]
  2. Horn, B.; Schunck, B. Determining optical flow. Artif. Intell. 1981, 17, 185–203. [Google Scholar] [CrossRef]
  3. Lucas, B.D.; Kanade, T. An iterative image registration technique with an application to stereo vision. In Proceedings of the 7th International Joint Conference on Artificial Intelligence, Vancouver, BC, Canada, 24–28 August 1981; pp. 674–679. [Google Scholar]
  4. Fortun, D.; Bouthemy, P.; Kervrann, C. Optical flow modeling and computation: A survey. Comput. Vis. Image Underst. 2015, 134, 1–21. [Google Scholar] [CrossRef]
  5. Hofinger, M.; Bulò, S.; Porzi, L.; Knapitsch, A.; Pock, T.; Kontschieder, P. Improving optical flow on a pyramidal level. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 770–786. [Google Scholar]
  6. Shah, S.T.H.; Xuezhi, X. Traditional and modern strategies for optical flow: An investigation. SN Appl. Sci. 2021, 3. [Google Scholar] [CrossRef]
  7. Tu, Z.; Xie, W.; Zhang, D.; Poppe, R.; Veltkamp, R.; Li, B.; Yuan, J. A survey of variational and CNN-based optical flow techniques. Signal Process. Image Commun. 2019, 72, 9–24. [Google Scholar] [CrossRef]
  8. Zhai, M.; Xiang, X.; Lv, N.; Kong, X. Optical flow and scene flow estimation: A survey. Pattern Recognit. 2021, 114, 107861. [Google Scholar] [CrossRef]
  9. Savian, S.; Elahi, M.; Tillo, T. Optical flow estimation with deep learning: A survey on recent advances. In Deep Biometrics; Jiang, R., Li, C.T., Crookes, D., Meng, W., Rosenberger, C., Eds.; Springer: Berlin/Heidelberg, Germany, 2020; pp. 257–287. [Google Scholar]
  10. Lin, T.; Barron, J. Image reconstruction error for optical flow. Res. Comput. Robot. Vis. 1995, 269–290. [Google Scholar]
  11. Kearney, J.; Thompson, W.; Boley, D. Optical flow estimation: An error analysis of gradient-based methods with local optimization. IEEE Trans. Pattern Anal. Mach. Intell. 1987, 9, 229–244. [Google Scholar] [CrossRef]
  12. Golub, G.H.; Van Loan, C.F. Matrix Computations; Johns Hopkins University Press: Baltimore, MD, USA, 2013. [Google Scholar]
  13. Tsai, C.; Galatsanos, N.; Katsaggelos, A. Total least squares estimation of stereo optical flow. In Proceedings of the IEEE International Conference on Image Processing, Chicago, IL, USA, 4–7 October 1998; pp. 622–626. [Google Scholar]
  14. Kim, Y.; Kak, A. Error analysis of robust optical flow estimation by least median of squares methods for the varying illumination model. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28, 1418–1435. [Google Scholar] [CrossRef]
  15. Tarasenko, V.; Park, D. Detection and tracking over image pyramids using Lucas and Kanade algorithm. Int. J. Appl. Eng. Res. 2016, 11, 6117–6120. [Google Scholar]
  16. Klette, R. Concise Computer Vision: An Introduction into Theory and Applications; Springer: London, UK, 2014. [Google Scholar]
  17. Baker, S.; Matthews, I. Lucas-Kanade 20 years on: A unifying framework. Int. J. Computer Vision 2004, 56, 221–255. [Google Scholar] [CrossRef]
  18. Brandt, J.W. Analysis of bias in gradient-based optical flow estimation. In Proceedings of the 28th Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA, 31 October–2 November 1994; pp. 721–725. [Google Scholar]
  19. Pham, T.Q.; Bezuijen, M.; van Vliet, L.J.; Schutte, K.; Hendriks, C.L.L. Performance of optimal registration estimators. Proc. SPIE 2005, 5817, 133–144. [Google Scholar]
  20. Robinson, D.; Milanfar, P. Fundamental performance limits in image registration. IEEE Trans. Image Process. 2004, 13, 1185–1199. [Google Scholar] [CrossRef] [PubMed]
  21. Burt, P. Fast filter transforms for image processing. Comput. Graph. Image Process. 1981, 16, 20–51. [Google Scholar] [CrossRef]
  22. Burt, P.; Adelson, E. The Laplacian pyramid as a compact image code. IEEE Trans. Comm. 1983, 31, 532–540. [Google Scholar] [CrossRef]
  23. Winkler, J.R.; Mitrouli, M. Condition estimation for regression and feature estimation. J. Comput. Appl. Math. 2020, 373, 112212. [Google Scholar] [CrossRef]
  24. Winkler, J.R.; Mitrouli, M.; Koukouvinos, C. The application of regularisation to variable selection in statistical modelling. J. Comput. Appl. Math. 2022, 404, 113884. [Google Scholar] [CrossRef]
  25. Hansen, P.C. Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion; SIAM: Philadelphia, PA, USA, 1998. [Google Scholar]
  26. Hansen, P.C.; Nagy, J.G.; O’Leary, D.P. Deblurring Images: Matrices, Spectra, and Filtering; SIAM: Philadelphia, PA, USA, 2006. [Google Scholar]
Figure 1. The pyramid implementation of the Lucas-Kanade method.
Figure 2. The images used in Examples 2 and 3.
Figure 3. The variation of the mean (left) and standard deviation (right) of the condition number $\log_{10} \kappa(A)$, the upper bound $\log_{10} \left(\kappa(A) / \cos\theta\right)$ of the effective condition number, and the non-linear condition number $\log_{10} \rho(A, b)$ with the size of the window and number of levels in the pyramid, for Example 2.
Figure 4. The variation of the mean (left) and standard deviation (right) of the angle $\theta$ and the error $\log_{10} e$, with the size of the window and number of levels in the pyramid, for Example 2.
Figure 5. The sizes of the images at level 1 ($384 \times 576$), level 2 ($192 \times 288$) and level 3 ($96 \times 144$) in Figure 6, Figure 7, Figure 8, Figure 9 and Figure 10, for Example 3.
Figure 6. The condition number $\log_{10} \kappa(A)$ at level three (top), level two (middle) and level one (base) of the pyramid, for Example 3.
Figure 7. The upper bound $\log_{10} \left(\kappa(A) / \cos\theta\right)$ of the effective condition number at level three (top), level two (middle) and level one (base) of the pyramid, for Example 3.
Figure 8. The non-linear condition number $\log_{10} \rho(A, b)$ at level three (top), level two (middle) and level one (base) of the pyramid, for Example 3.
Figure 9. The angle $\theta$ at level three (top), level two (middle) and level one (base) of the pyramid, for Example 3.
Figure 10. The error $\log_{10} e$ at level three (top), level two (middle) and level one (base) of the pyramid, for Example 3.