Anisotropic variation formulas for imaging applications

Abstract: The discrete anisotropic variation, sometimes referred to as the anisotropic gradient, and its integral are important in a variety of image processing applications and set boundary measure computations. We provide a method for computing the weight factors for general anisotropic variation approximations of functions on R². The method is developed in the framework of regular arrays, but its applicability extends to arbitrary finite discrete sampling strategies. The mathematical model and computations use concepts from vector calculus and introductory linear algebra, so the discussion is accessible to upper-division undergraduate students.


Introduction
Given a bounded function f : Ω → R, Ω ⊆ R², the variation of f is given by

Var f(x, y) = lim sup_{(u,v)→(x,y)} |f(u, v) − f(x, y)| / ‖(u, v) − (x, y)‖.   (1.1)

In the case that f is differentiable, Var f(x, y) = |∇f(x, y)|. Measures of the total variation, and its generalization the total p-variation, in images are used to solve a variety of image and data processing tasks through the use of variational minimization methods [1,6].
One key application is noise removal [8,11]. The total variation is an approximate measure of the magnitude of the noise because noisy images have more pixel-to-pixel intensity variation. The left and right images in Figure 1 show an image and a noisy version of the same scene, respectively. The computed variation is about 18 times larger for the noisy image. A second application is edge detection [12]. The image variation is relatively large at image locations with sudden changes in intensity, which is characteristic of a (visual) edge. The left and right images in Figure 2 show an image of a pile of rocks and the corresponding variation image, respectively. The same method applied to characteristic functions (binary images) can be used to measure the length of a set boundary [7]. The left image of Figure 3 shows a binary image of the state of Idaho. The right image of Figure 3 shows the corresponding variation image, for which the sum of the pixel intensities is proportional to the boundary length. Other applications include the removal of blur artifacts due to imperfect focus of camera optics or long exposure of objects moving in a scene [4], inpainting [2,5], which seeks to recover missing or corrupted data, image segmentation [3], compressed sensing [10], and determining characteristic length scales [14,16].
An image is the discretization of a function f : Ω → R restricted to a finite regular grid. For example, a digital camera image is a collection of values {f_{i,j}} at (pixel) locations (i, j) on a rectangular grid. Figure 4 is an example of such an image. It has image intensities, shown as shades of gray, on a regular grid of square pixels. In displaying the image, each pixel value is assigned a grayscale color which covers its associated pixel. In this case, a completely black pixel corresponds to a pixel value of zero, while a completely white pixel corresponds to a value of 255. We say that this image was quantized to 8 bits per pixel. A regular grid may instead be composed of hexagonal pixels; one example is shown in Figure 5. This type of grid is not in widespread use because its geometry is less simple to specify and because digital displays overwhelmingly use rectangular coordinates.
We propose a method for computing variation on images and image-like data as the sum of weighted differences of pixel intensities. Weights are chosen optimally for computing the variation on arbitrary smooth functions and their discrete representations. We begin in Section 2 by defining the anisotropic variation on discrete images and introducing the displacement vector sets used to calculate the variation. In Section 3 we develop the theoretical aspects of the variation on discrete data such as images and define the anisotropic gradient function. This function is defined in terms of a geometric sampling strategy and optimal weight parameters. In Section 4 we provide details on the computation of the anisotropic gradient weights in the general scenario and address existence and uniqueness. In Section 5 we provide specific computational examples for determining optimal weights, including computations on rectangular grids, on hexagonal grids, and for sampling uniformly in angle without reference to a grid. Conclusions are provided in Section 6.

Computing anisotropic image variation
A central task of image processing is to approximate key continuous functions, such as Var f in Eq. (1.1), on the scene space using only the discrete image data. Recall that ∇f(x, y) is the vector whose components are the partial derivatives of f, each of which is defined by limits such as

f_x(x, y) = lim_{h→0} [f(x + h, y) − f(x, y)] / h.

In the case of discrete image data, we cannot take the limit h → 0 because the function is defined at only discrete locations. Thus, for our image analysis tasks, f need not be differentiable. Instead, our plan is to approximate Var f at each pixel location (x_i, y_j). That is, we are interested in a finite sampling strategy for approximating g_{i,j} ≡ |∇f(x_i, y_j)|.
It is natural to consider the direct computation

Var f(x_i, y_j) = sqrt( f_x(x_i, y_j)² + f_y(x_i, y_j)² ).

However, f is known only at the finite set of locations (x_i, y_j), and we do not know f_x nor f_y. We can approximate the variation at each array location, g_{i,j}, with the forward difference strategy

g_{i,j} = sqrt( ((f_{i+1,j} − f_{i,j})/h_x)² + ((f_{i,j+1} − f_{i,j})/h_y)² ),   (2.1)

where h_x and h_y are the horizontal and vertical distances between pixels. Eq. (2.1) is an example of an isotropic gradient approximation, which is exact in the following sense for differentiable functions: g_{i,j} → Var f(x_i, y_j) as h_x, h_y → 0. Throughout the remaining discussion, we will take h_x = h_y = 1 for simplicity. We consider anisotropic approximations for the variation having the following form [8]: given a set of nonzero displacement vectors V = {d_1, d_2, ..., d_m} and scalar weights {w_k},

g(x, y) = Σ_{k=1}^{m} w_k | f((x, y) + d_k) − f(x, y) |   (2.2)

is an anisotropic variation of f.
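The anisotropic variation just defined is straightforward to compute directly from pixel differences. Below is a minimal sketch, assuming NumPy, with illustrative conventions not fixed by the text: the array is indexed (i, j), a displacement (dx, dy) maps pixel (i, j) to (i + dx, j + dy), and displaced locations outside the array are skipped.

```python
import numpy as np

def anisotropic_variation(f, V, w):
    """Sum of weighted absolute pixel differences over displacement set V.

    f: 2-D float array; V: list of integer displacement vectors; w: weights.
    Boundary pixels whose displaced neighbors fall outside the image are
    simply skipped (an illustrative choice, not prescribed by the text).
    """
    f = np.asarray(f, dtype=float)
    rows, cols = f.shape
    g = np.zeros_like(f)
    for (dx, dy), wk in zip(V, w):
        for i in range(rows):
            for j in range(cols):
                ii, jj = i + dx, j + dy
                if 0 <= ii < rows and 0 <= jj < cols:
                    g[i, j] += wk * abs(f[ii, jj] - f[i, j])
    return g

# Forward-difference set with unit weights on the ramp f(i, j) = i:
f = np.add.outer(np.arange(4.0), np.zeros(4))
g = anisotropic_variation(f, [(1, 0), (0, 1)], [1.0, 1.0])
```

On the ramp, every pixel with an in-range neighbor in the i direction gets g = 1, matching |∇f| for a unit-slope plane.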
If a location (x, y) corresponds to pixel array indices (i, j) in an image and if the components of the displacement vectors are integers, then g(x, y) is determined by available pixel intensities. Several possible configurations of displacement vectors are shown in Figure 6. V_A is a two-vector forward difference set, yielding the approximation g_{i,j} = w_1 |f_{i+1,j} − f_{i,j}| + w_2 |f_{i,j+1} − f_{i,j}|. V_B specifies computations on the Von Neumann neighborhood of the four nearest neighbors, and V_E the Moore neighborhood of the eight nearest neighbors. V_C adds the diagonal displacement (−1, −1) to the forward difference set, leading to the approximation g_{i,j} = w_1 |f_{i+1,j} − f_{i,j}| + w_2 |f_{i,j+1} − f_{i,j}| + w_3 |f_{i−1,j−1} − f_{i,j}|. V_D is the set used by Condat [6] with unit weights, and V_F uses the 16 nearest neighbors, which has been successful in some recent computations [9]. These examples show that the anisotropic variation is not a unique function; it is defined by the chosen direction set V and scalar weights {w_k}. Notice also that, at a local extreme point of f, where ∇f = 0, the anisotropic variation is not necessarily equal to zero. This suggests that the anisotropic variation does not accurately approximate the steepness of functions around local extrema.

Anisotropic weights
The weight computations presented here are not specific to the rectangular grids illustrated in Figure 6. We also consider direction sets on hexagonal tilings, some of which are shown in Figure 7. However, any other set of nonzero directions in R² can be used. No standard method for determining weights seems to be in general use, though some authors reference weights used in similar computational tasks [15] or weights based on geometric considerations [13]. In this section, we discuss a criterion for choosing optimal weights. We also discuss assumptions about the functions for which we use the anisotropic variation.
The goal in choosing specific weights for the anisotropic variation is to minimize the discrepancy between g(x, y), Eq. (2.2), and Var f(x, y), Eq. (1.1), at pixel locations. No set of weights can be optimal for all functions, so weights should be chosen so that the approximation is good for typical image functions f(x, y). Thus, we must make assumptions about the types of functions we expect to encounter. Our first assumption is that the gradient of f(x, y) is slowly varying over the sampling distances ‖d_k‖. In this case, differencing strategies, such as Eq. (2.2), should produce good results because any variation calculation is only weakly affected by sampling distance.
To understand this, consider a one-dimensional example in which we seek to compute |df/dx| for a function f : R → R. We have the forward difference approximation

|df/dx| ≈ |f(x + h) − f(x)| / h.

Notice that if f is approximately affine on [x, x + h] with slope s, then |f(x + h) − f(x)| / h ≈ |s| = |df/dx|, independent of the sampling distance h. Our second assumption is that we have no prior knowledge concerning the direction of the gradient. If we had some such information, we could preferentially weight the anisotropic gradient terms associated with directions which we believe are close to the gradient direction. So, whatever weights we choose must be independent of actual gradient vector orientation, though they may depend on relative angles between sampling directions. We expect the use of our anisotropic variation to be robust for computing gradients of any orientation.
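The affine case can be checked in a few lines. This is a toy illustration; the function f and the sample points are arbitrary choices, not from the text.

```python
# Forward-difference magnitude |f(x+h) - f(x)| / h for an affine f:
# the estimate equals |slope| for every step size h.
def fwd_diff_mag(f, x, h):
    return abs(f(x + h) - f(x)) / h

f = lambda x: -3.0 * x + 2.0   # affine with slope s = -3 (arbitrary example)
vals = [fwd_diff_mag(f, 1.0, h) for h in (1.0, 0.5, 0.01)]
# every entry equals |s| = 3, independent of h
```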
With these two assumptions, we begin by considering a family of functions, F , whose graphs are planes and whose gradient has unit length. We seek optimal weights relative to this family. Functions in this family satisfy ∇ f θ = (cos θ, sin θ), where θ is a parameter identifying a family member. The functions are then given by f θ (x, y) = x cos θ + y sin θ, 0 ≤ θ < 2π.
Our objective is to approximate the variation Var f_θ with the anisotropic variation g_θ as closely as possible by determining optimal weights {w_k} for a given direction set V. To clarify notation, let w = [w_1, w_2, ..., w_m]^T and define g_w to be the anisotropic variation with chosen weights w_k, 1 ≤ k ≤ m. We measure optimality based on the squared error function. That is, for any given function f_θ ∈ F, we seek weights w_1, w_2, ..., w_m which minimize

E_θ(w) = (g_w − Var f_θ)² = ( Σ_{k=1}^m w_k |d_k · (cos θ, sin θ)| − 1 )².

However, the weights that minimize this squared error are optimal for only one such function f_θ ∈ F. We are interested in finding the best overall choice of weights, minimizing the total integrated squared error over all family members (over all possible θ):

H(w) = ∫_0^{2π} ( Σ_{k=1}^m w_k |d_k · (cos θ, sin θ)| − 1 )² dθ.   (3.1)

That is, we want to find a point w* = [w*_1, w*_2, ..., w*_m]^T whose coordinates are the weights so that H(w*) ≤ H(w) for all w ∈ R^m. We seek minimizers w* of H by finding stationary points, where ∇_w H(w) = 0. Here, we use the notation ∇_w to indicate that the derivatives are taken with respect to the variables w_1, w_2, ..., w_m. The j-th component of ∇_w H is

∂H/∂w_j = 2 ∫_0^{2π} ( Σ_{k=1}^m w_k |d_k · (cos θ, sin θ)| − 1 ) |d_j · (cos θ, sin θ)| dθ.

In order to simplify the representation and facilitate further computations, we define the integrals in each of these terms as

J(j) = ∫_0^{2π} |d_j · (cos θ, sin θ)| dθ,
K(k, j) = ∫_0^{2π} |d_k · (cos θ, sin θ)| |d_j · (cos θ, sin θ)| dθ.

Notice that the second term, involving K, can be written using a matrix product. Now, stationary points w of H satisfy Σ_{k=1}^m w_k K(k, j) = J(j) for 1 ≤ j ≤ m; that is, the matrix equation K w = J holds.
The integrals evaluate to

J(j) = 4 ‖d_j‖,
K(k, j) = ‖d_k‖ ‖d_j‖ [ (π − 2∆_kj) cos ∆_kj + 2 sin ∆_kj ],

where ∆_kj is the minimum positive angle between sampling directions d_j and d_k.
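These evaluations can be spot-checked by simple quadrature. The sketch below treats unit directions; the helper `quad` and the chosen angles are illustrative only.

```python
import math

def quad(fn, n=100000):
    # left-rectangle rule on [0, 2*pi]; ample accuracy for these integrands
    h = 2 * math.pi / n
    return sum(fn(i * h) for i in range(n)) * h

def J_num(phi):
    # J for a unit direction at angle phi
    return quad(lambda t: abs(math.cos(t - phi)))

def K_num(phi_k, phi_j):
    # K for unit directions at angles phi_k and phi_j
    return quad(lambda t: abs(math.cos(t - phi_k)) * abs(math.cos(t - phi_j)))

def K_closed(delta):
    # closed form in the minimum angle delta between the two directions
    return (math.pi - 2 * delta) * math.cos(delta) + 2 * math.sin(delta)

# directions at angles 0 and pi/3, minimum angle pi/3:
err = abs(K_num(0.0, math.pi / 3) - K_closed(math.pi / 3))
```

For unit directions, J evaluates to 4 regardless of orientation, and K depends only on the angle between the two sampling lines.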

Determining optimal weights
Collecting the integral computations, we find that the optimal weights w satisfy the following system of equations:

Σ_{k=1}^m w_k ‖d_k‖ ‖d_j‖ [ (π − 2∆_kj) cos ∆_kj + 2 sin ∆_kj ] = 4 ‖d_j‖,   1 ≤ j ≤ m.
We see that the entries of each row of the matrix K and the corresponding entry in J contain a common factor, ‖d_k‖ for row k. These common factors do not affect the solution and can be removed. Furthermore, the entries of each column of the matrix K contain a common factor, ‖d_j‖ for column j. Thus, each entry in the solution vector can be rescaled by the corresponding vector norm, z_k = ‖d_k‖ w_k. This leads to the somewhat simplified system of equations:

A z = B,   where A(k, j) = (π − 2∆_kj) cos ∆_kj + 2 sin ∆_kj and B(j) = 4.   (4.1)

Now we consider existence and uniqueness of solutions to Az = B. We first show that at least one solution exists. Next, we show that if some pair of direction vectors is collinear, then there exist infinitely many solutions. Finally, we choose, among possibly infinitely many solutions, the solution z̃ that has minimum 2-norm. Claim 1. The system of equations Az = B given in (4.1) has at least one solution.
Proof. Recall that solutions to (4.1) are stationary points of the function H given in Eq. (3.1). We will show that H has at least one stationary point. First note that H(w) ≥ 0 for all choices of weights w_1, w_2, ..., w_m. Now, let us use the integral computations above to rewrite H(w):

H(w) = w^T K w − 2 J^T w + 2π,

where J and K are defined above. Notice that H is a quadratic function of the weights w_1, w_2, ..., w_m.
Notice also that if K were negative definite or indefinite (possessing at least one negative eigenvalue), then the quadratic H would attain negative values for some w. So K must be positive definite or positive semi-definite. Thus, H is a convex quadratic function, though not necessarily strictly convex, and is bounded below by zero. Therefore H has at least one stationary point, and the system of equations (4.1) has at least one solution.
Another way to understand the result of the above proof is to note that a quadratic function which is bounded below (non-negative in this case) must be either convex quadratic or constant along any direction. If the function were to behave (nonconstant) linearly along some direction then it could not be bounded below.
Claim 2. Suppose there exist two distinct vectors d_k, d_j ∈ V which are collinear. Then the system of equations Az = B given in (4.1) has infinitely many solutions.
Proof. The entries in row k of matrix A are determined solely by the minimum angle between each vector in V and the line {v = αd_k | α ∈ R}. If d_j is collinear with d_k, then they define the same line, and rows j and k of A must be equal. Thus, rank(A) < m and, since a solution exists by the previous claim, Az = B has infinitely many solutions.
If the solution is unique, then we have z* = A⁻¹B. If the solution is not unique, we proceed by diagonalizing A and constructing a pseudo-inverse operator P such that z* = PB.
Since A is a real symmetric matrix, it is diagonalizable by an orthogonal matrix Q as A = QDQ^T, where D is a diagonal matrix consisting of the eigenvalues of A. We can order the eigenvalues λ_i, and the corresponding columns of Q, so that λ_1 ≥ λ_2 ≥ ... ≥ λ_m. Note also that λ_m ≥ 0 because of the convexity of H(w). Suppose there are κ positive (not zero) eigenvalues. Let Q̂ be the matrix consisting of the first κ columns of Q, and let D̂ be the κ × κ diagonal matrix with entries λ_1, λ_2, ..., λ_κ. Define the pseudo-inverse operator P = Q̂ D̂⁻¹ Q̂^T.

Claim 3. The vector z̃ = PB is a solution of (4.1).

Proof. Consider the unique representation z = z̃ + z_0 of a solution z, where z_0 is in N(A), the null space of A, and z̃ is in N(A)^⊥, the orthogonal complement of N(A). We show that PB = z̃; that is, P is the invertible operator that maps the range of A to N(A)^⊥. We write Q = [Q̂ Q_0] and note that the columns of Q_0 form a basis for N(A) and the columns of Q̂ form a basis for N(A)^⊥. Observe that, since A = Q̂ D̂ Q̂^T,

PA = Q̂ D̂⁻¹ Q̂^T Q̂ D̂ Q̂^T = Q̂ Q̂^T,

which is the orthogonal projection onto N(A)^⊥, so PB = PAz = Q̂ Q̂^T z = z̃. Thus, P returns the solution z̃: A z̃ = A z̃ + 0 = A z̃ + A z_0 = A z = B.
Next, we show that z̃ is the solution of minimum 2-norm. If z̃ is the unique solution, then it is trivially the solution of minimum norm. Otherwise, let ỹ be any other solution. Since Aỹ = B, we have ỹ − z̃ ∈ N(A). Using the standard inner product on R^m, and noting that any vector in N(A) is orthogonal to z̃, we have

⟨ỹ, ỹ⟩ = ⟨z̃ + (ỹ − z̃), z̃ + (ỹ − z̃)⟩ = ⟨z̃, z̃⟩ + ⟨ỹ − z̃, ỹ − z̃⟩,

or, equivalently, ‖ỹ‖² = ‖z̃‖² + ‖ỹ − z̃‖². Thus, ‖ỹ‖ ≥ ‖z̃‖.
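The construction of P can be sketched numerically. The following assumes NumPy, takes the matrix entries from the evaluated integrals (directions reduced to unit vectors, so each entry depends only on the minimum angle between sampling lines), and uses the Von Neumann set as a rank-deficient example; the helper names are illustrative.

```python
import math
import numpy as np

def entry(delta):
    # matrix entry (pi - 2*D) cos(D) + 2 sin(D) in the minimum angle D
    return (math.pi - 2 * delta) * math.cos(delta) + 2 * math.sin(delta)

def min_line_angle(p, q):
    # minimum positive angle between the lines spanned by directions p, q
    d = abs(p - q) % math.pi
    return min(d, math.pi - d)

# Von Neumann set: unit directions at 0, 90, 180, 270 degrees
angles = [0.0, math.pi / 2, math.pi, 3 * math.pi / 2]
A = np.array([[entry(min_line_angle(p, q)) for q in angles] for p in angles])
B = np.full(4, 4.0)

# Eigendecomposition A = Q D Q^T; keep the strictly positive eigenvalues
lam, Q = np.linalg.eigh(A)
pos = lam > 1e-10
P = Q[:, pos] @ np.diag(1.0 / lam[pos]) @ Q[:, pos].T  # P = Qhat Dhat^-1 Qhat^T
z = P @ B                                              # minimum 2-norm solution
```

Opposite directions make A rank-deficient here, yet P still returns the (equal-weight) minimum-norm solution 2/(π + 2) reported in Section 5.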

Example weight computations
We next present example computations illustrating the determination of weights for several scenarios from Figures 6 and 7. Table 1 provides weight computation results for each scenario. We end this section with the computation for uniformly distributed (in angle) unit vector directions.
Forward difference set, V_A

We have direction set V_A = {(1, 0), (0, 1)}, with ‖d_1‖ = ‖d_2‖ = 1 and ∆_12 = π/2. For this set, Eq. (4.1) takes the form

[ π  2 ] [z_1]   [4]
[ 2  π ] [z_2] = [4].

The matrix A is invertible. The solution is z_1 = z_2 = 4/(π + 2) ≈ 0.77797, and, since both directions are unit vectors, the weights are w_1 = w_2 = 4/(π + 2) ≈ 0.77797.

Non-collinear three-direction set, V_C
We have direction set V_C = {(1, 0), (0, 1), (−1, −1)}. We first compute the direction vector norms, ‖d_1‖ = ‖d_2‖ = 1 and ‖d_3‖ = √2, and the minimum angles, ∆_12 = π/2 and ∆_13 = ∆_23 = π/4, which determine the matrix elements A(k, j). For this set, Eq. (4.1) takes the form

[ π  2  a ] [z_1]   [4]
[ 2  π  a ] [z_2] = [4]
[ a  a  π ] [z_3]   [4],

where a = (π − π/2) cos(π/4) + 2 sin(π/4) = (√2/4)(π + 4) ≈ 2.52494. The matrix A is invertible. The solution is z = A⁻¹B ≈ [0.72501, 0.72501, 0.10784]^T. Thus, rescaling by the direction norms, w_1 = w_2 ≈ 0.72501 and w_3 = z_3/√2 ≈ 0.07625.
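This computation can be reproduced numerically. A sketch assuming NumPy, with illustrative helper functions built from the evaluated integrals:

```python
import math
import numpy as np

def entry(delta):
    # matrix entry (pi - 2*D) cos(D) + 2 sin(D) in the minimum angle D
    return (math.pi - 2 * delta) * math.cos(delta) + 2 * math.sin(delta)

def min_line_angle(p, q):
    # minimum positive angle between the lines spanned by directions p, q
    t = abs(math.atan2(p[1], p[0]) - math.atan2(q[1], q[0])) % math.pi
    return min(t, math.pi - t)

d = [(1.0, 0.0), (0.0, 1.0), (-1.0, -1.0)]          # the three directions
norms = np.array([math.hypot(x, y) for x, y in d])  # [1, 1, sqrt(2)]

A = np.array([[entry(min_line_angle(p, q)) for q in d] for p in d])
B = np.full(3, 4.0)
z = np.linalg.solve(A, B)   # A is invertible for this non-collinear set
w = z / norms               # rescale by the direction norms to get weights
```

The diagonal direction receives a much smaller weight than the two axis directions, reflecting both its longer sampling distance and its angular proximity to them.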
Von Neumann set, V_B

We have direction set V_B = {(1, 0), (0, 1), (−1, 0), (0, −1)}. All four directions are unit vectors; opposite directions are collinear (∆_13 = ∆_24 = 0), and all other pairs are perpendicular (∆_kj = π/2). For this set, Eq. (4.1) takes the form

[ π  2  π  2 ] [z_1]   [4]
[ 2  π  2  π ] [z_2] = [4]
[ π  2  π  2 ] [z_3]   [4]
[ 2  π  2  π ] [z_4]   [4].

The matrix A has rank 2 and is not invertible. A is diagonalizable, and the pseudo-inverse operator P = Q̂ D̂⁻¹ Q̂^T yields the minimum-norm solution z = PB, whose entries are all equal. Since all direction vectors have norm 1, we have w_1 = w_2 = w_3 = w_4 = 2/(π + 2) ≈ 0.38898.
Several weight values are given in Table 2. Notice that m = 3, 4, 6 are scenarios which also appear in Table 1.

Conclusion
The discrete anisotropic gradient and its integral are important in a variety of image processing applications and set boundary measure computations. We provided a method for computing weight factors for general anisotropic variation approximations of functions on R². Optimal weights are found by minimizing the total integrated squared error over the family of planar functions with unit gradient. The method was developed in the framework of regular arrays, but it applies more broadly to arbitrary finite discrete sampling strategies.