Geometric-algebra affine projection adaptive filter

Geometric algebra (GA) is an efficient tool to deal with hypercomplex processes due to its special data structure. In this article, we introduce the affine projection algorithm (APA) in the GA domain to provide fast convergence against hypercomplex colored signals. Following the principle of minimal disturbance and the orthogonal affine subspace theory, we formulate the criterion of designing the GA-APA as a constrained optimization problem, which can be solved by the method of Lagrange Multipliers. Then, the differentiation of the cost function is calculated using geometric calculus (the extension of GA to include differentiation) to get the update formula of the GA-APA. The stability of the algorithm is analyzed based on the mean-square deviation. To avoid ill-posed problems, the regularized GA-APA is also given in the following. The simulation results show that the proposed adaptive filters, in comparison with existing methods, achieve a better convergence performance under the condition of colored input signals.

Adaptive filters (AF) have been extensively applied in many areas such as system identification, active noise control, and echo cancellation during the past decades [10]. The least mean square (LMS) and normalized LMS (NLMS) are widely used owing to their simplicity and ease of implementation. However, they show a slow convergence speed with highly colored input signals. The affine projection algorithm (APA), suggested by Ozeki and Umeda [11], is one method to overcome this problem. The APA and its variants [12][13][14] were found to be attractive choices with faster convergence than the NLMS and lower computational complexity than the recursive least squares (RLS).
However, standard AFs treat different dimensions as multi-channel signals, which may lose the structural information between different dimensions [15]. The GA-based adaptive filters (GAAF) have been used for simultaneous filtering of multi-dimensional signals for its faster convergence speed and suitability of multivector for multi-dimensional signal modeling. For example, quaternion adaptive filters were used to forecast Saito's Chaotic Signal and wind speed with superior performance [16]. GA-based beamformer of electromagnetic vector-sensor arrays has a better convergence performance than the standard beamformer [17,18]. GAAF algorithms have great advantages in processing multi-dimensional signals.
Hitzer extended the quaternion AF to the GA-based nonlinear AF for hypercomplex signals of high dimensions [19]. Afterward, the authors in [1] proposed the GA-LMS to estimate the rotor in a three-dimensional point-clouds registration problem and analyzed its performance. The GA-LMS was later used to recover the 6-degrees-of-freedom alignment of two point clouds [20]. The authors in [21] developed a robust adaptive filter based on maximum GA correntropy criteria against non-Gaussian noise. The GAbased Normalized Least Mean Fourth (GA-NLMF) and GA-NLMS derived in [22] have a certain improvement in the convergence speed compared with NLMS. However, their convergence speed is reduced considerably by the colored input signals, commonly encountered in real applications.
Based on the principle of minimal disturbance and the orthogonal affine subspace theory, this article introduces the APA in the GA domain to deal with the hypercomplex system with colored signals. Firstly, the fundamentals of geometric algebra are presented in Sect. 2. We then propose the algorithm and analyse its stability in Sect. 3. Finally, several simulations are conducted to analyse the performance and the stability of the proposed algorithm.

Fundamentals of geometric algebra
The GA G(R n ) was introduced by William K. Clifford, also called Clifford algebra. The GA enables the algebraic representation of magnitude and orientations and provides a coordinate-free framework to make the calculations efficiently.
The GA can be viewed as a geometric extension of the linear algebra R n . Vectors in R n are also vectors in G(R n ) . Take a, b vectors in R n , the geometric product of a and b is defined as ab = a · b + a ∧ b , on the basis of the inner (·) and outer (∧) product. Since the outer product doesn't satisfy the commutative law, namely a ∧ b = −(b ∧ a) , the geometric product is also non-commutative in general. Unless otherwise specified, all products in this article are geometric products. Take R 3 for example, G(R 3 ) has 2 3 = 8 dimensions, with basis {1, e 1 , e 2 , e 3 , e 12 , e 23 , e 31 , I} . All bases can be divided into four parts, namely one scalar, three unit orthogonal vectors e i (basis for R 3 ), three bivectors e ij e i e j = e i ∧ e j , i � = j (e i · e j = 0, i � = j) , and one trivector I e 123 = e 1 e 2 e 3 = e 1 ∧ e 2 ∧ e 3 . To illustrate, take a = e 1 and b = 3e 1 + 2e 2 . Then, ab = e 1 (3e 1 + 2e 2 ) = e 1 · (3e 1 + 2e 2 ) + e 1 ∧ (3e 1 + 2e 2 ) = 3 + 2(e 1 ∧ e 2 ) = 3 + 2e 12 (a scalar plus a bivector).
Multivector is the fundamental element of GA. The multivector A consists of its r-vectors �·� r as follows: in which A 0 , A 1 , and A 2 are scalar, vector, and bivector, respectively. The foundation of GA theory is the ability to fuse scalar, vector, and hyper-planes into a unique element, namely multivector.
Analogous to the conjugation of complex numbers and quaternion, the reverse of the multivector A is defined as Taking the bivector A = �A� 0 + �A� 1 + �A� 2 for an example, its reverse is The scalar product ( * ) is defined as A * B = �AB� , in which �·� ≡ �·� 0 . In addition, the magnitude of a multivector is defined as |A| 2 = A * A = � AA�.

Methods
An array of multivector consists of a collection of multivectors. Give M multivectors {U 1 , U 2 , . . . , U M } in G(R 3 ) , the M × 1 array collects them as follows: The reverse transpose operation, denoted by (·) * , is the extension of the reverse operation of multivector to arrays of multivectors. For example, the reverse transpose of the Consider reference data D(k), which is a multivector, observed at time k that comes from the linear model T is an unknown M × 1 array of multivector to be estimated with (·) T denotes transpose, V(k) accounts for measurement noise, U(k) denotes input signal observed at time k, and u(k) The model allows one to assign heterogeneous signals from different sources s i (k) , i = 0, 1, . . . , 2 n − 1 , to each entries of the multivector, e.g., U (k) = s 0 (k)+s 1 (k)e 1 +s 2 (k)e 2 +s 3 (k)e3+s 4 (k)e 12 +s 5 (k)e 23 +s 6 (k)e 31 +s 7 (k)I . For example, fusion and linear prediction of aircraft parameters can be assigned as follows: The squared Euclidean norm providing a measure of distance in LA is represented by the array product ||u|| 2 u * u , which is a scalar. However, the result of the array product in GA is a multivector rather than a scalar. Therefore, we take the scalar part of the array product �u * u� as the distance measure of the array of multivectors.

GA affine projection algorithm
The GA-APA also follows the principle of minimal disturbance and the orthogonal affine subspace theory as the standard APA. In mathematical terms, the criterion for designing the affine projection filter can be formulated as an optimization problem subject to multiple constraints. We will minimize the scalar product of the change of the estimated weight array and its reverse transpose (the distance measure of the array space of w o ), which is defined as subject to the set of N constraints where N is smaller than or equal to the length M of the weight array. The number of constraints N can be viewed as the order of the affine projection algorithm.
We apply the method of Lagrange multipliers to solve this optimization problem. Combining formulas (5) and (6), then we get the following cost function where E(n) = D(k − n) − u * (k − n) w(k + 1) and n is a multivector. For convenience of presentation, we introduce the following definitions: Then, the second term of the cost function (7) can be represent as Now, we will get the derivative of the cost function J(k) with respect to the weight array w(k + 1) following the rules of GC. In GA, the differential operator ∂ w = ∂ w(k+1) has the algebra properties of a multivector in G(R n ) . In other words, the gradient ∂ w J (k) can be calculated by the geometric product of the multivector-valued quantities ∂ w and J(k). Any multivector A ∈ G(R n ) can be decomposed into blades [5,Eq. (3.20)] via in which A i is a scalar valued, and {e i } and {e i }, i = 0, . . . , 2 n − 1 are two different bases of G(R n ) . {e i } is the reciprocal blade basis, which is an important analytical tool for differentiation in GA. The reciprocal blade basis can convert non-orthogonal to orthogonal vectors, vice versa. Since orthogonal elements cancel out mutually, the analytical procedure is simplified. Suffice to know that the following relation holds for reciprocal bases e i · e j = δ j i , where δ j i = 1 for i = j and δ j i = 0 for i = j (Kronecker delta). In particular, applying (12) to ∂ w results in The gradient ∂ w J (k) is obtained by multiplying (13) and (7), yielding in which ∂ 1 w,l = ∂ w,l �δ * w δ w � and ∂ 2 w,l = ∂ w,l �− w * (k + 1)U (k) � . As a matter of fact, arrays of multivectors can be decomposed into blades. Thus, employing (12) once again, we can rewrite δ w and δ * w in term of their 2 n blades as follows: Thus, the first term of the gradient (14) can be obtained by Then, we calculate the second term of the formula (14) and get Taking the results of (17) and (18), and setting the gradient (14) equal to zero, we get 2 δ w = (U (k) ) . Taking the reverse of both sides of the equation yields Next, we will eliminate the Lagrange vector from (19). Firstly, we use the definitions of (8) and (9) to rewrite (6) in the equivalent form Premultiplying both sides of (19) by U * and then using (20) to eliminate the updated weight array w(k + 1) yields � e q e p �∂ w,l (δ T w,q δ w,p ) � e q e p �(∂ w,lδ T w,q δ w,p +∂ w,l δ T w,qδ w,p ) � e q e p �(δ q l δ w,p + δ p l δ w,q ).    Based on the data available, the difference e(k) between d(k) and U * (k) w(k) at the adaptation cycle k is a N × 1 error array denoted Assuming the array product U * (k)U (k) to be invertible [23] allows us to solve (21) for , yielding Substituting this solution into (19), we obtain the optimum change of the weight array Finally, we introduce the step-size parameter µ into (24), yielding which is the desired update formula of the GA-APA. The algorithm is summarized in Algorithm 1. We can notice that GA-APA has the same format as standard APA adaptive filters. Since quaternion, complex numbers, and real numbers are subalgebras of geometric algebra, the Quaternion APA [24], the Complex APA [12], and the real-entries APA can be recovered by the GA-APA. In other words, GA-APA is a unified expression of the above algorithms.

Remark
APA is the same as NLMS in the LA domain when the order N = 1 . But, the update equation of the first order GA-APA is different from GA-NLMS proposed in [22]. Specially, the update term of the first order GA-APA is µu(k)(u(k) * u(k)) −1 e(k) which is similar to the update term of the GA-NLMS µu(k)�u(k) * u(k)� −1 e(k) . We will compare them in the simulation section.

Stability of the GA-APA
The mismatch between w o and w(k) is measured by weight-error array (21)  We base the stability analysis of the GA-APA on the mean-square deviation accounts for expectation. Taking the distance measure of both sides of (27), rearranging terms, and taking expectations, then we get From the equation above, we see that the GA-APA algorithm is stable in the meansquare-error sense provided the mean-square deviation y(k) decreases with the increasing number of adaptation cycles k. Therefore, the step-size parameter µ is bounded as follows:

Computational complexity analysis
As we can see from Algorithm 1, the main calculations are in Step 2 and Step 3. The number of real multiplications in Step 2 is NMα 2 , where α = 2 n represents the number of basis, N and M are the order of GA-APA and the length of w(k) , respectively. The computational complexity of multivector matrix inversion is N 3 α 2 β , where β represents the computational complexity of the inverse of a multivector. Therefore, the computation in Step 3 requires approximately α 2 (N 3 β + N 2 (M + 1) + NM) real multiplications. The total number of multiplications in GA-APA is α 2 (N 3 β + N 2 (M + 1) + 2NM) per adaptation cycle.

Regularized GA-APA
Since a matrix inversion (u * u) −1 is required within the GA-APA, ill-posed problems usually occur, especially under the condition of noisy observation data. To avoid this problem, we regularize the matrix that needs to be inverted. Then we get the update equation of the regularized GA-APA (R-GA-APA) where γ is the regularization parameter, and I is the N × N identity matrix of real number.

Results and discussion
The proposed algorithm's performance is evaluated and analyzed in this section. We compare the proposed algorithm with the GA-LMS and GA-NLMS in Sect. 4.1. The impact of order N and step size µ on algorithm performance is analyzed in Sect. 4.2.
(30) w(k + 1) = w(k) + µU (k)(U * (k)U (k) + γ I) −1 e(k), For the sake of generality, the underlying GA in all cases is G(R 3 ) . The regularization parameter of the R-GA-APA is γ = 10 −3 . The measurement noise is zero-mean uniform distributed sequences. The blades coefficients of the colored signal U(k) are obtained by filtering 2 3 = 8 white zero-mean Gaussian random sequences through a first-order system G(z) = 1/(1 − 0.9z −1 ) , respectively. All simulation results are obtained by averaging 100 independent trials.

Performance comparison
The variance of measurement noise is σ 2 The variance of U(k) is 0.1 for both white and colored signals. In this subsection, the step size value is µ = 0.2 for all algorithms. The length M of the optimal weight is 10, Figure 1 shows several mean-square error (MSE) E[|D(k) − u * (k) w(k)| 2 ] learning curves for the GA-LMS [3], GA-NLMS, GA-APA with N = 1 and R-GA-APA with N = 1 with both white and colored input signals. All the multivectors entries W o i are the same, namely . . , 10 . The coefficients of W 1 are selected in an aleatory manner. As we can see, all algorithms can converge at the same speed with white signals under some given parameters. These experiments show that the GA-APA and R-GA-APA, the same as the GA-LMS, are capable of identifying multivector-valued linear systems. But the GA-LMS and GA-NLMS suffer from slow convergence with colored input. The GA-APA and R-GA-APA achieve better performance with the colored signal with the same parameter. Comparing the GA-APA and the R-GA-APA, we conclude that their performance is roughly the same when the regularization parameter γ is small. We will focus our simulations on the R-GA-APA since it reaches the same performance as the GA-APA and avoids ill-posed problems. Additionally, the first order GA-APA reaches a much faster convergence than the GA-NLMS. We think it's because GA-NLMS doesn't follow the principle of minimal disturbance. Although the GA-NLMS introduces the normalization comparing with the GA-LMS, it suffers from slow convergence with colored signals. Figure 2 shows the convergence performance under colored input signals when filter weights w o change after 1190 iterations. The filter weights change from the weights W 1 in the above experiments to W o i = W 2 = 0.5e 0 + 1.8e 1 − 2e2 + 0.86e 3 + 0.31e 12 − 0.9e23 − 0.4e31 + 0.34I, i = 1, . . . , 10 . We can see from Fig. 2 that the proposed R-GA-APA can track the changes of weights faster than the GA-LMS and GA-NLMS after the filter weights change suddenly.
The time required for each iteration of different algorithms is given in Table 1. The results are obtained using Python on an Inter Core i7 CPU running at 3.6GHz and 16 GB of RAM. We can see that the proposed algorithm requires more calculation time. As the order of the algorithm N increases, the calculation time also increases.

Parameters analysis
The comparison results of different orders of the R-GA-APA under colored input signals are shown in Fig. 3 , 10 −4 } and 10 −1 , respectively. The step size µ is set to 0.2. As we can see, the convergence rate speeds up with the order N increases. The misadjustment of APA increases when the order N increases. As expected, this is also the case with the R-GA-APA, as   Fig. 3. Despite the speed of convergence, the steady-state error and the computational complexity of matrix inversion limit the choice of higher orders.
Finally, we evaluate the impact of the step size on convergence rate and steady-state error under colored input signals. The order of the R-GA-APA is set to 1. The σ 2 V and σ 2 U is set to 10 −3 and 10 −1 , respectively. The length of the filter weight is set to 10, and each multivector entries W o i is the same as W 1 in Sect. 4.1. Figure 4 shows the MSE learning curves of the R-GA-APA with step size µ = {0.05, 0.2, 0.6, 1.2} . We can see that the algorithm will have a faster convergence rate with a larger step size. However,

Availability of data and materials
There is no additional data to be made available.