Enabling quaternion derivatives: the generalized HR calculus

Quaternion derivatives exist only for a very restricted class of analytic (regular) functions; however, in many applications, functions of interest are real-valued and hence not analytic, a typical case being the standard real mean square error objective function. The recent HR calculus is a step forward and provides a way to calculate derivatives and gradients of both analytic and non-analytic functions of quaternion variables; however, the HR calculus can become cumbersome in complex optimization problems due to the lack of rigorous product and chain rules, a consequence of the non-commutativity of quaternion algebra. To address this issue, we introduce the generalized HR (GHR) derivatives which employ quaternion rotations in a general orthogonal system and provide the left- and right-hand versions of the quaternion derivative of general functions. The GHR calculus also solves the long-standing problems of product and chain rules, mean-value theorem and Taylor's theorem in the quaternion field. At the core of the proposed GHR calculus is quaternion rotation, which makes it possible to extend the principle to other functional calculi in non-commutative settings. Examples in statistical learning theory and adaptive signal processing support the analysis.


Introduction.
Quaternions have become a standard in physics [40] and computer graphics [1], and have also been successfully applied to many signal processing and communications problems [9,10,11,20,37,38,39,43,45,46,47]. One attractive property is that quaternion algebra [52] reduces the number of parameters and offers improvements in terms of computational complexity [23] and functional simplicity [14]. Often, the task is to find the values of quaternion parameters which optimize a chosen objective function. To solve this kind of optimization problem, a common approach is to build adaptive optimization algorithms based on the gradient of the objective function, as in the quaternion least mean square (QLMS) adaptive filter [9]. However, a confusing aspect of QLMS adaptive filtering, and of other gradient-based optimization procedures, is that the objective functions of interest are often real-valued and thus not analytic according to the analyticity condition in quaternion analysis [4,8,33,41,42,44]. An alternative way to find the derivative of real functions of quaternion variables is therefore needed. Following the idea of the CR calculus in the complex domain [2,17,34,51], two alternative ways can be used to find the derivative of a real function $f$ with respect to the unknown quaternion variable $q$. The first way, called the pseudo-derivative, rewrites $f$ as a function of the four real components $q_a$, $q_b$, $q_c$ and $q_d$ of the quaternion variable $q$, and then finds the real derivatives of the rewritten function with respect to the independent real variables $q_a$, $q_b$, $q_c$ and $q_d$ separately. In this way, we can treat the real-valued function $f$ as a real differentiable mapping between $\mathbb{R}^4$ and $\mathbb{R}$. The second way, called the HR calculus, is more elegant [15,18], and aims to find the formal derivatives of $f$ with respect to the quaternion variables $q$, $q^i$, $q^j$, $q^k$ and their conjugates. The differentials of these quaternion variables are independent, as shown in Lemma 1.2 and Lemma 1.3.
In this paper, motivated by the CR calculus [3,19,34], we revisit the theory of HR calculus introduced by three of the authors [18], and further extend this theory by developing the product rule and chain rule for the HR calculus. However, as shown in Section 2.3, the traditional product rule is not suitable for the HR calculus due to the non-commutativity of quaternion algebra. Other functional calculi [4,8,24,33,44] in quaternion analysis suffer from the same barrier.
To this end, we first generalize the HR calculus based on a general orthogonal system. The generalized HR (GHR) calculus encompasses both the left- and right-hand versions of the quaternion derivative, and we show that the two versions of the HR derivative give identical results for real-valued functions. One major result is therefore that, using the GHR calculus, it is no longer important which version of the HR derivative is used. Also, within the GHR framework, we introduce a novel product rule to facilitate the calculation of the HR derivatives of general functions of quaternion variables, and show in Corollary 3.3 that if one function of the product is real-valued, this novel product rule degenerates into the traditional product rule. The core of the novel product rule is the quaternion rotation, an idea that can also be naturally applied to other functional calculi in non-commutative settings. In the process of refining the theory of HR calculus, we revisit two important and fundamental theorems: the mean value theorem and Taylor's theorem. Taylor's theorem is presented in a compact and familiar form involving the HR derivatives. The GHR calculus offers an answer to a long-standing mathematical problem [16], while illustrative examples show how it can be applied as a tool for solving problems in signal processing and communications.
Quaternions are an associative but not commutative algebra over $\mathbb{R}$, defined as
$$\mathbb{H} = \mathrm{span}\{1, i, j, k\} \triangleq \{q_a + iq_b + jq_c + kq_d \mid q_a, q_b, q_c, q_d \in \mathbb{R}\} \qquad (1.1)$$
where $\{1, i, j, k\}$ is a basis of $\mathbb{H}$, and the imaginary units $i$, $j$ and $k$ satisfy $i^2 = j^2 = k^2 = ijk = -1$, which implies $ij = k = -ji$, $jk = i = -kj$, $ki = j = -ik$. For any quaternion
$$q = q_a + iq_b + jq_c + kq_d = Sq + Vq \qquad (1.2)$$
the scalar (real) part is denoted by $q_a = Sq = \Re(q)$, whereas the vector part $Vq = \Im(q) = iq_b + jq_c + kq_d$ comprises the three imaginary parts. The quaternion product for $p, q \in \mathbb{H}$ is given by
$$pq = Sp\,Sq - Vp \cdot Vq + Sp\,Vq + Sq\,Vp + Vp \times Vq \qquad (1.3)$$
where the symbols $\cdot$ and $\times$ denote the usual inner product and vector product, respectively. The presence of the vector product causes the quaternion product to be noncommutative, i.e., for $p, q \in \mathbb{H}$, $pq \neq qp$ in general. The conjugate of a quaternion $q$ is defined as $q^* = Sq - Vq$, while the conjugate of the product satisfies $(pq)^* = q^* p^*$. The modulus of a quaternion is defined as $|q| = \sqrt{qq^*} = \sqrt{q_a^2 + q_b^2 + q_c^2 + q_d^2}$, and it is easy to check that $|pq| = |p||q|$. The inverse of a quaternion $q \neq 0$ is $q^{-1} = q^*/|q|^2$, which yields an important consequence
$$(pq)^{-1} = q^{-1} p^{-1} \qquad (1.4)$$
(note the change in order).
If $|q| = 1$, we call $q$ a unit quaternion. A quaternion $q$ is said to be pure if $\Re(q) = 0$. Then $q^* = -q$ and $q^2 = -|q|^2$. Thus, a pure unit quaternion is a square root of $-1$, such as the imaginary units $i$, $j$ and $k$.
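The identities above are easy to check numerically. The following is a minimal sketch, assuming quaternions are stored as NumPy arrays `[q_a, q_b, q_c, q_d]`; the helper names `qmul`, `qconj` and `qinv` and the sample values are illustrative choices, not notation from the paper.

```python
import numpy as np

def qmul(p, q):
    """Hamilton product via (1.3): scalar and vector parts handled separately."""
    ps, pv = p[0], p[1:]
    qs, qv = q[0], q[1:]
    scalar = ps * qs - pv @ qv                      # Sp Sq - Vp . Vq
    vector = ps * qv + qs * pv + np.cross(pv, qv)   # Sp Vq + Sq Vp + Vp x Vq
    return np.concatenate(([scalar], vector))

def qconj(q):
    """Conjugate q* = Sq - Vq."""
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def qinv(q):
    """Inverse q^{-1} = q* / |q|^2 for q != 0."""
    return qconj(q) / (q @ q)

i, j, k = np.eye(4)[1:]                      # imaginary units
p = np.array([1.0, 2.0, -1.0, 0.5])          # arbitrary sample quaternions
q = np.array([0.3, -1.0, 2.0, 1.5])

ij_is_k = np.allclose(qmul(i, j), k)
ji_is_minus_k = np.allclose(qmul(j, i), -k)
noncommutative = not np.allclose(qmul(p, q), qmul(q, p))
norm_multiplicative = np.isclose(np.linalg.norm(qmul(p, q)),
                                 np.linalg.norm(p) * np.linalg.norm(q))
inverse_reverses_order = np.allclose(qinv(qmul(p, q)), qmul(qinv(q), qinv(p)))
```

The only asymmetric term in `qmul` is the cross product, which is exactly the source of the non-commutativity noted after (1.3).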

Analytic Functions in H.
A function that is analytic is also called regular, or monogenic. Due to the non-commutativity of quaternion products, there are two ways to write the quotient in the definition of the quaternion derivative, as shown below.
Proposition 1.1 is discussed in detail in [4,36] and indicates that the traditional definitions of the derivative in (1.5) and (1.6) are too restrictive. One attempt to relax this constraint is due to Fueter [41,42], summarized in [4,8] as the Cauchy-Riemann-Fueter (CRF) condition (1.7). The limitations of the CRF condition were pointed out by Gentili and Struppa in [24,25], who showed that polynomial functions (even the identity $f(q) = q$) satisfy neither the left CRF nor the right CRF condition. To further relax the analyticity condition, a local analyticity condition (LAC) was proposed in [44], by using the polar form of a quaternion to give the left LAC and the right LAC, where $q = q_a + \hat{q}\alpha$, $\hat{q} = Vq/|Vq|$ and $Vq = iq_b + jq_c + kq_d$. This theory of local analyticity is now very well developed, in many different directions, and we refer the reader to [21,24,25] for the slice regular functions. More recent work in this area includes [22,26,27], and references therein. The advantage of the local analyticity condition is that both the polynomial functions of $q$ and some elementary functions in Section 4.1 satisfy the left LAC or the right LAC.
Remark 1. Note that the product and composition of two LAC functions $f$ and $g$ generally no longer meet the local analyticity condition. For example, if $f(q) = q$ and $g(q) = \omega q$, $\omega \in \mathbb{H}$, then $f$ and $g$ satisfy the left LAC, but the product $fg = q\omega q$ does not. The situation is the same for the right LAC, except that the function $g$ is written as $g(q) = q\omega$.
The quaternion derivative in quaternion analysis is defined only for analytic functions. However, in engineering problems, objective functions of interest are often real-valued, so that they can be minimized or maximized, and are thus not analytic. Notice that if the definition of the analytic (regular) function given in [4,8,21,24,25,33,44] is used, then such a real-valued function $f$ is not analytic. In order to take the derivative of these functions (but not limited to only such functions), the HR calculus extends the classical idea of the complex CR calculus [2,17,34,51] to the quaternion field [18]. This generalization is not trivial, and we show that many rules of the CR and HR calculus are different. The details are given in Section 2.
Remark 2. It is important to note that the left (right) terminology in (1.7), (1.8) and below differs from that in [4,8,21,24,33,44]. In this paper, the standard of left (right) is based on the position of $\partial f/\partial q_a$, $\partial f/\partial q_b$, $\partial f/\partial q_c$ and $\partial f/\partial q_d$, rather than on the positions of the imaginary units $i$, $j$, $k$. Although this is only a notational difference, we later show that the left derivatives (named according to this standard) in Definitions 4 and 6 result in the left constant rules (2.2) and (3.2), that is, a left constant can be taken out of the left derivative of a product, and the left derivatives stand on the left side of the quaternion differential in (5.6) and (5.10). This allows for a consistent use of terminology.

Quaternion Rotation.
Every quaternion can be written in the polar form $q = |q|(\cos\theta + \hat{q}\sin\theta)$, where $\hat{q} = Vq/|Vq|$ is a pure unit quaternion and $\theta = \arccos(Sq/|q|)$ is the angle (or argument) of the quaternion. We now introduce the rotation and involution operations.
We should point out that the real representation in (1) can be easily extended to other orthogonal bases. In particular, for any non-zero quaternion $\mu = \mu_a + i\mu_b + j\mu_c + k\mu_d$, we consider an orthogonal system $\{1, i^\mu, j^\mu, k^\mu\}$ given by [28] $i^\mu = \mu i \mu^{-1}$, $j^\mu = \mu j \mu^{-1}$, $k^\mu = \mu k \mu^{-1}$, so that the map $(\cdot)^\mu$ has a matrix representation $M$. It is easily shown that $M$ is orthogonal, $MM^T = I_3$ and $\det(M) = 1$, so that the linear map $q^\mu = \mu q \mu^{-1}$ represents a rotation in $\mathbb{R}^3$. Thus, any quaternion $q$ can be alternatively represented in the $\mu$-basis with coordinates $(q'_b, q'_c, q'_d) = (q_b, q_c, q_d)M^T$.
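The rotation property can be verified with a short numerical sketch. It assumes quaternions stored as arrays `[q_a, q_b, q_c, q_d]` and builds a matrix `M` whose columns are the vector parts of $i^\mu$, $j^\mu$, $k^\mu$; this column layout is an assumption made for illustration (the paper's $M$ may be its transpose), but orthogonality and unit determinant hold either way.

```python
import numpy as np

def qmul(p, q):
    """Hamilton product of quaternions stored as arrays [a, b, c, d]."""
    s = p[0] * q[0] - p[1:] @ q[1:]
    v = p[0] * q[1:] + q[0] * p[1:] + np.cross(p[1:], q[1:])
    return np.concatenate(([s], v))

def qinv(q):
    """Inverse q* / |q|^2."""
    return q * np.array([1.0, -1.0, -1.0, -1.0]) / (q @ q)

def rotate(q, mu):
    """q^mu = mu q mu^{-1}."""
    return qmul(qmul(mu, q), qinv(mu))

i, j, k = np.eye(4)[1:]
mu = np.array([1.0, -2.0, 0.5, 3.0])          # arbitrary non-zero quaternion
i_mu, j_mu, k_mu = (rotate(e, mu) for e in (i, j, k))

# Columns of M are the vector parts of the rotated units (illustrative layout).
M = np.stack([i_mu[1:], j_mu[1:], k_mu[1:]], axis=1)

is_orthogonal = np.allclose(M @ M.T, np.eye(3))
det_is_one = np.isclose(np.linalg.det(M), 1.0)
units_stay_pure = np.allclose([i_mu[0], j_mu[0], k_mu[0]], 0.0)
basis_rule_kept = np.allclose(qmul(i_mu, j_mu), k_mu)   # i^mu j^mu = k^mu
```

Because $\mu$ and $\mu^{-1}$ appear together, the result does not depend on $|\mu|$, so $\mu$ need not be normalized.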

The Equivalence Relations and Involutions
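Given a complex number $z = z_a + iz_b$, its real and imaginary parts can be extracted as $z_a = \frac{1}{2}(z + z^*)$ and $z_b = \frac{1}{2i}(z - z^*)$ [30]. Such convenient manipulation offers a number of advantages, but is not possible to achieve directly in the quaternion domain. To deal with this issue, we employ the quaternion involutions (self-inverse mappings), given by [48]
$$q^i = -iqi = q_a + iq_b - jq_c - kq_d, \quad q^j = -jqj = q_a - iq_b + jq_c - kq_d, \quad q^k = -kqk = q_a - iq_b - jq_c + kq_d \qquad (1.22)$$
and their conjugate involutions given by
$$q^{i*} = q_a - iq_b + jq_c + kq_d, \quad q^{j*} = q_a + iq_b - jq_c + kq_d, \quad q^{k*} = q_a + iq_b + jq_c - kq_d \qquad (1.23)$$
In this way, the four real components of the quaternion $q$ can now be computed based on (1.22) or (1.23) as [4,12,18,27]
$$q_a = \tfrac{1}{4}(q + q^i + q^j + q^k), \quad q_b = -\tfrac{i}{4}(q + q^i - q^j - q^k), \quad q_c = -\tfrac{j}{4}(q - q^i + q^j - q^k), \quad q_d = -\tfrac{k}{4}(q - q^i - q^j + q^k)$$
$$q_a = \tfrac{1}{4}(q^* + q^{i*} + q^{j*} + q^{k*}), \quad q_b = \tfrac{i}{4}(q^* + q^{i*} - q^{j*} - q^{k*}), \quad q_c = \tfrac{j}{4}(q^* - q^{i*} + q^{j*} - q^{k*}), \quad q_d = \tfrac{k}{4}(q^* - q^{i*} - q^{j*} + q^{k*})$$
This allows for any quaternion function of the four real variables $q_a$, $q_b$, $q_c$, $q_d$ to be expressed as a function of the quaternion variables $\{q, q^i, q^j, q^k\}$ or $\{q^*, q^{i*}, q^{j*}, q^{k*}\}$, whereby the relationship between the involutions in (1.22) and the conjugate involutions in (1.23) is given by
$$q^* = \tfrac{1}{2}(-q + q^i + q^j + q^k), \quad q^{i*} = \tfrac{1}{2}(q - q^i + q^j + q^k), \quad q^{j*} = \tfrac{1}{2}(q + q^i - q^j + q^k), \quad q^{k*} = \tfrac{1}{2}(q + q^i + q^j - q^k) \qquad (1.26)$$
Remark 3. Observe that $q$ and $q^*$ are not independent and thus $\partial q^*/\partial q \neq 0$, as shown in (2.1). This is the main difference from the CR calculus, where the derivative $\partial z^*/\partial z = 0$.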
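As a quick sanity check, the component-extraction formulas can be evaluated numerically. This is a minimal sketch assuming the involution $q^\eta = -\eta q \eta$ for $\eta \in \{i, j, k\}$ and an array layout `[q_a, q_b, q_c, q_d]`; the function name `involution` and the sample quaternion are illustrative.

```python
import numpy as np

def qmul(p, q):
    """Hamilton product of quaternions stored as arrays [a, b, c, d]."""
    s = p[0] * q[0] - p[1:] @ q[1:]
    v = p[0] * q[1:] + q[0] * p[1:] + np.cross(p[1:], q[1:])
    return np.concatenate(([s], v))

I, J, K = np.eye(4)[1:]

def involution(q, eta):
    """q^eta = -eta q eta for a unit eta in {i, j, k}."""
    return -qmul(qmul(eta, q), eta)

q = np.array([0.3, -1.0, 2.0, 1.5])
qi, qj, qk = (involution(q, e) for e in (I, J, K))

# Each real component, recovered as a quaternion with zero vector part.
qa = 0.25 * (q + qi + qj + qk)
qb = -0.25 * qmul(I, q + qi - qj - qk)
qc = -0.25 * qmul(J, q - qi + qj - qk)
qd = -0.25 * qmul(K, q - qi - qj + qk)

# First identity of (1.26): the conjugate as a combination of involutions.
conj_from_involutions = 0.5 * (-q + qi + qj + qk)
```

Each of `qa`, `qb`, `qc`, `qd` comes out as a real number embedded in $\mathbb{H}$, which is what lets any function of the four real variables be rewritten in terms of $\{q, q^i, q^j, q^k\}$.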

Results Used to Introduce GHR Derivatives
The quaternion components, that is, the real variables $q_a$, $q_b$, $q_c$ and $q_d$, are mutually independent, and hence so are their differentials. Although the quaternion variables $q$, $q^i$, $q^j$ and $q^k$ are related, it is important to notice that their differentials are linearly independent, similar to the CR calculus [3]. This condition is very important for distinguishing the GHR derivatives from the quaternion differential of the function under consideration.
Lemma 1.2. Let $f_n : \mathbb{H} \to \mathbb{H}$, $n = 1, 2, 3, 4$, be arbitrary quaternion-valued functions. If the left case (1.28) or the right case holds for all $\mu \in \mathbb{H}$, $\mu \neq 0$, then $f_n = 0$ for $n \in \{1, 2, 3, 4\}$.
Proof. The left case: By applying the rotation transformation to both sides of (1.2) and (1.22), it follows that
$$q^\mu = q_a + i^\mu q_b + j^\mu q_c + k^\mu q_d, \quad q^{\mu i} = q_a + i^\mu q_b - j^\mu q_c - k^\mu q_d, \quad q^{\mu j} = q_a - i^\mu q_b + j^\mu q_c - k^\mu q_d, \quad q^{\mu k} = q_a - i^\mu q_b - j^\mu q_c + k^\mu q_d \qquad (1.30)$$
Applying the differential operator to the above expressions and substituting $dq^\mu$, $dq^{\mu i}$, $dq^{\mu j}$ and $dq^{\mu k}$ into (1.28) yields a linear combination of $dq_a$, $dq_b$, $dq_c$ and $dq_d$. Since the differentials $dq_a$, $dq_b$, $dq_c$ and $dq_d$ are independent, each of their four coefficients must vanish, which gives four linear equations in $f_1, \ldots, f_4$ whose only solution is $f_n = 0$. The right case can be proved in a similar way.
The next lemma enables us to identify the conjugate GHR derivatives; its proof is essentially the same as that of Lemma 1.2 and is therefore omitted.
Lemma 1.3. Let $f_n : \mathbb{H} \to \mathbb{H}$, $n = 1, 2, 3, 4$, be arbitrary quaternion-valued functions. If the left case or the right case holds for all $\mu \in \mathbb{H}$, $\mu \neq 0$, then $f_n = 0$ for $n \in \{1, 2, 3, 4\}$.

The HR calculus
Optimization problems in quaternion variables frequently arise in engineering applications such as control theory, signal processing, and electrical engineering. Solutions often require a first- or second-order approximation of the objective function to generate a new search direction. However, real functions of quaternion variables are essentially non-analytic. The recently proposed HR calculus solves these issues by using the quaternion involutions, and we now introduce two kinds of HR derivatives (the derivation of the HR calculus is given in Appendix A).
Definition 4 (The Left HR Derivatives). Let $q = q_a + iq_b + jq_c + kq_d$, where $q_a, q_b, q_c, q_d \in \mathbb{R}$, then the formal left HR derivatives, with respect to $\{q, q^i, q^j, q^k\}$ and $\{q^*, q^{i*}, q^{j*}, q^{k*}\}$ of the function $f$, are defined as where $\partial f/\partial q_a$, $\partial f/\partial q_b$, $\partial f/\partial q_c$ and $\partial f/\partial q_d$ are the partial derivatives of $f$ with respect to $q_a$, $q_b$, $q_c$ and $q_d$, respectively.
The properties of Definition 4 are stated in (2.1) and (2.2). It is important to note that if a function $f$ is premultiplied by a constant $\eta$ in the second line of (2.2), then the derivative of the product is equal to the derivative of $f$ premultiplied by the constant; this does not hold for postmultiplication. In other words, the left constant $\eta$ can be taken out of the derivative of the product, which is the reason we call Definition 4 the left HR derivative.
Definition 5 (The Right HR Derivatives [18]). Let $q = q_a + iq_b + jq_c + kq_d$, where $q_a, q_b, q_c, q_d \in \mathbb{R}$, then the formal right HR derivatives, with respect to $\{q, q^i, q^j, q^k\}$ and $\{q^*, q^{i*}, q^{j*}, q^{k*}\}$ of the function $f$, are defined as where $\partial f/\partial q_a$, $\partial f/\partial q_b$, $\partial f/\partial q_c$ and $\partial f/\partial q_d$ are the partial derivatives of $f$ with respect to $q_a$, $q_b$, $q_c$ and $q_d$, respectively.
The properties of Definition 5 mirror those of Definition 4; in particular, the second line of (2.4) is a mirror image of (2.2). Thus, we call Definition 5 the right HR derivative, denoted by $\partial_r$ to distinguish it from the left HR derivative.
Remark 4. The only difference between the left HR derivative and the right HR derivative is the position of the partial derivatives $\partial f/\partial q_a$, $\partial f/\partial q_b$, $\partial f/\partial q_c$ and $\partial f/\partial q_d$. In the left HR derivative, the partial derivatives stand on the left side and the imaginary units $i$, $j$, $k$ on the right side. The opposite holds for the right HR derivative. Note that the partial derivatives cannot swap position with the imaginary units $i$, $j$, $k$ because of the noncommutative nature of the quaternion product.
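To make the left/right distinction concrete, the sketch below evaluates both derivatives with central finite differences. It assumes the forms commonly given in the HR-calculus literature, $\frac{\partial f}{\partial q} = \frac{1}{4}\big(\frac{\partial f}{\partial q_a} - \frac{\partial f}{\partial q_b} i - \frac{\partial f}{\partial q_c} j - \frac{\partial f}{\partial q_d} k\big)$ for the left derivative and the same expression with the units on the left for the right derivative; the function names and test point are illustrative.

```python
import numpy as np

I, J, K = np.eye(4)[1:]                       # imaginary units as [a, b, c, d]

def qmul(p, q):
    """Hamilton product of quaternions stored as arrays [a, b, c, d]."""
    s = p[0] * q[0] - p[1:] @ q[1:]
    v = p[0] * q[1:] + q[0] * p[1:] + np.cross(p[1:], q[1:])
    return np.concatenate(([s], v))

def partials(f, q, h=1e-6):
    """Central-difference partials of f with respect to q_a, q_b, q_c, q_d."""
    return [(f(q + h * e) - f(q - h * e)) / (2 * h) for e in np.eye(4)]

def left_hr(f, q):
    """Assumed left HR derivative: partials on the left, units on the right."""
    fa, fb, fc, fd = partials(f, q)
    return 0.25 * (fa - qmul(fb, I) - qmul(fc, J) - qmul(fd, K))

def right_hr(f, q):
    """Assumed right HR derivative: units on the left, partials on the right."""
    fa, fb, fc, fd = partials(f, q)
    return 0.25 * (fa - qmul(I, fb) - qmul(J, fc) - qmul(K, fd))

conj = lambda q: q * np.array([1.0, -1.0, -1.0, -1.0])
norm2 = lambda q: np.array([q @ q, 0.0, 0.0, 0.0])   # real-valued |q|^2 in H

q0 = np.array([0.7, -1.2, 0.4, 2.0])
d_identity = left_hr(lambda q: q, q0)     # approximately 1
d_conj = left_hr(conj, q0)                # approximately -1/2, not 0 (cf. Remark 3)
d_left, d_right = left_hr(norm2, q0), right_hr(norm2, q0)
```

For the real-valued function $|q|^2$ the two derivatives coincide (both are close to $q^*/2$), since real partial derivatives commute with the imaginary units.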

Relation Between the HR Derivatives
By applying the Hermitian operator to both sides of the expression in Definition 4 and using $(AB)^H = B^H A^H$, we obtain (2.5). Replacing $f$ with its conjugate $f^*$ in (2.5) and using $(\partial f^*/\partial \xi)^* = \partial f/\partial \xi$, $\xi \in \{q_a, q_b, q_c, q_d\}$, gives the pair of relationships (2.6) between the left HR derivatives and the right HR derivatives.
Remark 5. From the identity (2.6), we can see that the left HR derivative is equal to the right HR derivative if the function $f$ is real-valued. This result is instrumental for practical applications of the HR calculus, where the objective function (or cost function) is often real-valued, such as the mean square error. For such functions, the choice of HR derivative is immaterial, because the final results are exactly the same. In the sequel, we therefore mainly focus on the left HR derivatives.

Higher Order HR Derivatives
Since a formal derivative of a function $f : \mathbb{H} \to \mathbb{H}$ is (wherever it exists) again a function from $\mathbb{H}$ to $\mathbb{H}$, it makes sense to take the HR derivative of an HR derivative, i.e., a higher order HR derivative. We shall consider second order left derivatives of the form where $\mu, \nu \in \{1, i, j, k\}$. From (2.1) and (2.2), we obtain If $f$ is a real-valued function, the second formula in the above expression can be simplified as This clearly shows that the mixed second order left HR derivatives are in general not equal, that is where $\mu, \nu \in \{1, i, j, k\}$. The second order left HR derivatives have a commutative property [4] If $f$ is a real-valued function, we obtain In a similar manner, the second order right HR derivatives can be defined as where $\mu, \nu \in \{1, i, j, k\}$. An important commutative property between second order left and right derivatives of a real-valued function $f$ is given by

The Validity of the Traditional Product Rule.
Definitions 4 and 5 give a method to calculate the HR derivatives, but they are complicated and inefficient to apply. For example, the derivative of the power function $f(q) = q^n$ is inconvenient to compute directly from Definition 4 or 5. The greatest difficulty with the HR calculus is that it does not satisfy the traditional product rule, that is, for any quaternion functions $f(q)$ and $g(q)$, in general we have
$$\frac{\partial (fg)}{\partial q} \neq f\frac{\partial g}{\partial q} + \frac{\partial f}{\partial q} g \qquad (2.15)$$
We shall illustrate this technical obstacle by two examples.
Example 1. Find the HR derivative of the function $f : \mathbb{H} \to \mathbb{H}$ given by where $q = q_a + iq_b + jq_c + kq_d$, $q_a, q_b, q_c, q_d \in \mathbb{R}$.
Solution: By Definition 4, the left side of (2.15) becomes (2.17) Alternatively, using the property (2.1), the right side of (2.15) can be calculated as This clearly shows that the left side is not equal to the right side of (2.15), and thus the product rule is not valid.
Example 2. Find the HR derivative of the function $f : \mathbb{H} \to \mathbb{H}$ given by (2.19)
Solution: By Definition 4, we first calculate the left side of (2.15) as while, using the property (2.1), the right side of (2.15) can be calculated as This clearly shows that the left side is not equal to the right side of (2.15).
Remark 6. Examples 1 and 2 show that the traditional product rule is not applicable to the left HR derivative in Definition 4. In a similar manner, the traditional product rule is not applicable to the right HR derivative in Definition 5.
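The failure of the traditional product rule can also be seen numerically. The sketch below uses the hypothetical pair $f(q) = q$, $g(q) = q^*$ (chosen for illustration and not necessarily the functions of Examples 1 and 2), together with the assumed left HR derivative $\frac{1}{4}\big(\frac{\partial f}{\partial q_a} - \frac{\partial f}{\partial q_b} i - \frac{\partial f}{\partial q_c} j - \frac{\partial f}{\partial q_d} k\big)$ evaluated by finite differences.

```python
import numpy as np

I, J, K = np.eye(4)[1:]

def qmul(p, q):
    """Hamilton product of quaternions stored as arrays [a, b, c, d]."""
    s = p[0] * q[0] - p[1:] @ q[1:]
    v = p[0] * q[1:] + q[0] * p[1:] + np.cross(p[1:], q[1:])
    return np.concatenate(([s], v))

def left_hr(f, q, h=1e-6):
    """Assumed left HR derivative, with central-difference partials."""
    fa, fb, fc, fd = [(f(q + h * e) - f(q - h * e)) / (2 * h) for e in np.eye(4)]
    return 0.25 * (fa - qmul(fb, I) - qmul(fc, J) - qmul(fd, K))

conj = lambda q: q * np.array([1.0, -1.0, -1.0, -1.0])
f = lambda q: q                 # hypothetical example: f(q) = q
g = conj                        # hypothetical example: g(q) = q*

q0 = np.array([0.7, -1.2, 0.4, 2.0])

# Left side of (2.15): derivative of the product f g = q q* = |q|^2.
lhs = left_hr(lambda q: qmul(f(q), g(q)), q0)

# Right side of (2.15): the traditional product rule.
naive = qmul(f(q0), left_hr(g, q0)) + qmul(left_hr(f, q0), g(q0))

gap = naive - lhs               # non-zero whenever q0 has a vector part
```

Here `lhs` is close to $q^*/2$ while `naive` is close to $q^* - q/2$, so the two sides differ by minus the vector part of $q$.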

The Generalization of HR Calculus
In this section, we propose the GHR derivatives to overcome the obstacle of the product rule within the HR calculus. We achieve this by changing the basis $\{1, i, j, k\}$ in Definitions 4 and 5 to a general orthogonal basis $\{1, i^\mu, j^\mu, k^\mu\}$, as shown in (1.20). The derivation of the GHR calculus is similar to that of the HR calculus in Appendix A and is omitted to save space.
Definition 6 (The Left GHR Derivatives). Let $q = q_a + iq_b + jq_c + kq_d$, where $q_a, q_b, q_c, q_d \in \mathbb{R}$, then the left GHR derivatives, with respect to $q^\mu$ and $q^{\mu*}$ ($\mu \neq 0$, $\mu \in \mathbb{H}$), of the function $f$ are defined as where $\partial f/\partial q_a$, $\partial f/\partial q_b$, $\partial f/\partial q_c$ and $\partial f/\partial q_d$ are the partial derivatives of $f$ with respect to $q_a$, $q_b$, $q_c$ and $q_d$, respectively, and the set $\{1, i^\mu, j^\mu, k^\mu\}$ is a general orthogonal basis of $\mathbb{H}$.
The properties of Definition 6 are: where the properties (1.13) and (1.14) are used in the first line of (3.2), and $\mu q = q^\mu \mu$ and (1.14) are used in the second line of (3.2). The details are omitted because the proof is similar to that of (2.2). If $f$ is a real-valued function, the conjugate rule of the left GHR derivatives is given by
Definition 7 (The Right GHR Derivatives). Let $q = q_a + iq_b + jq_c + kq_d$, where $q_a, q_b, q_c, q_d \in \mathbb{R}$, then the right GHR derivatives, with respect to $q^\mu$ and $q^{\mu*}$ ($\mu \neq 0$, $\mu \in \mathbb{H}$), of the function $f$ are defined as where $\partial f/\partial q_a$, $\partial f/\partial q_b$, $\partial f/\partial q_c$ and $\partial f/\partial q_d$ are the partial derivatives of $f$ with respect to $q_a$, $q_b$, $q_c$ and $q_d$, respectively, and the set $\{1, i^\mu, j^\mu, k^\mu\}$ is a general orthogonal basis of $\mathbb{H}$.
The properties of Definition 7 are: Similar to the relation in (2.6), the relation between the two kinds of GHR derivatives can be found as
Remark 7. By comparing Definition 4 and Definition 6, it is seen that the GHR derivative is more concise than the HR derivative, and that the HR derivative is a special case of the GHR derivative. More importantly, as shown below, the GHR derivatives admit a novel product rule, which is very convenient for calculating the HR and GHR derivatives. In addition, the GHR derivative can be extended to other orthogonal systems, such as $\{1, \eta, \eta', \eta''\}$ in [31,32].

The Novel Product Rule.
In Section 2.3, we explained that the traditional product rule does not hold for the HR calculus. We now propose a novel product rule to overcome this technical obstacle, and show in Corollary 3.3 that the traditional product rule is a special case of the novel product rule.
Theorem 3.1 (Product Rule of Left GHR). If the functions $f, g : \mathbb{H} \to \mathbb{H}$ have the left GHR derivatives, then so too has their product $fg$, and where $\partial f/\partial q^{g\mu}$ and $\partial f/\partial q^{g\mu*}$ can be obtained by replacing $\mu$ with $g\mu$ in Definition 6.
Proof. The proof of Theorem 3.1 is given in Appendix B.
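A numerical check helps to see how replacing $\mu$ with $g\mu$ repairs the product rule. The sketch below assumes the forms used in the GHR literature: the left GHR derivative $\frac{\partial f}{\partial q^\mu} = \frac{1}{4}\big(\frac{\partial f}{\partial q_a} - \frac{\partial f}{\partial q_b} i^\mu - \frac{\partial f}{\partial q_c} j^\mu - \frac{\partial f}{\partial q_d} k^\mu\big)$ and the product rule $\frac{\partial (fg)}{\partial q^\mu} = f\frac{\partial g}{\partial q^\mu} + \frac{\partial f}{\partial q^{g\mu}} g$. The test functions $f(q) = q^2$, $g(q) = \omega q$ and all numerical values are illustrative.

```python
import numpy as np

I, J, K = np.eye(4)[1:]

def qmul(p, q):
    """Hamilton product of quaternions stored as arrays [a, b, c, d]."""
    s = p[0] * q[0] - p[1:] @ q[1:]
    v = p[0] * q[1:] + q[0] * p[1:] + np.cross(p[1:], q[1:])
    return np.concatenate(([s], v))

def qinv(q):
    return q * np.array([1.0, -1.0, -1.0, -1.0]) / (q @ q)

def left_ghr(f, q, mu, h=1e-6):
    """Assumed left GHR derivative w.r.t. q^mu, with central-difference partials."""
    rot = lambda e: qmul(qmul(mu, e), qinv(mu))          # e^mu = mu e mu^{-1}
    fa, fb, fc, fd = [(f(q + h * e) - f(q - h * e)) / (2 * h) for e in np.eye(4)]
    return 0.25 * (fa - qmul(fb, rot(I)) - qmul(fc, rot(J)) - qmul(fd, rot(K)))

omega = np.array([0.5, -0.3, 0.8, 0.2])      # illustrative constant quaternion
f = lambda q: qmul(q, q)                     # f(q) = q^2
g = lambda q: qmul(omega, q)                 # g(q) = omega q

q0 = np.array([0.7, -1.2, 0.4, 0.9])
mu = np.array([1.0, -2.0, 0.5, 3.0])

lhs = left_ghr(lambda q: qmul(f(q), g(q)), q0, mu)

# Novel product rule: the derivative of f is taken with respect to q^{g mu}.
g0 = g(q0)
rhs = qmul(f(q0), left_ghr(g, q0, mu)) + qmul(left_ghr(f, q0, qmul(g0, mu)), g0)
```

The key step is that moving $g$ past a rotated unit turns $g\,e^\mu$ into $e^{g\mu} g$, which is why the rule requires $g(q) \neq 0$ at the point of evaluation.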
Corollary 3.2 (Product Rule of Left HR). If the functions $f, g : \mathbb{H} \to \mathbb{H}$ have the left HR derivatives, then so too has their product $fg$, and where $\partial f/\partial q$, $\partial f/\partial q^i$, $\partial f/\partial q^j$, $\partial f/\partial q^k$ and so on are the left HR derivatives in Definition 4, and $\partial f/\partial q^{gi}$, $\partial f/\partial q^{gj}$, $\partial f/\partial q^{gk}$ and so on can be obtained by replacing $\mu$ with $gi$, $gj$, $gk$ in Definition 6.
Theorem 3.1 is also valid for the product of a quaternion-valued function and a real-valued function of quaternion variables, as stated below.
Corollary 3.3. If the functions $f : \mathbb{H} \to \mathbb{H}$ and $g : \mathbb{H} \to \mathbb{R}$ have the left GHR derivatives, then their product $fg$ satisfies the traditional product rule where $\partial f/\partial q^\mu$ and $\partial f/\partial q^{\mu*}$ are the left GHR derivatives in Definition 6.
Proof. Since $q^{g\mu} = q^\mu$ and $q^{g\mu*} = q^{\mu*}$ for a real-valued function $g$, the corollary follows.
Theorem 3.4 (Product Rule of Right GHR). If the functions $f, g : \mathbb{H} \to \mathbb{H}$ have the right GHR derivatives, then so too has their product $fg$, and where $\partial_r g/\partial q^{\mu f}$ and $\partial_r g/\partial q^{\mu f*}$ are obtained by replacing $\mu$ with $\mu f$ in Definition 7.
Corollary 3.5 (Product Rule of Right HR). If the functions $f, g : \mathbb{H} \to \mathbb{H}$ have the right HR derivatives, then so too has their product $fg$, and where $\partial_r f/\partial q$, $\partial_r f/\partial q^i$, $\partial_r f/\partial q^j$, $\partial_r f/\partial q^k$ etc. are the right HR derivatives in Definition 5.
Corollary 3.6. If the functions $f : \mathbb{H} \to \mathbb{R}$ and $g : \mathbb{H} \to \mathbb{H}$ have the right GHR derivatives, then their product $fg$ satisfies the traditional product rule where $\partial_r f/\partial q^\mu$ and $\partial_r f/\partial q^{\mu*}$ are the right GHR derivatives in Definition 7.
The proofs of Theorem 3.4, Corollary 3.5 and Corollary 3.6 are analogous to those of Theorem 3.1, Corollary 3.2 and Corollary 3.3, and are thus omitted.

The Chain Rule.
Another advantage of the GHR derivatives in Definition 6 and Definition 7 is that the chain rule can be obtained in a very simple form, as formulated in the following theorem.
Theorem 3.7 (Chain Rule of Left GHR). Let $S \subseteq \mathbb{H}$ and suppose $g : S \to \mathbb{H}$ has the left GHR derivative at an interior point $q$ of the set $S$. Let $T \subseteq \mathbb{H}$ be such that $g(q) \in T$ for all $q \in S$. Assume $f : T \to \mathbb{H}$ has left GHR derivatives at an inner point $g(q) \in T$, then the left GHR derivatives of the composite function $f(g(q))$ are as follows: where $\mu, \nu \in \mathbb{H}$, $\mu\nu \neq 0$.
Proof. The proof of Theorem 3.7 is given in Appendix C.
Corollary 3.8 (Chain Rule of Left HR). Let $S \subseteq \mathbb{H}$ and suppose $g : S \to \mathbb{H}$ has the left HR derivative at an interior point $q$ of the set $S$. Let $T \subseteq \mathbb{H}$ be such that $g(q) \in T$ for all $q \in S$. Assume $f : T \to \mathbb{H}$ has left HR derivatives at an inner point $g(q) \in T$, then the left HR derivatives of the composite function $f(g(q))$ are as follows: where $\mu \in \{1, i, j, k\}$.
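The chain rule can be checked numerically in one of its simplest instances. The sketch below assumes the form $\frac{\partial f(g(q))}{\partial q^\mu} = \sum_{\nu \in \{1, i, j, k\}} \frac{\partial f}{\partial g^\nu} \frac{\partial g^\nu}{\partial q^\mu}$, where $g^\nu = \nu g \nu^{-1}$, together with the assumed left GHR derivative $\frac{1}{4}\big(\frac{\partial f}{\partial q_a} - \frac{\partial f}{\partial q_b} i^\mu - \frac{\partial f}{\partial q_c} j^\mu - \frac{\partial f}{\partial q_d} k^\mu\big)$. The choices $f(g) = g^2$, $g(q) = q\omega q$ and all numerical values are illustrative.

```python
import numpy as np

I, J, K = np.eye(4)[1:]

def qmul(p, q):
    """Hamilton product of quaternions stored as arrays [a, b, c, d]."""
    s = p[0] * q[0] - p[1:] @ q[1:]
    v = p[0] * q[1:] + q[0] * p[1:] + np.cross(p[1:], q[1:])
    return np.concatenate(([s], v))

def qinv(q):
    return q * np.array([1.0, -1.0, -1.0, -1.0]) / (q @ q)

def rotate(x, nu):
    """x^nu = nu x nu^{-1}."""
    return qmul(qmul(nu, x), qinv(nu))

def left_ghr(f, q, mu, h=1e-6):
    """Assumed left GHR derivative w.r.t. q^mu, with central-difference partials."""
    fa, fb, fc, fd = [(f(q + h * e) - f(q - h * e)) / (2 * h) for e in np.eye(4)]
    return 0.25 * (fa - qmul(fb, rotate(I, mu)) - qmul(fc, rotate(J, mu))
                   - qmul(fd, rotate(K, mu)))

omega = np.array([0.5, -0.3, 0.8, 0.2])          # illustrative constant
g = lambda q: qmul(qmul(q, omega), q)            # inner function g(q) = q omega q
f = lambda y: qmul(y, y)                         # outer function f(g) = g^2

q0 = np.array([0.7, -0.6, 0.4, 0.9])
mu = np.array([1.0, -2.0, 0.5, 3.0])

lhs = left_ghr(lambda q: f(g(q)), q0, mu)

# Sum over nu in {1, i, j, k} of (df/dg^nu) * (d g^nu / d q^mu).
rhs = sum(
    qmul(left_ghr(f, g(q0), nu),
         left_ghr(lambda q, nu=nu: rotate(g(q), nu), q0, mu))
    for nu in np.eye(4)
)
```

The inner function $q\omega q$ is the kind of term that breaks the local analyticity condition in Remark 1, yet it poses no difficulty here because only real partial derivatives are involved.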
Theorem 3.7 is also valid for complex-valued and real-valued composite functions of quaternion variables, as stated in the following two corollaries, whose proofs are the same as that of Theorem 3.7 and are thus omitted.
Corollary 3.9. Let $S \subseteq \mathbb{H}$ and suppose $g : S \to \mathbb{C}$ has the left GHR derivative at an interior point $q$ of the set $S$. Let $T \subseteq \mathbb{C}$ be such that $g(q) \in T$ for all $q \in S$. Assume $f : T \to \mathbb{C}$ has the CR derivatives at an inner point $g(q) \in T$, then the left GHR derivatives of the composite function $f(g(q))$ are as follows: where $\mu \in \mathbb{H}$, $\mu \neq 0$, and $\partial f/\partial g$ and $\partial f/\partial g^*$ are the CR derivatives in the CR calculus.
Corollary 3.10. Let $S \subseteq \mathbb{H}$ and suppose $g : S \to \mathbb{R}$ has the left GHR derivative at an interior point $q$ of the set $S$. Let $T \subseteq \mathbb{R}$ be such that $g(q) \in T$ for all $q \in S$. Assume $f : T \to \mathbb{R}$ has real derivatives at an inner point $g(q) \in T$, then the left GHR derivatives of the composite function $f(g(q))$ are as follows: where $\mu \in \mathbb{H}$, $\mu \neq 0$, and $f'(g)$ is the real derivative of the real-valued function $f$.
Theorem 3.11 (Chain Rule of Right GHR). Let $S \subseteq \mathbb{H}$ and suppose $g : S \to \mathbb{H}$ has the right GHR derivative at an interior point $q$ of the set $S$. Let $T \subseteq \mathbb{H}$ be such that $g(q) \in T$ for all $q \in S$. Assume $f : T \to \mathbb{H}$ has right GHR derivatives at an inner point $g(q) \in T$, then the right GHR derivatives of the composite function $f(g(q))$ are as follows: where $\mu, \nu \in \mathbb{H}$, $\mu\nu \neq 0$.
Corollary 3.12 (Chain Rule of Right HR). Let $S \subseteq \mathbb{H}$ and suppose $g : S \to \mathbb{H}$ has the right HR derivative at an interior point $q$ of the set $S$. Let $T \subseteq \mathbb{H}$ be such that $g(q) \in T$ for all $q \in S$. Assume $f : T \to \mathbb{H}$ has right HR derivatives at an inner point $g(q) \in T$, then the right HR derivatives of the composite function $f(g(q))$ are as follows: where $\mu \in \{1, i, j, k\}$.
Corollary 3.13. Let $S \subseteq \mathbb{H}$ and suppose $g : S \to \mathbb{C}$ has the right GHR derivative at an interior point $q$ of the set $S$. Let $T \subseteq \mathbb{C}$ be such that $g(q) \in T$ for all $q \in S$. Assume $f : T \to \mathbb{C}$ has the CR derivatives at an inner point $g(q) \in T$, then the right GHR derivatives of the composite function $f(g(q))$ are as follows: where $\mu \in \mathbb{H}$, $\mu \neq 0$, and $\partial f/\partial g$ and $\partial f/\partial g^*$ are the CR derivatives in the CR calculus.
Corollary 3.14. Let $S \subseteq \mathbb{H}$ and suppose $g : S \to \mathbb{R}$ has the right GHR derivative at an interior point $q$ of the set $S$. Let $T \subseteq \mathbb{R}$ be such that $g(q) \in T$ for all $q \in S$. Assume $f : T \to \mathbb{R}$ has real derivatives at an inner point $g(q) \in T$, then the right GHR derivatives of the composite function $f(g(q))$ are as follows: where $\mu \in \mathbb{H}$, $\mu \neq 0$, and $f'(g)$ is the real derivative of the real-valued function $f$.
The proofs of Theorem 3.11 and Corollaries 3.12, 3.13 and 3.14 are essentially the same as those of Theorem 3.7 and Corollaries 3.8, 3.9 and 3.10, and are thus omitted.

Mean Value Theorem.
The mean value theorem is one of the most important theoretical tools in calculus. In this section, we propose a version of the mean value theorem for quaternion-valued functions of quaternion variables.
Theorem 3.15 (Mean Value Theorem of Left Form). Let $f : S \subseteq \mathbb{H} \to \mathbb{H}$ be continuous, and let its left HR derivatives exist and be continuous in the set $S$. If $q_0, q_1 \in S$ are such that the segment joining them is also in $S$, then where $\lambda = q_1 - q_0$, and $\partial f/\partial q$, $\partial f/\partial q^i$, $\partial f/\partial q^j$, $\partial f/\partial q^k$ and so on are the left HR derivatives in Definition 4.
Corollary 3.16. Let $f : S \subseteq \mathbb{H} \to \mathbb{R}$ be continuous, and let its left HR derivatives exist and be continuous in the set $S$. If $q_0, q_1 \in S$ are such that the segment joining them is also in $S$, then where $\lambda = q_1 - q_0$, and $\partial f/\partial q$ and $\partial f/\partial q^*$ are the left HR derivatives in Definition 4.
If $\lambda$ is sufficiently small in modulus, the right-hand side of (3.23) can be approximated as If the left HR derivatives of $f$ are Lipschitz continuous in the vicinity of $q_0$ and $q_1$ with the Lipschitz constant $L$, we can estimate the error in this approximation as follows:
Theorem 3.17 (Mean Value Theorem of Right Form). Let $f : S \subseteq \mathbb{H} \to \mathbb{H}$ be continuous, and let its right HR derivatives exist and be continuous in the set $S$. If $q_0, q_1 \in S$ are such that the segment joining them is also in $S$, then where $\lambda = q_1 - q_0$, and $\partial_r f/\partial q$, $\partial_r f/\partial q^i$, $\partial_r f/\partial q^j$, $\partial_r f/\partial q^k$ and so on are the right HR derivatives in Definition 5.
Corollary 3.18. Let $f : S \subseteq \mathbb{H} \to \mathbb{R}$ be continuous, and let its right HR derivatives exist and be continuous in the set $S$. If $q_0, q_1 \in S$ are such that the segment joining them is also in $S$, then where $\lambda = q_1 - q_0$, and $\partial f/\partial q$ and $\partial f/\partial q^*$ are the right HR derivatives in Definition 5.
The proofs of Theorem 3.17 and Corollary 3.18 are essentially the same as those of Theorem 3.15 and Corollary 3.16, and are thus omitted.

Taylor's Theorem.
In this section, we derive Taylor's theorem of quaternion-valued functions as a consequence of the univariate Taylor theorem.
where the remainder $R_k$ is given by
Theorem 3.20 (Taylor's Theorem of Left Form). Let $f : S \subseteq \mathbb{H} \to \mathbb{H}$ be continuous, and let its left HR derivatives up to third order exist and be continuous in the set $S$. If $q_0, q_0 + \lambda \in S$ are such that the segment joining them is also in $S$, then where $\mu, \nu \in \{1, i, j, k\}$, and $\partial^2 f/\partial q^\nu \partial q^\mu$ and $\partial^2 f/\partial q^{\nu*} \partial q^{\mu*}$ are the second order HR derivatives given in Section 2.2.
Proof. Define the auxiliary function $g(t) = f(q_0 + t\lambda)$ with $0 \leq t \leq 1$. By using the chain rule in Corollary 3.8, we obtain: where $\mu, \nu, \eta \in \{1, i, j, k\}$. The second order Taylor polynomial in Lemma 3.19 gives This is equivalent to where This integral contains three factors of $\lambda$, and the remaining factors are bounded. So, $R_2$ is of the order of $|\lambda|^3$, making the fraction $|R_2|/|\lambda|^3$ bounded as $\lambda \to 0$. Hence, the first equality of the theorem follows, and the second equality can be proved in a similar manner.
Corollary 3.21. Let $f : S \subseteq \mathbb{H} \to \mathbb{R}$ be continuous, and let its left HR derivatives up to third order exist and be continuous in the set $S$. If $q_0, q_0 + \lambda \in S$ are such that the segment joining them is also in $S$, then where $\nu \in \{1, i, j, k\}$, and $\partial^2 f/\partial q^\nu \partial q$ and $\partial^2 f/\partial q^{\nu*} \partial q^*$ are the second order HR derivatives in Section 2.2.
Proof. By the term $\partial \Re(q)/\partial q$ given in Table 1 and the chain rule in Corollary 3.8, the corollary can be proved similarly to Corollary 3.16.
Theorem 3.22 (Taylor's Theorem of Center Form). Let $f : S \subseteq \mathbb{H} \to \mathbb{R}$ be continuous, and let its left HR derivatives up to third order exist and be continuous in the set $S$. If $q_0, q_0 + \lambda \in S$ are such that the segment joining them is also in $S$, then where $\mu, \nu \in \{1, i, j, k\}$, and $\partial^2 f/\partial q^\nu \partial q^{\mu*}$ and $\partial^2 f/\partial q^{\nu*} \partial q^\mu$ are the second order HR derivatives given in Section 2.2.
Theorem 3.23 (Taylor's Theorem of Right Form). Let $f : S \subseteq \mathbb{H} \to \mathbb{H}$ be continuous, and let its right HR derivatives up to third order exist and be continuous in the set $S$. If $q_0, q_0 + \lambda \in S$ are such that the segment joining them is also in $S$, then where $\mu, \nu \in \{1, i, j, k\}$,
Corollary 3.24. Let $f : S \subseteq \mathbb{H} \to \mathbb{R}$ be continuous, and let its right HR derivatives up to third order exist and be continuous in the set $S$. If $q_0, q_0 + \lambda \in S$ are such that the segment joining them is also in $S$, then where $\nu \in \{1, i, j, k\}$,
Remark 8. The Taylor expansion in Theorem 3.20 is concisely expressed using the HR derivatives. This is different from the Taylor expansion given by Schwartz [16], which decomposes a quaternion $q$ into two mutually perpendicular quaternions in a local coordinate system. In contrast, our idea is to extend the quaternion $q$ to an augmented quaternion based on quaternion involutions [48]. Schwartz has also stated that his Taylor expansion would cause trouble when the function has terms $q\omega q$, where $\omega$ is a general quaternion, which limits the admissible class of functions to real functions, the fixed point to be real, and so on. Notice that there are no such restrictions in Theorem 3.20, which only requires the same condition as that in [16], that is, the functions $f$ should be real analytic functions.

Method of Steepest Descent.
In the CR calculus, the steepest descent direction of a real-valued function f(z) is expressed as $-\frac{\partial f}{\partial z^*}$ in [17]. For the quaternion case, we now show that the direction of steepest descent of a real-valued function f(q) is $-\frac{\partial f}{\partial q^*}$, that is, a generic extension of that in R and C. Using the first order Taylor expansion in Corollary 3.21, we have Set $q - q_k = \alpha d_k$, $\alpha \in \mathbb{R}^+$, then the second term can be neglected by shrinking α. In this case, $d_k$ is a descent direction if and only if $\mathfrak{R}\left(\frac{\partial f(q_k)}{\partial q} d_k\right) < 0$. Clearly, the fastest way to decrease $f(q) - f(q_k)$ is to minimize $\mathfrak{R}\left(\frac{\partial f(q_k)}{\partial q} d_k\right)$. It then follows that Hence, $\mathfrak{R}\left(\frac{\partial f(q_k)}{\partial q} d_k\right)$ is minimized if and only if $d_k = -\frac{\partial f(q_k)}{\partial q^*}$. The iterative rule of the steepest descent method can therefore be expressed as
$$q_{k+1} = q_k - \alpha \frac{\partial f(q_k)}{\partial q^*}$$
where $\alpha \in \mathbb{R}^+$ is the step size.
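The iteration can be illustrated with a minimal numerical sketch. It assumes that, for a real-valued f, the conjugate derivative equals one quarter of the real gradient arranged as a quaternion, $\frac{\partial f}{\partial q^*} = \frac{1}{4}\left(\frac{\partial f}{\partial q_a} + i\frac{\partial f}{\partial q_b} + j\frac{\partial f}{\partial q_c} + k\frac{\partial f}{\partial q_d}\right)$, and approximates the real partial derivatives by central differences. The test function $|q - c|^2$, the names `conj_gradient`, `steepest_descent` and `target`, and the step size are illustrative choices, not taken from the paper.

```python
import numpy as np

def conj_gradient(f, q, h=1e-6):
    """Numerical df/dq* of a real-valued f, stored as [a, b, c, d].

    Assumes df/dq* = (f_a + i f_b + j f_c + k f_d) / 4, where f_a, ..., f_d
    are the real partial derivatives (approximated by central differences).
    """
    grad = np.zeros(4)
    for n in range(4):
        step = np.zeros(4)
        step[n] = h
        grad[n] = (f(q + step) - f(q - step)) / (2.0 * h)
    return grad / 4.0

def steepest_descent(f, q0, alpha, iters):
    """Iterate q_{k+1} = q_k - alpha * df(q_k)/dq*."""
    q = np.array(q0, dtype=float)
    for _ in range(iters):
        q = q - alpha * conj_gradient(f, q)
    return q

# Illustrative cost: f(q) = |q - target|^2, minimized at q = target.
target = np.array([1.0, -2.0, 0.5, 3.0])
f = lambda q: float(np.sum((q - target) ** 2))

q_min = steepest_descent(f, np.zeros(4), alpha=1.0, iters=60)
print(q_min)
```

For this cost the assumed conjugate derivative is $(q - c)/2$, so each step halves the distance to the minimizer when α = 1.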

Applications of the GHR calculus
The GHR calculus is significant for quaternion analysis, and can be used in optimization, statistics, signal processing, machine learning and other fields.

The GHR Derivatives of Elementary Functions.
We now present the GHR derivatives of some elementary quaternion-valued functions, which are often used in nonlinear adaptive filters and quaternion-valued neural networks.
Example 3 (Power Function). Find the GHR derivative of the power function f : H → H given by $f(q) = q^n$, where n is any positive integer.
Solution: By using the product rule in Theorem 3.1, it follows that where the term $\frac{\partial q}{\partial q^{\mu}}\mu$, given in Table 1, was used in the last equality. Note that the above expression is a recurrence in $\frac{\partial q^n}{\partial q^{\mu}}\mu$. Expanding this expression and using the initial condition In a similar manner, we have which is equivalent to

Example 4 (Exponential Function). Find the GHR derivative of the function f : H → H given by
$$\exp(q) = \sum_{n=0}^{+\infty} \frac{q^n}{n!} \qquad (4.6)$$
Solution: From (4.3), it follows that In a similar manner, we have

Remark 9. The exponential function is the most important elementary function, as both trigonometric functions and hyperbolic functions can be expressed in terms of the exponential function. The elementary function in Example 4 is a power series, and does not change the direction of the vector part of the quaternion. Therefore, such elementary functions commute with the quaternion q, i.e., f(q)q = qf(q), giving an important property, $f^*(q) = f(q^*)$, which can be used in practical applications, such as quaternion neural networks [48] and quaternion nonlinear adaptive filters [6]. It is important to note that if the quaternion variable q degenerates into a real variable x in the definitions of elementary functions in this subsection, then the GHR derivatives simplify into the real derivative, e.g., the GHR derivative of the power function in (4.3) becomes $nx^{n-1}$. Therefore, the GHR derivatives are a generalized form of the real derivatives, and the real derivatives are a special case of the GHR derivatives.
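The two properties stated in Remark 9 can be checked numerically. The sketch below evaluates a truncated version of the power series (4.6) and tests f(q)q = qf(q) and $f^*(q) = f(q^*)$ at one sample point. Quaternions are stored as arrays [a, b, c, d]; the helper names `qmul`, `qconj` and `qexp`, the truncation length and the sample point are illustrative assumptions. The comparison value `closed` uses the standard closed form $e^{a}(\cos|v| + \frac{v}{|v|}\sin|v|)$ for q = a + v.

```python
import numpy as np

def qmul(p, q):
    """Hamilton product of quaternions stored as arrays [a, b, c, d]."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([
        a1*a2 - b1*b2 - c1*c2 - d1*d2,
        a1*b2 + b1*a2 + c1*d2 - d1*c2,
        a1*c2 - b1*d2 + c1*a2 + d1*b2,
        a1*d2 + b1*c2 - c1*b2 + d1*a2,
    ])

def qconj(q):
    """Quaternion conjugate."""
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def qexp(q, terms=40):
    """Truncated power series sum_{n < terms} q^n / n!."""
    total = np.array([1.0, 0.0, 0.0, 0.0])
    term = np.array([1.0, 0.0, 0.0, 0.0])
    for n in range(1, terms):
        term = qmul(term, q) / n   # q^n / n! from q^(n-1) / (n-1)!
        total = total + term
    return total

q = np.array([0.3, -1.2, 0.7, 0.4])

commutes = np.allclose(qmul(qexp(q), q), qmul(q, qexp(q)))
conj_ok = np.allclose(qconj(qexp(q)), qexp(qconj(q)))

# Closed form for comparison: exp(a + v) = e^a (cos|v| + v/|v| sin|v|).
nv = np.linalg.norm(q[1:])
closed = np.exp(q[0]) * np.concatenate(([np.cos(nv)], q[1:] / nv * np.sin(nv)))
print(commutes, conj_ok, np.allclose(qexp(q), closed))
```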

Derivation of the QLMS Algorithm
In this section, we rederive the quaternion least mean square (QLMS) algorithm given in [9] according to the rules of the GHR calculus. The same real-valued quadratic cost function as in LMS and CLMS is used, that is where The weight update of QLMS is then given by where α is the step size, and $\nabla_{w^*}J(n)$ is the conjugate gradient of J(n) with respect to $w^*$. In Section 3.5, we have shown that the conjugate gradient is the direction of steepest descent. By using the novel product rule in Corollary 3.2, the gradient is therefore calculated by To find $\nabla_{w^*}J(n)$, we need to calculate the following two derivatives where the terms $\frac{\partial(q\nu)}{\partial q^*}$ and $\frac{\partial(\omega q^*)}{\partial q^{\mu*}}\mu$ are given in Table 1, and are used in the last equalities above. Substituting (4.13) and (4.14) into (4.12) yields Finally, the update of the adaptive weight vector of QLMS becomes where the constant $\frac{1}{2}$ in (4.15) is absorbed into α.
Remark 10. Note that if we start from $y(n) = w^H(n)x(n)$, the final update rule becomes $w(n+1) = w(n) + \alpha x(n)e^*(n)$. The QLMS algorithm in (4.16) is different from the QLMS in [9], due to the use of a different product rule. Although the traditional product rule was used to derive the weight update rule in [9], our counter-examples in Section 2.3 illustrate that the traditional product rule is not applicable to the HR calculus. We therefore use the novel product rule within the GHR calculus to derive the weight update rule in (4.12). Another advantage of the QLMS in (4.16), derived based on the GHR calculus, is that it has the same generic form as that of the CLMS [5].
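A minimal sketch of a QLMS-type filter is given below for a noise-free system identification task. It assumes the output $y(n) = w^T(n)x(n)$, the error $e(n) = d(n) - y(n)$ and the update $w(n+1) = w(n) + \alpha e(n)x^*(n)$, which is the $w^T$ counterpart of the $w^H$ rule quoted in Remark 10. These forms, the helper names (`qmul`, `qconj`, `qlms`), the random regressors and the step size are assumptions made for illustration, not the paper's numbered equations.

```python
import numpy as np

def qmul(p, q):
    """Hamilton product; the last array axis holds [a, b, c, d]."""
    a1, b1, c1, d1 = (p[..., n] for n in range(4))
    a2, b2, c2, d2 = (q[..., n] for n in range(4))
    return np.stack([
        a1*a2 - b1*b2 - c1*c2 - d1*d2,
        a1*b2 + b1*a2 + c1*d2 - d1*c2,
        a1*c2 - b1*d2 + c1*a2 + d1*b2,
        a1*d2 + b1*c2 - c1*b2 + d1*a2,
    ], axis=-1)

def qconj(q):
    """Quaternion conjugate along the last axis."""
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def qlms(x_seq, d_seq, taps, alpha):
    """Assumed QLMS recursion with y = w^T x and update w + alpha e x*."""
    w = np.zeros((taps, 4))
    for x, d in zip(x_seq, d_seq):
        y = qmul(w, x).sum(axis=0)          # y(n) = sum_m w_m(n) x_m(n)
        e = d - y                           # e(n) = d(n) - y(n)
        w = w + alpha * qmul(e, qconj(x))   # w_m <- w_m + alpha e(n) x_m*(n)
    return w

# Illustrative data: random quaternion regressors and a fixed teacher weight vector.
rng = np.random.default_rng(0)
taps = 4
w_true = rng.standard_normal((taps, 4))
x_seq = rng.standard_normal((2000, taps, 4))
d_seq = np.array([qmul(w_true, x).sum(axis=0) for x in x_seq])

w_hat = qlms(x_seq, d_seq, taps, alpha=0.02)
print(np.max(np.abs(w_hat - w_true)))
```

Under these assumptions the a posteriori error is $e(n)(1 - \alpha\|x(n)\|^2)$, so the sketch is stable for $0 < \alpha\|x(n)\|^2 < 2$.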

Derivation of the WL-QLMS Algorithm
In this section, we rederive the WL-QLMS algorithm based on the quaternion widely linear model given in [12,29,31,32]. The cost function to be minimized is a real-valued function of quaternion variables and The weight updates are then given by where α is the step size, and $\nabla_{h^*}J(n)$, $\nabla_{g^*}J(n)$, $\nabla_{u^*}J(n)$ and $\nabla_{v^*}J(n)$ are respectively the conjugate gradients of J(n) with respect to $h^*$, $g^*$, $u^*$ and $v^*$. By using the novel product rule in Corollary 3.2, $\nabla_{h^*}J(n)$ is calculated by To find $\nabla_{h^*}J(n)$, we need to calculate the following two derivatives where the terms $\frac{\partial(q^*\nu)}{\partial q^*}$ and $\frac{\partial(\omega q)}{\partial q^{\mu*}}\mu$ are given in Table 1, and are used in the last equalities above. Substituting (4.22) and (4.23) into (4.21) yields The gradients $\nabla_{g^*}J(n)$, $\nabla_{u^*}J(n)$ and $\nabla_{v^*}J(n)$ can be calculated in a similar way to (4.24) and are given by Finally, the weight update within WL-QLMS can be expressed as where the constant $\frac{1}{2}$ in (4.25) is absorbed into α.
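The widely linear recursion can be sketched in the same spirit. The code below assumes the widely linear model $y(n) = h^H x + g^H x^i + u^H x^j + v^H x^k$ with involutions $x^i = -ixi$, $x^j = -jxj$, $x^k = -kxk$, and assumes updates of the form $h \leftarrow h + \alpha x e^*$, $g \leftarrow g + \alpha x^i e^*$, and so on, applied to the stacked weight vector [h; g; u; v]. The model form, the update form and all names (`involutions`, `wl_qlms`, `wa_true`) are assumptions for illustration and are not the paper's numbered equations.

```python
import numpy as np

def qmul(p, q):
    """Hamilton product; the last array axis holds [a, b, c, d]."""
    a1, b1, c1, d1 = (p[..., n] for n in range(4))
    a2, b2, c2, d2 = (q[..., n] for n in range(4))
    return np.stack([
        a1*a2 - b1*b2 - c1*c2 - d1*d2,
        a1*b2 + b1*a2 + c1*d2 - d1*c2,
        a1*c2 - b1*d2 + c1*a2 + d1*b2,
        a1*d2 + b1*c2 - c1*b2 + d1*a2,
    ], axis=-1)

def qconj(q):
    """Quaternion conjugate along the last axis."""
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def involutions(x):
    """Stack x, x^i, x^j, x^k along the tap axis (sign flips of components)."""
    signs = np.array([[1, 1, 1, 1],
                      [1, 1, -1, -1],
                      [1, -1, 1, -1],
                      [1, -1, -1, 1]], dtype=float)
    return np.concatenate([x * s for s in signs], axis=0)

def wl_qlms(x_seq, d_seq, taps, alpha):
    """Assumed WL-QLMS recursion on the stacked weights wa = [h; g; u; v]."""
    wa = np.zeros((4 * taps, 4))
    for x, d in zip(x_seq, d_seq):
        xa = involutions(x)                      # augmented input
        y = qmul(qconj(wa), xa).sum(axis=0)      # y = wa^H xa
        e = d - y
        wa = wa + alpha * qmul(xa, qconj(e))     # wa <- wa + alpha xa e*
    return wa

# Illustrative widely linear teacher system with random regressors.
rng = np.random.default_rng(1)
taps = 3
wa_true = rng.standard_normal((4 * taps, 4))
x_seq = rng.standard_normal((3000, taps, 4))
d_seq = np.array([qmul(qconj(wa_true), involutions(x)).sum(axis=0) for x in x_seq])

wa_hat = wl_qlms(x_seq, d_seq, taps, alpha=0.005)
print(np.max(np.abs(wa_hat - wa_true)))
```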

Derivation of Quaternion Nonlinear Adaptive Filtering Algorithm.
In this section, we derive the quaternion nonlinear gradient descent (QNGD) algorithm given in [6] according to the rules of the GHR calculus. The same real-valued quadratic cost function as in LMS and CLMS is used, that is where and Φ is the quaternion nonlinearity. The weight update is given by where α is the step size, and $\nabla_{w^*}J(n)$ is the conjugate gradient of J(n) with respect to $w^*$. By using the chain rule in Corollary 3.8, the gradient is calculated by where the derivatives of $|e(n)|^2$ can be calculated using the term $\frac{\partial |q|^2}{\partial q^{\mu*}}\mu$ in Table 1. It then follows that To find $\nabla_{w^*}J(n)$, we need to calculate the following derivative By the chain rule in Corollary 3.8, it follows that where the HR derivatives of $s^*(n)$ can be calculated by using the term $\frac{\partial(\omega q^*)}{\partial q^*}$ in Table 1, giving Using the property (2.2) and the term $\frac{\partial(\omega q^*)}{\partial q^{\mu*}}\mu$ in Table 1, we have , we arrive at By combining (4.31), (4.32) and (4.36) with (4.30), we obtain Finally, the update of the adaptive weight vector of the QNGD algorithm can be expressed as where the constant $\frac{1}{2}$ in (4.37) is absorbed into α.
Remark 12. If the function Φ(q) = q, then the QNGD algorithm degenerates into the QLMS in Section 4.2, that is, the QLMS algorithm is a special case of QNGD. When the GHR calculus is used to derive the QNGD algorithm, the nonlinear function Φ does not need to satisfy the odd-symmetry condition required in [6]. The augmented QNGD (AQNGD) and widely linear QNGD (WL-QNGD) algorithms can be derived in the same way. In order to save space, we leave this to the interested reader.

Conclusions
A new framework for the efficient computation of quaternion derivatives, termed the GHR calculus, has been established. The proposed methodology has been shown to greatly relax the existence condition for the derivatives of functions of quaternion variables, and to simplify the calculation of quaternion derivatives through its novel product and chain rules. Unlike the existing quaternion derivatives, the GHR calculus is general and can be used for both analytic and non-analytic functions of quaternion variables. The core of the GHR calculus is the use of quaternion rotation to overcome the non-commutativity of the quaternion product, and the use of quaternion involutions to obtain a quaternion basis, such as $\{q, q^i, q^j, q^k\}$ or their conjugates. The use of quaternion involutions has been instrumental in establishing two fundamental results: the quaternion mean value theorem and Taylor's theorem. The proposed framework allows for real- and complex-valued optimization algorithms to be extended to the quaternion field in a generic, compact and intuitive way. Illustrative examples in adaptive signal processing demonstrate the effectiveness of the proposed framework.

Appendix.
In this section, we give detailed proofs of the theorems and collect some important results in a table to make them more accessible.
Appendix A: The derivation of HR calculus.
For any quaternion-valued function f(q) ∈ H, we shall start from the following representation (since H and $\mathbb{R}^4$ are isomorphic as real vector spaces)
$$f(q) = f_a(q_a, q_b, q_c, q_d) + i f_b(q_a, q_b, q_c, q_d) + j f_c(q_a, q_b, q_c, q_d) + k f_d(q_a, q_b, q_c, q_d)$$
where $f_a$, $f_b$, $f_c$ and $f_d$ are real-valued functions. Then, the function f can be seen as a function of the four independent real-valued variables $q_a$, $q_b$, $q_c$ and $q_d$, and the differential of f can be expressed as follows [4]:
$$df = \frac{\partial f}{\partial q_a}dq_a + \frac{\partial f}{\partial q_b}dq_b + \frac{\partial f}{\partial q_c}dq_c + \frac{\partial f}{\partial q_d}dq_d \qquad (5.2)$$
or
$$df = dq_a\frac{\partial f}{\partial q_a} + dq_b\frac{\partial f}{\partial q_b} + dq_c\frac{\partial f}{\partial q_c} + dq_d\frac{\partial f}{\partial q_d} \qquad (5.3)$$
where $\frac{\partial f}{\partial q_a}$, $\frac{\partial f}{\partial q_b}$, $\frac{\partial f}{\partial q_c}$ and $\frac{\partial f}{\partial q_d}$ are the partial derivatives of f with respect to $q_a$, $q_b$, $q_c$ and $q_d$, respectively. Note that the two equations are identical since $dq_a$, $dq_b$, $dq_c$ and $dq_d$ are real quantities. As a result, both equations are equally valid as a starting point for the derivation of the HR calculus.

The Left Case (5.2). There are two ways, (1.24) and (1.25), to link the real and quaternion differentials, which correspond to the HR derivatives and conjugate HR derivatives, respectively.

A1. The Left HR Derivatives: From (1.24), the differentials of the components of a quaternion can be formulated as By inserting (5.4) into (5.2), the differential of the function f becomes

The Right Case (5.3). There are two ways, (1.24) and (1.25), to link the real and quaternion differentials, which correspond to the HR derivatives and conjugate HR derivatives, respectively.
A3. The Right HR Derivatives: By applying a rotation transformation on both sides of (5.4), we have Then, by substituting (5.12) into (5.

A4. The Right Conjugate HR Derivatives: By applying a rotation transformation on both sides of (5.8), we have

Appendix B: The Proof of The Product Rule.
Within the GHR calculus, when a quaternion-valued function is postmultiplied by a real-valued function, the novel product rule degenerates into the traditional product rule. This is stated in the next lemma. Proof.
Using the property $\frac{\partial(\eta f)}{\partial q^{\mu}} = \eta\frac{\partial f}{\partial q^{\mu}}$ in (3.2), we have The first part of the lemma is proved, and the second part can be shown in a similar way.
Proof of Theorem 3.1. From (1.20), it is seen that $\{1, i^{\mu}, j^{\mu}, k^{\mu}\}$ is another orthogonal basis of H. Then the quaternion-valued function g can be expressed in the following way By applying Lemma 5.1 to the right side of (5.23), it can be shown that Next, by using the result $\frac{\partial(f\eta)}{\partial q^{\mu}} = \frac{\partial f}{\partial q^{\eta\mu}}\eta$ in (3.2), we have where Definition 6 and (1.14) were used in the last equality above. Grouping $\frac{\partial f}{\partial q_a}$, $\frac{\partial f}{\partial q_b}$, $\frac{\partial f}{\partial q_c}$ and $\frac{\partial f}{\partial q_d}$ in (5.25) yields where (1.14) was used in the second to last equality above, and Definition 6 was used in the last equality. Hence, the first part of the theorem follows, and the second part can be proved in an analogous manner.
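The outcome of this argument can be checked numerically. The sketch below assumes that the left GHR derivative has the form $\frac{\partial f}{\partial q^{\mu}} = \frac{1}{4}\left(\frac{\partial f}{\partial q_a} - \frac{\partial f}{\partial q_b}i^{\mu} - \frac{\partial f}{\partial q_c}j^{\mu} - \frac{\partial f}{\partial q_d}k^{\mu}\right)$ with $\nu^{\mu} = \mu\nu\mu^{-1}$, and that the product rule reads $\frac{\partial(fg)}{\partial q^{\mu}} = f\frac{\partial g}{\partial q^{\mu}} + \frac{\partial f}{\partial q^{g\mu}}g$, which is what the result $\frac{\partial(f\eta)}{\partial q^{\mu}} = \frac{\partial f}{\partial q^{\eta\mu}}\eta$ suggests. Both forms, the test functions and all helper names are assumptions for illustration; the real partial derivatives are approximated by central differences.

```python
import numpy as np

UNITS = [np.array(u, dtype=float) for u in ([0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1])]

def qmul(p, q):
    """Hamilton product of quaternions stored as arrays [a, b, c, d]."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([
        a1*a2 - b1*b2 - c1*c2 - d1*d2,
        a1*b2 + b1*a2 + c1*d2 - d1*c2,
        a1*c2 - b1*d2 + c1*a2 + d1*b2,
        a1*d2 + b1*c2 - c1*b2 + d1*a2,
    ])

def qconj(q):
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def rotate(nu, mu):
    """Quaternion rotation nu^mu = mu nu mu^{-1}."""
    return qmul(qmul(mu, nu), qconj(mu) / np.dot(mu, mu))

def ghr_left(f, q, mu, h=1e-6):
    """Assumed left GHR derivative, built from central-difference partials."""
    partials = []
    for n in range(4):
        step = np.zeros(4)
        step[n] = h
        partials.append((f(q + step) - f(q - step)) / (2.0 * h))
    total = partials[0]
    for part, unit in zip(partials[1:], UNITS):
        total = total - qmul(part, rotate(unit, mu))
    return total / 4.0

# Illustrative test functions f(q) = q a q and g(q) = q* b + c.
a = np.array([0.2, -0.5, 1.0, 0.3])
b = np.array([-0.7, 0.1, 0.4, 0.9])
c = np.array([1.5, 0.0, -0.3, 0.2])
f = lambda q: qmul(qmul(q, a), q)
g = lambda q: qmul(qconj(q), b) + c
fg = lambda q: qmul(f(q), g(q))

q0 = np.array([0.4, 1.3, -0.8, 0.6])
mu = np.array([0.7, -0.4, 1.1, 0.2])

lhs = ghr_left(fg, q0, mu)
rhs = qmul(f(q0), ghr_left(g, q0, mu)) + qmul(ghr_left(f, q0, qmul(g(q0), mu)), g(q0))
print(np.max(np.abs(lhs - rhs)))
```

The second term is evaluated with the rotated index $g(q_0)\mu$, which is where the rule differs from the traditional product rule.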
Appendix C: The Proof of The Chain Rule.
To prove the chain rule, the following lemma shall be used.
Lemma 5.2. Let $q = q_a + iq_b + jq_c + kq_d$, where $q_a, q_b, q_c, q_d \in \mathbb{R}$, then the partial derivatives of the quaternion-valued composite function f(g(q)) satisfy the following chain rule where $\xi \in \{q_a, q_b, q_c, q_d\}$ and $\nu \in \mathbb{H}$, $\nu \neq 0$.
Proof. Let $g(q) = g_a + ig_b + jg_c + kg_d$, where $g_a, g_b, g_c, g_d \in \mathbb{R}$. Then, the function f(g(q)) can be seen as a function of the four real-valued variables $g_a$, $g_b$, $g_c$ and $g_d$, and the partial derivative of f(g(q)) can be expressed as (5.28) By applying the quaternion rotation transform $(\cdot)^{\nu}$ to both sides of (1.24) and replacing q with g, the real-valued components $g_a$, $g_b$, $g_c$ and $g_d$ can be expressed as Hence, the first equality of the lemma follows, and the second equality can be derived in a similar fashion.
Proof of Theorem 3.7. By using Definition 6, the left HR derivative of the product f g can be expressed as Hence, the first equality of the theorem follows, and the other equalities can be derived in a similar fashion.
Appendix D: Fundamental Results Based on the GHR Derivatives.
Several of the most important results on the left GHR derivatives are summarized in Table 1 for ease of reference, where ν, ω and λ are constant quaternions, q is a quaternion-valued variable, and µ is any quaternion constant or function. To show how to use Table 1, the following examples are presented.
Example 5. Find the GHR derivative of the function f : H → H given by
$$f(q) = q = q_a + iq_b + jq_c + kq_d \qquad (5.35)$$
Solution: By using Definition 6, it follows that In a similar manner, it can be shown that

[Table 1: Fundamental results for the left GHR derivatives, where ν, ω, λ ∈ H are constants.]
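Example 5 can be reproduced numerically. The sketch below assumes that Definition 6 has the form $\frac{\partial f}{\partial q^{\mu}} = \frac{1}{4}\left(\frac{\partial f}{\partial q_a} - \frac{\partial f}{\partial q_b}i^{\mu} - \frac{\partial f}{\partial q_c}j^{\mu} - \frac{\partial f}{\partial q_d}k^{\mu}\right)$, with plus signs for the conjugate derivative and $\nu^{\mu} = \mu\nu\mu^{-1}$. Under that assumption, a direct calculation for f(q) = q gives $\frac{\partial q}{\partial q^{\mu}} = \mathfrak{R}(\mu)\mu^{-1}$, which reduces to 1 for µ = 1, while the conjugate derivative for µ = 1 equals −1/2. The helper names and the sample values are illustrative.

```python
import numpy as np

UNITS = [np.array(u, dtype=float) for u in ([0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1])]

def qmul(p, q):
    """Hamilton product of quaternions stored as arrays [a, b, c, d]."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([
        a1*a2 - b1*b2 - c1*c2 - d1*d2,
        a1*b2 + b1*a2 + c1*d2 - d1*c2,
        a1*c2 - b1*d2 + c1*a2 + d1*b2,
        a1*d2 + b1*c2 - c1*b2 + d1*a2,
    ])

def qinv(q):
    """Quaternion inverse q* / |q|^2."""
    return q * np.array([1.0, -1.0, -1.0, -1.0]) / np.dot(q, q)

def rotate(nu, mu):
    """Quaternion rotation nu^mu = mu nu mu^{-1}."""
    return qmul(qmul(mu, nu), qinv(mu))

def ghr_left(f, q, mu, conjugate=False, h=1e-6):
    """Assumed left GHR derivative (conjugate=True flips the three signs)."""
    partials = []
    for n in range(4):
        step = np.zeros(4)
        step[n] = h
        partials.append((f(q + step) - f(q - step)) / (2.0 * h))
    sign = 1.0 if conjugate else -1.0
    total = partials[0]
    for part, unit in zip(partials[1:], UNITS):
        total = total + sign * qmul(part, rotate(unit, mu))
    return total / 4.0

identity = lambda q: q
q0 = np.array([0.4, 1.3, -0.8, 0.6])
one = np.array([1.0, 0.0, 0.0, 0.0])
mu = np.array([0.5, 1.0, -2.0, 0.3])

d_one = ghr_left(identity, q0, one)                   # derivative w.r.t. q
d_conj = ghr_left(identity, q0, one, conjugate=True)  # derivative w.r.t. q*
d_mu = ghr_left(identity, q0, mu)                     # derivative w.r.t. q^mu
expected = mu[0] * qinv(mu)                           # R(mu) mu^{-1}
print(d_one, d_conj, np.allclose(d_mu, expected))
```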

Lemma 3.19 (Taylor's Theorem for Univariate Functions [50]). Let f : D ⊆ R → R be (k + 1)-times continuously differentiable on an open interval D. If x ∈ D, then

The proofs of Theorems 3.22 and 3.23 and Corollary 3.24 are essentially the same as those of Theorem 3.20 and Corollary 3.21, and are thus omitted.