On the differentiation of a composite function with a generalized vector argument on homogeneous time scales

The paper proves a theorem on the differentiation of a composite function with a generalized vector argument. The theorem is formulated in terms of the delta derivative, which in the case of homogeneous time scales incorporates both the ordinary derivative and the difference operator. The term “generalized vector argument” implies that a composite function is allowed to depend not only on some variables but also on their delta derivatives. A formula in the theorem shows how the higher-order delta and partial derivatives of a composite function commute. Moreover, it enables reducing the order of the delta derivative, making computations simpler and more efficient. The computational efficiency of the formula was analysed on the basis of experiments in the symbolic computation software Mathematica.


INTRODUCTION
Various areas of mathematics, physics, control theory, etc. require the differentiation of a composite function with a vector argument whose components are functions of a common variable (for instance, the time variable t). In order to solve some theoretical problems, it is useful to have a rule for the commutation of the higher-order total and partial derivatives of a function. The theorem proved in this paper offers such a rule (see Section 3). In order to make the theorem applicable to homogeneous time scales, we formulated it in terms of the delta derivative, which merges both the ordinary derivative (in the case of a continuous time scale) and the difference operator (in the case of a uniformly sampled discrete time scale). The special case of this theorem, in which a composite function depends only on some variables but not on their derivatives, was proved in [7] for the ordinary derivative and then applied in [6] and [8] to prove the main results there. Afterwards, the extension of the theorem to the case of homogeneous time scales was presented in [4] as a supplementary result. Unlike [7] and [4], in this paper we address a more general case, in which the vector argument contains not only some variables but also their derivatives of various orders. The main motivation for deriving the theorem for such a case was the necessity to prove the main result of [5]. A secondary benefit of the theorem is its computational efficiency: the formula from the theorem allows one to reduce the time required for the computation of certain combinations of derivatives (see Sections 4 and 5).
Note that although the theorem is stated in the single language of the time scale formalism, the proof is given separately for the continuous- and discrete-time cases, since, to the best of our knowledge, a single explicit formula for the higher-order delta derivative of a composite function is not available. To prove the continuous-time part, we used Mishkov's formula [10], which explicitly expresses the higher-order total derivative of a composite function with a vector argument. In the discrete-time case, basic tools from the time scale calculus [3] were employed.

PRELIMINARIES
In order to present the main result of this paper for both the ordinary derivative and the difference operator simultaneously, we employ the time scale formalism, which is briefly recalled in this section. A more thorough introduction to the time scale calculus can be found in [3].
A time scale $\mathbb{T}$ is an arbitrary nonempty (topologically) closed subset of the real numbers $\mathbb{R}$. Although the variety of time scales is wide (see, for example, [1,3]), this paper focuses on only two of them, namely, the continuous time scale $\mathbb{T} = \mathbb{R}$ and the discrete time scale $\mathbb{T} = \tau\mathbb{Z} := \{\tau k : k \in \mathbb{Z}\}$ for $\tau > 0$. The most important notions of the time scale calculus are the forward jump operator $\sigma$, the delta derivative $\Delta$, and the graininess function $\mu$. The applications of $\sigma$ and $\Delta$ to a function $\xi : \mathbb{T} \to \mathbb{R}$, as well as the values of the graininess function $\mu$, are presented in Table 1 for the two cases $\mathbb{T} = \mathbb{R}$ and $\mathbb{T} = \tau\mathbb{Z}$. A time scale $\mathbb{T}$ is called homogeneous if $\mu \equiv \mathrm{const}$, and, as can be seen from Table 1, both time scales $\mathbb{T} = \mathbb{R}$ and $\mathbb{T} = \tau\mathbb{Z}$ possess this property. Note that for the simplicity of exposition we omit the variable $t$, i.e., we use the shorter notation $\xi$ instead of $\xi(t)$. Moreover, we denote by $\xi^{\langle n\rangle}$ the delta derivative of an arbitrary order $n$, whereas $\xi^{\sigma^n}$ stands for the $n$-fold application of the forward jump operator, so $\xi^{\langle n\rangle} := \left(\xi^{\langle n-1\rangle}\right)^{\Delta}$ and $\xi^{\sigma^n} := \left(\xi^{\sigma^{n-1}}\right)^{\sigma}$, with $\xi^{\langle 0\rangle} = \xi^{\sigma^0} := \xi$.
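On $\mathbb{T} = \tau\mathbb{Z}$ one has $\sigma(t) = t + \tau$, $\mu \equiv \tau$ and $\xi^{\Delta}(t) = (\xi(t + \tau) - \xi(t))/\tau$, so the discrete-time operators can be tried out numerically. The Python sketch below is only an illustration under the assumption that a signal is given as an ordinary function of $t$. It covers $\mathbb{T} = \tau\mathbb{Z}$ only, because the case $\mathbb{T} = \mathbb{R}$ needs symbolic differentiation. The helper names `forward_jump`, `delta`, `graininess` and `repeat` are hypothetical and do not come from the paper or from any package. Exact `Fraction` arithmetic avoids rounding noise.

```python
from fractions import Fraction
from typing import Callable

# A signal on T = tau*Z, modelled (by assumption) as an ordinary function of t.
Signal = Callable[[Fraction], Fraction]


def graininess(tau: Fraction) -> Fraction:
    """mu(t) = sigma(t) - t; on tau*Z it is the constant tau (homogeneous scale)."""
    return tau


def forward_jump(xi: Signal, tau: Fraction) -> Signal:
    """xi^sigma: the value of xi at the next point t + tau of the time scale."""
    return lambda t: xi(t + tau)


def delta(xi: Signal, tau: Fraction) -> Signal:
    """xi^Delta: the forward difference quotient (xi(t + tau) - xi(t)) / mu."""
    return lambda t: (xi(t + tau) - xi(t)) / graininess(tau)


def repeat(op, xi: Signal, tau: Fraction, n: int) -> Signal:
    """n-fold application of op: xi^<n> for op=delta, xi^{sigma^n} for op=forward_jump."""
    for _ in range(n):
        xi = op(xi, tau)
    return xi


if __name__ == "__main__":
    tau = Fraction(1, 2)
    xi = lambda t: t * t                      # xi(t) = t^2
    t0 = Fraction(3)
    print(delta(xi, tau)(t0))                 # 2*t0 + tau = 13/2
    print(repeat(delta, xi, tau, 2)(t0))      # 2
    # On a homogeneous time scale Delta and sigma commute.
    print(forward_jump(delta(xi, tau), tau)(t0) == delta(forward_jump(xi, tau), tau)(t0))
```

For $\xi(t) = t^2$ the first delta derivative is $2t + \tau$ rather than $2t$. The extra $\tau$ is the graininess contribution that vanishes on $\mathbb{T} = \mathbb{R}$.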
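Proposition 2.1. For $F, G : \mathbb{T} \to \mathbb{R}$ the delta derivative and the forward jump operator satisfy the following properties:
(i) $F^{\sigma} = F + \mu F^{\Delta}$;
…
(iv) on a homogeneous time scale the operators $\Delta$ and $\sigma$ commute, i.e., $\left(F^{\sigma}\right)^{\Delta} = \left(F^{\Delta}\right)^{\sigma}$.
Observe that the generalization of item (i) above yields, for $\mu \neq 0$,
$$F^{\langle n\rangle} = \frac{1}{\mu^{n}}\sum_{k=0}^{n}(-1)^{n-k}C_n^{k}F^{\sigma^{k}} \qquad (1)$$
and
$$F^{\sigma^{n}} = \sum_{i=0}^{n}C_n^{i}\mu^{i}F^{\langle i\rangle}, \qquad (2)$$
where $C_n^{k}$ and $C_n^{i}$ are the binomial coefficients.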
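Iterating item (i) gives two binomial expansions for $\mu \neq 0$. The first is $\xi^{\langle n\rangle} = \mu^{-n}\sum_{k=0}^{n}(-1)^{n-k}C_n^k\xi^{\sigma^k}$, which expresses the $n$th delta derivative through forward jumps. The second is $\xi^{\sigma^n} = \sum_{i=0}^{n}C_n^i\mu^i\xi^{\langle i\rangle}$, which goes the other way. Both can be checked on $\mathbb{T} = \tau\mathbb{Z}$ (where $\mu = \tau$) with the Python sketch below. It is a hypothetical numerical check, not code from the paper. The function names are illustrative, and the test signal is an arbitrary choice.

```python
from fractions import Fraction
from math import comb


def delta(xi, tau):
    """Delta derivative on T = tau*Z (mu = tau): forward difference quotient."""
    return lambda t: (xi(t + tau) - xi(t)) / tau


def delta_n(xi, tau, n):
    """xi^<n> obtained by iterating the delta derivative n times."""
    for _ in range(n):
        xi = delta(xi, tau)
    return xi


def delta_n_binomial(xi, tau, n):
    """xi^<n> = mu^(-n) * sum_k (-1)^(n-k) C(n, k) xi^{sigma^k}."""
    return lambda t: sum((-1) ** (n - k) * comb(n, k) * xi(t + k * tau)
                         for k in range(n + 1)) / tau ** n


def sigma_n_binomial(xi, tau, n):
    """xi^{sigma^n} = sum_i C(n, i) mu^i xi^<i>."""
    return lambda t: sum(comb(n, i) * tau ** i * delta_n(xi, tau, i)(t)
                         for i in range(n + 1))


if __name__ == "__main__":
    tau = Fraction(1, 3)
    xi = lambda t: t ** 3 - 2 * t          # arbitrary test signal
    t0 = Fraction(2)
    for n in range(6):
        assert delta_n(xi, tau, n)(t0) == delta_n_binomial(xi, tau, n)(t0)
        assert xi(t0 + n * tau) == sigma_n_binomial(xi, tau, n)(t0)
    print("expansions agree for n = 0, ..., 5")
```

Both identities are exact for any signal on $\tau\mathbb{Z}$, not only for polynomials, so with rational arithmetic the comparisons hold with equality.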

MAIN RESULT
The main result of this paper is the theorem below, which shows how the partial derivative of the higher-order delta derivative of a composite function can be expressed through the higher-order delta derivatives of the partial derivatives of this function. A composite function with a vector argument, consisting of an arbitrary but finite number of variables and their delta derivatives, is considered.
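Theorem 3.1. Let $\Phi(\Xi)$ be a function of the vector argument
$$\Xi := \left(\xi_1, \xi_1^{\langle 1\rangle}, \ldots, \xi_1^{\langle\varsigma_1\rangle}, \ldots, \xi_r, \xi_r^{\langle 1\rangle}, \ldots, \xi_r^{\langle\varsigma_r\rangle}\right),$$
with $\xi_j^{\langle s_j\rangle} : \mathbb{T} \to \mathbb{R}$ for all $j = 1, \ldots, r$ and $s_j = 0, \ldots, \varsigma_j$, such that the delta derivatives of $\Phi(\Xi)$ are defined up to and including the order $b$; then
$$\frac{\partial\left(\Phi(\Xi)\right)^{\langle b\rangle}}{\partial\xi_l^{\langle a\rangle}} = \sum_{\gamma=\max(0,\,a-b)}^{\min(a,\,\varsigma_l)} C_b^{\,b-a+\gamma}\left(\left(\frac{\partial\Phi(\Xi)}{\partial\xi_l^{\langle\gamma\rangle}}\right)^{\sigma^{a-\gamma}}\right)^{\langle b-a+\gamma\rangle}, \qquad (3)$$
where $l \in \{1, \ldots, r\}$, $C_b^{\,b-a+\gamma}$ denotes the binomial coefficient, and $a$, $b$ are non-negative integers such that $b + \varsigma_l \geq a$.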
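The simplest nontrivial case shows how formula (3) commutes the operators. Take $b = 1$ and $1 \leq a \leq \varsigma_l$, so that the sum contains only the addends $\gamma = a - 1$ and $\gamma = a$. The direct check below is given for illustration and is not part of the original argument. It assumes $\mu \neq 0$ and uses only $F^{\sigma} = F + \mu F^{\Delta}$, i.e., $\left(\xi_l^{\langle s\rangle}\right)^{\sigma} = \xi_l^{\langle s\rangle} + \mu\,\xi_l^{\langle s+1\rangle}$. Hence $\xi_l^{\langle a\rangle}$ enters $\Xi^{\sigma}$ through $\left(\xi_l^{\langle a\rangle}\right)^{\sigma}$ with coefficient 1 and through $\left(\xi_l^{\langle a-1\rangle}\right)^{\sigma}$ with coefficient $\mu$:

$$\frac{\partial\left(\Phi(\Xi)\right)^{\Delta}}{\partial\xi_l^{\langle a\rangle}} = \frac{1}{\mu}\left(\frac{\partial\Phi(\Xi^{\sigma})}{\partial\xi_l^{\langle a\rangle}} - \frac{\partial\Phi(\Xi)}{\partial\xi_l^{\langle a\rangle}}\right) = \frac{1}{\mu}\left(\left(\frac{\partial\Phi(\Xi)}{\partial\xi_l^{\langle a\rangle}}\right)^{\sigma} + \mu\left(\frac{\partial\Phi(\Xi)}{\partial\xi_l^{\langle a-1\rangle}}\right)^{\sigma} - \frac{\partial\Phi(\Xi)}{\partial\xi_l^{\langle a\rangle}}\right)$$
$$= \left(\frac{\partial\Phi(\Xi)}{\partial\xi_l^{\langle a\rangle}}\right)^{\Delta} + \left(\frac{\partial\Phi(\Xi)}{\partial\xi_l^{\langle a-1\rangle}}\right)^{\sigma}.$$

Since $C_1^{1} = C_1^{0} = 1$, this agrees with (3). For $\mathbb{T} = \mathbb{R}$ the same identity holds with $\sigma = \mathrm{id}$, which is the familiar rule that the partial derivative with respect to $\xi_l^{\langle a\rangle}$ of a total derivative equals the total derivative of the partial derivative plus the partial derivative with respect to $\xi_l^{\langle a-1\rangle}$.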
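Proof. Since there is no single explicit formula for the higher-order delta derivative of a composite function on time scales, the two cases $\mathbb{T} = \mathbb{R}$ and $\mathbb{T} = \tau\mathbb{Z}$ will be considered separately.
Case 1 ($\mathbb{T} = \mathbb{R}$). Note that throughout this part of the proof the symbol $\langle n\rangle$ stands for the $n$th-order ordinary derivative with respect to $t$. According to Mishkov's formula [10], the $b$th derivative of the composite function $\Phi(\Xi)$ can be computed by the formula
$$\left(\Phi(\Xi)\right)^{\langle b\rangle} = \sum\nolimits_{b} B_b^{\Phi}\prod_{i=1}^{b}\prod_{j=1}^{r}\prod_{s=0}^{\varsigma_j}\left(\xi_j^{\langle s+i\rangle}\right)^{q_{i,j,s}}, \qquad (4)$$
where
$$B_b^{\Phi} := \frac{b!}{\prod_{i=1}^{b}\left((i!)^{k_i}\prod_{j=1}^{r}\prod_{s=0}^{\varsigma_j}q_{i,j,s}!\right)}\,\frac{\partial^{k}\Phi(\Xi)}{\prod_{j=1}^{r}\prod_{s=0}^{\varsigma_j}\left(\partial\xi_j^{\langle s\rangle}\right)^{p_{j,s}}},$$
and the sum $\sum_b$ is taken over all non-negative integer solutions of the Diophantine equations
$$\sum_{i=1}^{b} i\,k_i = b, \qquad \sum_{j=1}^{r}\sum_{s=0}^{\varsigma_j}q_{i,j,s} = k_i, \quad i = 1, \ldots, b. \qquad (5)$$
Moreover, the integers $p_{j,s}$ and $k$ in $B_b^{\Phi}$ stand for the order of the partial derivative with respect to $\xi_j^{\langle s\rangle}$ and the order of the mixed partial derivative of $\Phi(\Xi)$, respectively, and satisfy the relations
$$p_{j,s} = \sum_{i=1}^{b}q_{i,j,s}, \qquad k = \sum_{i=1}^{b}k_i. \qquad (6)$$
Using the product rule for differentiation, one obtains
$$\frac{\partial\left(\Phi(\Xi)\right)^{\langle b\rangle}}{\partial\xi_l^{\langle a\rangle}} = M + N, \qquad (7)$$
where
$$M := \sum\nolimits_{b} B_b^{\Phi}\,\frac{\partial}{\partial\xi_l^{\langle a\rangle}}\prod_{i=1}^{b}\prod_{j=1}^{r}\prod_{s=0}^{\varsigma_j}\left(\xi_j^{\langle s+i\rangle}\right)^{q_{i,j,s}}, \qquad N := \sum\nolimits_{b}\frac{\partial B_b^{\Phi}}{\partial\xi_l^{\langle a\rangle}}\prod_{i=1}^{b}\prod_{j=1}^{r}\prod_{s=0}^{\varsigma_j}\left(\xi_j^{\langle s+i\rangle}\right)^{q_{i,j,s}}.$$
In order to express the partial derivative in $M$ explicitly, it is necessary to express the sum $s + i$ in terms of a single index; for that purpose we change the index $i$ into $\iota = s + i$. It is easy to observe that $\iota$ varies from 1 to $b + \varsigma_j$. In this case $s = \iota - i$, whose minimal and maximal values are $\iota - b$ and $\iota - 1$, respectively. On the other hand, $s$ changes from 0 to $\varsigma_j$. Therefore, we define the range of $s$ from $\max(0, \iota - b)$ to $\min(\varsigma_j, \iota - 1)$. As a result, one may use the following identity:
$$\prod_{i=1}^{b}\prod_{s=0}^{\varsigma_j}\left(\xi_j^{\langle s+i\rangle}\right)^{q_{i,j,s}} = \prod_{\iota=1}^{b+\varsigma_j}\ \prod_{s=\max(0,\,\iota-b)}^{\min(\varsigma_j,\,\iota-1)}\left(\xi_j^{\langle\iota\rangle}\right)^{q_{\iota-s,j,s}}.$$
Next, separating the multiplier with $j = l$ and $\iota = a$, one may rewrite the product over all $j$ as
$$\left(\prod_{\gamma=\max(0,\,a-b)}^{\min(\varsigma_l,\,a-1)}\left(\xi_l^{\langle a\rangle}\right)^{q_{a-\gamma,l,\gamma}}\right)R,$$
where $R$ is the product of the remaining factors, none of which depends on $\xi_l^{\langle a\rangle}$, and where, in order to avoid confusion, $\gamma$ replaces the index $s$, independent from $j$ and $\iota$. Using the product of powers property, $\prod_{\gamma}\left(\xi_l^{\langle a\rangle}\right)^{q_{a-\gamma,l,\gamma}} = \left(\xi_l^{\langle a\rangle}\right)^{\sum_{\gamma}q_{a-\gamma,l,\gamma}}$, we obtain
$$M = \sum\nolimits_{b} B_b^{\Phi}\sum_{\gamma=\max(0,\,a-b)}^{\min(\varsigma_l,\,a-1)}q_{a-\gamma,l,\gamma}\prod_{i=1}^{b}\prod_{j=1}^{r}\prod_{s=0}^{\varsigma_j}\left(\xi_j^{\langle s+i\rangle}\right)^{q_{i,j,s}-\delta_{i,a-\gamma}\delta_{j,l}\delta_{s,\gamma}}, \qquad (8)$$
where $\delta$ denotes the Kronecker delta. Now it is easy to observe that in $M$ only the addends of the sum $\sum_b$ with $q_{a-\gamma,l,\gamma} \neq 0$ matter. Note that, according to (5), $q_{a-\gamma,l,\gamma} \neq 0$ implies $k_{a-\gamma} \neq 0$. Taking into account that in this case $k_{a-\gamma} - 1$ and $q_{a-\gamma,l,\gamma} - 1$ are non-negative, it is legitimate to introduce the following notations:
$$\bar{k}_i := k_i - \delta_{i,a-\gamma}, \qquad \bar{q}_{i,j,s} := q_{i,j,s} - \delta_{i,a-\gamma}\delta_{j,l}\delta_{s,\gamma} \qquad (9)$$
for $i = 1, \ldots, b$; $j = 1, \ldots, r$; $s = 0, \ldots, \varsigma_j$, and to rewrite (5) as
$$\sum_{i=1}^{b} i\,\bar{k}_i = b - a + \gamma, \qquad \sum_{j=1}^{r}\sum_{s=0}^{\varsigma_j}\bar{q}_{i,j,s} = \bar{k}_i, \quad i = 1, \ldots, b. \qquad (10)$$
Note that (10) is satisfied only under the following conditions:
$$\bar{k}_i = 0, \qquad \bar{q}_{i,j,s} = 0 \qquad \text{for } i > b - a + \gamma. \qquad (11)$$
As a consequence, (10) yields the following Diophantine equations:
$$\sum_{i=1}^{b-a+\gamma} i\,\bar{k}_i = b - a + \gamma, \qquad \sum_{j=1}^{r}\sum_{s=0}^{\varsigma_j}\bar{q}_{i,j,s} = \bar{k}_i, \quad i = 1, \ldots, b - a + \gamma, \qquad (12)$$
leading to the conclusion that the addends of $\sum_b$ with $q_{a-\gamma,l,\gamma} \neq 0$ are in one-to-one correspondence with the solutions of (12), i.e.,
$$\sum\nolimits_{b}\Big|_{q_{a-\gamma,l,\gamma}\neq 0} = \sum\nolimits_{b-a+\gamma}, \qquad (13)$$
where the sum $\sum_{b-a+\gamma}$ is taken over all non-negative integer solutions of (12). Next, for $j = 1, \ldots, r$ and $s = 0, \ldots, \varsigma_j$ we denote
$$\bar{p}_{j,s} := p_{j,s} - \delta_{j,l}\delta_{s,\gamma}, \qquad \bar{k} := k - 1, \qquad (14)$$
which together with (9) and (11) allows one to rewrite (6) as
$$\bar{p}_{j,s} = \sum_{i=1}^{b-a+\gamma}\bar{q}_{i,j,s}, \qquad \bar{k} = \sum_{i=1}^{b-a+\gamma}\bar{k}_i. \qquad (15)$$
Furthermore, applying (9), (11), and (14) to (4) and taking into account that by direct computation $(a-\gamma)!\,\prod_{i=1}^{b-a+\gamma}(i!)^{\bar{k}_i} = \prod_{i=1}^{b}(i!)^{k_i}$ and $q_{a-\gamma,l,\gamma}! = q_{a-\gamma,l,\gamma}\,\bar{q}_{a-\gamma,l,\gamma}!$, we obtain
$$B_b^{\Phi}\,q_{a-\gamma,l,\gamma} = C_b^{\,b-a+\gamma}\,\bar{B}_{b-a+\gamma}^{\Phi_\gamma}, \qquad (16)$$
where
$$\Phi_\gamma := \frac{\partial\Phi(\Xi)}{\partial\xi_l^{\langle\gamma\rangle}} \qquad (17)$$
and
$$\bar{B}_{b-a+\gamma}^{\Phi_\gamma} := \frac{(b-a+\gamma)!}{\prod_{i=1}^{b-a+\gamma}\left((i!)^{\bar{k}_i}\prod_{j=1}^{r}\prod_{s=0}^{\varsigma_j}\bar{q}_{i,j,s}!\right)}\,\frac{\partial^{\bar{k}}\Phi_\gamma}{\prod_{j=1}^{r}\prod_{s=0}^{\varsigma_j}\left(\partial\xi_j^{\langle s\rangle}\right)^{\bar{p}_{j,s}}}. \qquad (18)$$
As a result, recalling that $q_{a-\gamma,l,\gamma} = \bar{q}_{a-\gamma,l,\gamma} + 1$ and using (13) and (16), we express (8) as
$$M = \sum_{\gamma=\max(0,\,a-b)}^{\min(\varsigma_l,\,a-1)}C_b^{\,b-a+\gamma}\sum\nolimits_{b-a+\gamma}\bar{B}_{b-a+\gamma}^{\Phi_\gamma}\prod_{i=1}^{b-a+\gamma}\prod_{j=1}^{r}\prod_{s=0}^{\varsigma_j}\left(\xi_j^{\langle s+i\rangle}\right)^{\bar{q}_{i,j,s}}.$$
Now, taking into account (12), (15), (17), and (18), one may again apply Mishkov's formula to obtain
$$M = \sum_{\gamma=\max(0,\,a-b)}^{\min(\varsigma_l,\,a-1)}C_b^{\,b-a+\gamma}\left(\frac{\partial\Phi(\Xi)}{\partial\xi_l^{\langle\gamma\rangle}}\right)^{\langle b-a+\gamma\rangle}.$$
Next, since $\partial B_b^{\Phi}/\partial\xi_l^{\langle a\rangle}$ is obtained from $B_b^{\Phi}$ by replacing $\Phi(\Xi)$ with $\partial\Phi(\Xi)/\partial\xi_l^{\langle a\rangle}$, the application of Mishkov's formula to $N$ yields
$$N = C_b^{\,b-a+\gamma}\left(\frac{\partial\Phi(\Xi)}{\partial\xi_l^{\langle\gamma\rangle}}\right)^{\langle b-a+\gamma\rangle} \qquad \text{for } \gamma = a.$$
Thus, observing that $N$ is different from zero only for $\varsigma_l \geq a$, we may rewrite (7) as
$$\frac{\partial\left(\Phi(\Xi)\right)^{\langle b\rangle}}{\partial\xi_l^{\langle a\rangle}} = \sum_{\gamma=\max(0,\,a-b)}^{\min(a,\,\varsigma_l)}C_b^{\,b-a+\gamma}\left(\frac{\partial\Phi(\Xi)}{\partial\xi_l^{\langle\gamma\rangle}}\right)^{\langle b-a+\gamma\rangle},$$
which coincides with (3), since $\sigma = \mathrm{id}$ for $\mathbb{T} = \mathbb{R}$.
Case 2 ($\mathbb{T} = \tau\mathbb{Z}$). Note that in this case the symbol $\langle n\rangle$ means the $n$-fold application of the difference operator. Using (1) for $n = b$, the definition of the operator $\sigma$ together with (2), and the chain rule for the partial derivative with respect to $\xi_l^{\langle a\rangle}$, one obtains
$$\frac{\partial\left(\Phi(\Xi)\right)^{\langle b\rangle}}{\partial\xi_l^{\langle a\rangle}} = \frac{1}{\mu^{b}}\sum_{k=0}^{b}(-1)^{b-k}C_b^{k}\sum_{\gamma=0}^{\varsigma_l}\left(\frac{\partial\Phi(\Xi)}{\partial\xi_l^{\langle\gamma\rangle}}\right)^{\sigma^{k}}\sum_{i=0}^{k}C_k^{i}\mu^{i}\,\frac{\partial\xi_l^{\langle\gamma+i\rangle}}{\partial\xi_l^{\langle a\rangle}}.$$
In order to find $\partial\xi_l^{\langle\gamma+i\rangle}/\partial\xi_l^{\langle a\rangle}$, it is necessary to express the sum $\gamma + i$ in terms of a single index. For that purpose we first use the identity $\sum_{k=0}^{b}\sum_{i=0}^{k} = \sum_{i=0}^{b}\sum_{k=i}^{b}$, and then we change the summation index $i$ into $\iota = \gamma + i$. It is easy to observe that $\iota$ varies from 0 to $b + \varsigma_l$. In this case $\gamma = \iota - i$, whose minimal and maximal values are $\iota - b$ and $\iota$, respectively. On the other hand, $\gamma$ changes from 0 to $\varsigma_l$. Therefore, we define the range of $\gamma$ from $\max(0, \iota - b)$ to $\min(\varsigma_l, \iota)$. As a result, one may use the identity
$$\sum_{\gamma=0}^{\varsigma_l}\sum_{i=0}^{b}g(\gamma, \gamma+i) = \sum_{\iota=0}^{b+\varsigma_l}\ \sum_{\gamma=\max(0,\,\iota-b)}^{\min(\varsigma_l,\,\iota)}g(\gamma, \iota),$$
valid for any summand $g$. Since $\partial\xi_l^{\langle\iota\rangle}/\partial\xi_l^{\langle a\rangle}$ equals 0 for every $\iota$ except for $\iota = a$, when it equals 1, and using the relation $\mu^{a-\gamma}/\mu^{b} = 1/\mu^{b-a+\gamma}$, one may rewrite the equality above as
$$\frac{\partial\left(\Phi(\Xi)\right)^{\langle b\rangle}}{\partial\xi_l^{\langle a\rangle}} = \sum_{\gamma=\max(0,\,a-b)}^{\min(\varsigma_l,\,a)}\frac{1}{\mu^{b-a+\gamma}}\sum_{k=a-\gamma}^{b}(-1)^{b-k}C_b^{k}C_k^{\,a-\gamma}\left(\frac{\partial\Phi(\Xi)}{\partial\xi_l^{\langle\gamma\rangle}}\right)^{\sigma^{k}}.$$
Finally, substituting $k = a - \gamma + m$, using $C_b^{k}C_k^{\,a-\gamma} = C_b^{\,b-a+\gamma}C_{b-a+\gamma}^{\,m}$, and applying (1) for $n = b - a + \gamma$ to $\left(\partial\Phi(\Xi)/\partial\xi_l^{\langle\gamma\rangle}\right)^{\sigma^{a-\gamma}}$, we arrive at (3). This completes the proof.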
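Formula (3) can also be checked by machine for polynomial $\Phi$. The Python sketch below is a hypothetical model, not the Mathematica code used in the paper. Each $\xi_j^{\langle s\rangle}$ is treated as an independent jet variable. The forward jump acts as the substitution $\xi^{\langle s\rangle} \mapsto \xi^{\langle s\rangle} + \mu\,\xi^{\langle s+1\rangle}$ (item (i) of Proposition 2.1). The delta derivative is the total derivative for $\mu = 0$ and $(\sigma - \mathrm{id})/\mu$ otherwise. Polynomials are plain dicts with exact `Fraction` coefficients, and all helper names (`var`, `sigma`, `delta`, `lhs`, `rhs`) are ours.

```python
from collections import defaultdict
from fractions import Fraction
from math import comb

# Hypothetical jet-space model: a polynomial is a dict {monomial: Fraction};
# a monomial is a sorted tuple of ((name, order), exponent) pairs, where
# (name, order) stands for the independent variable name^<order>.


def var(name, order=0):
    return {(((name, order), 1),): Fraction(1)}


def add(p, q, scale=1):
    """Return p + scale * q, dropping zero coefficients."""
    r = defaultdict(Fraction, p)
    for m, c in q.items():
        r[m] += scale * c
    return {m: c for m, c in r.items() if c != 0}


def mul(p, q):
    r = defaultdict(Fraction)
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            e = defaultdict(int)
            for v, k in m1 + m2:
                e[v] += k
            r[tuple(sorted(e.items()))] += c1 * c2
    return {m: c for m, c in r.items() if c != 0}


def partial(p, v):
    """Partial derivative with respect to the jet variable v = (name, order)."""
    r = defaultdict(Fraction)
    for m, c in p.items():
        e = dict(m)
        if v in e:
            k = e.pop(v)
            if k > 1:
                e[v] = k - 1
            r[tuple(sorted(e.items()))] += c * k
    return {m: c for m, c in r.items() if c != 0}


def sigma(p, mu):
    """Forward jump: substitute x^<s> -> x^<s> + mu * x^<s+1> in every factor."""
    r = {}
    for m, c in p.items():
        t = {(): c}
        for (name, s), k in m:
            image = add(var(name, s), var(name, s + 1), mu)
            for _ in range(k):
                t = mul(t, image)
        r = add(r, t)
    return r


def delta(p, mu):
    """Total derivative sum_v x^<s+1> dp/dx^<s> if mu == 0, else (p^sigma - p) / mu."""
    if mu == 0:
        r = {}
        for name, s in {v for m in p for v, _ in m}:
            r = add(r, mul(var(name, s + 1), partial(p, (name, s))))
        return r
    return {m: c / mu for m, c in add(sigma(p, mu), p, -1).items()}


def iterate(op, p, n, mu):
    for _ in range(n):
        p = op(p, mu)
    return p


def lhs(phi, l, a, b, mu):
    """Left-hand side of (3): differentiate b times, then take d/d xi_l^<a>."""
    return partial(iterate(delta, phi, b, mu), (l, a))


def rhs(phi, l, a, b, mu, vs_l):
    """Right-hand side of (3); vs_l is the maximal order of xi_l occurring in Phi."""
    total = {}
    for g in range(max(0, a - b), min(a, vs_l) + 1):
        term = iterate(sigma, partial(phi, (l, g)), a - g, mu)
        term = iterate(delta, term, b - a + g, mu)
        total = add(total, term, comb(b, b - a + g))
    return total


if __name__ == "__main__":
    u, y = var("u"), var("y")
    phi = add(mul(u, var("y", 1)), mul(y, var("y", 2)))  # u y^<1> + y y^<2>
    print(lhs(phi, "y", 3, 5, 0) == rhs(phi, "y", 3, 5, 0, 2))
    print(lhs(phi, "y", 4, 2, Fraction(1, 2)) == rhs(phi, "y", 4, 2, Fraction(1, 2), 2))
```

The first comparison is the continuous-time task of Example 4.1 ($\mu = 0$, $a = 3$, $b = 5$), where both sides equal $10u^{\langle 3\rangle} + 15y^{\langle 4\rangle}$. The second uses $\mu = 1/2$ with $a = 4$, $b = 2$, where both sides equal $y^{\sigma^2} = y + y^{\langle 1\rangle} + y^{\langle 2\rangle}/4$. The right-hand side never forms the full $b$th derivative of $\Phi$, which is the source of the savings discussed in Section 5.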
The following corollary provides, in a sense, an inverse relation to that presented in Theorem 3.1, i.e., it shows how the delta derivative of the partial derivative of a composite function can be expressed through the partial derivatives of the delta derivatives of this function.
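Corollary 3.2. Under the assumptions of Theorem 3.1, for all non-negative integers $v$ and $w$ such that $w \leq b$ and $w + \varsigma_l \geq v$, the following holds:
$$\left(\frac{\partial\Phi(\Xi)}{\partial\xi_l^{\langle v\rangle}}\right)^{\langle w\rangle} = \sum_{\zeta=0}^{\min(v,w)}(-1)^{\zeta}C_w^{\zeta}\left(\frac{\partial\left(\Phi(\Xi)\right)^{\langle w-\zeta\rangle}}{\partial\xi_l^{\langle v-\zeta\rangle}}\right)^{\sigma^{\zeta}}. \qquad (19)$$
Proof. Taking into account that $w - \zeta$ and $v - \zeta$ are always non-negative and $w + \varsigma_l \geq v$, one may apply formula (3) to $a = v - \zeta$ and $b = w - \zeta$ and, by item (iv) of Proposition 2.1, combine $\sigma^{v-\zeta-\gamma}$ and $\sigma^{\zeta}$ into $\sigma^{v-\gamma}$. This rewrites the right-hand side of (19) as
$$\sum_{\zeta=0}^{\min(v,w)}(-1)^{\zeta}C_w^{\zeta}\sum_{\gamma=\max(0,\,v-w)}^{\min(v-\zeta,\,\varsigma_l)}C_{w-\zeta}^{\,w-v+\gamma}\left(\left(\frac{\partial\Phi(\Xi)}{\partial\xi_l^{\langle\gamma\rangle}}\right)^{\sigma^{v-\gamma}}\right)^{\langle w-v+\gamma\rangle},$$
and, changing the summation order according to the identity
$$\sum_{\zeta=0}^{\min(v,w)}\ \sum_{\gamma=\max(0,\,v-w)}^{\min(v-\zeta,\,\varsigma_l)} = \sum_{\gamma=\max(0,\,v-w)}^{\min(v,\,\varsigma_l)}\ \sum_{\zeta=0}^{v-\gamma}$$
and using $C_w^{\zeta}C_{w-\zeta}^{\,w-v+\gamma} = C_w^{\,v-\gamma}C_{v-\gamma}^{\zeta}$, we obtain
$$\sum_{\gamma=\max(0,\,v-w)}^{\min(v,\,\varsigma_l)}C_w^{\,v-\gamma}\left(\sum_{\zeta=0}^{v-\gamma}(-1)^{\zeta}C_{v-\gamma}^{\zeta}\right)\left(\left(\frac{\partial\Phi(\Xi)}{\partial\xi_l^{\langle\gamma\rangle}}\right)^{\sigma^{v-\gamma}}\right)^{\langle w-v+\gamma\rangle}.$$
Separation of the last addend of the sum, $\gamma = v$ (present only for $v \leq \varsigma_l$), for which $C_w^{0} = 1$ and the inner sum equals 1, yields $\left(\partial\Phi(\Xi)/\partial\xi_l^{\langle v\rangle}\right)^{\langle w\rangle}$. For all remaining addends $v - \gamma \geq 1$, and the binomial theorem gives $\sum_{\zeta=0}^{v-\gamma}(-1)^{\zeta}C_{v-\gamma}^{\zeta} = (1-1)^{v-\gamma} = 0$. Hence
$$\sum_{\zeta=0}^{\min(v,w)}(-1)^{\zeta}C_w^{\zeta}\left(\frac{\partial\left(\Phi(\Xi)\right)^{\langle w-\zeta\rangle}}{\partial\xi_l^{\langle v-\zeta\rangle}}\right)^{\sigma^{\zeta}} = \left(\frac{\partial\Phi(\Xi)}{\partial\xi_l^{\langle v\rangle}}\right)^{\langle w\rangle}, \qquad (20)$$
where for $v > \varsigma_l$ both sides vanish. Thus, the identity (20) confirms (19).

EXAMPLES
The examples in this section illustrate the application of formula (3).
Example 4.1. Consider the composite function $\Phi_1\left(u, y, y^{\langle 1\rangle}, y^{\langle 2\rangle}\right) = uy^{\langle 1\rangle} + yy^{\langle 2\rangle}$ defined on the continuous time scale $\mathbb{T} = \mathbb{R}$ (the symbol $\langle i\rangle$ stands for the $i$th-order ordinary derivative with respect to $t$, and $\sigma^i = \mathrm{id}$). Assume that we need to compute the partial derivative with respect to $y^{\langle 3\rangle}$ of the 5th-order total derivative of the function $\Phi_1(\cdot)$. For this purpose, the usual steps are the computation of the total derivative and then the differentiation with respect to $y^{\langle 3\rangle}$, yielding
$$\left(\Phi_1(\cdot)\right)^{\langle 5\rangle} = \sum_{k=0}^{5}C_5^{k}\left(u^{\langle k\rangle}y^{\langle 6-k\rangle} + y^{\langle k\rangle}y^{\langle 7-k\rangle}\right), \qquad \frac{\partial\left(\Phi_1(\cdot)\right)^{\langle 5\rangle}}{\partial y^{\langle 3\rangle}} = C_5^{3}u^{\langle 3\rangle} + \left(C_5^{3} + C_5^{4}\right)y^{\langle 4\rangle} = 10u^{\langle 3\rangle} + 15y^{\langle 4\rangle}.$$
On the other hand, one may use formula (3) with $a = 3$, $b = 5$, and $\varsigma_l = 2$ as follows:
$$\frac{\partial\left(\Phi_1(\cdot)\right)^{\langle 5\rangle}}{\partial y^{\langle 3\rangle}} = \sum_{\gamma=0}^{2}C_5^{\,2+\gamma}\left(\frac{\partial\Phi_1(\cdot)}{\partial y^{\langle\gamma\rangle}}\right)^{\langle 2+\gamma\rangle} = 10\left(y^{\langle 2\rangle}\right)^{\langle 2\rangle} + 10u^{\langle 3\rangle} + 5y^{\langle 4\rangle} = 10u^{\langle 3\rangle} + 15y^{\langle 4\rangle}.$$
Observe that formula (3) does not require the computation of $\left(\Phi_1(\cdot)\right)^{\langle 5\rangle}$. Moreover, it allows one to compute partial derivatives before total derivatives and reduces the maximal order of the total derivative (here from 5 to 4). As a result, the computations become simpler and more efficient.
Example 4.2. Consider the composite function $\Phi_2\left(u, y, y^{\langle 1\rangle}, y^{\langle 2\rangle}\right) = uy^{\langle 1\rangle} + yy^{\langle 2\rangle}$ defined on a homogeneous time scale $\mathbb{T}$. Assume that we need to compute the partial derivative with respect to $y^{\langle 4\rangle}$ of the 2nd-order total delta derivative of the function $\Phi_2(\cdot)$. Usually, one first computes the total delta derivative, e.g., by applying twice the product rule $(FG)^{\Delta} = F^{\Delta}G + F^{\sigma}G^{\Delta}$ together with item (iv) of Proposition 2.1, and then the partial derivative with respect to $y^{\langle 4\rangle}$, which yields
$$\left(\Phi_2(\cdot)\right)^{\langle 2\rangle} = u^{\langle 2\rangle}y^{\langle 1\rangle} + 2\left(u^{\langle 1\rangle}\right)^{\sigma}y^{\langle 2\rangle} + u^{\sigma^2}y^{\langle 3\rangle} + \left(y^{\langle 2\rangle}\right)^{2} + 2\left(y^{\langle 1\rangle}\right)^{\sigma}y^{\langle 3\rangle} + y^{\sigma^2}y^{\langle 4\rangle}, \qquad \frac{\partial\left(\Phi_2(\cdot)\right)^{\langle 2\rangle}}{\partial y^{\langle 4\rangle}} = y^{\sigma^2},$$
since, by repeated application of item (i) of Proposition 2.1, the shifted factors $\left(y^{\langle 1\rangle}\right)^{\sigma}$ and $y^{\sigma^2}$ contain delta derivatives of $y$ of order at most 2. Alternatively, formula (3) with $a = 4$, $b = 2$, and $\varsigma_l = 2$ contains the single addend $\gamma = 2$ and leads to
$$\frac{\partial\left(\Phi_2(\cdot)\right)^{\langle 2\rangle}}{\partial y^{\langle 4\rangle}} = C_2^{0}\left(\frac{\partial\Phi_2(\cdot)}{\partial y^{\langle 2\rangle}}\right)^{\sigma^2} = y^{\sigma^2}.$$
As in Example 4.1, one may observe that the application of formula (3) significantly simplifies the computations.

COMPUTATIONAL EFFICIENCY
In this section the computational efficiency of formula (3) is examined on the basis of experiments performed in the symbolic computation program Mathematica. The objective of the experiments is to compare the computation times of both sides of formula (3) for different values of the parameters. For that purpose we employed the Mathematica function Timing, which returns the CPU time (in seconds) used for the evaluation of an expression. Furthermore, in order to avoid the influence of previous computations, the internal system caches of stored results were cleared between the experiments by means of the command ClearSystemCache. The experiments were performed separately for the continuous time scale ($\mathbb{T} = \mathbb{R}$) and its discrete counterpart ($\mathbb{T} = \tau\mathbb{Z}$) due to the computational complexity of the single formula for the delta derivative of a composite function on time scales [3]. In the case $\mathbb{T} = \mathbb{R}$ we used the built-in Mathematica function D to compute derivatives, whereas in the case $\mathbb{T} = \tau\mathbb{Z}$ the functions DeltaD and ForwardShift from the package NLControl (see [2]) were employed as the difference and forward jump operators, respectively. The results of the experiments are presented in Tables 2–5. Note that, for the sake of reliability, each experiment was repeated ten times, and the average of the obtained results was entered into the table.
Tables 2 and 3 contain the results for $\mathbb{T} = \mathbb{R}$, whereas the results for $\mathbb{T} = \tau\mathbb{Z}$ are displayed in Tables 4 and 5. In both Tables 2 and 4 the number of variables $r$ in the vector argument $\Xi$ varies, while the maximal orders of their derivatives $\varsigma_1, \ldots, \varsigma_r$ are fixed. In contrast, Tables 3 and 5 contain the results for vector arguments with a fixed number of variables $r = 1$ and various maximal orders of derivatives $\varsigma_1$. For ease of comparison, cells with worse results have a grey background: light grey means that a result is less than two times worse, grey denotes results that are at least two but less than ten times worse, and dark grey indicates that a result is ten or more times worse.
Both Tables 2 and 3 show that for $\mathbb{T} = \mathbb{R}$ the right-hand side of formula (3) is more efficient than its left-hand counterpart in almost all cases, except for the trivial one, $b = 0$. Nevertheless, even in this case the differences between the results are not very significant. Next, one can observe that the larger the values of the parameters $b, r, \varsigma_1, \ldots, \varsigma_r$, the more efficient the right-hand side of (3) becomes. Moreover, its advantage increases dramatically for $a > \varsigma_l$, $l \in \{1, \ldots, r\}$, when all delta derivatives on the right-hand side of (3) are of order $b - a + \gamma \leq b - a + \varsigma_l < b$. Finally, note that in the worst of the presented cases (Table 3: $b = 0$, $a = 0$, $\varsigma_1 = 2$) the right-hand side of (3) is 1.56 times slower than its traditional analogue, while in the best case (Table 2: $b = 3$, $a = 5$, $r = 4$) it is almost 500 times faster.
Analysis of Tables 4 and 5 reveals that, as in the case $\mathbb{T} = \mathbb{R}$, in the discrete-time case $\mathbb{T} = \tau\mathbb{Z}$ the right-hand side of formula (3) is not efficient for $b = 0$ and is extremely efficient for $a > \varsigma_l$, $l \in \{1, \ldots, r\}$. However, the situation is different for $a \leq \varsigma_l$, when the right-hand side of (3) yields almost the same but (in most cases) slightly worse results than its left-hand counterpart. The reason for this discrepancy between the cases $\mathbb{T} = \mathbb{R}$ and $\mathbb{T} = \tau\mathbb{Z}$ is the forward jump operator $\sigma^{a-\gamma}$, which requires additional computational time for $\mathbb{T} = \tau\mathbb{Z}$ and reduces to the identity for $\mathbb{T} = \mathbb{R}$, as $\sigma = \mathrm{id}$ in this case. For this reason, the right-hand side of (3) is frequently faster than the left-hand one for $a = 0$, when $\sigma^{a-\gamma} = \mathrm{id}$. Finally, in the worst of the presented cases (Table 5: $b = 0$, $a = 0$, $\varsigma_1 = 2$) the right-hand side of (3) is 1.69 times slower than its traditional analogue, while in the best case (Table 4: $b = 3$, $a = 5$, $r = 4$) it is 140 times faster.

CONCLUSIONS
The theorem in the paper provides the commutation rule for the higher-order delta and partial derivatives of a composite function with a generalized vector argument. The applicability of the theorem to theoretical research in the field of nonlinear control systems has already been confirmed in [4–6,8], whose proofs of the main results rely on the suggested commutation rule. Due to the unified formulation of the theorem on homogeneous time scales, it can be used both for continuous-time control systems and for delta-domain models, introduced in [9] as sampled-data models of continuous-time systems expressed in terms of the difference (delta) operator (i.e., the delta derivative in the time scale formalism). Moreover, we believe that the presented theorem is potentially beneficial for various areas of physics and mathematical analysis where the differentiation of a composite function is employed. As a supplementary advantage, the formula in the theorem provides a more efficient way of computing certain combinations of the higher-order delta and partial derivatives than the straightforward one. The presented result is limited to homogeneous time scales, and its formulation and proof for an arbitrary time scale remain a topic for future research. Note, however, that such a generalization is not an easy task due to the absence of the necessary mathematical tools unified for all time scales. Therefore, one possible approach in this situation is a gradual extension of the results. For example, a possible next step in this direction is the study of the problem on the larger class of regular time scales, which comprises non-homogeneous time scales with certain properties [1].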

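Table 1. Basic types of operators/functions
$\mathbb{T} = \mathbb{R}$: $\sigma(t) = t$, $\xi^{\sigma}(t) = \xi(t)$, $\xi^{\Delta}(t) = \dot{\xi}(t)$ (the ordinary derivative), $\mu(t) = 0$
$\mathbb{T} = \tau\mathbb{Z}$: $\sigma(t) = t + \tau$, $\xi^{\sigma}(t) = \xi(t + \tau)$, $\xi^{\Delta}(t) = \left(\xi(t + \tau) - \xi(t)\right)/\tau$, $\mu(t) = \tau$

Table 2. Average computation times [s] of the left-hand side (L) and the right-hand side (R) of formula (3) for $\mathbb{T} = \mathbb{R}$, $\varsigma_1 = \cdots = \varsigma_r = 2$ and various values of $a$, $b$, $r$

Table 3. Average computation times [s] of the left-hand side (L) and the right-hand side (R) of formula (3) for $\mathbb{T} = \mathbb{R}$, $r = 1$ and various values of $a$, $b$, $\varsigma_1$

Table 4. Average computation times [s] of the left-hand side (L) and the right-hand side (R) of formula (3) for $\mathbb{T} = \tau\mathbb{Z}$, $\varsigma_1 = \cdots = \varsigma_r = 2$ and various values of $a$, $b$

Table 5. Average computation times [s] of the left-hand side (L) and the right-hand side (R) of formula (3) for $\mathbb{T} = \tau\mathbb{Z}$, $r = 1$ and various values of $a$, $b$, $\varsigma_1$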