A Novel Neural Network with the Ability to Express the Extreme Points Distribution Features of Higher Derivatives of Physical Processes

Abstract: Higher derivatives are important for interpreting physical processes. However, higher derivatives calculated from measured data often deviate from the real ones because of measurement errors. A novel method for data fitting without higher derivatives violating the real physical process is developed in this paper. Firstly, the influence of errors on higher derivatives and the extreme points distribution of typical functions were studied, which demonstrates the necessity and feasibility of adopting extreme points distribution features in neural networks. Then, we proposed a new neural network considering the extreme points distribution features, namely, the extreme-points-distribution-based neural network (EDNN), which contains a sample error calculator (SEC) and an extreme points distribution error calculator (EDEC). With recursive automatic differentiation, a model calculating the higher derivatives of the EDNN was established. Additionally, a loss function embedded with the extreme points distribution features was introduced. Finally, the EDNN was applied to two specific cases to reduce the noise in a second-order damped free oscillation signal and an internal combustion engine cylinder pressure trace signal. It was found that the EDNN could obtain higher derivatives that are more compatible with physical trends without detailed differentiation equations. The standard deviation of the derivatives' error of the EDNN is less than 62.5 percent of that of traditional neural networks. The EDNN provides a novel method for the analysis of physical processes with higher derivatives compatible with real physical trends.


Introduction
Higher derivatives are important to theoretical research and engineering application [1-6] for deepening the understanding of physical processes and improving analysis efficiency [7,8]. However, it is difficult to obtain higher derivatives directly from a real signal because of measurement errors [9]. Although various techniques such as the Savitzky-Golay polynomial [10,11], the Fourier transform [12,13] and the wavelet transform [14,15] have been developed to obtain higher derivatives of measured signals, the higher derivatives obtained with these techniques often deviate in trend from the real ones. Developing a data fitting method whose higher derivatives do not violate the real trends is therefore both a challenge and an urgent demand [9].
Suppose that there is a continuous noise $\varepsilon(t)$ in the vicinity of $t_0$; the noise $\varepsilon(t)$ can also be approximated using a Taylor series as in expression (2).
The signal to be measured and the noise can be superposed together. Then, the measured signal $C(t)$ in the vicinity of $t_0$ can be approximated using a Taylor series as in expression (3).
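For reference, the expansions referred to above presumably take the standard Taylor form. The following is a sketch under that assumption: the expansion of the signal $f(t)$ is inferred from context, and the summation index $n$ is an illustrative choice.

$$f(t) \approx \sum_{n=0}^{\infty} \frac{f^{(n)}(t_0)}{n!}(t - t_0)^n$$

$$\varepsilon(t) \approx \sum_{n=0}^{\infty} \frac{\varepsilon^{(n)}(t_0)}{n!}(t - t_0)^n$$

$$C(t) = f(t) + \varepsilon(t) \approx \sum_{n=0}^{\infty} \frac{f^{(n)}(t_0) + \varepsilon^{(n)}(t_0)}{n!}(t - t_0)^n$$

Comparing coefficients shows that the nth-order derivative of the measured signal at $t_0$ is the sum of the nth-order derivatives of the signal and of the noise.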
Because the noise $\varepsilon(t)$ is random, there is some probability that the first k terms of its Taylor series at $t_0$ are zero and that the terms after the (k+1)th term can be ignored; thus:
$$\varepsilon(t) \approx \frac{\varepsilon^{(k)}(t_0)}{k!}(t - t_0)^k$$
Suppose the noise amplitude at the time $t_t$ is $\varepsilon_t$, i.e., $\varepsilon(t_t) = \varepsilon_t$; then:
$$\varepsilon^{(k)}(t_0) \approx \frac{k!\,\varepsilon(t_t)}{(t_t - t_0)^k}$$
For any $A_\varepsilon > 0$, when $|\varepsilon_t| > A_\varepsilon \cdot \frac{1}{k!}|t_t - t_0|^k$, it follows that $|\varepsilon^{(k)}(t_0)| > A_\varepsilon$. Therefore, when the noise amplitude $\varepsilon_t$ is a constant, as $t_t$ approaches $t_0$, $\frac{\varepsilon(t_t)}{(t_t - t_0)^k}$ becomes larger and larger, as does $\varepsilon^{(k)}(t_0)$. That is, the kth-order derivative of the noise at $t_0$ becomes larger and larger. Subsequently, the kth-order derivative of the measured signal $C(t)$ also becomes larger and larger. When $\varepsilon_t$ is large enough or $(t_t - t_0)$ is small enough, the kth-order derivative of the noise $\varepsilon(t)$ at $t_0$ will be large enough that the kth-order derivative of the measured signal $C(t)$ at $t_0$ deviates seriously from the corresponding derivative of the signal $f(t)$.
Appl. Sci. 2023, 13, 6662
The above analysis shows that even if the Taylor series expansion of the noise has only the kth-order term, the kth-order derivative of the measured signal can be seriously disturbed, provided the kth-order derivative of the noise is large enough. The kth-order derivative of the measured signal will then deviate substantially from the kth-order derivative of the real physical process. Therefore, it is necessary to reduce the noise disturbance to obtain higher derivatives of the measured data that conform to the real physical trend.
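The amplification described above can be illustrated numerically. The following is a minimal sketch, not the paper's procedure: the test signal `cos(t)`, the alternating worst-case noise pattern, the step sizes and the helper names are all illustrative assumptions. It adds noise of fixed amplitude to a smooth signal and measures how much the kth finite difference changes.

```python
import math

def kth_difference(samples, k, h):
    """k-th forward finite difference of equally spaced samples, scaled by h**k."""
    values = list(samples)
    for _ in range(k):
        values = [(b - a) / h for a, b in zip(values, values[1:])]
    return values

def max_derivative_error(k, h, eps=0.01, n=200):
    """Largest change in the k-th difference of cos(t) caused by noise of amplitude eps.

    The noise alternates in sign from sample to sample (an assumed worst case).
    """
    t = [i * h for i in range(n)]
    clean = [math.cos(ti) for ti in t]
    noisy = [c + eps * (-1) ** i for i, c in enumerate(clean)]
    d_clean = kth_difference(clean, k, h)
    d_noisy = kth_difference(noisy, k, h)
    return max(abs(a - b) for a, b in zip(d_clean, d_noisy))

# The error grows with the derivative order k and as the sample interval h shrinks.
for k in (1, 2, 3, 4):
    print(k, max_derivative_error(k, h=0.01), max_derivative_error(k, h=0.001))
```

For this noise pattern the error scales as eps * (2 / h)^k, which mirrors the $\varepsilon_t / (t_t - t_0)^k$ dependence derived above.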

Extreme Points Distribution of Typical Process Functions
When a measured signal is fitted using neural networks, noise often causes the higher derivatives of the fitted signal to deviate from the true physical trend [37].
Generally, the relation between the physical quantities and time of a physical process can be modeled by a polynomial, Gaussian function, cosine function, or exponential function. The extreme points of those functions' higher derivatives on a specific interval are distributed in certain patterns. When extreme points distribution is involved in data fitting, the approximation between the higher derivatives of the measured signal and the higher derivatives of the real physical process could be improved. It is necessary to analyze the extreme points' distribution pattern of those typical process functions and their higher derivatives to involve the extreme points distribution in data fitting with neural networks.
An nth order polynomial modeling a continuous process is shown in expression (7).
Additionally, the kth order derivative of an nth order polynomial can be obtained in (8).
It can be seen that the kth-order derivative of an nth-order polynomial is an (n − k)th-order polynomial, which has at most n − k − 1 extreme points. Therefore, the maximum number of the kth-order derivative's extreme points decreases as the order k increases. When fitting a measured signal modeled by an nth-order polynomial, the number of extreme points could be applied to test the consistency between the fitted kth-order derivative and that of the real physical process.
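As a quick check of this pattern, the sketch below counts the extreme points of a polynomial and of its derivatives as sign changes of the next-order derivative on a grid. The example polynomial, the interval, the grid size and the function names are illustrative assumptions rather than values from the paper.

```python
def poly_derivative(coeffs):
    """Derivative of a polynomial given as [c0, c1, c2, ...] (ascending powers)."""
    return [i * c for i, c in enumerate(coeffs)][1:]

def poly_eval(coeffs, x):
    """Evaluate the polynomial at x using Horner's scheme."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result

def count_extreme_points(coeffs, a, b, n_grid=20001):
    """Count interior extreme points on [a, b] as sign changes of the first derivative."""
    d = poly_derivative(coeffs)
    signs = []
    for i in range(n_grid):
        v = poly_eval(d, a + (b - a) * i / (n_grid - 1))
        if v != 0.0:  # skip grid points that land exactly on a root
            signs.append(v > 0)
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)

# p(x) = x^5 - 5x^3 + 4x, a fifth-order polynomial with five real roots.
p = [0.0, 4.0, 0.0, -5.0, 0.0, 1.0]
counts = []
current = p
for k in range(5):
    counts.append(count_extreme_points(current, -3.0, 3.0))
    current = poly_derivative(current)
print(counts)  # [4, 3, 2, 1, 0]
```

Here the kth-order derivative attains the upper bound of n − k − 1 extreme points for every k; a polynomial with fewer real critical points would stay below it.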
An exponential function describing the growth or decline process is shown in (9).
Additionally, the kth-order derivative of the exponential function is shown in (10).
According to (10), the number of extreme points of the zeroth- through kth-order derivatives of the exponential function is always zero. When fitting a measured signal modeled using the exponential function, the number of extreme points should always remain zero. Any extreme point appearing in the fitting results indicates a deviation from the physical trend.
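As an illustration, assume the exponential process has the common form $f(x) = a e^{bx}$ with nonzero constants $a$ and $b$ (these symbols are assumptions and are not necessarily those used in expression (9)). Then:

$$f^{(k)}(x) = a\, b^{k} e^{bx}$$

Since $e^{bx} > 0$ for all $x$, no derivative of any order changes sign, so neither the function nor any of its derivatives has an extreme point.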
A Gaussian function describing a random process is shown in expression (11). The kth-order derivative of the Gaussian function is shown in expression (12), where $F_i(n) = \frac{n!}{i!(n-i)!}$ is the binomial coefficient. The highest degree of $x$ in expression (12) increases as the derivative order increases [38]. Then, the number of the derivative's extreme points increases as the derivative order increases. When fitting a measured signal modeled using the Gaussian function, the number of extreme points could be applied to test the consistency between the fitted kth-order derivative and that of the real physical process.
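The growth of the number of extreme points can be checked for a standard Gaussian $e^{-x^2/2}$, whose (k+1)th derivative equals $(-1)^{k+1} He_{k+1}(x)\, e^{-x^2/2}$ with $He_n$ the probabilists' Hermite polynomial. The sketch below assumes this zero-mean, unit-variance form (expression (11) may carry additional parameters), and the grid and function names are illustrative.

```python
def hermite(n, x):
    """Probabilists' Hermite polynomial He_n(x) via He_{m+1} = x*He_m - m*He_{m-1}."""
    h_prev, h_curr = 1.0, x
    if n == 0:
        return h_prev
    for m in range(1, n):
        h_prev, h_curr = h_curr, x * h_curr - m * h_prev
    return h_curr

def gaussian_derivative_extreme_count(k, half_width=8.0, n_grid=16001):
    """Count extreme points of the k-th derivative of exp(-x^2/2) on [-half_width, half_width].

    Extreme points of the k-th derivative are sign changes of the (k+1)-th derivative.
    The positive Gaussian factor does not affect the sign, so only He_{k+1} is evaluated.
    """
    signs = []
    for i in range(n_grid):
        x = -half_width + 2.0 * half_width * i / (n_grid - 1)
        v = hermite(k + 1, x)
        if v != 0.0:  # skip grid points that land exactly on a root
            signs.append(v > 0)
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)

counts = [gaussian_derivative_extreme_count(k) for k in range(5)]
print(counts)  # [1, 2, 3, 4, 5]
```

Under these assumptions the kth-order derivative has k + 1 extreme points, so the count rises by one with each differentiation.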
The cosine function describing a simple periodic process is shown in (13).
The kth-order derivative of the cosine function is shown in expression (14).
Shifting the cosine function left by $k\pi/2$ yields its kth-order derivative. If the number of extreme points of the cosine function is $N_{hp}$ on an arbitrary interval $[-a, +a]$, the number of extreme points of the cosine function's kth-order derivative is $N_{hp} - 1$, $N_{hp}$ or $N_{hp} + 1$. When fitting a measured signal using a cosine function, it is feasible to judge whether the higher derivative deviates from the physical trend according to whether an abnormal number of extreme points appears.
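For a unit-amplitude, unit-frequency cosine (an assumption made here for brevity; expression (13) may include amplitude, frequency and phase parameters), the shift property reads:

$$\frac{d^{k}}{dx^{k}} \cos x = \cos\left(x + \frac{k\pi}{2}\right)$$

The extreme points of the kth-order derivative are therefore those of $\cos x$ translated by $k\pi/2$. Because the extreme points are evenly spaced, a translation can move at most one of them into or out of a fixed interval, which is why the count changes by at most one.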
In summary, there are definite relations between the number of extreme points and the derivative order for the typical process functions. Moreover, the pattern of a real physical process could be modeled using a typical function or a combination of typical functions. Therefore, the number of extreme points of the higher derivatives could be applied as a constraint when fitting the measured signal for denoising purposes, which yields noise-reduced signals with higher derivatives conforming to the physical trend.

The Extreme-Points-Distribution-Based Neural Network (EDNN)
For similar physical processes, the functions modelling the physical quantities varying with time and space have the same form. Additionally, the numbers of the extreme points of those functions and their derivatives are confined in a certain range. Therefore, a novel neural network that introduces the extreme points distribution as constraints was proposed in order to acquire higher derivatives conforming to physical trends. The outline of building the EDNN is shown in Figure 1.
The extreme-points-distribution-based neural network (EDNN) is composed of an input layer, hidden layers, an output layer, an automatic differentiation layer, an extreme points distribution feature layer and a loss function containing extreme points distribution errors. As shown in Figure 2, the EDNN contains a sample error calculator (SEC) and an extreme points distribution error calculator (EDEC). The total loss function of the EDNN is composed of the sample error loss function LossSp and the extreme points distribution error loss function LossEV.
The sample error loss function LossSp is calculated from the difference between the output $f_s$ of the sample error calculator (SEC) and the corresponding target $f_t$. The extreme points distribution feature error loss function LossEV is calculated using the extreme points distribution error calculator (EDEC). For each $x_d$ in the measured signal's definition domain, the automatic differentiation layer (ADLayers) calculates the zeroth- through (k+1)th-order derivatives. Additionally, the extreme points distribution feature layer (EVLayer) calculates the number of extreme points of the zeroth- through kth-order derivatives. Then, LossEV can be calculated from the difference between the calculated number of extreme points and the number of extreme points of the real physical process.
The total loss function of the EDNN (Loss) is the weighted summation of LossSp and LossEV. The EDNN is trained, and the weights and biases are updated until either the total loss or the gradient falls below its stopping criterion.

Recursive Automatic Differentiation
Calculating the derivatives of the output with respect to the input is necessary for the EDNN. A recursive formulation is established for calculating the derivatives. The current layer's derivatives with respect to the input are modeled as a function of the previous layer's derivatives with respect to the input.

Derivatives of Hidden Layers
The derivative of the (l + 1)th hidden layer with respect to the input layer can be formulated as a function of the derivatives of the lth hidden layer with respect to the input layer.
The ith node's output of the (l + 1)th layer is denoted as $a_i^{l+1}(x)$, and its mth-order partial derivative with respect to the input layer's nth node $x_n^{I}$ is denoted as $D^{m}_{a_i^{l+1}(x),\,x_n^{I}}$.
According to Faà di Bruno's formula [39,40], $D^{m}_{a_i^{l+1}(x),\,x_n^{I}}$ can be expressed as:
$$D^{m}_{a_i^{l+1}(x),\,x_n^{I}} = \sum \frac{m!}{b_1!\, b_2! \cdots b_m!}\, f^{(k)}\!\left(\sum_{j=1}^{S_l} w_{i,j}^{l+1,l}\, a_j^{l}(x) + b_i^{l+1}\right) \prod_{q=1}^{m} \left(\frac{1}{q!} \sum_{j=1}^{S_l} w_{i,j}^{l+1,l}\, D^{q}_{a_j^{l}(x),\,x_n^{I}}\right)^{b_q} \quad (15)$$
where the sum is over all different solutions in non-negative integers $b_1, b_2, \cdots, b_m$ of $b_1 + 2b_2 + \cdots + m b_m = m$, and $k = b_1 + b_2 + \cdots + b_m$; $f^{(k)}(\cdot)$ is the kth-order derivative of the activation function; $S_l$ is the number of nodes in layer l; $w_{i,j}^{l+1,l}$ is the weight of the jth node in the lth layer to the ith node in the (l + 1)th layer; $b_i^{l+1}$ is the bias of the ith node of the (l + 1)th layer; $l = 1, \cdots, (L_h - 1)$, and $L_h$ is the total number of hidden layers.
When there is no transform in the input layer, the mth-order partial derivative $D^{m}_{a_i^{1}(x),\,x_n^{I}}$ of the ith node's output $a_i^{1}(x)$ in the first hidden layer with respect to the nth node of the input layer is shown in expression (16).
$$D^{m}_{a_i^{1}(x),\,x_n^{I}} = f^{(m)}\!\left(\sum_{j=1}^{S_I} w_{i,j}^{1,I}\, x_j^{I} + b_i^{1}\right) \left(w_{i,n}^{1,I}\right)^{m} \quad (16)$$
where $f^{(m)}(\cdot)$ is the mth-order derivative of the activation function; $S_I$ is the number of nodes of the input layer; $w_{i,j}^{1,I}$ is the weight of the jth node in the input layer to the ith node in the first hidden layer; $b_i^{1}$ is the bias of the ith node of the first hidden layer.
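To make the structure of Faà di Bruno's formula concrete, the following sketch evaluates it for a scalar composition $f(g(x))$ by enumerating the non-negative integer solutions $b_1, \cdots, b_m$. It is a generic illustration of the formula rather than the paper's layer-wise implementation; the function names and the test composition $\exp(x^2)$ are assumptions.

```python
import math

def faa_di_bruno_solutions(m):
    """Yield all non-negative integer tuples (b_1, ..., b_m) with 1*b_1 + 2*b_2 + ... + m*b_m = m."""
    def extend(j, remaining, prefix):
        if j > m:
            if remaining == 0:
                yield tuple(prefix)
            return
        for b in range(remaining // j + 1):
            yield from extend(j + 1, remaining - j * b, prefix + [b])
    yield from extend(1, m, [])

def composite_derivative(m, outer_derivs, inner_derivs):
    """m-th derivative of f(g(x)) at a point, by Faa di Bruno's formula.

    outer_derivs[k] is f^(k) evaluated at g(x), for k = 0..m.
    inner_derivs[j] is g^(j)(x), for j = 0..m.
    """
    total = 0.0
    for b in faa_di_bruno_solutions(m):
        k = sum(b)  # order of the outer derivative for this term
        coeff = math.factorial(m)
        term = outer_derivs[k]
        for j, bj in enumerate(b, start=1):
            coeff /= math.factorial(bj) * math.factorial(j) ** bj
            term *= inner_derivs[j] ** bj
        total += coeff * term
    return total

# Check with f = exp and g(x) = x^2 at x = 1: d^3/dx^3 exp(x^2) = (12x + 8x^3) exp(x^2).
x = 1.0
outer = [math.exp(x * x)] * 4        # every derivative of exp is exp
inner = [x * x, 2.0 * x, 2.0, 0.0]   # g, g', g'', g'''
third = composite_derivative(3, outer, inner)
print(third, 20.0 * math.e)
```

In a layer-wise setting, `inner_derivs` would hold the weighted sums of the previous layer's derivatives and `outer_derivs` the activation-function derivatives, which is what makes the formulation recursive.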

Derivatives of Output Layers
The partial derivatives of the output layer with respect to the input layer are modeled as functions of the partial derivatives of the last hidden layer with respect to the input layer. The mth-order partial derivative of the output layer's ith node with respect to the nth node in the input layer is shown in expression (17).
$$D^{m}_{a_i^{O}(x),\,x_n^{I}} = \sum_{j=1}^{S_{L_h}} w_{i,j}^{O,L_h}\, D^{m}_{a_j^{L_h}(x),\,x_n^{I}} \quad (17)$$
where $L_h$ is the total number of hidden layers; $S_{L_h}$ is the number of nodes in the $L_h$th hidden layer; $w_{i,j}^{O,L_h}$ is the weight of the jth node in the $L_h$th hidden layer to the ith node in the output layer.

Extreme Points Distribution Feature Layer
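The role of the extreme points distribution feature layer is to calculate the number of extreme points. The extreme points of the mth-order partial derivative $D^{m}_{a_i^{O}(x),\,x_n^{I}}$ satisfy:
$$D^{m+1}_{a_i^{O}(x),\,x_n^{I}} = 0 \quad (18)$$
Equation (18) is an equation for x.
The solution of Equation (18) is a set:
$$Sol_{m,i,n} = \left\{ x \;\middle|\; D^{m+1}_{a_i^{O}(x),\,x_n^{I}} = 0 \right\}$$
The number of extreme points of $D^{m}_{a_i^{O}(x),\,x_n^{I}}$ is the rank of the set $Sol_{m,i,n}$:
$$R_{m,i,n} = \mathrm{Rank}(Sol_{m,i,n})$$
where Rank() is the ranking function, which returns the number of elements of the set; $R_{m,i,n}$ is the number of extreme points of $D^{m}_{a_i^{O}(x),\,x_n^{I}}$, and $R_{m,i,n}$ is defined as the extreme points distribution feature.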

Loss Function Containing Extreme Points Distribution Feature Errors
The number of extreme points of the real physical process is the reference for calculating the extreme points distribution feature error.
If the number of extreme points of derivative $D^{m}_{a_i^{O}(x),\,x_n^{I}}$ is $R_{m,i,n}$ and the number of extreme points of the real physical process is $TCE_{n,i,m}$, then the extreme points distribution feature error of derivative $D^{m}_{a_i^{O}(x),\,x_n^{I}}$ is defined from the difference between $R_{m,i,n}$ and $TCE_{n,i,m}$. The extreme points distribution feature error loss function LossEV accumulates this error over the output nodes, the input nodes and the derivative orders, where $N_o$ is the number of nodes in the output layer, $N_I$ is the number of nodes in the input layer and $M_d$ is the highest order of the partial derivatives. The sample error loss function LossSp is defined from the differences between the targets and the outputs over the training samples, where $t_{i,s}$ is the ith component of the sth training sample; $a_{i,s}^{O}(x)$ is the ith node's output of the output layer corresponding to the sth training sample; $N_t$ is the number of training samples.
The total loss function of the EDNN is defined as:
$$Loss = S_S \cdot LossSp + S_E \cdot LossEV$$
where $S_S$ is the weight of the sample error loss function LossSp; $S_E$ is the weight of the extreme points distribution feature error loss function LossEV. The optimization criterion is to minimize the total loss function. Additionally, the stopping criterion is met when the total loss function is less than $10^{-4}$ or the gradient is less than $10^{-7}$.
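The weighted loss and the stopping rule can be sketched as follows. The exact forms of LossSp and LossEV are not reproduced here, so this sketch assumes a mean squared error for LossSp and a sum of squared count differences for LossEV; these forms, the function names and the default weights are hypothetical. Only the weighted sum and the thresholds $10^{-4}$ and $10^{-7}$ are taken from the text.

```python
def sample_error_loss(outputs, targets):
    """Assumed form of LossSp: mean squared error between outputs and targets."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(targets)

def extreme_points_loss(counted, reference):
    """Assumed form of LossEV: sum of squared differences between the counted
    extreme-point numbers (R) and the reference numbers (TCE)."""
    return float(sum((r - tce) ** 2 for r, tce in zip(counted, reference)))

def total_loss(loss_sp, loss_ev, s_s=1.0, s_e=1.0):
    """Weighted sum Loss = S_S * LossSp + S_E * LossEV."""
    return s_s * loss_sp + s_e * loss_ev

def should_stop(loss, grad_norm, loss_tol=1e-4, grad_tol=1e-7):
    """Stop when the total loss or the gradient norm falls below its threshold."""
    return loss < loss_tol or grad_norm < grad_tol

# Example: the fit is close, but one derivative has one extreme point too many.
loss_sp = sample_error_loss([0.99, 0.52, 0.11], [1.0, 0.5, 0.1])
loss_ev = extreme_points_loss(counted=[2, 3, 5], reference=[2, 3, 4])
print(total_loss(loss_sp, loss_ev), should_stop(total_loss(loss_sp, loss_ev), 1e-3))
```

Because the extreme-point counts are integers, a gradient-based optimizer cannot differentiate through them directly; how the counts enter the weight update is outside the scope of this sketch.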

Application of the EDNN in Denoising
Research on signal denoising with the EDNN was conducted. Firstly, a single-input, single-output, single-hidden-layer EDNN was realized. Then, the EDNN was applied to the denoising of a second-order damped free oscillation signal and an internal combustion engine cylinder pressure trace signal.

Realization of EDNN
The single-input, single-output EDNN is shown in Figure 3. There are eight nodes in the hidden layer, and the activation function of the hidden layer is a sigmoid function. The output of the sample error calculator (SEC) is:
$$a^{o}(x) = \sum_{i=1}^{8} W_{1,i}^{2,1}\, f\!\left(W_{i,1}^{1,I}\, x + b_i^{1}\right) + b_1^{2}$$
where $W_{i,1}^{1,I}$ is the weight of the input to the ith node of the first hidden layer; $W_{1,i}^{2,1}$ is the weight of the ith node in the first hidden layer to the output; $b_i^{1}$ is the bias of the ith node of the first hidden layer; $b_1^{2}$ is the bias of the output.
The kth derivative of the output with respect to the input is shown in (26).
$$\frac{d^{k} a^{o}(x)}{dx^{k}} = \sum_{i=1}^{8} W_{1,i}^{2,1} \left(W_{i,1}^{1,I}\right)^{k} f^{(k)}\!\left(W_{i,1}^{1,I}\, x + b_i^{1}\right) \quad (26)$$
where $f^{(k)}\!\left(W_{i,1}^{1,I}\, x + b_i^{1}\right)$ represents the kth-order derivative of the activation function corresponding to the ith node of the hidden layer.
The highest derivative order of the voltage with respect to time of the second-order damped free oscillation signal is two. Thus, the highest derivative order of the EDNN could be set to four, which satisfies the noise reduction demand. Therefore, the maximum value of k in expression (26) is four. Additionally, only the first- through fourth-order derivatives of the activation function are required [41]. The first- through fourth-order derivatives of the activation function are shown in expressions (27)-(30):
$$f'(x) = f(x)\left(1 - f(x)\right) \quad (27)$$
$$f''(x) = f(x)\left(1 - f(x)\right)\left(1 - 2f(x)\right) \quad (28)$$
$$f^{(3)}(x) = f(x)\left(1 - f(x)\right)\left(1 - 6f(x) + 6f(x)^{2}\right) \quad (29)$$
$$f^{(4)}(x) = f(x)\left(1 - f(x)\right)\left(1 - 2f(x)\right)\left(1 - 12f(x) + 12f(x)^{2}\right) \quad (30)$$
where $f(x) = \frac{1}{1 + e^{-x}}$ is the sigmoid function; $f'(x)$, $f''(x)$, $f^{(3)}(x)$, $f^{(4)}(x)$ are the first, second, third, and fourth derivatives of $f(x)$ with respect to $x$.
The kth derivative of the output with respect to the input is $\frac{d^{k} a^{o}(x)}{dx^{k}}$. Additionally, the solution of $\frac{d^{k} a^{o}(x)}{dx^{k}} = 0$ is:
$$Sol_{k-1} = \left\{ x \;\middle|\; \frac{d^{k} a^{o}(x)}{dx^{k}} = 0 \right\}$$
The number of extreme points $R_{k-1}$ corresponding to the derivative $\frac{d^{k-1} a^{o}(x)}{dx^{k-1}}$ is the rank of the set $Sol_{k-1}$, i.e.:
$$R_{k-1} = \mathrm{Rank}(Sol_{k-1})$$
The number of extreme points of the real physical process corresponding to the derivative $\frac{d^{k-1} a^{o}(x)}{dx^{k-1}}$ is $TCE_{k-1}$. Therefore, the extreme points distribution feature error corresponding to the (k − 1)th-order derivative is obtained from the difference between $R_{k-1}$ and $TCE_{k-1}$, where $TCE_{k-1}$ is determined using known physical process information.
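The closed-form derivatives of a single-input, single-output network with one sigmoid hidden layer and a linear output can be sketched as below. The weight values are arbitrary placeholders, not trained parameters from the paper, and the function names are illustrative; the linear output layer is an assumption consistent with the description above.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_derivative(k, z):
    """k-th derivative (k = 0..4) of the sigmoid, written in terms of s = sigmoid(z)."""
    s = sigmoid(z)
    if k == 0:
        return s
    if k == 1:
        return s * (1 - s)
    if k == 2:
        return s * (1 - s) * (1 - 2 * s)
    if k == 3:
        return s * (1 - s) * (1 - 6 * s + 6 * s * s)
    if k == 4:
        return s * (1 - s) * (1 - 2 * s) * (1 - 12 * s + 12 * s * s)
    raise ValueError("only orders 0..4 are implemented")

def network_derivative(k, x, w_in, b_in, w_out, b_out):
    """k-th derivative of the network output with respect to the scalar input x."""
    total = b_out if k == 0 else 0.0  # the output bias vanishes after differentiation
    for wi, bi, wo in zip(w_in, b_in, w_out):
        total += wo * wi ** k * sigmoid_derivative(k, wi * x + bi)
    return total

# Illustrative weights for eight hidden nodes (arbitrary values, not from the paper).
W_IN = [0.5, -1.2, 0.8, 1.5, -0.7, 1.1, -0.3, 0.9]
B_IN = [0.1, -0.4, 0.3, 0.0, 0.6, -0.2, 0.5, -0.1]
W_OUT = [1.0, -0.5, 0.7, 0.2, -0.9, 0.4, 0.6, -0.3]
B_OUT = 0.05

for k in range(5):
    print(k, network_derivative(k, 0.3, W_IN, B_IN, W_OUT, B_OUT))
```

Each order can be checked against a central finite difference of the order below it, which is a convenient sanity test before the derivatives are used to count extreme points.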
The total loss function of the EDNN is the weighted sum $Loss = S_S \cdot LossSp + S_E \cdot LossEV$, where LossEV is the extreme points distribution feature error loss function; LossSp is the sample error loss function; $S_S$ is the weight of the sample error loss function LossSp; $S_E$ is the weight of the extreme points distribution feature error loss function LossEV; $t_s$ is the target of the sth training sample; $a_s^{o}(x)$ is the output corresponding to the sth training sample; $N_t$ is the number of training samples.

Acquisition of Second-Order Damped Free Oscillation Signal
The measured second-order damped free oscillation (SODFO) signal has errors, and the effect of the errors on the derivatives is difficult to evaluate. Therefore, a SODFO signal is obtained via simulation. High-frequency noise and random noise were superposed on the simulated signal. Then, the performance of the EDNN on denoising was studied.
The simulation model of the SODFO signal is shown in expression (35). The simulated signal is a cosine-shaped curve with decreasing amplitude (shown in Figure 4a). The simulation model of the high-frequency noise is shown in expression (36). The simulated high-frequency noise is a cosine signal with a frequency 10 times the frequency of the SODFO signal (shown in Figure 4b). The simulation model of the random noise is shown in expression (37). The random noise is uniformly distributed on [−0.01, 0.01] (shown in Figure 4c). The superposition of the high-frequency noise and the random noise on the SODFO signal does not visibly change the amplitude (shown in Figure 4d).
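A signal of this kind can be generated as in the sketch below. Only the structure comes from the description above (a damped cosine, a cosine noise at 10 times the signal frequency, and uniform noise on [−0.01, 0.01]); the damping coefficient, the base frequency, the high-frequency noise amplitude, the sample count and the function name are assumptions, since the parameters of expressions (35)-(37) are not reproduced here.

```python
import math
import random

def simulate_nsodfo(n=1000, t_end=1.0, freq=5.0, damping=3.0, hf_amp=0.01, seed=0):
    """Return time stamps, the clean damped oscillation and the noisy signal.

    freq, damping and hf_amp are illustrative values, not the paper's parameters.
    """
    rng = random.Random(seed)
    t = [t_end * i / (n - 1) for i in range(n)]
    # Cosine-shaped curve with exponentially decreasing amplitude.
    clean = [math.exp(-damping * ti) * math.cos(2 * math.pi * freq * ti) for ti in t]
    # High-frequency noise: cosine at 10 times the signal frequency.
    hf_noise = [hf_amp * math.cos(2 * math.pi * 10 * freq * ti) for ti in t]
    # Random noise: uniformly distributed on [-0.01, 0.01].
    rnd_noise = [rng.uniform(-0.01, 0.01) for _ in t]
    noisy = [c + h + r for c, h, r in zip(clean, hf_noise, rnd_noise)]
    return t, clean, noisy

t, clean, noisy = simulate_nsodfo()
print(max(abs(a - b) for a, b in zip(clean, noisy)))
```

With these amplitudes the noise stays within 2 percent of the initial signal amplitude, which is why it is barely visible in the signal itself while still dominating the higher derivatives.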

Comparative Research of EDNN with Shallow Neural Network on Denoising of Second-Order Damped Free Oscillation Signal
Comparative research on the denoising performance of the EDNN was conducted. The NSODFO signal (the SODFO signal with the noise superposed) was processed using a shallow neural network and the EDNN, and the results were compared. The SODFO signal, the noise-removed signal from the shallow neural network (NRSNN signal) and the noise-removed signal from the EDNN (NREDNN signal) are shown in Figure 5. Figure 5a shows that the SODFO signal, the NRSNN signal and the NREDNN signal coincide well. In Figure 5b, the differences between the denoised signals and the SODFO signal show that the errors of the NRSNN signal and the NREDNN signal are of the same magnitude. The maximum error of the NRSNN signal is 0.09 percent of the maximum amplitude of the SODFO signal. The maximum error of the NREDNN signal is 0.08 percent of the maximum amplitude of the SODFO signal. This indicates that the shallow neural network and the EDNN have similar performance in approximating the amplitude of the signal.
The first- through fourth-order derivatives and the derivative errors are shown in Figure 6.
Figure 6a shows the first-order derivatives and the errors. The first-order derivatives of the NRSNN signal and the NREDNN signal coincide with the first-order derivative of the SODFO signal. The maximum error of the first-order derivative of the NREDNN signal is 0.16 percent of the maximum amplitude of the first-order derivative of the SODFO signal. The maximum error of the first-order derivative of the NRSNN signal is 0.49 percent of the maximum amplitude of the first-order derivative of the SODFO signal.
Figure 6b shows the second-order derivatives and the errors. The second-order derivatives of the NRSNN signal and the NREDNN signal coincide with the second-order derivative of the SODFO signal. The maximum error of the second-order derivative of the NREDNN signal is 0.42 percent of the maximum amplitude of the second-order derivative of the SODFO signal. The maximum error of the second-order derivative of the NRSNN signal is 2.22 percent of the maximum amplitude of the second-order derivative of the SODFO signal.
Figure 6c shows the third-order derivatives and the errors. The third-order derivative of the NREDNN signal coincides with the third-order derivative of the SODFO signal, and no trend discrepancy occurs in the third-order derivative of the NREDNN signal. However, a trend discrepancy occurs in the third-order derivative of the NRSNN signal (Figure 6d). The maximum error of the third-order derivative of the NREDNN signal is 4.02 percent of the maximum amplitude of the third-order derivative of the SODFO signal. The maximum error of the third-order derivative of the NRSNN signal is 12.05 percent of the maximum amplitude of the third-order derivative of the SODFO signal.
Figure 6e shows the fourth-order derivatives and the errors. The fourth-order derivatives of the NRSNN signal and the NREDNN signal coincide with the fourth-order derivative of the SODFO signal. The maximum error of the fourth-order derivative of the NREDNN signal is 39.92 percent of the maximum amplitude of the fourth-order derivative of the SODFO signal. The maximum error of the fourth-order derivative of the NRSNN signal is 114.24 percent of the maximum amplitude of the fourth-order derivative of the SODFO signal. The error of the NREDNN signal is less than that of the NRSNN signal (Figure 6f).
The standard deviations of the errors of the first- through fourth-order derivatives are shown in Table 1. The standard deviation of the error between the derivatives of the SODFO signal and those of the NREDNN signal is less than 62.5 percent of the corresponding standard deviation for the NRSNN signal. In summary, the noise reduction performance of the EDNN is better than that of the shallow neural network. Additionally, no trend discrepancy occurred in the first- through fourth-order derivatives of the NREDNN signal. These are the advantages of the EDNN. However, the EDNN requires more memory to store the derivatives and more computational time to calculate the derivatives and the extreme points distribution feature error.
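The two error measures used in this comparison can be computed as in the following sketch. It assumes the maximum error is normalized by the largest absolute value of the reference derivative and that the standard deviation is the population standard deviation of the pointwise error; the paper does not state which estimator was used, and the function names and sample values are illustrative.

```python
import statistics

def max_error_percent(reference, estimate):
    """Maximum absolute error as a percentage of the reference's maximum amplitude."""
    max_err = max(abs(e - r) for r, e in zip(reference, estimate))
    return 100.0 * max_err / max(abs(r) for r in reference)

def error_std(reference, estimate):
    """Population standard deviation of the pointwise error (assumed estimator)."""
    return statistics.pstdev([e - r for r, e in zip(reference, estimate)])

# Toy values standing in for a reference derivative and a denoised derivative.
reference = [0.0, 1.0, 2.0, -4.0]
estimate = [0.0, 1.1, 2.0, -4.0]
print(max_error_percent(reference, estimate), error_std(reference, estimate))
```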

Denoising of the Cylinder Pressure Signal with EDNN
The internal combustion engine cylinder pressure trace is the variation of cylinder pressure with crank angle. Generally, the cylinder pressure trace is recorded using a piezoelectric high-temperature pressure sensor and smoothed via cyclic averaging [42]. The cyclically averaged cylinder pressure trace can satisfy the demands of calculation of indicated mean effective pressure. However, when the analysis involves derivatives beyond the second order, the noise often leads to deviation from the physical trend. The denoising performance of the EDNN on cylinder pressure trace was studied.
The cylinder pressure signal is the in-cylinder pressure, which varies with the crank angle. The pressure signal spans −120 to 120 degrees crank angle (CA), and the sample interval is 0.1 degrees CA. Additionally, the cylinder pressure signal is a vector of length 2400 whose maximum value is $1.6946 \times 10^{6}$ Pa.
The single-input, single-hidden-layer, single-output EDNN with eight nodes in the hidden layer was applied to reduce the noise of the cylinder pressure trace. Moreover, the noise reduction results were compared with those from a shallow neural network with 8 nodes (SNN_8), a shallow neural network with 18 nodes (SNN_18) and the smoothing splines that are traditionally adopted in smoothing cylinder pressure traces.
The noise reduction results are shown in Figure 7. The noise-reduced signals of the four techniques are all consistent with the measured cylinder pressure (Figure 7a). Additionally, the errors between the noise-reduced signals and the measured signal in Figure 7b are all less than 400 Pa, and the relative error is less than $1.2 \times 10^{-6}$%, which satisfies the accuracy requirements.
Appl. Sci. 2023, 13, 6662 13 of 17 the second order, the noise often leads to deviation from the physical trend. The denoising performance of the EDNN on cylinder pressure trace was studied. The cylinder signal is the in-cylinder pressure which varies with the crank angle. The pressure signal is between the −120 degree CA and 120 degree CA, and the sample interval is 0.1 degrees CA. Additionally, the cylinder pressure signal is a vector of length 2400 whose maximum value is 1.6946 × 10 6 Pa.
The single-input, single-hidden-layer, single-output EDNN with eight nodes in the hidden layer was applied to reduce the noise of the cylinder pressure trace. Moreover, the noise reduction results were compared with that from a shallow neural network with 8 nodes (SNN_8), a shallow neural network with 18 nodes (SNN_18) and the smoothing splines that are traditionally adopted in smoothing cylinder pressure trace.
Figure 8 shows the first- through fourth-order derivatives. Figure 8a shows that the first-order derivatives of the noise-reduced signals from the four techniques are all consistent with each other, with no fluctuations deviating from the physical trend. The first-order derivative of the measured signal shows a small fluctuation near the zero-degree crank angle, which deviates from the physical trend.
Figure 8b shows that the second-order derivatives of the noise-reduced signals from the EDNN and the SNN_18 neural network are consistent with the real trend, with no fluctuations deviating from the physical trend. The second-order derivatives of the noise-reduced signals from the SNN_8 neural network and the smoothing splines show small fluctuations. Further, the second-order derivative of the measured signal shows frequent fluctuations deviating from the physical trend over the whole crank angle range.
Figure 8c shows that no fluctuations deviating from the physical trend occur in the third-order derivative of the noise-reduced signal obtained from the EDNN. There are small fluctuations in the third-order derivative of the noise-reduced signal from the SNN_18 neural network and obvious fluctuations in that from the SNN_8 neural network. There are frequent fluctuations in the third-order derivative of the noise-reduced signal obtained from the smoothing splines. Additionally, the third-order derivative of the measured signal is submerged in the noise.
Figure 8d shows no fluctuations deviating from the physical trend in the fourth-order derivative of the noise-reduced signal obtained from the EDNN. There are obvious fluctuations in the fourth-order derivatives of the noise-reduced signals from the SNN_18 and SNN_8 neural networks. Additionally, the fourth-order derivatives of the measured signal and of the noise-reduced signal obtained from the smoothing splines are submerged in the noise.
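The way a measured signal's higher derivatives become submerged in noise can be reproduced with a small synthetic experiment. The sketch below is only illustrative: the bell-shaped curve is a hypothetical stand-in for a pressure trace (it is not the engine data of this paper), the 10 Pa noise level is an assumption, and the helpers `count_extreme_points` and `repeated_gradient` are names introduced here. It differentiates a clean and a noisy signal numerically and counts the extreme points of each derivative, since spurious fluctuations appear as extra extreme points.

```python
import numpy as np

def count_extreme_points(values):
    """Count interior local extrema as sign changes of the first difference."""
    signs = np.sign(np.diff(values))
    signs = signs[signs != 0]                      # ignore exactly flat steps
    return int(np.sum(signs[1:] != signs[:-1]))

def repeated_gradient(signal, step, order):
    """Apply a central finite difference `order` times."""
    for _ in range(order):
        signal = np.gradient(signal, step)
    return signal

# Grid as described in the text: -120 to 120 degrees CA, 0.1 degree step, 2400 samples.
theta = np.linspace(-120.0, 120.0, 2400, endpoint=False)
step = theta[1] - theta[0]

# Hypothetical smooth trace (assumption) plus assumed 10 Pa white measurement noise.
clean = 1.0e5 + 1.6e6 * np.exp(-((theta / 30.0) ** 2))
rng = np.random.default_rng(0)
noisy = clean + rng.normal(scale=10.0, size=theta.size)

trim = slice(10, -10)                              # drop one-sided edge differences
rows = []
for order in range(1, 5):
    d_clean = repeated_gradient(clean, step, order)[trim]
    d_noisy = repeated_gradient(noisy, step, order)[trim]
    noise_ratio = np.std(d_noisy - d_clean) / np.max(np.abs(d_clean))
    rows.append((order, count_extreme_points(d_clean),
                 count_extreme_points(d_noisy), noise_ratio))

for order, n_clean, n_noisy, noise_ratio in rows:
    print(f"order {order}: extreme points clean={n_clean}, "
          f"noisy={n_noisy}, noise/signal={noise_ratio:.3g}")
```

In this synthetic setting the noise-to-signal ratio of the derivative grows with every differentiation, and the noisy derivatives contain far more extreme points than the clean ones, which mirrors the qualitative behavior described for the measured signal in Figure 8.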

In summary, the noise reduction performance of the EDNN is better than that of the SNN_8 neural network, the SNN_18 neural network and the smoothing splines. Further, there is no trend discrepancy in the first- through fourth-order derivatives of the noise-reduced cylinder pressure signal from the EDNN.
The EDNN could derive higher derivatives consistent with the real physical process in the absence of a detailed mathematical model of the process. However, the EDNN needs more computational resources because the higher derivatives must be recorded and calculated.

Conclusions
The effect of the error on the higher derivatives of the measured signal was analyzed. The feasibility of using the extreme points distribution as a constraint in data fitting was studied. Then, the extreme-points-distribution-based neural network (EDNN) was proposed. Finally, the superiority of the EDNN in signal denoising was verified. The details are as follows:
1. The effect of the error on the higher derivatives of the measured signal was analyzed, and a possible way of applying the extreme points distribution as a constraint on data fitting was studied.
A mathematical model was established to analyze the error's effect on higher derivatives. It was found that even when the Taylor series of the noise has only a kth-order term, with the coefficients of all other terms being zero, the kth-order derivative of the measured signal can deviate greatly from the real physical process as long as the kth-order derivative of the noise is large enough. This clarified the necessity of finding a way to fit data with higher derivatives compatible with real process trends.
The extreme points distributions of the higher derivatives of typical process functions were analyzed, and the pattern of these distributions was investigated. The pattern provides a theoretical basis for applying the extreme points distribution as a constraint in denoising with data fitting techniques.

2. The extreme points distribution pattern was adopted as a constraint and the EDNN was established. The EDNN consists of an input layer, hidden layers, an output layer, an automatic differentiation layer and an extreme points distribution feature layer. A recursive formulation was established for calculating the derivatives. Additionally, a novel loss function, embedded with the extreme feature error, was proposed.
3. The effectiveness of the EDNN in signal denoising was verified. The proposed EDNN was applied to reduce the noise in the second-order damped free oscillation signal and the internal combustion engine cylinder pressure signal. Compared with shallow neural networks and smoothing splines, the EDNN could obtain higher derivatives that are consistent with the real physical process in the absence of a detailed mathematical model. Therefore, data fitting with higher derivatives conforming to real physical process trends could be realized with the EDNN, which provides a novel approach for analyzing and understanding physical processes through their higher derivatives.
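The finding in the first conclusion, that noise with only a kth-order Taylor term can still distort the kth-order derivative, can be illustrated with a short worked case. The symbols $y_m$, $y$, $e$, $a_k$ and $x_0$ below are assumed here for illustration and are not necessarily the notation of the mathematical model in this paper.

```latex
\begin{aligned}
y_m(x) &= y(x) + e(x), \qquad e(x) = \frac{a_k}{k!}\,(x - x_0)^k, \\
e^{(j)}(x_0) &= 0 \quad (0 \le j < k), \qquad e^{(k)}(x) = a_k, \\
y_m^{(k)}(x) &= y^{(k)}(x) + a_k .
\end{aligned}
```

Near $x_0$ the noise $e(x)$ and its derivatives below order $k$ are small, so the measured signal looks accurate there, yet the kth-order derivative is shifted by the full coefficient $a_k$. If $|a_k|$ is comparable to or larger than $|y^{(k)}|$, the kth-order derivative of the measured signal no longer follows the real trend.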
In summary, the advantage of the EDNN is that it can fit the measured signal with higher derivatives compatible with real physical process trends, which provides a novel tool for extracting information about the real physical process through higher derivatives. However, it needs more computational resources and requires knowledge of the extreme points distribution of the real process. Future research will study the computational efficiency of the EDNN and apply it in other engineering fields.