FLRNN-FGA: Fractional-Order Lipschitz Recurrent Neural Network with Frequency-Domain Gated Attention Mechanism for Time Series Forecasting

Abstract: Time series forecasting plays an important role in many industries, including economics, energy, weather, and healthcare. RNN-based methods have shown promising potential due to their strong ability to model interactions across time and variables. However, they are prone to gradient explosion and vanishing gradients, and their prediction accuracy is limited. To address these issues, this paper proposes a Fractional-order Lipschitz Recurrent Neural Network with a Frequency-domain Gated Attention mechanism (FLRNN-FGA). It has three major components: the Fractional-order Lipschitz Recurrent Neural Network (FLRNN), a frequency module, and a gated attention mechanism. In the FLRNN, fractional-order integration is employed to describe the system dynamics accurately, which helps capture long-term dependencies and improve prediction accuracy, and Lipschitz weight matrices are applied to alleviate the gradient issues. In the frequency module, temporal data are transformed into the frequency domain by the Fourier transform, and frequency-domain processing reduces the computational complexity of the model. In the gated attention mechanism, the gated structure regulates the transmission of attention information and reduces the number of model parameters. Extensive experiments on five real-world benchmark datasets demonstrate the effectiveness of FLRNN-FGA compared with state-of-the-art methods.


Introduction
Time series data play an important role in modern society [1][2][3]. Large amounts of time series data arise in various fields, such as finance [4], economics [5], transportation [6], and meteorology [7]. These data contain a wealth of useful information. Accurate analysis and information extraction can help decision makers and managers in these fields mitigate risks, manage resources, predict future trends, and formulate long-term plans.
Traditional time series forecasting methods fall mainly into statistical methods and grey models [8,9]. These methods achieve good results on low-dimensional and stationary time series data [10]. However, accurately forecasting high-dimensional and non-stationary time series remains challenging. For such complex data, deep learning, with its powerful computational and learning capabilities, has become a research hotspot [9,10]. Compared to statistical forecasting models, deep learning models can handle complex nonlinear relationships and can capture long-term dependencies in sequences [11,12].
Recently, following their tremendous success in natural language processing and computer vision [13,14], Transformers have also been applied successfully to time series tasks, and many Transformer variants have been proposed [15]. This success is attributed to the attention mechanism, which lets the model focus on and relate different parts of the time series, so that it learns and uses the information in the series more effectively and improves prediction accuracy. However, Transformer modeling requires substantial computational resources, which limits its development in time series forecasting [16][17][18]. For this reason, this paper adopts recurrent neural networks as the main structure.
The recurrent neural network (RNN) establishes temporal dependencies through recursion and is considered well suited to sequential data [19]. Many studies have shown the effectiveness of RNN models on time series data [20]. However, when RNN models are applied to long-sequence tasks, they are prone to gradient explosion and vanishing gradients [21,22]. Although long short-term memory (LSTM) networks and other variants alleviate these issues to some extent, problems remain with inaccurate descriptions of the system dynamics and high computational complexity.
To address these gradient-related issues, Hochreiter proposed long short-term memory (LSTM) networks [23], which use gate structures to control the flow of gradients. Building on the dynamical systems perspective of RNNs [24][25][26], Rubanova and Brouwer formulated new recurrent model equations and their discrete integrals based on differential equation theory [27]. Lechner and Hasani extended these models based on ordinary differential equations [28], designing an ordinary differential equation model based on the LSTM to address vanishing and exploding gradients. Ding combined fuzzy systems with recurrent networks to construct the recurrent fuzzy neural network [29]. Park applied a dual RNN architecture with partial linear dependencies to forecast time series data [30]. Erichson reconstructed the weight matrices of recurrent units in the Lipschitz RNN to alleviate the gradient issues [31]. However, the prediction accuracy of these models remains limited.
Regarding prediction accuracy, fractional calculus can yield more accurate predictions: as an extension of integer-order calculus, it describes real systems more accurately, and fractional-order systems give more precise results [32]. Many real-world systems are inherently fractional-order, and numerous natural phenomena cannot be captured precisely by traditional integer-order differential equations. An extension of traditional calculus is therefore needed to describe and analyze such phenomena. Regarding the gradient problem, Lipschitz recurrent units alleviate vanishing and exploding gradients in recurrent neural networks for long-term sequence prediction.
To overcome these problems, this paper proposes a Fractional-order Lipschitz Recurrent Neural Network with a Frequency-domain Gated Attention mechanism (FLRNN-FGA) for time series prediction. Combining deep learning with fractional calculus allows more accurate prediction of long time series by capturing their long-term dependencies. This paper uses piecewise recurrent units, which effectively reduce the number of iterations of the recurrent units. Based on fractional calculus, fractional-order integration is used to describe the dynamics of the system accurately [33], which enhances the model's ability to capture long-term dependencies and improves prediction accuracy. Through the Fourier transform, the temporal data are transformed into the frequency domain, where frequency selection and sampling are applied. This reduces the computational complexity of the model and addresses the inefficiency of handling long time series. To address insufficient feature interaction, a gated attention mechanism that combines gating with attention is adopted. The gated structure adjusts the information flow of the attention, enabling the model to filter out noise. This improves the handling of inter-variable relationships, enhancing features while reducing the parameter count and improving efficiency. The main contributions of this study are as follows.
• A new Fractional-order Lipschitz Recurrent Neural Network with a Frequency-domain Gated Attention mechanism is proposed for time series prediction. Extensive experimental results on five datasets demonstrate the effectiveness of this method.
• This paper introduces fractional calculus to describe the system dynamics of recurrent neural networks, effectively improving the model's prediction accuracy.
• This paper introduces piecewise recurrent units to reduce the number of iterations of the recurrent units. By reconstructing the weight matrix of the recurrent layer to control the system's dynamic changes, the gradient problem in recurrent neural networks is effectively alleviated.
• This paper combines gated techniques with an attention mechanism to regulate attention information, which reduces the number of model parameters and improves the model's efficiency and accuracy.
The remainder of the article is structured as follows. Section 2 reviews related work on RNN-based time series forecasting. Section 3 derives the FLRNN-FGA method in detail. Section 4 verifies the effectiveness of the proposed method through experiments. Finally, Section 5 presents our conclusions.

Related Work
A Time Series Forecasting Model Based on RNNs
Recurrent neural networks (RNNs) have long been the preferred choice for time series forecasting due to their ability to handle sequential data. Extensive research has applied RNNs to short-term and probabilistic forecasting with significant progress; for instance, the work by Lai et al. [33], Wen et al. [34], Tan, Xie, and Cheng [35], and Bergsma et al. [36] has made important contributions in this area. However, in long-term sequence forecasting (LTSF), RNNs are considered ineffective at capturing long-term dependencies when faced with very long historical windows and prediction horizons, which has led to their gradual abandonment [37,38].
To address these challenges, novel RNN architectures such as SegRNN and RWKV-TS have emerged, aiming to enhance the ability of RNNs to capture long-term dependencies by improving their structure. In large language models, new RNN architectures such as RWKV, Retentive Network [39], and Mamba [40] have demonstrated performance comparable to Transformer models [41] while being more efficient. These developments suggest that, despite the limitations of traditional RNNs in certain long-sequence forecasting tasks, RNNs still hold great potential in time series forecasting through architectural innovation.
However, when recurrent neural network (RNN) models are applied to long-sequence tasks, they are prone to gradient explosion and gradient vanishing. Although long short-term memory (LSTM) networks and other variants have mitigated these issues to some extent, they still face challenges related to inaccurate dynamic descriptions and high computational complexity.

Methods
To address inaccurate prediction and gradient problems in recurrent neural networks, a Fractional-order Lipschitz Recurrent Neural Network with a Frequency-domain Gated Attention mechanism (FLRNN-FGA) is proposed for time series prediction. The model architecture of FLRNN-FGA is depicted in Figure 1; it consists mainly of the Fractional-order Lipschitz Recurrent Neural Network (FLRNN), the frequency module, and the gated attention mechanism.
First, Section 3.1 presents the redesigned FLRNN architecture of FLRNN-FGA. Here, piecewise recurrent units are introduced: the segmented sequence is used as the input to the Lipschitz recurrent unit. With segmented recurrent units, the model processes long sequences faster without compromising performance. Fractional-order Lipschitz recurrent units make it possible to control the sensitivity of the system, alleviate gradient issues, and capture system dynamics and dependencies accurately, which yields more precise predictions for time series tasks. Section 3.2 then presents the frequency module. The temporal data are transformed into the frequency domain by the Fourier transform, and the low-frequency part, which contains more of the sequence information, is selected by frequency domain selection. A random frequency-domain sampling strategy is employed to reduce the computational overhead. Frequency-domain processing thus decreases the computational cost while retaining crucial sequence features. Finally, Section 3.3 presents the gated attention mechanism, which combines gated techniques with an attention mechanism. Attention captures the correlations between features, which are then enhanced through feature interaction, and the gated structure regulates the transmission of attention information, reducing the number of model parameters.

FLRNN
The Fractional-order Lipschitz Recurrent Neural Network (FLRNN) achieves more accurate results and alleviates gradient vanishing and exploding problems.
The recurrent neural network is shown as follows.
where h represents the hidden state that contains past information, and $\dot{h}$ is the derivative of the hidden state h with respect to time. A, W, and U are weight matrices, b is the offset (bias) of the system, x is the input, and y is the output. The Lipschitz Recurrent Neural Network reconstructs the hidden-layer weight matrices [22] as follows.
where I is the identity matrix, and $M_A^{T}$ and $M_W^{T}$ are the transposes of $M_A$ and $M_W$. β and γ control the spectrum of the weight matrices: β controls the width of the spectrum, and γ shifts the position of the entire spectrum. The Lipschitz Recurrent Neural Network obtains the hidden state h through numerical integration.
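The equations for the recurrent dynamics and the reconstructed weight matrices did not survive extraction. For reference, the Lipschitz RNN formulation of Erichson et al. [31], which the symbols above appear to follow, can be written as below. Treat it as a reconstruction under that assumption rather than the paper's exact Equations (1) to (3); the output matrix D is not named in the text and is an assumption, and [31] allows separate (β, γ) pairs for A and W.

```latex
\begin{aligned}
\dot{h}(t) &= A_{\beta,\gamma}\, h(t) + \tanh\!\big(W_{\beta,\gamma}\, h(t) + U x(t) + b\big), \qquad y(t) = D\, h(t),\\
A_{\beta,\gamma} &= (1-\beta)\,\big(M_A + M_A^{T}\big) + \beta\,\big(M_A - M_A^{T}\big) - \gamma I,\\
W_{\beta,\gamma} &= (1-\beta)\,\big(M_W + M_W^{T}\big) + \beta\,\big(M_W - M_W^{T}\big) - \gamma I.
\end{aligned}
```

The symmetric part of $A_{\beta,\gamma}$ is $(1-\beta)(M_A + M_A^{T}) - \gamma I$, so β scales the symmetric contribution and γ shifts the whole spectrum to the left, which is what keeps the continuous-time dynamics stable.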
To describe the system dynamics more accurately, fractional-order Grünwald–Letnikov (GL) calculus [42] can be used to compute this hidden state. Fractional calculus is a generalization of classical calculus in which the order of differentiation and integration can be a fraction rather than an integer. This generalization allows the modeling of systems with memory and hereditary properties, which are common in real-world phenomena. In the FLRNN, fractional calculus is used to integrate the hidden states of the RNN.
where f is the function, p is the step size, α is the fractional order, $\binom{\alpha}{j}$ is a binomial coefficient, and j is a natural number.
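The definition itself is missing from the extracted text. The standard Grünwald–Letnikov form of an α-order derivative, written with the step p used above and an assumed lower terminal a, is:

```latex
D^{\alpha} f(t) = \lim_{p \to 0} \frac{1}{p^{\alpha}} \sum_{j=0}^{\lfloor (t-a)/p \rfloor} (-1)^{j} \binom{\alpha}{j} f(t - jp),
\qquad
\binom{\alpha}{j} = \frac{\Gamma(\alpha + 1)}{\Gamma(j + 1)\,\Gamma(\alpha - j + 1)}.
```

Replacing α with a negative order gives the corresponding fractional integral, which is the operation used for the hidden state below; for a finite step the limit is dropped and the sum is truncated at the available history.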
Here, fractional-order integration is employed to calculate the hidden state of the system, with $\dot{h}$ taken as the integrand. The coefficients $(-1)^{j}\binom{\alpha}{j}$ can be pre-computed before training. The hidden state h is then computed through fractional-order integration, as shown in Equation (5).
where $D_t^{-p}$ is the p-order integration operator, and p can be a non-integer. $\langle \cdot \rangle$ represents the Grünwald number, whose computation is shown in Equation (6).
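Because the weights $(-1)^{j}\binom{\alpha}{j}$ depend only on the order and the number of time points, they can be pre-computed once, as noted above. The minimal NumPy sketch below assumes a uniform step and a zero initial state. It computes the weights with the standard recurrence $w_j = w_{j-1}\,(1 - (\alpha + 1)/j)$ and applies a p-order fractional integral (order $\alpha = -p$) to a sequence of derivatives. The function names `gl_weights` and `gl_integrate` are hypothetical.

```python
import numpy as np


def gl_weights(order: float, n: int) -> np.ndarray:
    """Grünwald-Letnikov weights w_j = (-1)^j * binom(order, j) for j = 0..n-1.

    order > 0 gives a fractional derivative, order < 0 a fractional integral.
    """
    w = np.empty(n)
    w[0] = 1.0
    for j in range(1, n):
        # standard recurrence, avoids evaluating Gamma functions
        w[j] = w[j - 1] * (1.0 - (order + 1.0) / j)
    return w


def gl_integrate(hdot: np.ndarray, p: float, step: float = 1.0) -> np.ndarray:
    """p-order fractional integral of hdot evaluated at the last time point.

    hdot: shape (T, d), derivatives at t_1..t_T (oldest first).
    Returns shape (d,), assuming the state is zero at t_0.
    """
    T = hdot.shape[0]
    w = gl_weights(-p, T)  # integration = derivative of order -p
    # the most recent derivative gets w_0, the oldest gets w_{T-1}
    return step ** p * (w[:, None] * hdot[::-1]).sum(axis=0)
```

With p = 1 every weight equals 1 and the sum reduces to a rectangle-rule integral. For 0 < p < 1 the weights decay slowly with j, and for p > 1 (for example the order 1.8 used in the experiments) they grow with j, so the integrated state depends on the whole derivative history rather than only on the most recent steps.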
Fractional calculus is a generalization of integer-order calculus, where the order is not limited to integers but can be any real number or even a complex number.Fractional calculus possesses non-local properties, which are inherent in many complex systems.The dynamic behavior of complex systems can be better described by fractional calculus.
By introducing fractional calculus and Lipschitz recurrent units, this paper constructs a Fractional-order Lipschitz Recurrent Neural Network that can accurately capture the temporal relationships in long sequential data and alleviate the gradient issue of recurrent neural networks.
Figure 2 shows the structure of the Fractional-order Lipschitz Recurrent Neural Network in this paper, where $x_i$ represents the input data at time point i, $h_0$ represents the initial hidden state of the Lipschitz recurrent unit, and $\dot{h}_i$ represents the time derivative of the hidden state at time point i, which is calculated by the Lipschitz recurrent unit. After the model calculates the time derivatives at all time points, the final hidden state h is computed through fractional-order GL integration. Fractional-order integration describes the dynamic behavior of complex systems more accurately, which makes the prediction results more precise.
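The flow in Figure 2 can be sketched as a small NumPy loop. This is an illustration under several assumptions that the text does not fix: the Lipschitz unit has the tanh form of [31], each derivative is evaluated at the current fractional-integral estimate of the state, the step is uniform, and $h_0$ is added to the integral. The names `lipschitz_matrix` and `flrnn_forward` are hypothetical, and the defaults β = 0.7 and γ = 0.01 assume that the two values reported in the experimental settings are β and γ.

```python
import numpy as np


def lipschitz_matrix(M: np.ndarray, beta: float, gamma: float) -> np.ndarray:
    """(1 - beta)(M + M^T) + beta (M - M^T) - gamma I, as in the Lipschitz RNN of [31]."""
    n = M.shape[0]
    return (1 - beta) * (M + M.T) + beta * (M - M.T) - gamma * np.eye(n)


def gl_weights(order: float, n: int) -> np.ndarray:
    """Grünwald-Letnikov weights (-1)^j * binom(order, j) for j = 0..n-1."""
    w = np.ones(n)
    for j in range(1, n):
        w[j] = w[j - 1] * (1.0 - (order + 1.0) / j)
    return w


def flrnn_forward(x, params, p=1.8, step=0.1, beta=0.7, gamma=0.01):
    """Hypothetical FLRNN pass over x of shape (T, input_dim); returns h of shape (hidden,)."""
    M_A, M_W, U, b, h0 = params
    A = lipschitz_matrix(M_A, beta, gamma)
    W = lipschitz_matrix(M_W, beta, gamma)
    T = x.shape[0]
    w = gl_weights(-p, T)  # pre-computed once per sequence length
    hdots = []
    h = h0
    for i in range(T):
        # time derivative produced by the Lipschitz recurrent unit at point i
        hdot = A @ h + np.tanh(W @ h + U @ x[i] + b)
        hdots.append(hdot)
        past = np.array(hdots[::-1])  # most recent derivative first
        # fractional-order GL integral of all derivatives seen so far
        h = h0 + step ** p * (w[: i + 1, None] * past).sum(axis=0)
    return h
```

With p = 1 the loop reduces exactly to forward Euler integration of the Lipschitz RNN; other orders reweight the entire derivative history. The cost grows quadratically in the number of recurrent steps, which is one reason that reducing the number of iterations (as the piecewise units do) matters.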
When recurrent neural networks process sequential data, they typically take the data at a single time point as the input for each iteration. However, adjacent time points in temporal data often contain similar information. To reduce the training time of the network, this paper feeds segments of adjacent data into the recurrent unit. This approach accelerates training while maintaining the prediction accuracy of the model. The structure of the piecewise recurrent units is illustrated in Figure 3. Inserting adjacent time points into the recurrent unit as one segment lets the model exploit the temporal characteristics of the data more effectively, which improves training efficiency. The piecewise recurrent units also reduce the computational burden of the network and can accelerate the convergence of the model.
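The segmentation step can be sketched as a reshape: a window of L time points is split into non-overlapping segments of w adjacent points, and each flattened segment becomes one input to the recurrent unit, so the unit runs L / w times instead of L times. The segment length, the non-overlapping layout, and the name `segment_sequence` are assumptions for illustration; the text does not state them.

```python
import numpy as np


def segment_sequence(x: np.ndarray, seg_len: int) -> np.ndarray:
    """Split x of shape (L, C) into (L // seg_len, seg_len * C) segment inputs.

    Each row holds seg_len consecutive time points (all channels, row-major)
    and is fed to the recurrent unit in one iteration.
    """
    L, C = x.shape
    if L % seg_len != 0:
        raise ValueError("sequence length must be divisible by seg_len")
    return x.reshape(L // seg_len, seg_len * C)
```

For example, with the input length of 336 used in the experiments and a hypothetical segment length of 48, the recurrent unit iterates 7 times rather than 336 times.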

Frequency Module
Here, a frequency-domain transformation is used to process temporal information. Time-domain information describes how the data change over time, whereas frequency-domain information represents the frequency components of the data. Processing temporal data in the frequency domain makes it possible to identify the individual frequency components and process them separately, which is effective for complex data. The frequency domain processing module is illustrated in Figure 4.
The Fourier transform is used to convert time-domain data into frequency-domain data by representing a function as a combination of sine and cosine waves of different frequencies. Through the Fourier transform, the frequency content of the original data can be analyzed. The Fourier transform is shown in Equation (7).
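Equation (7) did not survive extraction. For a sampled sequence $x_0, \dots, x_{N-1}$, the standard discrete Fourier transform, which is the usual way this step is implemented (typically via an FFT), is:

```latex
X_k = \sum_{n=0}^{N-1} x_n \, e^{-i 2\pi k n / N}, \qquad k = 0, 1, \dots, N-1.
```

Small k corresponds to low frequencies. For real-valued inputs, $X_{N-k}$ is the complex conjugate of $X_k$, so only $k = 0, \dots, \lfloor N/2 \rfloor$ need to be kept.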

As high-frequency components often contain noise that may not benefit prediction, the low-frequency components are selected for subsequent computation. Frequency-domain sampling is also employed to avoid getting stuck in local optima: different frequency components are selected at random, which introduces randomness into training and can help the network converge to a better solution.
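Frequency selection and random sampling can be sketched in NumPy under the following assumptions: a real FFT along the time axis, a kept low-frequency fraction of 0.5 (the low-frequency ratio given in the experimental settings), and uniform random sampling without replacement among the kept modes. The helper names are hypothetical. How the selected modes feed the next layer is not specified in the text, so the sketch simply reconstructs the filtered signal.

```python
import numpy as np


def select_frequencies(x, low_ratio=0.5, n_sample=None, rng=None):
    """Return selected rFFT coefficients of x (shape (N, C)) and their mode indices."""
    X = np.fft.rfft(x, axis=0)                # (N // 2 + 1, C) complex spectrum
    n_low = max(1, int(X.shape[0] * low_ratio))
    idx = np.arange(n_low)                    # keep low-frequency modes only
    if n_sample is not None and n_sample < n_low:
        rng = np.random.default_rng() if rng is None else rng
        # random frequency-domain sampling among the low-frequency modes
        idx = np.sort(rng.choice(n_low, size=n_sample, replace=False))
    return X[idx], idx


def reconstruct(coeffs, idx, n):
    """Inverse rFFT with every unselected mode set to zero."""
    X = np.zeros((n // 2 + 1, coeffs.shape[1]), dtype=complex)
    X[idx] = coeffs
    return np.fft.irfft(X, n=n, axis=0)
```

For a 64-point signal made of a mode-2 sine plus a mode-30 sine, keeping the lowest half of the 33 rFFT bins (modes 0 to 15) removes the mode-30 component exactly and returns the mode-2 sine.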

Gated Attention Mechanism
This paper combines gating with the attention mechanism. For feature interaction and enhancement, the attention mechanism captures the correlations between features: by computing similarity scores and weighting accordingly, it assigns attention to different parts of the input. This process adapts well to the input data and can capture the relevant relationships accurately. To capture the correlations between features, the attention module focuses on the changing trends of the sequence. The gated structure regulates the information flow of attention, which helps the model suppress the influence of noise and analyze the most strongly correlated information more accurately.
The calculation process of the gated attention mechanism is shown in Equation (8).
where O represents the output.U and V are matrices calculated from the input, and W is the parameter matrix.A is the attention weight matrix, and it is calculated by Equation (9).
where ReLU is the activation function and b is the bias. Z is computed from the input in the same way as U and V, as shown in Equation (10).
where X is the input and ϕ is the activation function; this paper uses the SiLU activation function. Q and K are obtained from Z through a structure similar to a LayerNorm layer with learnable scale and offset parameters, which scales and shifts each dimension of Z. The structure of the gated attention mechanism is shown in Figure 5. In the gated attention mechanism, the gated structure regulates the flow of attention information, making the attention computation more efficient. It reduces the influence of noise and encourages the model to focus on learning relevant information. In addition, the gated attention unit uses attention information more efficiently through the gated structure and requires fewer resources.
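The description of Equations (8) to (10) matches the structure of a gated attention unit. The NumPy sketch below is one possible instantiation under stated assumptions: U, V, and Z are SiLU projections of the input, Q and K are per-dimension scale-and-offset transforms of Z, the attention weights are ReLU(QKᵀ/√s + b) (the 1/√s scaling is an assumption), and the output is (U ⊙ AV)W. Parameter names and shapes are hypothetical. The rows of X can be read as variables or as time steps, since the text does not fix which axis attention runs over.

```python
import numpy as np


def silu(z):
    """SiLU activation z * sigmoid(z)."""
    return z / (1.0 + np.exp(-z))


def gated_attention(X, P):
    """Gated attention over the rows of X (shape (n, d)); returns shape (n, d).

    P holds hypothetical parameters: W_u, W_v (d, e); W_z (d, s);
    q_scale, q_shift, k_scale, k_shift (s,); b (scalar bias); W_o (e, d).
    """
    U = silu(X @ P["W_u"])                    # gate branch, (n, e)
    V = silu(X @ P["W_v"])                    # value branch, (n, e)
    Z = silu(X @ P["W_z"])                    # shared low-dimensional features, (n, s)
    Q = Z * P["q_scale"] + P["q_shift"]       # per-dimension scale and shift
    K = Z * P["k_scale"] + P["k_shift"]
    s = Z.shape[1]
    A = np.maximum(Q @ K.T / np.sqrt(s) + P["b"], 0.0)  # ReLU attention weights, (n, n)
    return (U * (A @ V)) @ P["W_o"]           # gate the attended values, then project
```

Because Q and K share the single projection Z and differ only by per-dimension scale and shift, they add 4s parameters instead of two separate d × s projections. Setting the gate projection `W_u` to zero shuts off the output entirely, which shows how the gate controls the information flow from attention.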

Overall Procedure of FLRNN-FGA
In this paper, we propose a time series forecasting method based on FLRNN-FGA.First, multivariate time series data are input into the FLRNN module, initializing the RNN model, and integrating all hidden layers through fractional calculus to capture long-term dependencies.Then, the time domain data are transformed into the frequency domain via Fourier transform, selecting low-frequency components to reduce noise interference.Next, a gated attention mechanism is used to capture correlations between features, regulating the transmission of the attention information through the gated structure and finally calculating the model's output.
As shown in Algorithm 1, the specific steps are as follows. First, the data are input, the RNN model is instantiated, and all hidden layers are integrated using fractional calculus to produce the output of the FLRNN module. Second, the time-domain data are transformed into the frequency domain via the Fourier transform, and low-frequency components are selected to reduce noise. Finally, the correlations between the features are calculated using the attention mechanism, and the final output is computed through the gated attention layer to obtain the prediction results. This method addresses the gradient issues of traditional RNN and LSTM models in long-sequence tasks and improves prediction accuracy and computational efficiency.
Algorithm 1
Input: multivariate time series data
Section 3.1: FLRNN
Step 1 Instantiate the RNN model.
Step 2 Integrate all hidden layers using fractional calculus, Equations (2)~(6).
Step 3 Output the result of the FLRNN module.
Section 3.2: Frequency Module
Step 4 Convert the time domain into the frequency domain with the Fourier transform, Equation (7).
Step 5 Select the low-frequency components to reduce noise.
Section 3.3: Gated Attention Mechanism
Step 6 Capture the correlations between features using the attention mechanism.
Step 7 Calculate the final output through the gated attention layer using Equation (8).
Output: the prediction results

Experiments
To evaluate the performance of FLRNN-FGA, we conduct experiments on five real-world time series benchmarks and compare it with corresponding state-of-the-art methods.

Datasets
The datasets cover several major application areas, including energy, economics, transportation, and weather, which provides a comprehensive test of the model's performance. Using real-world datasets from multiple domains makes the evaluation results more credible.
Brief descriptions of the datasets: A sliding window method is adopted to extract data.To avoid the impact of missing data, the time series segments with missing data are excluded from the datasets.
Each dataset used in the experiments is divided into three parts: training set, test set, and validation set.For the ETT dataset, the partitioning ratio of the training set, test set, and validation set is 6:2:2, while for other datasets, the partitioning ratio is 7:1:2.
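The data preparation above can be sketched as follows, assuming each series is stored as a (T, C) array with NaN marking missing values, a chronological split, and stride-1 windows. The function names, the stride, and splitting before windowing (so that no window crosses a split boundary) are illustrative assumptions rather than details given in the text. The input length 336 matches the experimental settings; the default horizon of 96 is a placeholder.

```python
import numpy as np


def chronological_split(series, train=0.6, val=0.2):
    """Split a (T, C) array in time order; whatever remains after train and val is the test part."""
    T = len(series)
    n_train, n_val = int(T * train), int(T * val)
    return series[:n_train], series[n_train:n_train + n_val], series[n_train + n_val:]


def sliding_windows(series, input_len=336, horizon=96):
    """Stride-1 (input, target) windows, skipping any window that contains NaN."""
    xs, ys = [], []
    total = input_len + horizon
    for start in range(len(series) - total + 1):
        window = series[start:start + total]
        if np.isnan(window).any():  # exclude segments with missing data
            continue
        xs.append(window[:input_len])
        ys.append(window[input_len:])
    return np.array(xs), np.array(ys)
```

For the ETT datasets, the 6:2:2 split corresponds to `chronological_split(series, train=0.6, val=0.2)`.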

Baselines and Evaluation Metrics
To evaluate the effectiveness of the proposed model, this paper selects several current deep learning models for time series forecasting as baselines:
➢ TCN: Temporal Convolutional Network, a deep learning method that uses one-dimensional dilated and causal convolutions to process sequential data.
➢ LSTM: Long Short-Term Memory Network, a deep learning method that uses gated structures to retain long-term information and learn long-term dependencies.
➢ LSTNet: A method that combines convolutional neural networks with recurrent layers.
➢ Informer [37]: A Transformer variant that uses sparse self-attention.
➢ Autoformer [43]: A Transformer variant that replaces self-attention with an auto-correlation mechanism and uses series decomposition.
➢ FEDformer [38]: A Transformer variant that analyzes time series data in the frequency domain.
The selected baseline models are the ones that have shown good performances in the field of time series data prediction in recent years.Experimental comparisons with these baseline models can demonstrate the feasibility and effectiveness of the proposed model in this paper.
In this paper, mean squared error (MSE) and mean absolute error (MAE) are used as evaluation metrics. These two metrics are the most widely used indicators of predictive accuracy in time series forecasting.
The calculation of MSE is shown in Formula (11).
The calculation process of MAE is shown in Formula (12).
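Formulas (11) and (12) are not legible in the extracted text. The standard definitions over n predicted values $\hat{y}_i$ and ground truths $y_i$ are $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$ and $\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}|y_i - \hat{y}_i|$; in multivariate forecasting the average is usually taken over all time steps and variables. A direct NumPy version:

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error averaged over all elements."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))


def mae(y_true, y_pred):
    """Mean absolute error averaged over all elements."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))
```

Because MSE squares the errors, a single large miss weighs more heavily than under MAE, which is why the two metrics are usually reported together.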

Experimental Settings
The choice of parameters affects the experimental results of deep learning models. The batch size is the number of samples fed into the network in one training step. A larger batch size makes full use of computational resources and parallelism, but an overly large batch can reduce the model's ability to escape from local optima. Here, the batch size is set to 32 to balance computational efficiency and the ability to escape from local optima.
Fractional calculus often increases model complexity and makes the network harder to train. Here, the coefficients are pre-computed, so the additional computational burden is relatively small. Determining an appropriate fractional order is also difficult, because the optimal order may vary with the specific dynamics and temporal dependence of the data. The most relevant fractional-order interval is (0, 2), so experiments are conducted with the orders [0.2, 0.5, 0.8, 1.0, 1.2, 1.5, 1.8] to find a suitable order.
The experiments use the electricity dataset and weather dataset for evaluation, and the fractional order is set to 1.8.
In this paper, the Lipschitz recurrent unit is used to reconstruct the hidden-layer weight matrix of the recurrent neural network. The hidden state dimension is set to 128. Two parameters control the dynamics of the system; they are set to 0.7 and 0.01, which lets the network learn the dynamics appropriately while keeping the system stable. Fractional-order integration is introduced to compute the hidden state of the recurrent neural network, with the integration order set to 1.8, a value determined through experimental comparison. This aims to describe the dynamic behavior of the system more accurately and thereby improve the performance of the model.
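The text does not name the two parameters set to 0.7 and 0.01. The sketch below assumes they are β and γ of the Lipschitz RNN parameterization of Erichson et al., in which a free matrix M is mapped to A = (1 - β)(M + Mᵀ) + β(M - Mᵀ) - γI and the hidden state follows dh/dt = Ah + tanh(Wh + Ux + b). The step size `eps`, the input size, and all function names are hypothetical, and the integer-order Euler update shown here is the step that FLRNN replaces with fractional-order integration.

```python
import numpy as np


def lipschitz_matrix(m: np.ndarray, beta: float, gamma: float) -> np.ndarray:
    """Map a free square matrix M to (1 - beta)(M + M^T) + beta(M - M^T) - gamma*I.

    beta weights the skew-symmetric part against the symmetric part;
    subtracting gamma*I shifts every eigenvalue of the symmetric part by -gamma.
    """
    sym = m + m.T
    skew = m - m.T
    return (1.0 - beta) * sym + beta * skew - gamma * np.eye(m.shape[0])


def lipschitz_rnn_step(h, x, A, W, U, b, eps=0.01):
    """One explicit (integer-order) Euler step of dh/dt = A h + tanh(W h + U x + b)."""
    return h + eps * (A @ h + np.tanh(W @ h + U @ x + b))


rng = np.random.default_rng(0)
hidden, n_inputs = 128, 7  # hidden size from the text; n_inputs is a toy value
scale = 1.0 / np.sqrt(hidden)
M_A = rng.standard_normal((hidden, hidden)) * scale
M_W = rng.standard_normal((hidden, hidden)) * scale
A = lipschitz_matrix(M_A, beta=0.7, gamma=0.01)  # assumed mapping: beta=0.7, gamma=0.01
W = lipschitz_matrix(M_W, beta=0.7, gamma=0.01)
U = rng.standard_normal((hidden, n_inputs)) * 0.1
b = np.zeros(hidden)

h = np.zeros(hidden)
for x_t in rng.standard_normal((5, n_inputs)):  # five toy time steps
    h = lipschitz_rnn_step(h, x_t, A, W, U, b)
```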
The optimizer is Adam with an initial learning rate of 0.01. As training proceeds, the learning rate is reduced: larger steps in the early epochs move the model quickly toward a good solution, while smaller steps in later epochs avoid excessive updates, improving the convergence and stability of the model.
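The decay schedule itself is not specified. As a hypothetical example, a step decay that multiplies the rate by a fixed factor every few epochs, starting from 0.01, can be written as:

```python
def step_decay_lr(initial_lr: float, epoch: int, decay: float = 0.5, every: int = 1) -> float:
    """Learning rate for a 1-indexed epoch under a hypothetical step-decay schedule.

    The rate is multiplied by `decay` once every `every` epochs.
    """
    if epoch < 1:
        raise ValueError("epoch is 1-indexed")
    return initial_lr * decay ** ((epoch - 1) // every)


for epoch in range(1, 5):
    print(epoch, step_decay_lr(0.01, epoch))
```

In PyTorch, the same idea corresponds to `torch.optim.Adam(params, lr=0.01)` combined with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.5)`; the halving factor is an assumption, not a value reported here.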
To prevent the model from overfitting the training set, an early stopping mechanism is used in the experiments: training is terminated if performance on the test set does not improve for five consecutive epochs, so that training stops soon after model performance peaks. To reduce training time, every experiment is limited to at most 20 epochs. The ratio of the retained low-frequency part is 0.5, the input length is 336, and the hidden size of the gated attention module is 168.
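The early stopping rule can be sketched as a small patience counter, assuming "no improvement" means the monitored loss fails to drop below its best value so far. The class name and the loss values below are illustrative toy numbers, not experimental results.

```python
class EarlyStopping:
    """Signal a stop when the monitored loss has not improved for `patience` consecutive epochs."""

    def __init__(self, patience: int = 5, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, loss: float) -> bool:
        """Record one epoch's loss; return True when training should stop."""
        if loss < self.best - self.min_delta:
            self.best = loss
            self.bad_epochs = 0  # improvement resets the counter
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


MAX_EPOCHS = 20  # hard cap on training length
stopper = EarlyStopping(patience=5)
toy_losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.6, 0.65, 0.63, 0.7, 0.5]
for epoch, loss in enumerate(toy_losses[:MAX_EPOCHS], start=1):
    if stopper.step(loss):
        print(f"stopping after epoch {epoch}; best loss {stopper.best}")
        break
```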

Main Results
The results of multivariate time series forecasting are presented in Table 1. FLRNN-FGA generally achieves lower MSE and MAE than the baseline models while also having fewer parameters. This suggests that the frequency module and the gated attention mechanism reduce the number of parameters while improving prediction accuracy.

Visualization
To compare the models clearly, we visualize their predictions on the ETTm2 dataset with a 96-step output horizon, against three recent baseline models: FEDformer, Autoformer, and Informer. The prediction results are shown in Figure 6. Among these models, the predictions of FLRNN-FGA are closest to the ground truth.

Computational Efficiency Analysis
In this section, we compare the computational efficiency of the models on the ETTm2 dataset. As shown in Table 4, the comparison includes the baselines with the best MSE: FEDformer, Informer, and Autoformer. FLRNN-FGA has the fewest parameters (1.60 M) and the lowest memory usage (0.49 G), and it also trains faster than these baselines. Overall, this analysis indicates that FLRNN-FGA achieves a good balance between computational cost and prediction accuracy.

Table 4. Computational efficiency analysis on the ETTm2 dataset (predict-96). "M", "G", and "S" denote the units of "#Para", "Memory", and "Time": millions of parameters, gigabytes, and seconds, respectively. Smaller values indicate better performance.

Conclusions
Recurrent neural network (RNN) models capture temporal dependencies in time series data, and many studies have demonstrated their effectiveness. However, gradient explosion and gradient vanishing are important issues when RNNs are applied to time series tasks, and their prediction accuracy remains limited.
To overcome these problems, this paper proposes FLRNN-FGA, which consists of the FLRNN, a frequency module, and a gated attention mechanism. In the FLRNN module, piecewise recurrent units are introduced to speed up the processing of long sequences without compromising performance. Fractional-order Lipschitz recurrent units alleviate gradient issues, capture system dynamics and dependencies accurately, and improve prediction accuracy. In the frequency module, the temporal data are transformed into the frequency domain by the Fourier transform, which reduces the computational complexity of the model. The gated attention mechanism combines gating with attention: the gated structure regulates the transmission of attention information, allowing the model to handle inter-variable relationships while reducing the number of parameters. Extensive experiments show the effectiveness of the proposed method, and the ablation experiments further confirm the contribution of each component of the model. We hope this work can facilitate future research on fractional-order methods for time series modeling.
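The frequency module and gated attention mechanism summarized above can be illustrated with a minimal NumPy sketch. It assumes that the frequency module keeps the lowest half of the real FFT bins (matching the low-frequency ratio of 0.5) and discards the rest, and that the gated attention applies scaled dot-product attention across variables and multiplies the result element-wise by a sigmoid gate computed from the same input. Both are simplified stand-ins rather than the paper's exact architecture; all function names and random weights are hypothetical, and the sizes 336 and 168 are the input length and gated-attention hidden size from the experimental settings.

```python
import numpy as np


def low_frequency_filter(x: np.ndarray, ratio: float = 0.5) -> np.ndarray:
    """Keep only the lowest `ratio` fraction of rFFT bins along the time axis (axis 0)."""
    spec = np.fft.rfft(x, axis=0)
    keep = max(1, int(spec.shape[0] * ratio))
    spec[keep:] = 0  # discard the high-frequency bins
    return np.fft.irfft(spec, n=x.shape[0], axis=0)


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def gated_attention(h, wq, wk, wv, wg):
    """Self-attention across variables, scaled element-wise by a sigmoid gate.

    h: (n_vars, d_in), one feature vector per variable. Returns (n_vars, d_hidden).
    """
    q, k, v = h @ wq, h @ wk, h @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])  # (n_vars, n_vars) variable-to-variable scores
    attended = softmax(scores, axis=-1) @ v
    gate = 1.0 / (1.0 + np.exp(-(h @ wg)))  # gate values lie in (0, 1)
    return gate * attended


rng = np.random.default_rng(0)
seq_len, n_vars, d_hidden = 336, 7, 168
series = rng.standard_normal((seq_len, n_vars))
smooth = low_frequency_filter(series, ratio=0.5)
feats = smooth.T  # each variable's filtered history serves as its feature vector
wq, wk, wv, wg = (rng.standard_normal((seq_len, d_hidden)) / np.sqrt(seq_len) for _ in range(4))
out = gated_attention(feats, wq, wk, wv, wg)
print(smooth.shape, out.shape)
```

Dropping half of the spectrum means the downstream layers handle fewer frequency components, which is one simple way a frequency-domain step can cut computation.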


Figure 1. The framework overview of FLRNN-FGA: the FLRNN improves prediction accuracy and mitigates vanishing and exploding gradients; the frequency module reduces computational complexity via the Fourier transform; the GA (gated attention) combines gating with an attention mechanism to reduce the number of model parameters and improve efficiency.

Figure 5. The structure of the gated attention mechanism.


Algorithm 1. FLRNN-FGA
Input: Multivariate time series data.
Section 3.1: FLRNN module.
Step 1: Data input; instantiation of the RNN model, Equation (1).

(a) Electricity dataset: hourly electricity consumption records of 321 users from 2012 to 2014;
(b) ETT dataset: electricity transformer data from July 2016 to July 2018;
(c) Weather dataset: 21 meteorological indicators recorded every 10 min throughout 2020;
(d) Exchange dataset: daily exchange rate records of eight countries from 1990 to 2016;
(e) Traffic dataset: hourly road occupancy rates measured by different sensors on highways in the San Francisco Bay Area.

Fractal Fract. 2024, 8

Author Contributions: Conceptualization, C.Z. and Z.Z.; methodology, C.Z. and Z.Z.; software, C.Z., J.Y. and Z.Z.; investigation, C.Z.; resources, C.Z. and Z.Z.; data curation, C.Z. and J.Y.; writing-original draft preparation, C.Z.; writing-review and editing, C.Z. and Y.H.; visualization, C.Z. and Y.H.; supervision, C.Z. and Y.H.; project administration, C.Z.; funding acquisition, C.Z. All authors have read and agreed to the published version of the manuscript.

Funding: This research was funded by the National Natural Science Foundation of China, grant numbers 61862062 and 61104035.

Table 1. Forecasting results in terms of MSE and MAE. The best results are highlighted in bold, and the second-best results are underlined. "1st Count" is the number of times each model achieves the best result. Smaller values indicate better performance.

Table 3. Ablation study on the frequency module and the gated structure.