A NARX-NN OPTIMIZATION ALGORITHM FOR FORECASTING INFLATION DURING A POTENTIAL RECESSION PERIOD USING LONGITUDINAL DATA



INTRODUCTION
Recently, the global economy has been threatened by the possibility of a recession. Speculation about this potential recession persisted throughout 2022, and a recession in 2023 is now widely believed to be inevitable [1]. A recession is a period in which a country's economic activity slows down or contracts. It is commonly identified when the country's Gross Domestic Product (GDP) declines for two consecutive quarters, and it can last for years. GDP measures a country's economic activity over a period of time, so a country whose economic activity declines continuously over two consecutive periods is considered to be in recession [2].
Several factors can trigger a recession, one of which is inflation. A comprehensive World Bank study suggests that, as central banks raise interest rates in response to inflation, the world may be heading toward a global recession in 2023, as well as financial crises in emerging markets and developing economies that could cause lasting harm [3]. Inflation is a general and persistent increase in the prices of goods and services over time. It weakens people's purchasing power and leads to a decrease in the production of goods and services [4]. This study focuses on forecasting inflation and uses several other economic indicators as additional variables to reduce the forecast error.
The data used in this research are longitudinal, obtained from repeated measurements of multiple individuals (cross-sectional units) over time (time units) [5]. The Generalized Linear Mixed Model (GLMM) is commonly used for longitudinal data analysis because it accounts for the correlation between observations in the same unit [6,7]. In addition, many studies have examined machine learning methods as alternatives to GLMM [8,9]. This study instead uses the NARX NN method, which borrows the GLMM idea of unit-level effects by feeding country and region into a nonlinear autoregressive Neural Network (NN) with exogenous inputs, in order to handle the complex temporal and cross-sectional structure of panel data. Error measures such as RMSE and MAE are used to assess the accuracy of the NARX NN predictions. The results of this study are expected to help clarify whether the threat of a recession in the next few years is a credible concern.

Research Data
The dataset utilized is a longitudinal dataset obtained from the World Bank Dataset website [10].
It comprises a sample of 48 countries from 7 regions; the allocation of countries across regions is shown in the following table. The data cover a 24-year period, from 1998 to 2021, and consist of 1 main variable and 5 exogenous variables. Table 2 describes the variables used in this study.

NARX-NN
Artificial Neural Network (ANN), commonly referred to simply as a Neural Network (NN), is an information processing system with properties similar to those of biological neural networks [12].
Neural networks are effective at modeling nonlinear relationships. They learn iteratively from patterns in the data, so they can make predictions without requiring a prespecified functional form for the data. One development of the Neural Network method is the Nonlinear Autoregressive with Exogenous inputs Neural Network (NARX NN), a nonlinear regression technique that uses a Neural Network for time series prediction and includes additional exogenous variables for more accurate forecasting [13].
The NARX network uses past values of the target time series and past values of other inputs to predict future values of the target series. These networks can be configured in a series-parallel or a parallel architecture; this study uses the series-parallel architecture, depicted in Figure 1. In the figure, the input values u(t) represent past exogenous values, the output values y(t) represent past values of the actual series to be predicted, and ŷ(t) denotes the predicted values. In the series-parallel model, the future value of the time series ŷ(t) is derived from the current and/or past values of u(t) and the actual past values of the time series y(t) [14]. If the past values of the actual series are not recorded, they are not available to the system, and the network must use its own past predicted values instead (the parallel architecture). In this study, we use actual past values, which are more reliable than predictions [15].
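The difference between the two architectures is easiest to see with a one-lag model. The sketch below assumes a generic fitted one-step function `f(y_lag, u_t)` (a hypothetical stand-in for the trained network, not the authors' model): in series-parallel (open-loop) mode every prediction uses the actual previous value, while in parallel (closed-loop) mode the previous prediction is fed back in.

```python
from typing import Callable, List, Sequence


def series_parallel(f: Callable[[float, float], float],
                    y: Sequence[float], u: Sequence[float]) -> List[float]:
    """Open loop: yhat(t) = f(y(t-1), u(t)), always using the ACTUAL past value."""
    return [f(y[t - 1], u[t]) for t in range(1, len(y))]


def parallel(f: Callable[[float, float], float],
             y0: float, u: Sequence[float]) -> List[float]:
    """Closed loop: yhat(t) = f(yhat(t-1), u(t)); only the starting value y0 is observed."""
    preds, prev = [], y0
    for t in range(1, len(u)):
        prev = f(prev, u[t])          # the model's own prediction becomes the next lag
        preds.append(prev)
    return preds


def f(y_lag: float, u_t: float) -> float:
    """Hypothetical one-step model standing in for a trained NARX network."""
    return 0.5 * y_lag + u_t


y = [1.0, 2.0, 2.0, 3.0]   # actual series
u = [0.0, 1.0, 1.0, 1.0]   # exogenous input
print(series_parallel(f, y, u))  # [1.5, 2.0, 2.0]
print(parallel(f, y[0], u))      # [1.5, 1.75, 1.875]
```

The open-loop predictions stay anchored to the observed series, whereas the closed-loop predictions drift with the model's own outputs, which is why actual past values are described as more reliable.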
The NARX model relates the current value of the dependent variable (y) to the current and/or past values of the independent variable (x) and the past values of the dependent variable. The NARX model can be expressed as:

$y(t) = f\big(y(t-1), y(t-2), \dots, y(t-n_y), x(t), x(t-1), \dots, x(t-n_x)\big) + \varepsilon(t)$

Here, y is the response variable (the main observed variable), x is the predictor (exogenous) variable that explains the response variable, and $n_y$ and $n_x$ are the numbers of lags of y and x. The past values of the response variable, together with information from the predictor variable, are used to predict the current value of the response variable. The model uncertainty or measurement noise is represented by the random variable $\varepsilon(t)$. The function f can be nonlinear, such as a neural network or a wavelet function, among others [13].
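To train a network as the function f, the equation is turned into a supervised learning problem by stacking lagged values into input rows. The sketch below assumes a single response series `y` and a single exogenous series `x` (the function name `narx_design` and its parameters are chosen here for illustration) and builds rows $[y(t-1), \dots, y(t-n_y), x(t), \dots, x(t-n_x)]$ with targets $y(t)$.

```python
import numpy as np


def narx_design(y: np.ndarray, x: np.ndarray, n_y: int, n_x: int):
    """Return (inputs, targets) for y(t) = f(y(t-1..t-n_y), x(t..t-n_x)) + eps(t)."""
    start = max(n_y, n_x)                     # first t for which every lag exists
    rows, targets = [], []
    for t in range(start, len(y)):
        y_lags = [y[t - k] for k in range(1, n_y + 1)]   # y(t-1), ..., y(t-n_y)
        x_lags = [x[t - k] for k in range(0, n_x + 1)]   # x(t), x(t-1), ..., x(t-n_x)
        rows.append(y_lags + x_lags)
        targets.append(y[t])
    return np.array(rows), np.array(targets)


y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
X, T = narx_design(y, x, n_y=2, n_x=0)
print(X)  # rows [y(t-1), y(t-2), x(t)] for t = 2, 3, 4
print(T)  # targets y(t) for t = 2, 3, 4
```

With `n_y=2` and `n_x=0`, each row holds two lags of the response and the current exogenous value, the same lag structure used later for inflation.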
In the series-parallel configuration, the network is trained in open loop: the actual past values of the series are supplied as inputs alongside the exogenous inputs, so training reduces to a feedforward regression on lagged values. In this study, temporal dependencies enter the model through the lagged values of inflation, while cross-sectional dependencies are represented by the country and region inputs.

Performance Evaluation Model
To evaluate the suitability of the models to the actual data, error measures are required. In this study, the model's error is measured using Root Mean Square Error (RMSE) and Mean Absolute Error (MAE). The formulas for calculating the error of the model are shown in Table 3.

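Metric    Equation
RMSE      $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2}$
MAE       $\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left\lvert y_t - \hat{y}_t\right\rvert$

where $y_t$ is the actual value, $\hat{y}_t$ is the predicted value, and $n$ is the number of observations.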
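The two error measures can be computed directly from their definitions. The following plain-Python sketch (not the authors' code) is useful for checking reported values by hand.

```python
import math
from typing import Sequence


def _check(actual: Sequence[float], predicted: Sequence[float]) -> int:
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equally long")
    return len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root Mean Square Error: sqrt(mean((y - yhat)^2))."""
    n = _check(actual, predicted)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Absolute Error: mean(|y - yhat|)."""
    n = _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n


y = [3.0, 5.0, 2.0, 7.0]
yhat = [2.0, 5.0, 4.0, 7.0]
print(rmse(y, yhat))  # errors 1, 0, -2, 0 -> sqrt(5 / 4), about 1.118
print(mae(y, yhat))   # (1 + 0 + 2 + 0) / 4 = 0.75
```

RMSE squares the errors before averaging, so it penalizes the single large error (-2) more heavily than MAE does.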

1) Data partitioning
In this study, the dataset is split chronologically: the training data cover 1998 to 2017 (20 of the 24 years, roughly 85% of the data) and the testing data cover 2018 to 2021 (4 years, roughly 15%). The model is built and optimized for panel data analysis on the training dataset and then evaluated on the testing dataset to assess its effectiveness.

RESTU ARISANTI, YAHMA NURHASANAH, SRI WINARNI
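The split is by calendar year rather than by random sampling, so every country contributes the same years to each part and no future information leaks into training. A minimal sketch, assuming each observation is a dict with a `"year"` key (the record layout and the country labels `"A"` and `"B"` are hypothetical):

```python
from typing import List, Tuple


def split_by_year(rows: List[dict], last_train_year: int = 2017) -> Tuple[List[dict], List[dict]]:
    """Chronological split: years <= last_train_year go to training, later years to testing."""
    train = [r for r in rows if r["year"] <= last_train_year]
    test = [r for r in rows if r["year"] > last_train_year]
    return train, test


# Two hypothetical countries observed from 1998 to 2021
rows = [{"country": c, "year": yr} for c in ("A", "B") for yr in range(1998, 2022)]
train, test = split_by_year(rows)
print(len(train), len(test))    # 40 8
print(len(train) / len(rows))   # 20 / 24 of the years, about 0.833
```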

2) Normalization
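Each variable is standardized using the scale() function in R, which is equivalent to the following formula:

$z_i = \frac{x_i - \bar{x}}{s}$

where $x_i$ is an observation, $\bar{x}$ is the mean, and $s$ is the sample standard deviation of the variable.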
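The same standardization can be sketched in NumPy. R's `scale()` divides by the sample standard deviation, so `ddof=1` is used here. The mean and standard deviation are returned because they are needed later for denormalization; the source does not say whether they were computed on the training years only, so this sketch simply uses whatever array is passed in.

```python
import numpy as np


def standardize(x: np.ndarray):
    """Column-wise z-scores, matching R's scale(): centre by the mean, divide by the sample SD."""
    mean = x.mean(axis=0)
    sd = x.std(axis=0, ddof=1)        # sample standard deviation (n - 1 denominator)
    return (x - mean) / sd, mean, sd


x = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
z, mean, sd = standardize(x)
print(z[:, 0])   # [-1.  0.  1.]
```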

4) Model Evaluation
Models are evaluated by their RMSE and MAE values and by plotting the predicted values for the training and test sets against the actual data.

5) Forecasting exogenous variables
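The exogenous variables are forecast using the FFNN method, which was chosen because it best fits the characteristics of these variables. Because the forecasts are produced in standardized form, they must be denormalized to obtain values on the original scale, using the following formula:

$\hat{x}_t = \hat{z}_t \, s + \bar{x}$

where $\hat{z}_t$ is the standardized forecast, and $\bar{x}$ and $s$ are the mean and standard deviation used in the normalization step.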
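Denormalization simply inverts the z-score using the stored statistics. In the sketch below the means and standard deviations are hypothetical values for two exogenous variables, not figures from the study.

```python
import numpy as np


def denormalize(z_hat: np.ndarray, mean: np.ndarray, sd: np.ndarray) -> np.ndarray:
    """Invert the z-score: x_hat = z_hat * sd + mean, using the statistics from normalization."""
    return z_hat * sd + mean


mean = np.array([2.0, 20.0])        # hypothetical means of two exogenous variables
sd = np.array([1.0, 10.0])          # hypothetical sample standard deviations
z_forecast = np.array([[0.5, -1.5]])
print(denormalize(z_forecast, mean, sd))   # 2.5 and 5.0 on the original scale
```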

Descriptive Statistics
The initial step involved conducting a descriptive statistical analysis of the main and exogenous variables to gain insights into the data. The findings of this analysis are presented in the following table.

Linearity Test
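The linearity of the relationship between the main variable (Y) and the exogenous variables (X1, X2, X3, X4, and X5) was tested. If the data do not satisfy the linearity assumption, the NARX NN method is an appropriate time series technique for modeling the nonlinear relationship [16]. Linearity was tested using the Ramsey RESET test as follows:

• Hypotheses
$H_0$: the model is linear (correctly specified)
$H_1$: the model is nonlinear (misspecified)

• Test statistic
$F = \frac{(RSS_R - RSS_U)/q}{RSS_U/(n - k)}$
where $RSS_R$ is the residual sum of squares of the original linear model, $RSS_U$ is that of the model augmented with $q$ powers of the fitted values, $n$ is the number of observations, and $k$ is the number of parameters in the augmented model.

• Test criteria
Reject the null hypothesis ($H_0$) if the calculated p-value is less than the significance level (α); otherwise, fail to reject it.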

• Conclusion
The null hypothesis is rejected because the calculated p-value (2.2e-16) is less than the significance level α = 0.05. This means that the model is nonlinear or misspecified.
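The RESET procedure can be sketched with ordinary least squares in NumPy. This version uses the common fitted-value form (squares and cubes of the fitted values are added to the linear model) and returns the F statistic with its degrees of freedom. The p-value would then come from the $F(q, n-k)$ distribution, for example via `scipy.stats.f.sf(F, q, df2)`. The data below are synthetic, not the study's data.

```python
import numpy as np


def reset_test(X, y, max_power=3):
    """Ramsey RESET: F statistic comparing OLS on X with OLS augmented by yhat^2..yhat^max_power."""
    n = len(y)
    Xr = np.column_stack([np.ones(n), X])                  # restricted (linear) model
    beta_r, *_ = np.linalg.lstsq(Xr, y, rcond=None)
    fitted = Xr @ beta_r
    rss_r = float(np.sum((y - fitted) ** 2))
    powers = [fitted ** p for p in range(2, max_power + 1)]
    Xu = np.column_stack([Xr] + powers)                    # unrestricted (augmented) model
    beta_u, *_ = np.linalg.lstsq(Xu, y, rcond=None)
    rss_u = float(np.sum((y - Xu @ beta_u) ** 2))
    q = len(powers)                                        # number of added terms
    df2 = n - Xu.shape[1]                                  # residual degrees of freedom (n - k)
    F = ((rss_r - rss_u) / q) / (rss_u / df2)
    return F, q, df2


rng = np.random.default_rng(0)
x = np.linspace(0.0, 3.0, 200)
F_lin, q, df2 = reset_test(x, 2 * x + rng.normal(0, 0.1, 200))   # linear data: small F
F_quad, _, _ = reset_test(x, x ** 2 + rng.normal(0, 0.1, 200))   # quadratic data: large F
print(F_lin, F_quad)
```

A large F (and hence a tiny p-value) means the powers of the fitted values explain variation that the linear model misses, which is the pattern reported for the inflation data.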

NARX-NN Model
The dataset consists of 24 years of data, which were divided into two parts, training and testing.
The training data cover the years 1998 to 2017, while the testing data cover 2018 to 2021. After the data were split, the NARX-NN network architecture was built to predict inflation as the response variable. The inputs for inflation prediction were past values of inflation and the current values of the real interest rate, GDP per capita growth, final consumption expenditure, population growth, and exports of goods and services. Two lags of inflation were used because the data are annual, meaning that inflation in year t, $y_t$, was modeled as influenced by $y_{t-1}$ and $y_{t-2}$. Because this study uses longitudinal data, the country and region names were also used as inputs, giving a total of 9 input neurons. One hidden layer was used, and the number of hidden neurons was determined by trial and error, selecting the number that produced the lowest error value. One output neuron was used because there is a single output, the inflation rate.
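The 9 inputs (two inflation lags, five current exogenous values, country, region) can be assembled per country so that lags never cross from one country's series into another's. The sketch below assumes hypothetical column names and label-encodes the country and region names as integers; the source does not say how the names were encoded.

```python
from typing import Dict, List, Tuple

# X1..X5; these key names are hypothetical
EXOG = ["real_interest", "gdp_pc_growth", "final_consumption", "pop_growth", "exports"]


def panel_narx_inputs(rows: List[dict]) -> Tuple[List[List[float]], List[float]]:
    """Build 9-input rows: y(t-1), y(t-2), X1..X5 at t, country code, region code.

    Lags are taken within each country only, so each country's first two years are dropped.
    Country and region names are label-encoded as integers (an assumption for illustration).
    """
    c_code = {c: i for i, c in enumerate(sorted({r["country"] for r in rows}))}
    r_code = {g: i for i, g in enumerate(sorted({r["region"] for r in rows}))}

    by_country: Dict[str, List[dict]] = {}
    for r in rows:
        by_country.setdefault(r["country"], []).append(r)

    inputs, targets = [], []
    for c in sorted(by_country):
        series = sorted(by_country[c], key=lambda r: r["year"])
        for t in range(2, len(series)):
            cur = series[t]
            row = [series[t - 1]["inflation"], series[t - 2]["inflation"]]  # y(t-1), y(t-2)
            row += [cur[name] for name in EXOG]                           # current X1..X5
            row += [float(c_code[c]), float(r_code[cur["region"]])]       # unit identifiers
            inputs.append(row)
            targets.append(cur["inflation"])
    return inputs, targets


rows = [{"country": "A", "region": "R1", "year": 2000 + k, "inflation": float(k),
         **{name: 0.0 for name in EXOG}} for k in range(4)]
X, y = panel_narx_inputs(rows)
print(len(X), len(X[0]))   # 2 rows (years 2002 and 2003), 9 inputs each
print(X[0][:2], y[0])      # [1.0, 0.0] 2.0
```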

FFNN Model for Exogenous Variables (X1, X2, X3, X4, X5)
A Feed Forward Neural Network (FFNN) is used to predict the exogenous variables, with past values of the exogenous variables as inputs. Since the data are annual, the previous values of the exogenous variables are taken into account using 2 lags (i.e., $x_t$ is influenced by $x_{t-1}$ and $x_{t-2}$). Because this study uses longitudinal data, the region and country names are also included as inputs. Hence, the number of input neurons used is 7. A single hidden layer is used, and the number of hidden neurons is determined by trial and error, choosing the number that gives the lowest error value. The number of output neurons is 5, as there are five exogenous variables to be predicted. Based on Table 9, the FFNN model uses 15 hidden neurons because this configuration has the smallest RMSE and MAE values on the test dataset. Figure 1 shows the training process for FFNN (7-15-5) over 100 epochs; as the graph shows, the MAE and MSE values decrease as the number of epochs increases.
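The trial-and-error choice of hidden neurons can be sketched with a small one-hidden-layer network in NumPy. The source does not state the activation function, optimizer, learning rate, or initialization, so tanh hidden units, a linear output layer, full-batch gradient descent with `lr=0.1`, and the synthetic 7-input, 5-output data below are all assumptions. This is a stand-in for the 7-h-5 architecture, not the authors' training setup.

```python
import numpy as np


def train_ffnn(X, Y, hidden, epochs=100, lr=0.1, seed=0):
    """One-hidden-layer FFNN (tanh hidden, linear output) trained by full-batch gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, Y.shape[1]))
    b2 = np.zeros(Y.shape[1])
    losses = []                                   # training MSE recorded at each epoch
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                  # hidden-layer activations
        err = H @ W2 + b2 - Y                     # linear output minus target
        losses.append(float(np.mean(err ** 2)))
        g_out = 2.0 * err / err.size              # gradient of the MSE w.r.t. the outputs
        g_hid = (g_out @ W2.T) * (1.0 - H ** 2)   # back through tanh, using W2 before its update
        W2 -= lr * (H.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (X.T @ g_hid)
        b1 -= lr * g_hid.sum(axis=0)

    def predict(Z):
        return np.tanh(Z @ W1 + b1) @ W2 + b2

    return predict, losses


def select_hidden(X_tr, Y_tr, X_te, Y_te, candidates):
    """Trial and error: train one network per hidden size and keep the lowest test RMSE."""
    scores = {}
    for h in candidates:
        predict, _ = train_ffnn(X_tr, Y_tr, h)
        scores[h] = float(np.sqrt(np.mean((predict(X_te) - Y_te) ** 2)))
    return min(scores, key=scores.get), scores


rng = np.random.default_rng(1)
X = rng.normal(size=(120, 7))                     # 7 inputs, as in FFNN (7-h-5)
Y = np.tanh(X @ rng.normal(size=(7, 5)))          # 5 synthetic targets
best, scores = select_hidden(X[:100], Y[:100], X[100:], Y[100:], candidates=[5, 10, 15])
print(best, scores)
```

The recorded `losses` list plays the role of the training curve: with a small enough step size, the MSE falls as the number of epochs grows.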

Exogenous Variables Forecasting
After the best model for the exogenous variables, FFNN (7-15-5), was obtained, the exogenous variables were forecast for the next 5 years. Because the data are longitudinal, a forecast is produced for each country, and the forecasts for each year are then averaged. In Table 10, the variables X1, X2, X3, X4, and X5 denote the real interest rate, GDP per capita growth, final consumption expenditure, population growth, and exports of goods and services, respectively. These 5-year forecasts are then used to predict the main variable, inflation.
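Averaging the per-country forecasts into one value per year can be sketched as follows. An unweighted arithmetic mean across countries is assumed, since the source only says the values are averaged; the countries and values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def yearly_average(forecasts: List[Tuple[str, int, float]]) -> Dict[int, float]:
    """Average per-country forecasts (country, year, value) into one unweighted mean per year."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for _country, year, value in forecasts:
        sums[year] += value
        counts[year] += 1
    return {year: sums[year] / counts[year] for year in sorted(sums)}


forecasts = [("A", 2022, 2.0), ("B", 2022, 4.0), ("A", 2023, 3.0), ("B", 2023, 5.0)]
print(yearly_average(forecasts))   # {2022: 3.0, 2023: 4.0}
```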

Inflation Prediction
After the optimal NARX NN model, NARX NN (9-5-1), was determined and the exogenous variables were forecast, the main variable of interest, inflation, was forecast for the next five years. As this study uses longitudinal data, the forecast value for each year is calculated as the average of the results obtained. The predicted inflation values are visualized in Figure 6.
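Actual inflation is unavailable beyond 2021, so forecasting several years ahead requires some way of filling the inflation lags. The source does not describe this step; one common approach, assumed here, is recursive forecasting: each predicted value becomes the next year's lag, while the exogenous inputs come from the FFNN forecasts. `toy_step` is a hypothetical stand-in for the trained 9-5-1 network. The constant country and region inputs are omitted for brevity; in practice they would be fixed for each country's loop.

```python
from typing import Callable, List, Sequence

# One-step model: (y(t-1), y(t-2), exogenous values at t) -> y(t)
NarxStep = Callable[[float, float, Sequence[float]], float]


def forecast_country(step: NarxStep, y_hist: Sequence[float],
                     exog_future: List[Sequence[float]]) -> List[float]:
    """Recursive multi-step forecast: each new prediction becomes the next step's lag."""
    y1, y2 = y_hist[-1], y_hist[-2]     # last two observed values, y(t-1) and y(t-2)
    preds = []
    for exog in exog_future:            # one entry per future year (e.g. FFNN forecasts)
        y_next = step(y1, y2, exog)
        preds.append(y_next)
        y1, y2 = y_next, y1             # shift the lags forward using the prediction
    return preds


def toy_step(y1: float, y2: float, exog: Sequence[float]) -> float:
    """Hypothetical one-step model standing in for a trained NARX network."""
    return 0.5 * y1 + 0.25 * y2 + sum(exog)


print(forecast_country(toy_step, [4.0, 8.0], [[1.0], [1.0], [0.0]]))   # [6.0, 6.0, 4.5]
```

Running this for each country and averaging the results per year yields one forecast path like the one plotted in Figure 6.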

Figure 6. Inflation Prediction Value for the Next 5 Years
According to Figure 6, average inflation is predicted to decrease after 2021 and then rise to 17,814 in 2023. A slight decrease is expected in 2024, followed by another increase in 2025, although not to the 2023 level, and a further decrease in 2026. This suggests that the high average inflation in 2023 may contribute to recessions in many countries.

CONCLUSION
Analyzing longitudinal data is challenging because it requires considering multiple factors. The NARX NN approach handles longitudinal data by incorporating region and country variables as inputs. This integration captures the random effects of both region and country and improves the model's learning of the temporal and cross-sectional dependencies present in the data. The resulting model is therefore well suited for forecasting future trends and patterns in the data.
To predict the inflation rate (Y) with the NARX NN approach, the exogenous variables (X1, X2, X3, X4, and X5) must first be predicted. Among the Feed Forward Neural Network (FFNN) models tried, the best performing was FFNN (7-15-5). This optimal FFNN model was applied to all exogenous variable data, resulting in RMSE values of (X1 = 5.158, X2 = 7.377, X3 In the effort to forecast the inflation rate and its correlation with the exogenous variables, the NARX NN series parallel model was utilized and the best model was identified as NARX NN (9-