Chili Price Prediction One Year Ahead Using the Gated Recurrent Unit Method

Chili is an important commodity in the Indonesian economy, and its price fluctuates significantly. These fluctuations are driven by several factors, such as market demand and supply, climate, and temperature. Predicting chili prices is important to minimize the risk of loss to the community and the government. Previous studies have used the ARIMA, SVR, MLP, and RNN methods to predict chili prices. However, these methods are still considered less effective, especially in processing large amounts of information. In this study, the Gated Recurrent Unit (GRU) method is used as an alternative because it is designed to overcome the difficulty of remembering long sequences of information. The results show that GRU provides excellent prediction performance, with a MAPE value below 10% for all types of chili peppers. The tests show that the data window size affects the accuracy of chili price prediction. In addition, the combination of hyperparameters, namely hidden neurons and epochs, also influences the accuracy. For large red chili pepper, the lowest MAPE of 4.850% is obtained with 512 hidden neurons and 100 epochs. For curly red chili pepper, a MAPE of 6.434% is obtained with 512 hidden neurons and 100 epochs. For green bird's eye pepper, a MAPE of 7.288% is obtained with 512 hidden neurons and 150 epochs. Finally, for red bird's eye pepper, the lowest MAPE of 6.452% is obtained with 512 hidden neurons and 150 epochs.


I. INTRODUCTION
Chili is an important commodity for the Indonesian economy, and it is commonly used as a kitchen spice and food flavouring [1]. Chili offers many benefits from its capsaicin, vitamin A, vitamin C, and antioxidants [2]. However, the price of the various types of chili in Indonesia tends to be unstable across regions due to several factors, including market demand and supply, climate, and temperature [1][3][4][5]. Unstable chili prices harm both the country and society, so predictions are needed to estimate future chili prices.
Several studies have addressed this problem. Research [6] predicted the price of red bird's eye pepper using the ARIMA method, resulting in a MAPE value of 20.73%. However, this method loses accuracy when applied to non-linear data [7], which is reflected in the fairly high error reported in that study. In 2020, red chili prices were predicted using the Support Vector Regression (SVR) method [8], resulting in a MAPE value of 9.11%. Furthermore, Suradiradja (2021) predicted the price of large red chili pepper using the Multilayer Perceptron (MLP) and Recurrent Neural Network (RNN) methods, obtaining MAPE values of 3.79% and 4.82%, respectively [9]. Although both methods produce good predictions, they still have shortcomings. The disadvantage of MLP lies in its memory storage function, which limits its ability to remember large amounts of data [10].
Meanwhile, the RNN method processes information step by step because its architecture is sequential and can become quite deep [10]. This becomes a problem when the RNN has to learn long sequences of information, because it tends to forget earlier information [10].
Given the shortcomings of the methods used in previous studies, the Gated Recurrent Unit (GRU) method can address the problems mentioned above. GRU is an artificial neural network method known to handle nonlinear data [11]. In addition, it can overcome the problems of the SVR, MLP, and RNN methods because its architecture is designed to learn long sequences of information. The GRU architecture has two gates in each sequence step that regulate whether the information from the previous step should be stored or remembered [12]. With this advantage, GRU performs better than its predecessor, the RNN. Therefore, in this study, the GRU method was chosen, and its performance was assessed by the level of accuracy in predicting chili prices.
Several studies with similar themes have used the GRU method. Sholeh & Hidayat (2022) obtained a validation MAE of ± 0.045 and a validation MSE of ± 0.0025 with the GRU method [13]. For the LSTM approach, the validation MSE was ± 0.002 and the validation MAE was ± 0.04. The study in [14] produced prediction results with a MAPE of 4.84%. Research [15] found that the accuracy on data from BTPN Bank, Bank Mandiri, BRI Bank, and Bank BNI was 99.28%, 83.7%, 95.3%, and 93.1%, respectively. According to research [16], a window size of 2 produced a MAPE of 3.99% for the bitcoin price, while the Ethereum price yielded a MAPE of 11.19%.
Research [17] reported that the GRU method obtained an RMSE of 0.034, an MSE of 0.001, and an MAE of 0.024. The LSTM method obtained an RMSE of 0.048, an MSE of 0.002, and an MAE of 0.038. The Linear Regression method obtained an RMSE of 4.621, an MSE of 2.136, and an MAE of 2.890 [17].
This research differs from previous research in both method and object. No previous study has applied the GRU method to chili prices, so the GRU method and chili price data were chosen for this study.

II. RESEARCH METHODOLOGY
This study uses a quantitative method with secondary data in the form of chili price data. The type of research is implementation research, which models the prediction of chili prices using the Gated Recurrent Unit (GRU) method. A description of the research steps can be seen in the following flowchart.

A. Literature Studies
Literature studies are conducted to find in-depth information related to the research to be carried out. The literature study in this research includes searching for the weaknesses and shortcomings of previous methods in sources such as journals, proceedings, theses, and books. The information obtained is then used as a consideration when conducting this research.
In this research, literature studies were conducted by looking for studies with similar themes and objects but different methods. Among them is research [17], whose predictions for the Morning Market obtained a MAPE of 20.73%, while predictions with price data from the Development Market obtained an error value of 20.83%. Researchers in [7] predicted red chili prices using the SVR method, which produced a smallest MAPE of 9.11%. Furthermore, the prediction of chili prices in Tangerang using the MLP and RNN methods resulted in a MAPE of 3.79% for MLP and 4.82% for RNN [8]. Several studies used the same method on different objects. In research [13], the GRU method obtained a validation MAE of ± 0.045 and a validation MSE of ± 0.0025, while the LSTM method obtained a validation MAE of ± 0.04 and a validation MSE of ± 0.002. Research [14] obtained prediction results with a MAPE of 4.84%. Research [15] obtained an accuracy of 99.28% for BTPN Bank data, 83.7% for Bank Mandiri data, 95.3% for BRI Bank data, and 93.1% for Bank BNI data. Research [16] concluded that, using a window size of 2, a MAPE of 3.99% was obtained for the bitcoin price, while a MAPE of 11.19% was obtained for the Ethereum price. Research [17] reported that the GRU method obtained an RMSE of 0.034, an MSE of 0.001, and an MAE of 0.024. The LSTM method obtained an RMSE of 0.048, an MSE of 0.002, and an MAE of 0.038. The Linear Regression method obtained an RMSE of 4.621, an MSE of 2.136, and an MAE of 2.890 [17].

B. Data Collection
The data used in this study are price data for large red chili peppers, curly red chili peppers, green bird's eye peppers, and red bird's eye peppers downloaded from the National Strategic Food Price Information Center website. The data cover prices in traditional markets in Yogyakarta Province from October 24, 2018, to April 26, 2024.

C. Preprocessing Data
The preprocessing stage begins by reading the price data for all chili peppers. Data cleansing is then carried out before the data format is changed using the sliding window technique. After the data is converted, it is divided into training, test, and validation data. The final stage is normalizing the data that has been divided.
1) Data Cleansing: Data cleansing is the process of cleaning the data, for example by removing incomplete or unneeded data, so that the data can be used in the next stage [18]. The first step is to remove unnecessary columns and replace values containing the character "-" and empty values with data from the previous day. The data is then transposed, cleaned by removing the "," character, and converted from string to integer.
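The cleansing rules described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: the function name `clean_prices`, the sample values, and the decision to skip gaps that have no earlier day to copy from are all assumptions.

```python
def clean_prices(raw_values):
    """Convert raw price strings to integers, filling gaps from the previous day."""
    cleaned = []
    previous = None
    for raw in raw_values:
        text = (raw or "").strip()
        if text in ("", "-"):
            # Missing value: reuse the previous day's price.
            if previous is None:
                continue  # assumption: leading gaps have nothing to copy, so skip them
            cleaned.append(previous)
            continue
        # Remove the "," separator and convert the string to an integer.
        previous = int(text.replace(",", ""))
        cleaned.append(previous)
    return cleaned


# Hypothetical sample values, not taken from the dataset.
raw = ["35,000", "-", "", "36,500", "40,250"]
print(clean_prices(raw))  # [35000, 35000, 35000, 36500, 40250]
```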
2) Data Windowing: The next step is to perform data windowing, which divides the data into two parts: input and output. The output serves as the label against which the predictions produced during training are compared to calculate the prediction error. An illustration of data windowing can be seen in Figure 2.
4) Data Normalization: Data normalization is a process that rescales values so that they fall in the same range [19]. In this study, data normalization was carried out using the standard scaler method, also known as Z-score normalization.
Z-score normalization is computed from the mean and standard deviation of the data [20]. This method remains stable in the presence of outliers and of values greater than maxA or smaller than minA [20].
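As a rough illustration of Z-score normalization, the sketch below computes z = (x - mean) / std and its inverse with the Python standard library. The function names and sample prices are hypothetical. Using the population standard deviation and computing the statistics from the training portion only are assumptions; the text does not state either choice.

```python
import statistics


def fit_standard_scaler(values):
    """Return the mean and population standard deviation of the data."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # assumption: population (not sample) deviation
    if std == 0:
        std = 1.0  # avoid division by zero for a constant series
    return mean, std


def normalize(values, mean, std):
    """Z-score normalization: z = (x - mean) / std."""
    return [(v - mean) / std for v in values]


def denormalize(values, mean, std):
    """Inverse transform back to the original price scale."""
    return [v * std + mean for v in values]


# Hypothetical training prices, not taken from the dataset.
train_prices = [30000, 32000, 34000]
mean, std = fit_standard_scaler(train_prices)
scaled = normalize(train_prices, mean, std)
restored = denormalize(scaled, mean, std)
```

Keeping `mean` and `std` around matters later, because the predictions have to be mapped back to prices before the error is measured.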

D. Model Gated Recurrent Unit (GRU)
The Gated Recurrent Unit (GRU) method is a modification of the earlier Recurrent Neural Network (RNN) method [21]. It was introduced in 2014 by Kyunghyun Cho so that each recurrent unit can adaptively capture dependencies on different time scales [22]. The method overcomes the vanishing gradient problem of its predecessor by implementing a gate system in its architecture [12] [23]. The architecture of the GRU method can be seen in Figure 2 [23]. The architecture consists of a reset gate, an update gate, and a hidden state candidate. The process starts with the calculation of the first gate, the update gate, denoted $z_t$, using Equation (1).
In Equation (1), the input $x_t$ and the output of the previous unit $h_{t-1}$ are each multiplied by their weights, and the result is passed through the sigmoid function. The update gate serves to store the information needed for the next process. The next process is the reset gate, as seen in Equation (2).
The reset gate is calculated in much the same way as the update gate, but its function is to determine how much information from the previous time step must be forgotten. The calculation continues with the hidden state candidate in Equation (3).
The hidden state candidate, denoted $\tilde{h}_t$, determines which information is relevant. It is calculated in the same way as the previous gates, but the activation function used is the tanh function. The last step is calculating the new hidden state, as seen in Equation (4).
This process determines what is contained in the current memory. The result is then reused as input for the process in the next time step.
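The four steps described above can be sketched as a single GRU time step in plain Python. This follows one common textbook formulation and may differ in notation from the one in [23]. The weight names (`W_z`, `U_z`, `b_z`, and so on), the bias terms, the random initialisation, and the interpolation convention h_t = (1 - z_t) * h_{t-1} + z_t * h~_t are assumptions; some references swap the roles of z_t and 1 - z_t.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, x, U, h, b):
    """Return W @ x + U @ h + b for plain Python lists."""
    return [
        sum(w * xi for w, xi in zip(W[i], x))
        + sum(u * hi for u, hi in zip(U[i], h))
        + b[i]
        for i in range(len(b))
    ]


def gru_step(x_t, h_prev, p):
    """One GRU time step: update gate, reset gate, candidate, new hidden state."""
    # Update gate: z_t = sigmoid(W_z x_t + U_z h_{t-1} + b_z)
    z = [sigmoid(a) for a in affine(p["W_z"], x_t, p["U_z"], h_prev, p["b_z"])]
    # Reset gate: r_t = sigmoid(W_r x_t + U_r h_{t-1} + b_r)
    r = [sigmoid(a) for a in affine(p["W_r"], x_t, p["U_r"], h_prev, p["b_r"])]
    # Candidate: h~_t = tanh(W_h x_t + U_h (r_t * h_{t-1}) + b_h)
    gated_h = [ri * hi for ri, hi in zip(r, h_prev)]
    h_cand = [math.tanh(a) for a in affine(p["W_h"], x_t, p["U_h"], gated_h, p["b_h"])]
    # New hidden state: h_t = (1 - z_t) * h_{t-1} + z_t * h~_t
    return [(1 - zi) * hp + zi * hc for zi, hp, hc in zip(z, h_prev, h_cand)]


def init_params(input_size, hidden_size, seed=0):
    """Small random weights and zero biases (illustrative initialisation only)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    p = {}
    for gate in ("z", "r", "h"):
        p[f"W_{gate}"] = mat(hidden_size, input_size)
        p[f"U_{gate}"] = mat(hidden_size, hidden_size)
        p[f"b_{gate}"] = [0.0] * hidden_size
    return p


# Run three normalized inputs through a tiny cell with 4 hidden units.
params = init_params(input_size=1, hidden_size=4)
h = [0.0] * 4
for value in [0.1, -0.3, 0.5]:
    h = gru_step([value], h, params)
print(h)
```

The hidden state returned at each step is fed back in at the next step, which is how the cell carries information forward through the window.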

E. Loss Function Evaluation
The next stage is to evaluate the model by calculating the loss function using the Mean Absolute Percentage Error (MAPE). The loss function calculates the error value on the dataset to measure how well the model predicts. Before the error value is measured, the data is denormalized to recover the actual values.
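The evaluation step described above, denormalizing first and then measuring the percentage error, might look like the following sketch. It uses the usual definition MAPE = (100 / n) * sum(|actual - predicted| / |actual|). The function names, scaler statistics, and sample values are hypothetical.

```python
def denormalize(values, mean, std):
    """Undo Z-score normalization to recover values on the price scale."""
    return [v * std + mean for v in values]


def mape(actual, predicted):
    """Mean Absolute Percentage Error, returned in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal in length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical scaler statistics and normalized values, for illustration only.
mean, std = 40000.0, 5000.0
actual_prices = denormalize([0.0, 2.0], mean, std)
predicted_prices = denormalize([-0.5, 2.5], mean, std)
print(mape(actual_prices, predicted_prices))
```

Computing the error on the denormalized values keeps the percentage interpretable in terms of actual prices.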

F. Mean Absolute Percentage Error (MAPE)
Mean Absolute Percentage Error (MAPE) is one method for measuring the percentage of forecasting error [24]. It is often used because it is easy to understand [25]. MAPE measures the difference between actual data and forecast data. The absolute value of this difference is taken and expressed as a percentage [26]. MAPE is calculated using Equation (5). MAPE is divided into four categories, and the smaller the error value, the better the prediction accuracy [16]. The categories of MAPE values can be seen in Table I.

III. RESULT AND DISCUSSION
Model testing in this study was carried out to determine the accuracy, or error, of the predictions generated by the Gated Recurrent Unit (GRU) model. Before the combinations of two hyperparameters, hidden neurons and epochs, were tested, the data window size was tuned by testing which size is most appropriate for this study. This test was carried out on large red chili, and the results can be seen in Table II. Of the eight data window sizes tested, the largest, 360, produces the largest error, while the smallest error is produced by the model with a data window size of 45. This aligns with the research conducted by Wahyuni, who tested window sizes including 12 and 18 [27]; the best result in that study was obtained with the smallest window size, 6. In the research conducted by Yunizar, the best window size is 2, from tested window sizes of 2, 4, 6, 8, 10, 12, 14, 16, 18, and 20 [16]. However, research [28] found that the best window size was 28 among the tested sizes of 7, 14, 21, and 28. The present study behaves similarly to [27] and [16], with the smallest error produced at the smallest window size. Other studies, however, found that there is a point at which the window size produces the smallest error, which does not necessarily mean that the smallest window size is the best. The conclusion is that each model produces different results depending on the data used.
Although the smallest data window size gives the best error value, it is not appropriate for the purpose of this research. When that window size is applied to long-term prediction for the next year, the results are less effective: at a certain point the prediction becomes constant because the data in the window lack variation, as shown in Figure 4. Therefore, a data window size of 170 was chosen for this study, for which the resulting error is still quite small and which provides sufficient data variation to the model.
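The flattening behaviour described above can be illustrated with a recursive multi-step forecast, in which each prediction is appended to the window and used as input for the next step. The text does not state that the study forecasts this way, so the recursive scheme is an assumption. `predict_next` is a hypothetical callable standing in for the trained GRU model, and `mean_of_window` is a toy substitute used only to show the effect.

```python
def forecast_recursive(history, window_size, horizon, predict_next):
    """Forecast `horizon` steps ahead by feeding each prediction back into the window."""
    if len(history) < window_size:
        raise ValueError("history must contain at least window_size values")
    window = list(history[-window_size:])
    forecasts = []
    for _ in range(horizon):
        next_value = predict_next(window)
        forecasts.append(next_value)
        # Slide the window: drop the oldest value, append the new prediction.
        window = window[1:] + [next_value]
    return forecasts


def mean_of_window(window):
    """Toy stand-in for a trained model (hypothetical, not the GRU)."""
    return sum(window) / len(window)


# With a short window the forecasts quickly settle toward a constant value.
print(forecast_recursive([1, 2, 3, 4], window_size=2, horizon=3,
                         predict_next=mean_of_window))  # [3.5, 3.75, 3.625]
```

After a few steps the window consists entirely of the model's own outputs, so a short window leaves little real variation to draw on, while a longer window keeps observed prices in the input for more of the forecast horizon.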
After the data window size is determined, it is applied to all types of chili. Next, combinations of two hyperparameters, the number of hidden neurons and the number of epochs, are tested. The following are the test results for large red chili pepper. A comparison of actual and predicted data is shown for each type of chili using its best hyperparameter combination. First, the comparison of actual and predicted prices of large red chili peppers is shown, with a low MAPE of 4.850% obtained using 512 hidden neurons and 100 epochs. Furthermore, the price predictions for all types of chili are displayed for the next year, from April 27, 2024, to April 26, 2025. This one-year prediction can be seen in Table IX.

IV. CONCLUSION
Based on the research conducted using the Gated Recurrent Unit (GRU) method, it is concluded that this method produces excellent prediction performance on the MAPE scale for the prices of all types of chili, namely large red chili pepper, curly red chili pepper, green bird's eye pepper, and red bird's eye pepper, with a MAPE value below 10%. The best combination for predicting the price of large red chili pepper is 512 hidden neurons and 100 epochs, which produces the lowest MAPE value of 4.850%. For curly red chili pepper, the lowest error value is 6.434%, with 512 hidden neurons and 100 epochs. For green bird's eye pepper, the smallest MAPE value is 7.288%, with 512 hidden neurons trained for 150 epochs. Finally, for red bird's eye pepper, the smallest error value is 6.452%, with 512 hidden neurons and 150 epochs. The tests conducted in this study show that the data window size and the hyperparameters used affect the accuracy of chili price prediction.

Figure 2. Illustration of Data Windowing Using Sliding Window
3) Split Data: After the data is converted into a window format, the next step is to divide it into training data, validation data, and testing data. The split used is 75% for training data and 25% for testing data. The validation data is then taken as 25% of the overall training data.
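The sliding window and the split described above can be sketched as follows. The function names are hypothetical. Predicting a single next value per window, splitting in chronological order, and taking the validation part from the end of the training portion are assumptions; the text gives only the percentages.

```python
def make_windows(series, window_size):
    """Slide a window over the series; each window's label is the next value."""
    inputs, targets = [], []
    for start in range(len(series) - window_size):
        inputs.append(series[start:start + window_size])
        targets.append(series[start + window_size])
    return inputs, targets


def split_windows(inputs, targets, test_fraction=0.25, val_fraction=0.25):
    """75% train / 25% test, then 25% of the training part held out for validation."""
    n_train_full = int(len(inputs) * (1 - test_fraction))
    n_val = int(n_train_full * val_fraction)
    n_train = n_train_full - n_val
    bounds = {
        "train": (0, n_train),
        "val": (n_train, n_train_full),
        "test": (n_train_full, len(inputs)),
    }
    return {name: (inputs[a:b], targets[a:b]) for name, (a, b) in bounds.items()}


# Toy series of 20 values with a window of 4 gives 16 (input, label) pairs.
series = list(range(20))
X, y = make_windows(series, window_size=4)
parts = split_windows(X, y)
print({name: len(pair[0]) for name, pair in parts.items()})  # {'train': 9, 'val': 3, 'test': 4}
```

With the window size of 170 selected in this study, each input would hold 170 consecutive daily prices and the label would be the price on the following day, under the same single-step assumption.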

Figure 4. Prediction Graph One Year ahead with Window Size 45

In Table III, the smallest MAPE error value is 4.850%, which results from a combination of 512 hidden neurons and 100 epochs. The model with the largest error is the one with a combination of 32 hidden neurons and 50 epochs. Table IV then shows the test results with the smallest error value for all types of chili.
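The hyperparameter test described above amounts to a small grid search over hidden neurons and epochs. The sketch below shows only the search loop. `train_and_score` is a hypothetical callable that would train a GRU and return its MAPE, and `toy_score` is a synthetic stand-in that does not reproduce any result from the tables. The grid lists only values named in the text, since the full set of tested values is not reproduced here.

```python
from itertools import product


def grid_search(hidden_options, epoch_options, train_and_score):
    """Evaluate every (hidden neurons, epochs) pair and return the lowest-error one."""
    results = {}
    for hidden, epochs in product(hidden_options, epoch_options):
        results[(hidden, epochs)] = train_and_score(hidden, epochs)
    best = min(results, key=results.get)
    return best, results


def toy_score(hidden, epochs):
    """Synthetic error surface for illustration only (not real MAPE values)."""
    return abs(hidden - 512) / 512 + abs(epochs - 100) / 100


# Assumption: only values mentioned in the text are listed in this grid.
best, results = grid_search([32, 512], [50, 100, 150], toy_score)
print(best)  # (512, 100) for this synthetic surface
```

In practice `train_and_score` would train on the training split and report the error on held-out data, so that the selected combination is not chosen on the data it was fitted to.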
DOI: https://doi.org/10.25139/inform.v10i1.8269
Table VII shows a data comparison of the green bird's eye pepper price. This chili produces the smallest error of 7.288% with 512 hidden neurons and 150 epochs.
A comparison of red bird's eye pepper price data can be seen in Table VIII. For this chili, the MAPE error is 6.452% with 512 hidden neurons and 150 epochs.

TABLE VIII. COMPARISON OF ACTUAL AND PREDICTED FOR RED BIRD'S EYE PEPPER