Intelligent Hybrid ARIMA-NARNET Time Series Model to Forecast Coconut Price

The global demand for coconut and coconut-based products has increased rapidly over the past decades. Coconut prices fluctuate continuously, which makes them difficult to predict, so good price modelling is important for forecasting future coconut prices accurately. Several studies have predicted the price of coconut using various models. One of the most important and widely used models in time series forecasting is the autoregressive integrated moving average (ARIMA). However, price fluctuations exhibit uncertain behaviour, and the ARIMA time series model alone is unsuitable for this problem because the series is nonlinear. Artificial neural networks (ANN) have been an effective method for solving nonlinear data pattern problems over the last two decades. The nonlinear autoregressive neural network (NARNET) gives good forecasts, especially when the series is nonlinear. Therefore, ARIMA-NARNET is considered a universal approach to forecasting the coconut price. The aim of the study is to establish a combined linear and nonlinear time series model to forecast coconut prices. The ability of a hybrid approach that combines the ARIMA and NARNET (ANN) models is investigated. The experimental results show that the proposed ARIMA-NARNET method is better at forecasting the price of coconut, an agricultural commodity, than either the ARIMA model or the NARNET model alone. The expected benefit of the proposed forecasting model is that it can help farmers, exporters, and the government to maximize profits in the future.


I. INTRODUCTION
Indonesia produced 17.13 million tons of coconut in 2019. According to the World Atlas report, coconut production in Indonesia is the highest in the world. According to data from the Indonesian Central Statistics Agency (BPS), coconut exports from Indonesia reached 1.53 million tons, or US$ 819.26 million, as of the third quarter of 2020. The destination countries for Indonesia's coconut exports include the United States, Netherlands, South Korea, China, Japan, Singapore, Philippines and Malaysia [1]. Therefore, the coconut price forecast is of fundamental importance to the trading strategy of Indonesia. A good forecasting model is critically important for predicting the future price of coconut accurately, so that farmers, exporters, and the government can plan properly to maximize future profit. Forecasting the coconut price time series is considered highly challenging because the price fluctuates. Fluctuations in agricultural prices affect the supply and demand of commodities and have a significant impact on consumers and farmers [2]. Fluctuations in coconut price lead to uncertainty of income for the farmer and make it difficult for the government to put policies in place and stabilize supply and demand.
The associate editor coordinating the review of this manuscript and approving it for publication was Xianzhi Wang.
The autoregressive integrated moving average (ARIMA) model has been one of the most important and widely used methods in time series forecasting [3], [4], [5]. The popularity of the ARIMA model is due to its statistical properties as well as the use of the well-known Box-Jenkins methodology in the model-building process [6]. This model assumes that the time series under study is generated from a linear process. Several methods have been used to model and predict coconut prices, including the ARIMA model [7], [8], [9]. Results showed the potential of the ARIMA model to accurately predict coconut price data. However, ARIMA time series models are generated from linear processes and therefore may be unsuitable for most practical problems, which are nonlinear. Prices of industrial agriculture are largely influenced by eventualities and seasonality; consequently, prices are nonlinear and difficult to predict [10]. The fluctuation of coconut prices is considered an uncertain behaviour that changes over time. Some factors that influence the fluctuations of coconut price are company pricing, falling market demand for coconut, and declining quality and quantity of coconut products. Therefore, it is a challenge to propose an appropriate approach to forecast coconut prices.
More recently, time series data have been modelled using artificial neural networks (ANN). The main advantages of a neural network are its flexible functional form and its role as a universal function approximator. ANN is effective in solving nonlinear data pattern problems. Many nonlinear problems are relevant today, including forecasting stock markets, whose behaviour is uncertain and changes over time. Several studies have used neural networks for agricultural commodity price forecasting [11], [12], [13], [14]. These studies conclude that the ANN is a better model for forecasting agricultural commodity prices than the ARIMA model [15]. However, a single ANN model cannot capture all the patterns needed to obtain an optimal prediction. The literature review demonstrates that the ANN model is suitable for nonlinear time series data and the ARIMA model is suitable for linear time series data. In this paper, a hybrid model of coconut price prediction is proposed. The motivation behind this hybrid model is the fact that coconut price fluctuation is complex. The hybrid methodology combines the ARIMA and ANN models to take advantage of their respective strengths in linear and nonlinear modelling. The ability of a hybrid approach combining ARIMA and ANN models for coconut price forecasting is investigated. The use of hybrid models could be complementary in capturing patterns of coconut price data and could improve forecasting accuracy. Unlike the single ARIMA model and the single ANN model, the proposed hybrid model combines ARIMA and ANN to produce more accurate predictions of coconut prices. The coconut price prediction model proposed in this study will help farmers, exporters, and the government to maximize profits in the future. This paper is organized as follows: Section II discusses the methods used in the research work. Section III presents the results of the ARIMA-NARNET modelling process for coconut price prediction. NARNET is a version of ANN. Section IV summarizes the present study and draws some conclusions.

II. RESEARCH METHODS
The research methods section covers acquiring the data, the processes used to develop the models, the forecasting process and, lastly, the evaluation of the forecasts.

A. DATA
The data for the study is the monthly price of coconut obtained from the Department of Industry and Trade, Indragiri Hilir Regency, Indonesia. The data sample is the average coconut price per month starting from January 2014 to February 2022. Therefore, there are 115 coconut price data points in rupiah currency which have been collected for 8 years and 3 months. The historical data of the coconut prices per month fluctuated over time. This is shown in Figure 1. The data can be found online at: https://bit.ly/3M7TvcM

B. ARIMA MODEL
The ARIMA(p, d, q) model is stochastic in nature and has been used in diverse fields for prediction studies [16], [17], [18]. ARIMA was first put into use in time series modelling and forecasting by Box and Jenkins in 1970 [19]. The model is made up of three parts: autoregressive (AR), integrated (I), and moving average (MA). The ARIMA model is developed in three steps: (1) Model Identification, (2) Parameter Estimation and (3) Diagnostic Checking.

1) MODEL IDENTIFICATION
Model identification involves finding the order of the ARIMA model using the sample autocorrelation function (SACF) and sample partial autocorrelation function (SPACF) charts [20]. The SACF of a time series is the correlation of its past values with its future values. Given N observations, the first N − 1 observations X_t, t = 1, 2, ..., N − 1, are paired with the following observations X_{t+1}, t + 1 = 2, 3, ..., N; the relationship between X_t and X_{t+1} is defined by equation (1) and equation (2), where X̄_1 is the mean of the first N − 1 observations. When N is substantially large, the difference between the sub-period means X̄_1 and X̄_2 is neglected and r_1 is calculated by equation (3). The SACF is used to identify the moving average order of a stationary time series. The SPACF is the correlation between observations k lags apart after the effect of the shorter lags has been removed, where k = 1, 2, 3, ...; the SPACF at lag k is defined by equation (4). The SPACF is used to identify the autoregressive order of a stationary time series. The SPACF of an AR(p) process at lag p + 1 and beyond is zero.
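The identification step can be illustrated with a minimal Python sketch. It assumes the large-N form of the sample autocorrelation (a single overall mean, as described for equation (3)) and computes the partial autocorrelation with the Durbin-Levinson recursion; the function names `sample_acf` and `sample_pacf` are hypothetical and this is not the code used in the study.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag, using one overall mean."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = np.sum(d ** 2)
    return np.array([np.sum(d[: len(x) - k] * d[k:]) / denom
                     for k in range(max_lag + 1)])

def sample_pacf(x, max_lag):
    """Sample partial autocorrelation via the Durbin-Levinson recursion."""
    r = sample_acf(x, max_lag)
    pacf = np.zeros(max_lag + 1)
    pacf[0] = 1.0
    phi_prev = np.array([])
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = r[1]
            phi = np.array([phi_kk])
        else:
            # phi_kk = (r_k - sum_j phi_{k-1,j} r_{k-j}) / (1 - sum_j phi_{k-1,j} r_j)
            num = r[k] - np.sum(phi_prev * r[k - 1:0:-1])
            den = 1.0 - np.sum(phi_prev * r[1:k])
            phi_kk = num / den
            # phi_{k,j} = phi_{k-1,j} - phi_kk * phi_{k-1,k-j}
            phi = np.append(phi_prev - phi_kk * phi_prev[::-1], phi_kk)
        pacf[k] = phi_kk
        phi_prev = phi
    return pacf
```

A sharp cut-off in `sample_pacf` after lag p suggests an AR(p) term, while a sharp cut-off in `sample_acf` after lag q suggests an MA(q) term.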

2) PARAMETER ESTIMATION
The Box-Jenkins model of order ARIMA(p, d, q) is given by equation (5).
The variable y_t, the future value at time step t, is taken to be a linear function of several past observations y_{t−n}, n = 1, 2, ..., with n < t, and random errors ε_t, as demonstrated by equation (5). p is the autoregressive order, q is the moving average order, and d represents the differencing order of the coconut price time series. L is the lag operator. φ and θ are the coefficients of the autoregressive and moving average terms, respectively [21], [22].
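For orientation, a commonly used lag-operator form of the ARIMA(p, d, q) model is sketched below with the symbols defined above. It is a standard textbook form and is only assumed to correspond to equation (5); the sign convention on the θ terms varies between references. The second line expands the special case ARIMA(1, 1, 0) without a constant term.

```latex
\left(1 - \sum_{i=1}^{p} \varphi_i L^{i}\right)(1 - L)^{d}\, y_t
  = \left(1 + \sum_{j=1}^{q} \theta_j L^{j}\right)\varepsilon_t

% ARIMA(1,1,0), no constant: (1 - \varphi_1 L)(1 - L) y_t = \varepsilon_t
y_t = (1 + \varphi_1)\, y_{t-1} - \varphi_1\, y_{t-2} + \varepsilon_t
```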

3) DIAGNOSTIC CHECKING
The residuals of the models are checked for white-noise behaviour using the correlogram (SACF and SPACF), the Ljung-Box Q test [23] and the Durbin-Watson test [24] to test the sufficiency of the models.
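The two residual statistics named above can be computed directly from their textbook definitions, as in the following minimal sketch. The function names are hypothetical, and the sketch returns only the statistics; the comparison of Q with a chi-square critical value is left to the reader.

```python
import numpy as np

def ljung_box_q(residuals, max_lag):
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1..m} r_k^2 / (n - k)."""
    e = np.asarray(residuals, dtype=float)
    n = len(e)
    d = e - e.mean()
    denom = np.sum(d ** 2)
    total = 0.0
    for k in range(1, max_lag + 1):
        r_k = np.sum(d[:-k] * d[k:]) / denom   # lag-k residual autocorrelation
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total

def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared first differences over sum of squares."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
```

A Durbin-Watson value near 2 is consistent with no first-order autocorrelation, while a large Q relative to the chi-square critical value indicates that the residuals are not white noise.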

C. ANN MODEL
The artificial neural network has the potential to represent complex, nonlinear relationships [25], [26], [27], [28], [29], [30], [31], [32], [33], [34]. The evolution of ANN has given rise to the multilayer perceptron (deep learning), which is effective at modelling and predicting complex, nonlinear relationships in time series. It is made up of an input layer, a hidden layer, and an output layer; that is, a network of three layers connected by open-chain (acyclic) linkages, as shown in Figure 2. The model parameters w_{i,j} and w_j, where i = 0, 1, 2, ..., P and j = 1, 2, ..., Q, are also referred to as connection weights; P is the number of input nodes and Q is the number of hidden nodes.
After the hidden layer, the sigmoid function, equation (6), among others, is employed as an activation function to introduce nonlinearity into the output of the neural network. The nonlinearity allows the network to approximate arbitrarily complex functions, because without it the perceptron is only a linear combination of the weights and the input vector. The network is trained after the activation function is applied. Training is done through optimization (backpropagation) of the activated perceptrons. This allows some of the activated perceptrons to drop out as their weights approach zero (regularization). The perceptrons are trained in minibatches, which lets the central processing unit (CPU) or graphics processing unit (GPU) process the network quickly, gives accurate gradient estimates and smooth convergence, and permits large learning rates [35]. The remaining perceptrons act as input nodes again, and weights are added to form a new network. The backpropagation is done again. The training process continues until there is one perceptron node left. Equation (7) is the mathematical relationship between the inputs (y_{t−1}, ..., y_{t−p}) and the output (y_t).
The ANN model, equation (7), maps the input data to the forecast values y_t. The connection weight W is a vector containing all parameters [31]. Equation (7) implies that one output node emerges as the one-step-ahead forecast. The network is robust and can model any function when the number of hidden nodes (Q) is high enough [36]. Out-of-sample forecasting can be done effectively using a simple network layout with a modest number of hidden nodes (Q) [37]. The parameter Q depends on the input data, and there is no systematic procedure for determining it. The selection of the number of input nodes P, the dimension of the input vector, is critical to ANN modelling [37]. P defines the (nonlinear) autocorrelation structure of the time series and is one of the most vital parameters to estimate in an ANN model. No established theory guides the selection of P, so appropriate values of Q and P are mostly identified through experimentation. In the implementation phase, we select the NARNET, a shallow learning model in MATLAB, to model the residual of the ARIMA model.
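The mapping from lagged inputs to a one-step-ahead forecast can be sketched as a single-hidden-layer forward pass. The sketch assumes the common form y_t = w_0 + Σ_j w_j g(w_{0,j} + Σ_i w_{i,j} y_{t−i}) with a sigmoid g; whether this matches equation (7) exactly is an assumption, and the function and argument names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def nar_forward(lags, w_hidden, b_hidden, w_out, b_out):
    """One-step-ahead output of a single-hidden-layer autoregressive network.

    lags     : P past values (y_{t-1}, ..., y_{t-P})
    w_hidden : (Q, P) input-to-hidden weights w_{i,j}
    b_hidden : (Q,)   hidden biases w_{0,j}
    w_out    : (Q,)   hidden-to-output weights w_j
    b_out    : scalar output bias w_0
    """
    hidden = sigmoid(w_hidden @ np.asarray(lags, dtype=float) + b_hidden)
    return float(w_out @ hidden + b_out)
```

With P = 2 inputs and Q hidden nodes, the network has Q(P + 1) + Q + 1 connection weights, which is why Q and P are chosen with care on short series.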

D. INTELLIGENT HYBRID ARIMA-NARNET MODEL
The hybrid modelling process has the linear (ARIMA) and nonlinear (NARNET) components of the model defined respectively as L̂_t and Ĵ_t [38]. The Intelligent hybrid model y_t is estimated using equation (9).
Ĵ_t is the NARNET model trained on the residual of the ARIMA model at time t.
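The additive structure described above can be written out as follows. This is a sketch of the usual hybrid decomposition, assumed rather than confirmed to match equation (9); e_t and the residual lag order n are illustrative symbols, and f denotes the trained network.

```latex
\hat{y}_t = \hat{L}_t + \hat{J}_t, \qquad
e_t = y_t - \hat{L}_t, \qquad
\hat{J}_t = f\left(e_{t-1}, e_{t-2}, \ldots, e_{t-n}\right)
```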
The alternative hypothesis is stated against the null hypothesis of equal predictive ability (EPA). A test at a chosen significance level rejects the null hypothesis of EPA when the statistic S or Sc exceeds its critical value. Rejection of the null hypothesis using S or Sc implies that one or more of the alternative models stands out in terms of predictive ability [39]. The null hypothesis of the Diebold-Mariano (DM) test of EPA implies that the observed differences between the performance of two forecasts are not significant, while the alternative hypothesis implies that the observed differences between the performance of two forecasts are significant. The DM test statistic is asymptotically normally distributed [40]. The assumption for the test is that the models are not nested. Alternative models are invariant to any permutation (reordering) [39], [40], [41], [42], [43], [44].
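A minimal sketch of the pairwise DM statistic is given below. It assumes a power loss function (squared error by default) and estimates the long-run variance from autocovariances up to lag h − 1 for an h-step forecast; the function name and arguments are hypothetical, and this is not the implementation used in the study.

```python
import numpy as np

def dm_statistic(errors_a, errors_b, horizon=1, power=2):
    """Diebold-Mariano statistic for two series of forecast errors.

    d_t = |e_a,t|^power - |e_b,t|^power is the loss differential.
    A positive value means model B has the smaller average loss.
    """
    ea = np.asarray(errors_a, dtype=float)
    eb = np.asarray(errors_b, dtype=float)
    d = np.abs(ea) ** power - np.abs(eb) ** power
    n = len(d)
    d_bar = d.mean()
    dc = d - d_bar
    # Long-run variance: lag-0 autocovariance plus twice the autocovariances
    # up to lag horizon - 1 (can be non-positive in small samples).
    lrv = np.sum(dc ** 2) / n
    for k in range(1, horizon):
        lrv += 2.0 * np.sum(dc[k:] * dc[:-k]) / n
    return d_bar / np.sqrt(lrv / n)
```

Under the null hypothesis of EPA the statistic is compared with the standard normal distribution, so an absolute value above 1.96 rejects EPA at the 5% level.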

III. RESULTS AND ANALYSIS
In this section, the Intelligent ARIMA-NARNET model is developed and its forecasting power is assessed. A 12-month forecast of coconut price is made using the ARIMA, NARNET and ARIMA-NARNET Hybrid models.

A. COCONUT PRICE ARIMA MODELING
The ARIMA modelling is presented in this section.

1) ARIMA MODEL IDENTIFICATION
The entire data set obtained is used to train the ARIMA model. The data, as seen in Figure 1, are not stationary, and stationarity is a necessary condition for training the ARIMA model. The series becomes stationary at the first difference, d = 1, as shown in Figure 3.
The SACF and the SPACF are plotted from the stationary time series. The ARIMA p and q parameters were identified using the SACF and SPACF plots, which are shown in Figure 4 and Figure 5 respectively. In the SPACF, the partial autocorrelations spike at lag 1 and die off sharply at the other lags, hence p is estimated to be 1. The identified tentative model for the coconut price data is an ARIMA(1, 1, 0) with equation (10).

2) ARIMA MODEL PARAMETER ESTIMATION
The model parameters are estimated using the MATLAB Econometric Modeler [45]. Tentative models are assessed and compared using the AICs and BICs; for instance, the ARIMA models with and without the constant term are compared, and the models trained under the Gaussian and t distributions are compared. The results obtained by following the iterative procedure of ARIMA model estimation are given in Table 1 and Table 2.
The estimation is done with the Gaussian probability distribution and the constant term omitted to optimize the model. The parameters in Table 1 are substituted into the model, equation (11) which gives equation (12).
The MATLAB code for equation (12) is given in Appendix I. Appendix I is then applied in Appendix II to carry out the ARIMA forecast. Figure 6 and Figure 7 are the ARIMA Model Fit Plot and Residual Plot of the Average Coconut Price respectively.
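To make the estimation and forecasting steps concrete, the following Python sketch fits an ARIMA(1, 1, 0) model without a constant by least squares on the first differences and then forecasts recursively. Least squares is used here as a simple stand-in for the estimator in the MATLAB Econometric Modeler, so the coefficient will generally differ from Table 1; the function names are hypothetical.

```python
import numpy as np

def fit_arima_110(y):
    """Estimate phi in (1 - phi L)(1 - L) y_t = e_t by no-constant least squares."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                              # first difference, d = 1
    x, target = dy[:-1], dy[1:]                  # regress dy_t on dy_{t-1}
    phi = np.sum(x * target) / np.sum(x ** 2)
    residuals = target - phi * x                 # series passed on to the residual model
    return phi, residuals

def forecast_arima_110(y, phi, steps):
    """Recursive multi-step forecast: each predicted difference feeds the next."""
    y = np.asarray(y, dtype=float)
    last, last_diff = y[-1], y[-1] - y[-2]
    out = []
    for _ in range(steps):
        last_diff = phi * last_diff              # forecast of the next difference
        last = last + last_diff                  # integrate back to the price level
        out.append(last)
    return np.array(out)
```

Calling `forecast_arima_110(prices, phi, 12)` on a price array would give a 12-month path, and the `residuals` array is the kind of series a residual model would then be trained on.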

3) DIAGNOSTIC CHECKING
The fitted ARIMA model is assessed using the Residual Sample Autocorrelation Plot shown in Figure 8. There are spikes in the residual SACF, which indicate autocorrelation in the residual data; thus the ARIMA model alone is still not sufficient for the coconut price data, hence the need to model the residual data. The Nonlinear Autoregressive Neural Network (NARNET) is used to model the residual.

B. RESIDUAL NARNET MODELING
The NARNET modelling process involves three steps: (1) setting the input parameters for the NARNET training, (2) training the network and (3) deploying the neural network. The NARNET modelling process is discussed below.

1) SETTING INPUT PARAMETERS FOR NARNET TRAINING
The training process is carried out using the Neural Net Time Series application, which is part of the Machine Learning and Deep Learning applications cluster in MATLAB [46]. The residual time series is used as the only input for the NARNET, which requires a continuous feed of forecasted data to allow the network to continue working. The input of the neural network is the residual of the ARIMA model. The residual is retrieved using the code provided in Appendix III. The input data is 98 monthly data points; it is short by 1 month because of the first differencing at the ARIMA modelling stage. The delay time step is set at 2 months. 70% of the data is used for training the neural network, 15% is used to test the trained network and the other 15% is used to validate the network. The neural network architecture is set as per Figure 9 for a single-horizon forecast and Figure 10 for a multiple-horizon forecast. There are 90 hidden layers and one output layer with an output node. The 90 hidden layers are optimal and were arrived at through a continuous testing process.
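The input preparation described above can be sketched in Python as follows: the residual series is turned into lagged input vectors with a delay of 2, and the resulting samples are divided 70/15/15. The random assignment of samples to the three sets is an assumption of this sketch, as are the function names; it does not reproduce the MATLAB application.

```python
import numpy as np

def make_lagged(series, delays=2):
    """Build rows (y_{t-1}, ..., y_{t-delays}) and matching targets y_t."""
    s = np.asarray(series, dtype=float)
    inputs = np.column_stack([s[delays - k: len(s) - k]
                              for k in range(1, delays + 1)])
    targets = s[delays:]
    return inputs, targets

def split_indices(n, train=0.70, val=0.15, seed=0):
    """Randomly divide n sample indices into training, validation and test sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)
    n_train = int(round(train * n))
    n_val = int(round(val * n))
    return (order[:n_train],
            order[n_train:n_train + n_val],
            order[n_train + n_val:])
```

With 98 residual points and a delay of 2, `make_lagged` yields 96 samples, which `split_indices(96)` divides into 67 training, 14 validation and 15 test samples.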
The NARNET is first initialized using random weights at the start of the training process. Levenberg-Marquardt Back Propagation (LMBP), an iterative algorithm, is chosen to train the NARNET model. The LMBP algorithm locates the minimum of a function expressed as the sum of the squares of nonlinear functions through an iterative process. The number of training cycles (epochs) is set automatically by convergence at the minimum point of the function. The least MSE is used in the NARNET training to identify the best number of layers and associated neurons in each hidden layer [39].

2) TRAINING RESULT
After the NARNET architecture has been set in the workflow (Appendix IV) or in the training application window, the training command is executed by clicking the train button and waiting until it is done. The training outputs of the neural network have several parameters which are necessary for the neural network to be trained optimally.
The Progress box in Figure 11, the trained output, shows the error performance of the network, which is initialized at 4.06 × 10^7 MSE and stopped at 2.53 × 10^4 MSE. The training performance window in Figure 12 shows that overfitting and underfitting are avoided by training the network such that the training, testing, and validation performance graphs are parallel. The R of the neural network model, an indication of the linear relationship between the outputs and targets which measures the goodness of fit, is above 71% for the training set. Figure 13 shows the model summary, where the R values for the testing and validation sets are above 42%. Here there is a small compromise on R for the validation and testing sets. The training is repeated until R above 50% is achieved for the training set and for the testing and validation sets. The fitted model for the residual is presented in Figure 14. Figure 15 shows the errors associated with the neural network model. The sufficiency of the neural network is assessed using the autocorrelation of errors. There is no autocorrelation in the errors (Error1), as shown in Figure 16.
For the non-zero correlations of the neural network errors, the spikes do not die off sharply beyond the first lag. The same holds for the correlation between Input1 and Error1 (Target1 − Output1), as shown in Figure 17. There is no evidence of a correlation between the errors (Error1) and the input (ARIMA residual). The inference is that the NARNET model is now sufficient to model the residual component of the coconut price.

3) DEPLOYING THE NARNET
The NARNET model is deployed as a function whose input arguments are stored in the trained network structure in the MATLAB workspace; Appendix V is the function. The advantage of deploying a trained network in this way is that the network behaves as a deterministic function rather than as a stochastic model.

C. INTELLIGENT HYBRID ARIMA-NARNET MODELING
The hybrid model is deduced from equation (9). The ARIMA model can be expressed mathematically as equation (12), which is also given in MATLAB code in Appendix II. The NARNET model shown in Appendix IV, however, cannot easily be expressed in a single mathematical equation; instead, the hybrid model is expressed in code form, as presented in Appendix VI.

D. A 12-MONTH FORECAST USING THE MODELS
In Figure 18, the forecast of coconut price is plotted for three different models: ARIMA, NARNET and Intelligent Hybrid ARIMA-NARNET. Other features of the plot are the observed price time series and the 95% confidence bound of the Intelligent Hybrid ARIMA-NARNET model. There are some missing prices from March 2023-August 2023 in the observed price time series; however, these do not have any impact on the forecast, as the data used for the model span from January 2014-February 2023. The previous month, January 2023, is used to estimate the missing months' prices.

E. FORECAST EVALUATION
The MDM test showed a test statistic S or Sc of infinity for the first forecast horizon and NaN for the other forecast horizons (2nd to 12th). The NaN signifies that the test was not successful; this may be due to the models having nested properties, which results in singular matrices when calculating the S and Sc statistics. On the other hand, the infinity on the chi-square scale signifies that at least one of the models has superior predictive ability relative to the other models at the first forecast horizon. The DM test is therefore used to identify the model with the superior predictive ability. Here the assumption was that the models are 4th-order polynomials, as can be seen in Figure 1. By the nature of the DM statistic in this particular test, it may produce equivalent statistics at both ends of the normal distribution curve, as can be seen in Figure 19. The DM test statistics generally reduce as the forecast horizon increases. Rejection of the null hypothesis using the DM statistic implies that one or more of the alternative models have superior predictive ability. The models are characterized within a 95% confidence interval bound, which corresponds to a test statistic of 1.96 or below. The alternative model is expected to have superior predictive ability if the DM statistic is greater than 1.96, as shown in Figure 19. The DM test result for the comparison of the hybrid ARIMA-NARNET and ARIMA forecasts indicates superior predictive ability at the first month (horizon). The ARIMA and NARNET comparisons are not considered, as their results are inconsistent for both loss functions. Comparatively, the hybrid ARIMA-NARNET is better than ARIMA according to the forecast graph (Figure 18). The hybrid ARIMA-NARNET blends some nonlinear features, which are captured by the NARNET, with the ARIMA.

IV. CONCLUSION
The results show that the Hybrid ARIMA-NARNET model is better for forecasting agricultural commodity prices than both the ARIMA and NARNET models. This is because a single ARIMA model cannot capture all patterns for an optimal forecast; it captures mostly the linear patterns. Per the analysis above, the NARNET model is ideal for nonlinear time series. In this paper, a hybrid model of coconut price prediction is proposed. The forecast evaluation indicates that the hybrid ARIMA-NARNET model is the best at forecasting coconut prices, as it has the strongest predictive ability. Hybrid models can complementarily capture patterns of coconut price data and improve forecasting accuracy. The proposed hybrid forecasting model blends linear and nonlinear model features. The coconut price forecast model suggested in this study will help farmers, exporters, and the government to maximize profits in the future.

ABDULLAH received the bachelor's and master's degrees in computer science from Gadjah Mada University, Yogyakarta, Indonesia, and the Ph.D. degree in computer science from Universiti Utara Malaysia (UUM), in 2015. He is currently pursuing the bachelor's degree with the Information System Program, Engineering and Computer Science Faculty, Universitas Islam Indragiri, Indonesia. He has been with Universitas Islam Indragiri, since December 2020. He has authored and coauthored research work in several national and international conferences, and some national and international journals. His research interests include data mining, optimization algorithm, multimedia database, and pattern recognition and classification.
RICHARD MANU NANA YAW SARPONG-STREETOR received the Bachelor of Science degree in mathematics from KNUST, Ghana, and the Master of Science degree in applied sciences (mathematics) from Universiti Teknologi PETRONAS, Malaysia, where he is currently pursuing the Ph.D. degree in applied sciences (mathematics) with the Department of Fundamental and Applied Sciences. He was also a Student Accountant with ACCA Global, Ghana. His competencies include mathematical and statistical modeling and finance management. He is familiar with related data science software, such as MATLAB and Python. He has coauthored several published international journals and conference papers.
RAJALINGAM SOKKALINGAM received the M.Sc. degree in industrial technology from UKM and the Ph.D. degree in mathematics from UMS. He has a total of 25 years of working experience, with 12 years in manufacturing industries in the engineering field and another 13 years in the education industry as an academician. He has taught courses ranging from first-year bachelor's degree mathematics and statistics up to master's level risk, project and cost management. He has good experience in computer laboratory sessions and is very familiar with related mathematics and statistical software, such as Maple, SPSS, and expert design. Previously, he was with the Department of Electrical and Electronic Engineering, Curtin University, Malaysia, conducting teaching and research in mathematics and statistics. He is currently a Senior Lecturer with the Department of Fundamental and Applied Sciences, Universiti Teknologi PETRONAS. He has presented his research work in several national and international conferences, and published some results in national and international journals.
ABDUS SAMAD AZAD received the B.Sc. degree in computer science engineering from International Islamic University Chittagong, and the Master of Science (M.Sc.) degree from the Department of Fundamental and Applied Sciences, Universiti Teknologi PETRONAS, Seri Iskandar, Perak, Malaysia, where he is currently pursuing the Ph.D. degree. He has published a book chapter, three conference papers, one technical article, and a review article in the metaheuristic algorithms and machine learning field. He is also working in the domain of machine learning and has expertise in the research area.
GUNAWAN SYAHRANTAU received the master's degree in agribusiness management from Universitas Islam Riau, Indonesia, in 2012. He is currently a Lecturer with the Agribusiness Study Program, Faculty of Agriculture, Universitas Islam Indragiri, Indonesia. He has been active with Indonesian Agricultural Economic Association Organization, Komda Riau, for the last five years. He has authored and coauthored in several scientific journals. His latest research is ''The Role of Farmer Readiness in the Sustainable Palm Oil Industry.''
YUSRIWARTI received the bachelor's degree in accounting science from Universitas Bung Hatta, Indonesia, and the master's degree in management science from Universitas Riau, Indonesia, in 2011. She is currently a Lecturer with the Accounting Study Program, Faculty of Economics and Business, Universitas Islam Indragiri, Indonesia. She has also been active with Koppas Kasuma Cooperative for 24 years. She has also conducted several researches in the field of accounting and has authored and coauthored in several scientific journals. Her latest research is ''Analysis of Factors Affecting Implementation of Entity Financial Accounting Standards Without Public Accountability (SAK ETAP) in Middle Small Micro Businesses (UMKM) District in Indragiri District Region.''
ZAINAL ARIFIN received the bachelor's degree from the Management Study Program, and the master's degree from the Economics Studies Program. He is currently pursuing the Ph.D. degree with the Economics Study Program, Universitas Jambi, Indonesia. He is also a Lecturer with the Faculty of Economics and Business, Universitas Islam Indragiri, Indonesia, where he is also the Dean of the Faculty of Economics and Business. Previously, he was the Secretary of the Islamic Economics Study Program and the Chairperson of the Research and Community Service Institute, Universitas Islam Indragiri.