Product Demand Forecasting in Ecommerce Based on Nonlinear Autoregressive Neural Network

With the rapid growth of e-commerce, meeting customers' demand for efficient order processing requires an order management mechanism that responds quickly by accurately predicting product demand. This study used real e-commerce order demand data and, after preprocessing that included downsampling and data set partitioning, established a nonlinear autoregressive (NAR) neural network model to forecast product demand over the next 13 weeks. Compared with the Prophet time series prediction framework, NAR had better generalization ability, and the prediction time was reduced by 18.54%. Finally, we summarized the characteristics of the two methods and gave guidance on applying our model in practice. Once deployed in actual demand management, the trained artificial neural network provides a scientific reference for data-driven e-commerce decision-making and offers an advantage over competitors by supporting the rational allocation of resources.


Peijian Wu
School of Business and Administration, Anhui University of Finance and Economics, Bengbu 233030, China
Tel.: 18712462020
Email: 120100001@aufe.edu.cn

Yulu Chen
School of Business and Administration, Anhui University of Finance and Economics, Bengbu 233030, China

Introduction
With the arrival of e-commerce and big data, powerful business flows have promoted the vigorous development of the logistics business. More and more enterprises are beginning to make full use of historical operational data and to focus on predicting future e-commerce orders (Song et al., 2016). Forecasting future demand and orders lets enterprises respond quickly to customer needs and cope with a changing market environment, and the resulting data is an essential basis for their future development planning (Chen et al., 2019). E-commerce goods span rich categories, which challenges enterprises' warehousing space management, and passive order management cannot meet customers' need for fast response. When enterprises accurately predict the order demand of the e-commerce warehouse, they can arrange the relevant resources beforehand and plan commodity locations properly. Effective demand prediction is therefore of great significance to the management of the e-commerce warehouse (Zhang et al., 2020).
Many scholars have made comparative studies and case studies of product demand prediction methods. Mezzogori and Zammori (2019) discussed applying a deep learning architecture to the demand prediction of fashion products. The first layer of the framework predicts the total order of a specific customer, and the second layer predicts the demand for a given product based on real-time sales data. In an analysis of a decade of sales, this forecasting method showed advantages over fashion companies' existing marketing strategies. Huber and Stuckenschmidt (2020) focused on the order demand of bakery chain stores on special calendar dates, transformed the prediction problem into a supervised machine learning problem, and evaluated methods such as artificial neural networks and gradient boosted trees. They concluded that the machine learning models had superior performance and that the classification-based prediction method was superior to the regression method. Lee et al. (2012) studied ways to predict the demand for products newly introduced to the market, conducting consumer surveys to estimate product trends and combining them with Bayes' rule to forecast demand. They used 23 quarters of data from South Korea's broadband Internet services market for empirical research and found that the proposed model outperformed the benchmark models. Xu and Chan (2019) used big data and machine learning methods to forecast medical equipment demand, collected the required data from search engines and companies, and established a univariate equipment demand prediction method. Introducing big data into the prediction model improved its accuracy. Tarallo et al. (2019) presented an exploratory study of machine learning methods for the sales demand prediction of perishable products with a short shelf life. The results showed that machine learning methods exceeded the accuracy of traditional statistical techniques, and that demand prediction for fast-moving consumer goods could improve the inventory balance in the supply chain and increase enterprises' profits.
As consumers purchase items on e-commerce platforms, they leave a large number of comments. Since the comments contain their purchase intentions, these reviews have a meaningful impact on product sales. Yuan et al. (2017) obtained many product reviews from social networks, mined consumers' emotions towards products, and used sentiment analysis together with other quantitative features to predict product sales demand in the next period. A case study showed that demand prediction combined with sentiment analysis was more accurate than prediction from quantitative features alone. Shih and Lin (2019) combined LSTM and consumer sentiment analysis to forecast short-term sales and adjusted the emotional evaluation weights to further improve prediction accuracy. The proposed method forecast short-term commodity sales with satisfactory accuracy while using the least amount of transaction data. Fan et al. (2017) used online reviews and historical sales data to forecast product demand. A naive Bayes algorithm extracted a sentiment index from the reviews, and the index was integrated into the Bass model's imitation coefficient to improve the prediction accuracy. Compared with the standard Bass model, the method combined with sentiment analysis performed better.
The models used in the existing research are relatively complex. Applying such machine learning methods to the time series data in this paper would involve a complex feature construction process. The time series data must be split into multiple time windows of a fixed length, and features must then be constructed for each window. Furthermore, selecting among different machine learning methods requires repeated attempts before a prediction method suited to the patterns in the data is found. The prediction methods combined with consumer sentiment analysis need to obtain text through web crawling and carry out extensive data cleaning and natural language processing. The procedure of data acquisition and preprocessing is therefore relatively complicated.
Many adjustable parameters give an artificial neural network room for improvement, and the neural network has strong fault tolerance and robustness to noisy data. Therefore, this article selected the NAR neural network to establish a time series forecasting model, and we searched for the most suitable number of hidden layer neurons by experiment. Our data source for empirical research comes from Kaggle and consists of real-life warehouse operating records. After the necessary preprocessing, we fed the data into the model to test its accuracy. Finally, we compared our model with Prophet, a time series prediction framework, on accuracy and time.

NAR neural network
The artificial neural network is a network that simulates the human brain's nervous system to realize specific functions. It is built on the connection structure between the brain's neurons (Kumar et al., 2020). Its many artificial neurons, which mimic the synaptic connections in a biological system, give artificial neural networks advantages over general mathematical models. All neurons participate in the system's information processing, and the final output is obtained through interaction and mutual feedback between neurons. This characteristic makes the neural network model robust (Wu et al., 2020). At the same time, local errors in the network only reduce its adaptability and do not cause significant errors in its output.
The autoregressive model on which NAR builds uses the series itself as the regression variable and describes the random variable at a particular time as a linear combination of its values at several earlier times in the observation period. In this model's basic structure, e(n) represents the white noise in the data collection process, and the observed value y(n + 1) at a specific moment is correlated with the previous value y(n).
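The linear autoregressive structure described above is commonly written as below. This is a sketch under assumptions: the order $p$ and the coefficients $a_i$ are notation introduced here for illustration and are not taken from the original text, which only names $y(n)$, $y(n+1)$, and $e(n)$.

```latex
y(n+1) = \sum_{i=0}^{p-1} a_i \, y(n-i) + e(n)
```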
The NAR neural network model adopted in this paper can be expressed as the following formula, where y(t) indicates the value of y at time t, d represents the time delay, and f signifies the conversion function.
$$y(t) = f\big(y(t-1), y(t-2), \ldots, y(t-d)\big)$$
The NAR neural network learns potential patterns from the data fed into the network and minimizes the difference between the final output and the actual record through a continuous iterative fitting process. The neural network's feedback mechanism propagates the error during iteration until the optimal prediction accuracy is achieved.
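To make the role of the time delay d concrete, the following minimal sketch shows how a series could be arranged into input windows and targets for a NAR model. The function name `make_lagged_samples` and the sample values are hypothetical and only illustrate the idea.

```python
def make_lagged_samples(series, d):
    """Turn a 1-D series into (inputs, targets) pairs for a NAR model.

    Each input holds the d previous values [y(t-d), ..., y(t-1)] and the
    matching target is y(t).
    """
    if d < 1 or d >= len(series):
        raise ValueError("d must satisfy 1 <= d < len(series)")
    inputs, targets = [], []
    for t in range(d, len(series)):
        inputs.append(list(series[t - d:t]))  # the d values before time t
        targets.append(series[t])             # the value to be predicted
    return inputs, targets


weekly_demand = [5.0, 6.0, 7.0, 8.0, 9.0]  # made-up values for illustration
X, y = make_lagged_samples(weekly_demand, d=2)
print(X)  # [[5.0, 6.0], [6.0, 7.0], [7.0, 8.0]]
print(y)  # [7.0, 8.0, 9.0]
```

A larger d gives the network more history per sample but leaves fewer samples for training.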

Model construction
The neural network looks for rules by fitting and training on task-related data and continually updates the model to reach a reasonable result. The ultimate purpose is for the model to maintain a good prediction effect when deployed in a real environment. Partitioning the data set helps improve the model's generalization ability, and a model with higher generalization ability has a lower error when predicting future data. This paper divided the data set into three parts: a training set, a validation set, and a test set. The training set is used to train the model, and the validation set is then used to limit overfitting during training. The error on the test set represents the model's generalization error when predicting in a real scenario.
To improve the accuracy of the prediction results and accelerate convergence, and considering the interval limits that the neural network's activation function places on the output, it is important to normalize the data (Han and Wang, 2020). This paper used min-max normalization, and the formula is as follows, where $x$ is the data to be normalized, $x_{\min}$ and $x_{\max}$ are the minimum and maximum values in the data sequence, respectively, and $x'$ is the data sequence after normalization.
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
Inverse normalization is also needed to bring the final prediction data back to the real scale. Fig. 2 below shows how the activation function selected for the hidden layer performs its mapping.
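The min-max normalization and its inverse described above can be sketched as follows. The helper names and the sample values are hypothetical; the sketch assumes the minimum and maximum are taken from the sequence being scaled.

```python
def min_max_fit(values):
    """Return (minimum, maximum) of the sequence used for scaling."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("a constant series cannot be min-max scaled")
    return lo, hi


def min_max_normalize(values, lo, hi):
    """Map each value to (x - x_min) / (x_max - x_min)."""
    return [(v - lo) / (hi - lo) for v in values]


def min_max_inverse(scaled, lo, hi):
    """Undo the scaling so predictions return to the original level."""
    return [s * (hi - lo) + lo for s in scaled]


demand = [2.0, 4.0, 6.0, 10.0]  # made-up values for illustration
lo, hi = min_max_fit(demand)
scaled = min_max_normalize(demand, lo, hi)
restored = min_max_inverse(scaled, lo, hi)
print(scaled)    # [0.0, 0.25, 0.5, 1.0]
print(restored)  # [2.0, 4.0, 6.0, 10.0]
```

Keeping `lo` and `hi` from the fitting step is what makes the inverse step possible on later predictions.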

Fig. 2 Tansig function
Inside the network, $x_t$ is the network's data input. The hidden layer obtains the output $N_j$ of neuron $j$ from this input, the connection weights $w_{tj}$, the thresholds $b_j$, and the activation function $f$ between the neurons in the layer.
When the data signal reaches the output layer, the network applies a linear operation to the hidden layer results $N_j$, and the layer's linear function produces the final output. Here $w_j$ signifies the connection weight between the $j$-th neuron in the hidden layer and the neuron in the output layer, and $p$ represents the neuron threshold in the output layer.
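The hidden-layer and output-layer computations described above can be sketched as a single forward pass. This is an illustration under assumptions: the thresholds are added as bias terms (the sign convention is not stated in the text), the hidden activation is the hyperbolic tangent sigmoid, and all names and numbers are hypothetical.

```python
import math


def tansig(n):
    # Hyperbolic tangent sigmoid; mathematically equal to 2 / (1 + exp(-2n)) - 1.
    return math.tanh(n)


def nar_forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """One forward pass: tansig hidden layer followed by a linear output neuron.

    hidden_weights[j][t] plays the role of w_tj, hidden_biases[j] of b_j,
    output_weights[j] of w_j, and output_bias of p (assumed to be added).
    """
    hidden = []
    for w_j, b_j in zip(hidden_weights, hidden_biases):
        activation = sum(w * x_t for w, x_t in zip(w_j, x)) + b_j
        hidden.append(tansig(activation))  # N_j
    return sum(w * n_j for w, n_j in zip(output_weights, hidden)) + output_bias


# With zero hidden weights every N_j is tansig(0) = 0, so the output equals p.
out = nar_forward([0.1, 0.2], [[0.0, 0.0]], [0.0], [1.0], 0.3)
print(out)  # 0.3
```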
Since the prediction model's final predictive variable comprises one output variable, we set the number of nodes in the output layer to one. Generally speaking, a network containing many middle layers will not obtain the expected result. An excessively complex middle-layer setting often magnifies the noise in the model's training process, eventually leading to overfitting and reducing the model's generalization ability in real applications. Conversely, too few intermediate layers cause underfitting, and the final prediction cannot meet the accuracy requirements.
The training requirements can already be satisfied with one intermediate layer, and appropriately increasing the number of intermediate layer nodes improves the network's accuracy (Bandyopadhyay and Chattopadhyay, 2007). This paper used one intermediate layer and further optimized the network structure by adjusting the number of neurons. When the network could not meet the requirements, the number of hidden layers was adjusted, and we found the best number of neuron nodes in the hidden layer by experiment.
We trained the neural network by offline learning. During training, the network processes samples in batches: after it has received all the training samples, the network weights and thresholds are updated at once. The available training algorithms include the Levenberg-Marquardt (LM) algorithm, Bayesian regularization, and resilient backpropagation. Each algorithm has its own characteristics and brings different results to the model. Because the LM algorithm trains fast and yields small errors (Azar, 2013), we chose it as the training algorithm.

Model evaluation index
In the context of predicting the time series data of goods, in order to objectively evaluate the prediction effect and performance of the model in each stage, this paper uses three evaluation indexes, namely error autocorrelation coefficient, mean square error and root mean square error, to monitor and evaluate the model.
Assuming that the calculated error series is stationary, the error autocorrelation coefficient signifies the autocorrelation of the errors between the model's predicted values and the actual values at the specified confidence level. In statistics, its definition is as follows, where $y_t$ here denotes the error at time $t$, $\bar{y}$ its mean, $T$ the length of the time series, and $k$ the lag.
$$r_k = \frac{\sum_{t=k+1}^{T} (y_t - \bar{y})(y_{t-k} - \bar{y})}{\sum_{t=1}^{T} (y_t - \bar{y})^2}$$
The root mean square error (RMSE) is defined as follows, where $\hat{y}_i$ represents the predicted value of the model, $y_i$ represents the real value, and $n$ is the number of predictions.
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2}$$
The squared errors are summed and averaged, and the square root of the result is taken. The smaller the value is, the higher the accuracy of the model's prediction, and vice versa.
The mean square error (MSE) mainly evaluates the prediction effect on the neural network's validation set, and it is one of the settings of the neural network's loss function. During iterative training, the network continuously seeks to minimize the loss value and stops iterating when it finds the optimal prediction effect. The mean square error is the square of the RMSE, and the formula is as follows, where $\hat{y}_i$ is the predicted value, $y_i$ is the real value, and $n$ is the number of predictions.
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2$$
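The three evaluation indexes can be sketched in plain Python as follows. The function names and the sample numbers are hypothetical, and the autocorrelation function assumes the error series is not constant, since the denominator would otherwise be zero.

```python
import math


def mse(actual, predicted):
    """Mean of the squared differences between predictions and real values."""
    return sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Square root of the mean square error."""
    return math.sqrt(mse(actual, predicted))


def error_autocorrelation(errors, k):
    """Autocorrelation of the error series at lag k (k = 0 gives 1)."""
    T = len(errors)
    mean = sum(errors) / T
    variance_sum = sum((e - mean) ** 2 for e in errors)
    covariance_sum = sum(
        (errors[t] - mean) * (errors[t - k] - mean) for t in range(k, T)
    )
    return covariance_sum / variance_sum


actual = [1.0, 2.0, 3.0]      # made-up values for illustration
predicted = [1.0, 2.0, 5.0]
errors = [1.0, -1.0, 1.0, -1.0]
print(mse(actual, predicted), rmse(actual, predicted))
print(error_autocorrelation(errors, 1))
```

An alternating error series like the one above gives a strongly negative lag-1 coefficient, the kind of pattern that a well-fitted model should not leave in its errors.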

Empirical study and result analysis

Data selection and preprocessing
The data we used came from Kaggle, a well-known data science competition platform. The data set contains the historical product demand of a manufacturing company with a global footprint, covering 33 product categories and 2172 SKUs distributed across 4 warehouses, which is consistent with the rich categories and extensive sources of e-commerce goods. We preprocessed the data, sorted out the time series demand data of each product, and conducted an example discussion based on the time series data.
The data set contains gaps in the product demand data due to manual statistical errors and records that were not made on time. We retained the original data with relatively complete records and deleted the goods with many missing values. From the goods with relatively complete records, one was taken as an example to model and predict order demand. The order modeling process for the other goods was the same as for the selected one.
Data resampling converts the original frequency of a data sequence to another rate. Downsampling converts data records from a high frequency to a low frequency (Shekhawat and Meinsma, 2015), which involves a data aggregation operation. Ideally, product demand records are updated daily, but in actual warehouse operation staff usually do not record an item on days when it has zero demand. In data preprocessing, the unrecorded data make it harder to establish the model, so we used downsampling to convert the data to weekly records. As can be seen from Fig. 3, the original data set was relatively dense. After resampling, as Fig. 4 demonstrates, the demand data is aggregated by week, which virtually eliminates the influence of the values that are missing when the demand for goods is zero. Preparing the data and exploring a suitable model is also more feasible with smaller data sets, and a week is a complete cycle for the warehousing operation itself.
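The weekly downsampling described above can be sketched with the standard library alone. This is an illustration under assumptions: weeks are taken to start on Monday, weeks without any record are treated as zero demand, and the function name and sample records are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def downsample_to_weekly(daily_records):
    """Sum (date, demand) records into weekly totals keyed by each week's Monday."""
    totals = defaultdict(float)
    for day, demand in daily_records:
        week_start = day - timedelta(days=day.weekday())  # Monday of that week
        totals[week_start] += demand
    if not totals:
        return {}
    # Assumption: a week with no record at all had zero demand.
    week, last = min(totals), max(totals)
    weekly = {}
    while week <= last:
        weekly[week] = totals.get(week, 0.0)
        week += timedelta(weeks=1)
    return weekly


records = [  # made-up records; days with zero demand are simply absent
    (date(2021, 1, 4), 10.0),
    (date(2021, 1, 6), 5.0),
    (date(2021, 1, 19), 7.0),
]
weekly = downsample_to_weekly(records)
for week_start, total in weekly.items():
    print(week_start, total)
```

Summing within each week means an unrecorded zero-demand day no longer appears as a gap in the series.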
Fig. 4 Resampled product data distribution
Original order demand data contains noise caused by unreasonable recording and measurement methods and by the influence of abrupt factors (Ma et al., 2018). This noise makes the model difficult to fit and raises the cost of fitting. In this paper, we took the logarithm of the original data. The formula is as follows, where $ts'$ represents the data after the logarithm is taken and $ts$ is the raw data.
$$ts' = \log(ts)$$
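The logarithmic transformation and its inverse can be sketched as below. The base of the logarithm is not stated in the text, so the natural logarithm is an assumption here, and the sketch requires strictly positive demand values; the function names and numbers are hypothetical.

```python
import math


def log_transform(series):
    """Take the natural logarithm of each (strictly positive) value."""
    return [math.log(v) for v in series]


def inverse_log_transform(logged):
    """Map logged values back to the original demand level."""
    return [math.exp(v) for v in logged]


raw = [1.0, 120.0, 4500.0]  # made-up weekly demand values
logged = log_transform(raw)
recovered = inverse_log_transform(logged)
print(logged)
print(recovered)
```

Predictions made on the logged series need the inverse step before they can be read as demand quantities.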
Fig. 5 shows that the data after logarithmic transformation concentrates within a specific range and fluctuates smoothly, while the original data fluctuates considerably. To train, validate, and test the model's prediction accuracy, this paper divided the product's data into a training set, a validation set, and a test set in the ratio 70%:25%:5%. As shown in Fig. 6, the blue line represents the training set, with a total of 183 samples; the orange line represents the validation set, containing 65 samples; and the green line in the later period indicates the test set used to measure the accuracy of the model, with 13 records of the item's demand.
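A split in time order with the stated ratios can be sketched as follows. The total of 261 weekly samples is derived from the counts above (183 + 65 + 13); the function name and the use of rounding are assumptions made for this illustration.

```python
def chronological_split(series, train_ratio=0.70, validation_ratio=0.25):
    """Split a series in time order; the remainder after train and validation is the test set."""
    n = len(series)
    n_train = round(n * train_ratio)
    n_validation = round(n * validation_ratio)
    train = series[:n_train]
    validation = series[n_train:n_train + n_validation]
    test = series[n_train + n_validation:]
    return train, validation, test


weeks = list(range(261))  # placeholder values standing in for weekly demand
train, validation, test = chronological_split(weeks)
print(len(train), len(validation), len(test))  # 183 65 13
```

Keeping the most recent weeks for testing mirrors the forecasting task, where the model must predict periods that come after everything it has seen.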

Parameter optimization
In designing a neural network, too few neurons in the hidden layers cause underfitting. Conversely, when too many neurons give the network excessive information processing capacity, the limited information in the training set cannot meet the training needs of all the hidden layer neurons, which leads to overfitting. Even if the training data contains enough information, too many neurons increase the model's training time and make it difficult to achieve the desired effect (Adil et al., 2020). Choosing an appropriate number of hidden layer neurons therefore plays an important role in establishing a robust neural network.
This paper started with five neurons and gradually increased the number. For each setting, the model was run ten times, and the prediction performance for that number of neurons was the average RMSE over those runs. The number of neurons with the lowest test-set RMSE was taken as the optimal value. We visualized the experimental results in Fig. 7. As the number of neurons in the hidden layer gradually increases, the model performs well on the training set, but the RMSE on the test set generally increases; that is, overfitting appears. The ultimate goal of neural network training is to ensure that the model maintains good generalization ability in actual predictions, yet as the number of hidden layer neurons increases, the model's generalization ability generally decreases. Therefore, for the data used in this paper, the optimal number of hidden layer neurons is five.
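The search procedure above can be sketched as a small loop. The callable `train_and_score` is a hypothetical stand-in for training the network once and returning its test RMSE, and the candidate counts (steps of five) and the fake scorer are assumptions used only to make the sketch runnable.

```python
import statistics


def select_hidden_neurons(candidates, train_and_score, runs=10):
    """Average the RMSE over several runs per neuron count and pick the lowest."""
    mean_rmse = {}
    for count in candidates:
        scores = [train_and_score(count, run) for run in range(runs)]
        mean_rmse[count] = statistics.mean(scores)
    best = min(mean_rmse, key=mean_rmse.get)
    return best, mean_rmse


def fake_train_and_score(count, run):
    # Stand-in for real training: pretends test RMSE grows with network size.
    return 0.5 + 0.01 * count + 0.001 * run


best, table = select_hidden_neurons(range(5, 31, 5), fake_train_and_score)
print(best)  # 5
for count, score in table.items():
    print(count, round(score, 4))
```

Averaging over repeated runs reduces the influence of random weight initialization on the comparison.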

Predicted results
We set the number of neurons in the hidden layer to five and the number in the output layer to one. The LM algorithm carried out the optimization iterations inside the network, and the sample data was fed into the neural network with the preset parameters to fit and predict e-commerce product demand over the next 13 weeks. As depicted in Fig. 8, the error on the validation set reached its minimum at the end of the tenth epoch. The training results indicate that the model reflects the time correlation between the values of the variable and is capable of fitting the historical data effectively. The overall prediction was close to the actual recorded values, and the root mean square error of this training was 0.5865.
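The text does not state how the 13-week forecast was rolled forward. One common approach for autoregressive networks is closed-loop (recursive) prediction, in which each forecast is fed back as an input for the next step; the sketch below illustrates that approach under this assumption. The function name and the averaging stand-in for the trained network are hypothetical.

```python
def forecast_closed_loop(predict_next, history, d, steps=13):
    """Forecast `steps` values by feeding each prediction back into the input window."""
    window = list(history[-d:])  # the d most recent observations
    forecasts = []
    for _ in range(steps):
        y_next = predict_next(window)
        forecasts.append(y_next)
        window = window[1:] + [y_next]  # slide the window forward by one step
    return forecasts


def mean_of_window(window):
    # Stand-in for a trained network: predicts the mean of the input window.
    return sum(window) / len(window)


preview = forecast_closed_loop(mean_of_window, [1.0, 2.0, 3.0, 4.0], d=2, steps=3)
print(preview)  # [3.5, 3.75, 3.625]
```

Because later steps depend on earlier predictions, errors can accumulate over the horizon in this mode.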
The prediction error is the difference between the observed value and the predicted value, and the error autocorrelation coefficient measures how these errors correlate across time. Observing the error autocorrelation results shows whether the model fully captures the expected trend, seasonality, and randomness. The degree of error correlation therefore gives an idea of the model's prediction performance.
Ideally, the model's error at one time node does not affect the prediction errors at other time points, and the neural network extends the prediction process only according to the law followed by the time series itself. If the model's overall performance is unsatisfactory because of one unreasonable prediction, the model needs adjustment. It can be seen from Fig. 9 that the correlation of prediction errors among the model's prediction points was nonlinear, and the degree of correlation was small. This characteristic

NAR vs. Prophet
Prophet uses a decomposable time series model consisting of trend, seasonality, and holidays. The formula can be expressed as follows, where $g(t)$ is the trend function, representing non-periodic change, $s(t)$ indicates periodic change, $h(t)$ represents the influence of holidays on the target data, and the error term $\varepsilon_t$ signifies the particular factors that the model cannot predict. We assume that this error term follows a normal distribution (Taylor and Letham, 2018).
$$y(t) = g(t) + s(t) + h(t) + \varepsilon_t$$
We optimized Prophet's hyperparameters using cross-validation and searched for the parameter combination giving the best prediction results. We set the growth trend to the logistic model and specified the maximum historical data value as the capacity limit, so that the predicted demand tends to saturate when approaching this value. As with the data processing for the NAR model, the relatively stable downsampled data after logarithmic processing was fed into the Prophet model, and the model obtained its final effect after training and fitting.
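The saturating behaviour of a logistic growth trend can be illustrated with its basic form, $g(t) = C / (1 + e^{-k(t - m)})$, where $C$ is the capacity, $k$ the growth rate, and $m$ an offset. This is a sketch of the basic curve only: Prophet's own trend extends it with changepoints, and the parameter values below are made up for illustration.

```python
import math


def logistic_trend(t, cap, k, m):
    """Basic logistic growth curve that saturates at the capacity `cap`."""
    return cap / (1.0 + math.exp(-k * (t - m)))


cap = 10.0  # hypothetical capacity, e.g. the maximum historical value
midpoint = logistic_trend(0.0, cap, k=1.0, m=0.0)
late = logistic_trend(50.0, cap, k=1.0, m=0.0)
print(midpoint)  # 5.0, half of the capacity at t = m
print(late)      # very close to the capacity of 10.0
```

Setting the capacity to the maximum historical value means the trend component alone cannot rise above anything already observed.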

Conclusion
Taking the demand forecast for one product's time series data as an example, we used the NAR model to construct an accurate prediction of e-commerce product demand. Comparing it with Prophet, a time series prediction framework, we find that the nonlinear autoregressive neural network uses less time and has higher accuracy. The main conclusions of this study are as follows. First, it is vital to choose the appropriate model according to the research background. The NAR model has many parameters that need to be adjusted, such as the network topology and the initial values of weights and thresholds, which leaves ample room for custom parameter settings (Melin et al., 2020). Prophet is a time series prediction framework released by Facebook that is capable of dealing with outliers in the time series and coping with partially missing values, predicting the future trend almost automatically. As shown in the above results, the Prophet framework performed poorly on the data used in this paper, while the NAR neural network achieved a satisfactory prediction effect. The reason is that the Prophet model was not suitable for the data scenario in this paper.
Second, an accurate product demand forecast model is of great significance to the development of enterprises. The contribution of this paper is that, through accurate prediction of future product demand, the warehouse management department can foresee the upcoming freight demand in the next quarter and allocate the appropriate warehouse equipment and human resources beforehand, making rational use of resources and supporting a green and sustainable development strategy. Traditional warehousing management is too passive to achieve the same results. Applying the forecasting model to the real working environment can make the warehousing operation proactive and promote its lean management (Poll et al., 2018).

Advice and suggestions
As the models and methods used in this study are integrated into enterprise warehouse management practice, our model completes the task of future product demand prediction through several steps of data analysis and processing, keeps pace with product changes updated in real time, and finally achieves a real-time prediction effect. Enterprises need to adjust and optimize resources dynamically according to the forecasting results to reduce costs and improve operating performance.
For goods in high demand during the forecast period, the e-commerce warehouse department should optimize the storage space by applying the ABC classification rule and place these goods where they are easy to pick, thus significantly reducing the time and cost of picking (Li et al., 2016). Unreasonable management of warehoused goods has caused losses for warehousing departments. More detailed data recording and accurate demand forecasting reduce costs and increase efficiency for the warehousing department. The past mode of warehouse space management, in which e-commerce goods were arranged according to the subjective experience of warehouse personnel, is no longer suitable for the needs of today's customers.
The logistics warehousing business must value historical operation data, which is of great significance for its future management and decision-making (Li et al., 2018). Companies need to standardize the data recording process, because high-quality and more granular data usually make the forecasting model more robust, while demand forecasting from unreasonably recorded data is full of challenges. They also need to set standards for data collection and make the most of the data. Scientific, data-driven warehouse management decisions win a unique competitive advantage for enterprises and enable them to achieve sustainable management.

Prospect of improvements
In the future, there is still much room for improvement in e-commerce product demand prediction.
* First, the model's performance at different data scales needs to be tested, and its parameters tuned for small, moderate, and large data sets.
* Second, different e-commerce goods are connected in various ways. Considering only a single product's development over time is not comprehensive enough, so one direction of optimization is to add correlation analysis of multiple products to the prediction process and then use a multivariate time series prediction model for demand prediction (Nguyen et al., 2020).
* Third, more demand prediction models should be introduced and their prediction effects compared, so that the model's direction of improvement can be summarized.
* Finally, the evaluation of forecast results can be improved into a multi-index fusion scheme, weighting each index according to specific business needs and combining the weighted indexes (Banerjee et al., 2017).

Funding
This research was supported by Anhui Social Science Planning General Projects (NO.AHSKY2020D09) and Anhui Virtual Simulation Experimental Teaching Project (NO.2019xfxm44).

Declaration of no conflict of interest
The authors declare that there are no known financial interests or personal relationships that will affect the research work described in this paper.

References
Da Chun Wu, Babak Bahrami Asl, Ali Razban, and Jie Chen. Air compressor load forecasting using artificial neural network. Expert Systems with Applications, 2020.
Shuojiang Xu and Hing Kai Chan. Forecasting medical device demand with online search queries: A big data and machine learning approach. 2019.
Hui Yuan, Wei Xu, Qian Li, and Raymond Lau. Topic sentiment mining for sales performance prediction in e-commerce. Annals of Operations Research, 2017.
Bo Zhang, Runhua Tan, and Cheng Jian Lin. Forecasting of e-commerce transaction volume using a hybrid of extreme learning machine and improved moth-flame optimization algorithm. Applied Intelligence, pages 1-14, 2020.

Fig. 1 NAR neural network structure
In the training process of NAR, the selection of the activation function directly affects the final prediction performance of the model, so it is necessary to select an appropriate activation function according to the requirements of the data and the network (Sato and Hikawa, 1999). In the model's hidden layer, the sigmoid function compresses the original data into a specified range. The linear activation function amplifies the data information in the output layer, and the network returns the mapped value as the model's final output. This paper selected the tansig function in the hidden layer. Its mathematical formula is as follows, which converts the input data sequence to [−1, 1].
$$\mathrm{tansig}(n) = \frac{2}{1 + e^{-2n}} - 1$$

In the error autocorrelation coefficient, $\sum_{t=k+1}^{T} (y_t - \bar{y})(y_{t-k} - \bar{y})$ is the covariance between the error series at time $t$ and the error series at time $t-k$, and $\sum_{t=1}^{T} (y_t - \bar{y})^2$ indicates the variance of the error series.

Fig. 3 Distribution diagram of original demand data

Fig. 5 The original data and logarithmic results

Fig. 6 Diagram of data set partition

Fig. 7 RMSE under different numbers of neurons in one hidden layer

Fig. 8 Iterative variation diagram of MSE evaluation index

Fig. 9 Analysis diagram of the error autocorrelation coefficient
Fig. 10 demonstrates Prophet's prediction result. A red vertical dotted line separates the predicted values from the training data set. The black dots in the figure represent the original discrete points of the time series, and the dark blue line represents the fitted values. The light blue area indicates the upper and lower bounds of Prophet's predicted values. The Prophet model overfitted during training, and the RMSE of the predicted values was 0.7305.