Application of Neural Networks to Explore Manufacturing Sales Prediction

Abstract: Manufacturing sales prediction is an important measure of national economic development trends. The plastic injection molding machine industry has its own independent R&D capability and mass production technology, with all products sold globally through international brands. However, most previous injection molding machine studies have focused on R&D, production processes, and maintenance, with little consideration of sales activity. With the development and transformation of Industry 4.0 and the impact of the global economy, Taiwan's injection molding machine industry growth rate has gradually flattened or even declined, with company sales and profits falling below expectations. Therefore, this study collected key indicators for Taiwan's export economy from 2008 to 2017 to help understand the impact of economic indicators on injection molding sales. We collected 35 indicators, including the net entry rate of employees into manufacturing industries, trend indices, manufacturing industry sales volume indices, and customs export values. We used correlation analysis to select variables affecting plastic injection machine sales, and artificial neural networks (ANNs) were applied to predict injection molding machine sales at each level. Prediction results were verified against the correlation indicators, and seven key external economic factors were identified that accurately predict changes in company annual sales, which will be helpful for effective resource and risk management.


Introduction
Production and sales volumes for Taiwan's plastic and rubber machinery in 2017 were $NTD 43.1 B, representing 6.4% growth over 2016. Total export value was $USD 1.163 B, a 12.8% increase over 2016, with corresponding imports of $USD 0.387 B, an 8.7% increase over 2016. However, 2018 sales volumes were only $NTD 7.6 B, an 8.5% reduction compared with the previous year's $NTD 8.3 B, for 4,528 units. The largest export country was China, followed by Vietnam ($NTD 1.34 B, 21.4%) and India ($NTD 0.721 B, 11.4%) [1]. Market changes in recent years have been somewhat volatile since plastic and rubber machinery makers are traditional resource-dense industries, and manufacturers sought to reduce manufacturing costs by gradually transferring production bases from China to Southeast Asia [2]. Industry 4.0 has rapidly increased the demand for high-technology precision machinery and automated equipment, which forced transformations onto the plastic and rubber manufacturing industries. It was hoped that the large number of plastic and rubber machinery manufacturers in Taiwan (more than 300 in 2018) could be gradually integrated to help markets become more competitive. Product classification, reconciled functional differences, quality improvements, and market segmentation helped reduce friction and prices, but a consensus was, and still is, required for manufacturers to avoid using price bargaining as a competitive market strategy [1].
The objectives of this study were to understand external economic variables' impacts on prediction models, and to establish a practical and feasible sales prediction framework for plastic injection machine markets to provide assistance for business management decisions.
The remainder of this article is organized as follows. Section 2 reviews the relevant literature, and Section 3 introduces the assessment methodology employed. Section 4 verifies the proposed methodology using an example with practical variables affecting plastic injection machine sales data. Section 5 summarizes and concludes the paper, and discusses some useful directions for future research.

Literature Review
Prediction is an important aspect of data exploration based on various statistical methods to identify useful trends or models from historical data, which can then be used to obtain predictions for the next period or cycle [4]. Thus, predictions anticipate future conditions, focusing on practical applications and integration, and generally incorporate mathematical models based on historical data to provide subjective or intuitive future expectations, or a synthesis of these, i.e., a mathematical model incorporating expert judgment and adjustment.
Every artificial neural network (ANN, NN) model can be classified in terms of its architecture and learning algorithm. The architecture (or topology) describes the neural connections, and the learning (or training) algorithm provides information on how the ANN adapts its weights for every training vector [5]. ANNs have become widely used for many commercial applications [6], but only a few previous studies have provided an overview of published research results in this important and popular realm. Most previous studies have focused on financial distress and bankruptcy issues, stock price prediction, and decision support, with particular emphasis on classification tasks [7]. ANNs have been widely used for commercial prediction, with traditional multilayer feed-forward networks with gradient descent back-propagation and various hybrid networks developed to improve standard model performances [8].

Sales Prediction
Sales predictions have been used in research and practical application for many years, with the various prediction models broadly divided into qualitative and quantitative approaches. Quantitative approaches collect historical data and analyze developing trends to provide useful input for management decision making. Prediction methods include case-based reasoning and ANNs incorporating artificial intelligence, as well as regression analysis, moving average methods, and exponential smoothing methods based on statistical logic. Analyzing key product characteristics and combining previous sales history data helps to provide better sales predictions from limited data volumes.
Sales prediction is also important for e-commerce, critically impacting wise decision making, helping to manage workforce, cash flow, and resources, and optimizing manufacturer supply chains. Sales prediction is generally somewhat challenging because sales are affected by many factors, including promotions, price changes, user preferences, etc. [9], and yet it significantly impacts important company policies and plans, with accurate predictions helping to reduce costs, optimize inventory management, and improve product competitiveness [10].

Relevant Factors
Relevant-factor analysis considers the correlations within data, i.e., whether pairs of variables exhibit any linear correlation. Most statistical methods consider correlations between independent and dependent variables, whereas factor analysis investigates the relevance of multiple variables simultaneously. Latent variables underlying the observed variables can be identified even though they are not directly measured; these inferred latent variables are called factors [11,12]. Previous factor studies have covered commercial, e.g., [13,14]; sales, e.g., [15][16][17]; industry, e.g., [18][19][20]; and manufacturing, e.g., [21][22][23] applications amongst many others, and factor analysis has been shown to effectively extract important factors that can then be employed for prediction using different methods.
Companies use identified sales factors for overall economic, industry, and company sales predictions. Overall economic prediction includes inflation, unemployment rates, interest rates, consumer spending and savings, investment, government spending, net export, and gross national output; whereas industry prediction uses the sales factors and other industrial environmental indicators to predict industry sales, and company sales are generally predicted by including company market share, new product listings, and marketing activities [24].

Artificial Neural Network Prediction
Artificial neural networks were developed in the 1950s, and the original neuron-like perceptron model was proposed in 1969 [25]. However, the original theory was somewhat simple and was not taken seriously until Hopfield defined the modern ANN form in 1982. Many diverse ANNs have subsequently been proposed, incorporating new architectures and theories, and the enormous increase in computing power has allowed modern and powerful ANNs to be widely used, with many practical ANN applications across various industries.
Various market sales prediction models have been proposed based on back-propagation ANNs, incorporating improved methods to compensate for known back-propagation inadequacies, particularly for problems whose underlying rules are unknown. Although the PCB industry has a significant impact on the Taiwanese economy, serious inventory dumping and material shortage problems remained. Therefore, Chang et al. [26] established an ANN-based prediction model to solve these inventory and material shortage problems. Au et al. [27] developed an optimized ANN prediction system for clothing sales prediction using two years of clothing sales data. They compared model performance with a basic fully connected ANN and traditional predictive models, and showed that the proposed algorithm was superior to traditional prediction methods for products with low demand uncertainty and weak seasonal trends. Thus, the method was suitable for fashion retailers to produce short-term retail predictions.
Kong and Martin [28] used back-propagation networks (BPNs) to predict future food sales for large wholesalers in Victoria. They showed that the proposed BPN method provided superior prediction results compared with traditional methods for trending and market analysis, i.e., simple linear regression models. Hence, the BPN model could provide a useful tool for sales prediction. Thiesing et al. [29] used a BPN-trained ANN to predict future time series values, including weekly demand for supermarket items, considering prices, advertising activity, and holiday impacts.
Convenience stores are an integral part of the retail industry, distributing goods to consumers from suppliers. Some convenience stores were out of stock of products customers desired, while others had an excess of the same products. However, customer satisfaction was always significantly impacted when desired goods were out of stock, regardless of how good the service was. Thus, controlling ordering and inventory has become a very important issue for convenience store management. Therefore, Chen et al. [30] developed an ANN-based system to control orders and manage inventory in convenience stores based on business circle and sales prediction operational characteristics. The proposed system improved order and discard rates to better ensure ordering the right items in the correct quantities.
Vhatkar and Dias [31] considered several sales prediction algorithms and developed an inverse transfer-like ANN model to predict oral care product sales and error rates for several different products. Mo et al. [32] proposed an optimized BPN method to improve supermarket daily average rice sales prediction accuracy. Weron [5] reviewed electricity price forecasting, explaining the complexity of available solutions, their strengths and weaknesses, and the opportunities and threats that forecasting tools offer or may encounter. Cincotti et al. [33] used an ANN to analyze electricity spot prices on the Italian Power Exchange.

Pearson's Correlation Coefficient
Pearson's correlation, Γ, is a commonly employed measure of the linear correlation between two variables [34]. For data arranged in a contingency table, n_i+ is the number of observations in the i-th row, n_+j is the number of observations in the j-th column, n_ij is the number of observations in the cell at the i-th row and j-th column, and n_++ is the total number of observations. A perfect linear correlation between the variables would have |Γ| = 1, with a positive result meaning the two variables increase together, and a negative result meaning one decreases as the other increases.
Hypothesis testing is also commonly employed to test whether the data support the null hypothesis, i.e., that the two variates are uncorrelated. The sample correlation can be computed as

r = Σ_{i=1}^{n} (X_i − X̄)(y_i − ȳ) / ((n − 1) S_X S_y),

where X̄ and ȳ are the sample means for the first and second variable, respectively; S_X and S_y are the standard deviations for the first and second variable, respectively; and n is the column length. The metric for the test is the probability of generating a type I error (the p value).
Thus, the null hypothesis for Pearson's correlation is H0: r = 0, and the commonly applied test is to reject the null hypothesis if p ≤ 0.05, i.e., at the 95% confidence level.
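The correlation coefficient and the associated test statistic can be sketched in a few lines of Python. This is a minimal illustration of the formulas above, not the Minitab procedure used later in the study; the function names are ours.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)

def t_statistic(r, n):
    """t statistic for testing H0: r = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))
```

The t statistic is then compared with the t distribution with n − 2 degrees of freedom to obtain the p value used in the significance test.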

Artificial Neural Network Principles
The BPN is one of the most widely used and representative models in current ANN learning models. Werbos [35] proposed the basic concept and Rumelhart et al. [36] defined the underlying theory and algorithm from the propagation or generalized delta learning rule. ANNs simulate biological nerves, using continuous learning and error correction to achieve accurate output. Modern ANNs commonly incorporate large learning databases, parallel processing, nonlinear outputs, and multiple layers.
The basic ANN component is a processing element (PE), where the output from each processing unit is connected to the processing units of the next layer. The input for a processing unit is the weighted sum of the outputs from the corresponding processing units in the upper layer. Processing unit output is generally expressed as a function of this weighted sum:

Y_j = f(net_j) = f(Σ_i W_ij X_i − θ_j),

where Y_j is the output of the processing unit, X_i is the set of input values for the processing unit, net_j is the integration function, θ_j is the applied threshold, W_ij is the connection weight expressing the intensity of the influence of the i-th processing unit in the previous layer on the j-th processing unit in the following layer, and f is the transfer function, which produces the processing unit's output value from the weighted sum of its inputs.
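A single processing element can be sketched directly from this definition. The snippet below is a minimal illustration assuming a sigmoid transfer function f (the transfer function commonly used later in the paper); the function name is ours.

```python
import math

def processing_element(inputs, weights, theta):
    """One PE: Y_j = f(net_j), where net_j = sum_i W_ij * X_i - theta_j
    and f is the sigmoid transfer function."""
    net = sum(w * x for w, x in zip(weights, inputs)) - theta
    return 1.0 / (1.0 + math.exp(-net))
```

With net_j = 0 the sigmoid output is exactly 0.5, and it saturates toward 0 or 1 as net_j becomes strongly negative or positive.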

Back-Propagation Network Algorithm
Back-propagation networks are widely used ANN architectures [37], providing supervised learning [36]. A BPN comprises an input layer representing the network input variables, where the number of processing units depends on the problem and a linear transfer function is used; a number of hidden layers; and a single output layer. Propagation in a neural network can be forward or backward. Through these two processes, output errors can be reduced, and the network learns how to solve a given problem. Forward propagation starts from the input layer and computes at each layer until it reaches the final layer. Backward propagation starts from the output layer and computes weights and errors for each previous layer until it reaches the input layer. Back-propagation compares the gaps between the targeted output and the actual computed output values, and the synapse values are re-adjusted to minimize errors [38]. The propagation rule is the basis of the architecture shown in Figure 1.
Hidden layers represent interactions between input units. There is no specified method to decide the appropriate number of processing units or hidden layers, so most studies use accumulated experience to determine these. Generally, the number of hidden layer processing units increases with increasing problem complexity. However, too many processing units cause the network to memorize and lose inductive power, while too few mean the network cannot obtain the correct mapping relationship between input and output.
The number of output layer processing units also depends on problem complexity. If the output is a single real value, the number of output layer processing units is 1, whereas for classification problems it depends on the number of classes. To avoid requiring a large sample dataset, an ANN tool including a particular training-validation-test procedure for small datasets was developed some years ago and has recently been refined to obtain realistic and reliable regression laws [39][40][41].
The basic BPN minimizes the error function using steepest gradient descent. For a given training paradigm, the network slightly adjusts the link weighting values in proportion to the error function's sensitivity to those weighting values, i.e., the corrections are proportional to the partial derivatives of the error function with respect to the weightings [12]:

∆W_ij = −η (∂E/∂W_ij), (4)

where η is the learning rate, which controls the magnitude of every change of weighting value, and E is the error, hence larger E implies poorer network quality. Typically, the error (or energy) function representing learning quality has the form

E = (1/2) Σ_j (T_j − Y_j)^2, (5)

where T_j is the target output value for the j-th output unit in the output layer, and Y_j is the inferred output value of the j-th output unit in the output layer. Substituting Equation (5) into Equation (4) and applying the chain rule gives

∂E/∂W_ij = −(T_j − Y_j) f′(net_j) X_i. (6)

If we define δ_j as the error of the j-th output processing unit in the output layer,

δ_j = (T_j − Y_j) f′(net_j), (7)

then the correction for W_ij is

∆W_ij = η δ_j X_i, (8)

and the correction to the threshold for the output unit is

∆θ_j = −η δ_j. (9)

Backward corrections, i.e., Equations (7)-(9), are repeated layer by layer to obtain the weighting values between each layer and the threshold for each neuron, so that the gap between network output values and target values is gradually reduced until a usable network model is established. Generally, a momentum coefficient or inertia factor, α, is added to Equations (8) and (9) to incorporate the previous weighting correction, reducing oscillation during convergence and making training smoother, i.e.:

∆W_ij(n) = η δ_j X_i + α ∆W_ij(n − 1), (10)

and

∆θ_j(n) = −η δ_j + α ∆θ_j(n − 1). (11)

There are many data splitting methods reported and used in the literature. These methods can be roughly categorized into three types [42][43][44]: (1) cross-validation; (2) randomly selecting a proportion of samples and retaining these as a validation set, then using the remaining samples for training; and (3) the Kennard-Stone algorithm.
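The output-layer correction with momentum described above can be sketched as follows. This is a minimal illustration assuming a sigmoid transfer function, for which f′(net_j) = Y_j(1 − Y_j); the function names are ours.

```python
def output_delta(target, output):
    """delta_j = (T_j - Y_j) * f'(net_j); for a sigmoid, f'(net_j) = Y_j * (1 - Y_j)."""
    return (target - output) * output * (1.0 - output)

def weight_update(eta, delta_j, x_i, alpha, prev_update):
    """Momentum rule: Delta W_ij(n) = eta * delta_j * X_i + alpha * Delta W_ij(n - 1)."""
    return eta * delta_j * x_i + alpha * prev_update
```

Each training iteration computes delta_j at the output layer, propagates the errors backward, and applies the momentum-weighted update to every connection.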
In this study, we chose the random method for its ease of application, using a 7/8 portion of the data for training. The remaining 1/8 portion is the test data.
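The random 7/8 versus 1/8 split can be sketched as below. This is an illustrative implementation with a fixed seed for reproducibility; the function name and seed are ours, not from the study.

```python
import random

def split_train_test(samples, test_fraction=1 / 8, seed=0):
    """Randomly hold out a fraction of samples as the test set;
    the remainder forms the training set."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    n_test = round(len(samples) * test_fraction)
    test_idx = set(indices[:n_test])
    train = [s for i, s in enumerate(samples) if i not in test_idx]
    test = [s for i, s in enumerate(samples) if i in test_idx]
    return train, test
```

For the 120 monthly entries used in this study, this yields 105 training samples and 15 test samples.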
The ANN is trained by selecting a training set of samples and repeating the learning steps multiple times until the errors converge. However, training should not run for too long, to avoid over-training the ANN and to ensure new input samples produce correct outputs.

Performance Indicators
Predictive metrics compare predicted and actual values to confirm model feasibility. This study used the root mean squared error (RMSE) [16],

RMSE = sqrt( (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)^2 ),

and the mean absolute error (MAE) [17],

MAE = (1/n) Σ_{i=1}^{n} |y_i − ŷ_i|,

to measure differences between actual values y_i and predicted values ŷ_i. Smaller RMSE and MAE show that predicted values are closer to the actual values [45], and hence that the model provides better predictive power.
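Both metrics are straightforward to compute; a minimal sketch (function names ours):

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def mae(actual, predicted):
    """Mean absolute error between actual and predicted values."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n
```

Note that RMSE penalizes large individual errors more heavily than MAE, so comparing the two gives some indication of whether errors are evenly spread or dominated by a few outliers.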

Results and Discussion
The collected dataset comprised 120 entries from 2008 to 2017 for key external economic indicators affecting predicted sales, including net employee entry rate into manufacturing and service industries, trend indices, manufacturing industry sales volume indices, and customs export indices, as shown in Table 1.

Pearson's Correlation
To improve BPN prediction accuracy, the data were pre-processed using Minitab V16 to identify significant correlations, as shown in Table 2. Pearson's correlation was considered significant (i.e., p < 0.05) for 12 items: X6, X8, X9, X11, X12, X13, X24, X25, X26, X27, X29, and X30. To improve prediction accuracy, indicator items with moderate and low correlation coefficients were used, with items selected whose p values were below 0.001. Items with negative correlations were not included. The scenario studied contains 120 entries of actual data released by the government [46].
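The screening rule described (keep positively correlated indicators with p < 0.001) can be sketched with SciPy, assuming it is available; the study itself used Minitab, and the indicator names and data below are purely hypothetical.

```python
from scipy.stats import pearsonr

def screen_indicators(indicators, sales, alpha=0.001):
    """Keep indicator names whose Pearson correlation with sales is positive
    and significant at the alpha level; negative correlations are excluded,
    following the selection rule used in the study."""
    selected = []
    for name, series in indicators.items():
        r, p = pearsonr(series, sales)
        if r > 0 and p < alpha:
            selected.append(name)
    return selected
```

Applied to the 35 collected indicators, such a filter would yield the subset of key external economic factors fed into the BPN.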

Artificial Neural Network Prediction
Study parameters were established from previous ANN studies. Zhang et al. [47] showed that a single hidden layer is sufficient for most nonlinear problems, and Lee and Chen [48] showed there is no standard for setting the number of hidden layer nodes, but recommended using n, i.e., the number of variates, as the number of nodes in the input layer when testing the number of hidden layer nodes. Previous studies generally set the learning rate for predictive and estimation problems between 1-10, with subtypes set between 0.1-1.0. Kuo [49] indicated there is no absolute rule to determine network parameters, and parameter settings providing improved results can only be found through trial and error.
Therefore, we set up the BPN within NeuroSolutions 7. The network has one hidden layer, with the number of neurons set to twice that of the input layer, and one neuron in the output layer. The transfer function used is sigmoid, with learning rate, training iterations, and error target set to 0.01-1.0, 2000 epochs, and MSE ≤ 10.5, respectively.
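For readers without NeuroSolutions, an approximately equivalent network can be sketched with scikit-learn's MLPRegressor, assuming that library is available. This is a stand-in, not the study's implementation: scikit-learn does not provide the Levenberg-Marquardt algorithm used later, so stochastic gradient descent with momentum is substituted, and the function name is ours.

```python
from sklearn.neural_network import MLPRegressor

def build_bpn(n_inputs):
    """One hidden layer with 2 * n_inputs logistic (sigmoid) neurons and a
    single output neuron; SGD with momentum stands in for the LM algorithm."""
    return MLPRegressor(
        hidden_layer_sizes=(2 * n_inputs,),
        activation="logistic",
        solver="sgd",
        learning_rate_init=0.01,
        momentum=0.9,
        max_iter=2000,
        random_state=0,
    )
```

With seven selected indicators, `build_bpn(7)` gives a 7-14-1 topology matching the sizing rule described above.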
The BPN operation steps are as follows:
1. Set the number of input, hidden, and output layers; the inertia coefficient; learning rate; learning cycles; and other relevant parameters.
3. Transfer the data to the BPN to start the calculation.
4. Correct the join and bias weights according to the errors.
After selecting the relevant factors, they were input into the BPN for learning, using 7/8 of the available data as the training set with the remaining 1/8 serving as the test set. The BPN training algorithm generally uses a steepest gradient algorithm, but this converges slowly and can easily fail. Therefore, this study used the Levenberg-Marquardt (LM) algorithm for training, with maximum iterations = 2000. Figure 2 shows the average MSE and standard deviation from three runs. The training data achieved convergence after 3 runs, with all MSEs below 0.1, reaching the targeted effect, as shown in Figure 3. The training parameters were entered into the BPN and the test dataset predicted, as shown in Figure 4. The training set had MSE = 0.025882721 ± 0.000802464, as shown in Table 4. Table 5 summarizes the final BPN performance for the test dataset. After analysis by relevant factors, variables with higher homogeneity were clustered together, while data with significant differences were separated. Therefore, compared with a traditional statistical regression model, the prediction accuracy was relatively high.

Conclusions
The Taiwanese plastic injection machine industry has not previously analyzed correlations between predicted industrial sales accuracy and key economic indicators. Therefore, this study used a back-propagation neural network prediction method to improve prediction accuracy for plastic injection machine sales in Taiwan for 2018. We first identified influential factors using correlation analysis and then integrated a BPN. The overall prediction accuracy achieved was RMSE = 24,858,562.25 for the test set.
Overall outcomes confirmed it is feasible to predict plastic injection machine sales using factor analysis and back-propagation neural networks, and the prediction accuracy of the proposed system (MAE and RMSE) was clearly demonstrated and consistent with the predicted results.
However, the proposed system has several limitations that should be addressed: • The system should be extended to consider changes occurring in the different factors. • Implementation requires more time than other prediction methods. • Current accuracy is acceptable, but could be improved.
Industrial sales are driven by many complicated factors. Although the proposed system provided acceptable prediction accuracy, future research should evaluate alternative sales prediction methods, including classification and fuzzy approaches. Improved accuracy will help further reduce production inventories, improve sales target predictions, and achieve operational goals.