Influencing Factors and Forecasting Statistics of Enterprise Market Sales Based on Big Data and Intelligent IoT

With the acceleration of economic development, enterprise management faces increasingly severe challenges. Big data analysis based on the intelligent Internet of Things (IoT) has a positive effect on the development of enterprise management and can compensate for its shortcomings. In this paper, we develop a big data processing method based on intelligent IoT that mines the factors affecting a company's market sales from the collected data. We then propose a KNN classification algorithm based on overlapping k-means clustering. This algorithm adds a training process to the traditional KNN algorithm, which allows data to be classified accurately and greatly improves the efficiency of the classification algorithm. Numerical results demonstrate the effectiveness of the proposed algorithm.


Introduction
In recent years, with the rapid development of the social economy and of science and technology, big data analysis in the intelligent Internet of Things (IoT) has been widely applied across industries and fields. It can help enterprises identify problems in their management and raise their level of management. Applying big data analysis in intelligent IoT to enterprise management helps enterprises adapt to the changes of the internet era, cope better with various challenges, and lay a solid foundation for sustainable development.
The development of modern science and technology also drives the intelligent IoT industry. Intelligent IoT is a concept that emerged in 2018. It refers to a system that collects all kinds of information in real time through various information sensors (generally in the context of monitoring, interaction, and connection) and performs intelligent analysis of the data through machine learning in terminal devices, the edge domain, or cloud centers, including positioning, comparison, prediction, and scheduling. At the technical level, AI gives the Internet of Things perception and recognition capabilities, while the Internet of Things supplies AI with the data needed to train its algorithms.
As big data applications continue to penetrate and take root in all walks of life around the world, traditional data management methods no longer meet the data management needs of enterprises [1]. The environment surrounding an enterprise's business activities is unpredictable, which increases both the risks and the opportunities in those activities. If an enterprise can predict the sales volume in its business activities, use big data technology to analyze the factors influencing its market sales, and formulate response strategies in advance, it will be better able to resist risks and seize transformation opportunities, ultimately increasing its profit in the market and stabilizing its leading position.
When applying big data technology in intelligent IoT to enterprise management, a large amount of real enterprise data can be extracted, processed, and analyzed with big data technology so as to provide a reliable basis for reference. Compared with a traditional data management system, the architecture of an IoT system includes the following parts: (1) the IaaS layer, the data storage layer of the IoT system, where cloud storage can be selected to facilitate data query and utilization; (2) the PaaS layer, which mainly provides the development languages and tools required by customers, such as Python, Hive, and Hadoop; (3) the SaaS layer, which mainly provides the applications needed by customers and facilitates client-interface access from devices such as intelligent large screens, PC terminals, and common client interfaces. The data of each business and each sales market can thus be monitored. Because enterprises differ in their data management requirements, big data analysis in intelligent IoT should be applied in combination with an enterprise's own situation, and its existing data resources should be analyzed systematically to help it find its own problems and the best solutions.
Many foreign scholars have studied the influencing factors of corporate market sales and their forecast statistics and have achieved good results. For example, Chemmanur et al. designed an attribute cleaning method for different categories of "dirty data" and proposed a method based on a tree-structured Bloom filter algorithm to clean duplicate data; massive data sets were cleaned through multiple iterations to ensure the data quality of the subsequent sales forecast analysis [2]. Singh and Mohanty proposed a new model for online data storage and processing: online analytical processing (OLAP).
In terms of data analysis and processing functions, OLAP is greatly improved over OLTP and can meet users' various application needs [3]. Yadegaridehkordi et al. proposed a combined forecasting model: the characteristics of each single forecasting model are analyzed, and different models are then combined according to certain weight ratios, so as to exploit the advantages of the different models and improve prediction accuracy [4].
Among related scholars in China, Wang and Han designed an extended radial basis function as the kernel function for the multidimensional, nonlinear characteristics of product sales sequences, used an improved immune optimization algorithm to optimize the parameters, and established a support vector machine forecasting system; applying the system to car sales forecasting examples, they verified its feasibility by comparing its accuracy with that of a BP neural network and a general radial basis function model [5]. Gu et al. used a background value optimization formula with adjustment factors to optimize the gray GM model and applied the optimized model to predict tourist numbers and tourism income in Hangzhou's tourism industry [6]. Xin et al. provided guidance and suggestions for the tourism industry in formulating sales strategies [7]. Xing et al. used principal component analysis to reduce the dimensionality of eight clothing promotion factors and particle swarm optimization to optimize a neural network, establishing a clothing sales forecast model that reduces network training time and improves prediction accuracy [8].
Facing a large amount of enterprise data acquisition, storage, and analysis work, we use big data technology in intelligent IoT to ease the workload of data analysts. This paper uses big data analysis in intelligent IoT to analyze the historical sales data of enterprises, whose time series exhibits the characteristics of dual trend changes. According to these characteristics, a combined forecasting model based on the trend method and the seasonal index method is proposed; using the MAE, RMSE, and MAPE evaluation standards, the combined model is compared with the single trend forecasting model and the seasonal index model, and the optimal forecasting model is then selected.

Influencing Factors and Forecasting Statistical Model of Enterprise Market Sales Based on Big Data Analysis in Intelligent IoT

Intelligent IoT Architecture.
The intelligent IoT architecture mainly includes three levels: intelligent devices and solutions, the OS layer, and infrastructure, which are finally delivered through integration services, as shown in Figure 1. Intelligent equipment can collect visual, audio, and pressure data and perform actions such as capturing, sorting, and handling. IoT devices and solutions are usually provided to customers; this layer involves a great diversity of device forms. The OS layer is the "brain" of intelligent IoT: it connects and controls the device layer, provides intelligent analysis and data-handling capacity, and solidifies the core applications for each scenario into function modules. This layer places high demands on business logic, unified modeling, full-link technical capacity, and high-concurrency support. The infrastructure layer provides the IT infrastructure of servers, storage, AI training, deployment capabilities, and so on. In the era of intelligent IoT, the mass data generated in production and daily life will be collected by sensors; in the era of big data, the individual behavior of consumers can be collected, quantified, and predicted, and consumers' personal opinions may even change the operation of commercial society.

Use MapReduce to Simplify Massive Data.
MapReduce adopts the master-slave mode: a master node and multiple slave nodes jointly complete the entire distributed computing process [9, 10]. During calculation, data is processed through the cooperation of two stages, map (the mapping function) and reduce (the reduction function). Generally, the output of one stage is the input of the next, and the two stages coordinate repeatedly.
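The two-stage flow above can be illustrated with a minimal, framework-free sketch in plain Python (not the actual Hadoop API): each record is mapped to a key-value pair, and the reduce phase aggregates values sharing a key. The sales records here are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales records: (region, units_sold)
records = [("east", 120), ("west", 80), ("east", 40), ("north", 55), ("west", 20)]

def map_phase(record):
    """Map stage: emit a (key, value) pair for each input record."""
    region, units = record
    return (region, units)

def reduce_phase(pairs):
    """Reduce stage: aggregate all values that share the same key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

mapped = [map_phase(r) for r in records]   # output of map is input of reduce
result = reduce_phase(mapped)
print(result)  # {'east': 160, 'west': 100, 'north': 55}
```

In a real cluster, the map tasks run in parallel on the slave nodes and a shuffle step groups pairs by key before reduction; the sketch collapses those steps into a single process.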
2.2.1. JobTracker. JobTracker mainly receives the application programs submitted by the client and performs resource monitoring and job scheduling; the programs themselves, including any complex algorithms, are written by the programmer. JobTracker monitors the running status of TaskTrackers and jobs in the system by accepting the heartbeat information sent by each TaskTracker. Its fault tolerance mechanism means that when a TaskTracker or a job runs abnormally, the tasks running on that TaskTracker are backed up and executed on other TaskTrackers, ensuring the stability and reliability of the system [11, 12].

Client.
The client is mainly the layer through which the user interacts with the MapReduce process. It plays the role of input, passing the programs that the user needs to execute to the JobTracker. During operation, the process can be monitored through the console.

TaskTracker.
The task scheduler mainly coordinates specific calculations: it responds to the JobTracker with information such as the time spent in the calculation process, the number of tasks processed, and the CPU and memory occupied, while also processing the tasks assigned to it [13, 14]. Map and reduce are coordinated to completion. MapReduce uses the split as the smallest unit of data processing; each split stores a corresponding data block to be processed and is handled by a corresponding Map Task.

Introduction to Machine Learning Classification Algorithms.
Machine learning is essentially an approximation to the real model of a problem. Supervised classification algorithms in particular have been widely used in many business scenarios. There are many ways to solve classification problems; the basic methods include decision trees [15], naive Bayes [16], support vector machines [17], K-nearest neighbors [18], and artificial neural networks [19]. The decision tree algorithm is a method of approximating the value of a discrete function and a typical classification method: it first processes the data, uses induction algorithms to generate readable rules and a decision tree, and then uses the tree to analyze new data. Naive Bayes classification is based on Bayes' theorem and the assumption of conditional independence of features; it originates in classical mathematical theory and has a stable mathematical foundation and classification efficiency. A support vector machine is a supervised learning method widely used in statistical classification and regression analysis. The K-nearest neighbor algorithm (KNN) is a comparatively simple classification and prediction algorithm: it selects the K training samples most similar to the sample to be classified or predicted, and then averages the results of those K samples, or takes the mode of their class labels, to obtain the result or class label of the sample in question. An artificial neural network (often abbreviated to neural network) is a mathematical or computational model that imitates the structure and function of biological neural networks and is used to estimate or approximate functions; it computes by connecting a large number of artificial neurons.
In most cases, an artificial neural network can change its internal structure on the basis of external information; it is an adaptive system.

(1) Algorithm Flow. Calculate the distance between the sample to be classified and each training sample. The distance functions commonly used in the KNN algorithm are the Euclidean distance

$$d(x, y) = \sqrt{\sum_{i=1}^{m} (x_i - y_i)^2},$$ (1)

the Manhattan distance

$$d(x, y) = \sum_{i=1}^{m} |x_i - y_i|,$$ (2)

and the Chebyshev distance

$$d(x, y) = \max_{1 \le i \le m} |x_i - y_i|,$$ (3)

as well as the Minkowski distance, average distance, and geodesic distance. Among these, the Euclidean distance is used most often because of its simplicity.
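The KNN flow described above can be sketched in a few lines of Python: compute the Euclidean distance from the query to every training sample, take the K nearest, and vote. The toy data and labels are hypothetical.

```python
import math
from collections import Counter

def euclidean(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def knn_predict(train, labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples."""
    order = sorted(range(len(train)), key=lambda i: euclidean(train[i], query))
    top_labels = [labels[i] for i in order[:k]]
    return Counter(top_labels).most_common(1)[0][0]

# Hypothetical two-class toy data
train = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.8), (0.9, 1.1)]
labels = ["low", "low", "high", "high", "low"]
print(knn_predict(train, labels, (1.1, 1.0), k=3))  # low
```

Note that the plain algorithm computes a distance to every training sample per query, which is exactly the cost the blocked-clustering improvement in the next section aims to reduce.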
(2) Selection of Prediction Algorithm. Because cigarette sales need to be predicted and the sales data are recorded at discrete time points rather than continuously, a time series model is the appropriate choice of forecasting algorithm: a time series model captures dynamic dependencies in the data, so the trend of future data can be inferred from historical behavior [20, 21].

Improved Data Mining Algorithm.
This paper introduces a training process, clustering after partitioning, into the KNN algorithm: the big data set is divided into equal-sized blocks, and the data in each block is then clustered. Dividing the data into many blocks effectively reduces the memory requirements, so that the KNN classification algorithm can be used in a big data environment.
First, the big data set is divided into blocks; according to different requirements, different blocking methods can be used. Given $n$ data samples $\{x_1, x_2, \cdots, x_n\}$, find $K$ cluster centers $\{a_1, a_2, \cdots, a_K\}$ such that the sum of squared distances between each data sample and its nearest cluster center is smallest. This sum of squared distances is called the objective function $W_N$:

$$W_N = \sum_{i=1}^{n} \min_{1 \le k \le K} \left\| x_i - a_k \right\|^2.$$ (4)

The data are recorded as the data matrix

$$X = \begin{pmatrix} x_{11} & \cdots & x_{1p} \\ \vdots & & \vdots \\ x_{n1} & \cdots & x_{np} \end{pmatrix},$$ (5)

and the differences between data points are described by the dissimilarity matrix

$$D = \begin{pmatrix} 0 & & & \\ d(2,1) & 0 & & \\ \vdots & \vdots & \ddots & \\ d(n,1) & d(n,2) & \cdots & 0 \end{pmatrix}.$$ (6)

In formula (5), each column represents a data attribute and each row represents one data point. In formula (6), $d(m, n)$ represents the degree of difference between the $m$th and the $n$th data points; the smaller the difference between the two, the smaller the value of $d(m, n)$ [22, 23]. According to the characteristics of the data in this article, the objective function $W_N$ is transformed into formula (7).
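A minimal sketch of the block-then-cluster idea follows. It is my own simplified rendering, not the paper's exact algorithm: the data is split into equal-sized blocks, each block is clustered with plain k-means, each center inherits the label of its nearest block point, and a query is then classified by KNN over the (much smaller) set of labeled centers. All function names and the toy data are assumptions for illustration.

```python
import random
from collections import Counter

def dist2(x, y):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on one data block; returns k cluster centers."""
    centers = random.Random(seed).sample(points, k)
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            buckets[min(range(k), key=lambda j: dist2(p, centers[j]))].append(p)
        centers = [
            tuple(sum(col) / len(b) for col in zip(*b)) if b else centers[j]
            for j, b in enumerate(buckets)
        ]
    return centers

def train_blocked(data, labels, block_size, k):
    """Split the data into equal-sized blocks and cluster each block;
    each center inherits the label of its nearest point in the block."""
    centers, center_labels = [], []
    for s in range(0, len(data), block_size):
        block, blabels = data[s:s + block_size], labels[s:s + block_size]
        for c in kmeans(block, min(k, len(block))):
            nearest = min(range(len(block)), key=lambda i: dist2(block[i], c))
            centers.append(c)
            center_labels.append(blabels[nearest])
    return centers, center_labels

def classify(centers, center_labels, query, k=3):
    """KNN vote over the labeled cluster centers instead of the full data set."""
    order = sorted(range(len(centers)), key=lambda i: dist2(centers[i], query))
    return Counter(center_labels[i] for i in order[:k]).most_common(1)[0][0]

# Hypothetical toy data: two well-separated classes
data = [(0.1 * i, 0.1 * i) for i in range(10)] + [(5 + 0.1 * i, 5 + 0.1 * i) for i in range(10)]
labels = ["A"] * 10 + ["B"] * 10
centers, clabels = train_blocked(data, labels, block_size=5, k=2)
print(classify(centers, clabels, (0.2, 0.3)))  # A
```

Classification now scales with the number of centers rather than the number of training points, which is the source of the speedup reported later, at the cost of some accuracy lost in the clustering step.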

Time Series Stationarity Test.
According to the unit root calculation formula in MyEclipse, if the sequence is nonstationary then $\beta = 1$, and if the sequence is stationary then $\beta < 1$. Assume the sequence is nonstationary, substitute $\beta = 1$ into formulas (8) and (9), and calculate the value of the DF statistic. The running result is the unit root test value of the original data, from which it can be judged whether the original data are stationary: when the value does not exist or is particularly small, the original data can be judged to be a stationary sequence; otherwise, the sequence is not stationary [24, 25].
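The role of $\beta$ can be illustrated with a small numpy sketch (a simplified Dickey-Fuller-style check, not the full test): estimate $\beta$ in $y_t = \beta y_{t-1} + \varepsilon_t$ by ordinary least squares; an estimate close to 1 suggests a unit root (nonstationary), while an estimate well below 1 suggests stationarity. The series here are synthetic.

```python
import numpy as np

def ar1_beta(series):
    """OLS estimate of beta in y_t = beta * y_{t-1} + e_t.
    beta near 1 suggests a unit root (nonstationary series);
    beta clearly below 1 suggests stationarity."""
    y, ylag = series[1:], series[:-1]
    return float(np.dot(ylag, y) / np.dot(ylag, ylag))

rng = np.random.default_rng(0)
e = rng.normal(size=500)
random_walk = np.cumsum(e)   # nonstationary: y_t = y_{t-1} + e_t
white_noise = e              # stationary: no dependence on the past

print(ar1_beta(random_walk))  # close to 1
print(ar1_beta(white_noise))  # close to 0
```

A production analysis would instead use an augmented Dickey-Fuller test with lag terms and proper critical values.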

Time Series Smoothing Processing.
Because the original sequence is nonstationary, a difference calculation is performed on it for stationarization. The first difference is

$$\nabla y_t = y_t - y_{t-1},$$ (10)

where $t$ is the time point. If there are periodic fluctuations in the time series, a seasonal difference operation should also be applied; seasonal differencing removes the periodicity of the time series:

$$\nabla_s y_t = y_t - y_{t-s}.$$ (11)

2.5. Second Moving Average Method. Establishing the forecasting model is the key to forecasting with the second moving average method. The forecasting model of the second moving average method is shown in formula (12):

$$\hat{y}_{t+T} = a_t + b_t T,$$ (12)

where $T$ is the number of periods ahead of time $t$ being forecast, $M^{(1)}_t$ is the last moving average in the first moving average sequence, and $M^{(2)}_t$ is the last moving average in the second moving average sequence [26, 27]. Correspondingly, the coefficients and the primary and secondary moving averages are calculated as

$$a_t = 2M^{(1)}_t - M^{(2)}_t, \qquad b_t = \frac{2}{n-1}\left(M^{(1)}_t - M^{(2)}_t\right),$$ (13, 14)

$$M^{(1)}_t = \frac{y_t + y_{t-1} + \cdots + y_{t-n+1}}{n},$$ (15)

$$M^{(2)}_t = \frac{M^{(1)}_t + M^{(1)}_{t-1} + \cdots + M^{(1)}_{t-n+1}}{n}.$$ (16)

In formulas (15) and (16), the $y$ values are the observed values of the time series to be predicted, $M^{(1)}_t$ and $M^{(2)}_t$ are the primary and secondary moving averages of period $t$, respectively, and $n$ is the span of the calculation. The basic prediction formula of the exponential smoothing prediction model is

$$S_t = \alpha Y_t + (1 - \alpha) S_{t-1}.$$ (17)

In formula (17), $S_t$ is the smoothed value corresponding to the actual value $Y_t$ at time $t$, and $S_{t-1}$ is the smoothed value corresponding to the actual value $Y_{t-1}$ at time $t-1$. The parameter $\alpha$ is a weight, usually called the smoothing constant, with values in the range $[0, 1]$.
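The differencing, second moving average, and exponential smoothing operations described above can be sketched as small Python functions; the demo series is a hypothetical example, chosen as a perfect straight line so the second moving average forecast is exact.

```python
def first_difference(y):
    """First difference: d_t = y_t - y_{t-1} removes a linear trend."""
    return [y[t] - y[t - 1] for t in range(1, len(y))]

def seasonal_difference(y, s=12):
    """Seasonal difference: d_t = y_t - y_{t-s} removes a period-s cycle."""
    return [y[t] - y[t - s] for t in range(s, len(y))]

def moving_average(y, n):
    """Simple n-period moving average series."""
    return [sum(y[t - n + 1:t + 1]) / n for t in range(n - 1, len(y))]

def second_ma_forecast(y, n, T):
    """Second moving average forecast y_{t+T} = a_t + b_t * T,
    with a_t = 2*M1 - M2 and b_t = 2*(M1 - M2)/(n - 1)."""
    m1 = moving_average(y, n)
    m2 = moving_average(m1, n)
    a = 2 * m1[-1] - m2[-1]
    b = 2 * (m1[-1] - m2[-1]) / (n - 1)
    return a + b * T

def exp_smooth(y, alpha):
    """Single exponential smoothing: S_t = alpha*Y_t + (1-alpha)*S_{t-1}."""
    s = y[0]
    for v in y[1:]:
        s = alpha * v + (1 - alpha) * s
    return s

y = [10, 12, 14, 16, 18, 20, 22, 24]    # hypothetical linear series, slope 2
print(second_ma_forecast(y, n=3, T=1))  # on a perfect line the forecast is exact: 26.0
```

On real sales data the line would be noisy, and the moving-average span `n` and smoothing constant `alpha` would be tuned against a held-out period.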

Design of Enterprise Marketing System Based on Hadoop
2.6.1. System Architecture. The design of the enterprise marketing system builds a Hadoop-based data processing platform as the data management center, providing massive data storage and processing support for a Hadoop-based enterprise marketing system [28, 29].
(1) Data Source Layer. The main job of the data source layer is to collect data for the enterprise marketing system. The data collection method is mainly through the National Bureau No. 1 Project. Downstream data include the production and sales information of more than a dozen industrial companies across the country, as well as the company's purchase, sales, and inventory data in markets across the country. Salespersons report market data, and external systems are imported through the Web Services interface.
(2) Data Transmission Layer. There are two main modes of data transmission: through ETL middleware, or through an enterprise application data transmission interface. Generally, small-scale data is transmitted directly through the interface, while large-scale data is extracted by connecting to the database through the ETL middleware.
(3) Data Processing Layer. The source data transmitted from the data source layer is of the finest granularity, very large in volume, and contains much "dirty data," so it must be preprocessed before storage, including data cleaning. Because of the huge data volume, this processing is performed on the Hadoop platform, which reduces the dimensionality of and aggregates the massive data, simplifying it while satisfying the needs of model analysis and maintaining data integrity and accuracy.
(4) Data Storage Center. The data storage center of the system in this paper is coordinated between the Hadoop distributed storage platform and a relational database. The Hadoop platform is built to store and process massive data and to transmit data to the system [30]. After data is exported from the Hadoop platform to the relational database, the relational database performs real-time analysis or mining on it, ensuring data consistency during processing. Using Hadoop and the relational database together to process computing tasks separates massive data processing from real-time data processing, so that large-scale data operations do not affect the operating efficiency of the marketing system and the entire system is easier to expand and more stable [31]. Data transmission uses the XML format as the standard for each interface; the application integration and data integration of each system are realized through message middleware, integrating data collection, data storage, and data preprocessing. A Hadoop-based data processing platform is built as the system's data management center [32]: it is loosely coupled with other running hardware devices, processes and stores massive data at high speed, and can allocate computing and storage resources to other running systems. Working together with the relational database, the Hadoop platform separates massive data processing from business logic and analysis operations, reduces the coupling of analysis and calculation at the hardware level, and greatly improves the analysis and calculation performance and stability of the system.

Experimental Subjects and Data Collection.
This article selects the monthly sales volume of a certain brand of cigarettes from China Tobacco in this province from 2016 to 2019 as the analysis data set. The time series of the brand's sales has a strong upward trend as well as periodic phenomena, so it has the characteristics of dual trend changes, namely, seasonal volatility combined with overall trend variability; a single model alone therefore cannot achieve the desired effect. In this paper, the single trend method model and the seasonal index method model are each fitted and used to forecast, a linear combination forecasting model of the two is then established to forecast sales, and finally the forecast results are compared.

Forecast Method
(1) Seasonal index method. The calculation method and steps are as follows:

(a) Calculate the average of the same quarter over the years. Suppose the average of the same quarter over the years is $r_i$, $i = 1, 2, 3, 4$, and the twelve quarterly observations over three years are $y_1, y_2, \cdots, y_{12}$; then

$$r_i = \frac{y_i + y_{i+4} + y_{i+8}}{3}, \quad i = 1, 2, 3, 4.$$

(b) Calculate the seasonal index of each quarter as the ratio of the quarterly average to the overall average:

$$S_i = \frac{r_i}{\bar{y}}, \qquad \bar{y} = \frac{1}{12}\sum_{t=1}^{12} y_t.$$

(c) Adjust the seasonal index of each season. Theoretically, the sum of the seasonal indices should be 4, but because of calculation errors in practice the sum may be greater or less than 4, so the indices need to be readjusted. The adjustment formula is

$$S_i' = \frac{4 S_i}{\sum_{j=1}^{4} S_j}.$$

When using the seasonal index method to forecast a time series, note that the series should not have an obvious linear trend; otherwise, the forecast accuracy will be greatly reduced.
(2) Long-term trend. This article uses the least squares method to find the parameters of the linear trend formula. The core idea of least squares is to approximate the past historical data with a straight line: the actual observed values $y_i$ of the time series and the predicted values $\hat{y}_i$ of the linear trend model should have the smallest sum of squared deviations, that is, $\sum (y_i - \hat{y}_i)^2$ is minimized. With the detailed derivation omitted here, the parameters $a$ and $b$ of the trend line $\hat{y} = a + bt$ are

$$b = \frac{n \sum t y - \sum t \sum y}{n \sum t^2 - \left(\sum t\right)^2}, \qquad a = \bar{y} - b\,\bar{t}.$$

(3) Decomposition prediction model. After the seasonal index and the long-term trend, the two key factors, have been determined with the decomposition method, the new forecast value of the cigarette sales model can be calculated according to formula (22).
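The least squares trend fit and the seasonal index calculation can be sketched together in Python; the quarterly sales figures below are hypothetical, and the final line combines the two factors in the multiplicative style of the decomposition model.

```python
def fit_trend(y):
    """Least squares line y_hat = a + b*t over t = 1..n:
    b = (n*sum(t*y) - sum(t)*sum(y)) / (n*sum(t^2) - sum(t)^2), a = y_bar - b*t_bar."""
    n = len(y)
    t = list(range(1, n + 1))
    st, sy = sum(t), sum(y)
    stt = sum(ti * ti for ti in t)
    sty = sum(ti * yi for ti, yi in zip(t, y))
    b = (n * sty - st * sy) / (n * stt - st * st)
    a = sy / n - b * st / n
    return a, b

def seasonal_indices(y, s=4):
    """Seasonal index per quarter: the mean of each quarter divided by the
    overall mean, then rescaled so the s indices sum exactly to s."""
    q_means = [sum(y[i::s]) / len(y[i::s]) for i in range(s)]
    overall = sum(y) / len(y)
    idx = [m / overall for m in q_means]
    scale = s / sum(idx)
    return [v * scale for v in idx]

# Hypothetical three years of quarterly sales with both trend and seasonality
y = [102, 121, 95, 84, 112, 133, 104, 92, 123, 146, 114, 101]
a, b = fit_trend(y)
idx = seasonal_indices(y, s=4)
t_next = len(y) + 1
forecast = (a + b * t_next) * idx[0]   # next point falls in quarter 1
print(round(forecast, 1))
```

With real data, the trend would first be removed (or the series differenced) before estimating the seasonal indices, as the text notes.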

Wireless Communications and Mobile Computing
As the factors in the model are determined, the random fluctuation $I$ and the cyclic index $C$ have been reduced; after simplification, the predicted value is obtained. Table 1 lists the demographic factors in this area, including the resident population, the floating population, and other influencing factors, together with cigarette sales values and data trends. As shown in Figure 2, many factors affect cigarette sales. All the factors affecting the product were analyzed with big data technology; after extensive calculation and modeling, product sales were found to be mainly reflected in the number of permanent residents and in economic efficiency, which increase over time.

Personal Consumption Ability and Product Sales Trend.
As shown in Table 2, the factors whose data association trends are examined are per capita consumption expenditure, the average wages of employees, and cigarette sales in the region.
As shown in Figure 3, according to the big data analysis, the main factor affecting product sales is the city's per capita living expenses, followed by per capita GDP over 2016-2019. Owing to the limitations of the data statistics, the data lack regularity; in this case, traditional mathematical induction and statistical methods cannot be used.

Sales Forecast Analysis.
We use this model to obtain the sales forecast values of the key brand from January to June 2018 and compare them with the actual sales values and with the sales values of the same period in 2017. The unit is the box. The results are shown in Table 3.
As can be seen in Figure 4, the sales forecast model based on the trend method captures the seasonality and periodicity of monthly cigarette sales well, but the relative error of the forecast sales in two of the months still exceeds 10%. The prediction of extreme values in the time series is not ideal, and we can continue to improve on the basis of this prediction.

Forecast by Seasonal Index Method.
After repeated training, the number of hidden layer nodes was determined to be 5, and the trained seasonal index method model was used to predict the test data set; the results are shown in Table 4. It can be seen intuitively from Figure 5 that the relative error of the sales forecast model based on the seasonal index method is relatively stable and small. Therefore, this paper uses the seasonal index method model, on top of the trend method model, to modify the forecast values of the trend method model and improve the accuracy of the forecast.

Establish a Combined Forecasting Model Based on the Trend Method and Seasonal Index Method.
First, the linear structural part of the time series is fitted with the trend method model to obtain the predicted value $L_t$. First-order and seasonal differencing were performed on the time series before modeling, in order to eliminate its trend and reduce its seasonality. The seasonal index method is then used to identify the nonlinear part $e_t$ of the time series, and the prediction result $N_t$ is obtained. After repeated trials and comparisons, this paper adjusted the number of nodes and finally determined a structure with 12 input nodes, 5 hidden layer nodes, and 1 output node. The predicted results are shown in Table 5.
It can be seen intuitively from Figure 6 that the relative error of the forecast results of the combined forecasting model is less than 5%, which is also lower than the relative error of the forecasting results of the trend forecasting model and the seasonal index forecasting model, that is, the forecasting effect is better than that of a single model. Through the comprehensive application of the two models of trend method and seasonal index method, they can give full play to their respective strengths to achieve the purpose of improving the forecasting effect.
This article uses the historical monthly sales data of a certain brand of cigarettes from 2016 to 2019 to establish a trend-based sales forecast model, a seasonal index-based forecast model, and a combined forecast model to predict the monthly sales in 2018. Table 6 compares the prediction effects of the three models under the MAE, RMSE, and MAPE evaluation standards. As Table 6 shows, across all three indicators the evaluation values of the combined model are the lowest of the three models, so its prediction effect is the best. The original data contain both linear and nonlinear factors, so a single forecasting model, whether the trend forecasting model or the seasonal index forecasting model, cannot achieve the ideal effect. The combined model of the trend method and the seasonal index method synthesizes the advantages of the single models, better uncovers the complex linear and nonlinear features behind the data, and improves the prediction accuracy of the model.

4.6. Comparative Analysis of Algorithm Performance. Based on the above classification results, the sample data are processed into feature vectors. The KNN classifier in MATLAB is used to classify the three data samples, and the results are shown in Table 7.
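The three evaluation standards used in Table 6 (MAE, RMSE, MAPE) can be sketched directly from their definitions; the actual and forecast values below are hypothetical.

```python
import math

def mae(actual, pred):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)

def rmse(actual, pred):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))

def mape(actual, pred):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, pred)) / len(actual)

# Hypothetical monthly sales (boxes) and the forecasts from some model
actual = [100, 110, 120, 130]
pred = [98, 113, 118, 133]
print(mae(actual, pred), rmse(actual, pred), mape(actual, pred))
```

Lower values of all three indicate a better model; MAPE is scale-free, which makes it convenient for comparing series of different magnitudes.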
As shown in Table 7 and Figure 7, the classification accuracy of the algorithm proposed in this paper is 0.41% to 6.2% higher than that of the random block algorithm and 0.7% to 1.8% lower than that of the traditional KNN algorithm. It is 55.15% to 63.17% faster than the random block algorithm and 83.28% to 90.55% faster than the traditional KNN algorithm. As the data grow, the speed advantage becomes more obvious.

Comparative Analysis of Forecast Results.
According to the data in the charts above, the values predicted by the long-term trend method formula and the values obtained by the decomposition method are used to forecast the next several periods. Finally, we compare these forecasts with the actual sales results of the cigarette market; the details are shown in Table 8.
From Figure 8, which shows the error analysis between the forecast values of the three forecasting models (trend method, seasonal index method, and decomposition method) and the actual sales volume, we can see that the time series decomposition method built on the multiplicative model performs best, with an average error rate of about 2% and small fluctuation; the seasonal index method is next, and the trend method performs worst, at about 4% with large fluctuation. For the tobacco industry, the decomposition prediction model can fully meet forecasting needs, guiding industrial companies in producing cigarettes and commercial companies in selling them on the basis of the predicted values.

Conclusion
The KNN classification algorithm based on overlapping k-means clustering proposed in this paper is still lower in classification accuracy than the traditional KNN algorithm. This loss comes from the algorithm's clustering process, which affects the accuracy of the classification. If the clustering effect can be improved in the future, particularly by introducing other, better clustering algorithms or by adapting the partitioning conditions and methods to the data, then the classification accuracy will catch up with that of the traditional KNN algorithm, and the effectiveness and efficiency of the algorithm can be further improved.
Using the big data method in intelligent IoT to analyze the company's historical sales data, its time series exhibits the characteristics of dual trend changes. According to these characteristics, a combined forecasting model based on the trend method and the seasonal index method is proposed, and the MAE, RMSE, and MAPE evaluation standards are used to compare it with the single trend forecasting model and the seasonal index model; the comparative analysis proves that the combined forecasting model is better than either single model. This paper also studies the construction of a Hadoop massive data processing platform, examines its key technologies HDFS and MapReduce in depth, analyzes their working mechanisms, connects them with the actual situation of the enterprise, and analyzes the feasibility of the technical implementation and environment construction. The Hadoop platform first preprocesses the massive sales data, through data cleaning, dimensionality reduction, and structural standardization, and then provides the processed data to a relational database for the data analysis and processing of the related businesses. This supports sales forecasting models based on massive data processing.

Data Availability
This article selects the monthly sales volume of a certain brand of cigarettes from China Tobacco in this province from 2016 to 2019 as the analysis data set.

Conflicts of Interest
The authors declare that they have no conflicts of interest.