Cloud Model-Based Fuzzy Inference System for Short-Term Traffic Flow Prediction

Since traffic congestion during peak hours has become the norm in daily life, research on short-term traffic flow forecasting, which can help alleviate urban traffic congestion, has attracted widespread attention. However, existing research ignores the uncertainty of short-term traffic flow, which affects the accuracy and robustness of traffic flow forecasting models. Therefore, this paper proposes a short-term traffic flow forecasting algorithm for uncertain environments that combines the cloud model and the fuzzy inference system, using the cloud model to process the traffic flow data and describe its randomness and fuzziness at the same time. First, the fuzzy c-means algorithm is used to cluster the original traffic flow data, which yields the number and parameter values of the initial membership functions of the system. Then, based on the cloud reasoning algorithm and the cloud rule generator, an improved fuzzy reasoning system is proposed for short-term traffic flow prediction. The reasoning system can not only capture the uncertainty of traffic flow data but also describe temporal dependencies well. Finally, experimental results indicate that the proposed model has better prediction accuracy and stability: compared with the second-best comparison methods, it reduces RMSE by 0.6106, MAE by 0.281, and MRE by 0.0022.


Introduction
In recent years, rapid economic development has brought about rapid population growth and an increase in per capita vehicle ownership, which impose a heavy burden on transportation infrastructure, for example through insufficient parking spaces. According to the 2020 National Economic and Social Development Statistical Bulletin (http://www.gov.cn/xinwen/2021-02/28/content_5589283.htm, accessed on 22 October 2022), the total number of civilian vehicles in the country increased by 7.41% year-over-year and exceeded 280 million. The continuous increase in the number of motor vehicles has brought many problems to society, such as traffic congestion, wasted resources, economic losses, excessive commuting times, and frequent traffic accidents. In addition, the pollution caused by the large number of cars may threaten human health [1]. Since traffic flow reflects the number of vehicles that pass a point in a certain period of time [2], accurate traffic flow forecasting is of great significance to management departments and individuals, which can optimize the design and operation of transportation systems to […] proposed a method to predict the spatio-temporal characteristics of short-term traffic flow by combining the k-nearest neighbor algorithm and a bidirectional long short-term memory network model. However, a single short-term traffic flow prediction model can hardly cover the variety of situations encountered in real life. Therefore, to improve prediction ability and accuracy, hybrid prediction models, which take full advantage of different models, have received extensive attention.
Several hybrid methods have been proposed for forecasting short-term traffic flow by combining several techniques [23][24][25]. Considering that forecasting performance deteriorates seriously under non-Gaussian noise in the traffic flow sequence, Fang et al. [26] presented an error-distribution-free deep learning method for short-term traffic flow forecasting. Liu et al. [27] put forward a hybrid short-term traffic flow forecasting method combining neural networks and KNN. To improve the forecasting accuracy of short-term traffic flow and provide precise and reliable traffic information for traffic management units and travelers, Liu et al. [28] proposed a hybrid forecasting model based on KNN and SVR. Luo et al. [29] proposed a spatiotemporal traffic flow prediction method combining KNN and a long short-term memory network (LSTM), called KNN-LSTM. However, the above-mentioned methods ignore the uncertainty in the traffic flow data, which affects the accuracy and robustness of the traffic flow prediction model. This uncertainty involves ambiguity and randomness, which often appear at the same time [30]. It is worth noting that fuzzy systems can describe ambiguity well. Therefore, researchers often combine fuzzy systems with ANNs; the resulting models are called fuzzy neural networks or neuro-fuzzy models. Zhou et al. [31] proposed a novel deep-learning model for short-term traffic flow prediction by considering the inherent features of traffic data. In addition, a novel approach to uncertainty estimation has been proposed, based on the notion of the intuitionistic fuzzy set (an extension of the fuzzy set of Lotfi Zadeh) and an intuitionistic fuzzy traffic characterization [32].
Considering that the Fuzzy Inference System (FIS) can autonomously imitate human reasoning, Jang [33] developed the Adaptive Neuro-Fuzzy Inference System (ANFIS), which combines the learning mechanism of neural networks with the reasoning ability of FIS. With the help of the neural network's autonomous learning, ANFIS can adaptively extract network inference rules from data samples. It has been successfully applied in many fields. Keskin et al. [34] used the synthetic sequence generated by the ARIMA model as the training set of ANFIS and developed a flow prediction method based on the combination of ANFIS and a stochastic hydrological model. Ahmadianfar et al. [35] integrated an adaptive hybrid of differential evolution and particle swarm optimization with an adaptive neuro-fuzzy inference system model for EC prediction. Acakpovi et al. [36] used ANFIS to predict the reliability of power demand. Mohiyunddin et al. [37] introduced a novel ANFIS for data protection to improve and determine the degree of security. Chen et al. [38] proposed a short-term traffic flow prediction method based on ANFIS. Ghenai et al. [39] developed an accurate short-term energy consumption forecast for educational buildings, which aims to balance the supply from renewable power systems against the building's electrical load demand. Although ANFIS can describe the ambiguity in traffic flow data, it cannot reflect the randomness of the data. The cloud model proposed by Li et al. [40] can simultaneously capture multiple uncertainties, especially randomness. To describe the ambiguity and randomness of traffic flow data simultaneously and improve the prediction performance of the model, we combine cloud models and FIS within an ANN framework to solve the traffic flow forecasting problem.
In summary, the main contributions of our work are listed below:
(1) The cloud model and the fuzzy inference system are combined to describe the ambiguity and randomness in the traffic flow. The cloud model is embedded in the network of the fuzzy inference system for training, instead of using the inference rules to perform a simple mapping between two cloud models.
(2) By calculating the weight of the historical time series of traffic flow, a weighted multi-dimensional cloud model is generated.
(3) Based on the weighted multi-dimensional cloud model, an improved fuzzy prediction system is constructed for short-term traffic flow prediction; the system can describe the randomness and ambiguity of the data at the same time. It overcomes a shortcoming of fuzzy inference systems, which cannot capture the temporal characteristics of long sequence data well.
The remainder of the paper is organized into four sections. Section 2 introduces the background knowledge related to the paper. Section 3 describes the improved fuzzy inference system and explains the input layer, the cloudification fuzzy layer, the cloudification rule layer, the standardization layer, the inverse cloudification layer, and the output layer in detail. Section 4 presents experiments that verify the effectiveness of the proposed model. Section 5 summarizes the whole paper.

Fuzzy Inference Systems
A Fuzzy Inference System (FIS) is a system that handles fuzzy data on the basis of fuzzy set theory and fuzzy logic. It simulates the fuzzy reasoning process of human beings by applying fuzzy sets and fuzzy rules to input data to generate fuzzy outputs. The definition of fuzzy rules is given next.
Definition 1 [41]. Suppose the input-output data records of fuzzy rules are given as $(x_p; y_p)$, $p = 1, 2, \ldots, N$, where $x_p \in \mathbb{R}^m$ is the input, $y_p \in \mathbb{R}$ is the output, and $p$ indexes the $p$th sample. Then a single fuzzy IF-THEN rule takes the following form: IF $x_p$ is $A$, THEN $y_p$ is $B$, where $A$ and $B$ are fuzzy sets defined in $\mathbb{R}$.
Fuzzy systems mainly consist of a fuzzy input layer, a fuzzy inference method, a fuzzy rule base, and a defuzzification layer [42].
The fuzzy layer is responsible for mapping the exact values entering the fuzzy system to a fuzzy set over a given universe of discourse. Fuzzification methods include the fuzzy single value method, the triangular membership function method, and the Gaussian membership function method. Since the Gaussian membership function has good anti-interference ability and its fuzzification results are closer to human cognition, it is the most commonly used in research.
The fuzzy rule base, which is the core part of the fuzzy inference system, consists of all the fuzzy rules in the system. It has two forms: one-dimensional fuzzy rules and multi-dimensional fuzzy rules. The fuzzy inference engine is mainly responsible for calculating the firing strength (activation intensity) of the rules in the rule base.
The defuzzification layer determines the exact value that best represents the fuzzy set. The defuzzification method is not unique; common choices include the maximum membership method, the center of gravity method, and the center average method.
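As a minimal sketch of how these parts fit together, the following Python snippet fuzzifies a crisp input with Gaussian membership functions, treats each membership degree as the firing strength of one rule, and defuzzifies with the center average method. The rule parameters and the names `gaussian_membership`, `infer`, and `RULES` are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math


def gaussian_membership(x, center, sigma):
    """Membership degree of x in a Gaussian fuzzy set."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


def infer(x, rules):
    """Fire every rule on x and defuzzify with the center average method.

    rules: list of (center, sigma, output_center), one tuple per IF-THEN rule.
    """
    strengths = [gaussian_membership(x, c, s) for c, s, _ in rules]
    total = sum(strengths)
    if total == 0.0:
        raise ValueError("no rule fired for this input")
    weighted = sum(w * out for w, (_, _, out) in zip(strengths, rules))
    return weighted / total


# Hypothetical rules: "IF flow is low THEN next flow is about 25", and so on.
RULES = [(20.0, 10.0, 25.0), (60.0, 10.0, 55.0)]
print(infer(40.0, RULES))
```

An input halfway between the two rule centers fires both rules equally, so the output is the average of the two rule outputs.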

Cloud Model
Inspired by probabilistic mathematics and the fuzzy set theory, Li et al. [40] created the cloud model, which is a new method to recognize uncertainty and an important way to realize two-way cognitive conversions between qualitative semantics and quantitative values. The cloud model allows for a certain degree of deviation between random phenomena and normal distribution and measures the deviation between them. At the same time, cloud models can describe the inherent correlation between randomness and fuzziness in uncertainty. Next, the definition of the cloud model is given.
Definition 2 [42]. Assume a universe $X = \{x_i\}$, where each $x_i$ is an exact value, and let $T$ be a set of linguistic terms associated with $X$. If $x$ is a random instance of $T$, and the degree of certainty $u(x)$ of $x$ for $T$ is a random number with a stable tendency within the interval $[0, 1]$, then the distribution of $x$ is called a cloud on $X$, and each random instance $x$ is called a cloud drop on the domain $X$.
The cloud model is generally described using three characteristic values: Ex, En, and He. Among them, Ex is the expectation, representing the expectation of the sample with a membership degree of 1 in T and reflecting the center position of the sample; En is entropy and He is hyperentropy, both of which are determined by the correlation between randomness and fuzziness within T simultaneously. The entropy En can be used to measure the degree of randomness in the sample, manifested as the width of the cloud model (that is the distribution range of cloud droplets on the horizontal axis within the universe). Hyperentropy He can reflect the degree of dispersion of T, manifested as the thickness of the cloud model, i.e., the degree of condensation of cloud droplets within the universe. A cloud model can be labeled as C = (Ex, En, He). The Gaussian cloud model, based on the Gaussian distribution function and Gaussian membership function, is the most important cloud model, which is defined as follows.
Definition 3 [42]. Let $U$ be the universe of discourse and $T$ be a linguistic terms set in $U$. If $x \in U$ is a random instantiation of concept $T$ and satisfies $x \sim N(Ex, En'^2)$ with $En' \sim N(En, He^2)$, then the certainty degree of $x$ belonging to $T$ satisfies
$$y = \exp\left(-\frac{(x - Ex)^2}{2(En')^2}\right),$$
where $y$ belongs to $[0, 1]$.
The distribution of X in the universe U is named a one-dimensional normal cloud, and the cloud drop can be written as (x, y). The cloud can effectively describe both fuzziness and randomness of a concept by three quantitative variables, i.e., expectation Ex, entropy En, and hyper entropy He.
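The sampling procedure implied by Definition 3 can be sketched in Python as follows. This is an illustrative forward normal cloud generator using only the standard library; the function name `normal_cloud_drops` and the example parameter values are assumptions made for this sketch.

```python
import math
import random


def normal_cloud_drops(ex, en, he, n, rng=None):
    """Generate n cloud drops (x, y) of the normal cloud C = (Ex, En, He)."""
    if en <= 0.0 or he < 0.0:
        raise ValueError("expected En > 0 and He >= 0")
    if rng is None:
        rng = random.Random()
    drops = []
    while len(drops) < n:
        en_prime = rng.gauss(en, he)       # En' ~ N(En, He^2)
        if en_prime == 0.0:
            continue                       # avoid dividing by zero below
        x = rng.gauss(ex, abs(en_prime))   # x ~ N(Ex, En'^2)
        y = math.exp(-((x - ex) ** 2) / (2.0 * en_prime ** 2))
        drops.append((x, y))
    return drops


# Hypothetical concept "about 50 vehicles per interval".
for x, y in normal_cloud_drops(50.0, 5.0, 0.5, 5, random.Random(0)):
    print(round(x, 2), round(y, 3))
```

With `he = 0` every drop lies exactly on the Gaussian membership curve; a larger `he` thickens the cloud around that curve, which is how the model adds randomness to fuzziness.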
The one-dimensional cloud was originally applied to decision-making evaluation problems. When the number of evaluation factors increases, the evaluation results deviate significantly from the actual situation. Therefore, a multi-dimensional cloud model was proposed to overcome this problem. The multi-dimensional cloud model is an extension of the one-dimensional cloud model, which adopts the one-dimensional cloud method for each attribute of the multi-dimensional cloud [43]. In the following, we give the definition of the multi-dimensional cloud model.
Definition 4 [43]. Let $U$ be a set of samples, where each $X \in U$ has the form $X = (x_1, x_2, \cdots, x_m)$, and let $T$ be a qualitative concept on the domain $U$. For every $X \in U$, there is a membership degree $\mu \in [0, 1]$ of $X$ with respect to $T$, that is, $\mu: U \to [0, 1]$.

Definition 5. Assuming that the dimensions in the universe of discourse are independent of each other, the $m$-dimensional cloud has $3m$ numerical eigenvalues: $(Ex_1, En_1, He_1, Ex_2, En_2, He_2, \cdots, Ex_m, En_m, He_m)$, where $Ex_1, Ex_2, \cdots, Ex_m$ are the expectations, $En_1, En_2, \cdots, En_m$ are the entropies, and $He_1, He_2, \cdots, He_m$ are the hyperentropies of the multi-dimensional normal cloud. A multi-dimensional cloud model can be expressed by the following formula, which is called the MEHS (Mathematical Expected Hyper Surface):
$$MEHS(x_1, x_2, \cdots, x_m) = \exp\left[-\sum_{i=1}^{m} \frac{(x_i - Ex_i)^2}{2 En_i^2}\right]$$
The foundation of uncertainty reasoning is uncertain knowledge, and the uncertain information it contains is often extracted using IF-THEN fuzzy rules. IF-THEN fuzzy rules include one-dimensional and multi-dimensional fuzzy rules. A one-dimensional fuzzy rule has the form: IF $x$ is $A$, THEN $y$ is $B$, which is called an uncertainty inference machine. The condition $A$ corresponds to the linguistic terms set of universe $U_1$ and is called the one-dimensional front part; the conclusion $B$ corresponds to the linguistic terms set of universe $U_2$ and is called the one-dimensional back part. In cloud-reasoning algorithms, $A$ is called a one-dimensional front-part cloud; it determines the membership degree $u$ of $x$ with respect to the linguistic terms set of $U_1$ and generally uses the X-condition cloud generator. $B$ is called a one-dimensional back-part cloud, and a Y-condition cloud generator is used to determine the membership degree $u$ of $x$ belonging to the linguistic terms set of $U_2$.
The one-dimensional precursor cloud generator, shown in Figure 1, converts the input data into cloud droplets and obtains the distribution range and pattern of the data, which establishes the mapping between the input data and the membership degree. In generating the membership degree, normal random numbers based on the expectation and variance are used, so both the fuzziness and the randomness of the data are taken into account throughout the calculation. The detailed algorithm is as follows:
Input: A cloud model $(Ex, En, He)$ and a quantitative value $x$.
Output: The membership degree $\mu$ of the quantitative value $x$.
(1) Produce a normal random entropy $En'$ based on the entropy $En$ and the hyperentropy $He$, i.e., $En' \sim N(En, He^2)$.
(2) Calculate the cloud droplet at the specified value $x$: $\mu = \exp\left(-\frac{(x - Ex)^2}{2(En')^2}\right)$.
The accuracy of the one-dimensional back-part cloud generator depends on the amount of data in the model. When the number of cloud droplets is large enough, its three parameter values can be calculated according to the statistical characteristics. The greater the number of cloud droplets, the better the statistical effect.
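The two generator steps, and the remark that the three cloud parameters can be recovered from statistics once enough droplets are available, can be illustrated with the sketch below. The function names are hypothetical. The estimator in `backward_cloud` is a commonly cited moment-based backward cloud generator and is an assumption here, since the text does not say which estimator is used; taking the absolute value under the square root is also a choice made for this sketch to keep the result real.

```python
import math
import random


def x_condition_membership(x, ex, en, he, rng=None):
    """Membership degree of a given x under the cloud (Ex, En, He)."""
    if rng is None:
        rng = random.Random()
    en_prime = rng.gauss(en, he)                                    # step (1)
    if en_prime == 0.0:
        raise ValueError("sampled entropy is zero")
    return math.exp(-((x - ex) ** 2) / (2.0 * en_prime ** 2))     # step (2)


def backward_cloud(samples):
    """Estimate (Ex, En, He) from droplet positions using sample moments."""
    n = len(samples)
    ex = sum(samples) / n
    en = math.sqrt(math.pi / 2.0) * sum(abs(x - ex) for x in samples) / n
    variance = sum((x - ex) ** 2 for x in samples) / (n - 1)
    he = math.sqrt(abs(variance - en ** 2))
    return ex, en, he


rng = random.Random(1)
xs = [rng.gauss(50.0, abs(rng.gauss(5.0, 0.5))) for _ in range(5000)]
print(backward_cloud(xs))
```

The estimates of Ex and En stabilize quickly as the sample grows, while He is the difference of two noisy quantities and typically needs far more droplets, which is consistent with the remark that more droplets give a better statistical effect.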

Cloud Model Inference Rule Generator
By connecting an antecedent cloud generator to a consequent cloud generator, a single rule generator is constructed. Connecting the two conditional cloud generators in sequence preserves and transmits the uncertainty of the data and completes the uncertainty inference. The algorithm is executed as follows:
Step 1: Generate a normal random number $En'_A$ with $En_A$ as the expected value and $He_A$ as the standard deviation.
Step 2: Calculate the membership degree: $\mu = \exp\left(-\frac{(x_A - Ex_A)^2}{2(En'_A)^2}\right)$.
Step 3: Generate a normal random number $En'_B$ with $En_B$ as the expected value and $He_B$ as the standard deviation.
Step 4: When the quantitative value $x_A \le Ex_A$, the antecedent cloud is activated on its rising edge, and the consequent cloud is also activated on its rising edge.
Step 5: When the quantitative value $x_A > Ex_A$, the antecedent cloud is activated on its falling edge, and the consequent cloud is also activated on its falling edge.
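A hedged Python sketch of the five steps is given below. The output formulas for the rising and falling edges are not legible in this text, so the sketch assumes the form commonly used for the Y-condition cloud generator, $y_B = Ex_B \mp En'_B\sqrt{-2\ln\mu}$; the function name and the example cloud parameters are likewise hypothetical.

```python
import math
import random


def single_rule_infer(x_a, antecedent, consequent, rng=None):
    """Single cloud rule "IF x is A THEN y is B".

    antecedent = (Ex_A, En_A, He_A), consequent = (Ex_B, En_B, He_B).
    Returns the membership degree mu and the inferred value y_B.
    """
    if rng is None:
        rng = random.Random()
    ex_a, en_a, he_a = antecedent
    ex_b, en_b, he_b = consequent
    en_a_prime = rng.gauss(en_a, he_a)                                   # Step 1
    if en_a_prime == 0.0:
        raise ValueError("sampled antecedent entropy is zero")
    mu = math.exp(-((x_a - ex_a) ** 2) / (2.0 * en_a_prime ** 2))      # Step 2
    if mu <= 0.0:
        raise ValueError("input is too far from the antecedent cloud")
    en_b_prime = rng.gauss(en_b, he_b)                                   # Step 3
    # Assumed Y-condition formula: distance from Ex_B implied by mu.
    spread = abs(en_b_prime) * math.sqrt(-2.0 * math.log(mu))
    if x_a <= ex_a:
        y_b = ex_b - spread                                              # Step 4: rising edge
    else:
        y_b = ex_b + spread                                              # Step 5: falling edge
    return mu, y_b


print(single_rule_infer(45.0, (50.0, 5.0, 0.3), (100.0, 10.0, 0.6), random.Random(0)))
```

Because both entropies are resampled on every call, repeated calls with the same input return slightly different outputs, which is how the rule carries the uncertainty of the antecedent into the consequent.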
In practice, the multi-rule inference algorithm is generally used, as shown in Figure 2. Through logical calculation, uncertainty reasoning over multiple rules can be achieved. In actual operation, the number of conditions and rules is determined by the specific characteristics of each dataset. The logical operators for combining inference rules are mainly divided into "soft AND" and "soft OR" operators. When the reasoning result must satisfy all conditional attributes, a logical "AND" operation is performed, which is called the "soft AND" algorithm. When the inference result must satisfy one or more of the conditions, a logical "OR" operation is performed, which is called the "soft OR" algorithm. To simplify the calculation, the number of multi-condition rules appearing in the inference process should be kept as small as possible. As the number of conditions in the system increases, the number of rules grows rapidly, and the computational difficulty increases significantly. Therefore, it is necessary to perform a certain degree of "dimensionality reduction" on the inference rules by splitting complex rules that are difficult to calculate, which reduces the computational workload and complexity of the model. In the literature, the "max" function is generally used to take the maximum value, and the "prod" function is used to calculate the product for the "soft AND" calculation to obtain the comprehensive membership degree.
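The two operators named at the end of this paragraph can be sketched directly: the product implements the "soft AND" across the conditions of one rule, and the maximum picks the strongest alternative. The function names and membership values below are hypothetical.

```python
import math


def soft_and(memberships):
    """Combined membership when all conditions must hold ("prod")."""
    return math.prod(memberships)


def soft_or(memberships):
    """Combined membership when any one condition suffices ("max")."""
    return max(memberships)


# Hypothetical rule base: each inner list holds the condition memberships of one rule.
rule_conditions = [[0.9, 0.8], [0.4, 0.95], [0.7, 0.2]]
activations = [soft_and(conditions) for conditions in rule_conditions]
print(activations, soft_or(activations))
```

Because the product shrinks with every additional condition, rules with many conditions tend to have small activations, which is one practical reason to split complex rules as described above.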

Improved Fuzzy Inference System
Fuzzy inference systems usually use the membership function in the fuzzification layer to project the exact value of the input values into the fuzzy set. Common fuzzy membership functions include the triangular membership function, trapezoidal membership function, generalized bell-shaped membership function, Gaussian membership function, joint-Gaussian membership function, etc., among which the Gaussian membership function is most widely used. However, due to the different driving habits of drivers, there is a certain degree of randomness in the traffic flow data, and the above-mentioned functions cannot describe the randomness of the traffic flow data well, so this paper introduces the cloud model as the membership function.
For ease of understanding, Figure 3 shows the improved fuzzy inference system. The system consists of six network layers, namely the input layer, the cloudification fuzzy layer, the cloudification rule layer, the standardization layer, the inverse cloudification layer, and the output layer, where $\{X_t \mid t = 1, 2, 3, \cdots, n\}$ denotes a time sequence of the observed traffic flow data and the output represents the predicted traffic flow at time $t + 1$. The execution function of each layer is as follows (for convenience of notation, let the input of neuron $i$ in network layer $k$ be denoted as $I_i^k$ and the output as $O_i^k$):
(1) Input layer: Each node in the input layer is directly connected to the cloudification fuzzy layer and is primarily used to receive traffic flow data within a time window.
In addition to the strong cyclical correlation that traffic flow shows with the same day of each week, traffic flow is also cyclically similar from day to day. If only the daily variation is considered, the overall trend of traffic flow over a 24-h period is reflected; if only the weekly cyclical variation is considered, the overall trend of traffic flow over a 24-h period on the same day of each week is reflected. Considering only one of the two does not fully reflect the traffic flow pattern, so both need to be considered together. To model the daily and weekly periodicity of traffic flow, a periodic input matrix $X(t)$ is constructed for time $t$, where $x$ represents the time series, and $x_d(t)$ and $x_w(t)$ represent the traffic flow data for the previous $d$ days and $w$ weeks, respectively. Therefore, the input layer of the fuzzy system is responsible for passing each component of the traffic flow history data $X(t) = \{X_d(t), X_w(t)\}$ to the cloudification fuzzy layer. Input: $I_i^1 = x_i(t)$; Output: $O_i^1$ $(i = 1, 2, \cdots, n)$, where $n = d + w$ is the total number of nodes in the first layer of the network.
(2) Cloudification fuzzy layer: This layer performs uncertainty processing on the data and maps the exact traffic flow values to the uncertainty space. Each node in this layer represents a sub-membership cloud model generated by the time value $t$ and the X-condition cloud generator, which calculates the degree of certainty of each input temporal component. In this layer, the input time series data is first clustered with the fuzzy c-means clustering algorithm. The number of membership functions in the system equals the number of clusters, and the initial parameters of the membership functions are determined by the clustering results. Here, $i = 1, 2, \cdots, n$; $j = 1, 2, \cdots, m_i$; $\mu_i^j$ denotes the $j$th cloud model membership function corresponding to the $i$th input time series; and $m_i$ denotes the number of discrete sub-clouds into which $x_i(t)$ is divided.
(3) Cloudification rule layer: This layer is mainly responsible for cloud rule matching, and each fuzzy rule has a corresponding node in this layer. The t-norm "AND" and "OR" operators are the most commonly used for combining fuzzy sets, and the soft "AND" operator is applied in the rule layer. The activation $a_k$ of each cloud rule is determined by the soft "AND" operator, where $k = 1, 2, \cdots, m$. The soft "AND" calculation uses the multi-dimensional normal cloud generator introduced in Definition 5 to calculate the membership degree of the multi-dimensional normal cloud.
This paper argues that the closer a historical traffic flow series is to the prediction time point, the more similar it is to the prediction time period. Thus, higher weights are given to the high-impact time series in the historical traffic flow sequence to compensate for the limited learning ability of the fuzzy inference system. Assume that the input sequence is $\{X(t) \mid x_1(t), x_2(t), \cdots, x_n(t)\}$ and that the $i$th time series has a high impact on the prediction result [44]. We therefore calculate a weight for each time series to improve the prediction accuracy by performing multiple linear regression on the time series data, where $w_n$ is the corresponding weight and $b$ is the bias. The weight and bias parameters in the cloud rule are obtained by minimizing the loss $L(h_\theta(x), y_t)$, where $h_\theta(x)$ is the predicted value. The final weight $W_n$ is the weight of the $n$th day before the prediction time point; the Softmax function is used to ensure that all weights sum to 1. The multi-dimensional normal cloud is then processed by the fuzzy-rule enhancement mechanism.
The fuzzy inference rule for this layer is: If the sub-membership cloud functions have membership degrees $\mu_{k1}, \mu_{k2}, \cdots, \mu_{kn}$, then the combined membership of the rule is $\mu_k$.
(4) Standardization layer: This layer is mainly responsible for normalizing the values. The normalized activation intensity $\bar{a}_k$ corresponding to the activation degree $a_k$ passed into this layer is calculated as $\bar{a}_k = a_k / \sum_{j=1}^{m} a_j$.
(5) Inverse cloudification layer: This layer quantitatively transforms the normalized fuzzy membership degree and generates a consequent cloud with the Y-condition cloud generator. It then outputs the inference result and its corresponding traffic flow value $q_k$.
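To make the data flow through layers (2) to (4) concrete, the following sketch computes Softmax weights for the input components, a weighted soft-AND activation for each cloud rule, and the normalized activations. The weighted multi-dimensional cloud formula is not legible in this text, so the sketch assumes that each weight scales the corresponding term in the exponent of the multi-dimensional normal cloud; this form, all function names, and all numbers are assumptions made for illustration only.

```python
import math
import random


def softmax(raw_weights):
    """Rescale regression weights so that they are positive and sum to 1."""
    peak = max(raw_weights)
    exps = [math.exp(w - peak) for w in raw_weights]
    total = sum(exps)
    return [e / total for e in exps]


def rule_activation(x, rule, weights, rng):
    """Weighted soft-AND activation of one cloud rule.

    x: input components; rule: one (Ex, En, He) triple per component.
    Assumed form: exp(-sum_i W_i * (x_i - Ex_i)^2 / (2 * En_i'^2)).
    """
    exponent = 0.0
    for x_i, (ex, en, he), w in zip(x, rule, weights):
        en_prime = rng.gauss(en, he)
        if en_prime == 0.0:
            raise ValueError("sampled entropy is zero")
        exponent += w * (x_i - ex) ** 2 / (2.0 * en_prime ** 2)
    return math.exp(-exponent)


def normalize(activations):
    """Standardization layer: divide each activation by the total."""
    total = sum(activations)
    if total == 0.0:
        raise ValueError("no rule was activated")
    return [a / total for a in activations]


rng = random.Random(0)
weights = softmax([0.2, 0.5, 1.1])          # hypothetical regression weights
x = [48.0, 52.0, 55.0]                      # hypothetical time-window inputs
rules = [
    [(50.0, 5.0, 0.3), (50.0, 5.0, 0.3), (50.0, 5.0, 0.3)],
    [(80.0, 8.0, 0.5), (80.0, 8.0, 0.5), (80.0, 8.0, 0.5)],
]
print(normalize([rule_activation(x, r, weights, rng) for r in rules]))
```

Under this assumed form, components with larger weights (here the most recent ones) dominate the exponent, so a mismatch on a recent value lowers the rule activation more than the same mismatch on an older value.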

Data Description and Indexes
The traffic flow data used in this paper are shown in Figure 4, where data quality represents the detected proportion, i.e., the number of vehicles detected and recorded in the current time period divided by the total number of vehicles. In addition, we adopt three indexes to reflect the performance of the improved fuzzy prediction system: Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Mean Relative Error (MRE). These three indexes are calculated as follows:

$$
\begin{aligned}
\mathrm{RMSE} &= \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2},\\
\mathrm{MAE} &= \frac{1}{N}\sum_{i=1}^{N}\left|\hat{y}_i - y_i\right|,\\
\mathrm{MRE} &= \frac{1}{N}\sum_{i=1}^{N}\frac{\left|\hat{y}_i - y_i\right|}{y_i},
\end{aligned}
\tag{10}
$$

where $N$ denotes the number of samples, $\hat{y}_i$ is the predicted value, and $y_i$ is the real value. RMSE is used to reflect the unbiasedness of sequence prediction. The smaller the value is, the smaller the dispersion of the error distribution and the better the prediction performance.
MAE represents the average absolute deviation between the predicted value and the real value. The smaller the MAE, the smaller the error and the better the prediction effect.
MRE is the average value of the relative error. The smaller the MRE, the closer the predicted value is to the actual value and the better the prediction effect.
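As a quick check on the three indexes, the sketch below computes them for a small made-up example. The function names and the sample values are illustrative assumptions, and the MRE function assumes that every real value is non-zero, which would not hold for intervals with a recorded flow of 0.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Square Error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

def mae(y_true, y_pred):
    """Mean Absolute Error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_pred - y_true)))

def mre(y_true, y_pred):
    """Mean Relative Error; assumes all real values are non-zero."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_pred - y_true) / y_true))

# Made-up flows: errors are +10, -10, and 0 vehicles
y_true = [100.0, 200.0, 300.0]
y_pred = [110.0, 190.0, 300.0]
print(rmse(y_true, y_pred), mae(y_true, y_pred), mre(y_true, y_pred))
```

For these values the MRE is (0.10 + 0.05 + 0) / 3 = 0.05, while the RMSE exceeds the MAE because squaring gives the larger errors more weight.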

Experiment Settings and Results
In this paper, the FCM (Fuzzy C-Means) method is used to generate fuzzy inference rules. The exponent of the fuzzy partition matrix is 2, the maximum number of iterations is 500, the model is trained with the BP neural network algorithm, and the time window of historical traffic flow data is 5. All experiments in this section are performed in MATLAB R2017b on Windows 10 (CPU: i7-10875H).
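To make these settings concrete, here is a minimal NumPy sketch of fuzzy c-means with exponent 2 and at most 500 iterations, applied to sliding windows of length 5. It is not the MATLAB routine used in the experiments: the helper names, the convergence tolerance, the random initialization, and the synthetic series are all assumptions made for illustration.

```python
import numpy as np

def sliding_windows(series, window=5):
    """Stack consecutive windows of the series as rows of a 2-D array."""
    series = np.asarray(series, dtype=float)
    return np.array([series[i:i + window] for i in range(len(series) - window + 1)])

def fuzzy_c_means(X, c, m=2.0, max_iter=500, tol=1e-6, seed=0):
    """Plain fuzzy c-means; returns (cluster centers, membership matrix U)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)            # memberships of each sample sum to 1
    for _ in range(max_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)              # guard against division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

# Synthetic one-day series with 288 five-minute points (not the paper's data)
series = 150.0 + 100.0 * np.sin(np.linspace(0.0, 2.0 * np.pi, 288))
X = sliding_windows(series, window=5)
centers, U = fuzzy_c_means(X, c=6, m=2.0, max_iter=500)
print(centers.shape, U.shape)
```

Each cluster center can then seed one membership function per input dimension, which is how the number of clusters fixes the number of rules.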
The number of fuzzy clusters determines the number of membership functions and the number of rules in the fuzzy system, which in turn affects the prediction performance of the final model. This paper first increases the number of fuzzy clusters from 2 to 10 in steps of 1, and then from 10 to 100 in steps of 10, observing the change in the error of the model output. The prediction results and calculation times are shown in Table 1, and Figure 5 visualizes Table 1 for comparative analysis.
In Table 1, the rate of change is the growth rate of the RMSE value (or operation time) under each cluster number relative to the RMSE value (or operation time) under the previous cluster number. It can be seen from Table 1 that as the number of membership functions increases, the RMSE of the training set gradually decreases at first. We attribute this to the fact that more fuzzy clusters provide more membership function parameters, which better reflect the characteristics of the time series data in each category and yield a lower training set RMSE. When the number of clusters increases from 2 to 6, the RMSE of the test set also decreases. However, as the number of clusters continues to increase, the RMSE of the test set gradually increases. This shows that with more membership functions, the model can fit the time series data better, but once the number of membership functions exceeds a certain threshold, the model overfits and its generalization ability deteriorates. In addition, as the number of fuzzy clusters increases, the training set RMSE decreases by less and less, while the test set RMSE increases by more and more. Take the results for 20 and 30 clusters as an example.
When the number of clusters increases from 20 to 30, the RMSE of the training set decreases by 0.888%, while the RMSE of the test set increases by 1.718%, nearly twice the decrease on the training set. This means that, for the current data set, continuing to increase the number of membership functions does not effectively improve the performance of the model and instead leads to a higher prediction error. In addition, when the number of clusters increases, the average running time of the model increases by 85.661%, indicating that the complexity of the model increases greatly, the convergence slows down considerably, and the prediction effect deteriorates. To reduce overfitting, this paper sets the number of fuzzy clusters to 6. With this setting, the prediction effect of the model is relatively good, the model is relatively simple, and the algorithm converges faster.
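The sweep over cluster numbers and the rate-of-change column can be reproduced with a short sketch. The helper name is hypothetical and the RMSE list below is a placeholder, not a row of Table 1; only the 0.888% and 1.718% figures come from the text.

```python
# Cluster numbers tested: 2..10 in steps of 1, then 20..100 in steps of 10
cluster_counts = list(range(2, 11)) + list(range(20, 101, 10))

def rate_of_change(values):
    """Percentage change of each entry relative to the previous entry."""
    return [100.0 * (cur - prev) / prev for prev, cur in zip(values, values[1:])]

# Placeholder RMSE values for three consecutive cluster numbers (illustrative only)
example_rmse = [10.0, 11.0, 9.9]
print(rate_of_change(example_rmse))

# Test-set increase relative to training-set decrease when going from 20 to 30 clusters
ratio = 1.718 / 0.888
print(round(ratio, 2))
```

The ratio is about 1.93, which is the basis for describing the test-set increase as nearly twice the training-set decrease.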

Experimental Results
According to the experimental results in Section 4.2, the experiment in this section sets the number of fuzzy clusters to 6 and the maximum number of iterations to 500. The traffic flow forecast results for 30 August 2019 are shown in Figure 6. The peak flow at this monitoring point is about 300 vehicles and generally appears between about 15:00 and 16:00. The general trend is that the traffic flow decreases from about 0:00 and remains at about 20 vehicles for about 5 h. From 5:00, the traffic gradually increases until it reaches its morning maximum at about 8:00. The traffic flow reaches the peak of the day from 15:00, then maintains a high value until around 20:00, when it starts to decrease slowly.
In addition, as shown in Figure 6, the monitoring point was disturbed by a large amount of external noise between 0:00 and 5:00 on 30 August 2019, and a large number of data points with a traffic value of 0 appeared. According to the source data, the observation rate of the monitoring point in this period is 0% (Observed = 0%); that is, the monitoring point was not working normally. On 30 August 2019, there were 288 observation points in total, of which 22 were missing, accounting for 7.6% of that day's observations. However, the improved fuzzy inference system proposed in this paper was not noticeably affected by the missing values, and its prediction effect on that day was good. The RMSE of the model was 19.32; after removing the missing points, the RMSE for that day fell to 19.10, a decrease of only 1.2%. This shows that the proposed model has good robustness and can effectively resist external noise interference in the data.
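The comparison between the full-day RMSE and the RMSE after removing missing points amounts to masking unobserved intervals before computing the error. The sketch below shows one way to do this; the function name, the boolean `observed` flag, and the three-point example are assumptions for illustration and are not taken from the source data.

```python
import numpy as np

def rmse_observed_only(y_true, y_pred, observed):
    """RMSE restricted to intervals where the detector actually reported data."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mask = np.asarray(observed, dtype=bool)
    return float(np.sqrt(np.mean((y_pred[mask] - y_true[mask]) ** 2)))

# Share of missing points on the day discussed above: 22 of 288 intervals
missing_share = 100.0 * 22 / 288
print(round(missing_share, 1))

# Tiny made-up example: the middle interval is missing and recorded as 0
y_true = [10.0, 0.0, 30.0]
y_pred = [12.0, 20.0, 30.0]
observed = [True, False, True]
print(rmse_observed_only(y_true, y_pred, observed))
```

In the toy example, dropping the missing interval removes its 20-vehicle error from the average, so the masked RMSE is lower than the unmasked one.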

Comparative Experiments
To verify the accuracy of the cloud model membership function adopted in this paper, Comparative Experiment 1 uses the traditional Gaussian membership function, linear membership function, and triangular membership function. To verify the rationality of using the FCM algorithm to determine the number of membership functions, Comparative Experiment 2 uses subtractive clustering and grid segmentation to cluster the data. According to the experimental results in Section 4.2, Comparative Experiments 1 and 2 both use 6 clusters and a maximum of 500 iterations. This section takes the period 16:00-19:00 on 30 August 2019 as an example.

The most common, simplest, and easiest-to-implement fuzzification method is the linear function method. Because this method is so simple, it can only process relatively accurate input data, for which its fuzzification performance is good. When the interference noise in the input data gradually increases, it is necessary to switch to the triangular membership function method. The operation of this function is still relatively simple, and the robustness of the calculation result is improved to a certain extent. Although the Gaussian membership function is more complicated than these two basic methods, its calculation result has good anti-interference ability and its fuzzification result is closer to the characteristics of human cognition, so it is widely used in fuzzy reasoning systems. The prediction results of Experiment 1 are shown in Figure 7.
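The difference between these fuzzification methods and the cloud model can be illustrated with a short sketch. The function names and parameter values are hypothetical, the normal cloud membership is written in its standard form with `Ex`, `En`, and `He` (expectation, entropy, hyper-entropy), and the linear membership function is omitted because its exact form is not given here.

```python
import numpy as np

def triangular(x, a, b, c):
    """Triangular membership with feet at a and c and peak at b (a < b < c)."""
    x = np.asarray(x, dtype=float)
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def gaussian(x, center, sigma):
    """Gaussian membership with a fixed width sigma."""
    x = np.asarray(x, dtype=float)
    return np.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))

def normal_cloud(x, Ex, En, He, rng):
    """Normal cloud membership: a Gaussian whose width is random, En' ~ N(En, He^2)."""
    x = np.asarray(x, dtype=float)
    En_prime = rng.normal(En, He, size=x.shape)  # one randomized width per input
    return np.exp(-((x - Ex) ** 2) / (2.0 * En_prime ** 2))

rng = np.random.default_rng(0)
flows = np.array([90.0, 120.0, 150.0])           # made-up traffic flow values
print(triangular(flows, 60.0, 120.0, 180.0))
print(gaussian(flows, 120.0, 15.0))
print(normal_cloud(flows, 120.0, 15.0, 1.5, rng))
```

Setting `He = 0` removes the randomness and reduces the cloud membership to the Gaussian one, so the hyper-entropy is the parameter that lets the cloud model represent randomness in addition to fuzziness.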
The model evaluation metrics are shown in Table 2. As can be seen from Figure 7, because the traffic flow data are disturbed by a large amount of external noise, they fluctuate considerably and vary over a wide range. The prediction results of the triangular membership function, the Gaussian membership function, and the cloud model proposed in this paper follow a consistent trend. The linear membership function has a relatively simple structure, and only its predicted trend is consistent with the actual values.
However, there are many outliers in its prediction result, and its prediction effect is poor. We attribute this to the large amount of noise in the traffic flow data and the poor anti-interference ability of the linear membership function. From Table 2, it can be seen that the cloud model is better than the other three membership function models on all three evaluation metrics. Although the triangular and Gaussian membership function models have a certain robustness, their prediction effect is still inferior to that of the cloud model. This is because the cloud model considers not only the fuzziness of the data but also its randomness, so the improved fuzzy inference system proposed in this paper has certain advantages.

Grid segmentation and subtractive clustering are two commonly used clustering methods in fuzzy inference systems. In Comparative Experiment 2, the influence radius of the subtractive clustering algorithm is set to 0.55. The experimental results are shown in Table 3. It can be seen from the table that the model errors of the FCM algorithm are lower than those of the other two clustering algorithms, which supports the use of the FCM algorithm for generating the system membership functions.
The grid-segmentation algorithm has the worst predictive effect. This may be because the grid-segmentation algorithm is a hard classification. All clustering is performed on the grid, so only regular clusters (such as clusters with horizontal or vertical boundaries) can be found, while clusters with a strong correlation along oblique boundaries cannot be detected. In addition, traffic flow data are complex high-dimensional data, and the number of grid cells in the grid-segmentation method grows exponentially with the data dimension. When the data dimension increases, the number of grid cells explodes and the time complexity increases accordingly. Its running time is as long as 7681.32 s, far higher than the 104.1 s of the FCM algorithm and the 104.977 s of the subtractive clustering algorithm.
The subtractive clustering algorithm is an improved clustering algorithm proposed for large sample data, and its efficiency is higher. However, its initial centers are all taken from points in the original data, which are not the true cluster centers in theory. This kind of algorithm gradually produces errors during the clustering process, and the accumulated errors eventually lead to poor prediction results.
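The exponential growth of grid cells mentioned above is easy to see numerically. In the sketch below, using 6 partitions per dimension and up to 5 input dimensions is an assumption chosen to mirror the cluster number and time window used elsewhere in this paper, not a setting reported for the grid-segmentation experiment; the running times are the ones quoted in the text.

```python
def grid_cells(partitions_per_dim, dims):
    """Number of grid cells when each of `dims` inputs is split into equal partitions."""
    return partitions_per_dim ** dims

# Growth of the cell count as the input dimension increases (assumed 6 partitions)
for d in range(1, 6):
    print(d, grid_cells(6, d))

# Running-time ratio between grid segmentation and FCM, from the reported times
runtime_ratio = 7681.32 / 104.1
print(round(runtime_ratio, 1))
```

Under these assumptions, five inputs already give 7776 cells, and the reported grid-segmentation run is roughly 74 times slower than FCM.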

Conclusions
This paper proposes a cloud model-based fuzzy inference system for short-term traffic flow prediction. First, it briefly introduces the basic knowledge and algorithms of fuzzy mathematics. Then, it introduces the cloud model, the cloud inference algorithm, and its rule generator. Finally, this paper uses the cloud model to fit the randomness and uncertainty of the traffic flow data and compares it, on a typical road section, with fuzzy inference systems based on the traditional Gaussian, triangular, and linear membership functions. The experimental results show that the improved fuzzy prediction system is superior and practical under different conditions. The proposed model lays a theoretical foundation for constructing short-term traffic flow forecasting models based on improved fuzzy theory.
This paper still has limitations. In future research, we will study the use of intuitionistic fuzzy sets to model the uncertainty in short-term traffic flow prediction.

Conflicts of Interest:
The authors declare that there is no conflict of interest regarding the publication of this paper.