Application of Grey Model and Neural Network in Financial Revenue Forecast

There are many factors influencing fiscal revenue, and traditional forecasting methods cannot handle the high feature dimension well. This leads to serious over-fitting and prevents a good estimate of the true future trend. The grey neural network model fused with Lasso regression is a comprehensive prediction model that combines the grey prediction model with the BP neural network model after dimensionality reduction by Lasso. It reduces the dimensionality of the original data, makes a separate prediction for each explanatory variable, and then uses a neural network to make a multivariate prediction, which makes up for the insufficient prediction accuracy of traditional methods. In this paper, we took the fiscal revenue data of Hunan Province, China, from 2005 to 2019 as the object of analysis. First, we used Lasso regression to reduce the dimensionality of the data. Because the grey prediction model performs well on small data sets, we then used it to obtain the predicted values of all explanatory variables for 2020 and 2021 from the 2005–2019 data. Finally, considering that fiscal revenue is affected by many factors, we applied the BP neural network, which handles multiple inputs well, to make the final forecast of fiscal revenue. The experimental results show that the combined model performs well in fiscal revenue forecasting.

In recent years, with the continuous development of cloud computing technology, neural network technology has ushered in a new wave and has achieved good results in many fields, such as data analysis [11], spam detection [12], image recognition [13], and autonomous driving [14]. Fang et al. [15] studied an ARMA-BP neural network combination model for forecasting fiscal revenue. Jiang et al. [16] proposed a Lasso-GRNN neural network model for local fiscal revenue that takes into account the complex nonlinear relationships among its influencing factors. Chen et al. [17] proposed a deep network prediction model based on the BP neural network. Fiscal revenue is affected by multiple factors, such as the economy and policy. A single model captures only part of the information in the data changes, so its prediction accuracy is relatively low. Building on the above research, this paper proposes a model combining GM(1,1) and the BP neural network to predict the local fiscal revenue of Hunan Province in 2020 and 2021. Compared with the single prediction model GM(1,1), the combined model not only improves the prediction accuracy but also provides a basis for complex, dynamic, and accurate forecasting of fiscal revenue.

Variable Selection Model
Lasso Regression Theory and Algorithm
Lasso regression is a shrinkage (compression) estimation method. To shrink the regression coefficients of the model, it constructs a penalty function that constrains the sum of the absolute values of the parameters to be estimated within a preset threshold [18]. When this threshold is set small enough, some regression coefficients are compressed to exactly 0, and the variables with zero coefficients can be eliminated, which achieves variable screening. Removing irrelevant coefficients also enhances the interpretability of the model.
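For $n$ observations and $d$ covariates, the Lasso method adds an L1 penalty term to the least squares criterion of the ordinary linear model:

$$\hat{\beta}^{\text{Lasso}} = \arg\min_{\beta}\left\{\sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{d} x_{ij}\beta_j\Big)^2 + \lambda\sum_{j=1}^{d}|\beta_j|\right\}$$

This is equivalent to the constrained problem

$$\hat{\beta}^{\text{Lasso}} = \arg\min_{\beta}\sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{d} x_{ij}\beta_j\Big)^2 \quad \text{subject to} \quad \sum_{j=1}^{d}|\beta_j| \le t,$$

where $t \ge 0$ is called the adjustment factor and corresponds one-to-one to $\lambda$. Let $t_0 = \sum_{j=1}^{d}|\hat{\beta}_j^{\text{OLS}}|$. When $t < t_0$, some coefficients are compressed to exactly 0, which produces sparsity, reduces the dimension of $X$, and achieves variable screening.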
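To show how the penalized criterion produces exact zeros, here is a minimal sketch (not the authors' code) of cyclic coordinate descent for $\sum_i (y_i - x_i^T\beta)^2 + \lambda\sum_j|\beta_j|$. It assumes centred data so that the intercept $\beta_0$ can be dropped; the helper names `soft_threshold` and `lasso_cd` and the synthetic data are hypothetical. Each one-dimensional update is a soft-thresholding step at $\lambda/2$, so any coefficient whose partial-residual correlation stays below that threshold lands exactly at 0.

```python
import numpy as np


def soft_threshold(z: float, gamma: float) -> float:
    """Soft-thresholding operator S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    return float(np.sign(z) * max(abs(z) - gamma, 0.0))


def lasso_cd(X: np.ndarray, y: np.ndarray, lam: float, n_sweeps: int = 500) -> np.ndarray:
    """Cyclic coordinate descent for min_b ||y - X b||^2 + lam * sum_j |b_j|.

    Assumes the columns of X and the vector y are already centred,
    so no intercept is fitted.
    """
    _, d = X.shape
    beta = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0)      # X_j^T X_j for every column j
    residual = y.astype(float).copy()  # y - X @ beta, with beta = 0 initially
    for _ in range(n_sweeps):
        for j in range(d):
            residual += X[:, j] * beta[j]   # partial residual without feature j
            rho = X[:, j] @ residual        # correlation with the partial residual
            # Minimising over b_j alone gives a soft-threshold at lam / 2.
            beta[j] = soft_threshold(rho, lam / 2.0) / col_sq[j]
            residual -= X[:, j] * beta[j]   # add the updated contribution back
    return beta


# Toy example on synthetic data: only columns 0 and 3 carry signal.
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 5))
X -= X.mean(axis=0)
y = 2.0 * X[:, 0] - 1.0 * X[:, 3] + 0.1 * rng.standard_normal(30)
y -= y.mean()
lam = 20.0
beta = lasso_cd(X, y, lam)
print(np.round(beta, 3))
```

At the solution the optimality (KKT) conditions hold: $|X_j^T r| \le \lambda/2$ for every zero coefficient and $X_j^T r = (\lambda/2)\,\mathrm{sign}(\beta_j)$ for every non-zero one, where $r$ is the final residual.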

Ridge Regression Theory and Algorithm
In the ordinary linear model, when the covariates $X^{(f)}$ are mutually independent, the ordinary least squares (OLS) estimate of $\beta$ has good properties: $\hat{\beta}^{\text{OLS}}$ is unbiased and, among all linear unbiased estimates, has the smallest variance. However, as the dimension of the covariates $X^{(f)}$ increases, the correlation among them tends to increase, and the matrix $X$ may lose full column rank or come close to it; such a matrix is commonly called ill-conditioned. When $X$ is ill-conditioned, $X^{T}X$ is singular or nearly singular. In that case $\hat{\beta}^{\text{OLS}}$ still has the smallest variance among linear unbiased estimates, but that variance is large, so the estimate is inaccurate and unstable. Ridge regression is the usual remedy in this situation.
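With the same notation ($n$ observations, $d$ covariates), the Ridge method adds an L2 penalty term to the ordinary linear model:

$$\hat{\beta}^{\text{Ridge}} = \arg\min_{\beta}\left\{\sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{d} x_{ij}\beta_j\Big)^2 + \lambda\sum_{j=1}^{d}\beta_j^2\right\}$$

which is equivalent to

$$\hat{\beta}^{\text{Ridge}} = \arg\min_{\beta}\sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{d} x_{ij}\beta_j\Big)^2 \quad \text{subject to} \quad \sum_{j=1}^{d}\beta_j^2 \le t.$$

$\hat{\beta}^{\text{Ridge}}$ is a biased estimate of $\beta$. It shrinks the OLS estimate $\hat{\beta}^{\text{OLS}}$ proportionally (for an orthonormal design, $\hat{\beta}_j^{\text{Ridge}} = \hat{\beta}_j^{\text{OLS}}/(1+\lambda)$), so it cannot compress non-zero coefficients to exactly zero.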
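For centred data without an intercept, setting the gradient of the Ridge criterion to zero gives the closed form $\hat{\beta}^{\text{Ridge}} = (X^TX + \lambda I)^{-1}X^Ty$; the added $\lambda I$ keeps the system solvable even when $X^TX$ is singular. The sketch below (the helper name `ridge_closed_form` and the data are hypothetical) checks the proportional-shrinkage property on a synthetic orthonormal design.

```python
import numpy as np


def ridge_closed_form(X: np.ndarray, y: np.ndarray, lam: float) -> np.ndarray:
    """Solve min_b ||y - X b||^2 + lam * ||b||^2 (centred data, no intercept)."""
    d = X.shape[1]
    # Solving the linear system is preferred over forming the inverse explicitly.
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((20, 4)))  # Q has 4 orthonormal columns
y = rng.standard_normal(20)

beta_ols = np.linalg.lstsq(Q, y, rcond=None)[0]     # equals Q.T @ y for orthonormal Q
beta_ridge = ridge_closed_form(Q, y, lam=1.0)
print(beta_ridge / beta_ols)                        # each ratio is 1 / (1 + lam)
```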

Comparing Lasso Regression with Ridge Regression
The difference between Lasso regression and Ridge regression is shown in Fig. 1, which depicts the constraint domains and contour lines of the two methods. The center of the ellipses, $\hat{\beta}$, corresponds to the least squares estimate of the linear model. The red elliptical contours are level curves of the residual sum of squares, and the cyan region below is the constraint domain. Lasso regression, on the left, has a square constraint domain, and Ridge regression, on the right, has a circular constraint domain. The optimal solution is the point where a contour line touches the constraint domain. The figure clearly shows that the square constraint domain of Lasso makes it easy for the tangent point to fall on a coordinate axis, so a coefficient can be taken to exactly 0, producing sparsity. The circular constraint domain of Ridge regression generally does not place the tangent point on a coordinate axis, so coefficients are not compressed to 0 and no variable selection takes place. Although Ridge regression also shrinks the original coefficients to some extent, it cannot compress them to 0, so the final model retains all the variables. In contrast, Lasso compresses the coefficients of weakly relevant variables directly to 0 and thus performs variable screening directly.
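The geometric argument of Fig. 1 can be checked numerically in the special case of an orthonormal design, where both estimators have closed forms in terms of $\hat{\beta}^{\text{OLS}}$: Lasso soft-thresholds each coefficient at $\lambda/2$, while Ridge divides each one by $1+\lambda$. This sketch assumes the penalized criteria written above and orthonormal columns; the coefficient values are made up for illustration.

```python
import numpy as np


def lasso_orthonormal(beta_ols: np.ndarray, lam: float) -> np.ndarray:
    """Lasso under an orthonormal design: soft-threshold each OLS coefficient at lam/2."""
    return np.sign(beta_ols) * np.maximum(np.abs(beta_ols) - lam / 2.0, 0.0)


def ridge_orthonormal(beta_ols: np.ndarray, lam: float) -> np.ndarray:
    """Ridge under an orthonormal design: shrink every OLS coefficient by 1/(1+lam)."""
    return beta_ols / (1.0 + lam)


beta_ols = np.array([3.0, -0.4, 1.0, 0.2])   # illustrative OLS estimates
print(lasso_orthonormal(beta_ols, lam=1.0))  # the two small coefficients become exactly 0
print(ridge_orthonormal(beta_ols, lam=1.0))  # every coefficient is halved but stays non-zero
```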

Grey System Theory
Grey system theory is an emerging interdisciplinary theory initiated by the well-known Chinese scholar Deng Julong. It targets "poor information" or "small sample" systems, that is, systems whose information is incomplete. While reflecting reality, grey system theory analyzes and mines the incomplete information to obtain unknown information, and then gives a more accurate description of the overall development law and change trend of the system [19].

Grey Sequence Generation
The information in a grey system is usually chaotic. Grey generation is the process of mining originally irregular data by generating grey sequences in order to discover how the data change. Grey generation adjusts the values and nature of the data while keeping the form of the original sequence, which reveals the regularity of the data and weakens their randomness. It provides the basis and direction for modeling and decision-making: it can uncover hidden properties of a sequence, expose a monotonically increasing trend hidden in it, and turn incomparable sequences into comparable ones [20].
The commonly used grey sequence generation methods are the accumulating generation operator, the inverse accumulating generation operator, average-generating arithmetic operators, level ratio generation, and buffer generation [21]. This paper uses the accumulating generation operator and the average-generating arithmetic operators, which are briefly described below.
The accumulating generation operator (AGO) is the most basic and important generation method in grey theory. Accumulation transforms the original sequence so that the new sequence shows stronger regularity and predictability, which weakens the randomness of the original data. Let the original sequence be

$$X^{(0)} = \big(x^{(0)}(1), x^{(0)}(2), \dots, x^{(0)}(n)\big),$$

and let

$$x^{(1)}(k) = \sum_{i=1}^{k} x^{(0)}(i), \qquad k = 1, 2, \dots, n.$$

Then

$$X^{(1)} = \big(x^{(1)}(1), x^{(1)}(2), \dots, x^{(1)}(n)\big)$$

is the accumulating generation sequence of $X^{(0)}$. Applying the operator repeatedly yields accumulated sequences of any order.
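The accumulation and its inverse map directly onto a cumulative sum and a first difference. This is a minimal sketch with hypothetical helper names `ago` and `iago` and a made-up series; it shows that accumulation turns a non-monotone, non-negative series into a non-decreasing one and that the inverse operator recovers the original exactly.

```python
import numpy as np


def ago(x0):
    """1-AGO: x1(k) = x0(1) + x0(2) + ... + x0(k)."""
    return np.cumsum(np.asarray(x0, dtype=float))


def iago(x1):
    """Inverse AGO: x0(1) = x1(1) and x0(k) = x1(k) - x1(k-1) for k >= 2."""
    x1 = np.asarray(x1, dtype=float)
    return np.concatenate([x1[:1], np.diff(x1)])


x0 = [2.0, 3.0, 5.0, 4.0]   # illustrative series that is not monotone
x1 = ago(x0)                # accumulated sequence is non-decreasing
print(x1, iago(x1))
```

Calling `ago` on its own output gives the second-order accumulation, and so on for higher orders.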
Average-generating arithmetic operators include adjacent generation and non-adjacent generation. In adjacent generation, which applies to an equally spaced original sequence, each pair of adjacent data points is averaged to generate a new data point, so the new sequence is one element shorter than the original. Non-adjacent generation is used when the original sequence is not equally spaced or contains abnormal points: the mean of the neighboring data replaces the abnormal point. It can also fill missing points in the original sequence with reasonably constructed values, which resolves vacancies caused by missing data and yields a complete sequence.
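As above, let the original sequence be $X^{(0)} = \big(x^{(0)}(1), x^{(0)}(2), \dots, x^{(0)}(n)\big)$, and let

$$x^{(*)}(k) = p\,x^{(0)}(k) + (1-p)\,x^{(0)}(k-1), \qquad k = 2, 3, \dots, n,$$

where $p \in [0, 1]$ is called the generation coefficient. Its value sets the relative weights of the new and old information in the new sequence $X^{(*)}$. Usually $p = 0.5$, which gives the adjacent mean.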
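Both forms of mean generation are one-liners in NumPy. In this sketch the helper names `mean_generation` and `fill_gap` are hypothetical, and `fill_gap` takes a 0-based index of an interior point to replace with the mean of its two neighbours, which is our reading of non-adjacent generation.

```python
import numpy as np


def mean_generation(x, p=0.5):
    """x*(k) = p*x(k) + (1-p)*x(k-1) for k = 2..n; the result is one element shorter."""
    x = np.asarray(x, dtype=float)
    return p * x[1:] + (1.0 - p) * x[:-1]


def fill_gap(x, k):
    """Non-adjacent generation (sketch): replace x[k] by the mean of x[k-1] and x[k+1].

    k is the 0-based index of an interior missing or abnormal point.
    """
    x = np.asarray(x, dtype=float).copy()
    x[k] = 0.5 * (x[k - 1] + x[k + 1])
    return x


print(mean_generation([2.0, 4.0, 8.0]))         # adjacent means 3.0 and 6.0
print(fill_gap([1.0, np.nan, 3.0, 4.0], k=1))   # the NaN is replaced by 2.0
```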

Grey Prediction Model GM(1,1)
The GM(1,1) model is the classic model of grey theory. The two 1s in the parentheses denote a first-order differential equation and one variable. The GM(1,1) model first accumulates the original sequence, which turns non-negative original data into a non-decreasing sequence. It then establishes a differential equation for the accumulated sequence, solves for the equation coefficients by least squares, predicts the accumulated sequence, and finally restores the predictions of the original sequence by inverse accumulation.
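For the original sequence $X^{(0)} = \big(x^{(0)}(1), \dots, x^{(0)}(n)\big)$, the accumulated grey generation sequence is $X^{(1)} = \big(x^{(1)}(1), \dots, x^{(1)}(n)\big)$ with $x^{(1)}(k) = \sum_{i=1}^{k} x^{(0)}(i)$. Let

$$x^{(2)}(k) = 0.5\,x^{(1)}(k) + 0.5\,x^{(1)}(k-1), \qquad k = 2, 3, \dots, n,$$

which gives the sequence of adjacent mean values $X^{(2)} = \big(x^{(2)}(2), \dots, x^{(2)}(n)\big)$ (here $X^{(2)}$ denotes this mean sequence, not a second accumulation). With these expressions, the original form of the GM(1,1) model is

$$x^{(0)}(k) + a\,x^{(2)}(k) = b, \qquad k = 2, 3, \dots, n,$$

where $a$ is the development coefficient and $b$ is the grey action quantity.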
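The parameters are estimated by least squares:

$$\begin{bmatrix} a \\ b \end{bmatrix} = (B^{T}B)^{-1}B^{T}Y, \qquad B = \begin{bmatrix} -x^{(2)}(2) & 1 \\ -x^{(2)}(3) & 1 \\ \vdots & \vdots \\ -x^{(2)}(n) & 1 \end{bmatrix}, \qquad Y = \begin{bmatrix} x^{(0)}(2) \\ x^{(0)}(3) \\ \vdots \\ x^{(0)}(n) \end{bmatrix}.$$

The corresponding differential equation

$$\frac{dx^{(1)}}{dt} + a\,x^{(1)} = b$$

is called the whitening equation of GM(1,1). Solving it and discretizing gives the time response sequence

$$\hat{x}^{(1)}(k+1) = \left(x^{(0)}(1) - \frac{b}{a}\right)e^{-ak} + \frac{b}{a}, \qquad k = 0, 1, 2, \dots$$

Finally, the fitted and predicted values of the original sequence are obtained by inverse accumulation:

$$\hat{x}^{(0)}(k+1) = \hat{x}^{(1)}(k+1) - \hat{x}^{(1)}(k), \qquad k = 1, 2, \dots,$$

with $\hat{x}^{(0)}(1) = x^{(0)}(1)$.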
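The GM(1,1) steps above can be sketched as a small Python class. This is an illustrative reconstruction from the formulas in this section, not the authors' program; the class name `GM11` and its methods are hypothetical, and the check uses a synthetic series growing 10% per step instead of the Hunan data. For such a geometric series the grey difference equation holds exactly with $a = -2(r-1)/(r+1)$ and $b = 2/(r+1)$, so the least squares step recovers those values.

```python
import numpy as np


class GM11:
    """Minimal GM(1,1) grey forecasting model (illustrative sketch)."""

    def fit(self, x0):
        x0 = np.asarray(x0, dtype=float)
        if x0.ndim != 1 or x0.size < 4:
            raise ValueError("GM(1,1) needs a 1-D series with at least 4 points")
        x1 = np.cumsum(x0)                       # accumulated sequence X^(1)
        z = 0.5 * (x1[1:] + x1[:-1])             # mean sequence X^(2), k = 2..n
        B = np.column_stack([-z, np.ones_like(z)])
        Y = x0[1:]
        (a, b), *_ = np.linalg.lstsq(B, Y, rcond=None)  # least squares for [a, b]
        if a == 0:
            raise ValueError("development coefficient a is zero")
        self.a, self.b, self.x0 = float(a), float(b), x0
        return self

    def predict(self, horizon=0):
        """Fitted values for the n observed points followed by `horizon` forecasts."""
        n = self.x0.size
        k = np.arange(n + horizon, dtype=float)
        c = self.b / self.a
        # Time response: x1_hat(k+1) = (x0(1) - b/a) * exp(-a k) + b/a, k = 0, 1, ...
        x1_hat = (self.x0[0] - c) * np.exp(-self.a * k) + c
        # Inverse accumulation restores the original scale; x0_hat(1) = x0(1).
        return np.concatenate([[self.x0[0]], np.diff(x1_hat)])


x0 = 1.1 ** np.arange(15)         # synthetic series: 15 points, 10% growth
model = GM11().fit(x0)
pred = model.predict(horizon=2)   # 15 fitted values followed by 2 forecasts
print(model.a, model.b, pred[-2:])
```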

Neural Network Theory
An artificial neural network is a computational model designed to simulate the neural network of the human brain in structure, working mechanism, and function [22]. Loosely analogous to biological neurons, it is composed of many interconnected nodes (artificial neurons) and can model complex relationships in data. Each connection between nodes carries a weight that represents the influence of one node on another. Each node computes a weighted combination of the information arriving from the nodes connected to it and feeds the result into an activation function to obtain a new activity value (excitation or inhibition). The activation function introduces nonlinearity into the network, which allows it to solve more complex problems. Commonly used activation functions include the sigmoid function and the ReLU function [23,24].
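The node computation just described (weighted sum, then activation) fits in a few lines. This sketch uses hypothetical helper names and made-up weights; it shows the sigmoid squashing into $(0, 1)$ and ReLU passing only positive inputs.

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid: maps any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def relu(z):
    """Rectified linear unit: max(0, z) elementwise."""
    return np.maximum(0.0, z)


def neuron(inputs, weights, bias, activation=relu):
    """One artificial neuron: weighted sum of inputs plus bias, then the activation."""
    return activation(np.dot(weights, inputs) + bias)


print(sigmoid(0.0), relu(np.array([-2.0, 3.0])))
# 0.5*1 + (-1)*2 + 0.25 = -1.25, which ReLU maps to 0.0
print(neuron(np.array([1.0, 2.0]), np.array([0.5, -1.0]), bias=0.25))
```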
The BP (error back propagation) learning algorithm is one of the most successful neural network learning algorithms. BP networks are generally multi-layered, and a closely related concept is the multilayer perceptron [25]. The term multilayer perceptron emphasizes that the network is structurally composed of multiple layers, whereas the term BP neural network emphasizes that the network is trained with the error back propagation algorithm. In a BP neural network, the weight parameters of each neuron are adjusted through back propagation to reduce the output error.
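To make the back propagation step concrete, here is a minimal NumPy sketch of a network with one ReLU hidden layer and a linear output, trained by full-batch gradient descent on the mean squared error. The function name `train_bp`, the layer size, learning rate, and the synthetic data are all assumptions for illustration; the paper's own network was built with Keras.

```python
import numpy as np


def train_bp(X, y, hidden=8, lr=0.05, epochs=500, seed=0):
    """Train a 1-hidden-layer network (ReLU hidden, linear output) by back propagation.

    Returns the learned parameters and the mean squared error of every epoch.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.5, (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, 1))
    b2 = np.zeros(1)
    y = y.reshape(-1, 1)
    losses = []
    for _ in range(epochs):
        # Forward pass.
        h_pre = X @ W1 + b1
        h = np.maximum(h_pre, 0.0)
        out = h @ W2 + b2
        err = out - y
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: propagate the output error back through each layer.
        g_out = 2.0 * err / n                  # d(MSE)/d(out)
        g_W2 = h.T @ g_out
        g_b2 = g_out.sum(axis=0)
        g_h = (g_out @ W2.T) * (h_pre > 0)     # ReLU derivative is 1 where h_pre > 0
        g_W1 = X.T @ g_h
        g_b1 = g_h.sum(axis=0)
        # Gradient-descent update of every weight and bias.
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2
    return (W1, b1, W2, b2), losses


rng = np.random.default_rng(42)
X = rng.uniform(-1.0, 1.0, (40, 2))
y = X[:, 0] ** 2 + 0.5 * X[:, 1]      # synthetic nonlinear target
params, losses = train_bp(X, y)
print(f"MSE: first epoch {losses[0]:.4f}, last epoch {losses[-1]:.4f}")
```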

The Proposed Model
To better predict local fiscal revenue, we propose a combined model, shown in Fig. 2. First, the combined model runs the Lasso algorithm to analyze the main factors affecting local fiscal revenue and eliminates redundant factors whose Lasso coefficients are compressed to 0. Second, it applies the GM(1,1) model to each main influencing indicator to obtain its predicted values. Third, the GM(1,1) predictions are used as the input samples of the neural network, and the corresponding actual local fiscal revenue values are used as the output samples for model training. Finally, the fiscal revenue forecast is obtained once training has adjusted the weights and thresholds of the network nodes.
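Read as a data flow, the four steps might be wired together as below. This is our sketch of Fig. 2, not the authors' code: `combined_forecast` is a hypothetical function, and the univariate forecaster and the regressor are passed in as callables so that GM(1,1) and the BP network can be plugged in. The stand-ins `naive_forecast` and `fit_linear` exist only so the sketch runs end to end.

```python
from typing import Callable

import numpy as np

# forecast_1d(series, horizon) -> len(series) fitted values followed by `horizon` forecasts
Forecaster = Callable[[np.ndarray, int], np.ndarray]
# fit_regressor(inputs, targets) -> trained model as a prediction function
Regressor = Callable[[np.ndarray, np.ndarray], Callable[[np.ndarray], np.ndarray]]


def combined_forecast(X, y, lasso_coef, forecast_1d: Forecaster,
                      fit_regressor: Regressor, horizon: int = 2):
    """Data flow of the combined model, as we read Fig. 2 (sketch)."""
    keep = np.flatnonzero(np.asarray(lasso_coef) != 0)    # step 1: Lasso screening
    n = X.shape[0]
    # Step 2: univariate forecast of every kept column (fitted rows + future rows).
    inputs = np.column_stack([forecast_1d(X[:, j], horizon) for j in keep])
    predict = fit_regressor(inputs[:n], y)                # step 3: train on fitted inputs
    return keep, predict(inputs[n:])                      # step 4: forecast the target


def naive_forecast(series, horizon):
    """Hypothetical stand-in for GM(1,1): fitted = series, forecast = last value repeated."""
    return np.concatenate([series, np.repeat(series[-1], horizon)])


def fit_linear(inputs, targets):
    """Hypothetical stand-in for the BP network: least squares with an intercept."""
    design = np.column_stack([inputs, np.ones(len(inputs))])
    w = np.linalg.lstsq(design, targets, rcond=None)[0]
    return lambda new: np.column_stack([new, np.ones(len(new))]) @ w


X = np.column_stack([np.arange(1.0, 7.0), np.arange(6.0) ** 2])
y = 2.0 * X[:, 0] + 1.0
keep, forecast = combined_forecast(X, y, [0.8, 0.0], naive_forecast, fit_linear)
print(keep, forecast)
```

Passing the models as arguments keeps the pipeline independent of any particular library, which makes each stage easy to swap or test in isolation.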

Data Acquisition and Variable Selection
The main factors affecting local fiscal revenue include general public budget expenditure, total retail sales of consumer goods, fixed asset investment, total wages of employees, the resident consumption index, regional GDP, and other indicators. Based on the literature on the structure of local fiscal revenue and on the current economic situation of Hunan Province, we chose general public budget revenue as the explained variable and 20 explanatory variables, such as public budget expenditure and fixed asset investment [26][27][28]. These explanatory variables are listed in Tab. 1. We used the latest 15 years of data, from 2005 to 2019, for the experiment. This span is long enough to reflect the changes in the data while still meeting the small-sample requirement of the grey model: the fiscal revenue sample size does not exceed 20, which matches the grey system's strength in forecasting small samples. All data are from the "Hunan Provincial Statistical Yearbook 2020" (http://222.240.193.190/2020tjnj/indexch.htm).

Data Description and Statistics
We first carried out a comprehensive statistical description of the data to obtain an overall picture of it. Descriptive statistics usually summarize each variable by its maximum, minimum, mean, and standard deviation. We computed these four quantities in Python and used the Pandas library to arrange the results as a DataFrame. The output is shown in Tab. 2.

Tab. 1 (continued). Explanatory variables X10–X20:
X10 Per capita gross domestic product
X11 Output value ratio of the tertiary industry to the secondary industry
X12 Total agricultural output value
X13 Industrial output value
X14 Total import and export
X15 Total export
X16 Total import
X17 Consumer Price Index
X18 Registered population at the end of the year
X19 Permanent population at the end of the year
X20 Number of employees at the end of the year

The original data and the statistical indicators in Tab. 2 show that the local budget revenue of Hunan Province has increased significantly and that all the indicators have also increased across the board. The standard deviation of the explained variable Y is as high as 954.59, indicating large differences between years. Local budget revenue has grown substantially since 2010, which also indicates that Hunan has developed rapidly over the past ten years. The explanatory variables X6, X7, X8, and X9 show that the GDP of Hunan Province has risen steadily. From 2005 to 2015, the secondary industry accounted for the highest proportion of GDP and grew the fastest, which shows that Hunan Province vigorously developed industry and attracted a large number of industrial enterprises during that decade. In 2016, the tertiary industry's GDP began to surpass that of the secondary industry.
The industrial structure of Hunan Province as a whole has begun to transform gradually, and the service industry has steadily risen. Taken together, variables X3 and X4 show that the income of Hunan residents has increased and their living standards have improved greatly, which has attracted more people to live and work in Hunan and raised the values of variables X18, X19, and X20.
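The four summary statistics can be produced in one call with pandas' `DataFrame.agg`. The numbers below are illustrative only and are not the Hunan yearbook values; note that pandas' `std` is the sample standard deviation (ddof = 1), and we do not know which convention Tab. 2 used.

```python
import pandas as pd

# Illustrative numbers only; these are NOT the Hunan yearbook values.
df = pd.DataFrame(
    {
        "Y": [100.0, 120.0, 150.0, 170.0],
        "X1": [80.0, 95.0, 110.0, 140.0],
    },
    index=[2016, 2017, 2018, 2019],
)

# Maximum, minimum, mean and sample standard deviation of every column.
stats = df.agg(["max", "min", "mean", "std"])
print(stats)
```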

Correlation Analysis
Correlation analysis is a statistical method for describing the correlation between variables. Because correlation is a non-deterministic relationship, it can be used for a preliminary judgment of how strongly the dependent variable is related to each explanatory variable. The most commonly used correlation coefficients are the Pearson correlation coefficient and the Spearman rank correlation coefficient; the Pearson coefficient is used in the experiment. For two variables $x$ and $y$ observed $n$ times, it is defined as

$$\rho_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}$$

Based on the correlation coefficient $\rho$, the degree of correlation can be graded as shown in Tab. 3.
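The Pearson formula can be written out term by term and cross-checked against NumPy's `corrcoef`; for a whole table of variables, pandas' `DataFrame.corr(method="pearson")` returns the full matrix that a heat map like Fig. 3 would display. The helper name `pearson` and the sample values are hypothetical.

```python
import numpy as np


def pearson(x, y) -> float:
    """Pearson correlation coefficient, written out term by term from the formula."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return float((dx @ dy) / (np.sqrt(dx @ dx) * np.sqrt(dy @ dy)))


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
print(pearson(x, y), np.corrcoef(x, y)[0, 1])   # the two values agree
```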
In order to show the degree of correlation more intuitively, we used a heat map to display the correlation coefficients of these 20 explanatory variables, as shown in Fig. 3.
In Fig. 3, blue indicates a positive correlation between features and red a negative one; the deeper the blue, the stronger the positive correlation, and the deeper the red, the more negative the coefficient. Among them, variables X11, X17, and X20 are relatively weakly correlated with the other explanatory variables, which makes them candidates for elimination in the later feature selection.

Feature Selection and Dimensionality Reduction
Because a total of 20 explanatory variables are selected, the sample data are relatively complex and the key features are not obvious, so the Lasso algorithm is used to reduce dimensionality and select the most important features. The results of running the Lasso algorithm from Python's scikit-learn (sklearn) library are shown in Tab. 4.
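A minimal scikit-learn sketch of this screening step, on synthetic stand-in data rather than the 20 yearbook variables: features are standardized first because the L1 penalty is scale-sensitive, and the columns with non-zero coefficients are kept. The value `alpha=0.5` is arbitrary here; in practice it would be tuned, for example with `LassoCV`. Note that scikit-learn's Lasso minimizes $\frac{1}{2n}\lVert y - Xw\rVert^2 + \alpha\lVert w\rVert_1$, so its `alpha` is a rescaled version of the $\lambda$ used earlier.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the explanatory variables; only x0 and x2 drive y.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.standard_normal((50, 6)), columns=[f"x{i}" for i in range(6)])
y = 3.0 * X["x0"] - 2.0 * X["x2"] + 0.1 * rng.standard_normal(50)

# Lasso is scale-sensitive, so standardize the features before fitting.
X_std = StandardScaler().fit_transform(X)
lasso = Lasso(alpha=0.5).fit(X_std, y)

# Keep only the variables whose coefficients were not shrunk to zero.
selected = X.columns[lasso.coef_ != 0].tolist()
print(selected)
```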

Grey Model Predicting General Public Budget Revenue
After the screening in the previous section, 10 of the original 20 explanatory variables are retained. Using the grey model, each of these 10 variables is forecast separately over the short term to obtain its values for 2020 and 2021. We implemented GM(1,1) as a class, imported it into the main program, and used it to forecast the 10 explanatory variables. Before forecasting, it is necessary to test whether each variable's data are suitable for the grey prediction model; the smooth ratio is an indicator designed specifically to measure this applicability.
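The smooth ratio is commonly defined as $\rho(k) = x^{(0)}(k)/x^{(1)}(k-1)$ for $k = 2, \dots, n$; we assume this is the definition used here. The sketch below applies the criterion stated in the next paragraph (more than 60% of ratios below 0.5) to a synthetic series with 10% growth; the helper names are hypothetical.

```python
import numpy as np


def smooth_ratios(x0):
    """Smooth ratio rho(k) = x0(k) / x1(k-1) for k = 2..n, with x1 the accumulated sequence."""
    x0 = np.asarray(x0, dtype=float)
    x1 = np.cumsum(x0)
    return x0[1:] / x1[:-1]


def passes_smoothness_test(x0, threshold=0.5, min_share=0.6):
    """True when more than `min_share` of the smooth ratios fall below `threshold`."""
    rho = smooth_ratios(x0)
    share = float(np.mean(rho < threshold))
    return share > min_share, share


x0 = 1.1 ** np.arange(15)      # synthetic series growing 10% per step
ok, share = passes_smoothness_test(x0)
print(ok, round(share, 3))
```

For this series only the first two of the 14 ratios exceed 0.5, so the share is 12/14.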
In this paper, the explanatory variable X7 is used to illustrate the performance of the grey prediction model. First, the data applicability test was carried out: the data are considered suitable for the grey prediction model when more than 60% of the smooth ratios of the original data are less than 0.5. The smooth ratio of the original data for each year is shown in Fig. 4. The proportion of values below 0.5 reaches 85.7%, so the data can be forecast with the grey model. Feeding the X7 data into the GM(1,1) model, we fitted the original data and produced forecasts; the result is shown in Fig. 5. Fig. 5 shows only a small error between the fitted data and the original data. The forecast shows an upward trend, indicating that the total output value of the primary industry in Hunan Province will increase steadily in 2020 and 2021. However, the plot alone cannot determine the quality of the fit and the forecast, and quantitative measures are needed. Two indicators are usually used to describe the quality of the fit: the relative residual and the grade ratio deviation. When the relative residual is less than 0.2 and the grade ratio deviation is less than 0.15, the model fits very well. We calculated the relative residual and the grade ratio deviation of X7 for each year, as shown in Fig. 6; both indicators for the GM(1,1) fitted data pass the test comfortably. The posterior difference ratio is usually used to verify the quality of the forecast, with the test standards shown in Tab. 5. For the 2020 and 2021 forecasts, the posterior difference ratio of X7 is 0.27032, which meets the first-level accuracy standard.
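The three checks can be sketched with their common textbook definitions, which we assume match the paper's: relative residual $|x^{(0)}(k) - \hat{x}^{(0)}(k)|/x^{(0)}(k)$; grade ratio deviation $1 - \frac{1-0.5a}{1+0.5a}\,\sigma(k)$ with level ratio $\sigma(k) = x^{(0)}(k-1)/x^{(0)}(k)$; and posterior difference ratio $C = S_2/S_1$, the standard deviation of the residuals over that of the original data. The grading thresholds of Tab. 5 are not reproduced, and the series below is synthetic.

```python
import numpy as np


def relative_residuals(x0, x0_hat):
    """|x0(k) - x0_hat(k)| / x0(k) for every k."""
    x0 = np.asarray(x0, dtype=float)
    return np.abs(x0 - np.asarray(x0_hat, dtype=float)) / x0


def grade_ratio_deviation(x0, a):
    """1 - (1 - 0.5a)/(1 + 0.5a) * sigma(k), with level ratio sigma(k) = x0(k-1)/x0(k)."""
    x0 = np.asarray(x0, dtype=float)
    sigma = x0[:-1] / x0[1:]
    return 1.0 - (1.0 - 0.5 * a) / (1.0 + 0.5 * a) * sigma


def posterior_difference_ratio(x0, x0_hat):
    """C = S2 / S1: std of the residuals divided by std of the original data."""
    x0 = np.asarray(x0, dtype=float)
    residual = x0 - np.asarray(x0_hat, dtype=float)
    return float(np.std(residual) / np.std(x0))


x0 = 1.1 ** np.arange(15)    # synthetic geometric series
a = -0.2 / 2.1               # development coefficient GM(1,1) yields for 10% growth
x0_hat = x0 * 1.01           # pretend fit that is uniformly 1% too high
print(relative_residuals(x0, x0_hat).max(),
      np.abs(grade_ratio_deviation(x0, a)).max(),
      posterior_difference_ratio(x0, x0_hat))
```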
Using the GM(1,1) model, we forecast all the explanatory variables for 2020 and 2021 and used the posterior difference ratio to assess the quality of each forecast. The results are shown in Tab. 6. Except for variable X11, the forecast accuracy is very good for every variable, which also shows that the grey prediction model forecasts short time series well. Because the poor forecast of X11 could degrade the later neural network forecast of fiscal revenue, X11 was removed manually in the experiment. Since the above experiments confirmed the feasibility and accuracy of GM(1,1) for short time series, we also applied GM(1,1) directly to the variable Y (fiscal revenue); the result is shown in Fig. 7. The figure shows that the fit is good, but the forecast grows noticeably faster than the trend of previous years. Our analysis is that fiscal revenue is affected by multiple variables, whereas the GM(1,1) model extrapolates the future trend from the history of a single variable without considering the other influencing factors, so its forecast is inaccurate. We therefore decided to use a neural network for the final forecast.

Neural Network Predicting General Public Budget Revenue
Using the GM(1,1) model, we obtained the predicted values of the 9 explanatory variables X1, X3, X4, X7, X8, X13, X15, X16, and X19, which the neural network then uses to predict fiscal revenue. The number of network layers must be set in advance, and a BP neural network usually has no more than two hidden layers. The sample size here is not large, so two hidden layers are used. Choosing the number of neurons in the hidden layers also requires care. If the hidden layers have too few nodes, the network lacks the necessary learning and information processing capability; if they have too many, the network structure becomes much more complex and learning slows down. The Kolmogorov method [29] is the most commonly used rule for setting the number of hidden neurons: for n inputs it gives 2n + 1 hidden nodes, so with 9 inputs the number was set to 19. Because neural network models are particularly sensitive to the scale of their inputs, large differences in the magnitude of the data make the trained model very inaccurate. Therefore, all 9 explanatory variables must be brought to the same magnitude before training begins. The z-score method is used for standardization.

$$x_{\text{new}} = \frac{x_{\text{raw}} - \mu}{\sigma}$$

where $\mu$ and $\sigma$ are the mean and standard deviation of each variable.

Python's Keras library is an open-source, high-level deep learning library that can run on TensorFlow or Theano. We used Keras to build a 3-layer BP neural network with ReLU as the activation function. A key parameter when building a BP network with Keras is BATCH_SIZE, the number of samples used in one iteration of the algorithm. If it is too large, fewer iterations are needed, but the gradient descent becomes less effective and the model performs worse. If it is too small, each correction follows the gradient direction of only a few individual samples, and training has difficulty converging. BATCH_SIZE was set to 7 in the experiment.
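The configuration described above (z-score inputs, two hidden layers of $2n + 1 = 19$ ReLU units for $n = 9$ inputs, batch size 7) can be sketched as follows. To keep the example light, it swaps in scikit-learn's `MLPRegressor`, which is also trained by back propagation, in place of the Keras model the paper used; a Keras equivalent would stack two `Dense(19, activation="relu")` layers and a `Dense(1)` output. Reading the "3-layer" network as two hidden layers plus the output layer is our assumption, and all data below are synthetic.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor


def zscore_fit(X):
    """Column means and standard deviations used for z-score scaling."""
    return X.mean(axis=0), X.std(axis=0)


n_inputs = 9
n_hidden = 2 * n_inputs + 1          # Kolmogorov rule 2n + 1, i.e. 19 here

rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 10.0, (15, n_inputs))      # synthetic stand-in for 15 years
y_train = X_train @ rng.uniform(0.0, 1.0, n_inputs)   # synthetic target
X_future = rng.uniform(0.0, 10.0, (2, n_inputs))      # stand-in for 2020/2021 inputs

mu, sd = zscore_fit(X_train)


def scale(A):
    """Apply the training-set statistics to any input rows, including future ones."""
    return (A - mu) / sd


net = MLPRegressor(hidden_layer_sizes=(n_hidden, n_hidden), activation="relu",
                   batch_size=7, max_iter=2000, random_state=0)
net.fit(scale(X_train), y_train)
print(net.predict(scale(X_future)))
```

Reusing the training mean and standard deviation for the forecast inputs keeps the 2020 and 2021 rows on the same scale the network was trained on.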
After training the neural network model, we used the model.predict() function to forecast fiscal revenue for 2020 and 2021; the result is shown in Fig. 8. The figure shows a relatively stable upward trend in fiscal revenue for 2020 and 2021. Compared with the forecast of the GM(1,1) model alone in the previous section, the neural network forecast rises more gently and is more in line with the growth pattern of previous years. This is because the neural network model combines multiple influencing factors, which makes it more convincing than the univariate GM(1,1) forecast. However, compared with the actual fiscal revenue data released by Hunan Red Net, the forecast for 2020 is much higher than the actual value. The actual fiscal revenue in 2020 was 300.87 billion yuan, a growth rate of 0.1%, whereas the forecast was 347.2056 billion yuan, a growth rate of 15.4%. The actual average growth rate from 2005 to 2019 was 14.48%, so the growth rate predicted by the neural network is consistent with that of the previous 15 years. The actual fiscal revenue was low because of the outbreak of the COVID-19 epidemic in early 2020, in response to which Hunan Province introduced tax and fee reduction policies. Under the combined effect of these policies and the epidemic, growth in Hunan's fiscal revenue slowed to almost zero, so the actual fiscal revenue was lower than expected.

Conclusions
To overcome the poor prediction accuracy of a single model, this paper proposed a combined model based on GM(1,1) and a neural network to predict fiscal revenue. To verify the prediction performance of the model, we first analyzed the 2005–2019 fiscal statistics in the 2020 Hunan Statistical Yearbook and selected 20 main indicators affecting fiscal revenue as explanatory variables. Second, we used the Lasso algorithm to reduce dimensionality, selecting the 10 most important of these 20 explanatory variables. Third, we applied the grey prediction model GM(1,1) to forecast each variable individually and used the predicted values as the input of the neural network. Finally, we applied the BP neural network to forecast fiscal revenue. The experimental results show that the combined model has better prediction performance. In future work, we will try other variable selection methods, such as principal component analysis, to process the variables in the early stage, and then combine them with an RBF neural network to achieve better prediction results.

Conflicts of Interest:
The authors declare that they have no conflicts of interest to report regarding the present study.