Assessing the Forecasting of Comprehensive Loss Incurred by Typhoons: A Combined PCA and BP Neural Network Model

This paper develops a joint model that combines principal component analysis (PCA) with a back propagation (BP) neural network model optimized by the Levenberg-Marquardt (LM) algorithm, and applies the joint model to the damages caused by typhoons in Fujian Province, a coastal province of China, in 2005-2015 (the latest data available). First, PCA is applied to analyze comprehensively the relationship between hazard factors, hazard bearing factors and disaster factors. Then five integrated indices are extracted through PCA: overall disaster level, typhoon intensity, damaged condition of houses, medical rescue and self-rescue capability. Finally, the BP neural network model, which takes the principal component scores as input and is optimized by the LM algorithm, is used to forecast the comprehensive loss of typhoons. The estimated average annual loss for 2005-2015 is 138.514 billion RMB, with a maximum loss of 215.582 billion RMB in 2006 and a decreasing trend since 2010 even though the typhoon intensity increases. The model was validated on three typhoon events, and the error is found to be less than 1%. These results provide information for the government to increase medical institutions and medical workers and for communities to promote residents' self-rescue capability.


Introduction
Meteorological disasters cause numerous fatalities and serious damage to property every year. According to the China marine disaster bulletin, typhoons alone account for approximately 25% of the loss of life and 94% of the property loss in China. The situation is usually most dangerous along the coast. For example, in 2013 typhoons in China's coastal cities affected 1380.34 million people, caused 121 fatalities, and resulted in a direct loss of 152.45 billion RMB according to the China marine disaster bulletin. With the acceleration of the urbanization process, while the [...] superior to other neural network models, and could predict the air quality effectively.
Grey techniques and models have been widely applied to the prevention and assessment of natural disasters since the 1990s [Yang (1997)]. Zhang et al. [Zhang and Zhong (2013)] employed the grey relational grade to develop an evaluation index and used the evaluation results as input factors to analyze the losses caused by typhoons with regression models. Results showed that the combined model performed better than the individual models used separately. In this paper, typhoons are investigated directly by the combined model.
At present, China suffers various impacts from severe typhoons, averaging nine landfalls a year that cause an estimated $3.9 billion in damages and 472 deaths [Bakkensen (2013)]. Chen et al. [Chen, Tang and Sui (2013)] built support vector regression (SVR) models with parameter optimization and analyzed severe typhoons that occurred in Guangdong Province from 1998 to 2008. The results showed that the SVR based on the genetic algorithm (GA) was the best of the three methods compared for estimating typhoon losses. The input-output model [Wang, Li, Chen et al. (2015)] assessed the indirect losses of meteorological disasters due to damaged industrial linkages and found that the secondary industry was more vulnerable to the impact of heavy rainstorms. A risk assessment model for fishing vessels during typhoon emergencies was also presented [Zhao, Niu and Bai (2016)]; results based on Zhejiang Province, China, showed that the hybrid closed loop algorithm performed best.
The above research on disaster assessment is mainly limited to the assessment of economic damage [Wang, Wu and Chen (2016)]. The comprehensive loss, which is an important indicator for measuring typhoons, is not included in the evaluation results of these methods. In recent years, scholars have begun to study the comprehensive loss further. Liu et al. [Liu, Zhang and Yang (2017)] used a classification method to assess blackouts in a power grid; based on statistical data of blackouts in an actual city, the comprehensive loss of a blackout was calculated at 99.08 RMB/kWh.
The comprehensive loss can be summed up as social indexes and economic indexes, in which the social indexes refer to the loss of casualties, and the economic indexes refer to the direct economic loss and the indirect economic loss [Wei and Zhang (1996); Yang (1997); Yang and Zhang (2010)]. Therefore, the comprehensive loss should include:
1. The loss of casualties, which arises from medical expenses, time taken off work because of disasters, recuperation, loss of the ability to work, and so on.
2. The direct economic loss, which is the most direct and important index reflecting the degree of a disaster.
3. The loss of rescue, which refers to the cost of the labor and physical resources paid by the national economy for disaster rescue.
4. The benefit loss, which refers to the value lost because of the reduction in production caused by disasters.
5. The ecological environment loss.
The ecological environment loss is difficult to estimate, so we approximate it as 1.5 times the direct economic loss, because the damaged field area is proportional to the ecological environment loss. The above data originate from the "China Statistical Yearbook" and the "China Meteorological Disaster Yearbook".
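The five components listed above can be combined in a short calculation. The following is a minimal sketch only: it assumes that the comprehensive loss is the plain sum of the five components, which the text does not state explicitly, and the class name `LossComponents` and all numerical values are hypothetical. Only the factor 1.5 for the ecological environment loss comes from the text.

```python
from dataclasses import dataclass

# Factor taken from the text: the ecological environment loss is
# approximated as 1.5 times the direct economic loss.
ECOLOGICAL_FACTOR = 1.5


@dataclass
class LossComponents:
    """Loss components of one typhoon, all in the same monetary unit."""
    casualty_loss: float
    direct_economic_loss: float
    rescue_loss: float
    benefit_loss: float

    def ecological_loss(self) -> float:
        return ECOLOGICAL_FACTOR * self.direct_economic_loss

    def comprehensive_loss(self) -> float:
        # Assumption: the five components are simply added together.
        return (self.casualty_loss + self.direct_economic_loss
                + self.rescue_loss + self.benefit_loss
                + self.ecological_loss())


# Hypothetical figures (billion RMB), for illustration only.
example = LossComponents(casualty_loss=0.2, direct_economic_loss=4.0,
                         rescue_loss=0.5, benefit_loss=1.0)
total = example.comprehensive_loss()
print(f"comprehensive loss: {total:.2f} billion RMB")
```

Under this assumption the direct economic loss enters the total with an effective weight of 2.5, so errors in that single figure dominate the estimate.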
However, existing research on typhoon assessment is generally carried out by putting together disaster-inducing factors and disaster environments, and the assessment methods for damages include mathematical statistics, fuzzy mathematics, the analytic hierarchy process and multivariate regression [Imamura and Dang (1997); Ramesh, Nagaraju and Ramanamurthy (2007)]. Since changes in the relation between the hazard factors, hazard bearing factors and disaster factors are not included in the investigation, these models have wide margins of error when applied to actual assessments [Lou, Chen and Qiu (2012)]. PCA and the BP neural network model are each highly effective in disaster assessment when used as a single method. However, as far as we know, no study incorporates both PCA and the BP neural network model to assess the comprehensive loss of typhoons. In addition, the BP neural network is not optimized in the existing studies, which lowers the precision [Wang, Song and Yin (2015); Lou, Chen and Zheng (2009); Guo, Dai and Li (2013)].
In this paper, we investigate the relation among the hazard factors, hazard bearing factors and disaster factors, and apply the PCA method to preprocess the impact factors. The resulting principal components then become the inputs of the BP model, which is optimized by the Levenberg-Marquardt (LM) algorithm to measure the comprehensive loss caused by typhoons. The remainder of the paper is organized as follows. Section 2 introduces PCA, the BP neural network model and the LM algorithm. Section 3 applies the models established in Section 2 to assess the comprehensive loss caused by typhoons. Section 4 discusses the results and some limitations of the study. Section 5 provides the conclusion.

Principal component analysis
PCA is a statistical method that turns multiple factors into fewer factors; it not only retains a large amount of the information in the original data, but also simplifies the data and reveals the relationship between variables. Hotelling first proposed PCA in 1933 [Hotelling (1933)], and it now has wide applications in the social economy, enterprise management, geology, biochemistry, medicine and other fields [Li and Sun (2010); Lin, Jiang and Guo (2014); Lu, Lee and Hadley (2014); Tarvainen, Cornforth and Jelinek (2014)]. This paper first verifies the reliability of the selected factors by calculating the commonality, then preprocesses the factors with PCA and extracts the principal components to lower the dimension of the data set.
The commonality [Skinner (1985)] refers to the proportion of a variable's variance that is explained by the common factors. Assume that each variable is affected by m common factors and a special factor, as follows:
$$X_i = a_{i1} f_1 + a_{i2} f_2 + \cdots + a_{im} f_m + e_i,$$
where $X_i$ is the normalized variable with mean 0 and variance 1, $f_1, f_2, \ldots, f_m$ are independent common factors with mean 0 and variance 1, $e_i$ is the special factor, and $a_{i1}, a_{i2}, \ldots, a_{im}$ are the factor loadings. The commonality of $X_i$ is defined as:
$$h_i^2 = a_{i1}^2 + a_{i2}^2 + \cdots + a_{im}^2.$$
In general, when the commonality is more than 0.5, the variable can be selected in the model.
PCA replaces the original indices with a new set of uncorrelated comprehensive indices. Assuming that we have p indices observed on N samples, the (i, j) element of the covariance matrix is:
$$s_{ij} = \frac{1}{N-1} \sum_{k=1}^{N} (x_{ki} - \bar{x}_i)(x_{kj} - \bar{x}_j), \quad i, j = 1, 2, \ldots, p.$$
Then the eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$ of the covariance matrix are calculated and the eigenvector $\gamma_i$ corresponding to each eigenvalue is determined:
$$\gamma_i = (\gamma_{i1}, \gamma_{i2}, \ldots, \gamma_{ip}), \quad i = 1, 2, \ldots, p.$$
The cumulative contribution rate v of the first k principal components is defined as
$$v = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{p} \lambda_i}.$$
When the cumulative variance contribution rate is more than 85%, the selected principal components can be considered to fully reflect the information of the original variables [Xie, Zhong and Cao (2015)].
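The selection rule based on the cumulative contribution rate can be illustrated with a minimal sketch. It assumes NumPy, standardizes the data before computing the covariance matrix, and runs on synthetic data; the function name `select_components` and the demo data are hypothetical and are not the typhoon data of this paper.

```python
import numpy as np


def select_components(X, threshold=0.85):
    """Return eigenvalues, eigenvectors, cumulative contribution rates and
    the smallest number of components whose cumulative rate reaches threshold."""
    X = np.asarray(X, dtype=float)
    # Standardize each index to mean 0 and variance 1.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Covariance matrix of the standardized data (p x p).
    S = np.cov(Z, rowvar=False)
    # eigh is used because S is symmetric; it returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / np.sum(eigvals)
    k = min(int(np.searchsorted(cumulative, threshold)) + 1, len(eigvals))
    return eigvals, eigvecs, cumulative, k


# Synthetic demo: 33 samples of 6 indices driven by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(33, 2))
mixing = rng.normal(size=(2, 6))
X_demo = latent @ mixing + 0.1 * rng.normal(size=(33, 6))

eigvals, eigvecs, cumulative, k = select_components(X_demo)
print("cumulative contribution rates:", np.round(cumulative, 3))
print("components retained:", k)
```

Because the data are standardized, the eigenvalues sum to the number of indices p, so each eigenvalue divided by p is the share of total variance carried by that component.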

BP neural network model
Basic principle
A BP neural network model is a multilayer feed forward neural network composed of several network layers. Fig. 1 illustrates the structure diagram of three-layer BP neural network models. Rumelhart et al. [Rumelhart, Hinton and Williams (1986)] first proposed the BP neural network model, which now has numerous applications for pattern recognition, image processing, communication, drought and urban accident [Straub and Schroder (1996); Grossberg (1988); Nakamura, Yoshida and Engelmann (2000); Jia, Pan and Yuan (2015); Behbahani, Amiri and Imaninasab (2018)].

Figure 1: BP neural network model
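A three-layer BP neural network can be utilized to approximate arbitrary nonlinear functions. The BP neural network model also has the ability of self-learning: it can extract regular patterns from data and memorize them in the weights, and hence it possesses a generalization ability. The relationship between the output of the input layer and the weighted input of the hidden layer is:
$$s_j = \sum_{i} w_{ij} x_i - \theta_j,$$
where $x_i$ is the i-th input, $w_{ij}$ is the weight from the input layer to the hidden layer, and $\theta_j$ is the hidden layer threshold for each node. The relation between the hidden layer's weighted input and output is:
$$b_j = f(s_j),$$
where $f(\cdot)$ is the activation function and $b_j$ is the hidden layer output.
The relation between the output layer's weighted input and the hidden layer output is:
$$L_t = \sum_{j} v_{jt} b_j - r_t,$$
where $L_t$ is the output layer's weighted input, $v_{jt}$ is the weight from the hidden layer to the output layer, and $r_t$ is the output layer threshold for each node.
The relation between the output layer's weighted input and the actual output is:
$$c_t = f(L_t),$$
where $c_t$ is the actual output of the network. The error function between the target output and the actual output of the BP neural network model is:
$$E = \frac{1}{2} \sum_{t} (y_t - c_t)^2,$$
where $y_t$ is the target output.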
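The forward computation of a three-layer network can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: thresholds are subtracted from the weighted sums, the hidden layer uses tansig (computed with `np.tanh`), the output layer is linear, and the symbols `W`, `theta`, `V`, `r` and the random initial values are hypothetical names and data.

```python
import numpy as np


def forward(x, W, theta, V, r):
    """Forward pass of a three-layer BP network.

    W: (n_in, n_hidden) input-to-hidden weights, theta: hidden thresholds,
    V: (n_hidden, n_out) hidden-to-output weights, r: output thresholds.
    """
    s = W.T @ x - theta   # weighted input of the hidden layer
    b = np.tanh(s)        # hidden layer output (tansig)
    L = V.T @ b - r       # weighted input of the output layer
    c = L                 # linear output layer (purelin)
    return b, c


def sse(y, c):
    """Error function: half the sum of squared output errors."""
    return 0.5 * float(np.sum((np.asarray(y) - np.asarray(c)) ** 2))


# Demo with a 5-10-1 structure and random (hypothetical) parameters.
rng = np.random.default_rng(1)
n_in, n_hidden, n_out = 5, 10, 1
W = rng.normal(scale=0.5, size=(n_in, n_hidden))
theta = np.zeros(n_hidden)
V = rng.normal(scale=0.5, size=(n_hidden, n_out))
r = np.zeros(n_out)

x = rng.normal(size=n_in)
y = np.array([1.0])
b, c = forward(x, W, theta, V, r)
E = sse(y, c)
print("network output:", c, "error:", E)
```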

Activation function settings
The activation function is a transformation function between input and output. Common activation functions are divided into the threshold type, the linear type and the S type (Sigmoid). When data enter the network, they are first transmitted from the input layer to the hidden layer. After passing through the activation function, they are transmitted to the next hidden layer and finally to the output layer. During this process, the data of each layer must be transformed by the corresponding activation function. Because the Sigmoid function acts as a nonlinear amplification coefficient, it maps the output value into (-1,1). For larger inputs the amplification coefficient is small, while for smaller inputs the amplification coefficient is larger, so the Sigmoid function is used to approximate the nonlinear relationship between input and output. The Sigmoid function is differentiable and can prevent network saturation, thus providing good support for model accuracy. If the Sigmoid function were used in the output layer, the output would be limited to (-1,1), so a linear activation function is used there so that the output can take any value. In addition, the BP network differs from other feedforward neural networks in that all hidden layer nodes in the network take Sigmoid functions, and the activation function of the input layer must be differentiable. Therefore, we take the Sigmoid function as the activation function of the input layer and hidden layer. Tab. 1 shows two common Sigmoid functions, the tansig function and the logsig function. As the activation function of the output layer, we take the linear function purelin.
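For reference, the three activation functions named above can be written out directly. The sketch below uses their standard definitions, under which tansig is mathematically identical to the hyperbolic tangent with range (-1,1), logsig has range (0,1), and purelin is the identity; the sample grid `xs` is a hypothetical illustration.

```python
import numpy as np


def tansig(x):
    """Hyperbolic tangent sigmoid, range (-1, 1)."""
    return 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0


def logsig(x):
    """Logistic sigmoid, range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def purelin(x):
    """Linear activation: the output is unbounded."""
    return x


xs = np.linspace(-5.0, 5.0, 11)
print("tansig :", np.round(tansig(xs), 3))
print("logsig :", np.round(logsig(xs), 3))
print("purelin:", purelin(xs))
```

The bounded range of the two Sigmoid functions is the reason a linear function is preferred in the output layer when the target can take any value.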

LM algorithm optimization
The slow convergence of the standard BP algorithm leads to an increase in error, so the BP algorithm needs to be optimized in practical applications. Popular optimization algorithms at present include the Gauss-Newton algorithm, the conjugate gradient algorithm and so on. This paper uses the LM algorithm to adjust the weights and optimize the BP neural network model. The LM algorithm has advantages such as good robustness, fast convergence and a non-uniform iteration direction, and hence greatly improves the convergence speed and generalization ability of the network. The LM algorithm modifies the network weights according to the following formula:
$$\omega_{k+1} = \omega_k - (J^{T} J + \mu I)^{-1} J^{T} e,$$
where $\omega$ is the weight vector, $I$ is the identity matrix, $\mu$ is the user-defined learning rate, $J$ is the Jacobian matrix containing the first derivatives of the network errors with respect to the weights, and $e$ is the vector of network errors.
In summary, the calculation steps of the LM algorithm are as follows:
(1) Set the allowable target error ε and initialize the weight and threshold vectors.
(2) Calculate the network output, the error function E and the Jacobian matrix.
(3) Calculate the modified values of the weights.
(4) If E < ε, the training stops. If E ≥ ε, the sum of the squared errors between the target output and the actual output is recalculated with the modified weights, and the steps are repeated until the sum of the squared errors reaches the target error; the training then stops and the network has converged.
In this paper, PCA and the BP neural network are combined so that the output of PCA becomes the input of the BP network. If the raw factors were instead fed directly to the BP neural network model, this would not only increase the complexity of the network, but also impair the convergence speed and self-correction ability of the model. Because the many factors are disorderly and complicated, PCA is used to extract the important information from the factors and integrate it into specific indicators, which greatly reduces the number of variables and reveals the relationship between variables more clearly, thus improving the calculation accuracy. The purpose of the combined model is to clarify the input structure of the neural network, enhance the fitting ability of the neural network model and improve the prediction accuracy of the model.
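The four steps of the LM algorithm can be sketched on a small least-squares problem. The sketch makes several assumptions that go beyond the text: the Jacobian is approximated by finite differences, µ is divided by 10 after a successful step and multiplied by 10 otherwise (a common heuristic, whereas the text treats µ as user-defined), and the demo fits a straight line rather than a neural network. All function names and data are hypothetical.

```python
import numpy as np


def numerical_jacobian(residual_fn, w, eps=1e-6):
    """Finite-difference Jacobian of the error vector with respect to w."""
    r0 = residual_fn(w)
    J = np.zeros((r0.size, w.size))
    for k in range(w.size):
        step = np.zeros_like(w)
        step[k] = eps
        J[:, k] = (residual_fn(w + step) - r0) / eps
    return J


def levenberg_marquardt(residual_fn, w0, mu=0.01, target_error=1e-8,
                        max_iter=200):
    # Step (1): target error and initial weights are given by the caller.
    w = np.asarray(w0, dtype=float)
    e = residual_fn(w)
    E = 0.5 * np.sum(e ** 2)
    for _ in range(max_iter):
        # Step (4): stop once the error is below the target.
        if E < target_error:
            break
        # Step (2): error vector (already known) and Jacobian.
        J = numerical_jacobian(residual_fn, w)
        # Step (3): weight update  -(J^T J + mu I)^{-1} J^T e.
        delta = -np.linalg.solve(J.T @ J + mu * np.eye(w.size), J.T @ e)
        w_new = w + delta
        e_new = residual_fn(w_new)
        E_new = 0.5 * np.sum(e_new ** 2)
        if E_new < E:
            w, e, E = w_new, e_new, E_new
            mu /= 10.0   # assumption: relax damping after a good step
        else:
            mu *= 10.0   # assumption: increase damping after a bad step
    return w, E


# Demo: recover slope 2 and intercept 1 from noise-free data.
x_demo = np.linspace(0.0, 1.0, 20)
y_demo = 2.0 * x_demo + 1.0
w_fit, E_fit = levenberg_marquardt(
    lambda w: w[0] * x_demo + w[1] - y_demo, [0.0, 0.0])
print("fitted weights:", w_fit, "final error:", E_fit)
```

When µ is small the update approaches the Gauss-Newton step, and when µ is large it approaches a short gradient-descent step, which is the source of the robustness mentioned above.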

Analysis of empirical results
Disaster data of typhoons from 2005-2015 (the most recent data available) are derived from the "China Meteorological Disaster Yearbook (Fujian Volume)". The economic data are taken from the "China Statistical Yearbook". This paper chooses Fujian Province as the research area because it is a coastal province and suffers typhoon disasters most frequently among all the provinces in China (33 typhoons made landfall in Fujian Province in 2005-2015).

Selection of impact factors
The disaster situation is composed of hazard factors, hazard bearing factors and disaster factors. The hazard factors of typhoon refer to the wind speed and rainstorm carried by typhoon, and the hazard bearing factors are the main body of direct loss caused by typhoon. Disaster factors refer to the integration of people and objects in a disaster. We select 15 factors from three aspects, hazard factors, hazard bearing factors and disaster factors, to study the comprehensive losses caused by typhoons, in which hazard factors are 1 2

Calculation of the principal components
The commonality of each factor is first calculated (Tab. 3). The commonality of each factor is more than 0.5, which confirms the reliability of the selected factors (see Section 2.1). In order to eliminate the impact of dimension, the data are then standardized, and the appropriate principal components are selected by PCA. Fig. 2 gives the scree plot of the PCA, which shows that the trend begins to stabilize once five principal components are taken. In addition, the cumulative contribution rate of the top five principal components reaches 88.02% (Tab. 4), and hence we select the top five principal components for analysis. The principal components were calculated and the results are given in Tab. 5.
The coefficients of the hazard factors and disaster factors in prin1 are all positive, and hence the comprehensive variable prin1 describes the overall disaster level. A higher value of prin1 indicates a more destructive disaster. The maximum coefficient in prin2 is 0.431, which corresponds to the maximum wind speed, so prin2 describes the typhoon intensity. The coefficients of the number of collapsed houses and the number of damaged houses in prin3 are -0.432 and -0.473, and hence prin3 describes the damaged condition of houses. Because these loadings are negative, the greater the value of prin3, the smaller the number of collapsed and damaged houses, meaning that the damage to houses is slight.
In prin4, the largest coefficient in absolute value, -0.740, corresponds to the number of medical workers, and hence prin4 describes the medical rescue. The greater the value of prin4, the fewer the medical workers, meaning that the medical rescue operation is more difficult and more personnel are needed. In prin5, apart from the maximum coefficient, 0.639, which corresponds to the typhoon intensity, the next largest loading, 0.388, corresponds to the proportion of the male population, indicating that more people are able to take part and strengthen the labor force; prin5 therefore describes the self-rescue capability. By utilizing Eqs. (12)-(16), the principal component scores of each of the 33 typhoons in Fujian Province during 2005-2015 are calculated (Tab. 6). In Tab. 6, the scores of prin1 are generally larger than those of the other principal components, which implies that more of the original information is extracted by prin1. Hence, prin1 corresponds to the comprehensive loss.
In addition, the principal component scores take both positive and negative values because the data are normalized throughout the process; the sign indicates the position relative to the average value. It can be seen that the scores of prin1 have decreased since Typhoon 1006 to about -2.0, indicating that the comprehensive losses are lower than the average value and reflecting that the government and the public are paying more attention to the impact of typhoons. Although typhoons occurred most often in 2013, the damage in that year was fortunately not the worst. The degree of awareness of and response to disasters has also been strengthened. For example, measures such as increasing the number of medical workers, medical institutions and medical expenses can be taken to promote people's self-rescue capability and awareness of prevention and to reduce disaster losses.
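The commonality check and the principal component scores discussed in this section can be illustrated with a minimal sketch. It assumes that the loadings are obtained from the eigenvectors scaled by the square roots of the eigenvalues (the principal component form of factor extraction), which the paper does not spell out; the function name and the random demo data are hypothetical and do not reproduce Tab. 3 or Tab. 6.

```python
import numpy as np


def pca_scores_and_commonality(X, k):
    """Scores on the first k principal components and the commonality
    of each original variable with respect to those k components."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.cov(Z, rowvar=False)            # equals the correlation matrix of X
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1][:k]
    eigvals = np.clip(eigvals[order], 0.0, None)
    eigvecs = eigvecs[:, order]
    scores = Z @ eigvecs                    # one row per sample
    loadings = eigvecs * np.sqrt(eigvals)   # assumed loadings a_ij
    commonality = np.sum(loadings ** 2, axis=1)
    return scores, commonality


# Demo on random data: 33 samples, 6 variables, 3 retained components.
rng = np.random.default_rng(2)
X_rand = rng.normal(size=(33, 6))
scores, commonality = pca_scores_and_commonality(X_rand, k=3)
print("commonalities:", np.round(commonality, 3))
print("variables passing the 0.5 rule:", np.flatnonzero(commonality > 0.5))
```

Because the scores are computed from standardized data, each column has mean zero, which is why the scores carry both signs and are read relative to the average.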

BP neural network model setting
Based upon the principal component scores in Section 3.2, the BP neural network model is built to estimate the comprehensive losses caused by typhoons. To ensure the reliability and validity of the model, the samples are split according to the chronological order of the typhoons: 30 typhoon samples are used as training samples for the simulation and fitting of the BP neural network, and the remaining 3 samples (Typhoons 1410, 1513, and 1521) are used to test the model. When the relative error between the fitted and actual values of the test samples is less than 1%, the established BP model achieves the target accuracy and can provide relevant departments with suggestions for reducing the comprehensive losses caused by typhoons. The principal component scores are used as the input of the BP neural network and the comprehensive losses are the network output. Therefore, the number of input layer nodes is set to 5 and the number of output layer nodes is set to 1. The model parameters are set to: training steps=5000, target error=0.01, learning rate=0.01, momentum coefficient=0.9. After multiple trainings, it is found that when the number of hidden layer nodes is 10, the mean squared error reaches its minimum value, and hence the 5-10-1 structure of the BP neural network is determined.
Since the activation function used by the BP neural network has a limited range of input and output, the data needs to be normalized before the model is built. Based on the normalized data, the error and regression effects of the training samples are obtained, and the target accuracy and correlation coefficient of the model are investigated. Fig. 3 and Fig. 4 show the error curve and regression plot.
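One common way to carry out such a normalization is a linear min-max mapping onto [-1, 1], matching the range of the tansig function. The paper does not say which normalization was used, so the following is a sketch under that assumption; the helper names and the sample values are hypothetical, and the columns are assumed not to be constant.

```python
import numpy as np


def fit_minmax(data):
    """Column-wise minimum and maximum of the training data."""
    data = np.asarray(data, dtype=float)
    return data.min(axis=0), data.max(axis=0)


def to_range(data, lo, hi):
    """Map values linearly so that lo -> -1 and hi -> 1."""
    return 2.0 * (np.asarray(data, dtype=float) - lo) / (hi - lo) - 1.0


def from_range(scaled, lo, hi):
    """Inverse mapping, used to convert network outputs back to losses."""
    return (np.asarray(scaled, dtype=float) + 1.0) * (hi - lo) / 2.0 + lo


# Hypothetical loss values (billion RMB), one column.
losses = np.array([[12.0], [3.5], [7.25], [20.0]])
lo, hi = fit_minmax(losses)
scaled = to_range(losses, lo, hi)
restored = from_range(scaled, lo, hi)
print("scaled:", scaled.ravel())
```

The minimum and maximum should be taken from the training samples only and then reused for the test samples, so that no information from the test set leaks into the fit.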

Figure 3: The error curve
According to Fig. 3, the error reaches the target accuracy of 0.01 and continues to decline; after 8 training iterations the network stops training with a final error of 0.008. For the linear fit of the training samples obtained from Fig. 4, the correlation coefficient reaches 0.9856, implying that the fit is good and the overall training effect is good. The 5-10-1 BP neural network model can therefore be constructed.

Figure 4: Regression plot
Further, we use the trained 5-10-1 BP neural network model to fit the comprehensive losses of the test sample, and compare the actual value of the training sample with the BP fitting value. Fig. 5 shows the concrete results, indicating that the actual values are basically consistent with the BP fitting values. This shows that the BP neural network has learned sufficiently and that the trained network can represent the comprehensive losses caused by the typhoons.

Figure 5: Actual value and BP fitting value
According to Tab. 7, the BP neural network model predicts the comprehensive losses of typhoons well. The relative errors for Typhoons 1410, 1513 and 1521 are 0.9%, 0.58% and -0.5%, respectively, all below 1% in absolute value. Next, from the trend in Fig. 5, the comprehensive losses from Typhoon 1006 to Typhoon 1407 have been reduced. This is related to the nationwide preventive measures of the past 10 years, which carried out activities and popularized knowledge so as to improve people's self-rescue capability. At the same time, the relative error of the comprehensive loss of the training samples is within 10%, which shows that the BP neural network model has high accuracy. Therefore, studying typhoons with the BP neural network model can play a role in loss reduction and disaster prevention.
The assessment of typhoon losses indicates that quantitative assessment is crucial for disaster assessment, which is in line with the work by Lou et al. [Lou, Chen and Zheng (2009)]. However, the error of that assessment model is 7%, because Lou et al. did not consider the impact of hazard bearing factors and disaster factors on the loss. However, Guo et al. [Guo, Dai and Li (2013) [...]
Although this paper takes Fujian Province as the research object and the final mathematical model cannot be directly applied to other provinces, the assessment factors selected in this model are relatively easy to obtain. As long as data from other provinces are available, a corresponding model can be constructed with the method of this paper to predict the results. Therefore, the PCA-BP model can be used to assess typhoon damages in other areas, which gives it value for wider application. In addition, typhoon data are difficult to collect, and the meteorological department has no longer recorded the number of damaged houses in recent years. We handle this with methods for dealing with missing data, which reduces the accuracy of the model to a certain extent. However, any model can only approximate reality through reasonable simplification.
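The relative errors quoted in this section follow from a simple calculation, sketched below. The sign convention (fitted minus actual, divided by actual) is an assumption, and the loss values are hypothetical placeholders rather than the entries of Tab. 7.

```python
def relative_error_percent(fitted, actual):
    """Signed relative error of a fitted value, in percent."""
    return (fitted - actual) / actual * 100.0


# Hypothetical (actual, fitted) pairs in billion RMB, for illustration only.
samples = {"sample A": (10.00, 10.09), "sample B": (4.00, 3.98)}

TOLERANCE = 1.0  # target accuracy from the text: relative error below 1%
for name, (actual, fitted) in samples.items():
    err = relative_error_percent(fitted, actual)
    status = "ok" if abs(err) < TOLERANCE else "outside target"
    print(f"{name}: {err:+.2f}% ({status})")
```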

Conclusions
In this paper, a joint model that utilizes PCA and the BP neural network model optimized by the LM algorithm is employed to assess the damages caused by typhoons in a coastal province of China. First, we comprehensively study the relationship between the hazard factors, hazard bearing factors and disaster factors by calculating the commonalities, and investigate the vulnerability and exposure of the hazard bearing body. Then, we apply PCA to reduce the dimensions and apply appropriate weights to obtain the principal component scores, which become the simplified input of the BP neural network model. Finally, the BP neural network model is optimized by the LM algorithm to assess the comprehensive loss. Moreover, an empirical analysis of the comprehensive loss caused by typhoons is conducted for Fujian Province in 2005-2015. Based on the related literature and the empirical analysis, the following conclusions can be drawn.
1. Five principal components, prin1-prin5, are extracted from 15 impact factors by PCA; they correspond to five principal indices: overall disaster level, typhoon intensity, damaged condition of houses, medical rescue, and self-rescue capability. Due to climate change, stronger wind speeds will increase typhoon intensity over time; however, the loss has decreased since 2010, which is largely due to widely implemented prevention and rescue strategies.
[...] .036 billion RMB respectively, which contain the relative errors of 0.9%, 0.58% and -0.5% and justify the accuracy of the model.
In summary, the huge damages that have already occurred and the potential future damages caused by typhoons cannot be overlooked. The study suggests taking effective actions: the government should increase medical institutions and medical workers, and communities should promote residents' self-rescue capability.