Developing estimating equations of fatality ratio based on surveyed data of the 2011 Great East Japan Tsunami

The 2011 Great East Japan tsunami caused widespread devastation, with a maximum tsunami height of 40 m and 19,000 casualties, especially along the Tohoku coast of Japan. The purpose of this study is to develop estimating equations of fatality ratio from tsunami arrival time for future tsunami loss assessment and to investigate the effect of two coastal topography types, namely the Sanriku ria-coast and the Sendai plain. In this study, fatality ratio was defined as the number of fatalities divided by the total number of people in small-scale town units along the shoreline, and tsunami arrival time was calculated by TUNAMI modelling with nested grids of 1350 m, 450 m, 150 m, and 50 m. Linear and nonlinear regression analyses were then performed to develop a relationship model between fatality ratio and tsunami arrival time. The results show a strong correlation in both the Sanriku ria-coast and the Sendai plain: fatality ratio decreases with longer arrival time. The two coastal types show different distributions of fatality ratio with tsunami arrival time; at the same arrival time, the fatality ratio of the Sendai plain is generally higher than that of the Sanriku ria-coast.


Introduction
The 2011 Great East Japan tsunami, generated by a magnitude 9.0 earthquake, caused various kinds of damage, including a large number of fatalities, whereas the fatalities caused by ground shaking were less severe [1]. The Tohoku coast comprises two coastal topography types, namely the Sanriku ria-coast and the Sendai plain, as shown in figure 1. According to past studies of the fatality ratio during the 2011 Great East Japan tsunami, the factors influencing fatality ratio can be classified into the tsunami characteristics of the area concerned and the personal characteristics related to evacuation [2]. A previous study found a high correlation between coastal topography and inundation depth in the 2011 tsunami [3]. Differences in coastal topography type lead to differences in both the tsunami characteristics and the personal characteristics of evacuation. Tsunami characteristics include tsunami height, tsunami arrival time, coastal type, etc., and personal characteristics include age, gender, preparedness, occupation, etc. [2]. A relationship model between fatalities and these influential factors would help local governments and the insurance industry in loss assessment for future tsunamis, such as the Nankai tsunami. In general, tsunami height is a key factor with a significant effect on the trend of fatality ratio. However, tsunami arrival time is also important for the evacuation of people to safe areas.
Some past studies have discussed fatality ratio at a large scale, which may include population outside the tsunami inundation zone and introduce inconsistency. Such assumptions may lead to error in developing the relationship between fatalities and influential factors. In addition, several past studies focused on tsunami height or inundation depth as the main factor affecting fatalities [2][4][5]. This study proposes a relationship model between fatality ratio and tsunami arrival time based on the surveyed data in each affected coastal area and investigates the effect of the two coastal topography types, which are related not only to the tsunami characteristics but also to the experience of historical events. The surveyed fatality data for each small-scale town unit were collected from the reconstruction support survey archive [6]. A statistical analysis was performed, and several estimating equations for fatality ratio were developed from different definitions of tsunami arrival time. The results are expected to help estimate fatalities in combination with other considerations and to predict fatality ratio in future tsunamis with shorter arrival times than the 2011 Great East Japan tsunami.

Numerical tsunami simulation
To obtain the tsunami arrival time at each town along the shoreline, a numerical tsunami simulation was performed by TUNAMI modelling with nested grids of 1350 m, 450 m, 150 m, and 50 m, as shown in figure 2. Tsunami arrival time was calculated from the finest grid at the shoreline of each town.

Comparison of fatality ratio
The comparison of fatality ratio aimed to check whether the trend of the insurance data agrees with that of the surveyed data. The surveyed data in this study were collected from the reconstruction support survey archive [6] and the prefectural office [11]. Since the insurance data cannot be disclosed, the surveyed data were used instead to develop the relationship between fatality ratio and tsunami arrival time. The fatality ratio was compared between the insurance data and the surveyed data, and the result of the correlation analysis is shown in figure 5.
The linear estimating equation is FR = aT + b, in which FR = fatality ratio; T = tsunami arrival time; a = slope of the function, which should be negative; b = intercept.
The goodness of fit of the proposed estimating equations can be checked by the value of R², which is obtained from the results of the linear regression analysis (R² = 1.00 indicates a perfect fit). The independent variable (tsunami arrival time) and the intercept are considered significant if their p-values are less than 0.05.
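The linear fit and its R² can be illustrated with a minimal sketch. It assumes the form FR = aT + b with slope a and intercept b, uses ordinary least squares, and runs on hypothetical arrival times and fatality ratios that are not taken from the surveyed data. The function name fit_line is introduced here for illustration only, and the p-values discussed above are not computed.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a*x + b. Returns (a, b, r2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx                      # slope
    b = mean_y - a * mean_x            # intercept
    ss_res = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot         # coefficient of determination
    return a, b, r2


# Hypothetical values for illustration only (not the paper's data).
arrival_min = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]    # T, in minutes
fatality_ratio = [0.30, 0.22, 0.15, 0.11, 0.06, 0.04]  # FR = D / N

a, b, r2 = fit_line(arrival_min, fatality_ratio)
print(f"FR = {a:.4f} * T + {b:.4f}, R^2 = {r2:.3f}")
```

A negative slope a in the output corresponds to the expected trend that fatality ratio decreases with longer arrival time.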

Nonlinear regression analysis
The scatterplot in the preliminary analysis (see Section 4) suggests that the relationship may be nonlinear. To develop a nonlinear estimating equation, assuming that fatality ratio increases dramatically as tsunami arrival time becomes shorter, nonlinear regression analysis was also performed with IBM SPSS Statistics (Version 23). The logarithmic regression equation is FR = a ln(T) + b, in which FR = fatality ratio; T = tsunami arrival time; a = constant, which should be negative; b = constant, which should be positive.
The inverse regression equation is FR = a/T + b, in which a = constant, which should be positive; b = constant, which can be positive or negative.
Table 3 shows the fatality data arranged by town, separated into the Sanriku ria-coast and the Sendai plain. The fatality data of each town, based on the criterion of a small affected area near the coastline, comprise the population in the inundated area (N), the number of fatalities (D), and the fatality ratio (FR). The fatality ratio (FR) is defined as D divided by N and is the dependent variable in the linear regression analysis. For the independent variable, tsunami arrival time was defined in three ways: the time when the tsunami inundation depth first equals or exceeds 0.5 m (Initial), the time when the first peak of tsunami inundation depth occurs (First), and the time when the maximum inundation depth occurs (Max). Table 3 covers the surveyed data of 63 towns in Iwate, Miyagi, Fukushima, and Ibaraki prefectures, comprising 46 towns in the Sanriku ria-coast and 17 towns in the Sendai plain.
As shown in figure 6, the Sanriku ria-coast was divided into two zones so that each zone shows the trend of fatality ratio decreasing with longer arrival time, as in the Sendai plain. The fatality ratio of the Sendai plain is much higher than that of Zone 1 in the Sanriku ria-coast and slightly higher than that of Zone 2.
Figure 6 also shows that the available data from the 2011 Great East Japan tsunami contain few points with arrival times in the first 20 minutes. To develop a relationship that covers short as well as long arrival times, and is therefore more practical for tsunami loss assessment, data from a previous event were included in the analysis. Table 4 shows the preliminary information on the additional data from the 1993 Okushiri tsunami, collected by the Okushiri town office. Tsunami arrival time was calculated by TUNAMI modelling using the same approach as for the 2011 tsunami.
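The three definitions of tsunami arrival time (Initial, First, Max) and the fatality ratio FR = D/N can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used with the TUNAMI output: the inundation depth is assumed to be available as a sampled time series at one shoreline point, the "first peak" is assumed to be the first local maximum of that series, and the function names and sample values are hypothetical.

```python
def arrival_times(times_min, depths_m, threshold_m=0.5):
    """Return (initial, first, maximum) arrival times in minutes.

    initial: first time the depth is >= threshold_m (0.5 m in the text)
    first:   time of the first local maximum of depth (assumed definition)
    maximum: time of the largest depth in the series
    A value is None when the corresponding event does not occur.
    """
    initial = None
    for t, d in zip(times_min, depths_m):
        if d >= threshold_m:
            initial = t
            break

    first = None
    for i in range(1, len(depths_m) - 1):
        # Local maximum: depth rose to this sample and does not rise after it.
        if depths_m[i] > depths_m[i - 1] and depths_m[i] >= depths_m[i + 1]:
            first = times_min[i]
            break

    i_max = max(range(len(depths_m)), key=lambda i: depths_m[i])
    maximum = times_min[i_max]
    return initial, first, maximum


def fatality_ratio(deaths, population):
    """FR = D / N for one town; population is the inundated-area population."""
    if population <= 0:
        raise ValueError("population in the inundated area must be positive")
    return deaths / population


# Hypothetical depth series at one shoreline point (not simulation output).
times = [0, 5, 10, 15, 20, 25, 30, 35, 40]                # minutes
depths = [0.0, 0.0, 0.2, 0.8, 2.5, 1.9, 3.6, 4.2, 3.0]    # metres

initial, first, maximum = arrival_times(times, depths)
print(initial, first, maximum)      # the three arrival-time definitions
print(fatality_ratio(30, 600))      # hypothetical D = 30, N = 600
```

Because the three definitions pick different instants from the same series, each town contributes three candidate values of T, which is why separate estimating equations are developed for each case.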

Linear regression analysis
Linear regression analysis was performed to develop estimating equations of fatality ratio from tsunami arrival time. The results of the linear regression analysis provide the number of samples (N), the value of R², and the p-values p₁ and p₂ for the independent variable and the intercept, respectively. The relationships between fatality ratio and tsunami arrival time are shown in figures 7-9 for the Initial, First, and Max cases of each area. Figure 7 shows that the fatality ratio of 24 towns in the Sanriku ria-coast (Zone 1) and 2 towns in Okushiri decreases with longer arrival time. The First case provides the best fit among the three estimating equations, with R² equal to 0.67. In addition, all of the predictors contribute significantly to the equations based on the p-values. Figure 9 shows that the fatality ratio of 17 towns in the Sendai plain and 2 towns in Okushiri decreases with longer arrival time. The Max case provides the best fit among the three estimating equations, with R² equal to 0.69. In addition, all of the predictors contribute significantly to the equations based on the p-values.
Comparing the estimating equations for the Sanriku ria-coast (Zone 2) and the Sendai plain, the fatality ratio of the Sendai plain is clearly higher than that of Zone 2 only for the Max case. For both the Sanriku ria-coast and the Sendai plain, the different definitions of tsunami arrival time lead to significant differences in the estimated fatality ratio.
The results of the nonlinear regression analysis provide the number of samples (N), the value of R², and the p-values p₁ and p₂ for the independent variable and the intercept, as in the linear regression analysis. The relationships between fatality ratio and tsunami arrival time are shown in figures 10-12 for the Initial, First, and Max cases of each area. Figure 10 shows that the fatality ratio of 24 towns in the Sanriku ria-coast (Zone 1) and 2 towns in Okushiri decreases with longer arrival time. Logarithmic and inverse regression analyses were conducted to determine the best nonlinear relationship model for predicting FR. The R² values of the inverse regression models are higher than those of the logarithmic regression models for all of the Initial, First, and Max cases of Zone 1. In addition, the inverse regression models are more practical because their solid lines cover longer tsunami arrival times.
Figure 11 shows that the fatality ratio of 21 towns in the Sanriku ria-coast (Zone 2) and 2 towns in Okushiri decreases with longer arrival time. The same nonlinear regression analyses were conducted for Zone 2. The R² values of the logarithmic regression models are higher than those of the inverse regression models for all of the Initial, First, and Max cases. In addition, the inverse regression models are more practical because the FR of the solid lines reaches zero at a definite tsunami arrival time, as is also the case in figure 12.
Figure 12 shows that the fatality ratio of 17 towns in the Sendai plain and 2 towns in Okushiri decreases with longer arrival time.
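The logarithmic and inverse models compared above can both be fitted by ordinary least squares after transforming the predictor, since each is linear in ln(T) or 1/T. The sketch below assumes the forms FR = a ln(T) + b and FR = a/T + b, which follow from the constants a and b described in the text. The function names and the sample data are hypothetical and are not the surveyed values or the SPSS procedure itself.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = a*x + b. Returns (a, b, r2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = mean_y - a * mean_x
    ss_res = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot


def fit_logarithmic(t, fr):
    """Fit FR = a*ln(T) + b by regressing FR on ln(T)."""
    return fit_line([math.log(ti) for ti in t], fr)


def fit_inverse(t, fr):
    """Fit FR = a/T + b by regressing FR on 1/T."""
    return fit_line([1.0 / ti for ti in t], fr)


def zero_time_logarithmic(a, b):
    """Arrival time at which a*ln(T) + b reaches zero."""
    return math.exp(-b / a)


def zero_time_inverse(a, b):
    """Arrival time at which a/T + b reaches zero (needs a > 0 and b < 0)."""
    return -a / b if (a > 0 and b < 0) else None


# Hypothetical values for illustration only (not the paper's data).
arrival_min = [5.0, 10.0, 20.0, 30.0, 45.0, 60.0]
fatality_ratio = [0.35, 0.20, 0.12, 0.09, 0.05, 0.04]

for name, fit in (("logarithmic", fit_logarithmic), ("inverse", fit_inverse)):
    a, b, r2 = fit(arrival_min, fatality_ratio)
    print(f"{name}: a = {a:.4f}, b = {b:.4f}, R^2 = {r2:.3f}")
```

The two zero_time helpers make the remark about FR reaching zero concrete: the logarithmic curve always crosses zero at a finite arrival time when a is negative, whereas the inverse curve does so only when its constant b is negative.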

Conclusions
This study developed estimating equations of fatality ratio based on fatality data from the 2011 Great East Japan tsunami and the 1993 Okushiri tsunami. The different definitions of tsunami arrival time lead to significant differences in the estimated fatality ratio in both the linear and the nonlinear regression analyses. Considering the effect of coastal topography type, a strong correlation between fatality ratio and tsunami arrival time was found in the Sendai plain, whereas the Sanriku ria-coast had to be divided into two zones to obtain a strong correlation, owing to differences in the experience and awareness of people.
In the linear regression analysis, all relationship models show a good fit, and tsunami arrival time, as the predictor, contributes significantly to the equations in all models. Because the data may follow a nonlinear relationship, nonlinear techniques were applied after the linear regression analysis. The results show that both the logarithmic function and the inverse function can be fitted. As in the linear regression analysis, all relationship models show a good fit, and the predictor contributes significantly to the equations. A comparison of the mean R² shows a significant difference between the linear and logarithmic regression models (t = -8.726, p < 0.001) and between the linear and inverse regression models (t = -4.872, p < 0.005). Thus, the logarithmic and inverse models seem to be more suitable for predicting the fatality ratio.
The difference in mean R² between the logarithmic and inverse regression models was not significant (t = -0.828, p > 0.1). Therefore, the two proposed models are likely to be interchangeable.
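The comparison of mean R² between model families can be illustrated with a t statistic. The sketch below assumes a paired design, in which each area and arrival-time case yields one R² per model family; the source does not state which t-test was used, so this is an assumption. The R² values are hypothetical placeholders, not the reported results, and the p-value is not computed.

```python
import math


def paired_t(x, y):
    """Paired-samples t statistic for the mean of (x - y).

    Returns (t, degrees_of_freedom).
    """
    diffs = [xi - yi for xi, yi in zip(x, y)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)  # sample variance
    t = mean_d / math.sqrt(var_d / n)
    return t, n - 1


# Hypothetical R^2 values, one per area and arrival-time case (assumed pairing).
r2_linear = [0.52, 0.61, 0.58, 0.49, 0.63, 0.55, 0.60, 0.64, 0.66]
r2_logarithmic = [0.70, 0.76, 0.71, 0.68, 0.80, 0.69, 0.75, 0.79, 0.83]

t, df = paired_t(r2_linear, r2_logarithmic)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

A negative t here means that the linear models have the lower mean R², which matches the sign convention of the t values reported above.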
However, it is important to consider the limitations of this study. First, the data used in the analyses come from multiple sources (i.e., the 2011 Great East Japan tsunami and the 1993 Okushiri tsunami), with the intention of providing predictions across the full range of tsunami arrival times. While most of the data came from the 2011 Great East Japan tsunami, only a few data points were added for short arrival times. Second, the statistical models were developed from events in specific areas. To generalize the results, more data from different areas may need to be taken into account.