Estimation of Maximum Daily Fresh Snow Accumulation Using an Artificial Neural Network Model

Graduate Research Assistant, Department of Civil Engineering, Hongik University, Seoul, Republic of Korea; Associate Professor, Department of Civil Engineering, Hongik University, Seoul, Republic of Korea; Professor, Department of Civil Engineering, Chonbuk National University, Jeonju-si, Jeollabuk-do, Republic of Korea; Professor, Department of Civil Engineering, Hongik University, Seoul, Republic of Korea


Introduction
On February 17, 2014, the roof of the Mauna Ocean Resort Gym in Gyeongju, Korea (Figure 1), collapsed under the weight of accumulated snow, resulting in 113 casualties. Although the primary cause of the accident was defective construction [1], the rooftop snow accumulation (Figure 1(b)), which exceeded 50 cm when measured at the site after the accident, played a significant role in the collapse. The operational snow warning system of Korea (http://www.kma.go.kr/weather/warning/status.jsp) was unable to monitor this heavy snow accumulation [2]. Only 16 cm/day of snow accumulation was observed at the gauge located 14 km from the accident site (Figure 1). This post-accident survey suggests that snow accumulation cannot be accurately estimated through simple spatial interpolation, primarily because the factors influencing snow accumulation (such as winter precipitation and temperature) show significant spatiotemporal variability depending on meteorological [3] and geographical [4,5] conditions.
In practice, snow accumulation is often approximated using the ten-to-one rule, but the rule can be easily broken due to the high variability of snow density and the shape of ice crystals [6].
The actual ratio between the snow accumulation and the precipitation varies between 2:1 and 45:1 [7], depending on how the shapes of the snowflakes and ice crystals change as the temperature increases [8]. In addition, the wind can move the accumulated snow from one location to another [9][10][11]. The ideal approach to estimating the snow accumulation involves analyzing the physical mechanisms of snowmelt and interaction with the wind [8,12,13]. However, this approach requires information on variables that are difficult to observe, such as the temporal variation in temperature and the internal structure of the accumulated snow [4], which hinders its operational use [14]. Alternatively, methods based on the statistical relationships between various factors affecting the snow accumulation have been studied [15][16][17]. However, the relationships between the snow accumulation and factors such as precipitation, temperature, and vapor pressure cannot be fully explained with simple statistical models. Furthermore, these factors are observed at point locations on the ground and show significant spatiotemporal variability, which adds to the uncertainty of snow accumulation estimates at ungauged locations [18,19].
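To illustrate how strongly the assumed snow-to-liquid ratio affects a depth estimate, the following minimal sketch converts a liquid-equivalent precipitation amount into a snow depth for the ten-to-one rule and for the 2:1 and 45:1 extremes cited above. The function name `snow_depth_cm` and the 10 mm example value are illustrative assumptions and do not come from the original study.

```python
def snow_depth_cm(precip_mm: float, ratio: float = 10.0) -> float:
    """Convert liquid-equivalent precipitation (mm) to snow depth (cm).

    `ratio` is the snow-to-liquid ratio; 10.0 reproduces the ten-to-one rule.
    """
    if precip_mm < 0 or ratio <= 0:
        raise ValueError("precip_mm must be >= 0 and ratio must be > 0")
    depth_mm = precip_mm * ratio
    return depth_mm / 10.0  # 10 mm per cm


if __name__ == "__main__":
    # The same 10 mm of liquid water under three different ratios.
    for ratio in (2.0, 10.0, 45.0):
        print(f"{ratio:>4.0f}:1 -> {snow_depth_cm(10.0, ratio):.1f} cm")
```

For the same 10 mm of precipitation, the estimate ranges from 2 cm to 45 cm, which is why a fixed ratio is a weak basis for a warning system.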
Methods based on satellite imagery can reduce the uncertainties arising from the spatial variability of the factors influencing the snow accumulation [20][21][22][23][24][25]. Because satellite imagery does not directly measure the snow accumulation, these approaches extract the required information, such as brightness temperature, from the satellite imagery and use it as the input to models that estimate the snow accumulation. Recently, artificial intelligence techniques have been applied to establish the relationships between the snow accumulation and various factors. Artificial intelligence methods have been especially useful in this regard because they do not require a complete understanding of the complex physical processes involved [26][27][28][29][30][31][32][33][34]. Davis et al. [35] used an artificial neural network (ANN) algorithm to establish the relationships of five different satellite image brightness temperatures to the average snowflake diameter, density, temperature, and depth of the accumulated snow. Gan et al. [36] used an ANN algorithm to estimate the snow water equivalent in the Red River basin of North Dakota and Minnesota in the United States, obtaining a correlation coefficient of 0.71. Gharaei-Manesh et al. [37] estimated the snow accumulation in semiarid regions of Iran using the M5 Decision Tree algorithm, a type of artificial intelligence algorithm. Due to the difficulty of obtaining observational data directly related to the snow accumulation, these authors used geographical characteristics, such as channel network and stream power, as the inputs for their ANN algorithms. Tedesco et al. [38] used an ANN algorithm to establish the relationship between special sensor microwave imager (SSM/I) data and the snow water equivalent. Sun et al. [39] used an ANN to establish the relationships between the parameters of the snow accumulation process and the various brightness temperature values derived from the SSM/I. Dobreva and Klein [40] obtained the spatial distribution of the snow accumulation over complex mountainous terrain from moderate resolution imaging spectrometer (MODIS) imagery by analyzing these data with an ANN algorithm. Liang et al. [41] determined the satellite-based microwave brightness temperature and visible/infrared reflectance values using a support vector machine (SVM) algorithm. Czyzowska-Wisniewski et al. [42] used an ANN algorithm to analyze IKONOS satellite images to estimate the snow accumulation in the Alps region. Park et al. [43] developed an ANN algorithm to determine the occurrence of snow events in Korea based on the precipitation and the temperature. Kim et al. [44] developed an ANN algorithm to estimate the snow accumulation based on the precipitation and the temperature; this ANN algorithm was subsequently used to predict the snow accumulation for a future period under climate change.
This study aims to develop a novel method of estimating the depth of snow accumulation at ungauged locations to help prevent future disasters. While approaches based on the satellite-ANN combination have been used to estimate the snow accumulation, the revisit period of satellites is several days at the shortest. Thus, the application of satellite-based methods is limited to estimating the gradual variation in snow accumulation over a long period in cold regions.

Advances in Meteorology
In addition, no study has addressed the applicability of ANN-based real-time snow accumulation detection at ungauged locations in the Korean Peninsula, where the factors influencing snow accumulation have great spatial variability. We developed an ANN-based model that estimates the snow accumulation based on the in situ temperature and precipitation. Then, we used the developed model to estimate the snow accumulation at ungauged locations based on spatially interpolated temperature and ground precipitation data. We compared our method to a method that spatially interpolates the in situ snow accumulation with the ordinary kriging (OK) technique. The results of the comparison indicate that our ANN-based approach outperforms the approach based on the OK method.

Methodology
2.1. Study Area. South Korea, comprising the southern half of the Korean Peninsula and Jeju Island, was chosen as the study area (Figure 2). Snowfall in the study area occurs mostly between late November and early March. During this period, the average precipitation is approximately 88.5 mm, about 6.77 percent of the annual precipitation (1307.7 mm).
Factors such as precipitation, temperature, and wind may influence the snow accumulation. These factors show high spatial variability in the study area, primarily due to drastic changes in elevation (0 m to 1950 m) over a small area (100,210 km²).

Data Description.
The Korean Meteorological Administration (KMA) operates 94 in situ gauges in the study area as of 2017, and a total of 123 in situ gauges have been operated historically since April 1904. The KMA records meteorological variables following a strict standard specified by government regulation [45]. Precipitation is measured using tipping bucket gauges with a precision of 0.5 mm to 1 mm. Temperature is measured using metal-type gauges with a precision of 0.1 °C. At each gauge, the maximum accumulated depth of fresh snow occurring between 00:00 and 24:00 is measured on a daily basis, and this measured value is defined as the maximum daily fresh snow accumulation (MDFSA). The MDFSA values are read daily from rulers installed on the ground, and it is these MDFSA values that our study aims to estimate. At each gauge location, the maximum (T_max), average (T_mean), and minimum (T_min) daily temperatures and the daily precipitation (P) were measured along with the MDFSA.
This study used 19,923 daily records (datasets) of MDFSA, P, T_max, T_mean, and T_min observed at 90 gauges between 1960 and 2016 to train and validate the ANN model. Figure 2(b) shows the number of gauges whose data were used in each year. This number shows significant annual variation because snowfall did not occur at all gauge locations. Although precipitation and temperature were recorded at all gauges, we excluded the records that do not have an MDFSA reading.
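The record selection described above can be sketched as follows. This is only an illustration under assumed names: the `DailyRecord` fields, the gauge identifiers, and the sample values are hypothetical and are not taken from the KMA database.

```python
from typing import NamedTuple, Optional


class DailyRecord(NamedTuple):
    """One gauge-day observation; all field names are illustrative."""
    gauge_id: str
    year: int
    precip_mm: float
    t_max_c: float
    t_mean_c: float
    t_min_c: float
    mdfsa_cm: Optional[float]  # None when no MDFSA reading was recorded


def keep_records_with_mdfsa(records):
    """Drop gauge-days that lack an MDFSA reading."""
    return [r for r in records if r.mdfsa_cm is not None]


def gauges_per_year(records):
    """Count the distinct gauges contributing usable records in each year."""
    gauges = {}
    for r in keep_records_with_mdfsa(records):
        gauges.setdefault(r.year, set()).add(r.gauge_id)
    return {year: len(ids) for year, ids in sorted(gauges.items())}


if __name__ == "__main__":
    sample = [  # made-up values for demonstration only
        DailyRecord("G001", 2014, 12.0, 1.5, -0.5, -3.0, 8.0),
        DailyRecord("G002", 2014, 0.0, 4.0, 2.0, -1.0, None),
        DailyRecord("G001", 2015, 5.5, 0.5, -2.0, -6.0, 4.2),
    ]
    print(gauges_per_year(sample))
```

Counting gauges per year in this way mirrors the kind of summary shown in Figure 2(b).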

Artificial Neural Network.
Figure 3 shows the structure of the artificial neural network used in this study. The network is composed of an input layer, a hidden layer, and an output layer. This structure mimics the information-transferring process in the human brain, where neurons accept stimulations from the dendrites and then transmit them through axons to other neurons. Just as neurons are interconnected by synapses, the nodes of the artificial neural network are connected by weights. Each node in a layer receives values from multiple nodes in the previous layer. Then, the node calculates the weighted sum of the received values. Lastly, the node transforms this value using a given activation function, and the transformed value is subsequently transferred to each of the nodes in the next layer. Here, the activation function works like the threshold of neurons in the human brain, which determines whether a stimulation is transferred.
The ANN can also be characterized mathematically as follows. Let $X$ represent the matrix composed of the $n$ input variables of the ANN, $x_1, x_2, x_3, \ldots, x_n$, including the bias term, and let $W^k_i$ represent the matrix composed of the strengths of the connections (or simply the weight factors) $w^k_{ij}$ between the $j$th input variable and the $i$th node in the $k$th hidden layer. They can be represented as follows:

$$X = [x_1, x_2, x_3, \ldots, x_n] \tag{1}$$

$$W^k_i = [w^k_{i1}, w^k_{i2}, w^k_{i3}, \ldots, w^k_{in}] \tag{2}$$

The net value assigned to the $i$th node in the first hidden layer, $y^1_i$, is calculated as the linear combination of the input variables as follows:

$$y^1_i = \sum_{j=1}^{n} w^1_{ij} x_j \tag{3}$$

Then, this net value is converted into the input value of the nodes in the next hidden layer based on a given activation function. This study adopted the sigmoid function. This process can be mathematically described as follows:

$$z_i = \frac{1}{1 + e^{-y^1_i}} \tag{4}$$

where $z_i$ represents the converted value at the $i$th node. This converted value is used as the input value of the next hidden layer; $z_i$ becomes the new $x_i$, and the procedure described in equations (1) through (4) is repeated over the hidden layers until the final output variable is obtained. The process of determining the weight factors $w^k_{ij}$ of the ANN is called the learning process. In this learning process, the weight factors are determined such that the difference between the observed variable and the estimate from the ANN model is minimized.
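The forward calculation described above (a weighted linear combination followed by a sigmoid) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the bias is handled by prepending a constant 1 to the inputs, and the linear output node is an assumption because the source does not specify the output activation.

```python
import math


def sigmoid(y: float) -> float:
    """Sigmoid activation: converts a net value y into z."""
    return 1.0 / (1.0 + math.exp(-y))


def hidden_layer(x, weights):
    """Return z_i for every node; `weights` holds one row of w_ij per node."""
    z = []
    for w_i in weights:
        y_i = sum(w_ij * x_j for w_ij, x_j in zip(w_i, x))  # linear combination
        z.append(sigmoid(y_i))                               # activation
    return z


def forward(inputs, hidden_w, output_w):
    """One hidden layer followed by a linear output node (assumption)."""
    x = [1.0] + list(inputs)      # bias term prepended to the inputs
    z = hidden_layer(x, hidden_w)
    h = [1.0] + z                 # bias term for the output node
    return sum(w * v for w, v in zip(output_w, h))


if __name__ == "__main__":
    # Hypothetical inputs (P, T_mean, T_min) and arbitrary small weights.
    hidden_w = [[0.1, 0.4, -0.2, -0.3], [-0.2, 0.3, 0.1, 0.2]]
    output_w = [0.5, 2.0, 1.0]
    print(forward([12.0, -0.5, -3.0], hidden_w, output_w))
```

In the study's final structure the hidden layer would contain 10 such nodes; two are used here only to keep the example short.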
The Levenberg-Marquardt backpropagation optimization algorithm [46] was employed to train the ANN in this study. The epoch represents the minimum number of repetitions of the training required for the ANN model to reach a given performance standard. The lower the epoch, the more stable the performance of the ANN model. Mitchell [47] provides a more detailed explanation of the ANN.
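For orientation, the weight update used by the Levenberg-Marquardt method is usually written in the following textbook form. The symbols are assumptions introduced here and do not appear in the source: $\mathbf{w}$ stacks all weight factors, $\mathbf{e}$ is the vector of residuals (estimated minus observed MDFSA) over the training records, $\mathbf{J}$ is the Jacobian of $\mathbf{e}$ with respect to $\mathbf{w}$, and $\mu$ is a damping parameter.

```latex
\Delta \mathbf{w}
  = -\left( \mathbf{J}^{\top} \mathbf{J} + \mu \mathbf{I} \right)^{-1}
    \mathbf{J}^{\top} \mathbf{e},
\qquad
J_{m\ell} = \frac{\partial e_m}{\partial w_\ell}
```

A small $\mu$ makes the step behave like a Gauss-Newton step, whereas a large $\mu$ makes it behave like a short gradient-descent step; how $\mu$ was adapted in this study is not stated in the text.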

Regularization of the Artificial Neural Network.
The process of determining the overall structure of the ANN is called regularization. These structural characteristics include the number and type of input variables, the number of hidden layers, the number of nodes in each layer, and the type of activation function. A complex ANN structure forces the model behavior to be strongly dependent on the observed data [48]. Conversely, a simple ANN structure forces the model to miss the overall relationships between the input variables and the output. There is no analytical or formal approach that can be generally applied to the regularization. Instead, methods based on simple trial and error are used.

Input Data and Number of Hidden Layers of the ANN Model.
For operational use of the methodology, the input data should be easy to obtain and show high correlations with the MDFSA. After a brief correlation analysis and literature review [4,49], we chose the precipitation (P), T_max, T_min, and T_mean as the candidate input variables of the ANN model. P has a high correlation with the MDFSA (R = 0.65), so it was considered an input variable by default, leaving 8 (= 2^3) possible choices of whether or not to include each of the remaining 3 variables. We used an eight-step procedure (Steps 1 through 8) to determine the optimal combination of input variables.
Figure 4 shows the boxplot of the mean correlation coefficients obtained by repeating Steps 2 through 8 50 times for each of the 8 combinations of input variables. Note that these 50 repetitions were performed using the ANN model with the optimal structure determined through Steps 1 through 8. Therefore, the 50 repetitions are different from the 20 repetitions mentioned in Step 8. The case in which P, T_mean, and T_min were used as the input variables showed the highest correlation coefficients (0.87-0.88). It is notable that as the mean correlation coefficient increased, its variability decreased.
This suggests that the choice of proper input variables increased not only the accuracy but also the precision of the ANN model. A high correlation coefficient (0.87-0.88) was obtained for the tested ANN model with one hidden layer. Furthermore, tests using model structures with multiple hidden layers did not yield any improvement in performance. Therefore, this study used one hidden layer for the final ANN model.

Optimal Number of Nodes in the Hidden Layer.
Blue and red lines in Figure 5 show how the correlation coefficient and the epoch vary with the number of nodes in the hidden layer of the ANN model. These values were extracted using the process described in the previous section 2.3.1. The higher the correlation coefficient, the better the performance of the ANN model. The correlation coefficient generally increased with the number of nodes, up to 10 nodes, and then decreased gradually. This indicates that the ANN model started to experience overfitting once the number of nodes exceeded the threshold of 10. The epoch decreased monotonically but showed little difference beyond 10 nodes. On the basis of these results, we used the ANN model with 10 nodes in the hidden layer to maintain both accuracy and precision.

Number of Training Data Points.
The performance of the ANN generally improved as the size of the dataset used for training increased. However, the rate of performance improvement decreased significantly beyond a certain training data size. This is because the ANN model became overfitted with an increase in the amount of training data [50]. Figure 6 shows how the size of the training dataset was related to the performance of the ANN model. The performance of the ANN model increased drastically when the size of the training dataset exceeded 20, and it stabilized when the training dataset exceeded 1,000. For this reason, we trained the ANN model using 1,000 datasets randomly chosen from the entire database of 19,923 in situ daily observations of P, T_min, and T_mean. Note, however, that there is no absolute guideline for determining the optimal size of the training dataset. The choice should consider not only the relative proportion of the training dataset but also the occurrence of instability points (points displaying a sudden drop in the correlation coefficient despite an increase in the training dataset, circled in Figure 6). Figure 6 suggests that 1,000 datasets represent a good compromise between the amount of training data and the occurrence of instability points.
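Drawing the 1,000 training records at random from the 19,923 available ones can be sketched as below. The function name, the fixed seed, and the use of record indices are assumptions made for the illustration; the source does not describe how the random selection was implemented.

```python
import random


def draw_training_indices(n_total: int = 19_923,
                          n_train: int = 1_000,
                          seed: int = 42):
    """Pick n_train distinct record indices uniformly at random."""
    if not 0 < n_train <= n_total:
        raise ValueError("n_train must be between 1 and n_total")
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return sorted(rng.sample(range(n_total), n_train))


if __name__ == "__main__":
    idx = draw_training_indices()
    print(len(idx), idx[:5])
```

Repeating the draw with different seeds and different values of `n_train` would be one way to reproduce the kind of sensitivity curve shown in Figure 6.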

Performance of the ANN Model Alone.
After the ANN model was regularized, we validated its performance using the leave-one-out cross validation approach. This approach involves training the ANN model on the entire dataset except for the data from one gauge. Then, the ANN estimates the MDFSA based on the input data observed at the excluded gauge. Finally, the estimated MDFSA for the excluded gauge is compared with the observed one. This approach allows the performance of the ANN itself to be measured without any disturbance due to the spatial variability of the input variables.
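The leave-one-gauge-out validation loop can be sketched as follows. To keep the example self-contained, a deliberately simple stand-in model (a least-squares slope through the origin relating precipitation to MDFSA) replaces the ANN; the function names and the record layout `(gauge_id, precip_mm, mdfsa_cm)` are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / math.sqrt(sxx * syy)


def fit_slope(train):
    """Stand-in for ANN training: least-squares slope through the origin."""
    num = sum(p * d for p, d in train)
    den = sum(p * p for p, _ in train)
    return num / den


def leave_one_gauge_out(records):
    """records: iterable of (gauge_id, precip_mm, mdfsa_cm) tuples."""
    records = list(records)
    observed, estimated = [], []
    for held_out in sorted({g for g, _, _ in records}):
        # Train on every gauge except the held-out one.
        train = [(p, d) for g, p, d in records if g != held_out]
        slope = fit_slope(train)
        # Estimate at the held-out gauge from its own observed inputs.
        for g, p, d in records:
            if g == held_out:
                observed.append(d)
                estimated.append(slope * p)
    return pearson_r(observed, estimated)
```

Because the inputs at the excluded gauge are observed rather than interpolated, the resulting correlation reflects the model alone, which is the point made in the paragraph above.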
Figure 7(a) compares the observed MDFSA (x) and the MDFSA estimated by the ANN (y). Figure 7(b) compares the observed MDFSA (x) and the MDFSA estimated by the OK method (y). The correlation coefficient was 0.90 for the ANN and 0.20 for the OK method. Thus, if the precipitation and the minimum and average temperatures of the day are known, the proposed ANN model generally gives a more accurate estimate of the MDFSA than spatial interpolation does.
The high spatial variability of the geographical and meteorological factors influencing the MDFSA is known to cause high spatial variability of the MDFSA [51], which is the primary reason why spatial interpolation using the OK method did not perform as well as the ANN model.
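For reference, the ordinary kriging estimate at an ungauged location $s_0$ is commonly written as a weighted sum of the values observed at the $N$ surrounding gauges, with weights obtained from the semivariogram $\gamma$. The notation below is the standard textbook form and is an assumption here; the source does not state the variogram model or the neighborhood that was used.

```latex
\begin{aligned}
\hat{Z}(s_0) &= \sum_{i=1}^{N} \lambda_i \, Z(s_i),
\qquad \sum_{i=1}^{N} \lambda_i = 1, \\
\sum_{j=1}^{N} \lambda_j \, \gamma(s_i, s_j) + \mu &= \gamma(s_i, s_0),
\qquad i = 1, \ldots, N
\end{aligned}
```

Here $\mu$ is the Lagrange multiplier that enforces the unit-sum constraint. Because the estimate is a smooth weighted average of neighboring observations, it tends to miss sharp local maxima, which is consistent with the low correlation reported for the OK method.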
It is important for the ANN model to estimate the MDFSA accurately in the range that can lead to a disaster. Figure 8(a) shows the performance of the ANN at different intervals of the MDFSA. The performance of the ANN was best between 0 cm and 10 cm, with a correlation coefficient of 0.73. The correlation coefficient varied between 0.26 and 0.44 at the remaining four MDFSA intervals.
This abrupt decrease in the correlation coefficient was partly attributed to the number of data points available for training the ANN model. While approximately 93 percent of the dataset was concentrated in the first depth interval (0 cm-10 cm), the proportions of the dataset available for training in the remaining ranges were 5.5, 1.2, 0.46, and 0.19 percent for the second (10 cm-20 cm) through the last (40 cm-50 cm) depth intervals, respectively. The correlation coefficient for the greatest depth interval (40 cm-50 cm) was 0.44 for the ANN and 0.17 for the OK method (see Figure 8(b)). This indicates that the ANN method can provide a reasonably accurate estimate of disastrous snow accumulation, which the OK method would fail to predict in most cases.
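The per-interval evaluation can be sketched as follows: records are grouped by observed depth, and both the share of the data and the correlation coefficient are reported for each interval. The function names are hypothetical, and the half-open interval convention (lower bound included, upper bound excluded) and the minimum of three points per interval are assumptions, since the source does not state them.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation; returns None when it is undefined."""
    n = len(xs)
    if n < 3:
        return None
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def evaluate_by_interval(observed, estimated, edges=(0, 10, 20, 30, 40, 50)):
    """Return (low, high, share_percent, r) for each depth interval in cm."""
    total = len(observed)
    results = []
    for low, high in zip(edges[:-1], edges[1:]):
        pairs = [(o, e) for o, e in zip(observed, estimated) if low <= o < high]
        share = 100.0 * len(pairs) / total
        r = pearson_r([o for o, _ in pairs], [e for _, e in pairs])
        results.append((low, high, share, r))
    return results
```

A table of this kind makes the imbalance visible at a glance: an interval holding only a fraction of a percent of the records offers very little material for training or for a stable correlation estimate.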

Performance of the ANN Model in Ungauged Locations.
The ultimate goal of this study was to estimate the MDFSA at ungauged locations. To quantify the performance of the model for this purpose, we estimated the MDFSA using the ANN model based on spatially interpolated input variables. Here, all input variables were interpolated using the OK technique. Figures 9(a) through 9(c) show the observed values (x) versus the spatially interpolated estimates (y) of the variables used as the input of the ANN model. Leave-one-out cross validation was performed to estimate the variables on the y-axis.
The plot corresponding to the precipitation (Figure 9(c)) is shown on log-log axes because most precipitation values are concentrated near zero, whereas the precipitation that may cause a disaster is far greater than zero.
The correlation coefficient was 0.69 for T_min, 0.70 for T_mean, and 0.59 for P. The spatial correlation of precipitation is low because most heavy snow events in the study area occur in the form of orographic precipitation, which is caused by the orographic lift of the moist air that the cold Siberian High takes from the warm sea beneath it [52]. The high spatial variability of elevation in Korea causes high spatial variability in orographic lift and winter precipitation, which the current in situ precipitation gauge network could not capture well. Comparison of the correlation coefficients for

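Figure 10(a) compares the observed MDFSA to that estimated by the ANN model. The correlation coefficient was significantly higher for the ANN model (R = 0.4) than for the OK method (R = 0.2; see Figure 7(b)), which implies that the ANN model can estimate the MDFSA at ungauged locations with greater accuracy than the OK method. However, the correlation coefficient obtained using the interpolated variables as the input dataset (R = 0.4) was lower than that obtained when the input variables were the values observed at the in situ gauges (Figure 7(a), R = 0.90). This implies that a significant amount of the uncertainty in estimating the MDFSA at ungauged locations with the ANN model came from the spatial variability of the input variables that could not be captured by the in situ gauge network. As shown in Figure 9(c), the spatial variability of precipitation was especially important.
Figure 10(b) compares the observed (x) versus estimated MDFSA (y) using the in situ precipitation values as the input of the ANN model instead of the spatially interpolated values. The remaining input variables (T_mean and T_min) were spatially interpolated, as in the case of Figure 10(a). Therefore, the difference between Figures 10(a) and 10(b) shows the isolated impact of the spatial variability in precipitation on the performance of the ANN. The correlation coefficient increased significantly from 0.40 to 0.76, which implies that the spatial variability of precipitation accounted for a 47 percent (= (0.76 − 0.40)/0.76) reduction in the correlation coefficient. A similar analysis was performed for the remaining input variables (T_mean and T_min), and the isolated adverse impact of each variable on the correlation coefficient was minimal (T_mean, 0.01, 1.4 percent; T_min, 0.04, 5.2 percent).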
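The 47 percent figure follows from simple arithmetic on the two correlation coefficients, sketched below. The function name is a hypothetical label introduced for this illustration.

```python
def relative_loss_percent(r_with_observed: float,
                          r_with_interpolated: float) -> float:
    """Share of the correlation lost when an observed input is replaced
    by its spatially interpolated counterpart, in percent."""
    if r_with_observed <= 0:
        raise ValueError("r_with_observed must be positive")
    return 100.0 * (r_with_observed - r_with_interpolated) / r_with_observed


if __name__ == "__main__":
    # Precipitation: R rises from 0.40 to 0.76 when in situ values are used.
    print(f"{relative_loss_percent(0.76, 0.40):.0f} percent")
```

Note that the denominator is the correlation obtained with the observed input (0.76), so the percentage expresses the loss relative to the better-informed case.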

Summary and Conclusion
This study developed an ANN model that estimates the maximum daily fresh snow accumulation (MDFSA) based on the daily precipitation, mean temperature, and minimum temperature. Regularization and training of the ANN model were performed through a trial-and-error method based on a set of 19,923 in situ data points observed at 90 gauges in the Korean Peninsula between 1960 and 2016. The final ANN model, comprising one hidden layer with 10 nodes, was proposed.
Definite relationships between the MDFSA and these three factors were established by the ANN model. The correlation coefficient between the observed and estimated MDFSA was 0.90.
The accuracy of the ANN model was greatest in the MDFSA interval between 0 cm and 10 cm, with a correlation coefficient of approximately 0.70. For the remaining MDFSA intervals, the correlation coefficient varied between approximately 0.20 and 0.40. The reduction in the correlation coefficient at higher MDFSA intervals is most likely explained by the smaller amount of data available at those intervals for training the ANN model. The developed ANN model was used to estimate the MDFSA at ungauged locations. The correlation coefficient between the observed and estimated MDFSA was 0.40. The spatial variability of the precipitation that could not be captured by the in situ gauge network played a significant role in reducing the correlation coefficient.
The isolated adverse impact of each input variable on the correlation coefficient was 47 percent for precipitation, 1.4 percent for T_mean, and 5.2 percent for T_min.
A key finding of this study is that the MDFSA at ungauged locations can be estimated with high accuracy with the help of artificial intelligence techniques if precipitation and temperature can be estimated accurately.
The MDFSA estimated using artificial intelligence methods will be more accurate than the one estimated in a more direct manner using spatial interpolation, such as the ordinary kriging method.
The accuracy of the model was especially sensitive to the accuracy of the input precipitation data. Therefore, accurate estimation of the precipitation at ungauged locations is crucial to the successful use of the proposed ANN model in practice. In this context, more in situ rain gauges should be installed in mountainous areas where snow accumulation frequently leads to disasters and precipitation is difficult to predict. This is especially important because mountainous areas can have very distinct precipitation characteristics that cannot be easily inferred from the information acquired at nearby locations [53,54].

Figure 1: (a) Locations of in situ snow accumulation measurement stations and the snow collapse disaster location (Mauna Ocean Resort Gym in Gyeongju). (b) A picture of the disaster site (source: http://news.kbs.co.kr/news/view.do?ncd=2834382).

Figure 2: (a) Study area (South Korea) and the locations of the 94 ground gauges; (b) number of gauges used for the analysis varying with year.

Figure 3: Structure of the optimal artificial neural network model determined by this study.

(1) Develop an ANN composed of one hidden layer with five nodes.
(2) Train the ANN with a given dataset (one of the 8 input combinations, which are [P], [P, T_min], [P, T_mean], [P, T_max], [P, T_mean, T_min], [P, T_max, T_min], [P, T_max, T_mean], and [P, T_max, T_mean, T_min]), excluding the data of one gauge location.
(3) Estimate the MDFSA at the excluded gauge location using the ANN trained in Step 2 and the input variables observed at that location.
(4) Compare the MDFSA estimated in Step 3 with the MDFSA observed at the same gauge location.
(5) Repeat Steps 2 through 4 for all gauge locations to acquire the relationship between the observed and the estimated MDFSA. Calculate the correlation coefficient between the two variables.
(6) If the correlation coefficient calculated in Step 5 is lower than 0.9, add one more node to the hidden layer and repeat Steps 2 through 5.
(7) If the correlation coefficient calculated in Step 5 does not exceed 0.9 after adding 15 additional nodes (20 nodes in total), reduce the correlation coefficient threshold by 0.01 and repeat Steps 2 through 6.
(8) Repeat Steps 1 through 7 20 times and record the average correlation coefficient for each of the 8 input variable combinations.
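The steps above can be sketched as a small search routine. This is an interpretation under stated assumptions rather than the authors' code: `evaluate` is a hypothetical callback that trains the ANN for a given input combination and node count and returns the leave-one-out correlation coefficient (Steps 2 through 5), and the search is assumed to restart from five nodes each time the threshold is relaxed, which the source leaves ambiguous.

```python
from itertools import combinations

OPTIONAL_INPUTS = ("T_max", "T_mean", "T_min")


def input_combinations():
    """Enumerate the 8 input sets: P alone plus every subset of the rest."""
    combos = []
    for k in range(len(OPTIONAL_INPUTS) + 1):
        for extra in combinations(OPTIONAL_INPUTS, k):
            combos.append(("P",) + extra)
    return combos


def search_nodes(evaluate, combo, start_nodes=5, max_nodes=20,
                 threshold=0.9, step=0.01):
    """Steps 1 through 7: grow the hidden layer until the correlation
    reaches the threshold; relax the threshold if 20 nodes do not suffice."""
    while threshold > 0:
        for n_nodes in range(start_nodes, max_nodes + 1):
            r = evaluate(combo, n_nodes)       # Steps 2 through 5
            if r >= threshold:                 # Step 6 stopping rule
                return n_nodes, r, threshold
        threshold = round(threshold - step, 10)  # Step 7
    raise RuntimeError("no structure reached a positive threshold")


def average_over_repeats(evaluate, combo, repeats=20):
    """Step 8: repeat the search and average the resulting correlations."""
    scores = [search_nodes(evaluate, combo)[1] for _ in range(repeats)]
    return sum(scores) / len(scores)
```

In use, `average_over_repeats` would be called once for each element of `input_combinations()`, and the combination with the highest average would be retained.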

Figure 4: Boxplot showing the relationship between input data and correlation coefficient.

Figure 5: Relationship between the number of nodes in the hidden layer and the threshold epoch over which the snow depth residual converged.

Figure 10(a) compares the observed MDFSA to that estimated by the ANN model. The correlation coefficient was significantly higher for the ANN model (R = 0.4) than the

Figure 7: (a) Observed MDFSA (x) versus the MDFSA estimated by the ANN model (y). Here, the input variables of the ANN are the observed ones, not spatially interpolated; (b) observed MDFSA (x) versus the MDFSA estimated by spatially interpolating the MDFSA values observed at the nearby gauges with the ordinary kriging method.

Figure 8: Cross validation of the ANN for different snow depth intervals. The results corresponding to the ANN and the ordinary kriging (OK) are shown. (a) ANN. (b) Ordinary kriging.

Figure 10(b) compares the observed (x) versus estimated MDFSA (y) using the in situ precipitation values as the input of the ANN model instead of the spatially interpolated values. The remaining input variables (T_mean and T_min) were spatially interpolated, as in the case of Figure 10(a). Therefore, the difference between Figures 10(a) and 10(b) shows the isolated impact of spatial variability in precipitation on the performance of the ANN. The correlation

Figure 9: Relationship between spatially interpolated input data of the ANN model and the observed data. (a) Minimum temperature. (b) Average temperature. (c) Precipitation.