Comparing Activation Functions in Modeling Shoreline Variation Using Multilayer Perceptron Neural Network

This study modeled shoreline changes using a multilayer perceptron (MLP) neural network with data collected from five beaches in southern Taiwan. The data included aerial survey maps from the Forestry Bureau for the years 1982, 2002, and 2006, which served as predictors, while unmanned aerial vehicle (UAV) survey data from 2019 served as the response. The MLP was configured with five different activation functions in order to evaluate their influence: the Identity, Tanh, Logistic, Exponential, and Sine functions. The results show that the performance of an MLP model may be affected by the choice of activation function. The Logistic and Tanh activation functions outperformed the others, with Logistic performing best at three beaches and Tanh at the remaining two. These findings suggest that the application of machine learning to shoreline changes should be accompanied by an extensive evaluation of different activation functions.


Introduction
Taiwan is frequently struck by typhoons, which cause serious coastal erosion and, in turn, significant damage to buildings, infrastructure, utilities, and ecosystems. Mitigating these effects has become increasingly important under climate change, and the monitoring of shorelines has therefore become necessary. Monitoring usually entails ground surveys, topographic surveys, aerial photographs, or remote sensing techniques to extract the shoreline. Measuring the shoreline with traditional techniques can be a daunting task, which is why unmanned aerial systems (UAS) have been employed in recent years [1]. Despite their high resolution, weather-related challenges may limit their use, which may call for predicting the shoreline instead. Several models have been applied, such as deterministic process-based models [2], which have been found to be computationally intensive and to show some inconsistencies between measured and modeled data [3]. Artificial neural networks (ANNs), by contrast, have been introduced in the field and have delivered a marked increase in accuracy at much lower cost [4][5][6][7].
Neural networks mimic how the brain works, often depend on activation (transfer) functions, and are widely used in many fields. Yang et al. noted that many electric utilities use machine learning-based outage prediction models (OPM) to predict the impact of storms on their networks for sustainable management [8]. Cerrai et al. used machine learning models to predict storm power outages distributed across utility service territories in the Northeastern United States [9]. Bhuiyan et al. studied wind speed forecasting, using a variety of tree-based non-parametric machine learning techniques to predict the maximum wind speed at 10 m for the selected convective weather variables; the modeled wind speed successfully encapsulated the reference wind speed and significantly reduced the systematic and random errors [10]. Kumar et al. and Hazra et al. used soil moisture information to improve rainfall estimation through machine learning methods [11,12]. Bhuiyan et al. conducted rainfall assessments across warm- and cold-season weather patterns and assessed the importance of rainfall improvement for hydrological simulation [13]. Zorzetto et al. proposed a method suitable for areas with sparse data, using the probability density function of Quantitative Precipitation Estimates (QPE) to infer the sub-grid scale properties of rainfall [14].
Additionally, a comparison of different neural network architectures has been reported for binary classification problems [15]. Tfwala and Wang used a multilayer perceptron (MLP) to estimate sediment discharge at Shiwen in southern Taiwan [16]. Chen et al. developed a feed-forward backpropagation model for estimating runoff from rainfall data in a river basin and employed a neural network technique to recover missing data [17]. Wang et al. studied the potential of the feed-forward backpropagation (BP) neural network algorithm for estimating evapotranspiration (ETo) from temperature data [18]. Awolusi et al. also used MLP and feed-forward networks to model the properties of steel fiber [19]. Wang et al. investigated the accuracy of a time-lagged recurrent network (TLRN) for forecasting the suspended sediment load (SSL) occurring episodically during storm events in the Kaoping River basin in southern Taiwan [20]. Sentas and Psilovikos evaluated Autoregressive Integrated Moving Average (ARIMA) and Transfer Function models for water temperature simulation in the dam-lake Thesaurus, eastern Macedonia, Greece; in their results, the Transfer Function models performed better [21]. Afzaal et al. used artificial neural networks and deep learning to estimate groundwater from major physical hydrology components: stream level, streamflow, precipitation, relative humidity, mean temperature, evapotranspiration, heat degree days, and dew point temperature. The deep learning technique was found to be convenient and accurate [22]. Zubier and Eyouni applied the non-linear autoregressive network with exogenous inputs, a type of artificial neural network, to investigate the role of atmospheric variables in sea level variations in the eastern central Red Sea. Their work demonstrated that the approach is effective in investigating the individual and combined roles of the atmospheric variables in residual sea-level variations [23]. Karamoutsou and Psilovikos studied the use of artificial neural networks for water quality prediction in Lake Kastoria, Greece, to understand the future of the study area and to identify problems that may arise. Based on the statistical measures, the dissolved oxygen model for the Giole station produced satisfactory results [24].
While much work has been done on applying neural networks in different fields of study, less has been done on evaluating different activation functions in shoreline prediction.
Therefore, the objectives of the study were to apply ANN to predict shorelines from past observations, and to explore the influence of activation functions in the predictions.

Study Area and Data Collection
The study used shoreline data collected from five beaches: Baisha (BS), Nanwan (NW), Big Bay (DW), Little Bay (SW), and Chuanfanshi (TFS), all located in southern Taiwan, as shown in Figure 1. The workflow of the study follows the flowchart in Figure 2. The beaches are characterized by both fine and coarse sand. The spatial information for each beach was developed from aerial survey maps provided by the Aerial Office of the Forestry Bureau for the years 1982, 2002, and 2006 [25], which served as predictors, while unmanned aerial vehicle (UAV) survey data from 2019 served as the response. The UAV used was the Phantom 4 RTK (developed by Da-Jiang Innovations, DJI, Shenzhen, China); a detailed description of the drone and its camera may be found on the DJI website (https://www.dji.com/tw).

Artificial Neural Networks (ANNs)
ANNs have gained popularity in recent decades, with successful applications in many fields such as sediment transport [16], satellite image classification [26], evapotranspiration [27], and coastal erosion [28]. They are a powerful computational technique for modeling complex non-linear relationships. Their structure includes at least three layers: input, hidden, and output. Several types of ANN have been developed, for example the coactive neuro-fuzzy inference system, recurrent networks, radial basis networks, the multilayer perceptron, and feed-forward backpropagation networks. In this study, we applied the multilayer perceptron (MLP) because of its learning rule, backpropagation, which is an effective and practical learning algorithm [5,29].

Multilayer Perceptron Neural Network (MLP)
A multilayer perceptron (MLP) is characterized by the presence of one or more hidden layers, whose computation nodes are called hidden neurons and whose function is to intervene between the external inputs and the network output in a useful manner. More hidden layers may be added to extract higher-order statistics. The network acquires a global perspective despite its local connectivity because of the extra synaptic connections and the extra dimension of neural interconnections [30]. An MLP may contain more than one hidden layer; however, previous studies have shown a single hidden layer to be adequate for most applications [31]. For this reason, this study employed one hidden layer. The computation performed by each MLP layer is given in Equation (1). The model had a 3-3-1 structure, as shown in Figure 3. The inputs were 100 shoreline data points for each of 1982, 2002, and 2006, and the 2019 shoreline was the predicted output. Cömert and Kocamaz noted that, before network training, it is necessary to understand the amount of data available because of its bearing on the size of the neural network. In their scheme, 70% of the data set is used for training, during which the weights and biases are updated according to the network output and the target values; 15% is used for validation, so that training stops before overfitting occurs; and 15% is used for testing, to assess the predictive performance of the network [32]. Accordingly, the data were partitioned into 70% for training and 15% each for validation and testing [33,34]. Although several training algorithms are available, we applied the second-order Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm with a maximum of 200 training cycles because of its high performance [35]. The learning rate and momentum were both fixed at 0.1 to avoid instabilities [36], and the weight decay was fixed at 0.001. The model and all computations were run in STATISTICA ver. 12.5 (TIBCO, Hillview Avenue, Palo Alto, USA).
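Equation (1) can be written in the following form. This is a reconstruction consistent with the symbol definitions given for it; the additive sign on the bias $\theta_j$ is an assumption, since some formulations subtract a threshold instead.

```latex
Y_j = F\Bigl(\sum_{i} W_{ij} X_i + \theta_j\Bigr) \tag{1}
```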
where $Y_j$ is the output of neuron $j$, $W_{ij}$ is the connection weight from neuron $i$ to neuron $j$, $X_i$ is the signal generated for neuron $i$, $\theta_j$ is the bias associated with neuron $j$, and $F(x)$ denotes the activation function; the different activation functions are discussed in the section below.
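To illustrate how Equation (1) applies to the 3-3-1 structure, the following minimal Python sketch propagates one sample (three shoreline coordinates) through a single hidden layer. The weights, biases, and input values are hypothetical placeholders chosen for illustration, not values from the trained model. The sketch also assumes that the bias is added to the weighted sum and that the output neuron uses the identity function.

```python
import math


def logistic(a):
    """Logistic activation: maps any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))


def layer(inputs, weights, biases, activation):
    """Apply Y_j = F(sum_i W_ij * X_i + theta_j) for every neuron j.

    weights[i][j] is the connection weight from input i to neuron j.
    The additive bias is an assumption of this sketch.
    """
    outputs = []
    for j, theta_j in enumerate(biases):
        weighted_sum = sum(weights[i][j] * x_i for i, x_i in enumerate(inputs))
        outputs.append(activation(weighted_sum + theta_j))
    return outputs


def forward_3_3_1(x, w_hidden, b_hidden, w_out, b_out, activation=logistic):
    """Forward pass: 3 inputs -> 3 hidden neurons -> 1 output.

    The output neuron is assumed to use the identity function.
    """
    hidden = layer(x, w_hidden, b_hidden, activation)
    return layer(hidden, w_out, b_out, lambda a: a)[0]


# Hypothetical, already-scaled shoreline coordinates for 1982, 2002 and 2006.
x = [0.20, 0.35, 0.40]

# Hypothetical parameters; a trained network would learn these values.
w_hidden = [[0.5, -0.3, 0.1],
            [0.2, 0.4, -0.6],
            [-0.1, 0.3, 0.7]]
b_hidden = [0.0, 0.1, -0.1]
w_out = [[0.6], [-0.2], [0.9]]
b_out = [0.05]

y_2019 = forward_3_3_1(x, w_hidden, b_hidden, w_out, b_out)
print(f"predicted (scaled) 2019 coordinate: {y_2019:.4f}")
```

Passing a different function as `activation` changes only the hidden layer, which is the part of the model that the comparison in this study varies.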

Activation Functions
Activation functions transform the input signals from neurons of the previous layer through a mathematical function, and their choice may have a significant effect on the overall performance of a neural network model. Many activation functions map any real input into a confined range, commonly from 0 to +1 or from −1 to +1. There are many types of activation functions; the most commonly used include the linear (sometimes referred to as identity) [37], logistic, hyperbolic tangent, Gaussian [38], threshold, sine, exponential linear unit, and rectified linear unit [39] functions. This study selected five activation functions (Identity, Tanh, Logistic, Exponential, and Sine) to explore their impact on a multilayer perceptron model. A brief description of each is provided below.

Identity Function
This function returns its argument unchanged, $f(\alpha) = \alpha$, where $\alpha$ is the observed Y-axis coordinate.

Hyperbolic Tangent Function (Tanh)
Tanh is a symmetric S-shaped (sigmoid) function whose output lies in the range from −1 to +1; it is commonly used in MLP networks.

Logistic Function (Logistic)
This function differs from the Tanh function in that its output lies in the range from 0 to +1. It is given by $f(\alpha) = 1/(1 + e^{-\alpha})$.

Exponential Function
The output of this function ranges from 0 to infinity. It is mostly applied when the target is positive.

Sine Function
The sine function has a similar output range to the Tanh function, but is often used when the data being modeled are radially distributed.
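The five activation functions described above can be sketched in Python as follows. This is a minimal illustration based on their standard mathematical definitions, not the STATISTICA implementation, and the function names are chosen for illustration. The exponential function is written here as $e^{a}$, which is an assumption: some software defines it as $e^{-a}$, and both forms have outputs between 0 and infinity.

```python
import math


def identity(a):
    """Identity: returns the argument unchanged (unbounded output)."""
    return a


def tanh(a):
    """Hyperbolic tangent: S-shaped, output between -1 and +1."""
    return math.tanh(a)


def logistic(a):
    """Logistic: S-shaped, output between 0 and +1."""
    return 1.0 / (1.0 + math.exp(-a))


def exponential(a):
    """Exponential: positive output (assumed form e**a, see lead-in)."""
    return math.exp(a)


def sine(a):
    """Sine: periodic, output between -1 and +1."""
    return math.sin(a)


ACTIVATIONS = {
    "Identity": identity,
    "Tanh": tanh,
    "Logistic": logistic,
    "Exponential": exponential,
    "Sine": sine,
}

# Evaluate every function on a few sample inputs to compare output ranges.
for name, func in ACTIVATIONS.items():
    values = [round(func(a), 3) for a in (-2.0, 0.0, 2.0)]
    print(f"{name:12s} {values}")
```

Keeping the functions in a dictionary makes it straightforward to loop over them when the same network is evaluated once per activation function.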

Model Evaluation
In this study, the models were evaluated following Liu et al. and Gupta et al. using the root mean square error (RMSE), the correlation coefficient (r), and the Kling-Gupta efficiency (KGE); the formulas are given in Equations (7)-(9) [7,40]. The RMSE is a measure of the residual variance, while r is a measure of accuracy and is commonly used to compare different models.
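The evaluation indices can be written in their standard forms as follows. This is a reconstruction based on the usual definitions of the RMSE, the correlation coefficient, and the KGE, together with the symbol definitions that accompany these equations; it is not copied from the original typeset equations.

```latex
\begin{align*}
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{t=1}^{n}\bigl(x_{est}(t) - x_{obs}(t)\bigr)^{2}} \\
r &= \frac{\sum_{t=1}^{n}\bigl(x_{obs}(t)-\bar{x}_{obs}\bigr)\bigl(x_{est}(t)-\bar{x}_{est}\bigr)}
          {\sqrt{\sum_{t=1}^{n}\bigl(x_{obs}(t)-\bar{x}_{obs}\bigr)^{2}}\,
           \sqrt{\sum_{t=1}^{n}\bigl(x_{est}(t)-\bar{x}_{est}\bigr)^{2}}} \\
\mathrm{KGE} &= 1 - \mathrm{ED}, \qquad
\mathrm{ED} = \sqrt{(r-1)^{2} + (\alpha-1)^{2} + (\beta-1)^{2}}
\end{align*}
```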
where $x_{obs}(t)$ is the observed X-axis coordinate, $x_{est}(t)$ is the X-axis coordinate estimated by the alternative method, $\bar{x}_{obs}$ and $\bar{x}_{est}$ are the corresponding mean values, and $n$ is the number of data points under consideration. Additionally, a linear regression $y = \alpha_1 x + \alpha_0$ was applied to evaluate the models' performance statistically, where $y$ is the dependent variable (alternative methods), $x$ the independent variable (observed), $\alpha_1$ the slope, and $\alpha_0$ the intercept. ED is the Euclidean distance from the ideal point, $\alpha$ is the ratio of the standard deviation of the simulated coordinates to that of the observed coordinates, and $\beta$ is the ratio of the mean simulated to the mean observed coordinates and represents the bias; r can be interpreted as the potential value of KGE. The ideal value of KGE, like that of r, is unity.
Table 1 shows the performance of the different activation functions at the five beaches. The Logistic and Tanh activation functions show better overall performance, with higher r and KGE values and lower RMSE values. The Logistic activation performed best at Baisha, Nanwan, and Chuanfanshi, with r values of 0.999, 0.983, and 0.999, respectively, during the testing phase. Similar patterns were observed in the KGE values, except for slight differences at Baisha and Nanwan. Because of its lower associated errors, reflected in the lower RMSE and bias ($\beta$) values, and the very slight differences between the r and KGE values, the Logistic activation function was still judged to be better at these beaches. Tanh performed best at the remaining beaches, Big Bay (r = 0.997) and Little Bay (r = 0.994).
Training cycles and the respective errors during the training and testing phases are shown in Figure 4. In all simulated cases, the errors follow a consistent pattern, dropping significantly within 5 training cycles. Some instabilities are observed, however, at NW with the Logistic activation function. In contrast to Parascandolo et al. [41], who suggested that the Tanh function may be replaced by the Sine function, our results suggest that different functions may be suited to specific scenarios or fields.
The estimated shoreline and the accuracy of the neural network for 2019 are further illustrated in Figure 5. The prediction of the shoreline at NW is poorer than at the other beaches. This is reflected in the higher errors during the testing cycle shown in Figure 4 and in the predicted change under both the training and testing phases shown in Figure 5.
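The RMSE, r, and KGE indices used in this evaluation can be computed as in the following minimal Python sketch. It relies on the standard definitions of these indices, with population standard deviations assumed for $\alpha$. The sample coordinates are made up for illustration and are not data from the study.

```python
import math


def mean(values):
    return sum(values) / len(values)


def std(values):
    """Population standard deviation (an assumption of this sketch)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def rmse(obs, est):
    """Root mean square error between observed and estimated values."""
    return math.sqrt(sum((e - o) ** 2 for o, e in zip(obs, est)) / len(obs))


def pearson_r(obs, est):
    """Pearson correlation coefficient between observed and estimated values."""
    mo, me = mean(obs), mean(est)
    cov = sum((o - mo) * (e - me) for o, e in zip(obs, est))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_e = sum((e - me) ** 2 for e in est)
    return cov / math.sqrt(var_o * var_e)


def kge(obs, est):
    """Kling-Gupta efficiency: 1 minus the Euclidean distance from the ideal point."""
    r = pearson_r(obs, est)
    alpha = std(est) / std(obs)    # variability ratio
    beta = mean(est) / mean(obs)   # bias ratio
    ed = math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
    return 1.0 - ed


# Made-up observed and estimated shoreline coordinates (illustration only).
observed = [101.2, 103.5, 104.1, 106.8, 108.0]
estimated = [101.0, 103.9, 104.0, 106.1, 108.4]

print(f"RMSE = {rmse(observed, estimated):.3f}")
print(f"r    = {pearson_r(observed, estimated):.3f}")
print(f"KGE  = {kge(observed, estimated):.3f}")
```

A perfect prediction gives RMSE = 0 and r = KGE = 1, which is why higher r and KGE together with lower RMSE indicate the better activation function.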

Conclusions
Knowledge of shoreline changes can play a crucial role in managing coastal areas, especially in storm-prone areas such as Taiwan, and accurate prediction of such changes is therefore essential. The MLP artificial neural network model applied herein has demonstrated the applicability of artificial intelligence in this field. Additionally, the results have demonstrated the crucial role of activation functions in such models. Different activation functions were considered for modeling shoreline change, and the Logistic and Tanh functions were shown to perform better than the Identity, Exponential, and Sine functions. The criteria used to select the best function were the highest $R^2$ and the lowest RMSE. The findings serve as a valuable reference for the prediction of shorelines.