A vector-based coastline shape classification approach using sequential deep learning model

Coastlines play a crucial role in coastal dynamics, and classifying their shape is an essential requirement for coastal analysis. With the development of Coastal Management Systems (CMS), structured and high-resolution vector-format coastlines have become increasingly available compared to coastlines derived from remote sensing images. However, because coastline shapes are hard to describe accurately and classification rules are ambiguous, automatic classification of vector coastlines has remained a difficult but urgent problem. In this paper, we propose a data-driven approach for classifying the shape of vector coastlines according to their morphological characteristics. The method utilizes a sequence-based deep learning algorithm to model and classify coastline segments. We construct a dataset including five representative types of vector coastlines, and train and evaluate the model using this dataset. The evaluation results show that the proposed method outperforms all baselines, achieving a classification accuracy of 93.20%. This method can be integrated into existing Coastal Management Systems to enhance their morphological analysis functions, making a valuable contribution to the applications of Artificial Intelligence (AI) in coastal management.


Introduction
Coastal environments, characterized by their diverse array of shapes and forms, play a significant role in the Earth's ecological balance, serving as habitats for numerous species (Archambault and Bourget, 1999, Bartley et al., 2001) and acting as natural barriers against marine processes (Gijsman et al., 2021). The intricate interplay of geological, oceanographic, climatic, and anthropogenic factors gives rise to the complex and dynamic nature of coastlines, making their accurate classification essential for a better understanding of coastal geomorphology, as well as for informed decision-making in coastal management (Frihy, 2008, Nunez et al., 2022), hydrodynamic analysis (Hummel and Stacey, 2021), and pollution treatment (Twilley et al., 2018).
The study of coastal shapes is an essential aspect of coastal geomorphology, as it provides valuable insights into the underlying processes and factors that shape coastal environments. By examining the geometry of coastlines, researchers can infer information about the factors that have contributed to the development and evolution of these diverse landscapes. This is an intelligent decision process supported by multi-factor reasoning. The relevance of the study of coastline shapes extends beyond mere academic interest, as it also has significant practical implications for various aspects of coastal management, conservation, and planning. Researchers and coastal managers have long recognized the importance of accurately classifying coastlines based on their geometric properties in order to identify patterns (Hummel and Stacey, 2021), reduce hazards (Yang et al., 2017), and better comprehend the factors driving coastal evolution (Porter-Smith and McKinlay, 2012, da Silva et al., 2018).
Over the past few decades, researchers have developed various methods for analyzing and classifying coastlines, ranging from simple visual assessments to Remote Sensing (RS) (Aghdami-Nia et al., 2022) and Geographic Information Systems (GIS) based approaches (Hoonhout et al., 2015, Ghorai and Mahapatra, 2020, Zhang et al., 2022). Generally, these classification methods are based on rule analysis with domain knowledge from morphology and geoscience. However, existing rule-based methods often face limitations in terms of objectivity, automation, and scalability, particularly when dealing with high-resolution spatial data. Traditional manual or semi-automated classification approaches are not only time-consuming and labor-intensive but also inherently subjective, which may hinder the reproducibility and comparability of the results. Furthermore, many existing algorithms are not specifically designed to handle the unique geometric properties and topological complexities of vector-based coastline representations. Alternatively, given that large volumes of coastline data are being accumulated in coastal applications, we can take a data-driven approach and develop a machine learning model that learns the classification knowledge and rules from training samples.
In this paper, we present a novel data-driven vector coastline shape classification model designed to address the limitations of the existing methods. Specifically, we propose a computational method to divide a complex coastline into a sequence of bends, and further propose a diverse set of features to describe the morphological characteristics of each bend. The morphological features of each bend and their sequence are then fed into a deep learning model based on Recurrent Neural Networks (RNNs), more specifically Long Short-Term Memory (LSTM), to model the essential morphological characteristics of various coastline types and make the final coastline classification. Among the available deep learning models, we employ RNNs because the data have a one-dimensional sequential structure. Using a comprehensive dataset of coastline samples, we train and evaluate the proposed coastline classification model. The evaluation results show that the proposed model outperforms all baselines, achieving a classification accuracy of 93.20%.
By proposing an innovative vector coastline shape classification algorithm, we aim to contribute to the advancement of coastal geomorphology research and facilitate the development of more effective and sustainable coastal management strategies, informed by a deeper understanding of the complex morphological patterns observed in coastal environments worldwide.
The remainder of this paper is organized as follows. Section 2 briefly reviews previous literature on coastal typologies and the use of deep learning for morphological classification. Section 3 introduces the conceptual framework and outlines the components of the proposed model. Section 4 presents a detailed evaluation and analysis of this model. Section 5 discusses the results, and Section 6 summarizes the conclusions of the study.

Coastline typology
Coastline classification has always been a focal point of study (Wang and Aubrey, 1987). Different taxonomies emerge under various classification standards, primarily including categories based on genesis, morphology, and application-oriented classification.
Genesis-based methods focus on the formative and evolutionary dynamics and material conditions of coastlines, as illustrated by many early studies (Johnson, 1919, Shepard, 1937, Cotton, 1952, Putnam and Axelrod, 1960, Shepard, 1968, Inman and Nordstrom, 1971). While these traditional classification systems hold theoretical significance in comprehensive research in areas such as geological structures, marine dynamics, marine ecology, and others, they may involve many complex interrelated factors that require comprehensive data and multidisciplinary knowledge. They also require substantial experimental data and specialized coastal knowledge as prerequisites.
Morphology-based methods view the shape as a direct manifestation of the coastline's genesis, which also influences its evolution. Scholars such as Inman and Nordstrom (1971), who integrated some morphological features, and Bartley et al. (2001), who conducted a pure morphological cluster analysis of the Mexican coastline based on its complexity, have dedicated efforts to classifying coastlines using morphology data. Further, Dürr et al. (2011) combined morphology data with hydrology and lithology for classification, as did many other researchers (Del Río et al., 2013, Nyberg and Howell, 2016). The challenge with such classification methods lies in the substantial difficulty of quantitatively describing coastline morphology, which leaves these methods with a degree of ambiguity.
Additionally, there are application-oriented classification systems combining morphology and genesis, acting as a compromise between the two classification approaches. These include Frihy (2008) on coastal management differences, and Mack, Theuerkauf, & Bunting (2020) for socio-economic studies. While these methods are useful for specific application studies, they generally lack universality and atomicity for extension into other research areas.
Traditionally, coastlines were represented in raster formats (e.g., via remote sensing images). Such data are often noisy, due to the influence of clouds and of trees and other vegetation around the coastlines. As a result, applications based on raster-based coastlines need to deal with such influence before applying image classification and other data analysis methods. Due to the rough representation of coastlines, morphological classification of raster-based coastlines often leads to unsatisfactory outcomes. Compared to remote sensing images of coastlines, vector-format coastlines are gaining increasing interest owing to their superior accuracy, editability, rapid update capability, measurability, flexibility in simulation, and visualization. Vector-format coastlines can depict the morphology of coastlines with great precision while requiring minimal storage capacity. They support intricate GIS analysis models such as topological analysis, buffer analysis, and network analysis, and can be augmented with rich attribute information, demonstrating notable extensibility. Existing CMS are in immediate need of a fundamental morphological classification method tailored to vector coastlines, serving as a vital foundation for coastal analysis and management.
However, the classification of vector-format coastlines presents a multifaceted challenge. Firstly, computational methods to represent and calculate the morphological features of coastlines are largely missing, or need to be significantly extended. Findings on the modelling of geometrical features of shapes from computational geometry, cartography, and GIS are still not integrated for coastline analysis. Moreover, rule-based classification techniques tend to be developed with specific applications in mind, and often perform poorly on unseen or slightly different data. Meanwhile, creating a comprehensive rule set might be resource-intensive and error-prone. Consequently, a data-driven approach, which implicitly learns classification rules from a training dataset, emerges as a viable option for classification.

Recurrent neural networks for vector data
With the development of artificial intelligence, data-driven intelligent classification methods have provided new insights into solving the problem of coastline classification. Deep learning, as a technology based on artificial neural networks, relies on large amounts of data for learning and inference. Under the framework of connectionism, deep learning models can be categorized into different types and applications based on the characteristics and structure of the data. For example, the convolutional neural network (CNN) is particularly suitable for handling grid-structured data like images, while the graph convolutional network (GCN) has been developed to handle graph structures, making it suitable for graph-based tasks like network analysis. In coastline morphology analysis, the one-dimensional linear structure of a coastline can be viewed as a spatial sequence, and thus the RNN and its related models become applicable choices within the realm of deep learning.
RNN models, particularly their advanced variants such as Long Short-Term Memory (LSTM) (Hochreiter and Schmidhuber, 1997) and Gated Recurrent Unit (GRU) (Chung et al., 2014), find applications not only in time series problems, such as climate change and weather forecasting (Ardabili et al., 2020, Hewage et al., 2021) and crowd prediction (Zhang et al., 2022), but also in semantic series problems, such as natural language processing (Sutskever et al., 2014), and in spatial series problems, for example, trajectory prediction (Alahi et al., 2016, Altché and de La Fortelle, 2017). Whether the data are time series, signals, or sequences of language and text, they can all be abstracted as a one-dimensional linear structure, which is fundamentally consistent with the curving distribution sequence of coastlines. Every point on the coastline has a spatial connection with its neighboring points, just as every point in a time series is connected to the points before and after it in time. Therefore, in this study, choosing a model based on RNNs is an appropriate solution, as such models can capture the sequential dependencies in this one-dimensional linear structure.
Furthermore, in recent years, RNNs and other related deep learning models have been extensively applied in GIS, especially in dealing with issues pertaining to vector map data. These applications encompass understanding shape encoding of area objects (Xiongfeng et al., 2021), cartographic generalization of island boundary lines (Du et al., 2022), multiscale representation of residential areas (Feng et al., 2019), and the segmentation of linear geographical features (Yang et al., 2022).
In RNN and other deep learning methods, the encoding and decoding of data are key steps, especially in the fields of geographical information pattern recognition, map generalization, and related areas. For these types of problems, the encoding method must be able to accurately describe the complex structure and relationships of geographical and spatial information. Therefore, relevant research has adopted various encoding methods for vector geometric data (Yan et al., 2020, Mai et al., 2022). Deep learning models have also been extensively used in the classification of morphological patterns in vector map data, including building patterns (Yan et al., 2019), road network patterns (Wang et al., 2020), river network patterns (Yu et al., 2022), and administrative boundary shape categories (Yang et al., 2022).
Deep learning networks can acquire decision-making knowledge directly from sample data (Shi et al., 2022, Xu et al., 2022), bypassing the need to set complicated shape classification rules. Because vector coastline data are sequential in form, sequence deep learning models are the most suitable model paradigm for processing them. Consequently, this study attempts to comprehensively describe the morphological characteristics of each coastline as a spatial series and construct an RNN-based model to classify its morphological type.

Morphological typology of coastlines
Coastline classification has long been a fundamental technique in coastline research. As previously noted, there are numerous methods for classifying coastlines, but few have approached them from a morphological perspective, despite the proven significance of coastline morphology. Combining data-driven models with domain knowledge can improve the usability, reliability, and interpretability of the classification models. Coastline morphological features reflect both marine science knowledge and coastal geomorphology knowledge. In this research, we seek to propose a coastline categorization that is specific to coastline morphology and grounded in domain knowledge.
It is extremely costly to describe the coastline shape with a precise numerical method. As mentioned in the related work section, many classical genesis typologies have been proposed by researchers. Considering the traditional genesis-based coastline classification, and the two morphological classification criteria of linear objects in cartography, directionality and sinuosity, we propose a morphological classification of coastlines. Our classification is inspired by a typical evolution of shorelines, proposed by British scholar Johnson in his famous book "Shore processes and shoreline development" (Johnson, 1919, Shepard, 1937, Cotton, 1952, Putnam and Axelrod, 1960, Shepard, 1968, Inman and Nordstrom, 1971). In this pioneering work, he summarized the development process of coastlines of submergence into the following stages: (a) early stage, (b) young stage, (c) sub-maturity stage, and (d) maturity stage, as Fig. 1 shows. His classification has been employed by many existing studies in coastline research. These four stages correspond to four types of natural coastlines, which are also employed in this work. A close look at modern coastline data reveals many artificial coastlines, which have orthogonal characteristics in morphology and are clearly different from natural coastlines. Therefore, we add artificial coastlines as the fifth type.
To describe the morphological characteristics of these five types of coastlines more systematically, we focus on their directionality and sinuosity, the two main criteria for the morphological classification of linear objects in cartography.
Elongated (Type I): The first type corresponds to the "early stage" in Johnson's coastline evolution and is sinuous with variable directionality. This type of coastline is characterized by a narrow and long middle, represented by fjords. Such coastlines were developed in the evolution stage when the sea level rose and submerged the valley.
Broad (Type II): The second type corresponds to the "young stage" in Johnson's coastline evolution, the beginning of sea erosion. Such coastlines are smooth with opposite directionality, and are typically represented by lagoons. Compared with Type I, the coastline is relatively smoother, the narrow length in the middle has decreased, and the space has gradually increased. Lagoons have significant ecological relevance, and they were also listed as an important type in many previous classification studies.
Rugged (Type III): The third type corresponds to the "sub-maturity stage" in Johnson's coastline evolution and is the result of highly developed marine erosion. Such coastlines are sinuous with stable directionality. This type of coastline is characterized by a fixed direction but high roughness, and it is also widely distributed in some typical island reef coastlines. Roughness plays an important role in the collection of marine planktonic matter, so it is also one of the types that are commonly studied in coastal research.
Smooth (Type IV): The fourth type corresponds to the "maturity stage" in Johnson's coastline evolution and is smooth with stable directionality. This type of coastline is the simplest, and its shape is relatively straight. Compared with Type III, it is more indicative of lithological hardness and fault activity and has the function of dividing geological structural units.
Orthogonal (Type V): The fifth type is orthogonal. With the construction of cities, the proportion of artificial coastlines is gradually increasing, which has also received extensive attention in coastal research. In terms of morphology, the artificial coastline is characterized by partial orthogonality.

Overview of classification model
This study aims to propose a data-driven deep learning model for automatic classification of vectorized digital coastline segments, using the morphological classification introduced above. The methodology for classifying coastline segments is composed of the following components (Fig.

Shape modeling of coastlines
The modelling and description of coastlines have always been challenging in the study of coastline morphology. When modelling coastlines, two general categories of methods can be distinguished: global modeling and local/sequence modeling.
The global modeling method is characterized by a global investigation of the whole coastline. This method attempts to accurately express the shape of the whole coastline through a global analytical equation or to reflect the shape of the whole coastline through global feature extraction. This kind of method has been studied in other disciplines but is not often employed in coastline studies, mainly because: 1) Since a coastline is often a very complex curve, the computational cost of the global equation or numerical modeling method is high, and the result is complicated to analyze. Even with the advances in computer technology and GIS technology, modeling of complex coastlines is still costly. 2) Too little morphological information can be extracted from the global features, since global features are an aggregation of the detailed characteristics of a coastline, and aggregation removes many essential details. Numerous researchers have attempted to correlate morphological features with coastal phenomena, but only fractal dimensions and curve complexity were commonly used. Analyses of global morphological features are still insufficient for many coastal applications.
It is well known that lines are 1-dimensional and polygons are 2-dimensional graphics. The Hausdorff dimension (Mandelbrot, 1967) is one of the most used parameters in fractal analysis, which extends the measure of morphological dimension. A coastline, an extremely complex curve, has been found to be a fractal shape whose Hausdorff dimension is between 1 and 2 (Mandelbrot, 1967). This indicates that coastlines are a special shape between lines and polygons. Therefore, it is difficult to fully describe their continuity and morphological differences by using the global modeling methods of lines or polygons alone. This paper proposes to divide a coastline into a sequence of bends, whose morphological characteristics are then analyzed individually (and thus "locally") (Fig. 3).
A bend can be defined as a part of a line that contains a number of consecutive vertices, with the inflection angles at all vertices included in the bend having the same sign (all positive or all negative) and the inflections at the bend's two end vertices having opposite signs (Wang and Müller, 1998). Plazanet defined a bend in a similar way: a bend is a fraction of a curve between two consecutive inflection points (Plazanet et al., 1995). The segmentation of a line into a series of bends can be done via an inflection point segmentation algorithm.
Many bend segmentation methods have been proposed in the research domains of computational geometry, cartography, and GIS. Among them, the feature point method (Hehai, 2003), the monotonic chain method (Chen et al., 2011), and the Delaunay triangulation multiscale bend segmentation method (Ai et al., 2017) are commonly used. Each method has its own advantages and disadvantages and its own applicable situation. Most of the bend segmentation methods are extensions of the bend definition for specific purposes, and their core lies in the identification of bend turning points, i.e., inflection points. Note that these methods, proposed in other disciplines, have not been applied to coastline modelling.
In this paper, the inflection point based bend segmentation method (Wang and Müller, 1998) is applied and extended for the first time to segment a coastline into a sequence of bends, with the following three specific segmentation steps:
(1) Preprocessing: To start, the Douglas-Peucker (DP) algorithm (Douglas and Peucker, 1973) is applied to the coastline with a small tolerance σ to simplify its geometry. This is done to remove duplicate points and points that are very close together, bringing all the vector coastlines to the same scale. When applying the DP algorithm, we adjusted the tolerance parameter σ and monitored the area enclosed by the sample's endpoints. We ensured that the Intersection over Union (IoU) ratio before and after applying the DP algorithm remained above 95% to maintain stability in shape labels during DP processing.
(2) Point classification and bend detection: The direction of each point, except the start and end points, is then determined based on whether the line turns left (counterclockwise) or right (clockwise) at that point. Points with counterclockwise rotation are marked as positive angles (marked as red in Fig. 4a) and points with clockwise rotation as negative angles (marked as green in Fig. 4a). All consecutive points whose angles have the same sign are considered to form a bend.
(3) Bend correction: As shown in Fig. 4b, the initially detected bends require certain adjustments. A sharply curved bend, in which the sum of inflections over all vertices inside the bend is too large, could make the neighboring bends intersect. In this case, the end points should be moved to reduce the accumulated angle of all vertices until the intersection between the two neighboring bends disappears. For a gently curved bend, in which the inflection that marks the end of the bend is small, people would not recognize this point as the end of a bend. The end points should be moved outward only when the inflection angle is small and the new baseline is shorter than the old one.
Using the method above, a coastline can be segmented into a sequence of bends. These bends have the following characteristics: 1) Positive bends and negative bends always alternate. 2) Each bend is adjacent to the next, and together the bends cover every vertex along the whole coastline.
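Step (2) of the segmentation can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `turn_signs` and `split_into_bends` are hypothetical, the turn direction is taken from the sign of a 2D cross product, collinear vertices are assumed to inherit the sign of the preceding vertex, and the DP preprocessing of step (1) and the bend correction of step (3) are deliberately left out.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def turn_signs(points: List[Point], eps: float = 1e-12) -> List[int]:
    """Turn sign at each interior vertex: +1 left (counterclockwise),
    -1 right (clockwise), 0 if the three points are collinear."""
    signs = []
    for (ax, ay), (bx, by), (cx, cy) in zip(points, points[1:], points[2:]):
        # z-component of the cross product of segments a->b and b->c
        cross = (bx - ax) * (cy - by) - (by - ay) * (cx - bx)
        signs.append(1 if cross > eps else -1 if cross < -eps else 0)
    return signs


def split_into_bends(points: List[Point]) -> List[Tuple[int, int, int]]:
    """Group consecutive interior vertices that share a turn sign.

    Returns (sign, first_vertex_index, last_vertex_index) per bend, with
    indices referring to `points`. Assumption of this sketch: a collinear
    vertex inherits the sign of the vertex before it.
    """
    bends: List[Tuple[int, int, int]] = []
    current = 0
    for i, sign in enumerate(turn_signs(points), start=1):
        if sign == 0:
            sign = current
        if sign == 0:
            continue  # leading collinear vertices: no bend has started yet
        if bends and bends[-1][0] == sign:
            bends[-1] = (sign, bends[-1][1], i)  # extend the current bend
        else:
            bends.append((sign, i, i))  # sign change starts a new bend
        current = sign
    return bends
```

On a zigzag line such as `[(0, 0), (1, 1), (2, 0), (3, 1), (4, 0)]`, every interior vertex changes the turn direction, so each one starts its own bend and the signs alternate, which matches the first characteristic listed above.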

Coastline morphological features based on bend sequences
In this section, we propose a set of features to represent the morphological characteristics of each bend, based on relevant literature in computational geometry, cartography, and GIS.The proposed morphological features of each bend can be divided into three groups: size-related, direction-related, and complexity-related features.

Size-related features
Size-related features describe the size and scope of the bend. Specifically, as shown in Fig. 5a, four features are adopted in this research:
• Bend area (S1-BdA): It is the area (i.e., Ar in Fig. 5a) between the bend and its baseline, where the baseline of a bend is simply the straight line connecting the first node and last node of the bend.
• Bend length (S2-BdL): It is the length of the bend, i.e., Len in Fig. 5a.

Fig. 3. Shape modelling of coastlines: global approaches (left), and the sequence approach proposed in this paper (right), which divides a coastline into a sequence of bends. The global approach treats a coastline as a whole, and extracts information like the total length, average global curvature, the area of the global envelope, the minimum bounding rectangle, etc. Such information is computed for the whole coastline, shown as $\vec{X}$ in the figure. In other words, the global approach employs a feature measurement window that encompasses the entirety of the coastline. In contrast, sequence modelling approaches, as employed in this work, first divide a coastline into several sub-parts (i.e., bends in this work). Each sub-part is then analyzed individually to study its morphological characteristics ($\vec{x}_t$). By doing this, the sequence modelling approaches provide a detailed analysis of the coastline, and are therefore able to describe the morphological characteristics of coastlines more accurately.
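As an illustration of the two size-related features defined above, the following sketch computes the bend length as the sum of segment lengths and the bend area with the shoelace formula, closing the bend with its baseline. The function names are hypothetical, and the sketch assumes that the bend does not cross its own baseline, so that the closed shape is a simple polygon.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def bend_length(bend: List[Point]) -> float:
    """S2-BdL: total length of the polyline that forms the bend."""
    return sum(math.dist(p, q) for p, q in zip(bend, bend[1:]))


def bend_area(bend: List[Point]) -> float:
    """S1-BdA: area between the bend and its baseline.

    The baseline joins the last vertex back to the first one, so the bend
    is treated as a closed polygon and measured with the shoelace formula.
    """
    twice_area = 0.0
    n = len(bend)
    for i in range(n):
        x1, y1 = bend[i]
        x2, y2 = bend[(i + 1) % n]  # wraps around along the baseline
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0
```

For a bend that runs along three sides of a unit square, `[(0, 0), (0, 1), (1, 1), (1, 0)]`, the length is 3 and the area enclosed with the baseline is 1.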

Direction-related features
Direction-related features describe the angle and orientation of a bend, including:
• Baseline deflection angle (D1-BdDA): It is the deflection angle between the baseline of the current bend and the baseline of the previous bend, shown as θ in Fig. 5a. The anticlockwise rotation angle is specified as positive. The first bend of each coastline sample has a baseline deflection angle of zero.
• Rotation angle of the minimum bounding rectangle MBR (D2-MBRRa): It is the direction that the longer axis of the bend MBR points to, shown as α in Fig. 5b. Specifically, it is computed as the angle between the longer axis (e.g., $l_a$) of the bend MBR and the baseline of the whole coastline. Similarly, the baseline of the whole coastline that a bend belongs to is simply the straight line connecting the first node and last node of the coastline.
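The baseline deflection angle (D1-BdDA) can be sketched as a signed angle between consecutive baselines. The function names below are hypothetical, and the use of radians and of `atan2` on the cross and dot products is an assumption of this sketch, chosen because it directly yields positive values for anticlockwise rotation.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def baseline_vector(bend: List[Point]) -> Tuple[float, float]:
    """Vector from the first node to the last node of a bend."""
    (x0, y0), (x1, y1) = bend[0], bend[-1]
    return (x1 - x0, y1 - y0)


def baseline_deflection_angles(bends: List[List[Point]]) -> List[float]:
    """D1-BdDA for every bend, in radians, anticlockwise positive.

    The first bend has no predecessor and gets an angle of zero.
    """
    if not bends:
        return []
    angles = [0.0]
    for prev, cur in zip(bends, bends[1:]):
        ux, uy = baseline_vector(prev)
        vx, vy = baseline_vector(cur)
        cross = ux * vy - uy * vx  # > 0 when cur is rotated anticlockwise
        dot = ux * vx + uy * vy
        angles.append(math.atan2(cross, dot))
    return angles
```

The resulting values lie in (-π, π], so a baseline that turns a quarter circle to the left yields π/2 and one that turns a quarter circle to the right yields -π/2.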

Complexity-related features
Complexity-related features describe the difference between the bend shape of the shoreline and some standard shapes, and they include the following 7 kinds of features.
• Bend meandering (C1-M): Meandering refers to how a successive and habitual curve deviates from the straight direction. As shown in Fig. 6, it can be calculated as the ratio of the difference between the maximum arc length of the MBR and the curve length to the arc length (Ichoku and Chorowicz, 1994), where Len is the length of the bend in Fig. 5.
• Bend curvature (C2-BC): It is the rotation rate of the tangent direction angle with respect to the arc length at a point on the curve, which reflects the complexity of the shape change. The reciprocal of the curvature is the radius of curvature. In the case of a line (i.e., a bend in this work), its average curvature is usually calculated from the angle difference between the start point and the end point and the length of the line. Here, $\theta_1$ and $\theta_2$ are the baseline deflection angles at the start and end points of the bend (see Section 3.4.2), and $l$ is the length of the bend. See Fig. 5 for more details.
• Bend energy (C3-BE): It is an evaluation of the overall shape and complexity of the contours. It is defined as the accumulation of the curvature of the boundary of the polygon features. The greater the frequency of the direction change of the contour, the greater the bending energy and the higher the boundary complexity. In the case of a line, the bending energy (denoted be) can be computed using a discrete formula, where N is the number of points of each bend, and $\Delta\theta_n$ and $l_n$ refer to the change of rotation angle and of arc length in every line unit, which is formed by two consecutive points.
A. Gao et al.
• Eccentricity (C4-E): It is the ratio of the length of a shape's longest chord to the length of its longest chord perpendicular to it. It can also be expressed as the ratio of the length (i.e., $l_a$) to the width (i.e., $l_b$) of the MBR of the shape, as shown in Fig. 5b.

$$e = \frac{l_a}{l_b} \quad (4)$$

• Rectangularity (C5-RE): It reflects how similar a bend is to a rectangle and is defined as the ratio of the area of the shape (i.e., bend area $S_{bend}$) to the area of its smallest circumscribed rectangle (see Fig. 7). Its value range is (0, 1], with 1 being a rectangle shape, 0.5 being a triangle shape, and π/4 being a circle shape. The formula is as follows, where $S_{mbr}$ refers to the area of the MBR of the bend.

$$re = \frac{S_{bend}}{S_{mbr}} \quad (5)$$

• Compactness index (C6-CI): It is defined as the ratio of the area of the bend (i.e., bend area $S_{bend}$) to the area of the circle whose circumference is the same as the circumference of the bend. As illustrated in Fig. 7, the circumference (Cir) of the bend is the sum of the bend length (BL) and its baseline length (Len).
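The three ratio-type features above (C4 to C6) can be sketched directly from their verbal definitions. The function names are hypothetical, and the inputs (MBR side lengths, bend area, and circumference) are assumed to have been computed beforehand. For the compactness index, a circle with circumference Cir has radius Cir/(2π) and therefore area Cir²/(4π), which gives the expression used below.

```python
import math


def eccentricity(l_a: float, l_b: float) -> float:
    """C4-E: ratio of the MBR length l_a to the MBR width l_b (Eq. 4)."""
    return l_a / l_b


def rectangularity(s_bend: float, s_mbr: float) -> float:
    """C5-RE: ratio of the bend area to the area of its MBR (Eq. 5)."""
    return s_bend / s_mbr


def compactness_index(s_bend: float, cir: float) -> float:
    """C6-CI: bend area divided by the area of the circle that has the
    same circumference as the bend, i.e. S_bend / (Cir^2 / (4 * pi))."""
    return 4.0 * math.pi * s_bend / cir ** 2
```

A unit disc is a quick sanity check: its compactness index is 1, and its rectangularity with respect to the enclosing 2 × 2 square is π/4, in line with the value quoted for a circle above.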

LSTM model
Based on the methods proposed in Section 3.3 and Section 3.4, each individual coastline can be segmented into a sequence of bends, and each bend is then represented as a series of morphological features. In the following, we propose a deep learning model to classify the morphological type of a coastline (see Section 3.1), taking its bend sequence and bend features into account.

RNN-based model
The basic deep learning model represented by the multi-layer perceptron (MLP) models the multidimensional attributes of data well but adapts poorly to the structure of the data. For example, it is impossible to recover the meaning of a sentence just by analyzing its words while ignoring their order. The inherent structure of sequence data must therefore be taken into account. RNNs are deep learning models designed for sequence data and are widely used in tasks on series whose elements depend on their order in the sequence. Each neuron of an RNN receives the current input together with the memory information generated before, so it can preserve sequence dependence. Moreover, this process can be reversed, so an RNN model can perform bidirectional propagation.

LSTM unit
A data-driven model based on the simple RNN structure suffers from the vanishing gradient problem during training, which makes the simple RNN model suitable only for processing short sequences. Therefore, many improved RNN models have been proposed to deal with sequences of different lengths. The most famous of these is the long short-term memory (LSTM) network. In this paper, the LSTM model (Fig. 8) is employed to model the characteristics of the bend sequences.
For the current LSTM unit at time t, the inputs are the vector $x_t$ (i.e., the morphological features of a bend in this paper), the variable $h_{t-1}$ carrying hidden information from the previous unit at time t − 1, and $c_{t-1}$ carrying the information accumulated throughout the sequence. In the LSTM unit at time t (i.e., for the t-th bend in the coastline), $h_{t-1}$ and $c_{t-1}$ are updated through the forget gate, update gate and output gate. In the standard LSTM form, consistent with the symbol definitions that follow, the concrete formulas are as follows:

Forget gate (determining what is relevant to be kept from the previous LSTM unit):
$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$$

Update gate (determining what information is relevant to update in the current LSTM unit):
$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$$
$$\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)$$

Equation of cell state at time t:
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$

Output gate (determining the present hidden state that will be passed to the next LSTM unit):
$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$$
$$h_t = o_t \odot \tanh(c_t)$$

where the initial values are $c_0 = 0$ and $h_0 = 0$, $\sigma$ is the logistic sigmoid function, and the operator $\odot$ denotes the Hadamard product (element-wise product). The subscript t indexes the time step (i.e., the bends). $x_t \in \mathbb{R}^d$ is the input vector to the LSTM unit. $f_t, i_t, o_t \in (0, 1)^n$ are the activation vectors of the forget gate, update gate and output gate, respectively. $\tilde{c}_t \in (-1, 1)^n$ is the cell input activation vector and $h_t \in (-1, 1)^n$ is the hidden state vector. $c_t \in \mathbb{R}^n$ is the cell state vector. $W \in \mathbb{R}^{n \times d}$, $U \in \mathbb{R}^{n \times n}$ and $b \in \mathbb{R}^n$ are the weight matrices and bias vectors that need to be learned during training, where d and n refer to the number of input features and the number of hidden units, respectively.
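To make the gate mechanics concrete, here is a minimal pure-Python sketch of one LSTM step in its standard form (sigmoid gates, tanh cell input activation, Hadamard products), using the symbols defined above. The helper names `affine` and `lstm_step`, the layout of the `params` dictionary, and the toy all-zero weights are illustrative assumptions, not the authors' implementation.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, x, U, h, b):
    """Return W x + U h + b for list-of-lists matrices and list vectors."""
    return [
        sum(w * xi for w, xi in zip(W_row, x))
        + sum(u * hi for u, hi in zip(U_row, h))
        + b_i
        for W_row, U_row, b_i in zip(W, U, b)
    ]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step; params maps 'f', 'i', 'o', 'c' to a (W, U, b) triple."""

    def gate(name, activation):
        W, U, b = params[name]
        return [activation(z) for z in affine(W, x_t, U, h_prev, b)]

    f_t = gate("f", sigmoid)        # forget gate
    i_t = gate("i", sigmoid)        # update (input) gate
    o_t = gate("o", sigmoid)        # output gate
    c_tilde = gate("c", math.tanh)  # cell input activation
    # Element-wise (Hadamard) combination of old state and new candidate.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


# Toy setting: d = 2 bend features, n = 1 hidden unit, all weights zero,
# so every gate evaluates to 0.5 and the cell input activation to 0.
zero_params = {name: ([[0.0, 0.0]], [[0.0]], [0.0]) for name in "fioc"}
h1, c1 = lstm_step([0.3, -0.7], [0.0], [1.0], zero_params)
```

With zero weights the forget gate halves the previous cell state, which shows how $c_{t-1}$ is carried forward independently of the current input.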

The structure of the proposed classification model
Based on LSTM, we build a data-driven coastline classification model. As mentioned before, each individual coastline can be represented as a sequence of bends and their corresponding morphological features (Table 1). Given that a coastline may contain a large number of bends, we employ LSTM instead of the traditional RNN in order to avoid the vanishing and exploding gradient problems. Meanwhile, due to the sequential symmetry of the coastline shape, a bidirectional LSTM model is adopted instead of a unidirectional one, which helps to capture more morphological characteristics of a coastline.
We built a vector coastline morphology classification model, BendSeqLSTM, as shown in Fig. 9. First, employing the methods proposed in Sections 3.3 and 3.4, we split a coastline (whose type is to be classified) into a sequence of bends $X = (\vec{x}_1, \vec{x}_2, \ldots)$ and describe the morphological characteristics of each bend as $\vec{x}_t$, where t is the ordinal number of the bend in the coastline. Then the bend sequence X is input into the bidirectional LSTM layer, where its sequence structure information is aggregated into the hidden variable of the first layer, $H^{[1]}$. The number of LSTM units equals the number of bends in a coastline. Afterwards, $H^{[1]}$ is input into the two fully connected layers, in which the sequence structure information extracted by the LSTM is enriched to obtain $H^{[l]}$. Finally, a SoftMax layer converts the hidden variable $H^{[l]}$ into probabilities $\hat{Y}$ for the five coastline morphological classes (proposed in Section 3.1). The class with the highest probability is then assigned as the morphological class of the coastline. Each fully connected layer has its own trainable parameters, and the superscript l denotes the number of fully connected layers.
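The final step, turning the last hidden variable into class probabilities and picking the most probable class, can be sketched as follows. The class order in `CLASSES` follows Types I to V as named in this paper. The input scores and the function names are hypothetical.

```python
import math

# Type I to Type V, in the order used in this paper.
CLASSES = ["elongated", "broad", "rugged", "smooth", "artificial"]


def softmax(scores):
    """Numerically stable SoftMax over a list of raw scores."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_class(scores):
    """Return the most probable class label and the full probability vector."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs


# Hypothetical output of the last fully connected layer for one coastline.
label, probs = predict_class([0.1, 2.0, -1.0, 0.5, 0.0])
```

Subtracting the maximum score before exponentiation does not change the result but avoids overflow for large scores.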

Loss function and optimization
Specifically, this model is composed of one bidirectional LSTM layer, followed by two fully connected layers and a SoftMax layer. The length of the LSTM layer is set to the maximum sequence length among the samples. Samples shorter than this length are padded with a pre-defined mask value. A mask layer is applied for preprocessing before feeding the data into the LSTM layer. Each LSTM unit is configured with 4 cores. The fully connected layers consist of 128 units.
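The padding and masking step can be illustrated with a small sketch. The mask value `-1.0` and the function name `pad_bend_sequences` are assumptions for illustration, since the paper does not state the actual mask value.

```python
MASK_VALUE = -1.0  # hypothetical; the paper only says "a pre-defined mask value"


def pad_bend_sequences(sequences, mask_value=MASK_VALUE):
    """Pad every bend sequence to the longest one and return a validity mask.

    sequences: list of coastlines, each a list of per-bend feature vectors.
    Returns (padded, mask), where mask[i][t] is True for real bends.
    """
    max_len = max(len(seq) for seq in sequences)
    n_features = len(sequences[0][0])
    padded, mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        padding = [[mask_value] * n_features for _ in range(n_pad)]
        padded.append([list(vec) for vec in seq] + padding)
        mask.append([True] * len(seq) + [False] * n_pad)
    return padded, mask


# Two toy coastlines with 3 and 1 bends, 2 features per bend.
coastlines = [
    [[0.2, 0.5], [0.1, 0.9], [0.4, 0.3]],
    [[0.7, 0.6]],
]
padded, mask = pad_bend_sequences(coastlines)
```

A mask layer would then use the `False` positions to skip the padded steps, so they do not influence the LSTM states.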
The coastline morphological classification is a discrete multi-class classification problem. Following existing literature, the loss function used in this study is the multi-class cross entropy:

$$L(\theta) = -\frac{1}{M} \sum_{i=1}^{M} \sum_{j=1}^{N} y_{i,j} \log \hat{y}_{i,j}$$

where M denotes the number of coastline samples, and N is the number of morphological classes (i.e., N = 5 in this work). $y_{i,j}$ denotes the ground truth: if the i-th sample belongs to the j-th class, $y_{i,j} = 1$; otherwise, $y_{i,j} = 0$. $\hat{y}_{i,j}$ represents the predicted probability that the i-th sample belongs to the j-th class, which is determined by the SoftMax layer in Equation (15). $\theta$ denotes all learnable parameters of the model. The goal of model training is to find the parameters $\theta$ that minimize the loss.
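As a worked illustration, the multi-class cross entropy can be computed directly from one-hot labels and predicted probabilities. Averaging over the M samples is the usual convention and is assumed here. The function name and the toy probabilities are hypothetical.

```python
import math


def cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean multi-class cross entropy over M samples.

    y_true: one-hot rows (M x N); y_pred: predicted probabilities (M x N).
    """
    total = 0.0
    for labels, probs in zip(y_true, y_pred):
        for y, p in zip(labels, probs):
            if y:  # only the true class contributes for one-hot labels
                total -= y * math.log(max(p, eps))
    return total / len(y_true)


# Two toy samples over N = 5 classes.
y_true = [[1, 0, 0, 0, 0], [0, 0, 1, 0, 0]]
y_pred = [[0.9, 0.025, 0.025, 0.025, 0.025], [0.1, 0.1, 0.5, 0.2, 0.1]]
loss = cross_entropy(y_true, y_pred)
```

The small `eps` only guards against taking the logarithm of zero and does not affect well-behaved SoftMax outputs.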

Table 1
Feature extraction in the bends sequence unit.

Since the Adam optimizer (Kingma and Ba, 2014) is widely used in machine learning and deep learning, we use it (with an initial learning rate of 0.01) to learn the model parameters. The proposed model is trained via back-propagation using a batch size of 64 and 300 iterations.
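For readers unfamiliar with Adam, a single parameter update can be sketched as below. The learning rate 0.01 is the value stated above. The decay rates `beta1`, `beta2` and the constant `eps` are the defaults from Kingma and Ba (2014) and are assumed here, since the paper does not report them. The toy objective is hypothetical.

```python
import math


def adam_step(theta, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a list of parameters; t is the 1-based step."""
    # Exponential moving averages of the gradient and its square.
    m = [beta1 * mi + (1 - beta1) * g for mi, g in zip(m, grad)]
    v = [beta2 * vi + (1 - beta2) * g * g for vi, g in zip(v, grad)]
    # Bias correction for the zero-initialized averages.
    m_hat = [mi / (1 - beta1 ** t) for mi in m]
    v_hat = [vi / (1 - beta2 ** t) for vi in v]
    theta = [
        th - lr * mh / (math.sqrt(vh) + eps)
        for th, mh, vh in zip(theta, m_hat, v_hat)
    ]
    return theta, m, v


# Toy objective f(theta) = theta^2 with gradient 2 * theta, starting at 1.0.
theta0 = [1.0]
theta1, m1, v1 = adam_step(theta0, [2.0 * theta0[0]], m=[0.0], v=[0.0], t=1)
```

On the first step the bias-corrected ratio is close to 1 in magnitude, so the parameter moves by roughly one learning rate regardless of the gradient scale.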

Evaluation settings

Dataset
A dataset must be constructed to train and evaluate the model. The development of the coastline classification dataset comprises two parts: segmentation and labeling (Fig. 10). Firstly, adhering to the Gestalt principle (Kohler, 1967), three coastal experts choose coastline segments from satellite images that are distinctly characterized by the morphological categories. Acknowledging the multi-scale effect of coastline morphology, this paper employs a circular sampling window with a radius from 2 cm to 3 cm to traverse the coastline at three fixed scales of 1:50,000, 1:500,000, and 1:1,000,000, thereby visually segmenting all homogeneous coastline segments. Secondly, each expert individually labels the morphological type of each selected coastline segment. Finally, coastline segments that are unanimously labeled by all three experts are included in the dataset as samples. Other coastline segments are discarded. This method ensures both precision in selection and consensus in categorization, providing a solid foundation for further analysis and modeling.
In total, we collected 2,500 coastline samples from satellite images, with 500 samples for each of the five morphological classes. Please refer to Fig. 11 for some examples of these five classes. The dataset was then divided into a training set, validation set, and testing set at a ratio of 4:2:4.
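A 4:2:4 split of 2,500 samples yields 1,000 training, 500 validation and 1,000 testing samples. The sketch below performs the split per class so that each subset stays balanced. Whether the authors stratified the split is not stated, so that choice, the random seed, and the placeholder sample identifiers are all assumptions.

```python
import random


def stratified_split(samples_by_class, ratios=(4, 2, 4), seed=0):
    """Split each class separately into train/validation/test by the given ratios."""
    rng = random.Random(seed)
    total = sum(ratios)
    train_set, val_set, test_set = [], [], []
    for label, samples in samples_by_class.items():
        items = list(samples)
        rng.shuffle(items)
        n_train = len(items) * ratios[0] // total
        n_val = len(items) * ratios[1] // total
        train_set += [(s, label) for s in items[:n_train]]
        val_set += [(s, label) for s in items[n_train:n_train + n_val]]
        test_set += [(s, label) for s in items[n_train + n_val:]]
    return train_set, val_set, test_set


# Placeholder identifiers standing in for 500 coastline samples per class.
dataset = {c: [f"{c}_{k}" for k in range(500)] for c in ["I", "II", "III", "IV", "V"]}
train_set, val_set, test_set = stratified_split(dataset)
```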

Baselines
Unfortunately, there are no relevant coastline classification models in the literature. Therefore, we compare our proposed BendSeqLSTM model with other state-of-the-art classification models, chosen for their popularity. When selecting suitable baselines, we also consider the main components of the proposed model, including the coastline modelling (i.e., bend-sequence-based modelling instead of global modelling), the RNN unit type (i.e., LSTM instead of simple RNN), and the directionality (i.e., bidirectional instead of unidirectional). Finally, 11 baselines were selected for the comparison.
• Bi-GRU: Identical to BendSeqLSTM, except that it employs GRU units instead of LSTM units. The GRU unit is also a typical strategy proposed to improve RNN models. GRU units are used frequently because they are easier to train and have fewer parameters than LSTM units.
• Uni-GRU: Compared to Bi-GRU, it employs a unidirectional GRU.
• PointSeqLSTM: Following the method of Yu and Chen (2022), coastlines are sampled into point sequences, with the x and y coordinates of the points used as input features for learning and classification through a bidirectional LSTM network. This model is designed to test the difference between the x/y-coordinate-sequence solution and the method proposed in this article.
• SVM: The support vector machine (SVM) is a classic and popular machine learning model for classification or regression tasks. SVM is selected mainly because it is found to be one of the most robust prediction methods with high performance and it is easy to train. For SVM, we only consider the global description features of the coastlines (instead of bend sequence modelling).
• RF: Random forests (RF) (Breiman, 2001) is an ensemble learning method for classification or regression tasks that operates by constructing a multitude of decision trees at training time. For RF, we only consider the global description features of the coastlines.
• XGBoost: The eXtreme Gradient Boosting (XGBoost) algorithm (Chen and Guestrin, 2016) is a popular tree-based method for classification tasks. For XGBoost, we only consider the global description features of the coastlines.
• ANN: The artificial neural network (ANN) is a basic deep learning network, consisting of multiple fully connected layers, and is extensively used in various classification tasks. It has been used by García Balboa & Ariza López (2007) for morphological classification of roadways, which bears some similarity to the classification task in this paper. For ANN, we only consider the global description features of the coastlines.
• ResNet50: Residual Network 50 (ResNet50) is a pre-trained deep learning model from the Residual Networks series, notable for its 50-layer depth. Excelling in computer vision tasks such as image classification, ResNet50 stands as one of the most representative models in image-related tasks, especially for classification (He et al., 2016).

Evaluation metrics and main objectives of the evaluation
Similar to many existing studies on classification tasks, we employ accuracy, precision, recall, and Cohen's kappa coefficient to evaluate the performance of the proposed BendSeqLSTM model and the baselines.
• Accuracy: It is the percentage of correctly inferred coastlines out of the total number of coastlines, which is an intuitive performance measure. This metric refers to all categories.
• Precision: It is the percentage of correctly identified coastlines out of the total number of coastlines identified with that specific class. Each coastline class has a precision value.
• Recall: It is the percentage of correctly inferred coastlines out of the number of coastlines of that specific class. Each coastline class has a recall value.
• Cohen's kappa coefficient: It is an accuracy metric normalized by the imbalance of classes in the data, by considering the possibility of correct identification occurring by chance. It compares the actual accuracy with the expected accuracy, which is the accuracy that results from classification by random chance. It refers to all categories.
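The four metrics above can all be derived from a confusion matrix, as in the following sketch. The function name and the 2 × 2 toy matrix are illustrative assumptions. Rows are taken as true classes and columns as predicted classes.

```python
def classification_metrics(cm):
    """Accuracy, per-class precision/recall and Cohen's kappa.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    row_sums = [sum(row) for row in cm]                              # true counts
    col_sums = [sum(cm[i][j] for i in range(n)) for j in range(n)]  # predicted counts
    accuracy = sum(cm[k][k] for k in range(n)) / total
    precision = [cm[k][k] / col_sums[k] if col_sums[k] else 0.0 for k in range(n)]
    recall = [cm[k][k] / row_sums[k] if row_sums[k] else 0.0 for k in range(n)]
    # Agreement expected by chance from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)
    return accuracy, precision, recall, kappa


# Toy two-class example.
accuracy, precision, recall, kappa = classification_metrics([[2, 0], [1, 1]])
```

For a class-balanced five-class test set the chance agreement is 0.2, so an accuracy of 0.932 corresponds to a kappa of (0.932 − 0.2) / 0.8 = 0.915.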
For the evaluation, we would like to answer the following questions:
• How do the LSTM units, simple RNN units, and GRU units perform differently in the classification of coastlines? (Section 4.2)
• Is there an improvement in coastline morphological classification performance with a bidirectional sequence model, compared to a unidirectional one? (Section 4.2)
• Can bend-sequence-based classification models improve coastline morphological classification performance compared to global classification models and the point-sequence-based model? (Section 4.2)
• Does a vector-based classification approach offer advantages over deep-learning-based image classification approaches? (Section 4.2)
• How does our proposed BendSeqLSTM model perform in each category of coastline classification? (Section 4.3)
• How do the size-related, direction-related, and complexity-related features contribute to the classification results respectively? (Section 4.4)

Model comparison results
Table 2 compares the performance of all the models on the test dataset. The results are displayed according to the coastline modelling (i.e., bend-sequence-based modelling vs. global modelling), directionality (bidirectional vs. unidirectional), and the classification unit type (i.e., LSTM vs. simple RNN vs. GRU vs. others).
As can be seen from Table 2, the proposed BendSeqLSTM performs best among all models, achieving a classification accuracy of 93.20 % and a Cohen's kappa coefficient of 91.50 %. Bi-GRU also performs well, followed by Uni-GRU and Uni-LSTM. Uni-simpleRNN performs the worst.
In terms of the coastline modelling approaches, models employing the proposed bend-sequence-based modelling generally perform better than those using global modelling of coastlines (except Uni-simpleRNN) and the point-sequence-based model.
For the point sequence method, we equidistantly sampled 2000 points for each coastline sample, as a balance between sequence length and accuracy of shape representation. Only the x and y coordinates were used as input features for training and testing the model, resulting in a classification accuracy of only 66.90 % and a kappa coefficient of 58.62 %.
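Equidistant sampling of a polyline can be sketched as follows: points are placed at equal arc-length intervals along the original vertices. The function name and the toy polyline are assumptions. The paper does not describe the exact resampling routine used for PointSeqLSTM.

```python
import math


def resample_polyline(points, n):
    """Return n points equally spaced by arc length along a polyline."""
    if n < 2 or len(points) < 2:
        raise ValueError("need at least two input points and n >= 2")
    seg_len = [math.dist(p, q) for p, q in zip(points, points[1:])]
    step = sum(seg_len) / (n - 1)
    out = [points[0]]
    seg = 0        # index of the segment currently being walked
    walked = 0.0   # arc length at the start of that segment
    for k in range(1, n - 1):
        target = k * step
        # Advance to the segment that contains the target arc length.
        while seg < len(seg_len) - 1 and walked + seg_len[seg] < target:
            walked += seg_len[seg]
            seg += 1
        (x0, y0), (x1, y1) = points[seg], points[seg + 1]
        r = (target - walked) / seg_len[seg] if seg_len[seg] > 0 else 0.0
        out.append((x0 + r * (x1 - x0), y0 + r * (y1 - y0)))
    out.append(points[-1])
    return out


# Toy L-shaped polyline of total length 8, resampled to 5 points (spacing 2).
polyline = [(0, 0), (4, 0), (4, 4)]
resampled = resample_polyline(polyline, 5)
```

Calling `resample_polyline(polyline, 2000)` would give a point sequence of the length used for the point-sequence baseline.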
For bend-sequence-based classification models, bidirectional models significantly outperform their unidirectional variants, with an accuracy improvement between 5 % and 10 %. With regard to the RNN units, as expected, simple RNN models perform significantly worse than LSTM-based and GRU-based models, with accuracy drops of about 10 %.
We also tested the image-based ResNet50 classification model by scaling and rasterizing all vector samples into 224 × 224 images, which is the standard image size used by ResNet50. As shown in Table 3, the proposed vector-based method BendSeqLSTM outperforms ResNet50 in terms of both classification accuracy and Cohen's kappa coefficient. Besides, the total number of parameters of ResNet50 is much larger than that of BendSeqLSTM.

Misjudgment analysis
In order to further evaluate the performance of the proposed BendSeqLSTM model on each class of coastline samples, we analyze the confusion matrix of the model output on the testing dataset (Table 4), including the precision (PR) and recall (RC) metrics of each coastline class.
In general, the proposed model performed very well for Type I "elongated coastlines", Type II "broad coastlines", and Type V "artificial coastlines", and relatively poorly for Type III "rugged coastlines" and Type IV "smooth coastlines". However, all the precisions and recalls are higher than 85 %, suggesting that the proposed model is able to differentiate among these five coastline types. Upon checking the confusion matrix, it can be noted that a small number of samples within Type III and Type IV are prone to mutual confusion. Both Type III and Type IV exhibit stable directions, so direction-related features contribute less to the differentiation of these two classes.

Importance of the morphological features
Each bend is represented as a set of morphological features, categorized into three dimensions: size-related, direction-related, and complexity-related. To further investigate the feature importance of these three dimensions, we replicated the evaluation using different feature combinations of these dimensions. Table 5 shows the performance of all possible feature combinations.
As can be seen from Table 5, among all possible combinations, the consideration of all three dimensions leads to the best classification accuracy. The results also show that direction-related features contribute the most to the classification model (achieving 85.80 % when using direction-related features alone), followed by complexity-related features (achieving 81.20 % when using complexity-related features alone). Size-related features contribute the least to the overall model accuracy, with a mere 1.10 % drop in accuracy upon ignoring them. However, they still contribute to the classification task.
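The set of ablation runs behind such a comparison can be enumerated mechanically: three feature dimensions give seven non-empty combinations. The group labels and function name below are illustrative. The mapping of individual features to groups follows Table 1 and is not reproduced here.

```python
from itertools import combinations

FEATURE_GROUPS = ("size", "direction", "complexity")


def feature_group_combinations(groups=FEATURE_GROUPS):
    """Return every non-empty subset of the feature groups, smallest first."""
    return [
        combo
        for r in range(1, len(groups) + 1)
        for combo in combinations(groups, r)
    ]


ablation_runs = feature_group_combinations()
```

Each tuple in `ablation_runs` would correspond to one training and evaluation run restricted to the features of the listed groups.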

Discussion
To classify coastlines based on their morphological characteristics, this paper proposes the BendSeqLSTM model, which first represents a coastline as a sequence of bends and their corresponding morphological features, and then employs a bidirectional LSTM recurrent neural network to perform the classification. The evaluation results show that the proposed BendSeqLSTM model outperforms all the baselines, achieving a classification accuracy of 93.20 %. This highlights the superior performance of the proposed model, making it a promising method to classify coastlines for applications in coastal analysis and management.
The evaluation results also show that classification models employing the proposed bend-sequence-based modelling generally perform better than those with global modelling of coastlines. This suggests that to more accurately describe the morphological characteristics of a coastline, it is important to segment the coastline into a sequence of bends and describe the morphology of each bend individually. This approach retains the geographical semantic features of the original structure while providing a comprehensive description of the two-dimensional sequence units. In contrast, global modelling approaches treat a coastline as a whole shape, and thus are not sufficient to describe fine-grained morphological details of a coastline.
Meanwhile, the evaluation results show that bidirectional sequence networks consistently outperform unidirectional networks utilizing the same deep learning units. This result is expected, as the morphology of the coastline is evidently irrelevant to the direction of the input. Utilizing bidirectional sequence networks eliminates the influence of the order of description when portraying coastline morphology in computational models. Furthermore, bidirectional sequence networks also help to capture more morphological characteristics of a coastline.
It was observed that the models using LSTM as sequential units demonstrated superior performance across all groups in terms of accuracy and kappa coefficient. The models employing GRU as sequential units performed marginally worse than the LSTM models, while the models with simple RNN units performed the poorest in all groups, significantly below both LSTM and GRU unit models. This suggests that the continuity between sequential units contributes significantly to the recognition of coastline morphology. Therefore, a comprehensive recognition and modeling of coastline forms is necessary.
In this work, we proposed to model the morphological characteristics of each bend using size-related, direction-related, and complexity-related features. The evaluation results demonstrate that these three groups of features all positively contribute to the classification model, with the most significant contribution from direction-related features, followed by complexity-related and size-related features. This is expected, as direction-related features retain the most shape information of a coastline, and help to capture the connectivity between bends for sequence models. Complexity-related features can describe the implicit information of some shapes in an aggregated manner, making a relatively important contribution to the classification task.
Compared to features constructed based on bends (as in the method proposed in this study), the approach based on point X and Y coordinates shows inferior performance. We attribute this to two main reasons. First, the longer length of point sequences compared to bend sequences aggravates the vanishing gradient problem (Bengio et al., 1994). Second, compared to handcrafted features, automatically extracting higher-level features solely from X and Y coordinates is too inefficient, manifesting as difficulty in model convergence. In the experiments, even when we used better hardware (P100 GPU, 16 GB VRAM) and larger batch sizes to speed up training convergence, PointSeqLSTM still required more than 2500 epochs to meet the convergence criteria, while BendSeqLSTM needed fewer than 300 epochs. The end-to-end approach based on point X and Y coordinates therefore has significant disadvantages in terms of training process and classification accuracy.
We also compared the proposed vector-based method with an image-based classification approach, exemplified by the ResNet50 model, which is one of the state-of-the-art image classification methods. While ResNet50 yielded fairly good classification outcomes, its performance was still not as strong as that of our proposed BendSeqLSTM model. End-to-end image classification models are well developed; however, they require consistent formatting of input images. This leads to unavoidable distortions during rasterization, as it is challenging to find a set of rasterization parameters suitable for all samples. Additionally, popular image-based methods all involve a large number of parameters and require significant computational resources during training. Furthermore, it is unclear how the characteristics of coastlines are used by such deep learning models during the classification process, leading to poor model explainability (i.e., the black box problem). Consequently, compared to adopting existing image-based approaches, vector-based methods with hand-crafted features, like the one proposed here, offer greater advantages in terms of data flexibility, ease of application, model explainability, cost efficiency in model training and transfer, as well as overall performance.
Several limitations of this study warrant attention. Firstly, as this is a data-driven classification algorithm, the quality of the data determines the performance of the trained model. The dataset employed in this study is relatively small and might not cover all coastal forms. An ideal dataset would encompass a globally distributed, sufficiently large, and comprehensive collection of coastal samples, which is challenging to establish. Future work should focus on creating such a dataset, and further evaluating the proposed classification model using the new dataset. Secondly, while the study accomplishes automatic classification of coastline samples, the creation of the coastline sample dataset still relies on manual extraction of coastline samples from the entire coastline. As future work, automatic segmentation techniques for entire coastlines should be developed. The proposed bend segmentation method and morphological features might provide some hints for this purpose. Thirdly, given that new deep learning methods that can process sequence data, such as the Transformer (Vaswani et al., 2017), have been proposed in recent years, it makes sense to extend the proposed BendSeqLSTM model with these new methods to see whether the classification accuracy can be further improved.

Conclusion
In this study, we proposed a data-driven automatic morphological classification model for coastlines. Our model first represents a coastline as a sequence of bends and proposes a series of features to comprehensively describe the morphological characteristics of each bend. Based on this bend sequence, a bidirectional LSTM recurrent neural network is then integrated to classify the type of coastline. The evaluation results showed that the proposed coastline classification model outperformed all baselines, achieving a classification accuracy of 93.20 %. Among all the bend features, direction-related features contribute most to the classification task, followed by complexity-related and size-related features. The superior performance of the proposed classification model makes it a promising method to classify coastlines for applications in coastal analysis and management.
Future work should aim to create a more comprehensive dataset to further evaluate the proposed model. Meanwhile, it might make sense to extend the proposed model to analyze other linear geographical features with complex curves, such as river and road systems. Such applications would benefit from the proposed model's ability to capture intricate spatial patterns and relationships. Further research should also focus on the automatic segmentation of entire coastlines, making the analysis of global coastlines fully automatic. Such advancements would contribute to the practical improvement of automation in coastline management within GIS.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
1) Sample preparation: The categories for coastline morphology classification are defined, and the required dataset is constructed and labeled from coastline vector data. (Section 4.1.1)
2) Sequence modeling: Each individual coastline segment is divided into a sequence of bends. This sequence structure can be processed easily by the RNN model. (Section 3.3)
3) Feature extraction: For each coastline bend, a series of features is then proposed and extracted to represent its morphological characteristics. With this, each individual coastline segment can be represented as a sequence of morphological feature vectors. (Section 3.4)
4) Classification model: A sequence classification model, based on RNN, is then proposed to classify a coastline segment, using its sequence of feature vectors. (Section 3.5)

Fig. 1. Coastline morphological types used in this research. (The black parts are the basic and typical units for various coastlines.)
A.Gao et al.

Fig. 2. Overall framework for our proposed morphological classification of coastlines.

Fig. 4. Construction of coastline bending sequence with an inflection point segmentation method (Wang and Müller, 1998). (a) The direction (i.e., clockwise or counter-clockwise) of each point in a coastline is identified and all consecutive points whose direction is the same are considered to form a bend. (b) Further adjustment is done to c.

Table 2
Comparison results of relative models.

Table 3
Comparison of image-based and vector-based approaches.

Table 4
Confusion matrix of the classification results.

Table 5
Evaluation of features contribution.