Exploring time series models for landslide prediction: a literature review

Introduction: Landslides pose significant geological hazards, necessitating advanced prediction techniques to protect vulnerable populations.
Research gap: Despite the availability of numerous reviews, none focuses on time series analysis for landslide prediction.
Methodology: This paper therefore systematically reviews time series analysis in landslide prediction, focusing on physically based causative models and highlighting data preparation, model selection, optimization, and evaluation.
Key findings: The review shows that deep learning, particularly the long short-term memory (LSTM) model, outperforms traditional methods. However, the effectiveness of these models hinges on meticulous data preparation and model optimization.
Significance: While the existing literature offers valuable insights, we identify key areas for future research, including the impact of data frequency and the integration of subsurface characteristics in prediction models.


Introduction
Landslides represent prevalent geological hazards observed frequently across the globe (Froude and Petley 2018; Yang et al. 2022a), resulting in fatalities, infrastructure damage, and economic losses. Globally, nearly 60,000 people were killed over 12 years by 4862 non-seismic landslides (Froude and Petley 2018). Landslides are triggered by several factors, such as earthquakes, volcanoes, floods, and rainfall (Ebrahim et al. 2024a). Among these triggers, and under severe climate change conditions, rainstorms can induce catastrophic landslides (Wu et al. 2020). Despite extensive efforts by governmental authorities to address the risk of landslides, it has yet to be eliminated (Song et al. 2017). Consequently, studying such environmental risks (i.e., rainfall-induced landslides) is paramount for the safety of existing structures and future city development.
Several actions can be taken to prevent such risks, saving civilians and infrastructure from possible damage. The simplest approach is to reinforce all slopes against the worst-case scenario of external triggering by employing stabilizing piles, soil nailing, drainage channels, etc. (Huang and He 2023). However, this methodology wastes resources and time, necessitating prioritization and planning of such resources. Predictions and early warning systems can forecast future scenarios, assisting authorities in taking timely action to protect vulnerable areas and populations from the devastating impact of landslides, avoiding over 90% of these losses, and prioritizing stabilization planning (Guerrero-Rodriguez et al. 2024; Baum and Godt 2010; Guzzetti et al. 2020; Intrieri et al. 2012). Nevertheless, early warning and prediction techniques face considerable challenges because geohazard mitigation is highly complex. To illustrate, the landslide triggering features, the corresponding landslide response, and the geological, geotechnical, and hydrological features that control landslide behavior all have complex, nonlinear, uncertain, and dynamic relationships (Dai et al. 2021; He et al. 2021; Tien Bui et al. 2019; Wu et al. 2022; Kang et al. 2017).
To address the aforementioned limitations, artificial intelligence (AI) has emerged as a powerful solution for dealing with geohazard complexities, effectively capturing their nonlinear attributes by mapping input features to output results (Ma et al. 2020; Jiang et al. 2022; Liu et al. 2024). Furthermore, deep learning, a subset of machine learning approaches (LeCun et al. 2015; Phoon and Zhang 2023), uses deep network topologies and complex nonlinear processes to extract different characteristics from data, resulting in exact representations of training datasets. These models have demonstrated their ability to integrate complex nonlinear patterns seen in historical landslide displacement monitoring data (Meng et al. 2024). Various deep learning models, including convolutional neural networks (CNNs) (Pei et al. 2021), recurrent neural networks (RNNs) (Wang et al. 2021), long short-term memory (LSTM) (Yang et al. 2019), and gated recurrent units (GRUs) (Zhang et al. 2021d), have proven successful in landslide displacement prediction. LSTM and GRU models outperform traditional methods such as the support vector machine (SVM) (Cai et al. 2016). Deep learning's advantages lie in its versatility across domains, ease of feature extraction, parameter optimization, and scalability (Zhu et al. 2020). The methods mentioned above are commonly used with physically based causative thresholds (Ebrahim et al. 2024b).
The powerful capabilities of deep learning techniques depend on a thorough understanding of the underlying physical responses of landslides, the quality of monitoring data, and the empirical hyperparameter tuning of the model (Yuan et al. 2019; Ebrahim et al. 2024a, 2024c). Regarding the physical response, one key aspect is the time delay between triggers and the landslide response (i.e., infiltration and wetting-front process dynamics), which deep learning should capture (Bednarczyk 2018; Chen et al. 2018; Sasahara 2017; Zhang et al. 2016; Li et al. 2021; Zhang et al. 2011). As for data quality, missing data is a common occurrence because of harsh environmental circumstances (Ebrahim et al. 2024c). Concerning the model parameters, deep learning involves stochastic processes; hence, hyperparameter tuning is required to properly choose the model parameters. These challenges can be addressed by time series analysis for accurate predictions, better utilization of deep learning capabilities, and management of data-related issues. The time series relationship between triggers and the landslide response can be identified in one of the following scenarios: univariate, multivariate, trends, seasonality, randomness, or autocorrelation.
To demonstrate, a time series denotes a sequential organization of data, typically evenly spaced across time intervals. It can be classified into two main types: univariate, which involves a single value at each time step, and multivariate, where multiple values are recorded per time step. Time series analysis has a wide range of applications, including landslides. Its applications include prediction, handling missing data, and anomaly detection (Chatfield 2013; Shumway et al. 2000) (Fig. 1).
Time series behavior is characterized by four components: trend, seasonality, randomness, and autocorrelation. Trends (Fig. 2a) illustrate directional movements, either upward or downward, while seasonality (Fig. 2b) represents patterns that repeat at regular intervals. Randomness (Fig. 2c) emerges as white noise with no recognizable patterns. Autocorrelation (Fig. 2d) represents correlations with delayed copies of the series itself, marked by unexpected spikes known as "innovations." A time series may exhibit all these components simultaneously, yielding a complex time series. Time series can be stationary, with constant statistical features, whereas non-stationary series undergo structural changes in response to major events, modifying their behavior (Fig. 2e). The interaction of these variables changes the dynamics of time series data, altering prediction accuracy and analytical methodologies (Chatfield 2013; Shumway et al. 2000).
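These components can be illustrated with a small synthetic example (a hypothetical NumPy sketch, not drawn from any reviewed dataset): a monthly displacement-like series built as the sum of a linear trend, a 12-step seasonal cycle, and white noise.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(120)                          # 120 monthly time steps

trend = 0.5 * t                             # upward directional movement
seasonal = 10 * np.sin(2 * np.pi * t / 12)  # pattern repeating every 12 steps
noise = rng.normal(0, 2, t.size)            # white noise with no pattern

series = trend + seasonal + noise           # a complex, non-stationary series

# The trend makes the series non-stationary: the mean of the second half
# lies well above the mean of the first half.
print(series[:60].mean() < series[60:].mean())  # True
```

Separating a monitored series into such components before modeling is exactly the decomposition strategy discussed later in this review.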
After reviewing the literature on landslides and their mitigation strategies, it becomes evident that various systematic and methodological approaches have been employed to address the multifaceted challenges posed by these geological hazards. A comprehensive review of existing studies, as presented in Table 1, showcases the diverse range of topics covered. These reviews offer valuable insights into the current state of research and provide essential foundations for further exploration into landslide mitigation strategies.
Despite the breadth of topics covered in these reviews (Table 1), it is notable that time series analysis, a crucial aspect in landslide prediction and early warning systems, is largely absent. This omission is significant for several reasons. Firstly, time series analysis allows for the examination of temporal correlations between various factors influencing landslides, which is essential for accurate prediction and proactive risk management. Secondly, selecting appropriate hyperparameters in statistical models used for landslide prediction often relies on empirical methods, highlighting the need for a systematic exploration of time series techniques to enhance predictive accuracy.
To this end, this paper aims to bridge this gap by presenting a comprehensive review of time series applications for physically based causative threshold techniques (i.e., based on machine learning and geotechnically and environmentally monitored data), structured to cover the key aspects outlined in Fig. 3. Beginning with data preparation, it progresses through model selection, optimizations, model evaluations, and, ultimately, prediction generation. The paper then delves into identifying gaps and offering future recommendations, culminating in a conclusive summary. This study seeks to provide valuable insights for researchers and practitioners seeking to enhance landslide mitigation efforts through advanced predictive techniques.

Research methodology
In this study, a qualitative systematic review methodology is applied. The systematic review process, outlined in Fig. 4, involves three main stages: a) defining research questions, clarifying goals, conducting preliminary investigations, validating concepts, establishing inclusion and exclusion criteria, and devising a research plan along with selecting appropriate research databases; b) assessing and evaluating the screened and retrieved studies; and c) determining eligibility, then extracting and refining data.

Identification process
In the initial phase of identification, the research methodology commences by sourcing significant studies related to time series applications in landslide analysis. This phase employs keywords, search databases, and predefined inclusion and exclusion criteria to filter the acquired papers. It is recommended to utilize multiple databases in a systematic review to ensure a comprehensive retrieval and assessment of relevant literature. While Scopus, Web of Science, and Google Scholar are commonly utilized databases in engineering research, this study primarily relies on Scopus for the preliminary search, supplemented by the snowballing approach involving Google Scholar and Web of Science. Following the selection of the search database, relevant keywords such as "landslide or landslides" and "machine learning or artificial intelligence or deep learning" were chosen to encompass all available datasets concerning time series applications of landslides.
In any systematic review, the criteria for inclusion and exclusion are pivotal in refining search results and focusing on the most pertinent studies. This research adhered to specific inclusion criteria: 1) studies focusing on time series applications of landslides utilizing machine learning techniques; 2) articles published in peer-reviewed journals; 3) studies published as articles or review submissions; and 4) papers that underwent final publication. Exclusion criteria encompassed: 1) papers published in languages other than English; 2) studies lacking accessible full texts; and 3) publications not originating from journal sources. The Scopus search string, with the inclusion and exclusion criteria applied, was: "( TITLE-ABS-KEY ( "landslide" ) OR TITLE-ABS-KEY ( landslides ) AND TITLE-ABS-KEY ( machine AND learning OR artificial AND intelligence OR deep AND learning ) ) AND ( LIMIT-TO ( SUBJAREA , "eart" ) OR LIMIT-TO ( SUBJAREA , "engi" ) ) AND ( LIMIT-TO ( DOCTYPE , "ar" ) OR LIMIT-TO ( DOCTYPE , "re" ) ) AND ( LIMIT-TO ( EXACTKEYWORD , "machine learning" ) OR LIMIT-TO ( EXACTKEYWORD , "landslide" ) OR LIMIT-TO ( EXACTKEYWORD , "landslides" ) OR LIMIT-TO ( EXACTKEYWORD , "deep learning" ) ) AND ( LIMIT-TO ( LANGUAGE , "english" ) ) AND ( LIMIT-TO ( SRCTYPE , "j" ) )".

Screening and evaluation of collected articles
By July 2024, a search of the Scopus database yielded a total of 273 articles. These publications underwent evaluation and assessment using the preferred reporting items for systematic reviews and meta-analyses (PRISMA) process, as outlined by Moher et al. (2009) (see Fig. 5). Following this approach, 232 papers were excluded due to duplication, irrelevance, or unavailability of complete texts. To demonstrate, landslide prediction systems are classified into four types: a) empirical thresholds, b) physically based causative thresholds, c) deterministic models, and d) susceptibility maps (Ebrahim et al. 2024a, 2024b). This study exclusively includes time series models with physically based causative thresholds, excluding studies with different prediction methods or different triggers. Upon thorough examination of the full texts of each included article, 41 papers met the established inclusion criteria. To broaden the scope of the search, the backward and forward snowballing method (Wohlin 2014) was employed, resulting in the discovery of additional relevant articles beyond those identified through the Scopus search. In this approach, for each article that matched the inclusion criteria, we searched the reference lists as well as the article's citations; this procedure, known as backward and forward snowballing, helps to consider papers that were not included in the search dataset. In combination with manual searches, a total of 159 articles were deemed suitable for inclusion in the study. The manual search was done to consider research related to methodological and relevant topics.

Time step interval
The time series data can be collected at several frequencies, such as minutes, hours, days, and even months. The chosen frequency is a function of the data size, power-consumption optimization of the monitoring system, and the required accuracy. Figure 6a reviews the bibliometric data of the retrieved studies, where two frequencies are utilized: monthly time steps (Dai et al. 2022; Huang et al. 2022a, 2023a; Li et al. 2020a; Lian et al. 2013; Liu et al. 2020; Wang et al. 2022, 2023a, b; Xing et al. 2019) and daily time steps (Dassanayake et al. 2023; Filipović et al. 2022; Granata et al. 2022; Han et al. 2021; Nava et al. 2023; Togneri et al. 2022; Xi et al. 2023; Xu et al. 2023; Zhang et al. 2021b). This chart demonstrates that around 59% of the accessible literature uses monthly time steps, whereas 41% integrates daily time steps. In these studies, the frequency is chosen based only on the availability of monitoring data, ignoring the physical and computational backgrounds of the process.
Physically, Ebrahim et al. (2024a), Bontemps et al. (2020), Ng et al. (2001), and Rahimi et al. (2011) concluded that the temporal prediction of landslides relies mainly on the rainfall pattern (i.e., data frequency, as illustrated in Fig. 6b). Computationally, Fig. 6b shows that the time series pattern differs completely between monthly and daily time steps. The monthly time steps (red line chart) form a relatively smoothed series for which model performance was adequate even for basic models. In comparison, the daily time steps (blue column chart) are more biased and random, necessitating advanced modeling. To date, an investigation of how data frequency affects model performance is lacking, necessitating more focus in future research. In other words, comparative studies should consider this factor to balance data collection, sensor power consumption, and prediction accuracy.
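The smoothing effect of coarser time steps can be sketched numerically (a hypothetical NumPy example assuming 30-day months for simplicity; the rainfall values are synthetic): aggregating a spiky daily series into monthly sums sharply reduces its relative variability, which is why monthly series tolerate basic models while daily series demand advanced ones.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical daily rainfall: mostly dry days with occasional bursts,
# giving a spiky, random-looking daily series.
daily = rng.gamma(shape=0.3, scale=10, size=360)

# Aggregate into 12 "months" of 30 days each (a simplification).
monthly = daily.reshape(12, 30).sum(axis=1)

# The coefficient of variation (std / mean) drops sharply after
# aggregation: the monthly series is far smoother than the daily one.
cv_daily = daily.std() / daily.mean()
cv_monthly = monthly.std() / monthly.mean()
print(cv_monthly < cv_daily)  # True
```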

Splitting ratio
Employing physically based threshold models is a stochastic procedure that requires the models to be well trained. The training process is then evaluated using additional unseen data, known as validation or testing sets. This method is also stochastic and requires consideration of two factors: a) the temporal sequence of splitting the dataset, and b) the ratio of the training to validation sets. Time-independent applications ignore the temporal ordering when analyzing the data, and the data is split using random folding techniques. In contrast, time series applications necessitate keeping the temporal ordering, so the standard holdout strategy should be adopted, in which the validation and test sets are placed at the end of the series (Fig. 7a) (Roberts et al. 2017; Togneri et al. 2022). The training-to-testing ratio should be carefully chosen, since a higher ratio results in better model training. However, larger training-to-testing ratios may affect the evaluation process, as the testing set will be small. On the other hand, a smaller ratio will not be sufficient for model training. Du et al. (2013) concluded that a 50% to 90% ratio can achieve reasonable results. Ebrahim et al. (2024a) reviewed related landslide applications and concluded that a ratio of 70% is widely considered in the literature. According to Bergmeir and Benítez (2012), the last 10 to 15% of the time series is typically used as a validation and testing set for better generalization and prediction accuracy.
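The temporal holdout strategy can be sketched in a few lines (a generic illustration, not tied to any specific reviewed study): unlike random k-fold splitting, the series is cut chronologically so that the validation and test sets sit at the end.

```python
import numpy as np

series = np.arange(100)          # stand-in for 100 monitored time steps

train_ratio = 0.70               # a ratio widely adopted in the literature
split = int(len(series) * train_ratio)

# Chronological split: no shuffling, so temporal ordering is preserved
# and the model is always evaluated on data later than its training data.
train, test = series[:split], series[split:]

print(len(train), len(test))     # 70 30
assert train.max() < test.min()  # every training step precedes every test step
```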
Figure 7b shows the bibliometric data for the reviewed literature, representing the number of manuscripts versus the adopted training set ratio. The ratio of 80% is the most commonly utilized, followed by the ratios of 90%, 85%, and 70%, respectively. The ratios of 65% and 75% were rarely used. It was found that the dataset itself (i.e., the nature of the collected data) is the reason a training ratio greater than 80% was employed. To illustrate, the monitored time series were generally in monthly steps and ranged between 48 and 357 steps (Han et al. 2021; Xing et al. 2019). This temporal length is relatively small for training a model (Wang et al. 2023b). As a result, the testing set was selected to be a minimum of 12 steps in order to evaluate at least the last year (Huang et al. 2022a; Liu et al. 2020; Wang et al. 2023b; Xing et al. 2019). In other words, the ratio of the validation set should be selected to represent the temporal response of the training set.

Decomposition
The retrieved studies were mainly about landslide surface displacement predictions. As stated earlier, the landslide response is quite complex. As a result, various attempts have been made in the literature to simplify the procedure. For studies that utilize monthly time series, Fig. 8 depicts how the complicated response of landslide displacement may be decomposed into residual, trend, and seasonal or periodic components, which could help facilitate the analytical process (Han et al. 2021; Huang et al. 2022a, 2023a; Li et al. 2020a; Lian et al. 2013; Liu et al. 2020; Meng et al. 2024; Nava et al. 2023; Wang et al. 2022; Xing et al. 2019; Yang et al. 2019; Zhang et al. 2021b). The analysis and predictions are then employed for each component individually, and the final prediction is the summation of the trend and periodic terms.
Additionally, the prediction can be performed on the original dataset without any decomposition (Dai et al. 2022; Wang et al. 2023a; Wei et al. 2019; Xi et al. 2023; Xu et al. 2023). Figure 9a depicts the bibliometric data of the literature, taking into account the number of studies that use decomposition and non-decomposition. Decomposition was found to be commonly employed, with a share of 63% compared to 37% for non-decomposition. However, there is still a gap in the literature comparing and highlighting how decomposition or non-decomposition affects the modeling process and the prediction accuracy.
These methods assume that the trend term depends on the creep behavior and is not affected by external triggering, whereas the seasonal term is the only term driven by seasonal triggering. This assumption has not yet been proven; further evidence and research are required. However, differencing methods can sidestep this assumption, whereby the current value is calculated as the difference between two successive steps. Pouzols and Lendasse (2010) revealed that differencing effectively removes the trend, minimizes uncertainties, and offers high prediction accuracy.
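A minimal sketch of first-order differencing (NumPy, synthetic data): subtracting successive steps removes a linear trend entirely, leaving a bounded, approximately stationary residual.

```python
import numpy as np

t = np.arange(100)
series = 0.5 * t + np.sin(2 * np.pi * t / 12)   # linear trend plus seasonality

# First-order differencing: each value becomes the change between two
# successive steps, which eliminates the linear trend completely.
diff = np.diff(series)

# After differencing, values stay in a narrow band around the trend slope
# (0.5), whereas the original series grows without bound.
print(diff.min() > -1, diff.max() < 2)  # True True
```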

Lagged sequence
Rainfall-induced landslides are complex mechanisms in which the slope response lags behind the triggering by a temporal period (Chang et al. 2023). Physically, this process is explained by the infiltration and surface-runoff mechanisms (Zhang et al. 2011). For physically based models such as time series applications, it is paramount to account for such mechanisms by empirically tuning the model and selecting the optimum hyperparameters. This can be accomplished by considering an antecedent period.
To illustrate, Uwihirwe et al. (2020) and Zhao et al. (2019) showed, while studying empirical models, that considering the antecedent period of rainfall improves model performance in terms of higher accuracies and lower false positive rates. Similarly, the same concept can be integrated with time series analysis by considering a lagged sequence as an input instead of a single input time step. Figure 10 illustrates the rainfall variation over time, where rainfall can be divided into effective and non-effective rainfall events. Effective rainfall is the amount of rainfall that still affects the hydrological response of the slope. In contrast, non-effective rainfall is the amount of water that the slope has already drained off and that no longer affects the current response of the slope. Thus, the antecedent effect can be simplified to represent the effective rainfall period. The lagged sequence is affected by the recovery time, which is a function of the triggering and the mechanical and hydrological characteristics of the slope. It is paramount to consider this effect, as Han et al. (2021) concluded that predictions neglecting the lagged period deviate significantly more from the observations than those that consider it.
The lagged period (i.e., antecedent period) varies in the literature, as this period is a function of the slope's hydraulic conductivity and other mechanical and hydrological characteristics. Dai et al. (2022), Huang et al. (2023a), Liu et al. (2020), Nava et al. (2023), and Xu et al. (2023) utilized a lagged period of 12 time steps. Zhang et al. (2021b) integrated set pair analysis (SPA) to optimize the antecedent period and found that 18 days provides reasonable accuracy. Li et al. (2020a) utilized different lags for each input feature, such as two months for rainfall, ten months for reservoir levels, and one month for displacement. In a related irrigation application, Filipović et al. (2022) investigated several lagged intervals through sensitivity analysis and found that a lagged period of 60 days offers the best prediction accuracy. In this regard, Granata et al. (2022) applied a grid search to select the optimum lagged interval, which was found to be 7 days. However, the research has not given enough consideration to the window size (sequence length) and how the antecedent value influences the prediction model's performance. The reason is that the available literature widely considers smoothed monthly time steps in predicting surface displacement, neglecting to study such factors with spatially varied subsurface dynamic responses.
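Building a lagged input sequence can be sketched as a sliding window (a generic NumPy illustration; the 12-step lag mirrors the period used in several reviewed studies, and `make_lagged` is a hypothetical helper name):

```python
import numpy as np

def make_lagged(series, lag):
    """Turn a 1-D series into (samples, lag) windows plus next-step targets."""
    X = np.stack([series[i:i + lag] for i in range(len(series) - lag)])
    y = series[lag:]
    return X, y

rainfall = np.arange(48, dtype=float)      # stand-in monthly series
X, y = make_lagged(rainfall, lag=12)       # 12 antecedent steps per sample

print(X.shape, y.shape)                    # (36, 12) (36,)
# The first window holds steps 0..11 and predicts y[0] = 12.0.
```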

Feature selection
Intelligence models are governed by their controlling features, given the high dependency between the model and the feature-controlling factors. Outdated or unrelated features can negatively affect the model, necessitating careful feature selection (Cao et al. 2016). Numerous factors affect landslides, including creep and triggering features. Creep features can be represented by geology, geomorphology, soil, hydraulic, and land use features, while triggering features include rainfall, earthquakes, human activities, blasting, reservoir fluctuation, and others, as reviewed by Ebrahim et al. (2024a) (Fig. 11).
According to the reviewed studies, which generally address displacement predictions owing to the availability of surface displacement monitoring, rainfall and reservoir level fluctuation are typically considered the external triggering features (Han et al. 2021; Huang et al. 2022a, 2023a; Li et al. 2020a; Liu et al. 2020; Meng et al. 2024; Miao et al. 2018; Wang et al. 2023b; Xing et al. 2019; Xu et al. 2023). Additionally, Selby (1988) concluded that including the state evolution of the landslides improves the prediction process. Similarly, displacement features such as historical values, displacement velocity, displacement increment, displacement change, and displacement evolution state are included as input features to improve model performance (Huang et al. 2022a, 2023a; Li et al. 2020a; Miao et al. 2018; Nava et al. 2023; Wang et al. 2023b; Xing et al. 2019; Zhang et al. 2021b). The displacement time series was solely considered in the study of Xi et al. (2023). The study of Xu et al. (2023) employed the groundwater level, surface displacement, and deep displacement features besides the rainfall and reservoir water level variation. Several statistical methods can be used for feature selection, such as gray relational analysis (GRA) (Jiang et al. 2022; Meng et al. 2024; Zhang et al. 2021b), partial autocorrelation function (PACF) algorithms (Meng et al. 2024), the maximal information coefficient (MIC) (Huang et al. 2022a), kernel SHAP (Ge et al. 2023), Pearson correlation (Jiang et al. 2022; Wei et al. 2019), adjusted R2 (Togneri et al. 2022), the Akaike information criterion (AIC) (Togneri et al. 2022), and the least absolute shrinkage and selection operator (LASSO) (Granata et al. 2022). However, such models face a significant challenge because they neglect temporal dependencies in landslide responses, resulting in the selection of unrelated features. As a result, it is recommended to incorporate knowledge-based methods and sensitivity analysis to consider several temporal dependencies and select the best-related features (Ebrahim et al. 2024a) (refer to Section "Statistical correlations" for more details).

Statistical correlations
Section "Feature selection" mentions that statistical correlation can be employed to examine the linear and non-linear relations between the input features and the target output (Li et al. 2020b; Reshef et al. 2011). Figure 12 shows the most commonly used models in the literature. Among these, Pearson's correlation coefficient is commonly used in feature selection algorithms due to its simple yet practical nature. Table 2 shows the interpretation of Pearson coefficient values, which range from 0.0–0.2 (extremely weak correlation) to 0.8–1.0 (robust correlation). According to Li et al. (2020b), the maximal information coefficient (MIC) outperforms Pearson's correlation coefficient because it extracts linear, non-linear, and complex correlations, whereas Pearson cannot capture such non-linear behavior.
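Pearson's blindness to non-linear dependence can be demonstrated directly (a hypothetical NumPy sketch; MIC itself requires a dedicated library and is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 1000)

linear = 2 * x                 # purely linear dependence on x
nonlinear = x ** 2             # strong but symmetric, non-linear dependence

r_linear = np.corrcoef(x, linear)[0, 1]
r_nonlinear = np.corrcoef(x, nonlinear)[0, 1]

# Pearson captures the linear relation perfectly but is nearly blind
# to the quadratic one, even though x fully determines both outputs.
print(round(r_linear, 2))        # 1.0
print(abs(r_nonlinear) < 0.3)    # True
```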
Another technique, the Shapley additive explanations (SHAP) algorithm, was developed based on game theory to explain the output of any machine learning model (Lundberg and Lee 2017). SHAP evaluates the contribution of each feature and indicates how positive or negative its role is in the prediction process. Many approximation methods have been developed to overcome the challenges in calculating SHAP, such as Tree SHAP, Deep SHAP, and Kernel SHAP (Baptista et al. 2022). Gray relational analysis, a key component of gray system theory, is a statistical approach that examines the relationships between multiple factors by assessing the degree of correlation, known as gray correlation, using sample data from each factor. This correlation assesses how closely the geometric shapes of data curves align, with closer shapes indicating stronger correlations (Liu et al. 2022). In the study by Zhang et al. (2021b), gray correlation was employed to identify primary influencing factors, with a correlation coefficient exceeding 0.6 indicating a significant association with periodic displacement (Wang et al. 2004).
Stacked prediction models have been developed using the elastic net (EN) algorithm as the meta-classifier, as outlined by Granata et al. (2022). The EN algorithm, introduced by Zou and Hastie (2005), combines two widely used regularized variants of linear regression: the least absolute shrinkage and selection operator (LASSO) method and the ridge method. The LASSO method identifies the most influential variables by introducing an absolute penalty in ordinary least squares (OLS) regression. Meanwhile, ridge regularization applies a penalty in the OLS formulation by penalizing the squared weights rather than the absolute weights. Consequently, this approach penalizes large weights significantly while distributing many small weights across the feature spectrum.
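The blended objective can be written down in a few lines (a minimal NumPy sketch with hypothetical data and weights, following the standard form: RSS plus an alpha-weighted mix of L1 and L2 penalties; the parameter names mirror common convention, not any specific reviewed implementation):

```python
import numpy as np

def elastic_net_loss(X, y, w, alpha=1.0, l1_ratio=0.5):
    """OLS residual sum of squares plus blended LASSO (L1) and ridge (L2) penalties."""
    rss = np.sum((X @ w - y) ** 2)
    l1 = np.sum(np.abs(w))           # LASSO term: drives weights exactly to zero
    l2 = np.sum(w ** 2)              # ridge term: shrinks large weights strongly
    return rss + alpha * (l1_ratio * l1 + (1 - l1_ratio) * l2)

X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([1.0, 2.0])
w = np.array([0.5, 0.0])

print(elastic_net_loss(X, y, w))     # 0.875: RSS 0.5 + 0.25 (L1) + 0.125 (L2)
```

Setting `l1_ratio=1.0` recovers pure LASSO and `l1_ratio=0.0` pure ridge.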
Various techniques can also be incorporated, such as the information gain ratio (Tien Bui et al. 2016), the least squares support vector machine (Pham et al. 2018), and the Gini information gain (Quinlan 1993). Liu et al. (2020) utilized the Gini information gain method, employing the random forest (RF) approach proposed by Zhang et al. (2020), to evaluate the relative significance of each key factor. Information gain serves the purpose of identifying which feature provides the most helpful information for predicting outcomes. The Akaike information criterion (AIC) serves as a measure for evaluating the effectiveness of a statistical model, taking into account both its accuracy in predicting data and its simplicity by penalizing complex models. It offers a means to compare and choose among different models based on their performance (McElreath 2018).
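For least-squares fits, AIC can be computed from the residual sum of squares; a small sketch assuming the common Gaussian-likelihood form AIC = n·ln(RSS/n) + 2k (the RSS values below are hypothetical):

```python
import numpy as np

def aic(rss, n, k):
    """AIC for a least-squares fit: n*ln(RSS/n) + 2k (Gaussian likelihood)."""
    return n * np.log(rss / n) + 2 * k

n = 100
# Hypothetical fits: the complex model fits only marginally better,
# so the 2k complexity penalty makes the simple model preferable.
aic_simple = aic(rss=50.0, n=n, k=3)
aic_complex = aic(rss=49.0, n=n, k=10)

print(aic_simple < aic_complex)  # True: lower AIC wins
```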
As stated in Section "Feature selection", combining knowledge-based models with sensitivity analysis can enhance performance when selecting controlling features. To illustrate, even though rainfall is one of the main triggers, Hemalatha et al. (2019) observed that rainfall has a low correlation when considered alone. The dynamic effect of rainfall is well explained by the infiltration process, which involves rainwater seeping through the surface and into the landslide body. This process is influenced by a variety of factors, including surface loss, scour, evapotranspiration, plant transpiration, air temperature, net solar radiation, soil temperature, humidity, and wind speed (Suk et al. 2022; Ahmed et al. 2021; Granata et al. 2022). To obtain a strong correlation between rainfall and the other responses, the elements listed above must be considered.
Another factor that should be carefully considered is the effective antecedent period, which can be determined using statistical correlations. In the study of Zhang et al. (2021b), the set pair analysis (SPA) method was employed to determine the optimal lag time, which was identified as 18 days. Set pair analysis, initially introduced by Zhao (1989), is a statistical approach that deals with the certainty and uncertainty of relationships between paired sets (Han et al. 2021; Li et al. 2018; Xu et al. 2023).

However, such models can be further improved, as it was found that deep learning models offer higher performance than static and shallow models. To illustrate, artificial intelligence can be divided into two branches: a) artificial narrow intelligence (ANI) and b) artificial general intelligence (AGI) (Goertzel 2014). AGI is still under development, while ANI is widely adopted for numerous applications. The learning algorithms of ANI have been revolutionized as a result of the advancement of computational devices (Semmler and Rose 2017). The advanced learning algorithms include neural networks, deep learning, and decision trees. The architecture of a neural network is presented in Fig. 13. It consists of input features, hidden layers, activation functions, and output layers. Each hidden layer consists of several neurons, and each neuron applies a function f(x) in which the input is x and the output is the probability of y = 1. Each hidden layer aims to convert the input features into new features that fit well with the labeled output y (Wilamowski 2009). Each problem necessitates a different design of the network in terms of the number of hidden layers, neurons in each layer, the activation function utilized, and the number of output neurons (single or multiple).
Sequences and time series applications typically involve inputs and outputs that vary over time. Unlike basic neural networks, recurrent neural networks (RNNs) are preferred for such tasks because basic models do not share features learned at different positions within the sequence. Figure 14 illustrates the architecture of the recurrent neural network. RNNs are affected by vanishing and exploding gradients. Exploding gradients can be handled by gradient clipping, which caps the gradient at a maximum threshold. Vanishing gradients, however, are more challenging to overcome: in a deep RNN, the effect of the earliest time steps fades as the sequence progresses. For this reason, the gated recurrent unit (GRU) and long short-term memory (LSTM) models were developed to overcome such issues (Wang et al. 2020).
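The gradient-clipping remedy mentioned above can be sketched in a few lines. This is a generic norm-based clipping sketch, not tied to any specific study reviewed here:

```python
import numpy as np

def clip_gradient(grad, max_norm):
    """Rescale a gradient vector so its L2 norm never exceeds max_norm,
    the standard remedy for exploding gradients in RNN training."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

g = np.array([30.0, 40.0])               # exploding gradient, norm = 50
clipped = clip_gradient(g, max_norm=5.0) # norm reduced to 5, direction kept
```

Clipping preserves the gradient direction while bounding the update size, which is why it tames exploding gradients but does nothing for vanishing ones.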
As illustrated earlier, the landslide response lags behind the rainfall trigger, necessitating careful feature selection when using static models to account for this temporal relation (Meng et al. 2024). Static models cannot build a temporal connection between the input features and the external triggering (Zhang et al. 2022a). On the other hand, dynamic and deep learning models can extract the non-linear correlation between the triggering and the landslide response (Zhang et al. 2022b). To illustrate further, under the same triggering conditions, a slope may respond differently depending on its antecedent characteristics. Under an extensive rainfall event, the slope may remain stable, yet it may fail under subsequent small rainfall events because of its antecedent status (Crozier and Glade 2005; Li et al. 2018; Yang et al. 2019; Zhang et al. 2021a). Consequently, the temporal effect and the time series are also paramount factors to consider, necessitating advanced deep models.

Optimizations

Hypertuning
Prediction using machine learning models is an empirical process that necessitates selecting the appropriate parameters, which can be accomplished using optimization techniques. Two optimization processes are required: the first optimizes the selected model by hypertuning its parameters to achieve the best accuracy, and the second optimizes the training process to achieve fast convergence. Model structure hypertuning can be achieved through several techniques, as outlined in Fig. 15, including grid search, random search, and other methods such as Bayesian optimization, particle swarm optimization (PSO), genetic algorithms (GA), successive halving (SH), and the sparrow search algorithm (SSA) (Bergstra and Bengio 2012; Jiang et al. 2022; Ma et al. 2023; Snoek et al. 2012; Xu et al. 2023).
Grid search is a heuristic search method within a predetermined subset of a learning algorithm's hyperparameter space, as outlined by Granata et al. (2022). This algorithm uses a specific performance metric, such as R2, MAE, or RMSE, to guide its exploration and evaluation of potential hyperparameter combinations. Grid search, while comprehensive, is often time-consuming because it must evaluate all possible combinations of hyperparameters (Liu et al. 2020). On the other hand, random search employs a randomized sampling approach, which, although faster, may not always yield the optimal hyperparameter combination. According to Xu et al. (2023), Bayesian optimization demonstrates similar performance to random search, particularly in high-dimensional search spaces.

Fig. 14 Recurrent neural network (RNN) architecture (modified from Yang et al. 2021). U, V, and W represent the weights from the input layer to the hidden layer, from the hidden layer to the output layer, and for self-recursion, respectively. Tx and Ty represent the input (x) and output (y) sequence lengths
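The contrast between the two search strategies can be sketched on a toy validation-loss surface. The loss function, hyperparameter names, and value ranges below are hypothetical illustrations:

```python
import itertools, random

def toy_loss(lr, units):
    # Hypothetical validation-loss surface with its minimum at lr=0.01, units=64
    return (lr - 0.01) ** 2 + ((units - 64) / 64) ** 2

# Grid search: exhaustively evaluate every combination in a predefined subset
lrs = [0.001, 0.01, 0.1]
units_grid = [16, 32, 64, 128]
grid_best = min(itertools.product(lrs, units_grid), key=lambda p: toy_loss(*p))

# Random search: spend the same budget (12 evaluations) on random samples
random.seed(0)
samples = [(10 ** random.uniform(-3, -1), random.choice(units_grid))
           for _ in range(12)]
rand_best = min(samples, key=lambda p: toy_loss(*p))
```

Grid search is guaranteed to hit the best point of its predefined grid, while random search covers the continuous learning-rate axis more densely but may miss the exact optimum, matching the trade-off described above.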
PSO has gained popularity in recent years (Ni et al. 2013; Parsopoulos and Vrahatis 2002; Poli et al. 2007). It operates by iteratively refining solutions, starting from random initial solutions and evaluating their quality based on fitness. PSO stands out for its simplicity, high precision, and rapid convergence compared to alternative algorithms. The genetic algorithm (GA), belonging to the family of evolutionary algorithms, is a meta-heuristic search technique. Renowned for its robust global search capability, the GA effectively navigates solution spaces even without gradient information from error functions, making it a potent tool across optimization, search, and machine learning domains. Wei et al. (2019) employed the GA to optimize the connection weights of neural networks, addressing the common challenge of local-minimum entrapment. Genetic algorithms have effectively enhanced learning efficiency and computational accuracy when tuning model hyperparameters (Ma et al. 2023).
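The iterative refinement that PSO performs can be shown with a minimal sketch: each particle tracks its personal best and is pulled toward the swarm's global best. The quadratic objective and all parameter values are hypothetical, chosen only to make the example self-contained:

```python
import numpy as np

def pso(loss, bounds, n_particles=20, iters=60, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimizer: particles refine random initial
    solutions, balancing inertia (w), attraction to their personal best (c1),
    and attraction to the swarm's global best (c2)."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds).T
    pos = rng.uniform(lo, hi, (n_particles, len(bounds)))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_val = np.array([loss(p) for p in pos])
    g = pbest[pbest_val.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (g - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([loss(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        g = pbest[pbest_val.argmin()].copy()
    return g, pbest_val.min()

# Hypothetical hyperparameter loss with its minimum at (0.3, 0.7)
best, best_val = pso(lambda p: (p[0] - 0.3) ** 2 + (p[1] - 0.7) ** 2,
                     bounds=[(0, 1), (0, 1)])
```

On this smooth surface the swarm converges within a few dozen iterations, illustrating the rapid convergence the review attributes to PSO.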
The SH method, as utilized by Xu et al. (2023), operates on the principle of dynamic resource allocation. It aims to optimize hyperparameters by efficiently allocating computational resources (Jamieson and Talwalkar 2015). If specific hyperparameter configurations prove less effective, their evaluation is halted, and resources are redirected to more promising configurations. Inspired by the non-stochastic 'best-arm problem', SH prioritizes allocating resources to the most promising candidates. By successively halving the pool of configurations over multiple rounds, SH selects the optimal configuration from the initial set. The sparrow search algorithm (SSA), as mentioned by Jiang et al. (2022) and Xue and Shen (2020), is an innovative method inspired by the foraging and anti-predatory behaviors observed in sparrows. This approach offers several advantages, including robustness, a fast convergence rate, and effectiveness in seeking optimal solutions. Given that machine learning models are entirely empirical, hyperparameters play a critical role in shaping a model's structure by defining the appropriate model dimensions, and they must be established before the learning process starts. The primary goals of the aforementioned techniques are a) to accurately capture the dimensions that give the model the best prediction performance and b) to converge faster with fewer computational demands. The main concept behind these methods can be simplified as shown in Fig. 16: the model searches from coarse to fine scales to accurately capture the best structure while also converging faster.
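The halving schedule can be sketched as follows. The `evaluate` function, with a loss that shrinks as the budget grows plus an intrinsic configuration quality, is a hypothetical stand-in for partially training a model:

```python
def successive_halving(configs, evaluate, budget=1):
    """Successive halving: evaluate all configurations on a small budget,
    keep the better half, double the budget, and repeat until one remains."""
    configs = list(configs)
    while len(configs) > 1:
        scores = [(evaluate(c, budget), c) for c in configs]
        scores.sort(key=lambda s: s[0])                      # lower loss wins
        configs = [c for _, c in scores[:max(1, len(configs) // 2)]]
        budget *= 2                                          # survivors get more resources
    return configs[0]

# Hypothetical: observed loss = intrinsic quality + a term that decays with budget
def evaluate(config, budget):
    return config["quality"] + 1.0 / budget

candidates = [{"quality": q} for q in (0.9, 0.5, 0.3, 0.1)]
winner = successive_halving(candidates, evaluate)
```

Poor configurations are discarded after cheap evaluations, so the full budget is spent only on the most promising candidate, which is exactly the resource-reallocation principle described above.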

Training optimizer
As for optimizing the training process, choosing the appropriate number of iterations, learning rate, and monitoring metrics is challenging. To clarify, too few iterations cause high-bias issues while, on the contrary, too many iterations may cause high-variance issues, as illustrated in Fig. 17a. Figure 17a depicts the typical relationship between the training iterations and the loss on the training and validation sets, highlighting two issues: a) underfitting (high bias) and b) overfitting (high variance). Furthermore, selecting the optimum learning rate is challenging because of local-minima issues, as illustrated in Fig. 17b. To illustrate, the learning rate should be adjusted to reach the optimum values of the model weights (w) and the bias (b) within a reasonable computational time. In other words, a too-small learning rate (α) requires huge computational time, while a too-large learning rate may be misleading and may fail to converge (refer to Fig. 17b). Ebrahim et al. (2024d) investigated several optimization techniques and found that Adam optimizers offer the best prediction accuracy. The Adam algorithm, widely employed for loss function optimization, outperforms traditional gradient descent methods (Togneri et al. 2022) by ensuring swift convergence and mitigating the risk of getting trapped in local minima. Its effectiveness lies in its ability to dynamically adapt learning rates for each parameter, leading to improved convergence speed and more efficient exploration of the solution space (Chiang et al. 2022; Huang et al. 2023a; Wang et al. 2023a; Xing et al. 2019).
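The per-parameter adaptation at the heart of Adam can be written out in a minimal sketch. The quadratic objective, starting point, and step counts below are hypothetical illustrations, not values from any reviewed study:

```python
import numpy as np

def adam_minimize(grad, x0, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimal Adam optimizer: keeps exponential moving averages of the
    gradient (m) and its square (v), and adapts the step size per parameter."""
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)
    v = np.zeros_like(x)
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)          # bias-corrected first moment
        v_hat = v / (1 - beta2 ** t)          # bias-corrected second moment
        x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

# Minimize f(w, b) = (w - 2)^2 + (b + 1)^2 using its analytic gradient
x_opt = adam_minimize(lambda x: 2 * (x - np.array([2.0, -1.0])), x0=[0.0, 0.0])
```

Because the step for each parameter is normalized by its own gradient history (v), parameters with large and small gradients converge at comparable rates, which is the dynamic learning-rate adaptation noted above.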

Loss functions
Quantifying the loss during training and validation is a vital step in monitoring the accuracy of the model performance. Section "Model evaluations" provides equations for several evaluation metrics that can be used to monitor model performance during the training phase. For instance, the Huber loss function (Equation 1) can be monitored so that training is stopped when the loss is minimal for both the training and validation sets (refer to Fig. 17a):

L_δ(a) = a²/2 for |a| ≤ δ, and L_δ(a) = δ(|a| − δ/2) for |a| > δ (1)

where a = y_i − ŷ_i is the difference between the true value y_i and the predicted value ŷ_i, and δ is a threshold parameter (James et al. 2023).
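The Huber loss (Equation 1) translates directly into code; the example inputs below are hypothetical:

```python
import numpy as np

def huber_loss(y_true, y_pred, delta=1.0):
    """Huber loss: quadratic for small residuals, linear for large ones,
    so outliers pull the fit less strongly than with a plain squared loss."""
    a = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    quad = 0.5 * a ** 2                          # |a| <= delta branch
    lin = delta * (np.abs(a) - 0.5 * delta)      # |a| >  delta branch
    return np.where(np.abs(a) <= delta, quad, lin).mean()

loss_small = huber_loss([1.0], [1.5])   # |a| = 0.5 <= delta, quadratic branch
loss_large = huber_loss([0.0], [3.0])   # |a| = 3.0 >  delta, linear branch
```

The threshold δ controls where the loss switches from MSE-like to MAE-like behavior, which is why Huber is favored for noisy monitoring data.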

Normalization
Since the triggering and landslide responses vary quantitatively, training-process convergence may be difficult, as illustrated in Fig. 18, which depicts two scenarios: a) two features with different scales, and b) two features with comparable scales. The first case is computationally challenging because gradient descent proceeds in small steps, whereas the second case converges more quickly. Consequently, normalizing and scaling the data is paramount (Varangaonkar and Rode 2023). Several techniques can be used, such as minimum-maximum normalization (Equation 2) (Granata et al. 2022), mean normalization (Equation 3), and Z-score normalization (Equation 4) (Togneri et al. 2022):

X'_j(i) = (X_j(i) − X_j-min(i)) / (X_j-max(i) − X_j-min(i)) (2)
X'_j(i) = (X_j(i) − μ) / (X_j-max(i) − X_j-min(i)) (3)
X'_j(i) = (X_j(i) − μ) / σ (4)

where X_j(i) is the input feature or variable, X_j-max(i) is the maximum value of X_j(i), X_j-min(i) is the minimum value of X_j(i), μ refers to the mean, and σ indicates the standard deviation. Besides the aforementioned techniques, additional layers can be added to the model to overcome this issue, such as L2 regularization (Xing et al. 2019) or dropout layers (Chiang et al. 2022; Huang et al. 2023a). These layers are designed to reduce the size of the training weights (w) while maintaining the same model output. The regularization term can be assigned a value R greater than zero. Figure 19 depicts three cases with various R values: a) a high R value that reduces w excessively, resulting in underfitting; b) an appropriate R value that improves performance while overcoming both overfitting and underfitting; and c) a low R value that effectively eliminates the regularization term. Thus, selecting and hypertuning the appropriate value of R is critical to overcoming the problem of overfitting.
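The three normalization techniques (Equations 2-4) can be sketched as follows; the rainfall and displacement arrays are hypothetical values chosen only to show the scale mismatch the text describes:

```python
import numpy as np

def min_max_norm(x):     # Equation 2: rescale the feature to [0, 1]
    return (x - x.min()) / (x.max() - x.min())

def mean_norm(x):        # Equation 3: center on the mean, scale by the range
    return (x - x.mean()) / (x.max() - x.min())

def z_score_norm(x):     # Equation 4: zero mean, unit standard deviation
    return (x - x.mean()) / x.std()

rainfall = np.array([0.0, 20.0, 5.0, 80.0, 15.0])     # e.g., mm/day
displacement = np.array([1.2, 1.3, 1.25, 1.9, 1.4])   # e.g., mm, far smaller scale
rain_scaled = min_max_norm(rainfall)
disp_scaled = min_max_norm(displacement)
```

After scaling, both features occupy the same [0, 1] range, which is the comparable-scale scenario (Fig. 18b) in which gradient descent converges quickly.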

Activation functions
The activation function is a higher-level feature transformation that aims to convert the input features into new features that fit well with the labeled output y. It should be noted that in simple models such as linear regression, choosing the appropriate features is performed manually, whereas in neural networks this process is performed automatically through the activation functions. As a result, activation functions (refer to Fig. 20) play a pivotal role in shaping the behavior and performance of neural networks. Several activations can be used, such as linear, ReLU, Leaky ReLU, tanh, and sigmoid. Each activation function presents distinct characteristics, requiring careful consideration based on the nature of the data and the objectives of the neural network model (Dubey et al. 2022). Table 3 summarizes the activation function characteristics. Among these functions, ReLU and Leaky ReLU remain the most popular for hidden-layer activations due to their more straightforward gradient calculation (Wang et al. 2023a; Xi et al. 2023; Togneri et al. 2022).
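The common activation functions mentioned above are one-liners; the sample input is hypothetical:

```python
import numpy as np

def relu(x):
    """Rectified linear unit: passes positives, zeroes out negatives."""
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    """Like ReLU, but keeps a small slope alpha for negative inputs,
    avoiding completely dead gradients."""
    return np.where(x > 0, x, alpha * x)

def sigmoid(x):
    """Squashes any real input into (0, 1); common for binary outputs."""
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-2.0, 0.0, 3.0])   # tanh is available directly as np.tanh(x)
```

ReLU's piecewise-linear form makes its gradient trivial (0 or 1), which is the straightforward gradient calculation credited for its popularity in hidden layers.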

Model evaluations
Time series analysis in landslide applications is limited and generally concerns landslide displacement prediction. This section reviews the metrics widely adopted among these studies, which may help future studies select appropriate metrics (Scikit-Learn 2024a). Such metrics evaluate model performance on the training, validation, and testing sets. Model evaluation can be performed using two techniques: a) the unweighted method and b) the weighted method. Most current research uses the first (unweighted) method, in which all data points are assigned the same error weight. The weighted method, by contrast, is rarely used; it assigns different error weights to the creep and mutation points, with the critical points receiving a high error weight (Togneri et al. 2022). Figure 21 shows the number of manuscripts versus the metric employed among the literature retrieved for this aim. The RMSE (Scikit-Learn 2024b) is the most employed metric, followed by the MAE (Scikit-Learn 2024c) and MAPE (Scikit-Learn 2024d), at 86.5%, 37.8%, and 35.1% of studies, respectively. Some studies utilized R2 (Scikit-Learn 2024e) and absolute error as metrics, at around 32.4% and 13.5%, respectively. The remaining metrics, such as MSE (Liu et al. 2016), adjusted R2 (Togneri et al. 2022), R (Huang et al. 2022a), EF (Huang et al. 2023a), PICP (Ge et al. 2023), RI (Wei et al. 2019), MASE, and SMAPE (Filipović et al. 2022), were rarely utilized, depending on the objective of the study. Equations 7-11 present the widely used metrics: RMSE, MAE, MAPE, R2, and absolute error. For the RMSE, MAE, MAPE, and absolute error, larger values indicate worse prediction performance, whereas values closer to 0 indicate higher accuracy. For R2, larger values indicate better model behavior.
RMSE = sqrt((1/n) Σ (Y_i − X_i)²) (7)
MAE = (1/n) Σ |Y_i − X_i| (8)
MAPE = (100%/n) Σ |(Y_i − X_i)/Y_i| (9)
R² = 1 − Σ (Y_i − X_i)² / Σ (Y_i − Ȳ)² (10)
Absolute error = |Y_i − X_i| (11)

where Y_i is the specific value of the i-th real data point; Ȳ is the average value of the real data; and X_i is the specific value of the i-th predicted data point.
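Equations 7-10 translate directly into code; the observed and predicted arrays are hypothetical values:

```python
import numpy as np

def rmse(y, x):   # Equation 7: root mean square error
    return np.sqrt(np.mean((y - x) ** 2))

def mae(y, x):    # Equation 8: mean absolute error
    return np.mean(np.abs(y - x))

def mape(y, x):   # Equation 9: mean absolute percentage error, in %
    return 100.0 * np.mean(np.abs((y - x) / y))

def r2(y, x):     # Equation 10: coefficient of determination
    return 1.0 - np.sum((y - x) ** 2) / np.sum((y - y.mean()) ** 2)

y_true = np.array([10.0, 12.0, 14.0, 16.0])   # observed displacement
y_pred = np.array([10.5, 11.5, 14.5, 15.5])   # model predictions
```

On these values every residual is 0.5, so RMSE and MAE coincide; R² compares the residual sum of squares against the variance of the observations, so it rewards models that beat the trivial mean predictor.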

Predicted target
This study reviews the physically based causative-threshold time series applications of landslides. Among the reviewed studies, surface displacement is the most commonly predicted target (Huang et al. 2022a, 2023a; Lian et al. 2013; Liu et al. 2020; Nava et al. 2023; Wang et al. 2022, 2023a, b; Xi et al. 2023). The prevalence of such studies stems from the availability of GPS- or GNSS-monitored surface displacement over long intervals, up to 12 years in some studies (Wang et al. 2023b). However, such studies may not represent the physical response of the slope, as they rely only on the surface response. Recently, some studies have considered deep displacement predictions (Han et al. 2021; Xu et al. 2023; Zhang et al. 2021b), from which information about sliding-surface initiation as well as cumulative surface values can be predicted (Fig. 22a). However, predictions of the hydrological response, such as volumetric water content, matric suction, and groundwater level variation of landslides, are rarely considered.
The above two paragraphs suggest that more research should address the inner response of several types of landslides. Key differences may arise in three aspects: a) data preparation; b) feature selection; and c) physical response. As for data preparation, currently available studies mainly concern surface displacement, which exhibits trend and periodic terms. In contrast, the inner response may have no trend and may not decompose into trend or periodic terms, necessitating dedicated investigation of its prediction. Regarding feature selection, the features that control reservoir landslides are not necessarily the same as those that control other types of landslides, highlighting the need for further investigation. Concerning the physical response, the spatio-temporal responses of the surface and the interior are completely different, again highlighting the need for further investigation.

Single and interval predictions
Predictions are associated with unavoidable uncertainties that arise from model assumptions. Therefore, it is vital to quantify such uncertainties, which helps in interpreting the predicted values. According to the reviewed studies, single predictions were widely utilized, in which a single time step ahead is predicted while the corresponding uncertainties are neglected (Dai et al. 2022; Dassanayake et al. 2023; Filipović et al. 2022; Granata et al. 2022; Han et al. 2021; Huang et al. 2022a, 2023a; Li et al. 2020a; Lian et al. 2013; Liu et al. 2020; Nava et al. 2023; Togneri et al. 2022; Wang et al. 2022, 2023a, b; Xi et al. 2023; Xu et al. 2023; Zhang et al. 2021b).
Figure 23a shows the bibliometric data of the retrieved studies, where the studies that account for uncertainties amount to only around 9%. These few studies quantified the uncertainties through interval predictions (Ge et al. 2023; Xing et al. 2019). To illustrate, Figure 23b shows a schematic view of single and interval predictions, where X represents the single predictions and the dashed lines show the upper and lower boundaries of the interval predictions.
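The general idea of widening a single prediction into an interval can be shown with a generic sketch assuming zero-mean Gaussian residuals estimated on held-out data. This is only an illustrative construction, not the Laplace-based derivation of Xing et al. (2019); the predictions and residuals are hypothetical:

```python
import numpy as np

def gaussian_interval(y_pred, residuals, confidence=0.95):
    """Widen each single prediction by z * sigma_hat, where sigma_hat is the
    residual standard deviation estimated from validation data. Assumes
    zero-mean Gaussian residuals (an illustrative assumption)."""
    sigma_hat = np.std(residuals)
    z = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}[confidence]
    return y_pred - z * sigma_hat, y_pred + z * sigma_hat

preds = np.array([12.0, 13.5])                    # single-point predictions
resid = np.array([-0.2, 0.1, 0.3, -0.1, -0.1])    # validation residuals
lo, hi = gaussian_interval(preds, resid)
```

The interval width grows with the residual spread and the chosen confidence level, which is precisely the uncertainty information that single-point predictions discard.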
Xing et al. (2019) provided a detailed derivation for calculating the interval predictions: based on the assumption that the random variable ζ has a zero-mean Gaussian distribution and is independent of the input variable x, and for a given confidence level po, the output interval of the model is presented in Equation 12, where y is the output variable, σ is a parameter of the density function of the Laplace distribution, and σ̂ is the estimated value of σ.

Discussions
Time series analysis of physically-based causative thresholds is a key aspect of landslide prediction and early warning systems.These models are simpler than deterministic models since they do not require extensive geotechnical datasets.Furthermore, these models surpass empirical thresholds because they account for the physical subsurface response.
Table 1 lists the available reviews in the literature, which cover the following topics: management of landslide risks, utilization of advanced monitoring technologies, assessment of susceptibility factors, and development of predictive models. However, to the best of the authors' knowledge, no accessible literature addresses the time series application for rainfall-induced landslides, highlighting the novelty of this study.
The main modeling processes were discussed: a) data preparation; b) model selection; c) optimizations; d) model evaluation; and e) predictions. This process is completely empirical, and the AI model is affected by numerous factors. As a result, for each modeling step, several modeling concepts were discussed to highlight the physical meaning, illuminating how to select the appropriate parameter dimensions. For instance, the data preparation process is controlled by selecting the appropriate data frequency, splitting ratio, decomposition technique, window size, and best-related features. Table 4 highlights the main modeling processes and summarizes the findings of the retrieved studies.
As discussed in the preceding sections, model performance is determined by a variety of modeling factors, including data preparation, model selection, optimizations, evaluations, and predictions, which are also affected by the accuracy of the inventory data. This process is completely empirical and necessitates the use of an appropriate optimization technique. Moreover, it also necessitates integrating knowledge-based techniques for selecting the best-related features. To illustrate, the theoretical relationship between input and output parameters is already known; thus, theoretically, if the model's input parameters represent physical features, the output will match the physical (i.e., analytical) model, emphasizing the importance of knowledge-based techniques. Table 5 compares some of the related literature, considering surface displacement as a case study. This table highlights some of the aforementioned factors, as well as a comparison of all models based on the final accuracy and performance of the provided models.
The arrangement of the discussion is intended to provide knowledge of controlling features and initial conditions. However, the theoretical background of these AI models is discussed by Merghadi et al. (2020), who thoroughly examined the theoretical background of machine learning algorithms.
In general, there is no universally superior model; rather, superior predictions come from careful consideration of the aforementioned modeling process. Regarding data preparation and model selection, investigating the affecting features while taking the actual initial condition into account greatly improves model accuracy (Han et al. 2021). Consequently, outdated or unrelated features can negatively impact the model, so it is best to remove them (Cao et al. 2016). For example, BPNN (Liu et al. 2016) and MLR (Krkač 2020) offer reasonable accuracy when the well-established dataset accurately depicts the physical mechanism, although they are limited in their ability to account for dynamic and non-linear relationships. Marrapu et al. (2021) concluded that ANNs with a large dataset are more accurate than ANNs with a small dataset. However, if the dataset lacks useful information, the model must be improved to better fit these complex relationships (Zhang et al. 2021c). Such modeling improvements can be achieved by the approaches summarized in Tables 4 and 5. These models are built on the following assumptions: a) the relationship between the landslide and the feature factors that control it will not change significantly in the future; and b) these models cannot predict sudden failure because such events are rarely present in the training set. Furthermore, these models have a few limitations: a) regression models, unlike landslide susceptibility maps, are only appropriate for small areas due to their reliance on field monitoring data; and b) causative thresholds take into account only one limited feature, such as displacement or groundwater level, ignoring other characteristics such as spatial variation in land cover, soil type, topography, and geotechnical and hydrological parameters. They also have the following advantages: a) artificial intelligence models can predict any causative feature using available monitoring data and can thus provide more accurate warnings than empirical-statistical thresholds; and b) these models are a cost-effective method for landslide prediction because they do not necessitate extensive geotechnical investigation.

Gaps and future directions
The literature presents valuable insights into landslide prediction by utilizing deep learning models, which have demonstrated notable accuracy. However, there are notable gaps, particularly in the realm of time series physically based causative-threshold models, which integrate the mechanical and hydrological characteristics of landslides. These gaps primarily pertain to the accessibility of monitoring data and the prediction methodology. The current literature is constrained by the availability of monitoring data, emphasizing the necessity of expanding subsurface monitoring systems. Additionally, some literature lacks a comprehensive understanding of landslide mechanisms, highlighting the need for a more knowledge-based approach. Refer to Table 6 for a summary of these gaps and recommendations for future directions, as follows:
1. Assessment of subsurface characteristics: Prediction models predominantly rely on surface measurements (Barra et al. 2017; Tofani et al. 2013; Zhu et al. 2011), neglecting subsurface mechanical and hydrological characteristics. Prioritizing the assessment of these characteristics offers a more accurate depiction of landslide mechanisms, as it directly considers the underlying factors driving landslides.
2. Spatiotemporal dynamics in prediction methods: Single-point prediction methods are frequently criticized for failing to account for spatiotemporal dynamics, limiting their effectiveness. To address this, integrating shared information from diverse monitored data sources is recommended, emphasizing the need for comprehensive monitoring to enhance predictive accuracy.
3. Impact of data frequency: The literature often overlooks the impact of data frequency on prediction accuracy. Larger intervals tend to result in smoothed data, while smaller intervals can introduce high bias. Integrating field-monitored data into prediction techniques is essential for exploring the impact of data frequency on both accuracy and the operational costs of the monitored system; higher frequencies demand increased power and maintenance, affecting the system's overall cost-effectiveness. A comprehensive analysis of data frequency is therefore crucial not only for improving prediction accuracy but also for optimizing the operational efficiency of the monitoring system.

Conclusions
The systematic review of time series models for landslide prediction yields valuable insights into the current research landscape in this domain. Analysis of diverse studies reveals several key findings, implications, and avenues for future exploration, as follows. Firstly, the review underscores the significance of data frequency in landslide prediction models. It highlights substantial pattern disparities between monthly and daily time steps, underscoring the need to further explore how data frequency influences model efficacy. Temporal ordering considerations in splitting the training, validation, and testing sets are emphasized; notably, an 80% training ratio is widely adopted. Furthermore, challenges associated with time series decomposition are discussed, particularly in discerning trend and seasonal components. Monthly time series, for instance, often exhibit seasonal, trend, and residual terms, necessitating meticulous decomposition into trend and periodic components. Data frequency variations, such as the absence of seasonality in daily time steps, require tailored methodologies. The review also addresses the variability in lagged periods across the literature, influenced by factors like slope hydraulic conductivity. Feature selection methods such as statistical models cannot extract temporal correlations, necessitating the integration of knowledge-based considerations.
Regarding modeling approaches, dynamic and deep learning models like LSTM are found to outperform static models such as artificial neural networks (ANN), support vector machines (SVM), and random forests (RF), as well as statistical models such as the autoregressive integrated moving average (ARIMA). Deep learning models extract the temporal and non-linear correlation between the triggering and the landslide response. However, meticulous data preparation, including data frequency, sampling ratio, decomposition, temporal correlations, and feature selection, is emphasized to ensure model effectiveness.
In machine learning, hyperparameter optimization strategies are crucial for model performance. Notably, the random search method, the Adam optimizer, and loss functions like MSE and Huber provide better performance, save time, and converge faster. Overfitting mitigation strategies, such as acquiring more data, removing unrelated features, monitoring the training process, reducing the size of the training weights, and carefully selecting activation functions, are underscored. Additionally, the review notes the prevalent use of unweighted RMSE, MAE, and MAPE metrics for evaluation. According to the reviewed studies, single predictions of the surface displacement of reservoir landslides, in which a single time step ahead is predicted, were widely utilized, urging exploration of interval predictions and a broader scope encompassing diverse landslide typologies.
Addressing gaps and future recommendations, the review advocates: integrating diverse data sources and exploring the impact of diverse monitored data for spatiotemporal predictions; examining the effect of data frequency on prediction accuracy to balance monitoring-system cost against prediction accuracy; adopting weighted evaluation methodologies that emphasize critical points, which is especially important for catastrophic events such as landslides; investigating time series decomposition effects; integrating knowledge-based models with statistical models to account for temporal correlations; and developing further research on the subsurface response of landslides instead of relying on shallow responses.
In conclusion, the review is a comprehensive guide for scholars and practitioners in advancing landslide prediction techniques.By addressing identified gaps and implementing recommended future directions, researchers can enhance the accuracy and efficacy of time series models, contributing to improved disaster mitigation efforts.

Fig. 1 Time series analysis visualization

Fig. 3 Time series modeling process

Fig. 6 a data frequency employed in literature as a percentage of retrieved studies; b illustrative example of different rainfall patterns (i.e., frequencies) as an example of time series

Fig. 7 a Illustrative view of the testing and training sets; b Training ratio utilized in the literature

Fig. 9 a Bibliometric data for decomposition and non-decomposition studies; b decomposition techniques

Fig. 10 Effective and non-effective rainfall patterns

Fig. 12 Statistical models employed for feature selection

Varangaonkar and Rode (2023) proved that the long short-term memory (LSTM) model outperforms support vector machine (SVM) and artificial neural network (ANN) models. Ge et al. (2023) showed that the gated recurrent unit (GRU) surpasses particle swarm optimization-support vector regression (PSO-SVR) and the bidirectional recurrent neural network (BRNN). Wang et al. (2023b) indicated that the LSTM model can surpass the primary recurrent neural network (RNN) model. Dai et al. (2022) revealed that the LSTM has higher accuracy than the classical back-propagation neural network (BPNN). Liu et al. (2020) showed that LSTM is more accurate than GRU and random forest (RF) models. Filipović et al. (2022) found that LSTM offered better prediction results than RF and the autoregressive integrated moving average (ARIMA). Convolutional neural networks (CNN) (Wang et al. 2023a) and CNN-BiGRU-Attention (Meng et al. 2024) can offer higher prediction results. Huang et al. (2022a) showed that the LSTM and the salp-swarm-algorithm-optimized temporal convolutional network (SSA-TCN) have almost similar performance. Based on the reviewed literature, the LSTM model generally achieves better prediction results than other models (Huang et al. 2022a; Xi et al. 2023; Xing et al. 2019).

Fig. 15 Utilized optimization techniques in the retrieved studies

Fig. 17 a Loss versus iterations for cross-validation (CV) and training sets; b Loss versus iterations for different learning rates (modified from (Khang Pham, 2023))

Fig. 23 a prediction status versus the number of studies; b Single and interval prediction schematic view two frequencies: monthly time steps • Frequency selection is based on monitoring data availability, disregarding physical and computational backgrounds Data splitting • Two factors should be considered: the temporal sequence of splitting the dataset and the ratio of the training to validation sets: • The temporal ordering should be maintained through a standard holdout strategy • The testing set should represent at least the minimum expected temporal response Decompositions • Landslide response to landslides is complex, requiring various attempts to simplify the process • Decomposition can simplify the complex response into residual, trend, and seasonal behaviors • Differentiating methods outperform other decompositions minimizing uncertainties and offering high prediction accuracy Window size • Considering an antecedent period (i.e., lagged sequence) improves model performance • The lagged period varies based on the slope's mechanical and hydrological characteristics • Research has not given enough consideration to the window size (sequence length) and how the antecedent value influences the prediction model's performance Feature selection • Outdated or non-related features can negatively affect the model • Factors affecting landslides include creep and triggering features • Statistical methods for feature selection include gray relation analysis (GRA), partial autocorrelation function (PACF) algorithms, maximal information coefficient (MIC), kernel sHAP, Pearson correlation, R2-adj, Akaike information criterion (AIC), and least absolute shrinkage and selection operator (LASSO) • Models often neglect temporal dependencies in landslide responses, leading to unrelated features • Knowledge-based methods and sensitivity analysis are recommended to consider temporal dependencies and select best-related features Statistical correlations • Pearson's correlation 
coefficient is a common model used in feature selection algorithms due to its practical nature • The maximal information coefficient (MIC) outperforms Pearson's correlation coefficient as it extracts both linear and non-linear correlations, as well as complex correlations • The effective antecedent period can be sensitively investigated using statistical correlations Model Selection Model selection • Static models struggle to account for the temporal correlation between input features and external triggering • Dynamic and deep learning models can extract non-linear correlations between triggering and landslide response where LSTM offers outstanding performance
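To make the data-preparation findings above concrete, the chronological holdout split and the sliding-window (antecedent-period) framing can be sketched as follows. This is an illustrative example with hypothetical displacement values and function names, not code from any reviewed study:

```python
import numpy as np

def make_windows(series, window):
    """Build lagged input windows X and next-step targets y
    from a 1-D displacement series (sliding-window framing)."""
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X, y

def temporal_holdout(X, y, train_ratio=0.8):
    """Chronological split: no shuffling, so the test set
    always follows the training set in time."""
    cut = int(len(X) * train_ratio)
    return X[:cut], y[:cut], X[cut:], y[cut:]

# Toy monthly displacement record (mm), hypothetical values
disp = np.linspace(0.0, 30.0, 24) + np.sin(np.arange(24))
X, y = make_windows(disp, window=3)          # window = antecedent period
X_tr, y_tr, X_te, y_te = temporal_holdout(X, y, train_ratio=0.8)
```

Because the split preserves temporal order, the model is always evaluated on data that lies strictly after its training period, which is the holdout behavior the review recommends.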

Table 3
Activation function characteristics

Linear
• Straightforward and computationally efficient
• Cannot capture complex nonlinear relationships
• Preferred where the inputs and outputs can be either negative or positive numbers
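The table's point that a linear activation cannot capture nonlinear relationships can be verified directly: stacking linear layers collapses into a single linear map, whereas functions such as ReLU, tanh, and sigmoid bend the input space. A minimal numpy sketch (the function names here are generic, not from the reviewed studies):

```python
import numpy as np

# Common activation functions
linear  = lambda z: z                      # identity: efficient, purely linear
relu    = lambda z: np.maximum(0.0, z)     # piecewise linear, introduces nonlinearity
tanh    = lambda z: np.tanh(z)             # bounded in (-1, 1), preserves sign
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))  # bounded in (0, 1)

z = np.array([-2.0, 0.0, 2.0])

# Two stacked linear "layers" (weights W1, W2) reduce to one linear map,
# which is why linear activations cannot model nonlinear slope response:
W1, W2 = 2.0, 0.5
assert np.allclose(W2 * (W1 * z), (W2 * W1) * z)
```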

Table 4
Summary of the review's main findings

Table 4
(continued)

Sparrow search algorithm (SSA)
• An innovative method inspired by the foraging and anti-predatory behaviors observed in sparrows
• In essence, the model searches from coarse to fine scales to accurately capture the best model structure while also converging faster

Training optimization
• Selecting the right number of iterations, learning rate, and monitoring metrics is crucial
• Too few iterations can cause bias issues, while too many can cause variance issues
• A small learning rate requires long computational times, while a large learning rate may be misleading
• Adam optimizers offer the best prediction accuracy, outperforming traditional gradient descent methods
• The Adam algorithm dynamically adapts the learning rate of each parameter, improving convergence speed and enabling efficient exploration of the solution space
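The per-parameter adaptation that distinguishes Adam from plain gradient descent can be sketched with the standard update rule (bias-corrected first and second moment estimates). This is a generic textbook sketch on a toy quadratic, not the training setup of any reviewed model:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: the step size is scaled per parameter by
    bias-corrected first (m) and second (v) moment estimates."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2
    m_hat = m / (1 - beta1**t)          # bias correction, t starts at 1
    v_hat = v / (1 - beta2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(x) = x^2 (gradient 2x) starting from x = 5
x, m, v = 5.0, 0.0, 0.0
for t in range(1, 5001):
    x, m, v = adam_step(x, 2.0 * x, m, v, t, lr=0.01)
```

Because the effective step is roughly `lr` regardless of gradient magnitude, convergence speed is far less sensitive to the learning-rate choice than in plain gradient descent, which is the practical advantage the table highlights.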

Table 5
Comparison among several models considering surface displacement as a case study

Table 5
Liu et al. (2016)

RM stands for recurrent cell, RF for random forest, GM for grey model, MLR for multiple linear regression, WMKGM for weighted multi-kernel grey model, MKGM for multi-kernel grey model, KGM for kernel grey model, ENN for ensemble neural networks, RS for random search, WCA for water cycle algorithm, BA for bat algorithm, GWO for grey wolf optimizer, DA for dragonfly algorithm, WOA for whale optimization algorithm, GOA for grasshopper optimization algorithm, and BLSTM and BGRU for bidirectional LSTM and GRU models

* Best accuracy
** Liu et al. (2016) investigated feature extraction, a method that accounts for evaporation, infiltration, and runoff, resulting in better prediction accuracy
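Several of the compared baselines (GM, KGM, WMKGM) build on the classical GM(1,1) grey model. A minimal textbook GM(1,1) sketch clarifies what these baselines do: accumulate the series, fit the grey coefficients by least squares, and forecast by inverse accumulation. This is an illustrative implementation, not the exact formulation used in the reviewed papers:

```python
import numpy as np

def gm11_fit_predict(x0, n_ahead=1):
    """Minimal GM(1,1) sketch: fit on series x0, forecast n_ahead values."""
    n = len(x0)
    x1 = np.cumsum(x0)                        # accumulated generating operation
    z = 0.5 * (x1[1:] + x1[:-1])              # background (mean) sequence
    B = np.column_stack([-z, np.ones(n - 1)])
    Y = x0[1:]
    a, b = np.linalg.lstsq(B, Y, rcond=None)[0]   # development / grey coefficients
    k = np.arange(n + n_ahead)
    x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a  # whitening-equation solution
    x0_hat = np.diff(x1_hat, prepend=0.0)     # inverse accumulation
    return x0_hat[n:]                         # future points only
```

GM(1,1) assumes an approximately exponential trend, which is why kernel and multi-kernel extensions (KGM, MKGM, WMKGM) were introduced for more complex displacement behavior.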

Table 6
The current gaps and future recommendations

4. Gap: The impact of time series decomposition on prediction accuracy remains unexplored in the existing literature.
Recommendation: A comparative analysis between decomposed and non-decomposed methodologies is warranted; such a study would offer deeper insights into their underlying mechanisms.

5. Gap: Feature selection methods commonly rely on statistical techniques, which often overlook temporal correlations present in the data.
Recommendation: Leveraging deep learning and knowledge-based approaches proves advantageous; these methods can effectively capture and incorporate temporal dependencies, enhancing the robustness and accuracy of feature selection.

6. Gap: Weighted evaluation is rarely used; most studies assign equal weight to all data points. This can be inadequate, particularly for lengthy datasets in which non-critical points outnumber critical ones, so treating all points equally can lead to misleading conclusions.
Recommendation: Accurately capturing critical points in landslide applications is paramount for protecting economic interests and human lives. Weighted evaluation methodologies should therefore be adopted: assigning appropriate weights to critical points prioritizes their accurate detection and mitigates potential risks.
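The weighted-evaluation recommendation can be sketched as a weighted RMSE, in which points belonging to a critical phase (e.g., a rapid-displacement episode) receive larger weights. The data and weights below are hypothetical, chosen only to illustrate the effect:

```python
import numpy as np

def weighted_rmse(y_true, y_pred, weights):
    """RMSE with per-point weights so critical points dominate the score."""
    w = np.asarray(weights, dtype=float)
    err = (np.asarray(y_true) - np.asarray(y_pred)) ** 2
    return float(np.sqrt(np.sum(w * err) / np.sum(w)))

# Hypothetical record: the last two points form a critical acceleration phase
y_true = np.array([1.0, 1.1, 1.2, 5.0, 9.0])
y_pred = np.array([1.0, 1.1, 1.2, 3.0, 6.0])   # the model misses the surge
uniform  = weighted_rmse(y_true, y_pred, np.ones(5))   # standard RMSE
critical = weighted_rmse(y_true, y_pred, [1, 1, 1, 5, 5])
```

Under uniform weights the many well-predicted quiet points dilute the error at the surge, whereas the weighted score penalizes the missed critical points more heavily, which is exactly the behavior the recommendation calls for.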