Technology investigation on time series classification and prediction

Time series appear in many scientific fields and are an important type of data. Time series analysis techniques are an essential means of discovering the knowledge hidden in this type of data. In recent years, many scholars have achieved fruitful results in the study of time series. A statistical analysis of 120,000 publications from 2017 to 2021 reveals that research on time series mostly focuses on classification and prediction. Therefore, in this study, we analyze the technical development routes of time series classification and prediction algorithms. We select 87 highly relevant and highly cited publications for analysis, aiming to provide a more comprehensive reference base for interested researchers. Time series classification is divided into supervised methods, semi-supervised methods, and early classification of time series, a key extension of the classification task. For time series prediction, we discuss the performance and applications of methods ranging from classical statistical methods to neural network methods, and then to fuzzy modeling and transfer learning methods. We hope this article can aid the understanding of the current development status and help identify possible future research directions, such as exploring the interpretability of time series analysis and online learning modeling.


INTRODUCTION
A time series is a set of observations made and recorded at different points in time (Misra & Siddharth, 2017). Time series are ubiquitous in real life. Whether measured during natural processes (weather, sound waves) or artificially generated processes (stock, robots), most real-world data contain time elements (Langkvist, Karlsson & Loutfi, 2014). Moreover, time series data are being produced in different fields at an unprecedented scale and speed. Therefore, knowledge discovery from time series has considerable potential. Because of its unique sequence characteristics, time series analysis is considered one of the ten most challenging problems in the field of data mining (Yang & Wu, 2006), and it has become a prevalent research topic that has attracted the attention of many researchers over the years (Schreiber, 1999; Osmanoglu et al., 2016). In time series analysis, common datasets are often used, such as the UCR time series classification archive (https://www.cs.ucr.edu/~eamonn/time_series_data_2018/), Awesome Public Dataset (https://github.com/awesomedata/awesome-public-datasets) and CEIC (https://www.ceicdata.com/zh-hans).
To gain a comprehensive understanding of the current status of time series applications, we use "time series" as a keyword to search the Web of Science Core Collection and collect 120,000 references published between 2017 and 2021. Then, we use VOSViewer (Leiden University, The Netherlands) to visualize the analysis result: the subject category co-occurrence map of first-level disciplines, shown in Fig. 1.
To gain a clearer understanding of the application fields of time series, we remove the two subjects with the highest number of matches, i.e., Engineering and Computer Science, both of which have a high total link strength; this can be attributed to the fact that these two subjects are often used in research as analysis tools for other domains. The 120,000 publications contain 161 unique level-1 subjects in total. From Fig. 1, we can see that time series has an extensive range of applications.
Time series applications are present in every aspect of our lives; computational statistics and data analysis give us a new perspective and help us gain a deeper understanding of the world.

Motivation
Time series is an important data object, used in an extensive range of research, including classification, prediction, clustering, similarity retrieval, anomaly detection, and noise elimination (Kalpakis, Gada & Puttagunta, 2001). The analysis and investigation of its current research applications can provide a comprehensive research review to aid future researchers in understanding the current development state of time series-related algorithms.
To identify the trending topics in current time series research, we further analyze the chosen studies. After removing generic terms such as time series, time, and analysis of time series, we obtain a co-occurrence map of the literature keywords, shown in Fig. 2.
The font size in the figure is related to the frequency of occurrence of keywords. The larger the font, the higher the frequency of occurrence. There are approximately seven clusters in the figure, representing algorithms and different application domains. Two main research topics are identified, namely, classification and prediction. Because this article focuses on the analysis of time series algorithms, we will present our analysis and conclusions based on the technical development route of classification and prediction algorithms and discuss relevant areas for subsequent research.

Main contribution
The main contributions of this article can be summarized as follows:
• a comprehensive analysis of prevalent topics in the field of time series;
• an investigation into the progress of time series classification and prediction problems in recent years, highlighting several technical development routes that are widely studied in the field, and discussing the improvement and optimization of these algorithms;
• a comparison of the performance of the different algorithms on multiple datasets, concluding with their advantages and disadvantages;
• and finally, an analysis of the challenges and future development tendencies of time series classification and prediction problems.

Methods and materials
The process of our study is as follows. First, a literature analysis tool is used to identify current popular research topics. We use VOSViewer to analyze the time series literature through keywords to examine the areas of greatest interest. These topics are classified into 478 categories, and the two research directions with the highest frequency are ''classification'' and ''prediction''. Then, the relevant scientific literature for the identified categories is located. We review related papers on time series classification and prediction and select 87 highly relevant and highly cited publications for analysis. The scientific databases used in the search include Web of Science Core Collection, IEEE Xplore, ACM Digital Library, Springer Link, and ScienceDirect. Finally, based on the selected publications, important technical development routes are extracted, and a detailed analysis and summary are carried out.

Structure of this survey
The remainder of this article is organized as follows. 'Related Work' introduces related work on time series investigation and compares our survey with other traditional surveys and review articles. 'Preliminaries' describes the fundamentals of time series classification and prediction tasks. 'Time Series Classification' and 'Time Series Prediction' elaborate on the development routes of time series classification and prediction algorithms, respectively, by comparing their performances, analyzing the challenges being faced, and discussing future development trends. Finally, the 'Conclusion' concludes the article.

RELATED WORK
Knowledge discovery in time series is an important direction for dynamic data analysis and processing. The urgent need to predict future data trends based on historical information has attracted widespread attention in many research fields. In the past few decades, many studies have summarized time series research methods from different perspectives. Table 1 summarizes the existing time series surveys and their contributions. In contrast to these works, we focus on the development direction of time series technical routes: we trace the earliest methods of each technical route, study the improvement ideas and strategies of subsequent methods, and compare the advantages and disadvantages of the various technical routes and methods. Finally, we provide new ideas for future work.

PRELIMINARIES

Categories of time series
Using data characteristics, time series can be classified into five categories: [...] Gaussian and non-Gaussian time series. 5. Chaos: The generation of a chaotic time series is related to its initial conditions, where a change in the initial state of the system may lead to a critical state or inflection point of the interconnected system, significantly affecting the performance of the interconnected system. For example, the action of opening a window or door will affect the power consumption of an air conditioning system (Kim, 2017).

Table 1 Existing time series surveys and their contributions.
Reference | Keywords | Contribution
(Sang, 2013) | hydrological time series analysis; wavelet transform | Summarizes and reviews the research and application of the wavelet transform method in hydrological time series from six aspects.
(Nordman & Lahiri, 2014) | empirical likelihood | Summarizes the progress of empirical likelihood for time series data.
(Tang et al., 2015) | complexity test | Discusses the complexity testing technology of time series data.
(Scotto, Wei & Gouveia, 2015) | autocorrelation function (... | ...

Related definitions
To explain time series and the related methods more clearly, some definitions involving time series are introduced below.
Definition 1. Univariate time series: A univariate time series, $s = (t_1, t_2, \ldots, t_L)$, is an ordered set of length $L$.
Definition 2. Multivariate time series: A multivariate time series, $X = (x_1, x_2, \ldots, x_T)$, is a vector of sequences in which each element $x_i$ is a univariate time series; the elements may have differing lengths. $X$ has $T$ variables, with the $i$th variable being $x_i$.
Definition 3. Subsequence: Given a time series $s$ with length $L$, $s_{sub} = s[m, m+n-1]$ is a subsequence with length $n < L$. The starting point of the subsequence is position $m$ in $s$, and position $m+n-1$ is the end point, so that $s_{sub} = (t_m, \ldots, t_{m+n-1})$, where $1 \le m \le L-n+1$.
Definition 4. Similarity degree: For two time series $b$ and $s$ (assuming $|b| \le |s|$), the similarity degree can be computed by $Sim(b, s) = \min_i \{ dist(b, s_i) \}$, where $s_i$ is an arbitrary subsequence of $s$ that satisfies the condition $|b| = |s_i|$.
Definition 5. Shapelet: A shapelet is a subsequence of a time series $s$ with the strongest discriminative ability. Specifically, a shapelet can be represented by $p = (b, \delta, c)$, where $b$, $\delta$, and $c$ are the subsequence, threshold, and class label, respectively. If an unknown time series $s$ satisfies the condition $Sim(b, s) \le \delta$, then it can be categorized into class $c$.
Definition 6. Euclidean distance: Euclidean distance is a frequently used distance measurement to determine the degree of similarity of two different time series. For sequences $b$ and $c$, both with length $L$, the Euclidean distance can be calculated as $dist(b, c) = \sqrt{\sum_{i=1}^{L} (b_i - c_i)^2}$, where $b_i$ and $c_i$ are the $i$th elements of $b$ and $c$.
Definition 7. Dynamic time warping (DTW): DTW is another widely used distance measurement method. Unlike Euclidean distance, it can compute the minimum distance between two sequences with different lengths. Because DTW is so widely applied, its principle is not explained here; the calculation is written as $dist_{DTW} = DTW(s, b)$.
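As a minimal illustrative sketch of Definitions 4 to 6 (not code from any of the surveyed works), the similarity degree and the shapelet decision rule can be written in plain Python. The function names `euclidean`, `similarity`, and `matches_shapelet`, as well as the toy series and threshold, are assumptions made for this example.

```python
import math


def euclidean(b, c):
    """Euclidean distance between two equal-length sequences (Definition 6)."""
    if len(b) != len(c):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(b, c)))


def similarity(b, s):
    """Sim(b, s): smallest distance between b and any subsequence of s
    that has the same length as b (Definition 4)."""
    n = len(b)
    if n > len(s):
        raise ValueError("b must not be longer than s")
    return min(euclidean(b, s[m:m + n]) for m in range(len(s) - n + 1))


def matches_shapelet(shapelet, s):
    """Apply a shapelet p = (b, delta, c) to a series s (Definition 5).

    Returns the class label c if Sim(b, s) <= delta, otherwise None.
    """
    b, delta, c = shapelet
    return c if similarity(b, s) <= delta else None


# Toy example: a hypothetical "peak" shapelet with threshold 0.5.
series = [0.0, 0.1, 1.0, 2.0, 1.0, 0.0]
peak = ([1.0, 2.0, 1.0], 0.5, "peak")
print(matches_shapelet(peak, series))
```

Here the distance function is fixed to the Euclidean distance for simplicity; any other distance could be substituted for `dist` in Definition 4.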

Basic algorithms
In time series classification and prediction tasks, the most basic and widely used algorithms are 1NN-DTW (1 nearest neighbor dynamic time warping) and autoregressive (AR) and moving average (MA) models.

1NN-DTW
The 1NN-DTW model uses DTW as the distance measurement and the simple but effective 1NN algorithm to find the training sample nearest to the current instance; it then assigns the class label of that nearest training sample to the instance. This model does not require training of parameters and has high accuracy. The following pseudocode describes the procedure of 1NN-DTW.
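A minimal Python sketch of this procedure is given below as an illustration. It assumes the textbook dynamic-programming formulation of DTW with the absolute difference as the local cost and no warping window; the names `dtw_distance` and `classify_1nn_dtw` and the toy data are chosen for this example only.

```python
import math


def dtw_distance(s, b):
    """Dynamic time warping distance between sequences s and b.

    Textbook O(len(s) * len(b)) dynamic program; the local cost is the
    absolute difference, which is an assumption of this sketch.
    """
    n, m = len(s), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(s[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def classify_1nn_dtw(train, query):
    """Return the label of the training series nearest to query under DTW.

    train is a list of (series, label) pairs; no parameters are trained.
    """
    best_label, best_dist = None, math.inf
    for series, label in train:
        d = dtw_distance(series, query)
        if d < best_dist:
            best_dist, best_label = d, label
    return best_label


train = [([0, 0, 1, 2, 1, 0], "bump"), ([2, 1, 0, 0, 1, 2], "dip")]
print(classify_1nn_dtw(train, [0, 1, 2, 1, 0, 0]))
```

Every query is compared with every training series, which makes the classification cost grow with the size of the training set.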

AR and MA
• AR model
The model is represented as $X_t = \sum_{j=1}^{p} a_j X_{t-j} + \varepsilon_t$ and is called the p-order AR model, denoted as AR(p). The time series value $X_t$ is expressed as a linear function of its previous values $X_{t-j}$ and a shock value $\varepsilon_t$. This model is a dynamic model, which differs from the static multiple regression model.
• MA model
The model is represented as $X_t = \varepsilon_t + \sum_{j=1}^{q} b_j \varepsilon_{t-j}$ and is called the q-order MA model, denoted as MA(q). The time series value $X_t$ is a linear combination of the present and past error or shock values $\varepsilon_t$.
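The two formulas can be evaluated directly, as the following small sketch shows. It assumes that the coefficients $a_j$ and $b_j$ are already known (parameter estimation is not shown), and the function names `ar_forecast` and `ma_value` are hypothetical.

```python
def ar_forecast(history, a):
    """One-step AR(p) forecast: sum over j = 1..p of a_j * X_{t-j}.

    history holds past values with the most recent last; a = [a_1, ..., a_p].
    The shock term is omitted because its expected value is assumed to be zero.
    """
    p = len(a)
    if len(history) < p:
        raise ValueError("need at least p past values")
    return sum(a[j] * history[-1 - j] for j in range(p))


def ma_value(shocks, b):
    """MA(q) value: X_t = eps_t + sum over j = 1..q of b_j * eps_{t-j}.

    shocks holds the shock sequence with eps_t last; b = [b_1, ..., b_q].
    """
    q = len(b)
    if len(shocks) < q + 1:
        raise ValueError("need the current shock and q past shocks")
    return shocks[-1] + sum(b[j] * shocks[-2 - j] for j in range(q))


# AR(2) with a_1 = 0.5, a_2 = 0.25 applied to the history (1, 2, 3).
print(ar_forecast([1.0, 2.0, 3.0], [0.5, 0.25]))  # 0.5 * 3 + 0.25 * 2 = 2.0
# MA(2) with b_1 = 0.5, b_2 = 0.25 applied to the shocks (1, -1, 2).
print(ma_value([1.0, -1.0, 2.0], [0.5, 0.25]))    # 2 - 0.5 + 0.25 = 1.75
```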

TIME SERIES CLASSIFICATION
Unlike traditional classification tasks, time series classification must take the order of the observations in the input into account, which makes it a more challenging problem. Based on data label availability, current time series classification research mainly focuses on supervised and semi-supervised learning. Usually, supervised learning methods with labeled information show better performance. However, in real life there is a tremendous amount of unlabeled data. Therefore, some semi-supervised methods have been proposed to address this situation by constructing models using limited labeled data and a large amount of unlabeled data. In addition, some specific application scenarios impose new requirements on time series classification tasks, for example, the early diagnosis of a disease, which results in a better prognosis. Early classification is used in these situations, and its goal is to classify data as soon as possible with a certain accuracy rate. This is an important extension of time series classification. This section introduces the development route of time series classification technology, analyzes the current difficulties and challenges, and mentions some expected future trends.

Technology developments
Based on the literature reviewed, we discover three development routes: supervised time series classification, semi-supervised time series classification, and early classification, which is a critical extension of the time series classification task. Fig. 3 lists the algorithms of different technology development routes.

Supervised learning
In early time series classification research, the work mainly focused on distance-based algorithms (Ding et al., 2008). The most prominent one is 1NN-DTW, which has demonstrated excellent performance on multiple tasks and datasets (Ding et al., 2008) and was once considered nearly unbeatable in time series classification (Xi et al., 2006; Ye & Keogh, 2009; Rakthanmanon & Keogh, 2013). As related research has deepened, some newer algorithms, such as Rocket (Dempster, Petitjean & Webb, 2019), have achieved better results than 1NN-DTW on multiple datasets. Even so, 1NN-DTW remains worthy of analysis and academic attention. 1NN-DTW uses 1NN as the classifier and DTW as the distance measurement, and assigns the class label of the nearest training instance to a testing instance. The algorithm is simple and has high accuracy.
In practice, hyperparameters such as the warping window must be tuned on the training data to obtain better performance (Dau et al., 2018). Moreover, during the classification stage, the class label of every testing instance is computed by working through the entire training dataset, which gives the method a high time complexity. The optimization of 1NN-DTW has therefore mainly concentrated on reducing classification time, using one of three methods.

• Speed up
The idea of this type of algorithm is that efficiency can be improved by reducing the dataset size and accelerating the computation of DTW. Through numerosity reduction and dynamic adjustment of the DTW warping window size (Xi et al., 2006), 1NN-DTW can be sped up while guaranteeing accuracy.
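To illustrate how a warping window reduces the DTW computation, the sketch below restricts the dynamic program to a band around the diagonal. It is a generic band-constrained DTW written for this example, not the method of Xi et al. (2006); numerosity reduction and the dynamic adjustment of the window are not shown, and the name `dtw_window` is an assumption.

```python
import math


def dtw_window(s, b, window):
    """DTW distance restricted to cells with |i - j| <= window.

    A smaller window means fewer cells are filled, so the computation is
    cheaper; window = 0 on equal-length series compares point by point.
    """
    n, m = len(s), len(b)
    # The band must be at least |n - m| wide to reach the final cell.
    w = max(window, abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = abs(s[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


s = [0, 0, 1, 0]
b = [0, 1, 0, 0]
print(dtw_window(s, b, 0))  # no warping allowed
print(dtw_window(s, b, 1))  # a one-step shift is allowed
```

With a window of 0 the two toy series are compared point by point, while a window of 1 is already enough to align the shifted peak.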
• Shapelets
Geurts (2001) proposes that a time series can be represented by its local patterns. Based on this idea, Ye & Keogh (2009) formally propose the concept of shapelets. The central idea of shapelets is to extract the most discriminative subsequence from the whole sequence, and then to make a classification by constructing a decision tree.
The advantages of the shapelet-based method are that it has strong interpretability, robustness, and low classification time complexity. Although it can be accelerated through early abandon and entropy pruning methods, the search space and time complexity of shapelets are still not negligible. Therefore, some acceleration strategies such as precomputing of reusable distance and allowable pruning (Mueen, Keogh & Young, 2011), discrete representation of subsequence (Rakthanmanon & Keogh, 2013), early abandoning Z-normalization, reordering early abandoning, reversing the query/data role, and cascading lower bounds (Rakthanmanon et al., 2012), are applied in the search of shapelets. In addition, some studies use shapelet transform to construct a new dataset from the original dataset, expecting reduced training time while retaining model interpretability and further improving accuracy (Lines et al., 2012;Hills et al., 2014). Shapelet transform separates the search procedure of shapelets and classifier construction (using the distance between shapelets and the original sequence as a new dataset), and this makes the selection of the classifier flexible.
Since the advent of shapelet transform, subsequent research has shifted its focus to identifying more effective ways of finding shapelets (Wistuba, Grabocka & Schmidt-Thieme, 2015; Baldán & Benítez, 2018). In contrast to constantly searching for shapelets in existing sequences, some studies (Grabocka et al., 2014; Bagnall et al., 2015; Hou, Kwok & Zurada, 2016; Zhao, Pan & Tao, 2020) hold that shapelets can be learned, which turns the shapelet search process into a mathematical optimization task and can improve the performance of the model. However, some works consider the performance of algorithmic acceleration techniques to be close to its upper bound, so other solutions must be considered, such as using multiple GPUs and FPGAs to accelerate the DTW subsequence search process (Sart et al., 2010).
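The core of the shapelet transform, in which the distances between shapelets and the original sequences form a new dataset, can be sketched as follows. This is a simplified illustration that assumes the shapelets have already been found and uses the Euclidean distance; the function names are hypothetical, and the shapelet search itself is not shown.

```python
import math


def min_subsequence_distance(shapelet, series):
    """Smallest Euclidean distance between a shapelet and any window of
    the series with the same length (math.inf if the series is shorter)."""
    n = len(shapelet)
    best = math.inf
    for start in range(len(series) - n + 1):
        d = math.sqrt(sum((shapelet[k] - series[start + k]) ** 2 for k in range(n)))
        best = min(best, d)
    return best


def shapelet_transform(dataset, shapelets):
    """Map every series to a feature vector of its distances to the shapelets.

    The result has one row per series and one column per shapelet, and can
    be passed to any standard classifier.
    """
    return [[min_subsequence_distance(sh, series) for sh in shapelets]
            for series in dataset]


dataset = [[0, 1, 2, 1, 0], [0, 0, 0, 0, 0]]
shapelets = [[1, 2, 1], [0, 0]]
for row in shapelet_transform(dataset, shapelets):
    print(row)
```

Because the transformed rows are ordinary fixed-length feature vectors, the choice of classifier is decoupled from the shapelet search.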

• Construction of a neural network
This type of algorithm is a feature-based method, and its main idea is to train the classifier in advance. Iwana, Frinken & Uchida (2020) embed DTW into a neural network as a kernel function. In this way, the neural network can handle the difficulties of time series recognition, such as time distortion and variable pattern length, in a feedforward architecture. Many studies have been devoted to applying deep learning models to time series classification (Zheng et al., 2014), and Fawaz et al. (2019) provide a detailed introduction and summary.
Using the results from previous studies, we compare the accuracy of various methods (as shown in Table 2) on multiple public datasets that are widely used in this field (Ding et al., 2008; Rakthanmanon & Keogh, 2013; Lines et al., 2012). The performance of the shapelet learning methods (LTS, FLAG, RSLA) is superior. According to the different principles used in the methods, we divide the algorithms into five categories: 1NN-DTW, shapelets, shapelet transform, shapelet learning, and neural network. In addition, the advantages and disadvantages of 1NN-DTW, shapelets, shapelet transform, and shapelet learning are compared in Table 3. 1NN-DTW is the simplest method: it needs no training and achieves high accuracy. However, its biggest problem is its long classification time, especially for large training datasets, which makes it unsuitable for certain applications. The shapelet-based method reduces the sequence length and thus has a faster classification time, and it achieves high interpretability and robustness. However, shapelets are discriminative features that require significant effort to find, and for large sequence lengths the search space increases drastically. The shapelet transform method makes the choice of classifier more flexible, but it retains the long search time problem. The shapelet learning method learns the shapelets instead of searching for them through the training data, so the learned shapelets are more robust than searched ones. The disadvantage of this type of method is the long training time required.

Semi-supervised learning
Semi-supervised learning methods construct classifiers using a small amount of labeled data and a large amount of unlabeled data. One of the most frequently used methods is self-learning, which uses a small amount of labeled data to assign class labels to a large unlabeled dataset. A basic form of this approach extends the training data by 1NN: if an unlabeled instance is close enough to the labeled data, it is added to the training set. This is a simple and basic semi-supervised learning approach for time series classification. Based on this, the subsequent advancements can be divided into three categories.

• Distance function
The basic approach uses Euclidean distance as the similarity measurement; because DTW is a more effective distance in time series classification, it can be used to improve model performance (Chen et al., 2013). The ratio of DTW to Euclidean distance has also been proposed as the distance measurement, making the algorithm more suitable for smaller data sizes and diverse negative samples. This is based on two assumptions: first, negative samples are diverse, and some negative samples may be close to positive samples; second, compared with Euclidean distance, DTW makes the distances among positive samples smaller.

• Label approach
Other than optimizing the distance function, changing the method of adding testing data into the training dataset can also improve classification results. One possible way is to cluster negative samples. Because a robust classifier needs to be constructed using limited, labeled, positive data, partitioning the unlabeled dataset into smaller local clusters and identifying the local clusters' common principal features for classification can make the algorithm more reliable and productive (Nguyen, Li & Ng, 2011). Hierarchical clustering is also an effective clustering method (Marussy & Buza, 2013): it first clusters all sequences into smaller clusters, and then uses seeds to assign labels to unlabeled data.

• Stopping criterion
If a stopping criterion is too conservative (or too liberal), it is doomed to produce many false negatives (or false positives) (Begum et al., 2013). Therefore, it is important to propose a proper stopping criterion to avoid adding negative samples into the positive sample set. Begum et al. (2013) propose a parameter-free algorithm for finding a stopping criterion using the minimum description length (MDL) technique. The algorithm is stopped when the MDL becomes large, improving the classification results by optimizing the stopping criterion (Rodriguez, Alonso & Bostrom, 2001). The accuracy of different semi-supervised methods is compared in Table 4, as collected from various studies. The overall performance of the SSSL method is the best, which shows that the method of learning shapelets through optimization algorithms is still effective in semi-supervised learning. While shapelets improve accuracy, they also improve the interpretability of the algorithm, again highlighting the importance and usefulness of shapelets.
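The self-learning loop discussed in this subsection can be sketched as below. This is a simplified illustration under stated assumptions: only positive labeled data are given, the Euclidean distance is used, and a fixed distance threshold `max_distance` stands in for the stopping criterion (it is not the MDL-based criterion of Begum et al. (2013)). The name `self_train` is hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def self_train(positives, unlabeled, max_distance, dist=euclidean):
    """Grow the labeled positive set from an unlabeled pool.

    At each step the unlabeled series closest to the current labeled set is
    added. The loop stops when that nearest distance exceeds max_distance,
    a simple threshold used here as the stopping criterion.
    Returns (labeled set, remaining unlabeled pool).
    """
    if not positives:
        raise ValueError("at least one labeled series is required")
    labeled = list(positives)
    pool = list(unlabeled)
    while pool:
        best_idx, best_d = None, math.inf
        for idx, candidate in enumerate(pool):
            d = min(dist(candidate, p) for p in labeled)
            if d < best_d:
                best_idx, best_d = idx, d
        if best_d > max_distance:
            break  # too far from everything labeled so far: stop adding
        labeled.append(pool.pop(best_idx))
    return labeled, pool


labeled, rest = self_train(
    positives=[[0.0, 0.0, 0.0]],
    unlabeled=[[0.5, 0.0, 0.0], [1.0, 0.0, 0.0], [5.0, 5.0, 5.0]],
    max_distance=0.6,
)
print(len(labeled), rest)
```

In the toy run the second unlabeled series is only reached through the first one that was added, which shows how the labeled set grows step by step; a threshold that is too large would let the distant series in as a false positive.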

Early classification
The main goal of early classification is to assign class labels as early as possible while guaranteeing a certain level of accuracy. It is of great importance in time-sensitive applications, such as the diagnosis of heart disease, as early diagnosis improves prognosis. In practical applications, when the problem to be solved is not clearly described, the early classification of time series may produce false positives, and the cost of false positives is very high. To address this, Wu, Der & Keogh (2021) argue that the task of early classification of time series should first be clearly defined, and that obtaining real-world, publicly available datasets is also very important. According to the data type, there are two technology development routes.
• Univariable
Rodriguez, Alonso & Bostrom (2001) segment a time series into intervals and then describe these intervals using relative predicates and region-based predicates. It is the first work to mention the term early classification of time series. Although it achieves early classification by using partial information, it does not consider how to choose the shortest prefix that provides reliable classification results. ECTS (Xing, Pei & Yu, 2009) obtains the shortest prediction length through training, and it uses the sequence prefix to classify data under the condition of guaranteed accuracy. ECTS achieves a shorter prefix, higher accuracy, and higher effectiveness by using an accelerating algorithm. Further, Mori et al. recast this task as a mathematical optimization problem, using accuracy and earliness as joint optimization goals. The above methods lack interpretability, which is useful in determining the factors affecting an object. EDSC (Xing et al., 2011) introduces shapelets and proposes local shapelets, using kernel density estimation or Chebyshev's inequality to find the threshold value of each shapelet, and then selecting the best shapelet for classification.
• Multivariable
MSD (Ghalwash & Obradovic, 2012) extends the EDSC algorithm to the multivariable setting. It uses information gain to evaluate the quality of shapelets, adds shapelet pruning, and discards shapelets that have no ability to correctly classify data. This method has three disadvantages: first, it handles multivariable data in a fixed window, even though different variables could have different shapelet positions; second, it cannot process variables with different lengths; and third, there is no connection between multiple variables.
To solve these problems, He et al. (2015) propose learning a shapelet for each variable independently, and constructing a classifier that can use multiple shapelets to classify data. Moreover, it substitutes information gain with a new measurement (F-measure). This method can solve the inter-class imbalance problem (a class containing multiple small classes, or consisting of multiple concepts) to a certain degree through inter-class clustering. Lin et al. (2015) further extend the input variables of the algorithm from continuous numerical sequences to characterized discrete sequences. He et al. (2019) use downsampling technology to solve the intra-class imbalance problem, and a clustering method to deal with the inter-class imbalance problem, which further expands the applicability of the algorithm.
In contrast, He, Zhao & Xia (2020) mainly focus on identifying multivariable class labels as early as possible while ensuring that the classification accuracy is higher than the probability of the true label. Tables 5 and 6 compare the accuracy of some univariate early classification algorithms and multivariate early classification algorithms, respectively.
While most of the univariable classification algorithms achieve good results (above 85%), the accuracy of the multivariable algorithms does not reach that level (except for EPIMTS). This can be attributed to the fact that it is difficult to consider multiple variables simultaneously and to extract the interconnections between them correctly. EPIMTS uses an ensemble method to combine these two important factors into the algorithm, allowing it to achieve the best performance.
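The idea behind shapelet-based early classification can be illustrated with a short sketch: observations are read one at a time, and a label is emitted as soon as the most recent window matches a shapelet $p = (b, \delta, c)$. This is a deliberately simplified illustration, not an implementation of EDSC or MSD; the shapelets and thresholds are assumed to be given, the Euclidean distance is used, and the name `early_classify` is hypothetical.

```python
import math


def early_classify(stream, shapelets):
    """Classify a univariate series from its prefix.

    stream is an iterable of observations in arrival order; shapelets is a
    list of (b, delta, c) triples. Returns (label, number of observations
    read) at the first match, or (None, total length) if nothing matches.
    """
    seen = []
    for value in stream:
        seen.append(value)
        for b, delta, c in shapelets:
            n = len(b)
            if len(seen) < n:
                continue
            # Only the newest window needs checking: earlier windows were
            # already tested when their last observation arrived.
            window = seen[-n:]
            d = math.sqrt(sum((x - y) ** 2 for x, y in zip(b, window)))
            if d <= delta:
                return c, len(seen)
    return None, len(seen)


shapelets = [([1.0, 2.0, 1.0], 0.1, "peak")]
print(early_classify([0, 0, 1, 2, 1, 0, 0, 0], shapelets))
```

In the toy run the decision is made after five of the eight observations, which is the earliness that this family of methods aims for.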

Challenges and future trends
This section discusses the different technology development routes in time series classification. Mainly, the research covers both traditional supervised learning methods and semi-supervised learning methods. In particular, an important extension, early classification, is proposed for specific application situations. Although the existing work has achieved good results in time series classification tasks, there are still some problems. In real life, the amount of unlabeled data exceeds that of labeled data, and its sources are more abundant. Although supervised learning yields better classification results, labeling data is expensive and time consuming. In some fields, such as medical and satellite data, experts are required to label the data, making the acquisition of labeled data even more difficult. Therefore, research on semi-supervised or unsupervised methods has great value. However, according to the research reviewed for this article, very few recent studies focus on semi-supervised and unsupervised learning methods for time series classification (Chen et al., 2013; Nguyen, Li & Ng, 2011). Managing large amounts of unlabeled data for classification tasks remains a tremendous challenge.

TIME SERIES PREDICTION
Although time series prediction methods have experienced a long period of development, the rapid increase in data scale has brought severe challenges to traditional time series prediction methods and has seriously affected their efficiency. Time series prediction methods have gradually developed from simple linear regression models and nonlinear regression models based on traditional statistics to machine learning methods represented by neural networks and support vector machines. At the same time, researchers have also proposed other prediction methods for time series with different characteristics based on different theoretical foundations. Fuzzy cognitive maps can deal with data uncertainty and maintain a high level of interpretability. To solve the problem of insufficient labeled data in some practical applications, transfer learning methods can be used. Two future research avenues are clear: first, dealing with the rapid increase in the scale of time series data; second, choosing the most suitable model for a specific problem.

Technology developments
According to the reviewed literature, we have defined four technical development routes, namely, the classic algorithm, neural network, fuzzy cognitive map, and transfer learning. Figure 4 lists the development directions of the different technical routes and their resulting algorithms.

Classical methods
Traditional time series prediction methods first determine a parametric model of the time series, then solve for the model parameters, and finally use the solved model to make predictions. They are discussed mainly from the perspective of stationary series, non-stationary series, and multivariate time series.
• Stationary series
Russian statistician Slutzky proposed the moving average (MA) model (Slutzky, 1937), and British statistician G.U. Yule proposed the autoregressive (AR) model (Yule, 1927) when studying sunspots. The AR model is a representation of a random process in which the output variable depends linearly on its own previous values and a stochastic term. The parameters of the AR model are chosen to minimize the squared error between the predicted results and the actual results. Box and Jenkins proposed a short memory model called the autoregressive moving average (ARMA) model (Box & Jenkins, 1970). The ARMA model provides a general framework for predicting stationary observed time series data. However, it is not suitable for non-stationary time series data, and only one time series can be modeled at a time.
• Non-stationary series
Non-stationary time series comprise four trends: long-term trend, cyclic trend, seasonal trend, and irregular trend. Box and Jenkins proposed the autoregressive integrated moving average (ARIMA) model for non-stationary short memory data with obvious trends (Box & Jenkins, 1970). The ARIMA model has become one of the most widely used linear models in time series prediction. This model uses historical data of a univariate time series to analyze its own trends and predict future cycles, but it cannot easily capture non-linear patterns. One or more differencing steps in ARIMA make the time series stationary. Differencing operations usually amplify high-frequency noise in time series data, thereby affecting the accuracy of prediction. When modeling time series with long memory dependence, a common alternative is the autoregressive fractionally integrated moving average (ARFIMA) model. The model is based on ARIMA and allows the differencing parameter to take non-integer values. On the basis of the ARIMA model, the autoregressive integrated moving average with exogenous input (ARIMAX) model is obtained by adding exogenous input (Wangdi et al., 2010).
The exponential smoothing (ES) model (Gardner, 1985) is a smoothing technique that uses past data points in a time window to smooth the current data point. In contrast to the traditional moving average, the ES model uses an exponential function to assign more weight to the most recent data points, which is beneficial for processing non-stationary time series data; the basic form is aimed at series without trend or seasonality. The Holt smoothing method (Holt, 2004; Winters, 1960), also called double exponential smoothing, extends ES to time series with a trend but no seasonality. Chatfield (1978) proposes the Holt-Winters model, which uses three smoothing steps, for level, trend, and seasonality, and is therefore also called triple exponential smoothing. The Holt-Winters model can be used for univariate prediction of seasonal data.
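The recursions behind simple and double exponential smoothing can be written in a few lines. The following Python sketch uses one common initialization (first value as the level, first difference as the trend), which is an assumption; other initializations are used in practice. The seasonal step of the Holt-Winters model is omitted.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation (no trend, no seasonality)."""
    level = series[0]
    smoothed = [level]
    for value in series[1:]:
        # Recent observations receive weight alpha; older ones decay geometrically.
        level = alpha * value + (1 - alpha) * level
        smoothed.append(level)
    return smoothed

def holt_forecast(series, alpha, beta, horizon):
    """Double exponential smoothing: track level and trend, then extrapolate."""
    level, trend = series[0], series[1] - series[0]
    for value in series[1:]:
        previous_level = level
        level = alpha * value + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return [level + (h + 1) * trend for h in range(horizon)]

print(simple_exponential_smoothing([3.0, 5.0, 4.0, 6.0], alpha=0.5))
print(holt_forecast([2.0, 4.0, 6.0, 8.0, 10.0], alpha=0.5, beta=0.3, horizon=2))
```

On the perfectly linear series in the last line, the level and trend track the data exactly, so the forecasts continue the line.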

• Multivariate time series
The vector autoregressive (VAR) model (Mizon, 1991) is a natural extension of the univariate AR model to dynamic multivariate time series, and it can provide predictions superior to those of univariate time series models and of elaborate theory-based simultaneous equation models. The vector autoregressive moving average (VARMA) model (Athanasopoulos & Vahid, 2008) allows several related time series to be modeled together, taking into account both the cross-correlation between series and the correlation within each series. Because the VARMA model considers the influence of each series on the others, it can improve prediction accuracy and make the predictions more reliable for decision-making.
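To make the multivariate extension concrete, the following Python sketch fits a first-order VAR model, y_t = c + A y_{t-1} + e_t, by least squares. The function name `fit_var1` and the simulated coefficients are illustrative assumptions. The off-diagonal entry of A captures how one series influences another, which a univariate AR model cannot represent.

```python
import numpy as np

def fit_var1(data):
    """Least-squares estimate of c and A in y_t = c + A @ y_{t-1} + e_t.

    data has shape (T, k): T time steps of k series.
    """
    Y = np.asarray(data, dtype=float)
    X = np.column_stack([np.ones(len(Y) - 1), Y[:-1]])  # rows [1, y_{t-1}]
    B, *_ = np.linalg.lstsq(X, Y[1:], rcond=None)       # shape (k + 1, k)
    return B[0], B[1:].T                                 # intercept c, matrix A

# Simulated two-variable system: series 2 influences series 1 (A[0, 1] = 0.2).
rng = np.random.default_rng(0)
A_true = np.array([[0.5, 0.2], [0.0, 0.4]])
c_true = np.array([1.0, -0.5])
Y = np.zeros((3000, 2))
for t in range(1, len(Y)):
    Y[t] = c_true + A_true @ Y[t - 1] + rng.normal(scale=0.3, size=2)

c_hat, A_hat = fit_var1(Y)
print("estimated A:\n", A_hat)
print("one-step forecast:", c_hat + A_hat @ Y[-1])
```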
Traditional research methods mostly use statistical models to study the evolution of time series data. For decades, linear statistical methods have dominated prediction. Although linear models are easy to implement and interpret, they are seriously limited in capturing the nonlinear relationships that are common in many complex real-world problems.

Neural Network
An artificial neural network (ANN) is a flexible computing framework and universal approximator that can be applied to a wide range of time series prediction problems with high accuracy. Its main advantage is its flexible nonlinear modeling ability, which does not require a specific model form to be specified in advance. The popularity of ANNs stems from this role as a general nonlinear prediction model. Since the advent of the simplest ANN, the ideas of recursion, nonlinear regression, and convolution have continued to develop. Depending on the characteristics of the real data, linear and nonlinear models can also be combined into a hybrid model to achieve better performance.

• Recursion
Connor & Atlas (1991) apply a recurrent neural network (RNN) to time series prediction, using the historical information of a time series to predict future results. Hochreiter & Schmidhuber (1996) propose an improved RNN called long short-term memory (LSTM), which addresses the vanishing gradient problem by introducing additional units that can store information indefinitely, and which has shown success in single-step time series analysis. LSTM can handle sequences of varying length and capture long-term dependencies without the problems of traditional RNN architectures (Wilson et al., 2018). LSTM has gradually become a popular solution for learning the long-term temporal dependencies of raw time series data, and it can use a fixed-size time window to solve many time series tasks that feedforward networks cannot solve.
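The role of the additional memory units can be seen in a single forward step of an LSTM cell. The Python sketch below implements the widely used variant with a forget gate, which is a later addition to the original formulation; the gate ordering, weight shapes, and random weights are assumptions made for illustration, and no training is shown. The cell state is updated additively, which is what allows information to be carried across many time steps.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4H, D), U: (4H, H), b: (4H,), gates stacked as [i, f, o, g]."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])            # input gate: how much new content to write
    f = sigmoid(z[H:2 * H])        # forget gate: how much old cell state to keep
    o = sigmoid(z[2 * H:3 * H])    # output gate: how much of the cell to expose
    g = np.tanh(z[3 * H:4 * H])    # candidate cell content
    c = f * c_prev + i * g         # additive cell-state update
    h = o * np.tanh(c)
    return h, c

# Run the cell over a fixed-size window of a univariate series (random weights).
rng = np.random.default_rng(0)
D, H = 1, 4
W = rng.normal(scale=0.5, size=(4 * H, D))
U = rng.normal(scale=0.5, size=(4 * H, H))
b = np.zeros(4 * H)
window = np.sin(np.linspace(0.0, 3.0, 10))
h, c = np.zeros(H), np.zeros(H)
for value in window:
    h, c = lstm_step(np.array([value]), h, c, W, U, b)
print("final hidden state:", h)
```

In a prediction model, the final hidden state would typically be passed to an output layer that is trained together with the cell.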

• Convolution
A convolutional neural network (CNN) differs from an RNN in how it builds representations. An RNN follows a strictly sequential learning process and handles one data point at a time, whereas a CNN learns representations with nonlinear filters applied across multiple data points. At each step, a filter extracts features from a local subset of the data, so that the representation is a set of extracted features. Liu et al. (2015) combine a CNN with time-domain embedding to predict periodic time series values. Their model, the time-embedding enhanced convolutional neural network (TeNet), learns repeatedly occurring yet hidden structural elements of periodic time series (called abstract fragments) to predict future changes. Mittelman (2015) proposes an undecimated fully convolutional neural network (UFCNN) for time series problems. UFCNN does not suffer from the vanishing and exploding gradient problems, so it is easier to train. It can also be implemented more efficiently because it involves only convolution operations rather than the recursion used by RNN and LSTM.
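The local filtering operation can be sketched as a causal one-dimensional convolution, in which each output depends only on the current and earlier inputs and the output keeps the length of the input (no downsampling). This Python sketch illustrates the basic building block only; it is not the TeNet or UFCNN architecture, and the function name and the `dilation` parameter are assumptions.

```python
import numpy as np

def causal_conv1d(x, kernel, dilation=1):
    """y[t] = sum_k kernel[k] * x[t - k * dilation], with zeros before the start."""
    x = np.asarray(x, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    pad = (len(kernel) - 1) * dilation
    padded = np.concatenate([np.zeros(pad), x])
    y = np.zeros(len(x))
    for k, w in enumerate(kernel):
        start = pad - k * dilation
        y += w * padded[start : start + len(x)]
    return y

x = [1.0, 2.0, 3.0, 4.0]
print(causal_conv1d(x, [1.0, -1.0]))             # first difference of the input
print(causal_conv1d(x, [1.0, 1.0], dilation=2))  # x[t] + x[t - 2]
```

Because every output position can be computed independently of the others, the operation parallelizes well, in contrast to a recurrent update that must proceed step by step.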
• Hybrid model
Modeling real-world time series is particularly difficult because they usually contain both linear and nonlinear patterns. In view of the limitations of purely linear and purely nonlinear models, some studies propose hybrid models to improve the quality of prediction. The ARIMA model is combined with ANN models (Peter & Zhang, 2003; Khashei & Bijari, 2010; Babu & Reddy, 2014) and with the multi-layer perceptron (MLP) (de O. Santos Júnior, de Oliveira & de Mattos Neto, 2019) to construct hybrid models, which experiments show to perform better than a single model.
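A common way to build such a hybrid is to fit a linear model first and then fit a nonlinear model to its residuals, adding the two predictions. The Python sketch below follows this two-stage idea under strong simplifications: a least-squares autoregression stands in for ARIMA, and a single hidden layer with fixed random weights and a ridge-fitted output layer stands in for a trained ANN. All names, the synthetic series, and the hyperparameters are assumptions, and the comparison is in-sample only.

```python
import numpy as np

def lag_matrix(y, p):
    """Rows [y[t-1], ..., y[t-p]] for t = p .. len(y)-1, plus the matching targets y[t]."""
    X = np.column_stack([y[p - k : len(y) - k] for k in range(1, p + 1)])
    return X, y[p:]

def fit_hybrid(y, p=2, hidden=20, ridge=1e-3, seed=0):
    y = np.asarray(y, dtype=float)
    # Stage 1: linear autoregression (stand-in for ARIMA).
    X, target = lag_matrix(y, p)
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, target, rcond=None)
    linear_fit = X1 @ beta
    resid = target - linear_fit
    # Stage 2: nonlinear model of the residual from its own lags
    # (random tanh features with a ridge-fitted output layer, stand-in for an ANN).
    R, r_target = lag_matrix(resid, p)
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(p, hidden))
    b = rng.normal(size=hidden)
    features = np.tanh(R @ W + b)
    out = np.linalg.solve(features.T @ features + ridge * np.eye(hidden),
                          features.T @ r_target)
    hybrid_fit = linear_fit[p:] + features @ out
    return target[p:], linear_fit[p:], hybrid_fit

# Synthetic series with a linear and a nonlinear component.
rng = np.random.default_rng(1)
n = 400
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.6 * y[t - 1] + np.sin(2.0 * y[t - 2]) + rng.normal(scale=0.1)

target, linear_fit, hybrid_fit = fit_hybrid(y)
mse_linear = np.mean((target - linear_fit) ** 2)
mse_hybrid = np.mean((target - hybrid_fit) ** 2)
print("in-sample MSE, linear only:", mse_linear)
print("in-sample MSE, hybrid:", mse_hybrid)
```

The in-sample error of the hybrid cannot exceed that of the linear stage here, because the residual model can always fall back to predicting zero; a fair evaluation would compare the two on held-out data.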

Notes.
The data are obtained from Shen et al. (2020).

Hybrid models (methods, advantage, disadvantage): hybrid models of Peter & Zhang (2003) and Babu & Reddy (2014), ARIMA-SVM (Pai & Lin, 2005; Oliveira & Ludermir, 2014), ARIMA-NN (Khashei & Bijari, 2010), ARIMA-MLP-SVR (de O. Santos Júnior, de Oliveira & de Mattos Neto, 2019), SeriesNet (Shen et al., 2020); better performance; high complexity.

Shen et al. (2020) propose SeriesNet, which uses LSTM and dilated causal convolution to extract features at different time intervals from the time series and then combines them. This makes full use of the characteristics of the time series and helps improve prediction accuracy. Compared with other models, SeriesNet achieves the best prediction accuracy on nonlinear and non-stationary datasets. On the non-stationary datasets, the error of SeriesNet decreases slowly as the size of the sliding window increases. Table 7 compares the root-mean-square error (RMSE), the mean absolute error (MAE), and the coefficient of determination (R²) of multiple methods. We summarize the advantages and disadvantages of the different methods in Table 8. The hybrid model has a stronger advantage when dealing with nonlinear and non-stationary data.
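The three evaluation measures named above have standard definitions, which the following Python sketch writes out. The example values are made up for illustration and are not taken from Table 7.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error: penalizes large errors more strongly."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error: average magnitude of the errors."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 6.0]
print(rmse(actual, predicted), mae(actual, predicted), r2(actual, predicted))
```

Lower RMSE and MAE indicate better predictions, whereas R² is better when closer to 1.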

Fuzzy cognitive map
The fuzzy cognitive map (FCM) is a quantitative modeling and simulation method for dynamic systems proposed by Kosko (1986). It is a simple and powerful tool for the simulation and analysis of dynamic systems. FCM can be useful in time series prediction tasks that do not require exact numbers but only approximate results (Felix et al., 2019). The method combines characteristics of fuzzy logic and neural networks, which allows it to model the states of a system effectively. It can deal with the uncertainty of data while maintaining a high level of interpretability. It has been demonstrated that FCM can be applied to predict both univariate (Lu, Yang & Liu, 2014) and multivariate (Froelich et al., 2012; Papageorgiou & Froelich, 2012a; Papageorgiou & Froelich, 2012b; Stach et al., 2005) time series.
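As an illustration of how an FCM evolves, the following Python sketch applies one common form of the update rule, in which each concept's next activation is a sigmoid of the weighted sum of the current activations. The indexing convention (`W[i, j]` is the influence of concept j on concept i), the absence of a self-memory term, and the example weights are assumptions; several variants of the rule exist in the literature.

```python
import numpy as np

def fcm_step(state, W, slope=1.0):
    """Next activations: sigmoid of the weighted influences, values in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-slope * (W @ state)))

def fcm_simulate(state, W, steps, slope=1.0):
    """Return the trajectory of activations, including the initial state."""
    trajectory = [np.asarray(state, dtype=float)]
    for _ in range(steps):
        trajectory.append(fcm_step(trajectory[-1], W, slope))
    return np.array(trajectory)

# Hypothetical three-concept map: signed weights encode causal influence.
W = np.array([[0.0, 0.8, -0.5],
              [0.6, 0.0, 0.3],
              [-0.7, 0.4, 0.0]])
trajectory = fcm_simulate([0.2, 0.7, 0.5], W, steps=10)
print(trajectory[-1])
```

The signed weight matrix is what makes the model interpretable: each entry can be read as the strength and direction of the influence of one concept on another.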
The existing algorithms for training FCMs fall into two main groups: population-based and Hebbian-based methods. Population-based algorithms include particle swarm optimization (PSO) (Homenda, Jastrzebska & Pedrycz, 2015; Salmeron et al., 2017), the genetic algorithm (GA) (Yesil et al., 2013), memetic algorithms (Salmeron, Ruiz-Celma & Mena, 2016), artificial bee colony (ABC) (Yesil et al., 2013), and modified asexual reproduction optimization (Salmeron et al., 2019). Hebbian-based learning algorithms are seldom used for time series prediction because of their poor generalization ability. FCM-based time series prediction mostly consists of two parts: establishing the structure and learning the weight matrix. To extract concepts efficiently, the FCM structure can be constructed with the fuzzy c-means algorithm (Lu et al., 2014). When applying a standard FCM to time series prediction, most of the literature (Lu, Yang & Liu, 2014; Poczeta & Yastrebov, 2014; Papageorgiou, Poczeta & Laspidou, 2015; Poczeta, Yastrebov & Papageorgiou, 2015) assumes that the weights of the FCM are adjusted only during the training phase and remain fixed when the model is used for prediction. To improve prediction accuracy and reduce training time, some studies propose pseudo-inverse learning and the wavelet transform.
• Pseudo-inverse learning
Vanhoenshoven et al. (2020) propose a new FCM learning algorithm based on the Moore-Penrose inverse (FCM-MP). The distinctive feature of this method is that the pseudo-inverse learning of the FCM weight matrix computes a different set of weights at each iteration step. In this way, different time-varying data segments affect the weights, and the weights change from one iteration to the next. The algorithm improves prediction accuracy, does not require laborious parameter tuning, and reduces the processing time required for training the FCM.
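The core idea of using a pseudo-inverse can be sketched in Python as follows. This is a simplified single-solve illustration, not the FCM-MP algorithm itself, which computes a different set of weights at each iteration. It assumes a sigmoid activation without a self-memory term, so that applying the inverse sigmoid (logit) to the next states turns weight learning into a linear least-squares problem; the function name and example weights are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def learn_fcm_weights(states, eps=1e-6):
    """Solve logit(A(t+1)) = W @ A(t) for W with the Moore-Penrose inverse."""
    S = np.clip(np.asarray(states, dtype=float), eps, 1.0 - eps)
    X = S[:-1].T                          # (n, T): activations at time t
    Z = np.log(S[1:] / (1.0 - S[1:])).T   # (n, T): logit of activations at t + 1
    return Z @ np.linalg.pinv(X)          # minimum-norm least-squares solution

# Generate a trajectory from a hypothetical map, then recover weights from it.
W_true = np.array([[0.0, 0.8, -0.5],
                   [0.6, 0.0, 0.3],
                   [-0.7, 0.4, 0.0]])
states = [np.array([0.2, 0.7, 0.5])]
for _ in range(20):
    states.append(sigmoid(W_true @ states[-1]))
states = np.array(states)

W_hat = learn_fcm_weights(states)
one_step = sigmoid(W_hat @ states[:-1].T).T
print("max one-step error:", np.max(np.abs(one_step - states[1:])))
```

The learned weights reproduce the one-step transitions of the training trajectory, but they need not equal the generating weights when the trajectory settles quickly and therefore carries little information. No iterative search over candidate weight matrices is needed, which is the source of the short training time.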

• Wavelet transform
Although fuzzy cluster analysis has strong time series modeling capabilities, prediction methods based on it cannot handle non-stationary time series, and evolutionary learning methods are not suitable for large-scale time series. To overcome these two limitations, Yang & Liu (2018) propose the wavelet high-order fuzzy cognitive map (WHFCM), which uses the wavelet transform instead of fuzzy time series and is trained with ridge regression. Further, the empirical wavelet transform (EWT) is superior to the discrete wavelet transform in time series prediction because it is a data-driven signal decomposition algorithm. Gao, Du & Yuen (2020) propose a hybrid time series prediction model based on EWT and FCM. EWT decomposes the original time series into different levels to capture information at different frequencies, and high-order fuzzy cognitive maps are then trained to model the relationship between all generated subsequences and the original time series.
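To illustrate what decomposing a series into frequency levels means, the following Python sketch implements one level of the orthonormal Haar discrete wavelet transform, the simplest fixed-basis wavelet. It is a generic illustration and not the transform used in WHFCM or EWT, whose filters are adapted to the data; the function names are assumptions.

```python
import numpy as np

def haar_decompose(x):
    """One Haar level: low-frequency approximation and high-frequency detail."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:
        raise ValueError("input length must be even")
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)  # scaled pairwise sums
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)  # scaled pairwise differences
    return approx, detail

def haar_reconstruct(approx, detail):
    """Invert haar_decompose exactly."""
    approx = np.asarray(approx, dtype=float)
    detail = np.asarray(detail, dtype=float)
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x

signal = np.array([4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 0.0, 2.0])
approx, detail = haar_decompose(signal)
print("approximation:", approx)
print("detail:", detail)
print("reconstruction matches:", np.allclose(haar_reconstruct(approx, detail), signal))
```

Applying the decomposition again to the approximation yields further levels; each resulting subsequence can then be modeled separately before the predictions are combined.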
FCM has been successfully used to model and predict stationary time series. However, it is still challenging to deal with large-scale non-stationary time series that exhibit trends and rapid changes over time. The main advantage of FCM-based models is their human-centered knowledge representation interface. Therefore, in terms of accuracy, FCM-based time series modeling may not exceed the classical methods that have been studied, but FCM provides superior practical characteristics.

Transfer learning
Time series data usually change over time, so samples collected over a long period often differ significantly from each other. It is therefore generally not recommended to apply old data directly in the prediction process. For time series prediction, we would like to train an effective model from only a small number of fresh samples together with relatively abundant old data. Transfer learning methods can be used to address this shortage of labeled data in practical applications. Transfer learning reuses knowledge from one field in different but related fields. Its basic idea is to use the data or information of related source tasks to assist in modeling the target task. Traditional machine learning techniques try to learn each task from scratch, whereas transfer learning techniques try to transfer the knowledge from previous tasks to a target task when the latter has little high-quality training data (Pan & Yang, 2010).
Xiao, He & Wang (2012) propose a transfer learning-based analog complexing model (TLAC), which transfers related time series from the source domain to assist in modeling the target time series. Ye & Dai (2018) propose a hybrid algorithm based on transfer learning (TrEnOS-ELMK) that combines the online sequential extreme learning machine with kernel (OS-ELMK) and ensemble learning. TrEnOS-ELMK implements a single-source transfer learning algorithm. With transfer learning, the knowledge learned from old data can be used effectively for the current prediction task, which addresses the severe challenge of long-term knowledge transfer. However, because the distribution of time series data usually changes significantly over time, a single-source transfer learning algorithm may suffer from negative transfer. To solve this problem, Gu & Dai (2021) propose a new multi-source transfer learning algorithm (MultiSrcTL) and a new active multi-source transfer learning algorithm (AcMultiSrcTL).
Ye & Dai (2021) propose a deep transfer learning method (DTr-CNN) based on the CNN architecture, which inherits the advantages of CNN and tries to alleviate the problem of insufficient labeled data. This algorithm considers the similarity between the potential source dataset and the target dataset, and provides guidance for selecting an appropriate source domain. Gupta et al. (2018) propose an approach that leverages deep RNNs for small labeled datasets via transfer learning.
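The basic idea of reusing knowledge from abundant old data when only a few fresh samples are available can be sketched with a linear model. In the Python code below, the target model is fitted by least squares with a penalty that pulls its weights toward weights learned on the source data. This is a generic illustration of parameter transfer and not an implementation of TLAC, TrEnOS-ELMK, MultiSrcTL, or DTr-CNN; the function name, the penalty strength `lam`, and all numbers are assumptions.

```python
import numpy as np

def fit_with_source_prior(X, y, w_source, lam):
    """Minimize ||X w - y||^2 + lam * ||w - w_source||^2 in closed form (lam > 0)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w_source = np.asarray(w_source, dtype=float)
    k = X.shape[1]
    # Normal equations: (X^T X + lam I) w = X^T y + lam w_source.
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y + lam * w_source)

# Hypothetical AR(1) coefficient learned from plentiful old data.
w_source = np.array([0.8])
# A handful of fresh (lagged value, next value) pairs from the target series.
X_target = np.array([[1.0], [0.9], [0.7]])
y_target = np.array([0.9, 0.7, 0.6])

for lam in (0.01, 1.0, 100.0):
    w = fit_with_source_prior(X_target, y_target, w_source, lam)
    print(f"lam={lam}: w={w}")
```

A small `lam` trusts the fresh samples, while a large `lam` keeps the model close to the source knowledge. If the source and target distributions differ too much, a large `lam` harms the target model, which corresponds to the negative transfer discussed above.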
At present, there are relatively few studies on applying transfer learning to time series prediction; existing research mainly focuses on pattern classification. In many practical situations, the lack of labeled data may become an obstacle to time series prediction. Unlike traditional machine learning algorithms, transfer learning relaxes the assumption that training data and test data must follow the same distribution. When related datasets with sufficient labeled samples are available, using a transfer learning framework has become a new trend: knowledge from the related source datasets is applied to the target dataset, which helps address the problem of insufficient labeled data.

Challenges and future trends
This section discusses methods of time series prediction. Time series data essentially reflect how some random variables change over time. The core of the time series prediction problem is to identify trends in the data and use them to estimate future data and predict what will occur in the next period of time. There is no single best model for all real data; one can only choose the most suitable model from a reasonable range of candidates to obtain better predictions. The establishment of new time series models remains a problem that scholars will continue to study, and it gives direction for further research in the field of time series prediction.

CONCLUSION
Time series are an important data type and are generated in almost every application domain at an unprecedented speed and scale. The analysis of time series can help us understand the essence of various phenomena. We investigate current research regarding time series and find that there are few reviews of time series algorithms. In this article, we analyze the prevalent topics of time series and divide them into two categories: classification and prediction. Further, we extract the important technology development routes for time series algorithms, and introduce each original method and its subsequent improvements. In addition, we compare the performance of different algorithms and summarize their advantages and disadvantages, as well as the problems and challenges they face.
Through our investigation, we find that the technological development falls into three areas: traditional methods, machine learning methods, and deep learning methods. In time series classification, the mainstream methods change from distance-based methods (1NN-DTW) to feature-based methods (shapelets), and finally evolve into a mathematical optimization problem, which not only improves the accuracy but also reduces the time complexity. In time series prediction, because AR, MA, ARIMA, and other traditional methods cannot cope well with nonlinear problems, neural network methods have become a popular topic, and fuzzy cognitive maps and transfer learning are expected to further enhance the learning ability of models. Although current research has achieved some results, we find some important problems during our investigation:
• For time series classification, the research on semi-supervised and unsupervised learning algorithms is insufficient. While unlabeled data are ubiquitous and available in large amounts in real life, labeling them is labor intensive and sometimes requires expert knowledge.
• For time series prediction, constructing targeted time series models to solve real-world problems is still an ongoing problem for future researchers.
In view of the current development status of time series research, we believe that there are still many possible development directions for time series analysis. For example, the neural network is a very popular method for time series analysis. In most cases, its solution process is a black box that lacks interpretability, so the results cannot be intuitively understood and a clear, targeted optimization scheme cannot be obtained. Exploring symbolic expressions of time series with stronger interpretability is a possible future development direction. At present, most time series analysis collects data offline for offline analysis. When a model built in the offline phase is used in the online phase, new samples are continuously obtained as the working time increases. However, most methods do not make use of the newly obtained data, and the model cannot be updated in time. Therefore, how to update the model with real-time data is a future task of time series modeling research.
Time series have attracted much attention because of their important applications in many fields, such as disease diagnosis and traffic flow prediction. We believe that the study of time series in this article will provide a valuable reference for related research and inspire interested researchers and practitioners to invest more in this promising field.