A Dual-Attention-Based Stock Price Trend Prediction Model With Dual Features

Modeling and predicting stock prices is an important and challenging task in financial markets. Due to the high volatility of stock prices, traditional data mining methods cannot identify the most relevant and critical market data for predicting the stock price trend. This paper proposes a stock price trend prediction model (TPM) based on an encoder-decoder framework that adaptively predicts the stock price movement and its duration. The model consists of two phases. First, a dual feature extraction method based on different time spans is proposed to obtain more information from the market data. While traditional methods only extract features from information at specific time points, the proposed model applies the PLR method and a CNN to extract the long-term temporal features and the short-term spatial features from market data. Then, in the second phase of the proposed TPM, a dual-attention-based encoder-decoder framework is used to select and merge the relevant dual features and predict the stock price trend. To evaluate the proposed TPM, we collected high-frequency market data for the stock indexes CSI 300, SSE 50 and CSI 500, and conducted experiments on these three data sets. The experimental results show that the proposed TPM outperforms existing state-of-the-art methods, including SVR, LSTM, CNN, LSTM_CNN and TPM_NC, in terms of prediction accuracy.


I. INTRODUCTION
The stock price is a highly volatile time series in the financial field. The prices of stocks are affected by many factors, such as interest rates, exchange rates, inflation, monetary policy and investor sentiment. Modeling the relationship between the stock price and these factors and predicting the stock price trend is a challenging task for researchers and investors.
There are many studies on the prediction and analysis of financial time series. In 1970, the Efficient Market Hypothesis [1] indicated that the stock price is an immediate reflection of stock market information. Therefore, researchers use traditional statistical methods such as regression methods, exponential average and ARIMA [2]-[4] to predict the stock price based on historical prices. However, since too little underlying market information can be mined from the stock price alone, these statistical methods cannot accurately predict the stock price trend. Statistical methods often assume that the time series is generated by a linear process and therefore perform poorly in non-linear stock price prediction. Machine learning and deep learning methods have been relatively successful in financial time series modeling. Compared with statistical methods, they have better nonlinear mapping ability. A considerable amount of research extracts features at specific time points and then uses these features to model and predict the result. However, such methods ignore the interaction of data and the short-term continuity of data fluctuations. To bridge this gap, we propose a dual feature extraction method based on single time points and multiple time points, which combines short-term market features with long-term temporal features to improve the accuracy of prediction. Moreover, the proposed model is based on the encoder-decoder framework [5]-[6], and the attention mechanism [7] is introduced in the encoder and decoder stages respectively to solve the problem that the most relevant features cannot be attended to in a long time series.
The associate editor coordinating the review of this manuscript and approving it for publication was Yongping Pan.
Motivated by the above-mentioned problems, this paper proposes a new stock price trend prediction model (TPM) based on dual features and a dual attention mechanism. The aim of the TPM is to predict the direction and duration of stock price changes. The main contributions of this paper include:
1) A new dual-feature extraction method based on different time spans is proposed, which can effectively mine the underlying market information and optimize the model prediction results. In this paper, the piecewise linear regression method and a convolutional neural network are used to extract long-term temporal features and short-term market features of the financial time series over different time spans. Describing the stock market information with dual features can improve the model prediction performance.
2) Using an encoder-decoder framework, a stock price trend prediction model (TPM) based on the dual attention mechanism is proposed. By introducing the attention mechanism in both the encoder and the decoder stages, the TPM can adaptively select the most relevant short-term spatial market features and combine them with long-term temporal features for prediction.
3) A performance study shows that the proposed TPM outperforms state-of-the-art methods, including SVR, LSTM, CNN, LSTM_CNN and TPM_NC, under various training and testing parameters with different stock index data sets. The results show that the proposed TPM demonstrates better generalization ability and market forecasting ability.
The rest of this paper is organized as follows. Section 2 introduces the related work on mining financial time series. The proposed preprocessing method and the TPM are discussed in detail in Section 3. Section 4 presents experiments which demonstrate the superiority of our TPM. We conclude our work in Section 5.

II. RELATED WORK
In the field of mining financial time series, most research methods can be divided into two phases: data preprocessing and time series modeling. In the data preprocessing phase, procedures such as dimension reduction, feature selection and feature extraction can be used to transform raw input data into representative features. Then, in the time series modeling phase, a prediction model is built to learn the temporal dependencies of the features and predict the result.
Data preprocessing is a process that maps the original high-dimensional data into low-dimensional features, filters out irrelevant data, and obtains the least redundant and most representative features. The data preprocessing results often influence the quality of the prediction. In the field of mining financial time series, feature selection and extraction as well as time series segmentation are often used for data preprocessing. Feature selection and extraction methods can be classified as filters and wrappers, depending on the feature evaluation method [8]. Filter methods choose static statistical characteristics of the data as evaluation criteria, while wrapper methods refine the results dynamically. Generally, wrapper methods perform better than filter methods but need more expensive computing resources. Therefore, some researchers have suggested combining filter and wrapper methods. Moradi and Gholampour [9] proposed a feature search method which combined feature correlation for local search with particle swarm optimization for global search. Dong et al. [10] blended a binary genetic algorithm, a neighborhood rough set algorithm and the ROGA algorithm for feature extraction. Time series segmentation, which decomposes historical data into important points or segments, helps greatly in filtering data noise, reducing dimension and saving computing resources. For example, Chang et al. [11] combined a genetic algorithm with segmentation methods to identify trend points, and Zhao and Wang [12] used the outliers of stock volume for stock market prediction.
Modeling and learning the dependencies of financial time series is a challenging issue. In the last decades, machine learning and deep learning methods have become very popular in financial time series modeling. Machine learning methods, such as random forest [13], artificial neural network [14], and support vector machine [15], have good nonlinear mapping ability and are easy to interpret. Chen and Hao [16] predicted the stock index by applying feature-weighted SVM and feature-weighted KNN. Thakur and Kumar [17] integrated the random forest and weighted SVM to generate trading signals for decision support systems. Chandar [18] used a discrete wavelet transform to decompose financial time series data and built a fuzzy-set-based ANN model for predicting the closing price of the stock. Deep learning methods, which discover multi-level abstract representations of a data set through deep neural network structures and back-propagation algorithms, have achieved great success in image processing, speech recognition and text mining [19], with models such as Seq2Seq [5], GoogLeNet [20], ResNet [21] and BERT [22], and have also been tried in the financial time series field [23]-[25]. For example, Zhang et al. [26] proposed a novel event representation model involving RBM and sentence2vec, which extracts and trains on the stock price data and news text information for prediction. Minh et al. [27] proposed a new two-stream gated recurrent unit network (TGRU) to improve the performance of the prediction model. Pang et al. [28] proposed two LSTM models, one based on the embedded layer and the other based on the automatic encoder. The LSTM with the embedded layer shows better performance in stock market prediction. As discussed earlier, most of these studies applied machine learning methods for feature extraction and used a single neural network for modeling. However, they ignored that combining multiple deep neural network structures, such as CNN, RNN and LSTM, can capture the characteristics of the data more comprehensively.

III. TREND PREDICTION MODEL (TPM)
The aforementioned deficiencies are summarized as follows. First, a univariate financial time series contains insufficient information. Second, the traditional feature extraction methods are limited in studying market behavior. Third, the information learned from data with a single neural network is incomplete.
To address these issues, this paper proposes a new stock trend prediction model (TPM) based on dual features and a dual attention mechanism. The model consists of two phases. First, we use the piecewise linear regression method to segment the financial time series and extract the historical long-term temporal features from sub-sequences with different time spans. The short-term spatial market features at each time point are generated by a convolutional neural network. Then, in the second phase of the TPM, a dual-attention-based trend prediction model is built on the dual features extracted previously. It is based on the encoder-decoder framework. The encoder is an LSTM, and its attention mechanism adaptively extracts the most relevant short-term market features and encodes them into a feature vector. The decoder, an attention-based LSTM, selects and decodes the most relevant fused features to predict the stock price trend. The detailed process of the TPM is shown in Fig. 1.
Given a financial time series $X = (X_1, X_2, \ldots, X_n)$, where $X_n \in \mathbb{R}^d$ represents the input data at time $n$, we decompose the time series to be predicted into a set of sub-sequences, denoted by $L = (L_1, L_2, \ldots, L_m)$. Each sub-sequence is fitted by a segment and denoted by $L_m = (s_m, d_m)$, where $s_m$ is the slope of the segment and $d_m$ is the duration of the sub-sequence, i.e., the time length of the segment. The long-term temporal features are extracted through the sliding window $\omega_l$, and the short-term features are extracted from the data in the sliding window $\omega_s$. The set of short-term features is $S = (S_1, S_2, \ldots, S_k)$, and the length of each element $S_k$ is $\omega_s$. As discussed in Section 1, our goal is to predict the stock price trend $Y_T = (s_T, d_T)$ from the long-term and short-term features. More specifically, we aim to learn a nonlinear mapping $F(\cdot)$ from these dual features to $Y_T$.

A. PHASE I: DATA PREPROCESSING
1) FEATURE GENERATION
Since the market information provided by a univariate financial time series is insufficient, it is hard to model and predict the stock price trend from univariate data. We choose basic market data such as the opening price, closing price, highest price, lowest price and volume, and transform them into technical indicators. Technical indicators, such as the Moving Average (MA), the Relative Strength Index (RSI) and the Moving Average Convergence/Divergence (MACD), are meaningful rules and patterns of the market proposed in [29]-[30]. In addition, their lagged time series are also included as features. The specific description of the generated features is shown in Table 1.
Using the five-minute interval market data of CSI 300 on March 21, 2017, we plot the closing price and several of the features mentioned above in Fig. 2. In Fig. 2 (a), the closing price changes of the two sub-periods are very similar, but in Fig. 2 (b)-(f), the other features of these two sub-periods are completely different, especially the Volume in Fig. 2 (b) and the WMSR%12 in Fig. 2 (e). Therefore, it is unreliable to predict the stock price trend only from the change of the closing price, and we should combine multiple features to depict the stock market information and improve the accuracy of prediction.
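As an illustration of the feature generation step, the following minimal sketch derives a simple moving average and a lagged series from closing prices. The function names `moving_average` and `lagged`, the window size and the sample prices are hypothetical; the exact indicator definitions used for Table 1 are not reproduced here.

```python
def moving_average(prices, window):
    """Simple moving average; positions without a full window are None."""
    out = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(prices[i + 1 - window:i + 1]) / window)
    return out


def lagged(series, lag):
    """Shift a series so that position t holds the value observed at t - lag."""
    return [None] * lag + list(series[:len(series) - lag])


# Hypothetical closing prices at consecutive 5-minute intervals.
closing = [10.0, 11.0, 12.0, 11.0, 13.0]
ma3 = moving_average(closing, 3)   # 3-step moving average
lag1 = lagged(closing, 1)          # closing price one step earlier
```

Each derived series can be stacked with the raw market data as an additional input dimension, which is one way to read the multi-feature input described above.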

2) DUAL FEATURE EXTRACTION
a: PLR EXTRACTS THE LONG-TERM TEMPORAL FEATURES
Considering the continuity of data changes, we extract the long-term temporal features over multiple time points with the piecewise linear regression (PLR) method. The PLR method can smooth the short-term fluctuation noise, reduce the data dimension and improve the computational performance.
There are three traditional PLR methods: bottom-up, top-down and sliding window. The sliding window method divides the financial time series into sub-sequences of fixed length. If the window size is not suitable, the sub-sequences will be divided incorrectly, which degrades the prediction. To avoid this, we choose the bottom-up PLR method, which segments the time series more appropriately and has a relatively low fitting error compared with the other methods. The detailed algorithm is shown in Algorithm 1.
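Since Algorithm 1 is not reproduced in this text, the sketch below shows one common form of bottom-up segmentation as a rough illustration only. It assumes the merge cost is the sum of squared residuals of a least-squares line, that merging stops once the cheapest merge reaches the threshold `delta`, and that the slope of a segment is taken between its end points; these choices and the names `fit_error` and `bottom_up_plr` are assumptions, not the authors' exact procedure.

```python
def fit_error(y, start, end):
    """Sum of squared residuals of a least-squares line over y[start..end]."""
    n = end - start + 1
    ys = y[start:end + 1]
    mx = (n - 1) / 2
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in range(n))
    sxy = sum((x - mx) * (v - my) for x, v in enumerate(ys))
    slope = sxy / sxx if sxx else 0.0
    return sum((v - (my + slope * (x - mx))) ** 2 for x, v in enumerate(ys))


def bottom_up_plr(y, delta):
    """Merge adjacent segments while the cheapest merge stays below delta."""
    # Start from the finest segmentation: consecutive pairs of points.
    bounds = [(i, min(i + 1, len(y) - 1)) for i in range(0, len(y), 2)]
    while len(bounds) > 1:
        costs = [fit_error(y, bounds[i][0], bounds[i + 1][1])
                 for i in range(len(bounds) - 1)]
        best = min(range(len(costs)), key=costs.__getitem__)
        if costs[best] >= delta:
            break
        bounds[best:best + 2] = [(bounds[best][0], bounds[best + 1][1])]
    # Describe each segment by (slope, duration), mirroring L_m = (s_m, d_m).
    features = []
    for start, end in bounds:
        duration = end - start
        slope = (y[end] - y[start]) / duration if duration else 0.0
        features.append((slope, duration))
    return features


# A rise followed by a fall is split into two segments.
segments = bottom_up_plr([0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0], delta=0.5)
# segments == [(1.0, 5), (-1.0, 4)]
```

A larger `delta` tolerates a larger fitting error per segment, so fewer and longer segments remain.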
In our task, we assume that there are $n$ data points and $m$ segments of length $d$. Generating a segment of length $i$ takes $\theta(i)$ time, so growing a segment from length 2 to length $d$ takes $\theta(d^2)$ time. At most $n/d$ segments of length $d$ need to be checked; thus, the time complexity is $O((n/d) \cdot d^2) = O(n \cdot d)$.
Obviously, the segment result of the time series depends on the maximum error threshold δ.Taking CSI 300 as an example, we use the bottom-up PLR method to segment its historical closing price.In Fig. 3 (a), when the threshold value δ is 2.0, the time series can be divided into

b: CNN EXTRACTS THE SHORT-TERM SPATIAL MARKET FEATURES
Considering the interaction of different data at the same time point, the short-term spatial market feature of each time point is extracted by a convolutional neural network. Given the financial time series $X = (X_1, X_2, \ldots, X_n)$, we construct a Market Matrix to describe the historical stock market, denoted by
$$S_{T-1} = (X_1, X_2, \ldots, X_{T-1}).$$
In the Market Matrix, each row represents one dimension of $S_{T-1}$ and the number of rows is $n$, while each column represents one time point and the number of columns is $T-1$. Because the CNN preserves the neighborhood relations and spatial locality of the input data [31], it can capture the non-linear relationship between the Market Matrix $S_{T-1}$ and the stock trend $(s_T, d_T)$, and output the spatial features of the short-term historical time series.
The detailed short-term feature extraction structure of the CNN is shown in Fig. 4. In our CNN architecture, convolution kernels of different sizes, such as 1 × 3 and 1 × 5, are chosen to extract abstract multi-level spatial market features. A convolution neuron extracts features from the input Market Matrix by computing $\phi(W_c * X_t + b_c)$, where $X_t$ denotes the input Market Matrix, $*$ is the convolution operation, $W_c$ and $b_c$ are the weights and biases of the convolution neurons to be trained, and $\phi(\cdot)$ is a non-linear activation function, chosen to be the ReLU function [32].
A max pooling layer is applied after the convolution layers; it reduces the size of the feature maps and helps avoid overfitting. The pooling window has the same size as the convolution kernel, 1 × m, and the max pooling operation keeps the maximum value within each window. After several layers of convolution and max pooling, we feed the outputs $H^p_t$ to a projection layer, $W_t = W_p * H^p_t + b_p$, where $W_p$ and $b_p$ are parameters. Finally, the interaction of data is depicted by the short-term spatial market vector $W_{Market} = (W_1, W_2, \ldots, W_{T-1})$, where each $W_t \in \mathbb{R}^m$ denotes the spatial market feature at time $t$.
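To make the convolution and pooling steps concrete, here is a minimal sketch of a single 1 × 3 convolution with ReLU followed by 1 × 3 max pooling along the time axis. It assumes valid (unpadded) convolution, a single shared kernel, and non-overlapping pooling windows; the helper names `conv1xk` and `maxpool1xm` and the numbers are hypothetical, and the projection layer is omitted.

```python
def conv1xk(matrix, kernel, bias):
    """Valid 1 x k convolution along each row (time axis), followed by ReLU.

    As in most deep learning libraries, the kernel is applied without flipping.
    """
    k = len(kernel)
    out = []
    for row in matrix:
        out.append([
            max(0.0, sum(w * row[j + i] for i, w in enumerate(kernel)) + bias)
            for j in range(len(row) - k + 1)
        ])
    return out


def maxpool1xm(matrix, m):
    """Non-overlapping 1 x m max pooling along each row."""
    return [[max(row[j:j + m]) for j in range(0, len(row) - m + 1, m)]
            for row in matrix]


# One feature dimension observed over eight time points.
market = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]]
feature_map = conv1xk(market, kernel=[1.0, 1.0, 1.0], bias=-5.0)
pooled = maxpool1xm(feature_map, 3)
# feature_map == [[1.0, 4.0, 7.0, 10.0, 13.0, 16.0]], pooled == [[7.0, 16.0]]
```

Because the kernel is only one row high, each feature dimension is filtered along time independently in this sketch; mixing across dimensions would happen in the later projection layer.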

B. PHASE II: TIME SERIES MODELING BY ENCODER-DECODER FRAMEWORK
The encoder-decoder framework was first proposed in text processing and is usually implemented with RNNs or CNNs. In an encoder-decoder framework, the encoder compresses the input information into a fixed-size vector and the decoder processes this vector into the final result. However, when there is too much input information, the encoder cannot efficiently identify all relevant information. As a result, the performance of the encoder-decoder framework deteriorates. The attention mechanism mitigates this problem by decoding the hidden states of the relevant neurons. The encoder-decoder framework simulates the human information processing process, removes the restriction that the encoded and decoded time series have the same length, and refines and compresses the input data to generate better prediction results.
However, an attention-based decoder alone cannot explicitly select the relevant input data, so we introduce attention in both the encoder and the decoder stages. The second phase of our proposed TPM is based on this dual attention mechanism, which is shown in Fig. 5. The encoder-decoder framework can be divided into two stages. In the first stage, we input the short-term spatial market features extracted by the CNN into the attention-based LSTM encoder, where the relevant short-term features at each time point are selected adaptively and encoded into vectors. In the second stage, the encoded vectors and the long-term temporal features extracted by PLR are the inputs, and the LSTM decoder decodes the relevant vectors and features based on the attention mechanism to predict the stock price trend. Through the dual attention mechanism, we can adaptively select the most relevant spatial market features and temporal features to model and predict the trend.

1) ATTENTION-BASED SHORT-TERM FEATURE ENCODER
Given the short-term spatial market features $W_{Market} = (W_1, W_2, \ldots, W_{T-1})$ extracted by the CNN, at each time point $t$ the encoder learns the mapping between the input feature $W_t$ and the hidden state $H_t$:
$$H_t = f_{en}(W_t, H_{t-1}; \theta_{en}),$$
where $H_t \in \mathbb{R}^k$ is the hidden state of the encoder at time $t$, $k$ is the size of the hidden state, $f_{en}(\cdot)$ is a nonlinear function, and $\theta_{en}$ denotes the parameters of the encoder. We use an LSTM [33] as the nonlinear function $f_{en}$ to capture the temporal dependencies and form a short-term feature encoder. An LSTM neuron controls the update and output of its state through a forget gate $\sigma_1$, an input gate $\sigma_2$ and an output gate $\sigma_3$. Their operations combine the hidden state $H_{t-1}$ of the previous time point $t-1$ with the input $W_t$ at time point $t$, where $\sigma_1$, $\sigma_2$, $\sigma_3$ are three sigmoid functions and $*$ is the element-wise operator. The LSTM is capable of modeling the dynamic temporal behavior of time series effectively and avoids the vanishing and exploding gradient issues of RNNs [34].
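For readers who want to see the gate mechanics, the following sketch implements one step of a standard LSTM cell in plain Python. It follows the usual textbook formulation rather than the paper's own notation, and the parameter names (`Wf`, `Wi`, `Wo`, `Wg` and the matching biases) are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, x, b):
    """Return W x + b for a matrix W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + bi for row, bi in zip(W, b)]


def lstm_step(w_t, h_prev, c_prev, params):
    """One LSTM update on the concatenated vector [h_prev; w_t]."""
    z = h_prev + w_t
    f = [sigmoid(v) for v in affine(params["Wf"], z, params["bf"])]    # forget gate
    i = [sigmoid(v) for v in affine(params["Wi"], z, params["bi"])]    # input gate
    o = [sigmoid(v) for v in affine(params["Wo"], z, params["bo"])]    # output gate
    g = [math.tanh(v) for v in affine(params["Wg"], z, params["bg"])]  # candidate state
    # Element-wise update of the cell state and the hidden state.
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c


# Hidden size k = 2, input size m = 3, all parameters zero for illustration.
k, m = 2, 3
zero_params = {name: [[0.0] * (k + m) for _ in range(k)]
               for name in ("Wf", "Wi", "Wo", "Wg")}
zero_params.update({name: [0.0] * k for name in ("bf", "bi", "bo", "bg")})
h_t, c_t = lstm_step([1.0, 2.0, 3.0], [0.0, 0.0], [1.0, -2.0], zero_params)
```

With all parameters at zero every gate equals 0.5 and the candidate state is 0, so the cell state is simply halved, which is a convenient sanity check for the update rule.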
As shown in Fig. 6, we introduce the attention mechanism [7] in the encoder stage and divide the input feature $W_{Market}$ into $W^1, W^2, \ldots, W^m$ according to the feature dimension $m$, where $W^p = (W_{p,1}, W_{p,2}, \ldots, W_{p,T-1})$ represents the $p$-th feature dimension over all time points. Given the hidden state $H_{t-1}$ and the cell state $C_{t-1}$ calculated at time $t-1$, the relevant dimensions of the input features are identified and used to update the input features at the next time $t$.
The attention scores are parameterized by $v_a \in \mathbb{R}^{T-1}$, $W_a \in \mathbb{R}^{(T-1) \times 2k}$ and $U_a \in \mathbb{R}^{(T-1) \times (T-1)}$, and the softmax function is used to calculate the importance $\alpha_{m,t}$ of each feature dimension. All dimensions of $W_t$ are reweighted into $F_t$ and fed to the encoder, so that the hidden state at time point $t$ is
$$H_t = f_{en}(F_t, H_{t-1}; \theta_{en}).$$
Through the above steps, at each time point $t$, we can select the relevant dimensions of the spatial market features, update the input feature and the hidden state of the encoder successively, and generate the most relevant short-term feature encoding vector.
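The encoder attention can be sketched as follows. Since the scoring equations are not recoverable from this text, the sketch assumes the common additive form $v_a^\top \tanh(W_a [H_{t-1}; C_{t-1}] + U_a W^p)$, which matches the stated parameter shapes; the function name `input_attention` and the toy values are hypothetical.

```python
import math


def matvec(M, x):
    return [sum(a * b for a, b in zip(row, x)) for row in M]


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def input_attention(series, h_prev, c_prev, v_a, W_a, U_a, t):
    """Weight each feature dimension at time index t by its attention score.

    series[p] is the p-th feature dimension over all T-1 time points.
    """
    state = matvec(W_a, h_prev + c_prev)          # W_a [H_{t-1}; C_{t-1}]
    scores = []
    for w_p in series:
        hidden = [math.tanh(s + u) for s, u in zip(state, matvec(U_a, w_p))]
        scores.append(sum(v * x for v, x in zip(v_a, hidden)))
    alpha = softmax(scores)                        # importance of each dimension
    f_t = [a * w_p[t] for a, w_p in zip(alpha, series)]
    return alpha, f_t


def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


# Two feature dimensions, T-1 = 3 time points, hidden size k = 2.
series = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
alpha, f_t = input_attention(series, [0.0, 0.0], [0.0, 0.0],
                             [0.0] * 3, zeros(3, 4), zeros(3, 3), t=1)
# With zero parameters the weights are uniform: alpha == [0.5, 0.5].
```

The reweighted vector `f_t` plays the role of $F_t$ and would replace $W_t$ as the encoder input at time $t$.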

2) ATTENTION-BASED LONG-TERM FEATURE DECODER
The decoder is built from LSTM neurons and predicts the stock price trend. Given the long-term features $L = (L_1, L_2, \ldots, L_{T-1})$ extracted by the PLR method, where $T-1$ is the sequence length and $L_t = (s_t, d_t)$ denotes the long-term temporal feature at time point $t$, the decoder LSTM learns at each time point $t$ the mapping between the encoded vector $W_t$, the long-term feature $L_t$ and the hidden state $H_t$:
$$H_t = f_{de}(L_t, W_t, H_{t-1}; \theta_{de}) \tag{17}$$
where $H_t \in \mathbb{R}^g$ is the hidden state of the decoder at time $t$, $g$ is the size of the hidden state, $f_{de}(\cdot)$ is a nonlinear function, and $\theta_{de}$ denotes the parameters of the decoder. Similarly, we use an LSTM as the nonlinear function $f_{de}$ to capture the temporal dependencies and form a long-term feature decoder.
The calculation procedure is similar to that of the encoder stage. We also introduce the attention mechanism in the decoder stage to attend to the relevant encoder hidden states across all time points. Given the hidden state $H_{t-1} \in \mathbb{R}^g$ and the cell state $C_{t-1} \in \mathbb{R}^g$ of the decoder and the hidden state $H_i$ of the encoder, the importance $\gamma_{i,t}$ of the $i$-th encoder hidden state at time $t$ is obtained from an attention score whose learned parameters include a matrix in $\mathbb{R}^{k \times k}$. Then, the context vector $C_t$ fed to the decoder is computed from all hidden states of the encoder $(H_1, H_2, \ldots, H_k)$, weighted by their importance. After obtaining the context vector $C_t$, we combine $C_t$ with the long-term temporal feature $L_t$ to generate the mixed feature $y_t$ through a linear transformation, where $w_c \in \mathbb{R}^{k+2}$ and $b_c \in \mathbb{R}^2$ are parameters to be learned. Finally, replacing the feature $L_t$ with the mixed feature $y_t$, we obtain the hidden state of the decoder $H_t$ by
$$H_t = f_{de}(y_t, H_{t-1}; \theta_{de}) \tag{22}$$
Through the above formulas, at each time point $t$, the most relevant encoder hidden states across all time points and the long-term temporal features are chosen to generate the mixed feature vectors.
VOLUME 7, 2019
Finally, we learn the nonlinear mapping function $F(\cdot)$ between the stock price trend and the dual features, which gives the prediction of the stock price trend at time point $T$, $Y_T = (s_T, d_T)$. We used stochastic gradient descent with a momentum optimizer to train the proposed model, with a batch size of 64 and a learning rate of 0.001. The squared error function with regularization terms is our objective function, and the parameters of the model are learned through back-propagation. In the loss function, $W$ and $b$ represent the weights and biases to be learned, $N$ is the number of training samples, $\lambda$ is the hyper-parameter of the L2 regularization, and $Y^i_T$ denotes the slope and duration at time $T$. By feeding in the extracted dual features, the slope and duration of the stock price trend can be obtained.
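The decoder-side fusion can be illustrated with a short sketch. It takes the attention scores as given (their exact parameterization is not recoverable here), forms the context vector as the importance-weighted sum of encoder hidden states, and fuses it with $L_t$ through a linear layer. Treating the fusion weights as a 2 × (k + 2) matrix so that the output matches $b_c \in \mathbb{R}^2$ is an assumption, as are the names `context_vector` and `mixed_feature`.

```python
import math


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def context_vector(scores, encoder_states):
    """Importance-weighted sum of the encoder hidden states."""
    gamma = softmax(scores)
    k = len(encoder_states[0])
    context = [sum(g * h[j] for g, h in zip(gamma, encoder_states))
               for j in range(k)]
    return gamma, context


def mixed_feature(l_t, c_t, W_c, b_c):
    """Linear fusion of L_t = (s_t, d_t) with the context vector C_t."""
    x = list(l_t) + list(c_t)
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(W_c, b_c)]


# Two encoder hidden states of size k = 2 with equal attention scores.
gamma, c_t = context_vector([0.0, 0.0], [[1.0, 2.0], [3.0, 4.0]])
# Hypothetical fusion weights: keep the slope, add 1 to the last context entry.
y_t = mixed_feature((0.1, 5.0), c_t,
                    W_c=[[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]],
                    b_c=[0.0, 1.0])
```

The resulting `y_t` stands in for the mixed feature that replaces $L_t$ as the decoder input.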
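The objective described above can be sketched as a squared error plus an L2 penalty. The normalization by the number of samples and the restriction of the penalty to the weights are assumptions made for illustration, and `trend_loss` is a hypothetical name.

```python
def trend_loss(preds, targets, weights, lam):
    """Mean squared error over (slope, duration) pairs plus an L2 penalty.

    preds and targets are lists of (slope, duration) tuples; weights is a
    flat list of model weights; lam is the L2 regularization coefficient.
    """
    n = len(preds)
    squared_error = sum((p - y) ** 2
                        for pred, tgt in zip(preds, targets)
                        for p, y in zip(pred, tgt))
    l2_penalty = sum(w * w for w in weights)
    return squared_error / n + lam * l2_penalty


loss = trend_loss(preds=[(1.0, 2.0)], targets=[(0.0, 0.0)],
                  weights=[1.0, 2.0], lam=0.1)
# Squared error 5.0 plus penalty 0.1 * 5.0 gives 5.5.
```

In training, this value would be minimized with the momentum-based stochastic gradient descent mentioned in the text.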

IV. EXPERIMENTS
In this section, three different stock indexes from China's A-share market are chosen for experiments: CSI 300, SSE 50 and CSI 500. We collected stock market data including the opening price, closing price, highest price, lowest price, trading volume and turnover at 5-minute intervals. The data covers the period from August 31, 2005, to August 31, 2018. In our experiments, we used CSI 300 data of 3,132 days including 150,336 data points, SSE 50 data of 3,133 days including 150,384 data points, and CSI 500 data of 3,042 days including 146,016 data points, i.e., 48 data points per trading day. We split these three data sets into training, validation, and testing sets with a ratio of 8:1:1. With these stock index data sets, we conducted extensive experiments to evaluate the predictive performance of our proposed TPM and other models.
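As a small illustration of the data split, the sketch below computes index ranges for an 8:1:1 split. Splitting chronologically (earliest data for training, latest for testing) is an assumption, since the text only states the ratio; `split_811` is a hypothetical helper. The figure of 48 five-minute points per trading day comes from the time step lengths used later in the experiments.

```python
POINTS_PER_DAY = 48  # 48 five-minute steps correspond to one trading day


def split_811(n):
    """Return (start, end) index ranges for an 8:1:1 split of n points."""
    train_end = n * 8 // 10
    val_end = n * 9 // 10
    return (0, train_end), (train_end, val_end), (val_end, n)


# SSE 50: 3,133 trading days of 5-minute data.
n_points = 3133 * POINTS_PER_DAY
train, val, test = split_811(n_points)
```

A chronological split avoids training on data that lies after the test period, which matters for time series even though other split strategies are possible.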

A. COMPARISON MODELS AND EVALUATION METRIC
The Support Vector Regression, LSTM, CNN, LSTM_CNN and TPM_NC models are implemented for comparison. These models are described briefly as follows:
1) Support Vector Regression (SVR): the concatenation of the short-term features is fed to the SVR, and the parameters of the radial basis function (RBF) kernel are set to c = 1, γ = 0.1 and d = 3. The prediction of the stock price trend is generated with the RBF-based SVR.
2) LSTM: we implemented a recurrent neural network based on LSTM neurons with the hidden size set to 64. We feed the long-term temporal features into the LSTM to model and predict the stock price trend.
3) CNN: the short-term time series are used as input data, and the stock price trends are trained and predicted by a two-layer convolutional neural network with 3 × 3 convolution kernels.
4) LSTM_CNN [35]: a hybrid structure of CNN and LSTM. We feed the financial time series to this network, and the CNN and LSTM extract and combine the features to learn and generate the prediction.
5) TPM_NC: we implemented the TPM_NC model by removing the CNN neurons from our TPM. The encoder and decoder of the TPM_NC model consist of LSTM neurons. The TPM_NC model encodes the short-term time series directly and decodes them with the long-term temporal features to predict the stock price trend.
We set the CNN of the TPM to consist of a 1 × 3 convolution kernel, and the number of LSTM neurons in the encoder and decoder is 64. The maximum number of training epochs of these models is set to 100. Meanwhile, we adopt early stopping, a dropout layer with a dropout ratio of 0.5, and L2 regularization with λ = 0.0001 to prevent over-fitting.
In order to evaluate the performance of our TPM and the other models in the trend prediction of financial time series, the root mean square error (RMSE) is used as the evaluation metric. Specifically, assuming $N$ is the number of samples, and $\hat{Y}_t$ and $Y_t$ denote the predicted value and the true value at time $t$ respectively, the RMSE is calculated by
$$RMSE = \sqrt{\frac{1}{N} \sum_{t=1}^{N} (\hat{Y}_t - Y_t)^2}.$$
The lower the RMSE, the closer the predicted value is to the true value and the better the performance of the model. Based on these parameter settings, the models are trained on the market data of the three Chinese stock indexes CSI 300, SSE 50 and CSI 500 respectively, and their stock price trends are predicted separately.
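The evaluation metric is straightforward to compute; a minimal sketch follows, with `rmse` as a hypothetical helper name and made-up sample values.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between two equally long sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - y) ** 2 for p, y in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


# Every prediction is off by exactly 2, so the RMSE is 2.0.
error = rmse([2.0, 2.0, 2.0, 2.0], [0.0, 0.0, 0.0, 0.0])
```

In the experiments the metric would be computed separately for the slope and the duration, since the two quantities are on different scales.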

1) PREDICTION RESULTS
After analyzing the historical data of China's stock indexes CSI 300, SSE 50 and CSI 500, we set the maximum error thresholds of the PLR method in long-term temporal feature extraction to δ CSI 300 = 2.5, δ SSE 50 = 1.6 and δ CSI 500 = 0.025, and the time step length T − 1 is 96 for all three. We conducted the experiments with the settings in Section A, and the experimental results are shown in Table 2.
It can be observed that on all three data sets, our proposed TPM outperforms the other models in predicting the slope and duration of the trend. SVR shows the worst performance, while LSTM performs better, indicating that a model that captures the non-linear temporal relationship well improves the performance. Analogously, CNN also performs better than SVR because of its spatial modeling ability. The LSTM_CNN achieves better performance than CNN and LSTM, validating the effectiveness of combining temporal and spatial modeling. Furthermore, the TPM outperforms the TPM_NC by a considerable margin since the CNN encoder enhances the predictive performance. The TPM combines the dual features with different time spans and optimizes the feature selection and fusion operations through the dual-attention-based encoder-decoder network. On CSI 300, our TPM shows 9.51% and 7.98% improvements over the best baseline model on slope and duration prediction, respectively. This demonstrates that the dual features and the dual attention mechanism can be successfully applied to the trend prediction problem and improve the prediction accuracy.

2) PREDICTION RESULTS WITH DIFFERENT ERROR THRESHOLDS
In the extraction of long-term temporal features, the extracted features change with the maximum error threshold δ. For the same data set, as the threshold value increases, more data fluctuations are ignored and fewer long-term features are extracted. Therefore, we set different thresholds δ according to the characteristics of each data set and observe the influence of the threshold on the prediction accuracy. We set the time step length to 96 for all three data sets, and the maximum error threshold varies by δ CSI 300 ∈ [0. It can be observed that as the threshold increases, the prediction performance of all models decreases. However, compared with the other models, the proposed TPM has a relatively low prediction error and is robust to the data variation.

3) PREDICTION RESULTS WITH DIFFERENT TIME STEP LENGTHS
The time step length of the input features is adjustable, and different time step lengths lead to different prediction results. As the time step length increases, more data can be fed to the model. Therefore, we conduct experiments with different time step lengths. The time step length T − 1 for the three data sets is chosen from {48, 96, 144, 192, 240}, corresponding to {1, 2, 3, 4, 5} days. The three maximum error thresholds are set to δ CSI 300 = 2.5, δ SSE 50 = 1.6 and δ CSI 500 = 0.025. Tables 3-5 show the effect of the length on the model performance for the three stock indexes, respectively. From the prediction results for CSI 300 in Table 3, we find that as the length becomes longer, the RMSE of the TPM decreases only slightly. This is because the TPM focuses on the most relevant data, which contain extensive market information. These data have a strong influence on the prediction results, so increasing the amount of data improves the prediction performance only marginally. Compared to the other models, the TPM has a lower RMSE, and it performs similarly in predicting the SSE 50 and CSI 500 indexes, as shown in Table 4 and Table 5. This demonstrates that the TPM achieves high accuracy and robustness in stock price trend prediction.

V. CONCLUSION
Traditional methods cannot extract relevant features for mining financial time series. To address this issue, we propose a dual-phase trend prediction model (TPM) based on dual features and a dual attention mechanism for financial stock markets. First, in the data preprocessing phase, we use the PLR method and a CNN to extract the dual features, which represent the long-term trend of the historical data and the short-term underlying market information. Second, in the time series modeling phase, we propose a new framework with a short-term feature encoder and a long-term feature decoder. We introduce the attention mechanism in both the encoder and the decoder so that the most relevant feature dimensions across all time points are selected and merged adaptively. Finally, the TPM can accurately predict the slope and duration of the trend. Our experimental results show that the proposed TPM reduces the RMSE by 13.74% and 17.63% on average in comparison to the other models, including SVR, LSTM, CNN, LSTM_CNN, and TPM_NC. In addition, experiments conducted with different thresholds and time step lengths demonstrate that the proposed TPM is not only better in prediction performance but also robust to time and data variation.

FIGURE 1. Detailed processes of the TPM.
16 sub-sequences, while in Fig. 3 (b), when the threshold value δ is 4.0, only four sub-sequences are obtained. As the threshold value increases, more data fluctuations are ignored and fewer sub-sequences are formed. The value of the threshold influences the validity of the historical time series features. Each sub-sequence represents the fluctuation of the data over a time period. The slope $s_m$ and the duration $d_m$ of each sub-sequence are generated as the long-term temporal features in the TPM to predict the stock price trend.

FIGURE 4. The short-term feature extraction structure of CNN.

FIGURE 6. The calculation procedure of the attention mechanism.

$H_{T-1}$ and $C_{T-1}$ represent the hidden state and context vector of the decoder at time point $T-1$, and $W_d \in \mathbb{R}^{g \times (g+k)}$ and $b_d \in \mathbb{R}^g$ are parameters. Finally, we use a linear function to obtain the stock price trend prediction at time point $T$, where $v_d^\top \in \mathbb{R}^g$ and $b_d \in \mathbb{R}$ are the weights and bias of the last linear function.

FIGURE 7. The slope prediction results with different thresholds of CSI 300 (left) and the duration prediction results with different thresholds of CSI 300 (right).

FIGURE 8. The slope prediction results with different thresholds of SSE 50 (left) and the duration prediction results with different thresholds of SSE 50 (right).

TABLE 1. Description of generated features.
extracted by the PLR method, where $T-1$ is the sequence length, and $L_t = (s_t, d_t)$ denotes the long-term temporal features at time point $t$. At each time point $t$, the decoder LSTM learns the mapping relationship between the encoded vector $W_t$, the long-term feature $L_t$ and the hidden state $H_t$: $H_t = f_{de}(L_t, W_t, H_{t-1}; \theta_{de})$

TABLE 2. The experimental results of different methods in three stock indexes.

TABLE 3. The slope RMSE and duration RMSE of different methods as the time step length varies in CSI 300.
The slope prediction results with different thresholds of CSI 500 (left) and the duration prediction results with different thresholds of CSI 500 (right).

TABLE 4. The slope RMSE and duration RMSE of different methods as the time step length varies in SSE 50.

TABLE 5. The slope RMSE and duration RMSE of different methods as the time step length varies in CSI 500.