A Novel Time-Sensitive Composite Similarity Model for Multivariate Time-Series Correlation Analysis

Finding the correlation between stocks is an effective method for screening and adjusting investment portfolios for investors. One single temporal feature or static nontemporal features are generally used in most studies to measure the similarity between stocks. However, these features are not sufficient to explore phenomena such as price fluctuations that are similar in shape but unequal in length, which may be caused by multiple temporal features. To research stock price volatility fully, mining the correlation between stocks should be considered from the point of view of multiple features described as time series, such as the closing price. In this paper, a time-sensitive composite similarity model designed for multivariate time-series correlation analysis based on dynamic time warping is proposed. First, a stock is chosen as the benchmark, and the multivariate time series are segmented by the peaks and troughs time-series segmentation (PTS) algorithm. Second, similar stocks are screened out by similarity. Finally, the rate of rising or falling together between stock pairs is used to verify the proposed model's effectiveness. Compared with other models, the composite similarity model brings in multiple temporal features and is generalizable to numerical multivariate time series in different fields. The results show that the proposed model is very promising.


Introduction
With the development of computer technology, artificial intelligence, big data and cloud computing, an increasing number of people are relying on computer algorithms to address problems in all fields. People in different fields attempt to use artificial intelligence technology to make their work simpler, faster and more accurate, especially in finance [1]. Due to the digitization of financial transactions, large amounts of financial data with considerable implicit information are generated and stored. How to use these data to help people invest has become an issue of common concern for people in both computer science and finance. Quite a few people demand accurate predictions of financial indicators so that they can adjust their investment portfolios in time to gain more returns or reduce deficits. In fact, stock fluctuation is complicated and difficult to predict accurately, and most existing approaches can provide investors with only some efficient advice and cannot always provide returns. However, identifying the correlation between different stocks is still meaningful for investors in helping them make investment decisions.
Generally, there are correlations between different stocks that can be reflected in price curves, such as stocks rising or falling together, stocks with one rising while another falls, and stocks whose fluctuations are similar in shape but unequal in length and appear at different times, which can be abstracted from the situation shown in Figure 1. Identifying the relationship between stocks can provide investors with a very efficient investment reference. For example, if a stock kept going up or had a very positive trend in recent days, its similar stocks may have a high probability of going up as well, such as the rising-or-falling-together relationship shown in Figure 1. In contrast, if a stock starts to fall, its similar stocks may also follow suit. This means that if an investment portfolio contains several similar declining stocks at the same time, it may lead to enormous losses. According to this cognition of the similarity of stocks, investors can adjust their investment portfolio in time to obtain more returns or avoid enormous losses. To explore the relationship between stocks, people have begun to focus on the fluctuation of stock prices. Because stock price data are stored as time series, the relationship between different stocks is essentially a composite situation of time-series similarity. Considering that different stocks may have different volatility, as some rise or fall quickly and others slowly, they may take different amounts of time to exhibit similar fluctuations. This leads to the situation in which different stock price sequences are nonaligned on a timeline. Dynamic time warping (DTW) is usually used to compute the distance between similar time series of unequal length.
DTW can warp the sequences to align the most similar points and obtain the best matches between points of different sequences; therefore, it is usually used for similar price pattern extraction [2][3][4][5]. In most DTW-based approaches, only one temporal attribute, for example, the price, is usually considered. However, considering the actual situation of the stock market, taking only price sequences into account cannot reflect or represent the entire situation of the stock market, let alone accurately predict trends. Stock features such as the turnover rate and earnings ratio, which are sequential, also have their own impact and representativeness for a stock. These features can help reflect the global similarity between different stocks. In the same way, the similarity of one attribute cannot represent the similarity between stock pairs. Therefore, we need a new measurement that considers more temporal features to measure the similarity between different stocks.
In this paper, we propose a DTW-based time-sensitive composite similarity model to estimate the similarity between different stocks to detect the correlation between them and help investors adjust their investment portfolio. The contributions of our research are described as follows.
• A time-series segmentation approach designed for DTW is proposed. Generally, when using DTW to compute the distance between two time series, similar important points such as poles need to be aligned, so we design a segmentation that cuts time series by the number of eligible peaks and troughs and stipulates counting rules to ignore tiny fluctuations.

• A time-sensitive composite similarity model is proposed that considers more sequential stock parameters and combines the traditional 'rise or fall together' similarity measure to measure the similarity of different stocks.

• Comparisons of prediction accuracy against traditional similarity measures are devised to validate the effectiveness of the proposed similarity model, and the time sensitivity of the composite model is also embodied in the experimental results.

The remainder of this paper is organized as follows. In Section 2, related work about stock relationships and related applications of DTW is presented. Section 3 describes the time-series segmentation approach and the composite stock similarity model we propose. Section 4 presents the experimental setup and performance of the proposed approach, as well as a comparison of the experiments and performances. A discussion of the differences in the experimental results is also given in Section 4. Finally, the conclusions and directions for future research are given in Section 5.

Correlation
Research on the correlation [6] between different entities can help improve the efficiency of research on the target entities. For example, we can obtain the composition of distributed resource services by researching the correlation between resource services from different organizations to improve resource utilization [7]. Researchers filtered the feature data of turbine groups at certain distances based on the correlation to further optimize the forecasting effect on wind power by clustering [8]. Majnu et al. show the limitations of two current dynamic correlation estimation approaches and present an alternative approach for dynamic correlation estimation based on a weighted graph [9]. Analyzing the correlation between stock prices and different financial indexes can provide a valuable reference to help investors make long-term investment decisions [10]. Random matrix theory has been used to analyze the cross-correlations of price changes of different cryptocurrencies [11]. For more accurate predictions of the stock market, researchers propose studying the correlation between corporations and incorporating information on the corporations related to a target company [12]. To underscore the potential of using multilayer network tools to study the time-varying correlations of financial assets, the authors of [13] apply recent innovations in network science to analyze how correlations of stock returns evolve over time. A complex network can be used to research stock correlation, so Yan et al. proposed using part mutual information to develop the stock network [14]. To obtain better portfolio allocation and risk management, researchers began to research the correlation between different stocks [15].

Stock Time-Series Correlation
Today, more researchers and investors have begun to focus on the technical analysis of the stock market [16,17]. When talking about the correlation between stocks, some research recasts stock similarity as the graphic similarity of patterns [18], calculating the distance between price vectors and classifying patterns to identify predictive stock patterns. Wang [19] constructed a Pearson-correlation-based network and a partial-correlation-based network to analyze the correlation structure and evolution of world stock markets. Guan [20] proposed a forecasting model based on neutrosophic logical relationships and employed a Jaccard similarity measure to find the most proper logical rule for forecasting. Xi [21] created a stock-associated network model based on financial indicators and explored the structural similarity of the financial indicators of stocks. Zhang [22] defined an intracoupled attribute value similarity and an intercoupled attribute value similarity to construct a stock correlation matrix to assist in tensor decomposition. Most of these stock similarity studies are based on the static features of stocks, but many stock features are temporal and dynamic, so we decided to define a dynamic similarity of stocks using dynamic features. Our research therefore turned to DTW, which is good at calculating the distance between time series of different lengths.

Dynamic Time Warping
DTW was first proposed and applied to spoken word recognition in 1978 [2] and has since been used in pattern recognition [23], time-series data processing [24][25][26][27], signature verification [28,29], speech segment clustering [30], exceptional motion capture [31], etc. It was originally used to find the optimal alignment between the points of a template sequence and a test sequence, calculating the distance over the aligned sequences to judge whether the two sequences are similar. Currently, DTW is widely used and modified as a similarity calculation method [32,33]. Because it is good at aligning the most similar points and obtaining the distance between two similar time series, many researchers have applied it to the recognition of similar stock patterns [3,4]. Tsinaslanidis [5] proposed an algorithmic approach using mainly the DTW algorithm and two of its modifications, subsequence DTW and derivative DTW, to capture common characteristics that help predict stocks' bullish and bearish classes.
The process of the DTW algorithm is shown below.

Definition 1. Dynamic time warping. Given two sequences, X = (x1, x2, …, xm) and Y = (y1, y2, …, yn), the distance function of any point-to-point pair in the two sequences is d(i, j) = f(xi, yj) ≥ 0. Because m ≠ n in general, an m × n matrix is constructed to align the two sequences.

To obtain the alignment, a sequence distance matrix D is built, whose rows correspond to sequence X and whose columns correspond to sequence Y; the element D(i, j) represents the distance from xi to yj, which is d(xi, yj). Generally, the Euclidean distance is used as the distance function. Then, the loss matrix Dc is obtained via the following steps:

Step 1: Set Dc(1, 1) = D(1, 1);

Step 2: Compute Dc(i, j) = D(i, j) + MIN(Dc(i − 1, j − 1), Dc(i − 1, j), Dc(i, j − 1)), and repeat Step 2 until the element in the last row and last column is obtained, which is also the DTW distance between the two sequences.
To better illustrate the process of obtaining the DTW distance, we use a simple example to show how DTW works. There is an original time series S0 = {3, 6, 8, 5, 7, 2} and another time series S1 = {2, 6, 7, 5, 6, 7, 2, 1}; the curves of the two time series are shown in Figure 2. To obtain the DTW distance between the two time series, we choose the distance function d(i, j) = f(xi, yj) = |xi − yj|. Then, the distance matrix is obtained, which is shown in (a) of Figure 3. Following the steps for obtaining the loss matrix described in Definition 1, we obtain the loss matrix shown in (b) of Figure 3.
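The example above can be reproduced with a short script. The following is a minimal sketch of the standard loss-matrix recurrence from Definition 1 (the function and variable names are our own), using d(i, j) = |xi − yj| as the distance function:

```python
def dtw_distance(x, y):
    """Classic DTW with absolute difference as the point-to-point cost."""
    m, n = len(x), len(y)
    INF = float("inf")
    # Loss matrix Dc, padded with an extra row and column of infinities
    # so the recurrence needs no special cases at the borders.
    dc = [[INF] * (n + 1) for _ in range(m + 1)]
    dc[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = abs(x[i - 1] - y[j - 1])          # D(i, j)
            dc[i][j] = cost + min(dc[i - 1][j - 1],  # diagonal step
                                  dc[i - 1][j],      # step in x
                                  dc[i][j - 1])      # step in y
    # The last element of the loss matrix is the DTW distance.
    return dc[m][n]

s0 = [3, 6, 8, 5, 7, 2]
s1 = [2, 6, 7, 5, 6, 7, 2, 1]
print(dtw_distance(s0, s1))  # prints 4.0
```

Running it on S0 and S1 yields the value in the last row and last column of the loss matrix, i.e., the DTW distance of the example.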

We can also obtain a warping path from the loss matrix if needed. The least-cost path from the first element of the matrix, located at the first row and first column, to the last element, located at the last row and last column, is the warping path between the two time series. The warping path obtained from the loss matrix is shown in (a) of Figure 4. The value of the last element is also the DTW distance, which we use in this paper. According to the warping path, we obtain the pairs of points that can be aligned with each other in (b) of Figure 4; two points connected by the red dotted line can be aligned.
When using DTW to compute the distance between two time series, there are some constraints:

• Monotonicity. All the points in the time series should be aligned in time order. For example, in (a) of Figure 5, the black dotted lines connect all the pairs of points aligned with each other, but the pair of points connected by the red dotted line is not allowed to be aligned.

• Continuity. To ensure that all the points in the two sequences are matched in the calculation process, the calculation of the point-to-point distances cannot be skipped and should be carried out continuously. It is easy to see that Dc(i, j) depends on Dc(i − 1, j − 1), Dc(i − 1, j) and Dc(i, j − 1) in Step 2 of the loss-matrix computation. For example, if we skip the calculation of Dc(i − 1, j), then the value of MIN(Dc(i − 1, j − 1), Dc(i − 1, j), Dc(i, j − 1)) cannot be obtained, which means that the value of Dc(i, j) cannot be obtained.

• Boundary conditions. The start point and end point of one time series should be aligned with the start point and end point of the other. When matching one time series to another, the matching direction should be consistent, both going from the start point to the end point. For example, in (b) of Figure 5, if we want to obtain the DTW distance between the two time series, the pairs of points connected by the red dotted lines must be aligned with each other.

In addition, some constraints can be added in practical applications:

• Slope constraints. To avoid the same point in one time series being aligned too many times in another time series, as in (c) of Figure 5, the slope can be constrained.

• Warping windows. Generally, the best-matching paths tend to lie near the diagonal, as in the condition shown in Figure 6, so sometimes only paths within a window near the diagonal need to be considered.
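A warping window can be added to the basic computation by filling only the cells of the loss matrix that lie near the diagonal. The sketch below is an illustrative Sakoe–Chiba-style band; the function name and the rule that widens the band to at least |n − m| are our own assumptions, not the paper's formulation:

```python
def dtw_windowed(x, y, window):
    """DTW restricted to a band of half-width `window` around the diagonal."""
    m, n = len(x), len(y)
    INF = float("inf")
    # The band must be at least |n - m| wide, or no complete warping path
    # from (1, 1) to (m, n) can exist inside it.
    w = max(window, abs(n - m))
    dc = [[INF] * (n + 1) for _ in range(m + 1)]
    dc[0][0] = 0.0
    for i in range(1, m + 1):
        # Only cells with |i - j| <= w are filled; the rest stay infinite.
        for j in range(max(1, i - w), min(n, i + w) + 1):
            cost = abs(x[i - 1] - y[j - 1])
            dc[i][j] = cost + min(dc[i - 1][j - 1], dc[i - 1][j], dc[i][j - 1])
    return dc[m][n]

s0 = [3, 6, 8, 5, 7, 2]
s1 = [2, 6, 7, 5, 6, 7, 2, 1]
print(dtw_windowed(s0, s1, window=2))  # prints 4.0
```

For this example the optimal path lies inside the band, so the windowed distance equals the unconstrained one; a narrower band only reduces the number of cells computed, never the cost of an excluded path.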

DTW-Based Temporal Composite Similarity Model
To find the correlation between entities of financial time series, a time-sensitive composite similarity model designed for multivariate time-series correlation analysis based on dynamic time warping is proposed. Related definitions and algorithms are described in this section.

Peaks and Troughs Time-Series Segmentation (PTS)
DTW was originally developed for recognizing speech that is similar but of unequal length; similar time series of unequal length may be recordings of the same word. Therefore, DTW is good at recognizing the similarity between time series that are similar but unaligned in the timeline. However, DTW can make alignment mistakes due to local noise in the time series. To overcome the impact of local noise on DTW applications while following strict boundary conditions, we propose the PTS approach to cut the time series of different stocks' temporal features, ensuring that all the time-series samples have the same number of fluctuations.

The peak and trough points are extracted to divide the fluctuations of the time series (see Figure 7). A definition of an eligible fluctuation is as follows:

Given a sequence of values v1 … vk, vk ∈ R, its turning point collection P = {Pp, Pt} is the set of all peak points and trough points. The difference between the values of a peak point Pa and a trough point Pb is denoted Da,b. If Da,b is greater than or equal to the given constant δ, the subsequence between Pa and Pb is considered an eligible fluctuation Fa,b.

Different constants δ will lead to completely different divisions of fluctuations in the same time series. As shown in Figure 8, the instance is divided into two eligible fluctuations (F1,5 and F5,30) when δ = 0.5, but if δ = 1.0, the same instance will be divided into two eligible fluctuations (F1,20 and F20,30).

Definition 4. PTS (Peaks and Troughs Time-series Segmentation). Given the input sequence X = [x1 … xm] with length m, for convenience, X[i] is used to represent the i-th element in sequence X, X[i] = xi, 0 ≤ i ≤ m. A peak and trough deviation threshold value δ is used to judge whether a subsequence is an eligible fluctuation, and the number of eligible fluctuations n_ef is used to find the split points and control the length of the segmentation. We can split the sequence X as follows:

• Step 1: Backtrack the sequence from its most recent point to find a peak X[i] and the trough X[j] next to it.

• Step 2: If the deviation between the value of X[i] and the value of X[j] is larger than the threshold value δ, similar to the peak and trough deviation marked by the red line in Figure 9, then we take this peak and trough as an eligible fluctuation. If the deviation is not larger than δ, similar to the deviation marked by the blue line in Figure 9, then we go on backtracking to find the next trough that meets the condition.

• Step 3: Continue backtracking the sequence to find a new peak next to the trough obtained in Step 2, and repeat Step 2 until the number of eligible fluctuations reaches n_ef (the eligible fluctuation number set according to the segmenting demand). We suppose the last peak we find is X[end]; then X[end…start] is the sequence used as input to the DTW approach.

The entire PTS process can also be described by Algorithm 1. PTS can ignore tiny fluctuations when cutting the sequences by tuning the threshold value δ, so that only obvious fluctuations form the basis of the cutting approach, which yields more accurate similar sequences for DTW. Figure 10 shows the comparison between the original time series and the target time series cut by peak and trough segmentation with different δ and n_ef.
The PTS has two parameters; δ can be decided by the user's psychological anticipation of the minor fluctuations the user wants to ignore. For example, in the financial market, it could be the psychological endurance range. At the experimental level, δ also controls the granularity and avoids time series with a large granularity gap matching each other. The parameter n_ef is used to control the number of eligible fluctuations. Both parameters control the length of the history data used to analyze the correlation. Because the correlation changes over different periods, the length of the history data should be in a proper range. Generally, the values of the two parameters are adjusted through the experimental results; the process of adjusting the parameters is illustrated in Section 4.2.
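The backtracking procedure can be sketched in a few lines. This is a simplified illustration, not the paper's Algorithm 1: the turning-point detection, the comparison of consecutive qualifying turning points, and the handling of an exhausted sequence are our own assumptions.

```python
def turning_points(seq):
    """Indices of local peaks and troughs found by scanning the raw sequence."""
    pts = []
    for i in range(1, len(seq) - 1):
        if seq[i - 1] < seq[i] > seq[i + 1] or seq[i - 1] > seq[i] < seq[i + 1]:
            pts.append(i)
    return pts

def pts_segment(seq, delta, n_ef):
    """Backtrack from the most recent point, counting fluctuations whose
    peak-trough deviation reaches delta; return the tail segment.
    If fewer than n_ef eligible fluctuations exist, the whole sequence
    is returned."""
    idx = turning_points(seq) + [len(seq) - 1]
    count = 0
    cut = 0
    prev = len(seq) - 1
    for i in reversed(idx[:-1]):
        # Deviation at least delta -> one more eligible fluctuation.
        if abs(seq[i] - seq[prev]) >= delta:
            count += 1
            prev = i
            if count == n_ef:
                cut = i
                break
    return seq[cut:]

print(pts_segment([1, 5, 1, 5, 1, 5], delta=3, n_ef=2))  # prints [5, 1, 5]
```

Tuning δ upward makes the count skip small wiggles, which is exactly how PTS ignores tiny fluctuations before handing the segment to DTW.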


Algorithm 1: PTS
Input: Sequence, the original time series; δ, the peak and trough deviation threshold value; n_ef, the number of eligible fluctuations.
Output: newSequence
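Since only the input/output signature of Algorithm 1 is reproduced above, the following Python sketch illustrates one plausible reading of PTS: cut the series at local peaks and troughs, keep only extrema that deviate from the previously kept extremum by more than δ, and stop once n_ef eligible fluctuations have been collected. The function name and the exact eligibility test are assumptions, not the paper's verbatim algorithm.

```python
def pts_segment(sequence, delta, n_ef):
    """Peaks and troughs time-series segmentation (PTS), sketched.

    Cuts the series at local peaks and troughs, but keeps only
    'eligible' extrema whose deviation from the previously kept
    extremum exceeds delta, so tiny fluctuations are ignored.
    Stops once n_ef eligible fluctuations have been collected,
    bounding the length of history data used for the analysis.
    """
    cut_points = [0]          # indices where the series is cut
    last_kept = sequence[0]   # value at the last eligible extremum
    eligible = 0
    for i in range(1, len(sequence) - 1):
        is_peak = sequence[i - 1] < sequence[i] > sequence[i + 1]
        is_trough = sequence[i - 1] > sequence[i] < sequence[i + 1]
        if (is_peak or is_trough) and abs(sequence[i] - last_kept) > delta:
            cut_points.append(i)          # an eligible fluctuation
            last_kept = sequence[i]
            eligible += 1
            if eligible >= n_ef:
                break
    # newSequence: the sub-series between consecutive cut points
    return [sequence[cut_points[j]:cut_points[j + 1] + 1]
            for j in range(len(cut_points) - 1)]
```

For instance, with δ = 0.5 the series [1, 1.1, 1, 1.1, 1] produces no eligible fluctuations, while [1, 3, 1, 3, 1] is cut at every extremum.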

Time-Sensitive Composite Similarity Model
When we refer to the similarity of two stocks, the most intuitive expression is that if they 'rise or fall together (roft)' frequently, then they are more likely to be similar. Therefore, we take the number of days rising or falling together in the same period of time as one attribute of similarity, which is one of the traditional measures of stock correlation. Obviously, the similarity of two stocks and the number of days rising or falling together are proportional. The number of days of two stocks rising or falling together can be calculated by the sequential data of stock change, and the process is described by Algorithm 2.
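Algorithm 2 itself is not reproduced here; a minimal Python sketch of the roft count, assuming the daily change sequences of both stocks over the same period are available, could look like this (the function name and sign convention are illustrative):

```python
def roft_days(changes_a, changes_b):
    """Count the days two stocks rise or fall together (roft).

    changes_a and changes_b hold the daily price changes of the two
    stocks over the same period (positive = rise, negative = fall).
    A day is counted when both changes have the same sign.
    """
    return sum(1 for da, db in zip(changes_a, changes_b) if da * db > 0)
```

For example, roft_days([1, -1, 2, 0], [2, -3, -1, 1]) returns 2: the stocks rise together on the first day and fall together on the second.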
However, the number of days rising or falling together is not enough to represent all similar situations. For example, if one stock rose for 3 days and then fell, while another similar stock began to rise 2 days later and rose for 3 or more days before falling, then the number of days rising or falling together may be only 1, even though the trend curves are similar well beyond that single day, as in the situation shown on the left side of Figure 11. The two lines have very similar trend curves, but they are not aligned on the timeline. We need DTW to align the most similar points and compute the distance between the two similar lines, which is shown on the right side of Figure 11. Generally, we use the closing price to analyze the price trend or predict the future stock price, but the stock market is very complex, and considering only the closing price cannot reflect the entire situation of a stock. It is very difficult to find the real relationship using only one attribute; many other sequential features also influence stock price trends, and the curves of different features of the same stock can be very different, as shown in Figure 12. Examples of daily raw stock data are shown in Table 1.
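The DTW alignment described above can be computed by standard dynamic programming; this is a textbook implementation for illustration, not code from the paper:

```python
def dtw_distance(s, t):
    """Dynamic time warping distance between two numeric sequences.

    The alignment may stretch either sequence in time, so curves that
    are similar in shape but shifted or unequal in length still obtain
    a small distance, unlike a point-by-point (Euclidean) comparison.
    """
    n, m = len(s), len(t)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            # extend the cheapest of the three admissible warping steps
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]
```

For example, dtw_distance([1, 2, 3], [1, 2, 2, 3]) is 0.0 because the repeated value can be matched twice, whereas a point-by-point comparison of these unequal-length series is not even defined.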
Part of Table 1. Examples of daily raw stock data:

field | meaning | example
volume_ratio | volume ratio | 2.200
pe | price-to-earnings ratio | 9.922
pe_ttm | price-to-earnings trailing twelve months ratio | 13.49
pb | price-to-book ratio (total market value/net assets) | 0.860
ps | price-to-sales ratio | 0.220
ps_ttm | price-to-sales trailing twelve months ratio | 0.210

Only one feature's DTW distance cannot represent the similarity of two stocks, so we decided to combine multiple features' DTW distances and the rise-or-fall-together count to obtain a composite similarity between two stocks. Because the similarity is proportional to the number of days rising or falling together and inversely proportional to the DTW distance, we define the similarity as follows:

Similarity = roft / (λ_1 · DTW(feature_1) + λ_2 · DTW(feature_2) + ... + λ_n · DTW(feature_n))  (1)

In Formula (1), roft is the number of days rising or falling together in the same period of time; λ_1 ... λ_n are the weights of the different sequential features in the similarity, with λ_1 + ... + λ_n = 1; and DTW(feature_1) ... DTW(feature_n) are the DTW distances between the target stock and the benchmark stock over the sequences of the different temporal features (feature_1 ... feature_n). DTW distances in the composite similarity model are used only to describe the degree of similarity of different temporal features; the matching path between sequences is not considered in this model. With this similarity formula, we can choose different sequential features of stocks to combine and use λ to tune the weight of each feature in the composite similarity, making the similarity closer to reality.
The whole process of obtaining the composite similarity is shown in Figure 13. The similarity obtained from the composite similarity model is a relative value that is used to compare stocks similar to the benchmark stock. A single stock's similarity value is meaningless on its own; it becomes meaningful only when compared with other similar stocks' similarity values. If one stock's similarity is larger than another's, then that stock is more similar to the benchmark stock than the other is. We will evaluate the similarity model and compare it with other similarity measures in the next section.
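Assuming Formula (1) takes the form of roft divided by the weighted sum of per-feature DTW distances, as described in the text above, the combination step can be sketched as follows (the function name is an assumption):

```python
def composite_similarity(roft, dtw_distances, weights):
    """One reading of Formula (1): similarity proportional to roft and
    inversely proportional to the weighted sum of the per-feature DTW
    distances. weights are the lambdas and must sum to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9, "lambdas must sum to 1"
    weighted = sum(w * d for w, d in zip(weights, dtw_distances))
    return roft / weighted
```

For example, composite_similarity(10, [2.0, 3.0], [0.5, 0.5]) returns 10 / 2.5 = 4.0. A larger roft or smaller feature distances both increase the similarity, matching the proportionality stated above.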

Performance Evaluation
In this section, the DTW-based composite similarity model proposed in the previous section is applied to a stock database containing the basic daily information of the CSI 300 stocks collected from Tushare Pro (https://tushare.pro/, accessed on 1 February 2021). We first introduce our dataset and experimental settings. Then, we analyze the outputs of the similarity computation. Finally, we compare the result of this model with the similarity calculated only by the DTW of the closing price and the similarity calculated by the rising-or-falling-together number.

Experimental Setup
For evaluation, the stock data we use are the stocks in the CSI 300 Index, whose samples are selected from the Shanghai and Shenzhen stock markets, cover most of the market capitalization and can reflect the income of mainstream investment in the market. We use all stocks in the CSI 300 and collect their basic daily information, including the closing price, turnover rate, volume ratio, price-to-earnings (PE) ratio, price-to-earnings trailing twelve months (PETTM) ratio, price-to-book (PB) ratio, price-to-sales (PS) ratio and price-to-sales trailing twelve months (PSTTM) ratio, as the features to compute the composite similarity. Part of the stock list is shown in Table 2. As there are too many columns in a grid of stock quotation data, only some of them are shown in Table 3.

Stock data from 1 January 2018 to 31 December 2018 (Group 1) and from 1 January 2019 to 31 December 2019 (Group 2) are used to compute the similarity. The data from 2 January 2019 and 2 January 2020 are used to verify whether the similar stocks obtained from the composite similarity model rise or fall together with the stock chosen as the benchmark. The stock of Sinopec, whose stock code is '600028.SH', is chosen as the benchmark; this is simply a case to show how the model works, and any other stock can certainly be chosen as the benchmark according to investment preference. The composite similarities of the other 299 stocks in the CSI 300 are computed by the composite similarity model.
For comparison, we also use the number of days rising or falling together in 2019 and the DTW of the stock closing price as the similarity of stocks. The similarities of the stocks are obtained from the composite similarity model and these two methods, and we compare the rise-or-fall-together rates after computing the similarity to see whether the model is effective.
The rise-or-fall-together rate is obtained from Formula (2):

roftrate = (num(roft_t1) + ... + num(roft_tn)) / (n · n_s)  (2)

In Formula (2), roftrate is the rise-or-fall-together rate; roft_t1 ... roft_tn are the sets of stocks whose prices rise or fall together with the benchmark stock on days t_1 ... t_n; num() is the method that obtains the number of stocks in a stock set; and n_s is the number of samples we choose to compute the rise-or-fall-together rate. When n_s is 299, we obtain the average rise-or-fall-together rate of the whole sample.
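Assuming Formula (2) sums the per-day counts num(roft_ti) over the n days and normalizes by the sample size n_s, a short sketch is (names are illustrative):

```python
def roft_rate(roft_sets, n_s):
    """One reading of Formula (2): the average fraction of the n_s
    sample stocks that rise or fall together with the benchmark over
    the days t_1 ... t_n. roft_sets[i] is the set of stocks that
    moved with the benchmark on day t_i."""
    n = len(roft_sets)
    return sum(len(day_set) for day_set in roft_sets) / (n * n_s)
```

For a single day on which 250 of the 299 sample stocks move with the benchmark, roft_rate gives 250/299, roughly 0.84.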

Results and Discussion
In Group 1, we set δ = 0.3 in the PTS to ignore minor fluctuations and n_ef = 10 to ensure that all the sequences have a similar number of eligible peaks and troughs and that the lengths of the sequences are appropriate. We set the weights of the eight features, closing price, turnover rate, volume ratio, PE ratio, PETTM, PB, PS and PSTTM, as (0.1, 0.1, 0.3, 0.1, 0.1, 0.1, 0.1, 0.1), with which we experimented many times to obtain preferable results. After cutting the sequences and computing the composite similarity, the 299 stocks' degrees of similarity with the stock '600028.SH' are obtained and sorted in descending order. The top ten similarity results are shown in Table 4, and the bottom ten are shown in Table 5. Similarity calculated only by the DTW distance of the closing price and similarity calculated by the number of days rising or falling together are used to compare the experimental results. On the transverse side, we compare the rise-or-fall-together rate in the top 100 and top 150 similar stocks; the results are shown in Table 6 and Figure 14. Longitudinally, we compare the top 100 rise-or-fall-together rates at 1 day, 2 days and 3 days after computing the similarity, as shown in Table 7 and Figure 15. The stocks are sorted by similarity in descending order, and the stocks at the top are the most similar to the benchmark stock. If the benchmark stock's trend is used to predict the trend of similar stocks, then the prediction accuracy, which is also the rise-or-fall-together rate, reaches 83% in the top 100 similar stocks and 75.3% in the top 150 similar stocks when the similarity is obtained by the composite similarity model. This result is better than the 76% in the top 100 and 74.6% in the top 150 obtained when the similarity is the number of days rising or falling together. Both of these results are higher than the average rise-or-fall-together rate of the whole sample.
The composite similarity was 7.1% better than the average rate of the whole sample. However, if only the DTW distance of the closing price is chosen as the similarity measure, then the rise-or-fall-together rate is lower than the average rate of the whole sample. In the financial market, there is a time difference between 'buy' and 'sell' investment behavior, for example, the 'T+1' trading rule in the Chinese stock market, under which a stock bought on day T can only be sold on day T+1. This makes predictions over a continuous period of time worthwhile as well. Therefore, whether the similarity obtained by the different models lasts for a few days is also taken into account. Table 7 shows that if the stocks are sorted by the similarity calculated by the composite model in descending order, then the rate of the same trend in the top 100 stocks reaches 83% on the first day, 50% on the second day and 49% on the third day after calculating the similarity. If the number of days rising or falling together is taken as the similarity, then the rate of the same trend in the top 100 similar stocks reaches only 76% on the first day, and all the rates are lower than those of the composite model but higher than the average rate of the whole sample. As the days pass, all three measures' rates of the same trend decrease but remain above the average rate. Only the DTW distances of the closing price time series fall below the average rate, which may be because the fluctuations of stocks in this time period are complex and affected by multiple features. A single feature's DTW distance cannot cluster similar stocks well, so its top 100 and top 150 similar stocks could not attain a better rise-or-fall-together rate than the average rise-or-fall-together rate. However, the composite similarity measure stays above the other two measures and the average rate of the whole sample over time.
The longitudinal comparison results show that on the first prediction day, both the composite similarity and the similarity measure of the number of days rising or falling together obtain higher accuracy than the overall average same-trend rate, and the composite similarity obtains more accurate predictions than the other two traditional similarity measures.
The horizontal comparison results show that the composite similarity could obtain a more accurate prediction than the other similarity measures not only on the first prediction day but also on the second and third prediction days. This means that the composite similarity is more time sensitive. In fact, the relationship between different stocks is continuous, which supports researchers using the stock correlation to predict stock trends.
In Group 1, although the values of δ and n_ef in the PTS are roughly set the same for all the features, we still carried out a few experiments to obtain appropriate values that yield good results. We found that if n_ef is too large, PTS loses its effect and the time series is not segmented. We tried to retain as many eligible fluctuations as possible in the time period and, after observing the range of the data, finally took n_ef = 10. The value of δ is adjusted through the experimental results, that is, the top 100 rise-or-fall-together rates at 1 day, 2 days and 3 days. The experimental results vary with the value of δ as shown in Table 8, where the weights of the eight features are (0.1, 0.1, 0.3, 0.1, 0.1, 0.1, 0.1, 0.1). The variation in the experimental results with the value of δ when n_ef = 10 is shown in Figure 16. It is easy to see that as the value of δ goes up, the 1-day result rises with mild oscillations, peaks at approximately δ = 0.30, and then falls. So, we set n_ef = 10 and δ = 0.30 in the experiment. The weights of the eight features in Group 1 were not chosen based only on subjective considerations; they were adjusted according to the experimental results. First, we set each feature's weight to 1/8, and the results were not satisfactory. Then, considering that different features may dominate the time-series fluctuation in different periods, and also wanting to find the most effective feature in a period, we tried up-weighting one feature at a time and checked whether it improved the accuracy.
Different weights of the features and the experimental results of the top 100 rise-or-fall-together rates at 1 day, 2 days and 3 days are shown in Table 9; in these experiments, n_ef = 10 and δ = 0.30. The weights of the eight features that obtained the best accuracy were chosen. The reason why we chose eight features in this composite model is that the fluctuations of multivariate time series and the correlation between multivariate time series are affected by multiple features; a single feature's similarity cannot reflect the similarity between the entities of multivariate time series.
The number of features is not fixed, but we decided upon the number of temporal features that may be related to the target relationship that we want to analyze. Eight is not a threshold value; theoretically, any multivariate time series which have more than one temporal feature could use our composite model to find similar entities. To verify the generality of our model, experiments will be carried out on the other dataset in Section 4.3.
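The weight search described above, uniform weights first and then up-weighting one feature at a time while keeping the weights summing to 1, can be sketched as follows (the helper name and the boosted value are assumptions; with boosted = 0.3 and eight features, the remaining weights come out to 0.1 each, matching the vectors used in Group 1 and Group 2):

```python
def candidate_weight_vectors(n_features, boosted=0.3):
    """Weight vectors tried in the search described above: the uniform
    vector 1/n first, then one vector per feature in which that feature
    is up-weighted to `boosted` while the rest share the remainder
    equally, so every vector still sums to 1 (the lambda constraint)."""
    candidates = [[1.0 / n_features] * n_features]
    rest = (1.0 - boosted) / (n_features - 1)
    for k in range(n_features):
        w = [rest] * n_features
        w[k] = boosted
        candidates.append(w)
    return candidates
```

Each candidate vector is then scored by the resulting top 100 rise-or-fall-together rate, and the best-scoring vector is kept, which is how the vectors reported in Tables 9 through 11 were selected.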
Because different features lie in different ranges, we decided to segment the features separately. In Group 2, we set δ and n_ef separately for each feature to reach a more accurate rise-or-fall-together rate, as shown in Table 10. With these variables' values, the benchmark stock's time series before and after cutting are shown in Figure 17. The weights of the eight features, closing price, turnover rate, volume ratio, PE ratio, PETTM, PB, PS and PSTTM, are set as (0.1, 0.1, 0.1, 0.1, 0.3, 0.1, 0.1, 0.1). This time, we obtained better experimental results on 2 January 2020, which are shown in Table 11. In Group 2, the composite similarity again achieved the best rise-or-fall-together rate, 9.4% better than the average rate of the whole sample; this also exceeds the 7.1% of Group 1, which means that tuning the variables' values in the PTS proposed in this paper helps to obtain a more accurate rise-or-fall-together rate.
All experiments in two groups show that the composite similarity model can effectively cluster similar stocks together through a time-series correlation analysis to help investors adjust their portfolios.

Verification of Generality
To verify the generality, we tested our model on real weather data of 168 Chinese cities collected by AkShare (https://akshare.xyz/, accessed on 24 April 2021). The structure of the daily weather data is shown in Table 12. The temporal features chosen for the experiment and their meanings are shown in Table 13. In the financial market, we care about the price of stocks, so we use the closing price to compute roft in Formula (1) and roftrate in Formula (2); in the weather data, we chose the feature temp. The city of Beijing was chosen as the benchmark. Daily weather data from 1 January 2020 to 31 December 2020 were used to compute the composite similarity, and the rise-or-fall-together rate of temperatures in the top 50 similar cities on 1 January 2021 was used to verify whether similar cities' temperatures change consistently with the benchmark city of Beijing. In this experiment, we set n_ef = 30 and δ = 1. The weights of the features, PM2.5, PM10, NO2, CO, O3, SO2, temperature and humidity, were set as (0.1, 0.1, 0.1, 0.3, 0.1, 0.1, 0.1, 0.1). The traditional similarity measures, the pure DTW distance of temperatures and the number of days temperatures rise or fall together, were also chosen as comparison methods.
The top ten cities similar to Beijing are shown in Table 14. The rise-or-fall-together rate on 1 January 2021 in the top 50 similar cities is shown in Table 15. It is obvious that the composite similarity achieves better results than the other similarity measures on the weather data. The experiment on weather data verifies that the proposed composite similarity model is effective not only for financial multivariate time-series data but also for other multivariate time-series data.

Conclusions
In this paper, we studied the correlation between stocks to provide helpful references for investors adjusting investment portfolios and proposed a composite similarity model that combines many different sequential features of stocks. The composite model was then compared with other similarity-computing methods to verify its effectiveness and practicability. The results show that the composite model obtains more accurate clusters than many traditional similarity measures. When adjusting investment portfolios, investors could take an uptrending stock as a benchmark to buy similar stocks, and when a stock's price is going down, investors could sell similar stocks in their portfolios. The composite similarity model can help investors find similar stocks according to historical data and adjust portfolios quickly. The model can also be used to find the most effective feature that dominates the fluctuation in a period. Experiments on other datasets also proved that the composite similarity model can be used to research multivariate time series in different fields. However, only eight common temporal features were used in the composite model. Finding more useful stock features, tuning the weights of different features to reach more accurate results and finding other functional forms to describe the relationship between time-series similarity and temporal features could be directions for future research.