A Two-Factor Fuzzy-Fluctuation Time Series Forecasting Model for Stock Markets Based on a Probabilistic Linguistic Preference Relationship and Similarity Measure

An increasing number of scholars have tried to incorporate external factors affecting the disturbance of a time series into their forecasting models. However, these studies only verify the linkage relationship of two or more time series by empirical tests without providing any theoretical explanation. This makes it difficult to choose a linkage time series without using many tests. In this paper, a novel two-factor fuzzy-fluctuation time series (FFTS) forecasting model is proposed based on the probabilistic linguistic preference relationship (PLPR) and similarity measure. It not only proposes the idea of combining external factors with internal potential trends but also explains the linkage mechanism of time series fluctuations from the perspective of behavioral preference. Specifically, the probabilistic linguistic preference logical relationship (PLPLR) is employed to express the fluctuation behavior rule and preference attribute from the historical training dataset. The Euclidean distance or Hamming distance between the “current state” and the left side of training PLPLRs is introduced as a similarity comparison method for the identification of appropriate rules. The proposed model is tested using a traditional time series (e.g., the enrollment of the University of Alabama) to compare its performance with existing models. The model is also employed to forecast realistic stock markets, such as the Taiwan Stock Exchange Capitalization Weighted Stock Index (TAIEX) and Hang Seng Index (HSI). The performance comparison illustrates the effectiveness and universality of the model.


I. INTRODUCTION
Forecasting fluctuations in a time series has always been a hot topic in industry and academia. With continuous changes in the environment and the growing complexity of the problems, forecasting methods based on traditional mathematical models [1], [2] require a large volume of historical data, which makes the forecasting results unsatisfactory. Song and Chissom [3]-[5] proposed the concept of fuzzy time series (FTS) based on fuzzy set theory [6] to address complex forecasting problems and employed the enrollment rate data from the University of Alabama to conduct experiments. Subsequently, Chen [7] presented an arithmetic operation method that simplifies the complicated max-min composition operator model based on fuzzy time series theory. Then, Chen [8] developed the model into a high-order fuzzy time series category to express more detailed historical fluctuation information. Based on Chen's model, many high-order fuzzy time series forecasting models have been proposed and successfully applied in various fields, such as enrollment rate forecasting [8]-[11], artificial intelligence [12]-[14], temperature forecasting [15]-[19] and stock price forecasting [16], [17], [19]-[34].
The associate editor coordinating the review of this manuscript and approving it for publication was Chun-Hao Chen.
In recent years, multifactor fuzzy time series models have drawn remarkable attention from researchers who have employed foreign market factors to construct multifactor fuzzy time series to innovate the original model [19], [24], [29]-[32], [34]-[36]. Many scholars have also presented multifactor fuzzy time series forecasting models to more accurately express internal fluctuation information. For example, Wang and Chen [16] and Lee et al. [17] presented a two-factor fuzzy time series forecasting method using daily average temperature and daily cloud density, and they used the presented method to predict the Taiwan Futures Exchange Index (TAIFEX). Singh and Borah [18] presented a two-factor high-order fuzzy time series method to predict daily average temperature. Kumar [32] presented an improved weighted forecasting method based on fuzzy logic relations, and the model used the opening and high prices of the Bombay Stock Exchange Index (BSE) as a two-factor fuzzy time series. Zhao et al. [24] presented a two-factor time series forecasting method based on neutrosophic set correlation theory using stock market transaction price and transaction volume. Guan et al. [34] presented a multi-attribute time series forecasting method based on stock market trading volume and trading price. However, these models do not clarify the influence mechanism of multiple factors on time series; rather, they verify the linkage relationship between factors from an empirical perspective.
The complexity and inconsistency of forecasting require continuous exploration of the internal rules of time series to improve forecasting accuracy. The method of multi-attribute decision-making provides a new idea for fuzzy time series forecasting. For example, inspired by the neutrosophic set (NS) theory [37] proposed by Smarandache, Zhao et al. [24] introduced NS and information entropy (IS) into the forecasting field to more accurately express the internal rules of the stock market and avoid the lack of prediction rules. However, Zhao's model can only represent the three dimensions of up, inconsistency and down, and it cannot express more historical details, such as the fluctuation range. Then, Zhao et al. [33] introduced the five-dimensional probabilistic linguistic term set (PLTS) [38] to reflect the fluctuation characteristics of fuzzy time series and achieved a good prediction effect. However, that model can only express the fluctuation of the time series itself and does not consider the interference of other influencing factors on the time series, so its forecasting accuracy is limited. The probabilistic linguistic preference relationship (PLPR) is widely used in the field of multi-attribute decision-making [39]-[41] but rarely used in the field of prediction.
In this paper, the fluctuations of a time series are considered complex behaviors, whose fluctuation rules are influenced by a kind of preference. In addition to the time series itself, another time series that can reflect the preference of the time series is also included in the forecasting model. To represent the fluctuation trend and the preference information, we introduce the PLPR into the forecasting model. The PLPR was first proposed by Zhang et al. [41] in 2017; it provides more than one linguistic term about linguistic variables and reflects different importance degrees of the possible preference values. Distance measurement methods are also employed to locate the proper PLPR for future forecasting. The contributions of the model are mainly as follows:
(1) The PLPR is employed to express both the historical fluctuation rules and preference properties, which makes the model more interpretable.
(2) From the innovative perspective of behavior and preference theory, this paper reveals the internal law of time series fluctuation, provides a theoretical basis for exploring the linkage mechanism of multiple datasets, and provides theoretical support for the selection of relevant linkage datasets.
(3) The introduction of similarity comparison methods provides a quantitative comparison basis of complex rules. It provides a useful reference for deepening the expression and comparison methods for the conclusion of complex rules.
To verify the validity of our model, we apply the method to forecast enrollment at the University of Alabama, the Taiwan Stock Exchange Capitalization Weighted Stock Index (TAIEX) and the Hang Seng Index (HSI). Experiments show that the proposed method outperforms the existing methods.
The rest of this paper is organized as follows. The second part introduces the basic concepts of fuzzy fluctuation time series (FFTS) and probabilistic linguistic-related theories. The third part introduces the forecasting method based on probabilistic linguistic preference logical rules. The fourth part is the empirical test of the proposed model. The fifth part summarizes the conclusions.

B. PROBABILISTIC LINGUISTIC
Zhang et al. [41] proposed a PLPR based on a PLTS. In this part, we introduce and extend the relevant theories of probabilistic linguistics.
Definition 5 (Probabilistic Linguistic Term Set and Its Conversion): Let $S = (s_0, s_1, \ldots, s_\tau)$ be a linguistic term set (LTS). A probabilistic linguistic term set (PLTS) is a collection of probabilistic linguistic variables (PLVs) $s_\alpha(p_\alpha)$, $\alpha = 0, 1, \ldots, \tau$, where each PLV is the linguistic term $s_\alpha$ associated with the probability $p_\alpha$. Suppose $(S(t-1), S(t-2))$ is the “current state” of a one-factor second-order FFLR; it can be converted to a PLTS by setting $p_\alpha = (w_1 + w_2)/2$, where $w_i = 1$ if the subscript of $S(t-i)$ is equal to $\alpha$ and $w_i = 0$ otherwise.
Example 1: Let $\tau = 4$ as defined in Definition 1. The “current state” of a one-factor second-order FFLR $(s_2, s_0) \rightarrow s_3$ can be converted to a PLTS $(s_0(0.5), s_1(0), s_2(0.5), s_3(0), s_4(0))$. For convenience, the “current state” of the FFLR can be represented by the probability of each linguistic variable of an LTS in order. For example, the above PLTS can be simplified to the representation $(0.5, 0, 0.5, 0, 0)$.
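The conversion in Definition 5 can be illustrated with a minimal Python sketch. The function name `state_to_plts` and the representation of a state as a tuple of term subscripts are illustrative assumptions, not part of the original model description; the sketch simply counts how often each subscript occurs in the “current state” and divides by the order.

```python
from collections import Counter


def state_to_plts(state, tau=4):
    """Convert a 'current state' to a PLTS probability vector.

    state: tuple of linguistic-term subscripts, e.g. (2, 0) for (s_2, s_0).
    tau:   largest subscript of the LTS (tau = 4 gives five terms s_0..s_4).
    Returns the probabilities of s_0..s_tau in order.
    """
    counts = Counter(state)   # how many times each subscript appears
    order = len(state)        # 2 for a second-order FFLR
    return [counts.get(alpha, 0) / order for alpha in range(tau + 1)]


# Example 1: the current state (s_2, s_0)
print(state_to_plts((2, 0)))  # [0.5, 0.0, 0.5, 0.0, 0.0]
```

Because only the frequency of each subscript matters, the states $(s_2, s_0)$ and $(s_0, s_2)$ map to the same PLTS under this conversion.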

Definition 6 (Probabilistic Linguistic Preference Relationship):
Let $E = \{e_1, e_2\}$ be a set of parameters related to the fluctuation behavior and the corresponding preference of a time series. A pair $B(t) = (R_1, R_2)$ is called a PLPR, which represents the “current state” of the two parameters.
Example 2: Let $E = \{e_1, e_2\} = \{\text{price}, \text{volume}\}$ be the parameter set reflecting the fluctuation behavior and preference of a stock market. Then, a PLPR $B(t)$ can be represented by the pair $(R_1, R_2)$ of PLTSs for price and volume. For convenience, the PLPR can be represented by the probability of each linguistic variable of an LTS in order.
Definition 7 (Probabilistic Linguistic Preference Logical Relationship): Let $S(t)$ $(t = n+1, n+2, \ldots, T,\ n \ge 1)$ be an FFTS and $B(t)$ be the “current state” of a PLPLR. The “next state” of the PLPLR is generated by converting the FFLRG, that is, all the “next states” of the FFLRs with the same $B(t)$, to a PLTS $C(t)$. In this way, the FFLRs can be represented by a PLPLR $B(t) \rightarrow C(t)$.

Definition 8 (Hamming-Hausdorff Distance and Euclidean-Hausdorff Distance): Let $B(t)$ and $B'(t)$ be any two PLPRs, with components indexed by $\alpha = 0, 1, \ldots, \tau$. The similarity measurement between $B(t)$ and $B'(t)$ is defined by Equation (6), which depends on a parameter $\gamma$. If $\gamma = 1$, then Equation (6) reduces to the Hamming-Hausdorff distance, and it is the Euclidean-Hausdorff distance when $\gamma = 2$.
Definition 9 (Score Function): Let $R(t) = \{ s_j(p_j) \mid j = 0, 1, \ldots, h \}$ be a PLTS. The score function of $R(t)$ is denoted $S(R(t))$.

III. A NOVEL FORECASTING MODEL BASED ON PLPLR
The stock market is a complex nonlinear system, and coupled with the limited and irrational nature of human thinking, it is difficult to predict its fluctuation trend accurately. Stock prices are often an effective reflection of investors' investment behavior, and investors' behavior is influenced by related preference information. Therefore, studying the behavioral preferences of investors is of great significance to predicting stock prices. The preference behavior of investors is affected by multiple internal and external factors, such as the economic operating environment, the price trend of the international market, the opening price, and the trading volume at the beginning of the week. The prediction model established in this paper based on the PLPR can effectively reflect the preference interference of such internal and external factors and effectively express its historical fluctuation trend.
Trading volume usually affects a trader's decision, and this holds for both buying and selling. The empirical part of this paper establishes a two-factor model by using trading volume as the second factor, aiming to explore preference in the stock market. Existing studies have shown a positive correlation and a dynamic causal relationship between the trend of stock prices and trading volume [43], [44]. Since the transaction volume contains many factors that are not included in the transaction price, the transaction volume can provide useful information for predicting future stock price fluctuations [45], [46]. At the same time, trading volume has a strong explanatory effect on stock prices [47]. Therefore, using trading volume as the second factor can effectively illustrate our proposed model and verify the preference problem in the stock market.
In this paper, we propose a novel forecasting model based on PLPLRs and similarity measures. As shown in Figure 1, our model consists of three phases: fuzzification, modeling, and defuzzification and forecasting. The historical data of TAIEX 2004 are divided into two parts. We take TAIEX from January to October 2004 as the training data and November and December as the testing data. The transaction price and volume in the dataset are taken as the main factor and secondary factor, respectively. The detailed methods are shown in the following steps.
Step 3: The “current state” of each two-factor second-order FFLR can be represented by a PLPR $B(t)$.
Step 4: Convert the FFLR to PLPLR. Each “current state” of the FFLRs can be represented by a PLPR $B(t)$. Then, we can generate $C(t)$ for different $B(t)$ according to Definition 7. Thus, the FFLRs for the historical training data are converted into PLPLRs.
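Steps 3 and 4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`to_plts`, `to_plpr`, `build_plplrs`) are hypothetical, the toy FFLRs are invented for the example, and $C(t)$ is assumed to be the relative frequency of the “next states” that share the same $B(t)$, by analogy with the conversion in Definition 5.

```python
from collections import Counter, defaultdict

TAU = 4  # assumed: five linguistic terms s_0..s_4


def to_plts(subscripts, tau=TAU):
    """Relative frequency of each term subscript, as a tuple over s_0..s_tau."""
    counts = Counter(subscripts)
    n = len(subscripts)
    return tuple(counts.get(a, 0) / n for a in range(tau + 1))


def to_plpr(main_state, secondary_state, tau=TAU):
    """PLPR B(t): one PLTS for the main factor, one for the secondary factor."""
    return (to_plts(main_state, tau), to_plts(secondary_state, tau))


def build_plplrs(fflrs, tau=TAU):
    """Group FFLRs by their PLPR and convert each group of next states to C(t).

    fflrs: iterable of ((main_state, secondary_state), next_subscript).
    Returns a dict mapping B(t) -> C(t).
    """
    groups = defaultdict(list)
    for (main_state, secondary_state), next_subscript in fflrs:
        groups[to_plpr(main_state, secondary_state, tau)].append(next_subscript)
    return {b: to_plts(nexts, tau) for b, nexts in groups.items()}


# Hypothetical toy FFLRs (not taken from the paper's tables)
toy_fflrs = [
    (((2, 2), (2, 0)), 3),
    (((2, 2), (0, 2)), 1),  # same PLPR as the first FFLR
    (((1, 3), (4, 4)), 2),
]
rules = build_plplrs(toy_fflrs)
for b, c in rules.items():
    print(b, "->", c)
```

In this toy run, the first two FFLRs share the PLPR ((0, 0, 1, 0, 0), (0.5, 0, 0.5, 0, 0)), so their next states $s_3$ and $s_1$ are merged into one right-hand side with probability 0.5 each.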

C. PHASE C: DEFUZZIFICATION AND FORECASTING
Step 5: For the observed point $B(i)$ in the testing data, we use a PLPR $B(t)$ to represent the historical fuzzy fluctuation trend. The similarity measure is employed to find the best probabilistic linguistic preference logical relationship (PLPLR) $B(t) \rightarrow C(t)$. From the “next state” $C(t)$ of the corresponding PLPLR, we obtain the probabilities of down, slightly down, equal, slightly up and up. The pseudocode of the rule generation algorithm is shown in Figure 2.
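The rule lookup in Step 5 can be sketched as a nearest-neighbor search over the left-hand sides of the trained PLPLRs. The sketch below is an assumption-laden illustration: `plpr_distance` is a plain normalized Hamming-type ($\gamma = 1$) or Euclidean-type ($\gamma = 2$) distance over the probability vectors and is not the paper's Equation (6), whose Hausdorff form is not reproduced here; the function names and the two toy rules are hypothetical.

```python
def plpr_distance(b1, b2, gamma=2):
    """Normalized distance between two PLPRs (tuples of probability vectors).

    gamma=1 gives a Hamming-type distance, gamma=2 a Euclidean-type distance.
    Assumption: a simple stand-in for the paper's Equation (6).
    """
    diffs = [abs(p - q) ** gamma
             for r1, r2 in zip(b1, b2)   # pair up the PLTSs of each factor
             for p, q in zip(r1, r2)]    # pair up the term probabilities
    return (sum(diffs) / len(diffs)) ** (1.0 / gamma)


def best_rule(current, rules, gamma=2):
    """Return (B, C) of the PLPLR whose left side is closest to `current`.

    Ties are resolved in favor of the rule encountered first.
    """
    return min(rules.items(),
               key=lambda item: plpr_distance(current, item[0], gamma))


# Hypothetical trained rules B(t) -> C(t)
rules = {
    ((0, 0, 1, 0, 0), (0.5, 0, 0.5, 0, 0)): (0, 0.5, 0, 0.5, 0),
    ((0, 0.5, 0, 0.5, 0), (0, 0, 0, 0, 1)): (0, 0, 1, 0, 0),
}
current = ((0, 0, 1, 0, 0), (0, 0, 1, 0, 0))  # observed state with no exact match
matched_b, matched_c = best_rule(current, rules, gamma=1)
print(matched_c)
```

A lookup of this kind always returns some rule, which is how a similarity measure avoids the situation in which no training rule matches the observed state exactly.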
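Step 6: Forecast the future fluctuation value from the score of the “next state” of the matched PLPLR.
Step 7: Forecast the future value from the current value and the forecasted fluctuation value.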

IV. EMPIRICAL ANALYSIS
A. FORECASTING OF THE TAIWAN STOCK EXCHANGE CAPITALIZATION WEIGHTED STOCK INDEX (TAIEX)
In this part, we will take TAIEX as an example to elaborate on the methods and steps of the model. The dataset is divided into a training set and a testing set. TAIEX 2004 from January to October is the training set, and from November to December is the test set. In addition, trading price and trading volume are the main factor and secondary factor, respectively.
Step 1: Generate fuzzy-fluctuation time series. Consider the historical training data from January 2, 2004, to October 29, 2004, of the main factor and secondary factor of TAIEX shown in Appendix Table 11. The fluctuations of the main factor and the secondary factor are fuzzified one by one. As shown in Appendix Table 10, all historical fluctuation trends in the training data can be fuzzified into FFTS. For convenience, the element $s_i$ is simplified to number $i$ in the expression of FFLRs.
Step 2: Establish two-factor second-order FFLRs. As shown in Appendix Table 11, the two-factor second-order FFLRs for the training data can be established for historical fluctuation trends according to Definition 4.
Step 3: Convert the two-factor second-order FFLR to PLPLR. The “current state” of a two-factor second-order FFLR can be represented by a PLPR $B(t)$. For example, the “current state” $((s_2, s_2), (s_2, s_0))$ of a two-factor second-order FFLR can be represented by the PLPR $B(t) = ((0, 0, 1, 0, 0), (0.5, 0, 0.5, 0, 0))$, according to Definition 6.
In this way, all FFLRs for different PLPRs can be converted to PLPLRs as shown in Table 1.
Step 6: The fuzzified fluctuation trend of the stock market on November 1 can be obtained from the score of the “next state” of the PLPLR, and it is then defuzzified.
Step 7: The forecasted value can be calculated from the current value and the fluctuation value.
The other forecasting results of TAIEX 2004 are shown in Table 2 and Figure 4.
To compare the forecasting results with the actual results, we used the mean squared error (MSE), the root of the mean squared error (RMSE), the mean absolute error (MAE), the mean percentage error (MPE) and the symmetric mean absolute percentage error (SMAPE) to verify the accuracy of our results. In their definitions, $t = 1, 2, \ldots, n$ denotes the position in the series to be compared, $n$ denotes the number of compared values, and forecast$(t)$ and actual$(t)$ denote the forecasted value and actual value at position $t$, respectively. The forecasting errors for the evaluation indices MSE, RMSE, MAE, MPE and SMAPE are shown in Table 3. The RMSEs for different nth-order models are shown in Table 4. Figure 5 describes the forecasting results and RMSEs for TAIEX from 1997 to 2005.
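For readers who want to reproduce such error tables, the five indices can be computed as in the following Python sketch. It uses common textbook definitions; the sign convention of MPE (forecast minus actual, expressed as a fraction) and the SMAPE variant (twice the absolute error over the sum of absolute values) are assumptions and may differ from the definitions used for Table 3. The function name and the sample numbers are hypothetical.

```python
import math


def forecast_errors(actual, forecast):
    """Return MSE, RMSE, MAE, MPE and SMAPE for two equally long sequences."""
    n = len(actual)
    errors = [f - a for a, f in zip(actual, forecast)]

    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(e) for e in errors) / n
    # Assumed convention: signed relative error, reported as a fraction
    mpe = sum(e / a for e, a in zip(errors, actual)) / n
    # Assumed variant: 2|f - a| / (|a| + |f|), averaged over all positions
    smape = sum(2 * abs(e) / (abs(a) + abs(f))
                for e, a, f in zip(errors, actual, forecast)) / n

    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "MPE": mpe, "SMAPE": smape}


# Hypothetical values, not data from the paper
result = forecast_errors(actual=[100.0, 200.0], forecast=[110.0, 190.0])
print(result)
```

Multiplying MPE and SMAPE by 100 gives percentages if that presentation is preferred.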
To verify the universality and accuracy of our model, Table 5 shows the root mean square error (RMSE) from 1999 to 2005 using the Euclidean distance measurement method and the Hamming distance measurement method. Table 6 shows a comparison of the RMSEs of different methods for forecasting TAIEX 1999-2005. It is revealed that the proposed model has the best stability from the perspective of average forecasting performance.

B. FORECASTING OF THE ENROLLMENT OF UNIVERSITY OF ALABAMA
To further test the performance of our model and compare it with more authoritative models, the model employs the enrollment data of the University of Alabama and the average unemployment rate of the United States from 1971 to 1992 as the main factor and secondary factor, respectively. The specific data of the two datasets are shown in Table 7.
To validate our findings, we compared the results to those of eminent researchers working through different methods, as shown in Table 8.
For the HSI, the months before November serve as our training data and November to December as our testing data. To verify the generality of our model, we compare the root mean square error for the Hong Kong Hang Seng Index from 2004 to 2009 with the results of other authoritative methods, as shown in Table 9. The results show that the proposed model compares favorably with the other models and has broad application prospects.
As shown in Table 9, the average prediction error of the proposed method is 265.42; compared with most other models, it has a relatively small prediction error.

V. CONCLUSION
In this paper, a two-factor fuzzy-fluctuation time series forecasting model is proposed based on the PLPR and similarity measure. Considering that trading volume can reflect the investment preference of investors, volume is selected as the secondary factor alongside price as the main factor of a stock market time series. Specifically, PLPLRs are generated from the training dataset. These PLPLRs represent the behavior rule hidden in the training dataset in terms of historical internal fluctuation trends and corresponding preference information. The Hamming-Hausdorff distance and Euclidean-Hausdorff distance are used to locate the best PLPLR, whose left side is most similar to the state of the current point. Accordingly, the forecasting result can be generated by the defuzzification of the corresponding right side. The greatest advantage of the proposed model comes from its interpretability and programmability. Moreover, the similarity comparison method makes it easy to deal with uncertain situations, such as a lack of forecasting rules due to the complexity of the future. This model is the first to introduce the theory of behavior and preference habit into the study of the mechanism of time series fluctuation, which opens up a new perspective for revealing the complex internal law of time series fluctuation.
The proposed model is successfully employed to forecast traditional and realistic time-series datasets, including the enrollments of the University of Alabama, the Taiwan Stock Exchange Capitalization Weighted Stock Index (TAIEX) and the HSI. By comparing its MSE, MAE, RMSE, SMAPE and MPE with those of other existing models, the proposed model is shown to have outstanding performance in stability and accuracy. The successful application to different datasets also verifies its practicability and universality. The most important contribution of the proposed model comes from its theoretical meaning, which makes it possible to introduce research results related to behavior preference into the time series forecasting area. In future research, we will try to enrich the theoretical framework of such forecasting models and introduce related behavior theory into the stock market prediction model, such as irrationality, bounded rationality, the herding effect, and risk preference.
VOLUME 9, 2021