A BLENDED SOFT-COMPUTING MODEL FOR STOCK-VALUE PREDICTION

Stock investments play a crucial role in a country's economic growth. Accurate stock-value prediction models help investors optimize profit and avoid risk, which motivates research on correlated features and predictive models for stock-value prediction. Existing stock-value prediction models have used data such as Twitter, microblogs, price history and Google Trends. Domain-specific dictionary-based deep learning has also emerged as a competitive alternative for stock-value prediction. However, the accuracy of these models depends on the quality of the input, the correlation among the features and the correctness of the sentiment scores generated for the dictionary terms. Financial-news sentiment analysis for stock-value prediction with dictionary-based learning therefore needs attention in two areas: the quality of the input and the generation of sentiment scores for dictionary terms. The present research develops a blended soft-computing model for stock-value prediction (BSCM) with cooperative fusion and dictionary-based deep learning. Six Indian stocks that cover uptrend, sideways and downtrend characteristics are considered, with stock-price histories and news headlines from 8th August 2016 to 31st March 2023, i.e., 2427 days. The price-history dataset contains 14,562 records and the news headlines dataset contains 46,213. The performance of stock-value prediction can be improved by taking advantage of multi-source information and context-aware learning. The present research has three objectives: 1. Applying cooperative fusion to combine the news headlines and price history of stocks collected from multiple sources to improve the quality of the input with correlated features. 2. Building a dictionary, FNSentiment, with a novel strategy. 3. Predicting stock values using FNSentiment and News Sentiment Prediction Model (NSPM) integration.
In the experimentation, the proposed model outperformed the state-of-the-art models with an accuracy of 91.11%, RMSE of 10.35, MAPE of 0.02 and MAE of 2.74.


INTRODUCTION
Stock-value prediction models have gained attention in recent times. A stock's value varies with a variety of features, which fall into two major categories: internal and external. Internal features include the close price, the price-to-earnings ratio (P/E), the price-to-book ratio (P/B) and the like [1]. News, Twitter, Google Trends and other social media reveal external features: currency value, political decisions, organization profit and loss, the relation between employees and the chief executive officer and the like, as described by [2]. The selection of features strongly affects stock-value prediction accuracy. Popular stock-value prediction models have considered price history, news, Twitter [3] and microblogs as sources of stock information. However, social groups and microblogs cover the opinions of a limited number of people, and these opinions are less trustworthy than news media reports.
In recent research, financial-news sentiment analysis models have evolved to predict the stock price [4]. An interesting correlation was found between the news features and the stock price. The proposed model uses cooperative fusion [5] to join these correlated features. Cooperative fusion is a methodology that combines correlated features from various sources to improve the quality of the input, which in turn can improve the performance of the predictive model.
Financial-news sentiment analysis involves generating sentiment scores for the words in the input news. The existing English-term dictionaries are insufficient to capture the meaning of statements in a business context. Polysemy [6], a well-known problem in Natural Language Processing (NLP), means that multiple interpretations are possible for the same word depending on the context. Hence, domain-specific dictionaries are required to improve the accuracy of sentiment prediction.
Domain-specific dictionaries for stock-value prediction exist for various languages. However, these dictionaries do not yet consider the correlated features of terms when generating sentiment scores in dictionary learning. In the proposed system, the novel dictionary FNSentiment is developed from fused information that combines the correlated features, news and close price.
Ahmad et al. [7] observed that the accuracy of the sentiments depends on the learning model's performance. Deep-learning models exhibit great computational power in fields like image processing and text analytics. Specifically, Deep Neural Networks (DNNs) [8], such as Convolutional Neural Networks (CNNs) [9], Long Short Term Memory (LSTM) [10] and Gated Recurrent Units (GRUs) [11], are dynamic classification and predictive models. The proposed model uses NSPM, which combines CNN and LSTM to compute the stock value. The objectives of the proposed work are as follows.
1) Developing a cooperative fusion methodology to combine the news headlines and price history to improve the quality of the input data to the stock-value prediction model. 2) Generating a domain-based dictionary, FNSentiment, with dictionary-based deep learning using the fused information. 3) Developing a stock-value prediction model combining FNSentiment and NSPM.
The remaining sections of the paper are outlined as follows: Section 2 presents the related work. Section 3 explains the theory and implementation of the proposed model. Section 4 presents and discusses the results. Section 5 presents the conclusion and some suggestions for future work.

RELATED WORK
This section presents the various methods to predict stock prices and the importance of dictionary-based deep learning for stock-value prediction. Further, the section discusses the advantages and limitations of the existing models and the steps taken to overcome those limitations, which gives a clear vision of the proposed model.

Stock-value Prediction
Stock trading is a business investment technique in which quick changes in stock value can produce drastic variations in profit or loss in a short span. In recent years, the following methods have evolved to address stock-market analysis and prediction: Auto Regression (AR), Auto Regressive Integrated Moving Average (ARIMA), Auto Regressive Moving Average (ARMA) models, linear regression, Support Vector Regression (SVR), Bayes neural network, Hybrid Network Adaptive Time-series recommendation framework (HNATS) [12], Long Short Term Memory Cellular Automata (LSTMCA) [13], fuzzy time-series analysis [14] and GRU [15]. Stock-value prediction is a time-series problem involving statistical or textual data analysis.
Researchers have used various statistical models to address stock-value prediction based on internal features. Kumar et al. [16] proposed an SVR with a fuzzy model and a genetic algorithm with SVR [17] to perform time-series analysis on statistical data such as the stock's close price. Tunisian stock data analysis using a hierarchical deep neural network [18] showed considerable performance. Although time-series prediction analyzes the stock's close price and other statistical factors like the P/E ratio, this analysis ignores the external features that dynamically drive stock variations.
Time-series text analysis involves investigating stock-market data collected from several media. Long et al. [19] used news media analysis for stock-value prediction using SVM with the S&S kernel and obtained good performance. In this study, the authors stressed the importance of considering news data for stock-value prediction. Many researchers have experimented with mass media, news, Twitter, microblogs [20], online financial comments [21], behavioural finance [22] and the like and found that the media creates hype and touches user emotions in stock trading [23]. However, social-media analysis for stock-value prediction can capture data only if all the traders are members of a specific social network and communicate actively, which is impossible. In addition, the users might be irrational; hence, the correctness of the social-media data is questionable.
To overcome the existing models' limitations, we considered news data and price history for stock-value prediction in the present study. News is quickly captured and reachable to traders via newspapers and electronic media. Moreover, the news data is trustworthy when compared with social-media data. We have developed a cooperative fusion method to use the price history and news headlines to obtain input datasets for stock-value prediction.

Dictionary-based Sentiment Analysis
In recent research, many dictionaries have evolved to analyze text belonging to various domains. These dictionaries have been developed to be object-specific, category-specific, language-specific and the like. SentiDomain [24] was introduced as a rule-based sentiment dictionary developed for particular domain objects. This work calculates the sentiments for each object cluster using cosine similarity and analyzes user reviews to measure product rankings and user satisfaction. Loughran and McDonald manually developed a financial news dictionary, LMFinance, for Hong Kong news. This dictionary outperformed the existing dictionaries SentiWordNet and SenticNet [25].
A Korean-language dictionary [26] was developed to extract nouns from news statements; the sentiments of positive and negative words were obtained by calculating the average frequency of all positive and negative words in the Korean language. Most of the dictionaries were built in Japanese and Chinese rather than in English. We studied various methods of creating dictionaries in order to derive a domain-specific dictionary, and we analyzed the advantages of manual and automated dictionaries. Consequently, we proposed a semi-automated FNSentiment dictionary that takes the benefits of both manual and automated dictionaries.

Sentiment Analysis with Deep Learning
The challenges in sentiment analysis have led researchers towards dynamic computing models. In recent studies, deep-learning models have shown exemplary performance in sentiment analysis. Abdi et al. [27] introduced RNSA, which uses RNN and LSTM combinations for sentiment analysis. This model finds the sentiments of the users' emotions in social-media data. The authors found that, for word-level and sentence-level features with pre-trained embedding vectors, Word2Vec showed promising results.
Chen et al. [28] used GRU to analyze a dictionary created from Chinese social networks. This model classifies the user's emotions as positive or negative. Then, these sentiments are used to predict the stock price. Later, the authors proposed another deep-learning model, RNNboost [29], for stock prediction. The authors believed deep learning efficiently finds user emotions and is reliable for stock-value prediction. In addition, they showed that dictionary-based knowledge is suitable for analyzing domain-specific data. The hybrid CNN-LSTM dynamically classifies statements as positive or negative [30]. In the present work, we have designed a deep-learning model, the News Sentiment Prediction Model (NSPM), that employs CNN and LSTM with Word2Vec embedding.

PROPOSED SYSTEM
This section presents the theory and implementation of the blended soft-computing model for stock-value prediction (BSCM), which predicts stock values from news updates using a dictionary-based deep-learning approach. Figure 1 shows the BSCM architecture. The design and development of BSCM are as follows:
1) The news headlines are collected from the National Stock Exchange (NSE) and Times of India (ToI). Price history is collected from the NSE.
2) The cooperative fusion method combines price history and news headlines information from multiple sources.
3) The FNSentiment dictionary is developed; it consists of significant bigram terms with close price and sentiment scores.
4) NSPM is modeled using the deep-learning approach and, when integrated with FNSentiment, performs stock-value prediction.

Data Collection
The experimentation in this research considers six stocks, WIPRO, TCS, BHARIARTL, SBIN, NXTDIGITAL and PNB, to cover three possible trends in the stocks. Figure 4 illustrates the price-history input datasets with three dataset characteristics: upward, sideways and down trends. The stock datasets are of two types: price-history data and news headlines data, both from 8th August 2016 to 31st March 2023.

Fusion Method
The input datasets to build the dictionary were obtained by cooperative fusion, as illustrated in Figure 2. After data collection, pre-processing and feature-selection methods were used to find the essential elements of the price-history data. In the pre-processing step, a null value is substituted by the previous day's value. The fusion method was applied to the datasets as follows:
1) Pre-processing and feature selection for the price history, as described above, and pre-processing and feature engineering for the news headlines:
a) Pre-processing: The stop words and the like were eliminated from the news sentences.
b) Feature engineering: For each sentence in the news headlines:
(i) A sequence of bigrams (two consecutive words) is generated from the pre-processed news sentence, with a semicolon as a separator. Next, a polarity feature ranging from -1 to 1 is added to each headline entry based on domain knowledge. Table 1 summarizes the domain knowledge about the news headlines.
(ii) Finally, the bigrams of entries with the same date and stock are joined with a semicolon as a separator, and the new polarity value is the sum of the polarities of the respective entries.
2) In the rule-based merge step, the close-price data is mapped to the news headlines data and vice versa based on the date and stock features:
a) For an available entry in the news headlines data, if the corresponding date entry is missing in the price history, a new entry is created with the date and a close price taken from the close prices available within 1 to n steps for the stock.
b) If the corresponding date entry is missing in the news headlines data, move back 1 to n steps to find the most recent news for the stock.
c) For a common date and stock, the stock's price history is merged with the news headlines data.
Figure 5. Performance of the news sentiment-prediction model with various standard dictionaries.
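The fusion steps above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the stop-word list, the function names, the sample rows and the choice to look back without an upper bound on n (for both missing prices and missing news) are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical stop-word list; the paper does not specify the one it used.
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "on", "for", "and", "is"}

def to_bigrams(headline):
    """Drop stop words, then join consecutive word pairs with ';'."""
    words = [w for w in headline.lower().split() if w not in STOP_WORDS]
    return ";".join(f"{a} {b}" for a, b in zip(words, words[1:]))

def group_news(news_rows):
    """Step 1b(ii): join bigrams sharing (date, stock) and sum their polarities."""
    grouped = defaultdict(lambda: {"bigrams": [], "polarity": 0.0})
    for day, stock, headline, polarity in news_rows:
        entry = grouped[(day, stock)]
        entry["bigrams"].append(to_bigrams(headline))
        entry["polarity"] += polarity
    return {k: (";".join(v["bigrams"]), v["polarity"]) for k, v in grouped.items()}

def most_recent(table, stock, day):
    """Return the latest value for `stock` dated on or before `day`, else None."""
    days = [d for (d, s) in table if s == stock and d <= day]
    return table[(max(days), stock)] if days else None

def fuse(prices, news_rows):
    """Step 2: rule-based merge on (date, stock), filling gaps from earlier days."""
    news = group_news(news_rows)
    fused = []
    for day, stock in sorted(set(prices) | set(news)):
        close = most_recent(prices, stock, day)   # rule a (assumed look-back)
        item = most_recent(news, stock, day)      # rule b
        if close is None or item is None:
            continue  # nothing earlier to fall back on
        fused.append((day, stock, close, item[0], item[1]))
    return fused

# Made-up sample rows, for illustration only.
prices = {(date(2023, 3, 29), "TCS"): 3150.0, (date(2023, 3, 31), "TCS"): 3200.0}
news_rows = [
    (date(2023, 3, 29), "TCS", "TCS wins a large deal", 0.5),
    (date(2023, 3, 29), "TCS", "Profit of TCS rises", 0.5),
    (date(2023, 3, 30), "TCS", "TCS shares fall", -0.5),
]
fused = fuse(prices, news_rows)
for row in fused:
    print(row)
```

Each output row has the form (Date, Stock, Close price, Bigrams, Sentiment score). In this sample, 30 March has news but no price, so the 29 March close is carried forward, and 31 March has a price but no news, so the 30 March news is reused.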

Financial News Term Sentiment Dictionary (FNSentiment)
The construction of FNSentiment consists of two steps: 1. computing significant bigrams, and 2. computing bigram polarities. In the first step, significant bigram generation produces the critical terms for news sentiment analysis. The significant bigrams are pairs of consecutive terms formed from the nouns, adjectives, adverbs and verbs of each news sentence; all other terms are discarded. The polarity computation finds the polarity values for all sentences. A sentence with word length n+1 and score S generates n terms, and S provides the sentiment score for each significant term in the sentence. A sentiment score Si for i = 1 to n was computed using (1); hence, the output was a set of terms with a sentiment for each sentence. Next, the sentence's sentiment, denoted by STi, was computed using (2). After this step, the FNSentiment was updated with the pairs of terms and their scores. Next, the Normalized Sentiment Score (NST) was computed using (3) to convert the sentiment scores to the scale [0, 1]. Thus, NSTi yields polarity values ranging from 0 to 1. An NSTi value of zero represents a highly negative sentiment term, a value of one represents an extremely positive sentiment term and 0.5 represents a term with neutral sentiment. In (3), min and max represent ST's minimum and maximum values, respectively. Table 3 shows an instance of the FNSentiment dictionary.
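As a minimal sketch of the normalization step, assuming that (3) is the standard min-max scaling NSTi = (STi - min) / (max - min), the conversion can be written as follows. The function name, the sample scores and the handling of a constant input are assumptions for illustration only.

```python
def normalize_scores(scores):
    """Min-max scale raw sentiment scores ST_i onto [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Assumption: if all scores are equal, treat every term as neutral.
        return [0.5 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

st = [-2.0, 0.0, 2.0]          # made-up raw scores
nst = normalize_scores(st)
print(nst)                     # [0.0, 0.5, 1.0]
```

With this scaling the most negative score maps to 0, the most positive to 1, and a score midway between the two maps to 0.5.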

Stock-value Prediction Using NSPM
The objective of the NSPM is to predict the news sentiments using deep learning, as shown in Figure 3. The model consists of three layers: embedding, CNN and LSTM. First, the embedding layer creates the input vectors used to train the model by generating equivalent word vectors for the inputs.
In the next step, the embedded vectors are passed from the convolution layer to the max-pooling layer to capture the most significant features of the information. Finally, the input is passed through the sequential layers of the LSTM for further learning. Here, c_i denotes an internal LSTM cell that collects input, and h_j denotes the hidden state that produces the output. The final softmax layer outputs a value from zero to one, indicating the sentence sentiment value. The hyper-parameters of NSPM were found through the Bayesian optimization tuner of the Keras tuner.
Figure 6. Stock-value prediction using the proposed model and existing models.

RESULTS AND DISCUSSION
The dataset is divided into 70% training data, from 8th Aug 2016 to 3rd March 2021 (1699 days), and 30% testing data, from 4th March 2021 to 31st March 2023 (728 days). The results, the performance analysis of the proposed model, and the evaluation and comparison with the models in recent literature are explained in the following sub-sections. The experimentation of the present research is described as follows. The collected experimental data, news headlines and close prices, were initially joined using cooperative fusion to generate quality input.
In the next step, the FNSentiment dictionary was built by computing significant bigrams and their polarities. As a final step, NSPM is used to predict the stock value with the FNSentiment using dictionary-based deep learning.

Metrics for Evaluation
The metrics used for model evaluation are Accuracy, RMSE, MAE and MAPE. The accuracy is the percentage of sentences recognized correctly among the total tested sentences. The RMSE is the square root of the average squared error, where the error is the difference between the actual and predicted values. The MAE is the mean of the absolute differences between the predicted and actual values. MAPE is the mean absolute percentage error, which measures the error relative to the actual values. The proposed work aims to maximize accuracy and minimize the three error metrics.
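The four metrics can be sketched with their standard textbook definitions. The function names and sample values below are illustrative assumptions, and MAPE is returned as a fraction rather than a percentage, which is a choice made for this example and not necessarily the convention behind the reported figures.

```python
import math

def rmse(actual, predicted):
    """Square root of the mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean of the absolute errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute error relative to the actual value (assumes no actual is 0)."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def accuracy(true_labels, predicted_labels):
    """Percentage of labels predicted correctly."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * hits / len(true_labels)

# Made-up values, for illustration only.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
print(rmse(actual, predicted))   # about 8.165
print(mae(actual, predicted))    # about 6.667
print(mape(actual, predicted))   # about 0.05
print(accuracy(["pos", "neg", "pos", "neg"], ["pos", "neg", "neg", "neg"]))  # 75.0
```

RMSE penalizes large errors more heavily than MAE because the errors are squared before averaging, which is why the two are usually reported together.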

Results of Cooperative Fusion
The news and price-history datasets were combined to create the fused information. In this process, feature selection is applied to price-history datasets to select the prominent features 'close price' and 'Date'. Feature engineering is applied to pre-processed news data sentences to obtain bigrams. Further, a sentiment score between -1 and 1 is appended to each news sentence based on the domain knowledge.
In the next step, the close price and bigrams of each stock were mapped by Date. This step results in datasets with the fields (Date, Stock, Close price, Bigrams, Sentiment score). Table 3 shows the information fusion for a sample input dataset.

FNSentiment Dictionary
The formulation of the FNSentiment dictionary starts by collecting the fused information from the previous step. The next step generates significant bigrams by considering the noun, adjective, adverb and verb combinations of the bigrams obtained from the fused information. Formulae (1) and (2) are then applied to compute the sentiment score for each bigram. Next, the normalized sentiment score is calculated to rescale the sentiment score field to the range 0 to 1. For duplicate terms in the dictionary, the sentiment scores were summed and the close prices were averaged. The FNSentiment then contains triplets of the form (significant bigram, sentiment score, close price). An instance of these results is shown in Table 4.
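The handling of duplicate terms can be sketched as follows. This is a minimal illustration under the stated rule (sum the sentiment scores, average the close prices); the function name and the sample triplets are assumptions, not values from the paper.

```python
from collections import defaultdict

def merge_duplicates(entries):
    """Collapse repeated bigrams: sum sentiment scores, average close prices.

    entries: iterable of (bigram, sentiment_score, close_price) triplets.
    Returns {bigram: (summed_score, mean_close_price)}.
    """
    scores = defaultdict(float)
    closes = defaultdict(list)
    for bigram, score, close in entries:
        scores[bigram] += score
        closes[bigram].append(close)
    return {b: (scores[b], sum(closes[b]) / len(closes[b])) for b in scores}

# Made-up triplets, for illustration only.
entries = [
    ("profit rises", 0.5, 100.0),
    ("profit rises", 0.25, 200.0),
    ("shares fall", 0.25, 150.0),
]
dictionary = merge_duplicates(entries)
print(dictionary)
```

Here the two "profit rises" entries collapse into one triplet with a summed score of 0.75 and a mean close price of 150.0.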

Performance of News-sentiment Prediction Model (NSPM)
The NSPM and alternative deep-learning models were integrated with various dictionaries and compared. The hyper-parameters of NSPM were tuned through Bayesian optimization. The resulting settings are as follows: the learning rate for the generator and discriminator is 0.01, the optimizer is Adam and the number of epochs is 100 throughout the experimentation. The NSPM with FNSentiment was evaluated and compared with the standard dictionaries SenticNet, SentiWordNet and Vader. Figure 5 illustrates the accuracy of NSPM integrated with FNSentiment and with the existing dictionaries. These results demonstrate that context-aware learning is possible through the integration of FNSentiment and NSPM, which reaches an accuracy of 91.11%. Table 5 shows the experiment summary on the six stocks. In the experimental results, the NSPM with FNSentiment improved accuracy by 3.33% compared with NSPM alone, and it outperformed the models in the recent literature.

Performance Analysis of Existing and Proposed Models
BSCM is evaluated and compared with the baseline models with the metrics: Accuracy, RMSE, MAE and MAPE. The metrics were computed using formulae (5), (6) and (7). Tables 6, 7 and 8 illustrate the model evaluation results using the metrics. The summary of the results is shown in Table 9. The BSCM model outperformed all the baseline models with an accuracy of 91.11%. The evaluation of other metrics, RMSE of 10.35, MAPE of 0.002 and MAE of 2.74, showed that the BSCM is a reliable stock-value prediction system. Figure 6 shows the stock values predicted by the proposed model BSCM and the existing models for the six stocks: WIPRO, TCS, BHARIARTL, SBIN, NXTDIGITAL and PNB.

CONCLUSION AND FUTURE WORK
In the present work, we computed future stock values for six stock datasets with the proposed model and the models in the literature. The proposed model achieved an accuracy of 91.11%, RMSE of 10.35, MAPE of 0.002 and MAE of 2.74. Compared with the existing models, BSCM raised accuracy by 4.54% and lowered the MAE by 1.65, MAPE by 0.01 and RMSE by 15.76, outperforming the models in the literature. The NSPM for news sentiment prediction improved accuracy by 3.72% after integration with the novel dictionary FNSentiment. The results showed that the cooperative fusion method and dictionary-based deep-learning models improved the stock-value prediction accuracy. In future studies, we want to incorporate context-based clustering to refine the significant bigrams used in predicting the stock value. BSCM can also be enhanced by integrating clustering to analyze the critical news features for various sectors, such as oil, bank and software stocks, and to develop sector-wise dictionaries that optimize the runtime.