1 Introduction

Many factors affect economic performance and the financial market, and social media networks and electronic devices are among the most rapidly growing of them. According to reports by the Pew Research Center [1] and Statista [2], the number of social media users has more than doubled since 2010 and is expected to grow from 0.97 billion in 2010 to 2.95 billion in 2020. Furthermore, the amount of data produced is expected to reach 44 zettabytes in 2020 [3]. Big data analytics on data collected from online social media provides useful insights for practical applications [4]. In particular, public opinion about an issue, as expressed on social networking sites, has an effect on our society. Sentiment analysis captures these opinions by classifying subjective content about an issue as positive, neutral or negative. Hence, sentiment analysis is frequently used in areas such as financial markets and marketing, for example for sales forecasting. Zhang et al. [5] demonstrate that social media sentiment provides statistically significant information about stock prices.

In this study, we use sentiment analysis to investigate the relationship between economic performance and public opinion. To this end, we train sentiment classifiers on big data collected from various sources, including news articles, Twitter and blogs. We consider three different feature sets: a feature vector built from positive and negative word dictionaries and emoticons, a sequence vector built from the lexical properties of word sequences, and a combination of the two. We evaluate six classifiers: MaxEnt-L1, C4.5 decision tree, SVM-kernel, Ada-boost, Naïve Bayes and maximum entropy (MaxEnt). The results show that MaxEnt-L1 consistently performs better than the other classifiers. Next, we predict the sentiment of the collected datasets with the trained classifiers and compare the sentiment scores with economic indices. Finally, we use VAR analysis and the Granger causality test to investigate the causal relationship between the sentiment scores and economic performance. The contribution of this paper is not a new method but an in-depth analysis of the correlation between economic indicators and the time series of sentiment in social data collected with specific keywords. The remainder of the paper presents related work, the methodology, the results and the conclusion.

2 Related work

2.1 Sentiment analysis

Sentiment analysis determines how sentiments are expressed in a text and whether favorable or unfavorable sentiments or opinions can be attributed to it. In other words, sentiment analysis covers sentiment expressions, the polarity and strength of those expressions and their relationship to the subject [6]. As a topic of natural language processing (NLP), sentiment analysis has been studied in both academia and industry. Its primary purpose is to determine how people feel about something. Furthermore, many companies that have collected large amounts of data on their customers and staff use sentiment analysis to assess their reputation and plan their business [7]. During the 2000s, sentiment analysis was developed in various areas. Pang and Lee [8] surveyed diverse theories and methodologies for sentiment analysis. Liu [9] noted that the main reason for the growth of sentiment analysis research is the proliferation of social media. As noted in [8], more advanced sentiment analysis methods and applications are required to better understand customers.

Several works have proposed classifiers specifically for sentiment analysis. According to [7], supervised learning is the most widely accepted approach to sentiment analysis. Common classifiers include Naïve Bayes, decision trees, k-nearest neighbors, neural networks, support vector machines (SVM) and maximum entropy. Pang et al. [10] compared three classifiers (Naïve Bayes, maximum entropy and SVM) on movie reviews, using the reviews' rating indicators (i.e., the number of stars) as polarity labels for their training data. Nasukawa and Yi [6] and Wilson et al. [11] classified the contextual polarity of sentiment expressions; in particular, [11] classified expressions about specific items using manually devised patterns to categorize polarity. O’Hare et al. [12] reported that Multinomial Naïve Bayes performed better than SVM on finance-related blogs, for both ternary (positive, negative and neutral) and binary (positive and negative) sentiment classification. For Twitter messages, [13] found that machine learning algorithms such as Naïve Bayes, SVM and maximum entropy achieved high accuracy (more than 80%) when trained on tweets labeled by their emoticons. Their feature extractors used unigrams, bigrams, unigrams combined with bigrams, and unigrams with part-of-speech tags. However, sentiment classification is often considered domain-dependent, because sentiment is expressed differently in different domains and the same word can carry different sentiments. Therefore, [14] proposed a collaborative multi-domain sentiment classification approach that trains sentiment classifiers for multiple domains simultaneously. Specifically, they decomposed each sentiment classifier into a global component and a domain-specific component.
Fernández et al. [15] proposed the Distributional Correspondence Indexing (DCI) method for domain adaptation in sentiment classification. Their experiments show that DCI compares well with state-of-the-art methods for cross-language and cross-domain sentiment classification, while substantially reducing computational cost and requiring less human intervention. In addition, [16] constructs a sentiment-related index (SRI) that measures the association between lexical elements in a specific domain, using domain-independent features as a bridge to close the gap between domains. On this basis, they propose an SRI-based cross-domain sentiment classification algorithm, SentiRelated, for analyzing the sentiment polarity of short texts. Furthermore, [17] describes an approach to predicting the sentiment of documents in multiple languages without translation, using Latent Semantic Indexing (LSI) to map a multilingual corpus into a multilingual “concept space.” They designed and ran experiments that measure how much topic and sentiment individually contribute to classification accuracy, thereby addressing whether topic and sentiment can be meaningfully distinguished.

2.2 Literature on the business use of social media sentiment

Many businesses have adopted text and sentiment analysis and integrated it into their processes because of its efficiency and accuracy. Manek et al. [18] proposed a statistical method for feature selection in sentiment analysis that weights features by the Gini index and classifies with a support vector machine (SVM), evaluated on large movie review datasets. Companies increasingly use social media such as Facebook and Twitter to interact with customers. Culnan et al. [19] examined how Fortune 500 companies use four of the most popular social media platforms (Twitter, Facebook, blogs and client-hosted forums) and presented case studies of three Fortune 100 companies showing how they managed their social media networks. The most common business uses of social media are sales, customer care, advertising, marketing, product development and innovation [20]. He et al. [21] conducted an in-depth case study that used text mining to analyze the unstructured text content on the Facebook and Twitter sites of the three largest pizza chains: Pizza Hut, Domino’s Pizza and Papa John’s Pizza. Yu et al. [22] examined the effects of social media and traditional media, their relative importance and their correlation with short-term firm stock performance. They applied sentiment analysis techniques that go beyond counting mentions to measure the overall daily sentiment of each media source toward a particular company.

3 Methodology

Figure 1 shows an overview of this study. The process consists of two parts. The first part trains the sentiment classifiers and measures the classification performance of six types of classifiers. The second part determines which economic indicators lead or lag the sentiment score from each content source. After identifying the variable pairs for which the null hypothesis of no Granger causality is rejected at a given significance level α, we determine which variable leads and which lags. A vector autoregressive (VAR) analysis is then used to find the lag between the two variables. The Granger causality test and the VAR analysis were therefore performed together in this study.

Fig. 1 Research flow

3.1 Collected data

To build a social media index that reflects public perception of the economy, we indexed consumer responses to economic welfare based on the simple frequency of economic keywords. For each medium (Twitter, blogs and news), we collected documents for 28 query words. In total, we considered 73,229 news articles, 860,445 NAVER blog posts and 9,749,893 tweets from January 1, 2014, to October 31, 2015. We chose this period because the Sewol ferry disaster occurred in 2014 and the Middle East respiratory syndrome (MERS) outbreak spread in 2015. The query terms, shown in Table 1, consist of terms describing the economic situation and event-related words.

Table 1 Economic terms

The data crawling process is shown in Fig. 2. When a query term is entered, the search result pages are collected. An HTML parser extracts the list of result URLs from these pages, the web pages at those URLs are downloaded with web client requests, and the HTML parser is then used again to extract the content of each page.

Fig. 2 Data crawling process
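The crawling steps above can be sketched with Python's standard library. This is a minimal illustration under our own assumptions, not the crawler used in the study. The class and function names are hypothetical, and search-result links are assumed to be ordinary `<a href>` tags. A real crawler for NAVER or Twitter would also need site-specific parsing, rate limiting and compliance with each site's terms of use.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag on a search-result page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the search page URL.
                self.links.append(urljoin(self.base_url, href))


class TextExtractor(HTMLParser):
    """Keeps visible text and skips the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_links(html, base_url):
    """Step 2: search result page -> list of absolute URLs."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


def extract_text(html):
    """Step 4: downloaded page -> plain text content."""
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)


def fetch(url, timeout=10):
    """Step 3: download one page (needs network access; not used in the demo)."""
    req = Request(url, headers={"User-Agent": "research-crawler"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def crawl(search_page_html, base_url):
    """Chain the steps: URL list -> fetched pages -> extracted text."""
    return [extract_text(fetch(u)) for u in extract_links(search_page_html, base_url)]
```

Parsing is separated from fetching so that each step can be tested offline on stored HTML. For example, `extract_links('<a href="/post/1">A</a>', "https://example.com/search?q=boom")` resolves the relative link to `https://example.com/post/1`.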

Because the collected data consist of whole documents, they must first be split into sentences. Each sentence is then split into tokens, which are semantic units, by tokenization, which removes whitespace and measurement strings and divides the sentence into words. Lemmatization groups the inflected forms of a word into a single form. Stop word removal eliminates words with little meaning, such as articles, postpositions, prepositions and conjunctions. Morphological analysis represents the content of words, phrases and paragraphs in a document as data that can be processed. It identifies the morphemes of a sentence and ultimately reveals the sentence structure. Part-of-speech (POS) tagging then assigns a lexical category (part of speech) to each word.
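The preprocessing chain can be illustrated with a toy English pipeline. The sketch rests on stated assumptions. The stop-word list and lemma table are hypothetical placeholders, and sentence splitting is a simple punctuation rule. Korean text, such as the study's, would instead require a Korean morphological analyzer for lemmatization and POS tagging, which is not shown here.

```python
import re

# Hypothetical placeholder resources (English, for illustration only).
STOP_WORDS = {"the", "a", "an", "of", "and", "or", "in", "on", "to", "is", "are"}
LEMMAS = {"rates": "rate", "rising": "rise", "rose": "rise", "jobs": "job"}


def split_sentences(document):
    """Split a document after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", document.strip()) if s]


def tokenize(sentence):
    """Lowercase and keep alphabetic runs only (drops whitespace, digits, punctuation)."""
    return re.findall(r"[a-z]+", sentence.lower())


def preprocess(document):
    """Sentence split -> tokenize -> lemmatize (dictionary lookup) -> remove stop words."""
    result = []
    for sentence in split_sentences(document):
        lemmas = [LEMMAS.get(tok, tok) for tok in tokenize(sentence)]
        result.append([tok for tok in lemmas if tok not in STOP_WORDS])
    return result


print(preprocess("Interest rates are rising. The jobs market is weak!"))
# [['interest', 'rate', 'rise'], ['job', 'market', 'weak']]
```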

3.2 Selection of feature set

The following feature sets were fed into the classifiers to predict sentiment.

The positive and negative word lists were collected from various sources and then manually filtered and selected. We used positive and negative words in common public use. Word2vec was used to propose candidate positive and negative words, and the final words were then selected manually. Profanity data from the Korea Creative Content Agency were also added and divided into positive and negative.

  (1) Feature vector: Korean positive word dictionary (11,461 words), Korean negative word dictionary (13,767 words), curse word dictionary (3863 words), positive emoticon dictionary (49 emoticons), negative emoticon dictionary (52 emoticons) and Korean SentiWordNet (105,178 words)

  (2) Sequence vector (bag of words): TokenSequence2FeatureSequence

  (3) Combination of the dictionary-based feature vector and the bag of words
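To make the three feature sets concrete, the sketch below builds each one from a tokenized text. The dictionaries are tiny hypothetical stand-ins for the Korean lexicons listed above, and SentiWordNet scores are omitted. The bag of words is represented as a plain token-count dictionary rather than MALLET's TokenSequence2FeatureSequence pipe.

```python
from collections import Counter

# Hypothetical miniature dictionaries; the study's real lexicons are Korean.
POSITIVE = {"good", "boom", "recover"}
NEGATIVE = {"bad", "depression", "unemployment"}
CURSE = {"damn"}
POS_EMOTICONS = {":)", "^^"}
NEG_EMOTICONS = {":(", "T_T"}


def dictionary_features(tokens):
    """Feature set (1): counts of dictionary and emoticon hits."""
    return {
        "pos_words": sum(t in POSITIVE for t in tokens),
        "neg_words": sum(t in NEGATIVE for t in tokens),
        "curse_words": sum(t in CURSE for t in tokens),
        "pos_emoticons": sum(t in POS_EMOTICONS for t in tokens),
        "neg_emoticons": sum(t in NEG_EMOTICONS for t in tokens),
    }


def bag_of_words(tokens):
    """Feature set (2): token counts, prefixed so keys never clash with (1)."""
    return {"w=" + t: c for t, c in Counter(tokens).items()}


def combined_features(tokens):
    """Feature set (3): union of the dictionary features and the bag of words."""
    return {**dictionary_features(tokens), **bag_of_words(tokens)}


print(combined_features(["boom", "good", ":)", "unemployment"]))
```

The combined set simply concatenates the two sparse representations. This lets a classifier weigh lexicon evidence and individual words together.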

We compiled the training dataset for the sentiment classifiers as follows. To make the classifiers domain-neutral, we first collected 11,000 tweets using the query “Seoul Mayor Election.” We then collected 6000 news articles using the query “living cost and job.” Finally, we collected 2,450,000 movie reviews from NAVER. Because this review set is too large to label manually, we used the customers’ movie ratings as labels. Ratings range from 0 to 10. We treated ratings of 0–3 as negative, 4–7 as neutral and 8–10 as positive. The tweets and news articles (i.e., all data except the movie reviews) were independently labeled by three evaluators as negative, neutral or positive.
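The rating-based labeling rule can be written as a small function. It assumes integer ratings on the 0 to 10 scale described above, and the function name is our own.

```python
def rating_to_label(rating):
    """Map a 0-10 movie rating to a label: 0-3 negative, 4-7 neutral, 8-10 positive."""
    if not 0 <= rating <= 10:
        raise ValueError(f"rating out of range: {rating}")
    if rating <= 3:
        return "negative"
    if rating <= 7:
        return "neutral"
    return "positive"


print([rating_to_label(r) for r in (0, 3, 4, 7, 8, 10)])
# ['negative', 'negative', 'neutral', 'neutral', 'positive', 'positive']
```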

Among the 17,000 manually labeled instances, the three judges agreed on 3230 positive, 5021 neutral and 5410 negative instances. The percentage of agreement is therefore (3230 + 5021 + 5410)/17,000 = 13,661/17,000 ≈ 80.4%. We then used these 13,661 instances together with the 2.45 million movie reviews as training data for the classifiers.
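The agreement figure can be checked directly. The helper below is a hypothetical illustration. It counts an item as agreed only when all three annotators chose the same label, which is our reading of "agreed" in the text.

```python
def unanimous_agreement(annotations):
    """Return (labels of unanimously labeled items, fraction of such items).

    annotations: one (label_1, label_2, label_3) tuple per item.
    """
    agreed = [labels[0] for labels in annotations if len(set(labels)) == 1]
    return agreed, len(agreed) / len(annotations)


# The counts reported above: positive + neutral + negative agreements.
agreed_total = 3230 + 5021 + 5410
print(agreed_total, f"{agreed_total / 17_000:.1%}")  # 13661 80.4%
```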

3.3 Machine learning algorithms

In this paper, we concentrate on selecting the most suitable classifier under various feature set generation methods. We therefore evaluate six machine learning classification algorithms: MaxEnt-L1, decision tree, SVM-kernel, Ada-boost, Naïve Bayes and MaxEnt. MaxEnt (maximum entropy) is a probabilistic classifier. It is a type of exponential model that selects the probability distribution with maximum entropy [23]. Based on the principle of maximum entropy, it can be applied to language detection, topic classification and sentiment analysis. To improve on the performance of MaxEnt, we also use MaxEnt-L1. According to [24], in the MaxEnt model there is a one-to-one relationship between the subsets of variables that appear in the model's parameterized factors and the subsets of variables used in the constraints. MaxEnt-L1, which adopts generalized expectation criteria for semi-supervised learning, can break this one-to-one relationship, because the generalized expectation criteria are defined on a model that contains generalized expectation terms. Generalized expectation criteria also offer advantages such as ease of use and simplicity [25]. For example, they require no additional processing such as building an inverted index to pre-cluster unlabeled data. For these reasons, we add MaxEnt-L1 to the evaluation. We also use the C4.5 decision tree classifier, which approximates discrete-valued functions with a decision tree and is one of the most popular inductive inference algorithms [26]. As another classifier, we use Ada-boost, which is fast and simple to program [27]. Moreover, Ada-boost requires no prior knowledge about the base learner, so it can be combined with any method for finding base classifiers. We also use Naïve Bayes, a probabilistic classifier based on Bayes' theorem [28].
Using the training data, Naïve Bayes predicts the category of a document from the cue words that occur in it. Finally, we use the SVM [29], which finds the hyperplane that separates the positive and negative examples with the maximal margin.
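As one concrete example of the classifiers above, here is a minimal multinomial Naïve Bayes with add-one (Laplace) smoothing. It is a sketch of the general technique, not the implementation used in the experiments. The class name and the toy training data are hypothetical.

```python
import math
from collections import Counter, defaultdict


class MultinomialNaiveBayes:
    """Minimal multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {t for tokens in docs for t in tokens}
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def log_posterior(self, tokens, label):
        """Unnormalized log P(label | tokens) = log prior + sum of log likelihoods."""
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)
        denom = self.total_words[label] + len(self.vocab)
        for t in tokens:
            if t in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[label][t] + 1) / denom)
        return score

    def predict(self, tokens):
        return max(self.class_counts, key=lambda c: self.log_posterior(tokens, c))


docs = [["boom", "good"], ["recover", "good"], ["depression", "bad"], ["unemployment", "bad"]]
labels = ["pos", "pos", "neg", "neg"]
nb = MultinomialNaiveBayes().fit(docs, labels)
print(nb.predict(["good", "boom"]), nb.predict(["bad"]))  # pos neg
```

Working in log space avoids numerical underflow when a document contains many words.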

Four measures are used to evaluate the classifiers: accuracy, recall, precision and F-measure. Accuracy is the ratio of correct classifications to all classifications. For a given class, recall is the number of instances correctly assigned to the class divided by the number of instances that actually belong to it. Precision is the number of instances correctly assigned to the class divided by the total number of instances assigned to it. The F-measure (F1) combines precision P and recall R as their harmonic mean, F1 = 2PR/(P + R).
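The four measures can be computed as follows. This sketch reports precision, recall and F1 per class. The text does not say how per-class values were averaged for the three-class problem, so any macro or weighted average computed on top of this is an assumption.

```python
def evaluate(y_true, y_pred):
    """Return accuracy and {class: (precision, recall, F1)}."""
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    per_class = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        predicted = sum(p == c for p in y_pred)  # instances assigned to c
        actual = sum(t == c for t in y_true)     # instances truly in c
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[c] = (precision, recall, f1)
    return accuracy, per_class


acc, per_class = evaluate(["pos", "pos", "neg", "neu"], ["pos", "neg", "neg", "neu"])
print(acc, per_class["pos"])  # accuracy 0.75; "pos" has precision 1.0, recall 0.5
```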

3.4 VAR analysis

In this section, we use VAR analysis to identify the relationship between financial data, such as the KOSPI and exchange rates, and social media sentiment. Vector autoregression (VAR) is a stochastic process model that captures the linear interdependencies among multiple time series. A VAR model describes how k variables evolve over time as a function of their own past values, as follows.

A p-th order VAR, denoted by VAR(p), is:

$$x_{t} = \alpha + A_{1} x_{t - 1} + A_{2} x_{t - 2} + \cdots + A_{p} x_{t - p} + u_{t}$$

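where \(x_{t-j}\) is the \(j\)-th lag of \(x\) (\(j = 1, \ldots, p\)), \(\alpha\) is a \(k \times 1\) vector of constants, each \(A_{j}\) is a \(k \times k\) coefficient matrix, and \(u_{t}\) is an error term that satisfies \(E(u_{t}) = 0\), \(E(u_{t} u'_{t}) = \varOmega\) and \(E(u_{t} u'_{t-s}) = 0\) for \(s \neq 0\). Here Ω is the covariance matrix of the error terms.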
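To illustrate the VAR(p) equation, the sketch below simulates a process from known α and A_j and recovers them by equation-by-equation ordinary least squares, the standard estimator for an unrestricted VAR. It is written in plain Python so that it is self-contained. The function names, the Gaussian errors with diagonal covariance and the burn-in length are our assumptions. In practice, a library such as statsmodels (`statsmodels.tsa.api.VAR`) would normally be used.

```python
import random


def simulate_var(alpha, A, T, noise_sd=1.0, seed=0, burn_in=100):
    """Simulate x_t = alpha + A_1 x_{t-1} + ... + A_p x_{t-p} + u_t.

    alpha: list of k constants; A: list of p k-by-k matrices (lists of rows).
    u_t is i.i.d. Gaussian with covariance noise_sd**2 * I (an assumption).
    """
    rng = random.Random(seed)
    k, p = len(alpha), len(A)
    xs = [[0.0] * k for _ in range(p)]  # zero initial values
    for _ in range(T + burn_in):
        new = []
        for r in range(k):
            val = alpha[r] + rng.gauss(0.0, noise_sd)
            for j in range(p):
                lagged = xs[-1 - j]  # x_{t-(j+1)}
                val += sum(A[j][r][c] * lagged[c] for c in range(k))
            new.append(val)
        xs.append(new)
    return xs[-T:]  # discard initial values and burn-in


def solve(M, v):
    """Solve M b = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    b = [0.0] * n
    for r in range(n - 1, -1, -1):
        b[r] = (aug[r][n] - sum(aug[r][c] * b[c] for c in range(r + 1, n))) / aug[r][r]
    return b


def fit_var(xs, p):
    """Estimate alpha and A_1..A_p by OLS, one equation per variable."""
    k = len(xs[0])
    # Regressors for time t: [1, x_{t-1}, ..., x_{t-p}] flattened.
    rows = [[1.0] + [v for j in range(1, p + 1) for v in xs[t - j]]
            for t in range(p, len(xs))]
    m = len(rows[0])
    XtX = [[sum(r[a] * r[b] for r in rows) for b in range(m)] for a in range(m)]
    alpha = []
    A = [[None] * k for _ in range(p)]
    for eq in range(k):
        y = [xs[t][eq] for t in range(p, len(xs))]
        Xty = [sum(r[a] * yt for r, yt in zip(rows, y)) for a in range(m)]
        beta = solve(XtX, Xty)
        alpha.append(beta[0])
        for j in range(p):
            A[j][eq] = beta[1 + j * k: 1 + (j + 1) * k]  # row eq of A_{j+1}
    return alpha, A


# Usage: recover a known VAR(1) (hypothetical coefficients).
true_alpha = [0.5, -0.2]
true_A = [[[0.5, 0.2], [0.0, 0.3]]]
alpha_hat, A_hat = fit_var(simulate_var(true_alpha, true_A, T=3000, seed=1), p=1)
```

Because every equation shares the same regressors, the matrix X'X is built once and reused for all k equations.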

The Korea Composite Stock Price Index (KOSPI), first introduced in 1983 with a base value of 100, is computed as a weighted average of the prices of selected stocks. Levine and Zervos [30] find that stock market indicators consistently predict economic growth. Hence, the KOSPI can serve as an important indicator of economic activity.

3.5 Granger causality test

Saying that variable X Granger-causes variable Y means that past fluctuations of X help explain fluctuations of Y beyond what the past values of Y alone explain. Granger causality, and hence which of X and Y precedes the other, can be determined by performing the Granger causality test at different time lags. The test requires only the two time series as input. The lag is set to 1, 2, 3, 4, 5 days, and so on, and the p value of the test at each lag is used to assess the Granger causality between the two variables. In this study, significance levels of α = 0.1, 0.05 and 0.01 were used. For the variable pairs whose null hypothesis is rejected at one of these levels, we first determine which variable leads and which lags, and then use VAR analysis to find the lag between them. The Granger causality test and the VAR analysis were thus performed together.
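A pairwise Granger causality test compares a regression of Y on its own lags (restricted) with one that also includes lags of X (unrestricted), using an F statistic. The sketch below uses hypothetical function names and synthetic data, and it returns the F statistic and its degrees of freedom. The p value can then be obtained from the F distribution, for example with `scipy.stats.f.sf(F, df_num, df_den)`. Alternatively, the whole test can be run with `statsmodels.tsa.stattools.grangercausalitytests`.

```python
import random


def _solve(M, v):
    """Solve M b = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    b = [0.0] * n
    for r in range(n - 1, -1, -1):
        b[r] = (aug[r][n] - sum(aug[r][c] * b[c] for c in range(r + 1, n))) / aug[r][r]
    return b


def ols_rss(X, y):
    """Residual sum of squares of an OLS fit of y on the columns of X."""
    m = len(X[0])
    XtX = [[sum(r[a] * r[b] for r in X) for b in range(m)] for a in range(m)]
    Xty = [sum(r[a] * yt for r, yt in zip(X, y)) for a in range(m)]
    beta = _solve(XtX, Xty)
    return sum((yt - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yt in zip(X, y))


def granger_f_test(y, x, lag):
    """F-test of H0: lags 1..lag of x do not help predict y.

    Returns (F, df_num, df_den). Both regressions include a constant.
    """
    n = len(y)
    target = y[lag:]
    restricted = [[1.0] + [y[t - i] for i in range(1, lag + 1)] for t in range(lag, n)]
    unrestricted = [row + [x[t - i] for i in range(1, lag + 1)]
                    for row, t in zip(restricted, range(lag, n))]
    rss_r = ols_rss(restricted, target)
    rss_u = ols_rss(unrestricted, target)
    df_num = lag
    df_den = len(target) - len(unrestricted[0])
    F = ((rss_r - rss_u) / df_num) / (rss_u / df_den)
    return F, df_num, df_den


# Usage on synthetic data (hypothetical): y depends on lagged x, not vice versa.
rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(500)]
y = [0.0] + [0.8 * x[t - 1] + rng.gauss(0.0, 1.0) for t in range(1, 500)]
F_xy, df1, df2 = granger_f_test(y, x, lag=2)  # does x Granger-cause y? (large F)
F_yx, _, _ = granger_f_test(x, y, lag=2)      # does y Granger-cause x? (small F)
```

Running the test in both directions, as in the last two lines, is what distinguishes uni-directional from bi-directional causality.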

4 Results

4.1 Performance results of the sentiment classification

Table 2 presents the sentiment classification results. For all three feature sets, MaxEnt-L1 has the highest F1 score: 0.7351, 0.7456 and 0.9296. With the feature vector, MaxEnt-L1 also has the highest accuracy (0.6787). In particular, when the feature vector and the bag of words are combined, MaxEnt-L1 achieves the highest recall, precision and F1. Overall, MaxEnt-L1 performs better than the other five classifiers.

Table 2 Sentiment classification performance results

4.2 VAR analysis

4.2.1 VAR analysis with KOSPI

The KOSPI and economy-related keywords such as “boom,” “depression” and “unemployment” were selected to investigate the relationship between the financial market and the sentiment scores using VAR analysis. The VAR model is known to be successful for forecasting interrelated time series, for structural inference and for policy analysis. In this study, we consider four endogenous variables in the VAR analysis: KOSPI, “boom,” “depression” and “unemployment.” Furthermore, we use the Granger causality test to identify causal relationships between the KOSPI and the three keywords selected from social media.

Before the Granger causality test is applied, the optimal lag length must be determined, because the Granger methodology is sensitive to it. Based on the Akaike information criterion (AIC), a lag length of 5 is selected as the appropriate lag structure for the variables. The Granger causality test involves estimating the following system of regressions, in which each variable depends on its own lags and on the lags of the other variables.

$$Z_{t} = c + \sum_{i = 1}^{5} A_{i} Z_{t - i} + \epsilon_{t} \tag{1}$$

where \(Z_{t}\) is an \(n \times 1\) vector of variables (here \(n = 4\)), \(c\) is a vector of constants and \(\epsilon_{t}\) is an error vector. The vector of variables in the VAR is \(Z_{t} = [\,y_{t}\;\; b_{t}\;\; d_{t}\;\; u_{t}\,]^{\text{T}}\), where \(y_{t}\) denotes the KOSPI and \(b_{t}\), \(d_{t}\) and \(u_{t}\) denote the extracted keywords “boom,” “depression” and “unemployment,” respectively.

The error terms satisfy \(E(\epsilon_{t}) = 0\), \(E(\epsilon_{t} \epsilon'_{s}) = 0\) for \(s \neq t\), and

$$E\left( \epsilon_{t} \epsilon_{t}' \right) = \begin{bmatrix} \sigma_{1}^{2} & \sigma_{12} & \sigma_{13} & \sigma_{14} \\ \sigma_{12} & \sigma_{2}^{2} & \sigma_{23} & \sigma_{24} \\ \sigma_{13} & \sigma_{23} & \sigma_{3}^{2} & \sigma_{34} \\ \sigma_{14} & \sigma_{24} & \sigma_{34} & \sigma_{4}^{2} \end{bmatrix}.$$

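The coefficients \([\,\beta_{1i}\;\; \beta_{2i}\;\; \beta_{3i}\;\; \beta_{4i}\,]\) of the KOSPI equation (the first row of \(A_{i}\)) are constants to be estimated, where \(\beta_{ji}\) is the coefficient on the \(i\)-th lag of the \(j\)-th variable in \(Z_{t}\). The following null hypotheses are tested using Eq. (1):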

  (i) $$H_{0(1)}: \beta_{21} = \beta_{22} = \cdots = \beta_{25} = 0.$$
  (ii) $$H_{0(2)}: \beta_{31} = \beta_{32} = \cdots = \beta_{35} = 0.$$
  (iii) $$H_{0(3)}: \beta_{41} = \beta_{42} = \cdots = \beta_{45} = 0.$$

These null hypotheses state, respectively, that (i) “boom” does not Granger-cause KOSPI, (ii) “depression” does not Granger-cause KOSPI and (iii) “unemployment” does not Granger-cause KOSPI. Rejecting hypothesis (i), for example, would imply causality from “boom” to KOSPI. The test results in Table 3 show that “depression” and “unemployment” lead KOSPI, whereas KOSPI Granger-causes “boom” and “unemployment.” Consequently, there is bi-directional causality in the short-run dynamics between KOSPI and “unemployment,” and uni-directional causality from “depression” to KOSPI and from KOSPI to “boom.”

Table 3 Pair-wise Granger causality tests

Table 3 shows the outcome of the Granger causality tests on the interactions among KOSPI, “boom,” “depression” and “unemployment” over the study period. Both null hypotheses, \(\beta_{31} = \beta_{32} = \cdots = \beta_{35} = 0\) and \(\beta_{41} = \beta_{42} = \cdots = \beta_{45} = 0\), are rejected. Consequently, “depression” and “unemployment” lead KOSPI.

Among the parameter estimates in Table 4, the coefficients on “boom” at lags 1 and 3 are statistically significant at the 10% level, and those on “depression” at lags 2 and 4 are significantly different from zero. Finally, “unemployment” at t − 1 and t − 2 has a statistically significant effect on KOSPI. Hence, the lagged economic keywords “boom,” “depression” and “unemployment” have a significant effect on the KOSPI. Furthermore, the coefficient on the KOSPI's own third lag is significantly different from zero.

Table 4 Vector auto-regression estimates

Table 5 reports the AIC and BIC values used as model selection criteria. Because the model with the lowest AIC or BIC is preferred, we choose the model with five lags.

Table 5 AIC and BIC values
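Lag selection by information criteria can be sketched as follows. For each candidate p, a VAR(p) is fitted by OLS on a common sample. The criteria are then computed in Lütkepohl's form, AIC(p) = ln|Σ̂_p| + 2pk²/T and BIC(p) = ln|Σ̂_p| + pk² ln T / T, where Σ̂_p is the residual covariance matrix. The function names and the synthetic VAR(2) data are hypothetical. In practice, the `select_order` method of statsmodels' `VAR` class offers the same comparison.

```python
import math
import random


def _solve(M, v):
    """Solve M b = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    b = [0.0] * n
    for r in range(n - 1, -1, -1):
        b[r] = (aug[r][n] - sum(aug[r][c] * b[c] for c in range(r + 1, n))) / aug[r][r]
    return b


def _det(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in M]
    n, det = len(a), 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[piv][col] == 0.0:
            return 0.0
        if piv != col:
            a[col], a[piv] = a[piv], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
    return det


def residual_cov(xs, p, start):
    """OLS-fit a VAR(p) with constant on t >= start; return the residual covariance."""
    k = len(xs[0])
    rows = [[1.0] + [v for j in range(1, p + 1) for v in xs[t - j]]
            for t in range(start, len(xs))]
    m, T = len(rows[0]), len(rows)
    XtX = [[sum(r[a] * r[b] for r in rows) for b in range(m)] for a in range(m)]
    resid = []
    for eq in range(k):
        y = [xs[t][eq] for t in range(start, len(xs))]
        beta = _solve(XtX, [sum(r[a] * yt for r, yt in zip(rows, y)) for a in range(m)])
        resid.append([yt - sum(b * v for b, v in zip(beta, r)) for r, yt in zip(rows, y)])
    return [[sum(ei * ej for ei, ej in zip(resid[i], resid[j])) / T for j in range(k)]
            for i in range(k)]


def lag_criteria(xs, max_p):
    """Return {p: (AIC, BIC)} for p = 1..max_p on the common sample t >= max_p."""
    k, T = len(xs[0]), len(xs) - max_p
    out = {}
    for p in range(1, max_p + 1):
        log_det = math.log(_det(residual_cov(xs, p, start=max_p)))
        n_coef = p * k * k  # number of lag coefficients
        out[p] = (log_det + 2 * n_coef / T, log_det + math.log(T) * n_coef / T)
    return out


# Usage on a synthetic VAR(2) with hypothetical coefficients.
rng = random.Random(7)
series = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(1000):
    a2, b2 = series[-2]  # values at t-2
    series.append([0.6 * a2 + rng.gauss(0.0, 1.0),
                   0.3 * a2 + 0.5 * b2 + rng.gauss(0.0, 1.0)])
criteria = lag_criteria(series, max_p=4)
best_aic = min(criteria, key=lambda p: criteria[p][0])
best_bic = min(criteria, key=lambda p: criteria[p][1])
```

Fitting every candidate on the same sample (t ≥ max_p) keeps the criteria comparable across p. BIC penalizes extra lags more heavily than AIC whenever ln T > 2, which is why the two criteria can disagree.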

4.2.2 VAR analysis with exchange rates

For the exchange rate, we consider four endogenous variables: the exchange rate, “price,” “year-end tax” and “budget deficit.” Based on the Akaike information criterion (AIC), we choose a lag length of 2.

Table 6 shows the outcome of the Granger causality tests on the interactions among the exchange rate, “price,” “year-end tax” and “budget deficit” over the study period. The results show that the keywords extracted through sentiment analysis (“price,” “year-end tax” and “budget deficit”) Granger-cause the exchange rate.

Table 6 Pair-wise Granger causality tests

As shown in Table 7, the estimated coefficients of “price” and “year-end tax” at lag 2 are statistically different from zero at least at the 10% level, and the lagged value of the exchange rate significantly affects “price.” The pair-wise Granger causality tests (Table 6) indicate that causality runs one way, from “price,” “year-end tax” and “budget deficit” to the exchange rate.

Table 7 Vector auto-regression model estimates
Table 8 AIC and BIC values

For the VAR analysis of exchange rates, we select the second lag, which minimizes both AIC and BIC (Table 8).

5 Conclusion

From an economic perspective, sentiment analysis is a particularly interesting field of research. In this study, we used six classifiers to analyze public sentiment on social media about several economic keywords, combining machine learning, statistical analysis and data on the Korean economy. We then investigated the relationship between the sentiment in three types of media (news, Twitter and blogs) and actual economic indicators, namely the KOSPI and exchange rates, by applying the Granger causality test and a vector autoregression model. This let us examine whether the sentiment scores derived from large-scale datasets are correlated with the economic indices over time. As expected, MaxEnt-L1 outperformed the other classifiers. The VAR analysis confirmed that public sentiment about some economic keywords is related to the actual economic situation. In other words, analyzing public sentiment can yield meaningful economic forecasts or useful information for enterprises. Indeed, companies that analyze and use public sentiment from social media can operate more effectively [12, 14]. Therefore, companies that recognize the importance of public sentiment analysis and apply it to their marketing, customer service and operations can expect to benefit. In future research, we plan to compare economic indices with more fine-grained public emotions, extending the analysis beyond a few economic keywords, to demonstrate the effect of public sentiment on the actual economic situation.