Evaluation of e-Word-of-Mouth through Business Intelligence processes in the banking domain

Social networks and Internet discussions are valuable sources for a company’s marketing research and public relations management. The Internet is full of public communication in unstructured form, which reflects recent shifts in contributors' perception of the company, brand, products, competitors or the whole market. As one approach to gaining a better view, we propose metrics that should be tracked to obtain valuable insight into where the company stands with its customers. This paper focuses on obtaining e-Word-of-Mouth (eWoM) in the banking sector from publicly available data. The main goal is to design metrics and dashboards that evaluate customers’ perception of a bank’s services, based on the analysis of public Facebook pages and web discussions related to several banks in the Czech Republic. We studied several approaches to unstructured data analysis and present complementary findings on how its results can be presented: as a set of summarised metadata, top peaks of primary qualitative data, and results of automated semantic analysis of the unstructured data. Based on the results, we discuss the possible value of unstructured data analysis and related systems. We find that the value may lie in identifying opportunities and threats in the market through unexpected movements in the public opinion of the Internet crowd, which we suggest exploring in future research. The contribution of this report is a description of the processing of the data that can be obtained, with emphasis on their content, their further enrichment and their users.


Introduction
The phenomenon that people talk about and recommend their favorite products and services to their friends and followers plays an important role in shaping their behavior (Goyette et al. 2010). A deeper understanding of these conversations may be crucial for creating a successful marketing strategy in online communities. Today these data are most commonly tracked through online monitoring. However, companies also hold other data from different sources, such as internal and external databases, CRM and ERP systems, in their Data Warehouses, and these could be put into context with the data from online communities.
In marketing, these customer conversations are known as Customer Voice or Word-of-Mouth (WoM). WoM obtained as unstructured data from the Internet is also known as digital (Hu et al. 2006), electronic (eWoM) (Choi and Scott 2012) or online Word-of-Mouth (Wu and Zheng 2012). The most developed definition of eWoM, which captures the present and future development of communication more comprehensively, is given by Bronner and de Hoog (2010): "Any statement, positive, negative or neutral, made by potential, current or former stakeholders about a product, service, company or person, which is made available to a multitude of people, organisations or institutions, via a digitally networked platform." eWoM data can deliver information to a broader audience, including companies, professionals and, in turn, the users themselves. Any consumer in the world can connect to the Internet and read the opinions of others. The emergence of various social media such as blogs, microblogs, social networks, forums and online reviews is an important step for Customer Voice research. There, users share their personal experience with companies, products and services, and this experience is then passed on to other users. At the same time, these sources give companies an opportunity to be visible and to convince customers by communicating the quality of their services. Dellarocas et al. (2007) noted that the practice of reviewing products online significantly increases the potential for an empirical understanding of eWoM marketing. Breazeale (2009) states that the digital platform is changing our understanding of the essence and meaning of eWoM. While a spoken evaluation disappears shortly after it is uttered and is very difficult to capture and analyse, an online statement lingers long after it was written and is not necessarily spontaneous. It is also immediate and accessible to others. 
Similar to classic WoM, research has shown that in the Internet environment eWoM may have "higher credibility, empathy and relevance to customers than marketer-created sources of information on the Web", as stated by Gruen et al. (2006, p. 449).
The importance of WoM in shaping consumers' attitudes and buying decisions has led many researchers to examine its effectiveness in stimulating demand in various industries. Studies have examined the influence of WoM on buying decisions and sales (e.g. Senecal and Nantel 2004; Tsang and Prendergast 2009; Chevalier and Mayzlin 2003), quality control (Ashton et al. 2014) and user and service experience (Hedegaard and Simonsen 2013; Pai et al. 2012). Other studies investigate eWoM in social networks (Wu and Zheng 2012) and estimate the social influence of individual nodes in social networks (Lü et al. 2011). Choi and Scott (2012) focus on the relationship between the use of social networking, users' social capital, knowledge sharing and eWoM. Their results show that the intensity of social network use is linked with confidence and identification, which have a positive impact on knowledge sharing.
A study by Almossawi (2015) found that "WoM has a positive influence on the youth's decision-making process when choosing where to open a bank account". This result also points to the importance of segmenting customers according to the characteristics and preferences they share on social networks and the data the banks have in their Data Warehouses. Banks can thus connect characteristics from the social profile with the customer's behaviour. Banking is one of the industries where the analysis of WoM can be crucial for staying competitive in the financial market. The potential lies in identifying what attracts clients, what the trends in banking look like, what needs to be improved in services, what is necessary to help clients, and how to communicate with them on social networks. A lack of confidence in banking services might also result from an increase in perceived risk, which can reduce customers' willingness to use banking services (Aurier and Siadou-Martin 2007). WoM can be a competitive advantage through which banks can increase the acquisition of prospects and the retention of customers. The study of Shirsavar et al. (2012) found that the major determinants of positive WoM are corporate image, relationship marketing, perceived value, perceived risk, satisfaction and loyalty. There are also studies that relate WoM, service quality and customer satisfaction in banking services (e.g. Yavas et al. 2004; Lymperopoulos and Chaniotakis 2008).
WoM affects individuals' decisions and influences organizations' operations. It has very important implications for a wide range of management activities, such as:
- building brand and reputation,
- increasing conversions, i.e. sales transactions,
- acquiring and retaining customers,
- product development,
- quality assurance.
Business managers have also started to pay attention to communication on social networks, and a new type of Business Intelligence is emerging (Chen 2010). In Business Process Management, bringing together the worlds of structured and unstructured data can add significant value to the enterprise. It can help to find priority clients, problems relating to products and services, customer sentiment and the next best step in business, and to identify the activities of competitors and customers, their reactions, etc.
Tremendous strides have been made in recent years to automate the analysis of unstructured text data. A problem of semantic analysis is that its results need to be quantifiable. The complexity of analysing unstructured textual data often results in only minimal use of the data (Ashton et al. 2014). It is therefore necessary to find a way to generate outputs that service providers can consume. We are convinced that, given the established culture, knowledge and technologies in companies, new methods have to adapt to end users as much as possible. According to Adamala and Cidrin (2011), a Business Intelligence solution must be built with end users in mind, as they are the ones who need to use it.

Motivation of the research
Today the Data Warehouses of banks contain mostly structured data, an asset they can easily measure. Business Intelligence (BI) is primarily directed at the presentation and analysis of numerical business data. Reporting systems, commonly based on dashboards, prepare quantitative data based on metrics in a report-oriented format that might include numbers, charts or business graphics (Kemper et al. 2004). According to Kimball (2010), metrics from the BI point of view are expressed through dimensional modelling as indicators and their characteristics, analytical dimensions and their characteristics, and the relationships between dimensions and indicators. COBIT 5 emphasizes the importance of business metrics; there, a metric expresses a degree, such as the extent to which company management is satisfied with IT's contribution to meeting the business strategy.
Dashboards are applications that allow users to organize pre-selected key performance indicators (metrics) in a clear and intuitive graphical form (Pour et al. 2012). On a dashboard, metrics can be viewed from many dimensions for immediate use in the organization's decision-making processes. For business users, dashboards provide visibility and clarity of all monitored metrics and an instant overview of whether they are improving or deteriorating. Users can thus immediately compare plan with reality and save time.
Management of unstructured data determines how efficiently the company will deal with its customers in the future. The dangers of ignoring unstructured data range from dissatisfied and very vocal customers, rapidly rising customer service costs and customer departures to a breach of trust in the organization when the customer knows more than its employees. The new approach allows a company to consolidate unstructured data in a central Data Warehouse and thus to communicate consistently through all channels. The customer then feels that the company knows him when he communicates with his counterpart, whether an agent or a vendor, or visits a customer portal. At the same time, customer service operations can reduce costs while maintaining customer satisfaction.
The integration of unstructured and structured data has been discussed at the presentation level (e.g. Becker et al. 2002), where structured data are accompanied by relevant texts: structured data selected as results of metrics viewed from different dimensions are presented side by side with relevant documents. Another form of integration exists at the level of extracting metadata from collections of unstructured data (e.g. Keith et al. 2005; Sukumaran and Sureka 2006). Identifiers of the content items are treated as facts that are subject to analysis, whereas metadata fields (e.g. author, date of creation, length and addressed product) are used for classification and thereby act as analysis dimensions. This allows individual documents to be associated with numerical facts directly through shared dimensions, and document frequencies to be investigated, e.g. the number of documents that cover a certain topic and are connected to a certain customer segment.
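The metadata-level integration can be illustrated with a small sketch, assuming each document has already been reduced to metadata fields. The field names (`topic`, `segment`, `source`), the sample records and the per-segment customer counts are hypothetical and stand in for a bank's own Data Warehouse; the code only shows how documents become countable facts that join structured data through a shared dimension.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping

# Hypothetical documents reduced to metadata; each record is one fact (a document),
# and its fields act as analysis dimensions.
DOCS = [
    {"id": "d1", "topic": "mortgage", "segment": "young", "source": "facebook"},
    {"id": "d2", "topic": "mortgage", "segment": "senior", "source": "forum"},
    {"id": "d3", "topic": "fees", "segment": "young", "source": "facebook"},
    {"id": "d4", "topic": "mortgage", "segment": "young", "source": "forum"},
]


def document_frequency(docs: Iterable[Mapping[str, str]], *dims: str) -> Counter:
    """Count documents per combination of the requested dimension values."""
    return Counter(tuple(doc[d] for d in dims) for doc in docs)


def topic_rate_per_segment(
    docs: Iterable[Mapping[str, str]],
    topic: str,
    customers_per_segment: Mapping[str, int],
) -> Dict[str, float]:
    """Join document counts with a structured fact (customers per segment)
    through the shared 'segment' dimension."""
    counts = document_frequency((d for d in docs if d["topic"] == topic), "segment")
    # Counter returns 0 for segments without any matching document.
    return {seg: counts[(seg,)] / n for seg, n in customers_per_segment.items()}


if __name__ == "__main__":
    print(document_frequency(DOCS, "topic", "segment"))
    print(topic_rate_per_segment(DOCS, "mortgage", {"young": 4, "senior": 2}))
```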
An integrated Business Intelligence framework that includes unstructured data was constructed by Baars and Kemper (2008), but it focuses more on classic enterprise data and data from CRM and does not include eWoM as a possible data source for the BI process. We are convinced that eWoM is a specific source of data that has to be handled in a specific way. Our work is also distinctive in its focus on the banking domain, which has specific business requirements. This paper builds on and extends the articles of Šperková (2014) and Šperková and Škola (2015), where the first content analysis of banking data was conducted.
Our purpose is to automate the process of obtaining the data and pre-processing them for further analysis. Automation can reduce the cost of time-consuming, manual and comprehensive analysis conducted by people, such as reading posts and searching the links in them. It cannot capture all the experience that customers write about anywhere on the Internet, but at least in the monitored publicly available sources the topics that interest users can be analysed. Furthermore, sentiment categories can be assigned to these topics automatically, which yields the distribution of topics with positive or negative customer experience.
There are many studies that mine sentiment and opinion from WoM using computer-aided methods such as Latent Semantic Analysis (Ashton et al. 2014) or machine learning classification (Pai et al. 2012). These methods are well known but are difficult to implement in service practice. For this purpose, the powerful tool Elasticsearch seems adequate. Only a few academic articles use Elasticsearch in their research; these focus on library science (Johnson 2013), full-text searching (Divya and Goyal 2013) or big log data (Bai 2013). Textual data analysis was also part of theses written at the Department of Information Technologies at the University of Economics in Prague this year. These theses use unstructured data as input and Elasticsearch as the analysis tool. The methodology used in those theses is well conceived and executed but lacks business context.
Unstructured data differ in nature from the structured data usually presented in BI solutions, and their meaningful presentation may differ from usual BI dashboards. We discuss the possibilities of measurement and dashboard presentation relevant to the nature of the data and its business importance.

Objective and methodology
The main objective of this research is to create a periodic review of the data that evaluates banks according to the context in which their users speak about them on the Internet. Our approach builds on the methods used in BI and on knowledge of unstructured data processing in BI. Insight is given by metrics that have to be defined on the basis of Facebook and web comments. After the information from these comments is processed, the metrics are calculated and visualized on dashboards. The result is an overview of the sentiment of the conversations about a bank in a specific period and its position in the monitored metrics compared with other banks in the market. The research is conducted as a case study and proof of concept, which will be followed by other studies and anchored in a methodology. Our approach follows the established Business Intelligence process (Kimball 2010) and the data mining (or, more precisely, text mining) methodology CRISP-DM (Chapman 2000), as the main aim of this integration is effective customer retention management. The CRISP-DM lifecycle contains six phases: business understanding, data understanding, data preparation, modelling, evaluation and deployment. In line with these procedures, we outlined the basic methodology of the research as follows:
1. Identification of the web pages and social network sites where regular information from customers and users of banking services can be obtained (business and data understanding).
2. Creation of a system that downloads the necessary data from the Internet and stores them in a repository (data preparation).
3. Data processing and analysis (data preparation and modelling).
4. Design of metrics and characteristics that evaluate the bank from the customers' point of view (modelling and evaluation).
5. Design of the dashboard for visualising the metrics and more detailed information (evaluation and deployment).
The result of this paper is a dashboard that supports further actions, which should lead to better decision making and increased performance. Figure 2 shows the general model of decision making from unstructured data used in this research. The findings will provide important insights into the business impact of social media and user-generated content, an emerging problem in Business Intelligence research. Furthermore, this model can easily be integrated into the traditional BI process based on structured data.

Data collection and processing
From the marketing research point of view, East (2007) claims that it is not difficult to find data on the Internet, but a problem may occur if the data come from only one source or server: the eWoM may then be biased towards either negative or positive statements.
For this reason, we use more than one data source. For the purposes of the analysis and the design of the metrics, we chose comments related to banking that appear on Czech websites or on the Facebook profiles of Czech banks. The five Czech banks with the largest balance sheet totals in 2012 that have a Facebook profile are shown in Table 1.

Connectors
To download data from Internet forums, we programmed web crawlers for automatically browsing website content in Java, using the open-source crawler4j library released under the Apache License v2. In crawler4j we set up rules specifying which domains to browse and, optionally, rules for browsing URLs whose content was of interest. For more efficient browsing, we also defined a list of text strings that must not appear in the URLs of visited pages; this told the crawler which parts of a site to skip because they contain no user comments. The parts of the HTML code containing the identification of the contributor, the text of the comment, the date of the comment and, where available, the number of reviews of the comment by other users were separated and prepared for further processing.
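The URL rules described above amount to a simple predicate. The sketch below is written in Python for brevity (the authors' crawler was written in Java with crawler4j, where such logic would typically live in an overridden `WebCrawler.shouldVisit` method); the domain and the URL fragments are hypothetical placeholders, not the forums crawled in the study.

```python
from typing import Iterable
from urllib.parse import urlparse

# Hypothetical crawl rules; real domains and fragments are site-specific.
ALLOWED_DOMAINS = {"forum.example.cz"}
INTERESTING_FRAGMENTS = ("/diskuse/", "/thread/")      # optional rule
EXCLUDED_FRAGMENTS = ("/login", "/register", "/print", ".jpg", ".pdf")


def should_visit(
    url: str,
    allowed_domains: Iterable[str] = ALLOWED_DOMAINS,
    interesting: Iterable[str] = INTERESTING_FRAGMENTS,
    excluded: Iterable[str] = EXCLUDED_FRAGMENTS,
) -> bool:
    """Return True if the crawler should fetch this URL."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host not in set(allowed_domains):
        return False
    lowered = url.lower()
    # Skip parts of the site known to contain no user comments.
    if any(fragment in lowered for fragment in excluded):
        return False
    interesting = tuple(interesting)
    # If no "interesting" rule is configured, every remaining URL is allowed.
    return not interesting or any(fragment in parsed.path for fragment in interesting)
```

Keeping the exclusion list separate from the optional inclusion list mirrors the two kinds of rules in the text: exclusions are always applied, while the inclusion rule can be switched off by passing an empty tuple.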
To acquire data from Facebook, we used the Java library RestFB, which contains classes for working with Facebook objects. To log in, we used the credentials (the assigned App ID together with an access token) of an application created under a private Facebook profile. The advantage of this login is that the data can be accessed without the need to renew the validity of the credentials. The objects downloaded from Facebook are the posts on the walls of Czech banks and data about the Facebook pages they are downloaded from. A post can be represented by text, a picture, a link, etc., and every post can contain comments from other users. These objects are downloaded by first retrieving the feed objects. For each object, it is determined whether it contains comments; if so, they are downloaded. Comments on Facebook have two layers: users can respond to each comment with sub-comments, which are downloaded as well. For each object of type post, it is also necessary to determine the number of Likes, i.e. positive evaluations of the object.
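The two-layer traversal (post, comment, sub-comment) can be sketched as a flattening step that turns the feed into one record per contribution while keeping a link to the parent. The nested dictionaries below are a hypothetical, already-downloaded structure; the field names (`message`, `likes`, `comments`) are assumptions loosely modelled on Facebook data, not RestFB classes.

```python
from typing import Any, Dict, Iterator, List, Optional

# Hypothetical feed of one bank page, already downloaded.
FEED: List[Dict[str, Any]] = [
    {"id": "p1", "message": "New mobile app released", "likes": 12,
     "comments": [
         {"id": "c1", "message": "Login keeps failing",
          "comments": [{"id": "c1a", "message": "Same here"}]},
         {"id": "c2", "message": "Looks great"},
     ]},
    {"id": "p2", "message": "Branch opening hours", "likes": 3},
]


def _record(item: Dict[str, Any], kind: str, parent: Optional[str]) -> Dict[str, Any]:
    return {"id": item["id"], "type": kind, "parent": parent,
            "text": item.get("message", "")}


def flatten_feed(feed: List[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield posts, comments and sub-comments as flat records."""
    for post in feed:
        record = _record(post, "post", None)
        record["likes"] = post.get("likes", 0)   # Likes are tracked for posts
        yield record
        for comment in post.get("comments", []):       # first layer
            yield _record(comment, "comment", post["id"])
            for reply in comment.get("comments", []):  # second layer
                yield _record(reply, "comment", comment["id"])
```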

Repository
As the repository and analysis tool for the collected data we used the open-source Elasticsearch software, which is based on the Apache Lucene library. Elasticsearch is a distributed, scalable system for real-time search and analysis whose main function is full-text search. It also supports structured search, geolocation and recording the relationships between data. In Elasticsearch, the data from all sources are analysed together. The data from both connectors are stored in Elasticsearch in JSON format. Every document contains a unique identifier under which it is stored. This ID makes it possible to rerun the connectors every day, because Elasticsearch stores only one document per ID, so a re-downloaded item replaces its earlier copy instead of creating a duplicate. The downloaded data were enriched by two further Java programs, which added sentiment analysis and the evaluation of named entities contained in the posts.
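Why a stable ID makes daily reruns safe can be shown with an in-memory stand-in for the index. The ID scheme (a SHA-1 hash of source plus native ID) and the `InMemoryIndex` class are hypothetical; a real Elasticsearch index request under an existing ID likewise replaces the stored document, and its response reports `created` or `updated`.

```python
import hashlib
import json
from typing import Any, Dict


def document_id(source: str, native_id: str) -> str:
    """Deterministic ID: the same item downloaded twice gets the same ID."""
    return hashlib.sha1(f"{source}:{native_id}".encode("utf-8")).hexdigest()


class InMemoryIndex:
    """Stand-in for an Elasticsearch index: at most one document per ID."""

    def __init__(self) -> None:
        self._docs: Dict[str, str] = {}

    def index(self, doc_id: str, body: Dict[str, Any]) -> str:
        result = "updated" if doc_id in self._docs else "created"
        # Stored as JSON, like the connector output.
        self._docs[doc_id] = json.dumps(body, ensure_ascii=False)
        return result

    def get(self, doc_id: str) -> Dict[str, Any]:
        return json.loads(self._docs[doc_id])

    def count(self) -> int:
        return len(self._docs)


if __name__ == "__main__":
    idx = InMemoryIndex()
    daily_download = [("facebook", "123_456", {"text": "Skvělá aplikace", "likes": 3}),
                      ("forum", "thread9#4", {"text": "Výpadek bankovnictví"})]
    for _day in range(2):                      # connector rerun on two days
        for source, native_id, body in daily_download:
            idx.index(document_id(source, native_id), body)
    print(idx.count())                         # 2, not 4
```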
Elasticsearch provides built-in support for analysing the Czech language. The outputs from Elasticsearch were then processed and visualized in the Kibana application. The Head plugin, which simplifies working with indices (data files), and the Carrot2 application for clustering documents were also used.

Sentiment analysis
Sentiment analysis was conducted with the open-source OpenNLP library, which is used for programming various natural language processing tasks such as sentence detection, tokenization and document categorization. The sentiment of contributions is evaluated with the OpenNLP Document Categorizer, which is based on the principle of maximum entropy. To train the categorization model, we used data from the University of West Bohemia: the output of a sentiment analysis of data from Czech Facebook pages and reviews from the Czechoslovak Film Database website, performed with supervised machine learning.
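The maximum entropy principle behind the Document Categorizer corresponds to a multinomial logistic regression over text features. The sketch below is a toy stand-in, not OpenNLP's API: it uses bag-of-words features and plain stochastic gradient ascent on the log-likelihood (OpenNLP uses its own training algorithms), and the labelled sentences are invented examples, not the University of West Bohemia data.

```python
import math
from typing import Dict, List, Sequence, Tuple


def features(text: str) -> List[str]:
    """Bag-of-words features, one per lower-cased token."""
    return ["w=" + token for token in text.lower().split()]


class MaxEntClassifier:
    """Toy maximum entropy (multinomial logistic regression) classifier."""

    def __init__(self, labels: Sequence[str]) -> None:
        self.labels = list(labels)
        self.weights: Dict[Tuple[str, str], float] = {}

    def probabilities(self, text: str) -> Dict[str, float]:
        feats = features(text)
        scores = {y: sum(self.weights.get((f, y), 0.0) for f in feats)
                  for y in self.labels}
        top = max(scores.values())                 # for numerical stability
        exps = {y: math.exp(s - top) for y, s in scores.items()}
        z = sum(exps.values())
        return {y: e / z for y, e in exps.items()}

    def train(self, data: Sequence[Tuple[str, str]], epochs: int = 50,
              lr: float = 0.5) -> None:
        for _ in range(epochs):
            for text, gold in data:
                probs = self.probabilities(text)
                for y in self.labels:
                    # Gradient of the log-likelihood: observed minus expected.
                    grad = (1.0 if y == gold else 0.0) - probs[y]
                    for f in features(text):
                        self.weights[(f, y)] = self.weights.get((f, y), 0.0) + lr * grad

    def classify(self, text: str) -> str:
        probs = self.probabilities(text)
        return max(probs, key=probs.get)


TRAIN = [
    ("skvělá rychlá aplikace", "positive"),
    ("skvělá služba", "positive"),
    ("hrozná pomalá aplikace", "negative"),
    ("hrozná služba nefunguje", "negative"),
    ("pobočka otevřena v pondělí", "neutral"),
    ("nová pobočka v Brně", "neutral"),
]
```

A word that occurs under only one label (here "skvělá" or "hrozná") accumulates weight for that label, while a word spread across labels (here "aplikace") contributes little, which is the behaviour a trained categorizer exploits.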

WoM information extraction
Before designing the metrics, we explored which types of information could be extracted from unstructured eWoM data. As shown by Adamala and Cidrin (2011), successful BI initiatives share factors such as a focus on the best opportunities ("low-hanging fruit") and alignment with the specific needs of the business sponsor. In our case, the generic best opportunity lies in a fast, easy and simple understanding of movements in public opinion related to the company and its competitors. Specifying a business vision for unstructured data analysis is difficult because the content of the data is not known in advance. Thus, the dashboard can be designed mostly from:
- summarised eWoM metadata,
- top peaks of eWoM primary data,
- automated semantic analysis of eWoM data.
Metadata such as the source, type or time of a contribution enable easy summarising and graphical representation. These data are easy to integrate into current BI environments. Their purpose in eWoM analysis is to understand temporal, typological and quantitative differences, as well as recent and past movements in the eWoM data.
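Summarising metadata is plain aggregation. A minimal sketch, assuming each contribution carries `source`, `type` and `date` fields (hypothetical names and sample rows), counts contributions per ISO week and per value of a chosen dimension, which is the shape a time-series dashboard panel needs.

```python
from collections import Counter
from datetime import date
from typing import Any, Dict, Iterable

CONTRIBUTIONS = [
    {"source": "facebook", "type": "comment", "date": date(2014, 3, 3)},
    {"source": "facebook", "type": "post", "date": date(2014, 3, 4)},
    {"source": "forum", "type": "comment", "date": date(2014, 3, 5)},
    {"source": "facebook", "type": "comment", "date": date(2014, 3, 11)},
]


def weekly_counts(rows: Iterable[Dict[str, Any]], dimension: str) -> Counter:
    """Number of contributions per (ISO week, dimension value)."""
    counts: Counter = Counter()
    for row in rows:
        iso_year, iso_week, _ = row["date"].isocalendar()
        counts[(f"{iso_year}-W{iso_week:02d}", row[dimension])] += 1
    return counts
```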
The metadata are the basis for identifying top peaks in the primary data, such as the topics with the highest absolute or incremental rate of appearance or the persons with high influence. These primary data should be shown to the dashboard user as primary, non-summarised content, because they carry semantics that a computer cannot easily evaluate. For example, when the rate of appearance of terms such as "availability", "outage" or "failure" grows in conjunction with a competitor, it could be valuable information about the technical condition of the competitor's e-banking system. Likewise, topics widely discussed in connection with the company, e.g. social network campaigns started by dissatisfied customers, can be intercepted at their beginning.
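A top peak by incremental rate of appearance can be found by comparing term counts between two periods. In this minimal sketch, assuming whitespace tokenisation and invented sample posts, a term qualifies if it appears at least `min_count` times in the current period, and terms are ranked by absolute increase; a real setup would add stemming and the bank-name keyword filters.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def term_counts(texts: Iterable[str]) -> Counter:
    return Counter(token for text in texts for token in text.lower().split())


def top_rising_terms(previous: Iterable[str], current: Iterable[str],
                     k: int = 3, min_count: int = 2) -> List[Tuple[str, int]]:
    """Terms with the highest absolute increase in occurrences."""
    before, now = term_counts(previous), term_counts(current)
    growth = {term: now[term] - before[term]
              for term in now if now[term] >= min_count}
    # Highest increase first; ties broken alphabetically for stable output.
    return sorted(growth.items(), key=lambda item: (-item[1], item[0]))[:k]


LAST_WEEK = ["nová aplikace", "poplatky za účet"]
THIS_WEEK = ["výpadek bankovnictví", "výpadek aplikace", "zase výpadek",
             "poplatky za účet", "aplikace nefunguje"]
```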
The automated semantic analysis is represented mainly by sentiment analysis, i.e. identifying whether a contribution is neutral, positive or negative. The output to the dashboard can be the number of customer statements of each sentiment, which measures the mood of Internet public opinion, or a direct indication of the sentiment of the top peak contributions. One reason for semantic analysis is the early recognition of negative or positive mood movements in the crowd after controversial marketing campaigns, which makes it possible to stop or intensify them.
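Counting statements of each sentiment per bank gives the mood indicator described above. A minimal sketch, assuming every contribution has already been labelled by the sentiment classifier; the bank names and rows are hypothetical.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Iterable, Mapping

SENTIMENTS = ("positive", "neutral", "negative")


def sentiment_distribution(rows: Iterable[Mapping[str, str]]) -> Dict[str, Dict[str, float]]:
    """Share of each sentiment among the contributions about each bank."""
    per_bank: DefaultDict[str, Counter] = defaultdict(Counter)
    for row in rows:
        per_bank[row["bank"]][row["sentiment"]] += 1
    result = {}
    for bank, counts in per_bank.items():
        total = sum(counts.values())
        result[bank] = {s: counts[s] / total for s in SENTIMENTS}
    return result


ROWS = [
    {"bank": "Bank A", "sentiment": "positive"},
    {"bank": "Bank A", "sentiment": "negative"},
    {"bank": "Bank A", "sentiment": "negative"},
    {"bank": "Bank A", "sentiment": "neutral"},
    {"bank": "Bank B", "sentiment": "positive"},
    {"bank": "Bank B", "sentiment": "positive"},
]
```

Shares rather than raw counts keep banks with very different discussion volumes comparable on one dashboard panel.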

Metrics design
The main purpose of metrics is to highlight the important facts on which corporate resources or people need to focus. Metrics summarize various aspects of the data in aggregate form and are comparable among the surveyed companies. From the downloaded and indexed data, it is necessary to derive metrics and other characteristics that evaluate banks from the customers' perspective. Quantitative characteristics are defined as the proposed metrics in Table 2. Nominal characteristics are understood as dimensions by which the metrics, as measurable indicators, can be calculated and sliced. Metrics together with dimensions create the value gained from the eWoM data, and the dimensions created by textual analysis carry the highest value. Some metric results can also be used as dimensions to slice other metrics. For example, one metric can be the sentiment calculated for individual comments; this result can then be used to slice the metric of most active contributors and show only those with negative sentiment. To better understand the content of posts and comments, a list of keywords has to be designed so that contributions can be searched according to user requirements. This list reflects the domain knowledge of each enterprise that wants to use our procedure and can be updated at any time. Keywords are attributes of the different dimensions. The dimensions considered in our case are:
- Time period (e.g. month, week, day, date)
- Source of the data (Facebook, web forum)
- Type of the contributor (Facebook user, bank, follower, user (cookie))
- Type of the page (individual Facebook page, individual forum page)
- Type of the contribution (comment, post)
- Name of the bank (keywords)
- Name of the product (keywords)
  - specific
  - generic
- Sentiment (positive, negative, neutral)
- Topic
- User/cookie
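Using one metric's result as a dimension for another can be sketched directly from the example in the text: the sentiment assigned to each comment filters the most active contributors metric. The row format and user identifiers are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Mapping, Optional, Tuple

ROWS = [
    {"user": "u1", "sentiment": "negative"},
    {"user": "u1", "sentiment": "negative"},
    {"user": "u1", "sentiment": "positive"},
    {"user": "u2", "sentiment": "positive"},
    {"user": "u2", "sentiment": "positive"},
    {"user": "u2", "sentiment": "positive"},
    {"user": "u3", "sentiment": "negative"},
]


def most_active_contributors(rows: Iterable[Mapping[str, str]], k: int = 3,
                             sentiment: Optional[str] = None) -> List[Tuple[str, int]]:
    """Top-k contributors by number of comments, optionally sliced by sentiment."""
    counts = Counter(row["user"] for row in rows
                     if sentiment is None or row["sentiment"] == sentiment)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:k]
```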

Dashboards design
The designed metrics need to be placed on the dashboard. In our case, the dashboard is implemented in the Kibana application. The Overview dashboard shows the defined metrics and contains a set of visualizations that correspond to quantitative questions about the stored data. The Topic analysis dashboard shows topics or words that occur frequently or may be potentially interesting; it is designed to give insight into the topics discussed in the stored data. The dashboards are used for analysing the indexed data and for preparing the final visualization. Data can be viewed from different angles, and search allows specific subsets of the data to be queried; the data containing the specific forms of the searched word or phrase are then displayed. All objects defined in Table 5 and placed on the dashboards also serve as filters that allow data to be viewed according to user interest, for example to find where there are many negative posts or which source caused a blip in the number of contributions. Another option is to enter a query into the search field and thus determine, for example, whether the messages contain some of the keywords or how often the name of a bank occurs.
Issues that attracted comments can be identified in several ways: with the objects Frequent Terms and Top Unusual Terms, and with the frequency of posts over time.
Frequent Terms: identification of themes related to the contributions. It shows words in their stemmed form together with their frequency of occurrence.
Top Unusual Terms: statistically unusual terms, i.e. terms that occur more frequently than a statistical model based on the other data predicts. It highlights novelties in the selected data.
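Both objects can be approximated with document frequencies. In the sketch below, the suffix-stripping `stem` function is a deliberately crude, hypothetical stand-in for a real Czech stemmer, and unusual terms are scored with the JLH heuristic, which Elasticsearch's `significant_terms` aggregation uses by default: (foreground share minus background share) multiplied by (foreground share divided by background share). Whether the dashboard object relied on exactly this heuristic is an assumption; the sample documents are invented.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

# Hypothetical suffix list; NOT a real Czech stemmer.
SUFFIXES = ("ami", "ech", "ou", "y", "u", "a", "e")


def stem(token: str) -> str:
    for suffix in SUFFIXES:
        # Keep a stem of at least four characters.
        if token.endswith(suffix) and len(token) - len(suffix) >= 4:
            return token[: -len(suffix)]
    return token


def terms(text: str) -> Set[str]:
    """Distinct stemmed terms of one document (for document frequencies)."""
    return {stem(token) for token in text.lower().split()}


def doc_freq(docs: Iterable[str]) -> Counter:
    return Counter(term for doc in docs for term in terms(doc))


def frequent_terms(docs: List[str], k: int = 5) -> List[Tuple[str, int]]:
    return sorted(doc_freq(docs).items(), key=lambda item: (-item[1], item[0]))[:k]


def unusual_terms(foreground: List[str], background: List[str],
                  k: int = 5) -> List[Tuple[str, float]]:
    """Rank foreground terms by the JLH score against the background set."""
    fg, bg = doc_freq(foreground), doc_freq(background)
    scores = {}
    for term, fg_count in fg.items():
        fg_pct = fg_count / len(foreground)
        bg_pct = bg[term] / len(background)
        if bg_pct > 0 and fg_pct > bg_pct:
            scores[term] = (fg_pct - bg_pct) * (fg_pct / bg_pct)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]


FOREGROUND = ["výpadek bankovnictví", "výpadek karty", "poplatky za účet"]
OTHERS = ["poplatky za účet", "poplatky za kartu", "nová pobočka",
          "poplatky jsou vysoké"]
BACKGROUND = FOREGROUND + OTHERS   # the background set includes the foreground
```

Here "poplatk" (fees) is the most frequent term overall, but it is not unusual in the foreground, whereas "výpadek" (outage) is both over-represented and rare in the background, so it ranks first among the unusual terms.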

Further findings and implications of the study
The results of the reporting design may serve the marketing department as an indicator for evaluating the bank in relation to others in the market, as feedback on new product introductions, as an overview of the competition or for discovering customers' wishes. They indicate which banks are customer friendly and which banks and issues people talk about. Longer-term monitoring of the metrics can therefore indicate where to apply banking products. From a managerial perspective, our results suggest that firms should pay attention to textual content when managing social media and, more importantly, focus on the right measures. We therefore also suggest closer cooperation between the people managing social sites such as Facebook and BI analysts. This approach could lead to higher customer satisfaction and to growth in agility, profitability and customer orientation. Although we consider the metrics and the dashboard design to be the main result of our study, we were also able to extract a typology of possible information value and thus describe the distinct business value that could be expected from similar cases. We also discuss the overall business value of intelligence systems based on unstructured data.
The following typology of unstructured data value is derived from observing the data presented on the designed dashboards. Applying this typology in several practical situations is a basis for future research. We suggest performing a qualitative study with a sample of power users of the analysed data to validate the acceptance of the typology. We find value even in the primary information itself, such as the content of public contributions, the number of contributions or similar quantitative values. While static data representing absolute values are predictable, dynamic information representing changes, such as the topics or words with the highest change rate of appearance, creates unexpected value. That leads to further exploration of the reasons for and origins of the information. Such an origin sometimes leads to a single source, sometimes to a competitor's campaign, and sometimes to a single Internet personality with extended influence. Such an influencer could be a possible partner in public communication.
The overall business value of unstructured data analysis is the sum of all the expected business values described above. This makes such systems very difficult to evaluate and to build a business case for. Much of the value could be found in unexpected, surprising information, which can create a big opportunity or prevent an extensive threat. Such value cannot be calculated in a simple business case, because it is impossible to estimate the probability that such surprising information will arise from the eWoM. The value and ROI of unstructured data intelligence systems could then be considered similarly to the Business Continuity Management approach: as avoiding the possible business impact of not having the information, in our case the possible business impact of ignoring Internet public opinion.

Conclusion
The purpose of the research was to design a comprehensive overview of customers' eWoM based on web forums and Facebook comments. After studying approaches to unstructured data analysis and WoM analysis, we discussed the nature of unstructured data analysis and the possibilities of its dashboard presentation. We defined quantitative metrics evaluating individual aspects of customers' perception of the bank, the dimensions, and the way they can be displayed on consolidated dashboards. We chose the Czech banking industry, Facebook pages and relevant websites with extensive discussions. The results were designed with respect to a possible future integration of eWoM into the Business Intelligence process and data structures of banks. The advantage of our approach is its extensibility: connectors can be added for new data sources, and new metrics can be defined and incorporated into the dashboard. The approach can also be used by enterprises outside banking.
The main outcome is the design of the metrics and dashboards over the analysed public banking market data. The main findings concern how unstructured data analysis can be presented: as a set of summarised metadata, top peaks of primary qualitative data and results of automated semantic analysis of the unstructured data (especially sentiment analysis), designed in a dashboard specific to banking data.
Furthermore, we discussed, generalised and classified the possible value of unstructured data analysis and related systems. We found that the value may lie in identifying opportunities and threats within the market through unexpected movements in the public opinion of the Internet crowd, which we suggest exploring in future research.
If the typology is validated, future research could include automatic classification of the data to identify the type of business value of the information presented on the dashboard, thus transferring more intelligence from humans to automated unstructured data processing.