When Bitcoin encounters information in an online forum: Using text mining to analyse user opinions and predict value fluctuation

Young Bin Kim; Jurim Lee; Nuri Park; Jaegul Choo; Jong-Hyun Kim; Chang Hun Kim

doi:10.1371/journal.pone.0177630

Abstract

Bitcoin is an online currency that is used worldwide to make online payments. It has consequently become an investment vehicle in itself and is traded in a way similar to other open currencies. The ability to predict the price fluctuation of Bitcoin would therefore facilitate future investment and payment decisions. In order to predict the price fluctuation of Bitcoin, we analyse the comments posted in the Bitcoin online forum. Unlike most research on Bitcoin-related online forums, which is limited to simple sentiment analysis and does not pay sufficient attention to noteworthy user comments, our approach involved extracting keywords from Bitcoin-related user comments posted on the online forum with the aim of analytically predicting the fluctuation in the price and transaction count of the currency. The effectiveness of the proposed method is validated based on Bitcoin online forum data ranging over a period of 2.8 years from December 2013 to September 2016.

Citation: Kim YB, Lee J, Park N, Choo J, Kim J-H, Kim CH (2017) When Bitcoin encounters information in an online forum: Using text mining to analyse user opinions and predict value fluctuation. PLoS ONE 12(5): e0177630. https://doi.org/10.1371/journal.pone.0177630

Editor: Kim-Kwang Raymond Choo, University of Texas at San Antonio, UNITED STATES

Received: January 16, 2017; Accepted: May 1, 2017; Published: May 12, 2017

Copyright: © 2017 Kim et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the paper and its Supporting Information files.

Funding: This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science, ICT and future Planning (NRF-2016M3C1B6929629, NRF-2016M3C1B6929579, NRF-2017R1A2B2005380) and an Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (R7518-16-1028, High performance computing (HPC) based rendering solution development). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

The advancement of the ubiquitous Internet has resulted in the emergence of unprecedented types of currencies that are distinct from the established currency system. The rise of these so-called cryptocurrencies, of which the total supply is increased by using a unique method known as “mining”, has changed the way economic transactions are conducted among Internet users to a great extent. Following the introduction of Bitcoin in 2008[1], a range of cryptocurrencies comparable to Bitcoin have come into existence since 2010[2–4]. Currently, Bitcoin and other cryptocurrency variants are often used for online payments and transactions[4–6] with their circulation gradually increasing over time[3, 6].

In parallel with the increasing circulation of Bitcoin, a growing number of Bitcoin users take to social media or online Bitcoin forums to share information[6]. Yet, despite the plethora of information posted by Bitcoin users, the linkage between such postings and Bitcoin transactions has not been well-documented.

The present research builds on previous findings regarding Bitcoin-related online forums, and proposes a method to analytically predict the fluctuations in Bitcoin transaction counts and value using the data collected from user comments posted on the online forum. First, we extracted keywords of interest from user comments on the online forum. We analysed the relationship between the Bitcoin transaction count and price based on the extracted keywords and quantification. Then, we developed a model based on deep learning[7, 8] to predict the Bitcoin transaction count and price. The proposed method efficiently processed the readily accessible online data, and identified as well as utilized the elements that online forum users perceived as important.

Related work

Research on cryptocurrencies, particularly on Bitcoin, has been extensively conducted from diverse perspectives, e.g. the analysis of user sentiment as manifested by social media including Twitter[9, 10]. The aim is to determine the value of Bitcoin relative to social phenomena and incidents that have taken place since the introduction of the currency. Such research includes studies of the extent to which Bitcoin price fluctuations are related to web search query volumes on Google Trend and Wikipedia, i.e. the extent to which these query volumes predict the Bitcoin price and trade volume[11–14].

Some recent research has focused on the characteristics of Bitcoin online forums. People who share common interests tend to post comments concerning certain topics on online forums[15–19]. Bitcoin is mostly traded on the web with many users making buying/selling decisions based on information acquired on the Internet[6, 20]. Therefore, it is possible to observe how users respond to daily Bitcoin price fluctuations, and to identify or predict future fluctuations in the Bitcoin price and trade volume [6, 20]. In addition, forum users are analysed and classified into Bitcoin user groups[6].

Some researchers simply analysed sentiments based on comments posted by forum users or focused on users per se without considering the information derived from cumulative user comment data gathered during a sample period[17, 21, 22], while others analysed online user comments.

In this regard, topic modelling has been actively explored as an effective technique for analysing user opinions from their online textual postings[23]. Topic modelling[24, 25] is a text-mining technique that extracts a set of prevailing topics and relevant keywords out of a large-scale document corpus. This topical information provides users with an instant overview of the corpus, thereby obviating the need to read through comments, which would otherwise be a tedious, time-consuming process.

Recently, collaborative filtering and topic modelling have been integrated for generating scientific article recommendation systems on an online community[26]. A Temporal Latent Dirichlet Allocation (TM-LDA) system was used to conduct an in-depth analysis of the online social community by employing an advanced Latent Dirichlet Allocation (LDA) topic modelling algorithm[27]. Likewise, application of the LDA approach to Chinese social reviews revealed the sentiments underlying some social events and services[28].

Methods

System overview

This section provides an overview of the proposed method. First, we gathered the data relevant to Bitcoin for the purpose of the experiment. More specifically, Bitcoin-related posts on the online forum, daily Bitcoin transaction counts, and its price were gathered. We also extracted and rated significant keywords from the data gathered on the online forum. Then, we selected the data of higher score ratings to generate the prediction model based on deep learning and used the model to predict the fluctuation in the Bitcoin price and transaction count (see Fig 1).

Fig 1. System overview.

https://doi.org/10.1371/journal.pone.0177630.g001

Data crawling

Data crawling was the first step in our analysis. The online environment for Bitcoin transactions is well defined and the rise/fall in its price depends on the supply and demand arising from users [2, 3, 5, 6]. We postulated that user comments on the targeted online Bitcoin forum would have an impact on the fluctuation of the Bitcoin price and transaction count. Thus, we crawled and analysed the relevant data.

The large online forum is home to a variety of Bitcoin-related topics, where users actively engage in conversations by posting comments and forming threads[6, 29]. The bulletin boards on the Bitcoin online forum are largely comprised of four different sections. Each section consists of three to five sub-sections. For example, the ‘Bitcoin’ section is sub-divided into ‘Development & Technical Discussion’, ‘Mining’, ‘Bitcoin Discussion’, ‘Project Development’, and ‘Technical Support’. We crawled the ‘Bitcoin Discussion’ subsection under the ‘Bitcoin’ section where comments are posted most actively.

The threads of comments and replies posted from 1 December 2013, when Bitcoin started to sweep the globe, until 21 September 2016 were crawled. Each thread, including the topics and all relevant replies, the time when such posts appeared on the forum, the number of replies posted, and view counts were crawled as well. Duplicate sentences were removed from the replies that quoted earlier posts or replies prior to crawling. We collected data in a legitimate manner, in compliance with the terms and conditions. Moreover, the collected data did not involve any personally identifiable information. The .json files of the crawled Bitcoin forum are presented in the Supporting Information.

Furthermore, we used Coindesk to crawl the daily Bitcoin price and the number of transactions for the abovementioned sample period (See Table 1).

Table 1. Summary of crawled data.

https://doi.org/10.1371/journal.pone.0177630.t001

In addition, we reinforced the learning model by crawling the widely used Google Trend data and Wikipedia usage data. Google Trend shows the search interest in a certain keyword on a scale of 1 to 100 based on its search volume on Google for a certain sample period. Google Trend data is widely used to analyse data and phenomena in multiple disciplines[30–34]. We gathered Google Trend data related to the keyword “Bitcoin”. The Wikipedia usage volume data is based on the page views of a certain keyword on a certain day, and broadly used in many analytical studies on data or Internet phenomena[34–36]. Again, we gathered data about the keyword “Bitcoin” on Wikipedia. Table 1 outlines the arrangement of opinion and market data crawled.

Analysis of user comment data

Our intention was to extract significant keywords used in Bitcoin transactions from the aforementioned crawled data. Therefore, we conducted topic modelling on every user comment to extract the keywords, which were in turn subjected to kernel density estimation for score rating.

Concept building.

Our main goal was to extract quantitative features related to diverse characteristics from documents (see Fig 2). We considered the feature value as the degree of relevance for a feature. In detail, the feature value represents the extent to which a document has a particular characteristic. For example, sentiment analysis concerns one such quantitative feature, or the extent to which a document is positive or negative. We generalised this idea to various other user-defined characteristics. Examples of such characteristics include the extent to which a document is related to finance, immigration, and family issues. In particular, we built a lexicon, i.e. a set of keywords, relevant to the characteristics and utilised it to assign a feature value to a document by computing the degree to which the document contains those characteristics defined in the lexicon and other potentially relevant keywords. In this study, we considered a characteristic to be a concept describing a particular phenomenon or object, and defined a concept by constructing a set of keywords, whose meanings were relevant.

Fig 2. Concept for building the workflow.

https://doi.org/10.1371/journal.pone.0177630.g002

Concepts can play an important role in document analysis in diverse fields. That is, one can build useful domain-specific concepts in economics, politics, and social sciences and define the characteristics of documents with respect to these concepts. In the case of a spam-filtering task on documents and comments, for example, we can actively employ a ‘spam’ concept consisting of suspicious terms that usually appear in spam mails to measure the likelihood of the comment being unsolicited mail.

Here, the concept building process was composed of two steps: (1) the initial construction of a relevant keyword set, followed by its (2) user-interactive expansion. In order to facilitate the first step, we provided a user with the initial sets of coherent keywords obtained with two different techniques. The first technique we used was topic modelling, which algorithmically computes those representative keywords emerging from a document corpus. The user can then select some of them as an initial word set for their own concepts. As the other method to provide initial keywords, we computed the representative keywords from the centroid vectors obtained by k-means clustering on word embedding vectors[37].

Once a user formed an initial, small-sized lexicon for a particular concept, the second step was to interactively expand it by using a recently proposed visual analytics system named ConceptVector. Based on the initial lexicon given as user inputs, ConceptVector recommended potentially relevant keywords to enable users to easily add a subset of them to the lexicon. As the lexicon expanded, ConceptVector adjusted the recommended keywords that match the semantic meaning of the concept.

The foregoing procedure is discussed further below.

Topic modelling for initial lexicon building.

The topic modelling approach we used to extract representative keywords emerging from a document corpus is non-negative matrix factorisation, where the non-negativity allows users to interpret the value from factor matrices as the relevance score of a word or a document to a particular topic as mentioned above.

In particular, we constructed a document-term matrix A from the 17,381 forum articles and 627,122 user comments collected from the Bitcoin forum (See Table 1). Each article contains five attributes, ‘content’, ‘topic’, ‘comments’, ‘date’, and ‘views’, whereas each comment contains ‘content’ and ‘date’ features. Using the ‘date’ field, we split the document-term matrix per day for our analysis. We then applied topic modelling to each daily matrix so as to extract the different topic sets and their representative keywords across different dates.
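The per-day split described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the sample records, the whitespace tokenizer, and the function names `tokenize` and `daily_document_term` are all hypothetical; only the ‘content’ and ‘date’ field names come from the text.

```python
from collections import Counter, defaultdict

# Hypothetical records mimicking the 'content' and 'date' fields described above.
comments = [
    {"date": "2013-12-01", "content": "mining difficulty is rising"},
    {"date": "2013-12-01", "content": "china exchange volume rising"},
    {"date": "2013-12-02", "content": "wallet security and mining"},
]

def tokenize(text):
    """Lowercase whitespace tokenizer (a stand-in for real preprocessing)."""
    return text.lower().split()

def daily_document_term(records):
    """Group documents by date; return {day: (vocabulary, count rows)}."""
    by_day = defaultdict(list)
    for rec in records:
        by_day[rec["date"]].append(Counter(tokenize(rec["content"])))
    matrices = {}
    for day, docs in by_day.items():
        # Vocabulary of that day only, sorted for a stable column order.
        vocab = sorted({term for doc in docs for term in doc})
        # One row per document, one column per term (Counter gives 0 if absent).
        rows = [[doc[term] for term in vocab] for doc in docs]
        matrices[day] = (vocab, rows)
    return matrices

matrices = daily_document_term(comments)
```

Each entry of `matrices` is a small document-term matrix for one day, which could then be passed to a topic model separately.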

The mathematical details of this process are as follows. Given a document-term matrix $A \in \mathbb{R}_{+}^{m \times n}$, where m is the number of articles and n is the dictionary size, Non-negative Matrix Factorization (NMF) approximately factorises it into two non-negative factor matrices $W$ and $H$, where d represents the number of topics (50 in our study), i.e.

$A \approx WH$ (1)

The columns in the resulting matrix W correspond to different topics and the keywords corresponding to the dimensions of the k largest value in each column function as the representative keywords of the topic.
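A minimal sketch of this factorisation is given below, using the standard multiplicative update rule for the squared-error objective. Several things here are assumptions and not taken from the paper: the choice of update rule, the tiny toy matrix, the helper names, and the orientation of the matrix. The sketch stores terms along the rows so that each column of W indexes keywords, as the paragraph above describes.

```python
import random

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def frob_error(A, W, H):
    """Squared Frobenius norm of A - WH."""
    WH = matmul(W, H)
    return sum((a - b) ** 2 for ra, rb in zip(A, WH) for a, b in zip(ra, rb))

def nmf(A, d, iters=200, seed=0, eps=1e-9):
    """Multiplicative-update NMF: A (rows x cols) ~ W (rows x d) H (d x cols)."""
    rng = random.Random(seed)
    W = [[rng.random() + 0.1 for _ in range(d)] for _ in range(len(A))]
    H = [[rng.random() + 0.1 for _ in range(len(A[0]))] for _ in range(d)]
    for _ in range(iters):
        # H <- H * (W^T A) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (A H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H

def top_keywords(W, terms, k):
    """For each column (topic) of W, return the k terms with the largest values."""
    topics = []
    for col in zip(*W):
        order = sorted(range(len(terms)), key=lambda i: col[i], reverse=True)
        topics.append([terms[i] for i in order[:k]])
    return topics

# Toy term-by-document counts (hypothetical data, 4 terms x 4 documents).
terms = ["mining", "hash", "wallet", "key"]
A = [[3, 2, 0, 0],
     [2, 3, 0, 0],
     [0, 0, 3, 2],
     [0, 0, 2, 3]]
W, H = nmf(A, d=2)
topics = top_keywords(W, terms, k=2)
```

Because all entries of W stay non-negative, the values in each column can be read directly as keyword relevance scores for that topic. For a corpus of realistic size, a library implementation would be preferable to this pure-Python loop.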

Expanding the lexicon via word recommendation.

We proposed two types of concepts in the system. A unipolar concept represents exactly one concept such as crude oil and immigration. A bipolar concept has two polarities that oppose each other, e.g. positive vs. negative, progressivism vs. conservatism. In the case of building a concept, the system has positive, negative, and irrelevant word sets. When a user provides a word as an input, the system provides 50 recommended words that are potentially relevant to the seed word. We then automatically sorted the recommended words into five clusters, using the k-means clustering, to gather closely related terms into one group.

Once the lexicon of a concept is created by user interactions, the document rating process utilises the concept built in the process above. Because of the lack of expression resulting from the limited number of words a person could manage, we applied the kernel density estimation (KDE) in the word rearranging phase.

Computation of document relevance to concept.

Prior to the KDE, the concept had a limited number of descriptive terms for a characteristic, which resulted in a lack of expression and description. Therefore, the KDE served for the probabilistic smoothing over every word. This smoothing process is the most important procedure for document analysis since, without it, the score rating process cannot consider synonyms or closely related words that also represent a specific concept. Based on the assumption that the input terms describe the concept sufficiently well, we constructed a kernel that exerts influence on the entire vocabulary. ConceptVector adopts a Gaussian kernel as described below.

For the class y ∈ {positive,negative,irrelevant}, the conditional probability for each class can be calculated by the distance function d that represents the distance between a word in the word set in each class and the kernel k that ensures a proper balance between the given word and the others. The conditional probability of a keyword z for a class c can be computed as below: (2) which can also be seen as the relevance score to each class.

Since our final goal was to obtain scores by taking all classes into consideration, we rated a concept in view of all classes. For instance, ‘happy’, in the case of a bipolar concept, was rated for the positive, negative, and irrelevant classes. We calculated the bipolar rating as below: (3) (4)

The range of the bipolar score is [-1, 1] because the maximum values of p(y = positive,z) and p(y = negative,z) are both 1.
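The idea of Gaussian-kernel smoothing over a lexicon can be illustrated with the sketch below. It is one plausible reading and not the formulas (2)–(4) of the paper: the 2-D embeddings, the seed words, the bandwidth `sigma`, the normalisation across classes, and the bipolar score taken as the positive probability minus the negative one are all assumptions chosen so that the score falls in [-1, 1].

```python
import math

# Hypothetical 2-D word embeddings; a real system would use learned vectors.
embeddings = {
    "good": (1.0, 0.9), "great": (0.9, 1.0), "happy": (0.8, 0.8),
    "bad": (-1.0, -0.9), "awful": (-0.9, -1.0), "sad": (-0.8, -0.8),
    "table": (1.0, -1.0), "chair": (0.9, -0.9),
}
# Hypothetical user-built lexicon for a bipolar concept.
lexicon = {
    "positive": ["good", "great"],
    "negative": ["bad", "awful"],
    "irrelevant": ["table"],
}

def gaussian_kernel(u, v, sigma=0.5):
    """Gaussian kernel on the squared Euclidean distance between two vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2 * sigma ** 2))

def class_probabilities(word, sigma=0.5):
    """Average kernel value to each class's seed words, normalised over classes."""
    z = embeddings[word]
    density = {
        c: sum(gaussian_kernel(z, embeddings[s], sigma) for s in seeds) / len(seeds)
        for c, seeds in lexicon.items()
    }
    total = sum(density.values())
    return {c: v / total for c, v in density.items()}

def bipolar_score(word):
    """Assumed bipolar rating: positive minus negative probability, in [-1, 1]."""
    p = class_probabilities(word)
    return p["positive"] - p["negative"]

scores = {w: bipolar_score(w) for w in ("happy", "sad", "chair")}
```

In this toy setting, ‘happy’ is not in the lexicon but still receives a positive score because it lies close to the positive seed words, which is the smoothing effect the text describes.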

Prediction modelling

Granger causality test.

The Granger causality test is based on the assumption that if a variable X causes Y, then changes in X will systematically occur before changes in Y[17, 22, 38]. As shown in past research, lagged values of X exhibit a statistically significant correlation with Y[17, 22, 38]. Nevertheless, correlation does not imply causation. We therefore test whether the time series of a forum-opinion concept contains predictive information about fluctuations in the Bitcoin transaction count and price.

Our time series of the Bitcoin transaction count and price, denoted by S_t, reflects the day-to-day change in the Bitcoin transaction count and price. To test whether the time series of a forum-opinion concept could predict fluctuations in the Bitcoin transaction count and price, we compared the variance explained by two linear models, (5) and (6) below. The first model uses only the n lagged values of S_t for prediction, whereas the second model uses the n lagged values of both S_t and the time series of a concept of forum opinions, denoted by X_t−1,⋯,X_t−n. We performed the Granger causality test according to the models in (5) and (6).

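$S_t = \alpha + \sum_{i=1}^{n} \gamma_i S_{t-i} + \varepsilon_t$ (5)

$S_t = \alpha + \sum_{i=1}^{n} \gamma_i S_{t-i} + \sum_{i=1}^{n} \beta_i X_{t-i} + \varepsilon_t$ (6)

Here $\alpha$ is the intercept, $\gamma_i$ and $\beta_i$ are the coefficients on the lagged values of S_t and X_t, respectively, and $\varepsilon_t$ is the error term.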

Based on the results of the Granger causality test, we can reject, with a high level of confidence, the null hypothesis that the time series of a concept of forum opinions does not predict fluctuations in the Bitcoin transaction count and price. The Granger causality test was performed on the Bitcoin transaction count and price for a time lag of 1 to 12 days.
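The comparison of models (5) and (6) can be sketched as an F-test on the residual sums of squares of two ordinary least-squares fits. This is a minimal illustration under stated assumptions: the synthetic series, the helper names, and the plain normal-equation solver are hypothetical, and the sketch returns only the F statistic. A p-value would require the F distribution, for instance from a statistics package such as statsmodels, which provides `grangercausalitytests`.

```python
import random

def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x

def rss(rows, y):
    """Residual sum of squares of an OLS fit of y on the given design rows."""
    p = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    coef = solve(XtX, Xty)
    return sum((t - sum(c * v for c, v in zip(coef, r))) ** 2 for r, t in zip(rows, y))

def granger_f(s, x, n):
    """F statistic: does adding n lags of x improve on n lags of s alone?"""
    times = range(n, len(s))
    y = [s[t] for t in times]
    restricted = [[1.0] + [s[t - i] for i in range(1, n + 1)] for t in times]
    full = [r + [x[t - i] for i in range(1, n + 1)] for r, t in zip(restricted, times)]
    rss_r, rss_u = rss(restricted, y), rss(full, y)
    dof = len(y) - (2 * n + 1)  # observations minus parameters of the full model
    return ((rss_r - rss_u) / n) / (rss_u / dof)

# Synthetic example: s follows x with a one-day delay (hypothetical data).
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(300)]
s = [0.0] + [0.8 * x[t - 1] + 0.1 * rng.gauss(0, 1) for t in range(1, 300)]
f_forward = granger_f(s, x, n=2)   # do lags of x help predict s?
f_reverse = granger_f(x, s, n=2)   # do lags of s help predict x?
```

In this synthetic setting the forward statistic is large while the reverse one is not, which mirrors the asymmetry the test is designed to detect.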

Deep learning model.

Using the gathered data and the analysed and rated comment data, we built a model for predicting the fluctuation in the Bitcoin price and transaction through deep learning. Deep learning is widely used for addressing diverse challenges[8, 39]. Despite the quantitative and qualitative increases in Bitcoin-related formal and informal data following the broadening applicability of Bitcoin, deep learning has rarely been used to explore Bitcoin price trends and to address other Bitcoin-related challenges. We created a setting to apply deep learning to the data spanning a period of 2.8 years.

As the first step, we standardised the data to improve its applicability to the learning model. An example of applicable input data is provided in Table 2.

Table 2. Example of deep learning data set.

The z-score ($z = (x - \mu)/\sigma$, where $\mu$ and $\sigma$ represent the mean and standard deviation for every date, respectively) of data for the previous 12 days (t = 12) was used as the values.

https://doi.org/10.1371/journal.pone.0177630.t002
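The standardisation step can be illustrated with a short sketch. The 12-day window of prices is hypothetical, and the use of the population standard deviation is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardise a window of values to zero mean and unit standard deviation."""
    mu, sigma = mean(values), pstdev(values)  # population std (an assumption)
    return [(v - mu) / sigma for v in values]

# Hypothetical daily prices for the previous 12 days.
window = [620.0, 615.5, 630.2, 641.0, 638.7, 650.1,
          645.3, 660.0, 655.4, 670.9, 668.2, 675.0]
standardised = z_scores(window)
```

Standardising each input series this way puts prices, transaction counts, and concept ratings on a comparable scale before they are fed to the network.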

Subsequently, to use the input data for prediction, we set up a deep learning model. Multiple hidden layers were accumulated for learning to identify deep data structures. Specifically, 1, 2, 3, and 5 hidden layers were constructed to select the layer structure that returned the best possible prediction result. The number of neurons that were allocated to each hidden layer was 1,024.

As for the input layers, based on the input data provided in Table 2, 15 input data points were represented as serial vectors to allocate neurons based on the cumulative number of days spent on learning, i.e. 45, 75, 105, and 180 neurons were allocated to cumulative 3, 5, 7, and 12 days. As for the output layer, two neurons were allocated while the probability of rise/fall was represented with the softmax function. The prediction model was built using Google Tensorflow[7], and GPU operation (nVIDIA CUDA) was used to accelerate the deep learning process.
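The input-layer sizes follow from multiplying the 15 input data points by the number of cumulative days, and the two output neurons are converted to rise/fall probabilities by the softmax function. The sketch below illustrates only this arithmetic and the softmax; it is not the TensorFlow model itself, and the function names and example logits are hypothetical.

```python
import math

N_FEATURES = 15  # input data points per day, as stated in the text

def input_size(days):
    """Number of input neurons for a window of the given length."""
    return N_FEATURES * days

def flatten_window(daily_rows):
    """Concatenate per-day feature rows into one serial input vector."""
    return [v for row in daily_rows for v in row]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

sizes = [input_size(d) for d in (3, 5, 7, 12)]
vector = flatten_window([[0.0] * N_FEATURES for _ in range(12)])
probs = softmax([0.3, -0.2])  # hypothetical logits of the two output neurons
```

Here `sizes` reproduces the 45, 75, 105, and 180 neurons quoted above, and `probs` sums to one so that its two entries can be read as the probabilities of a rise and a fall.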

Results

Concept building results

Fig 3 shows the concept derived from the concept building phase and the words constituting the concept. We focused on a general phenomenal analysis of the meanings of the concept, rather than analysing all the words constituting the concept.

Fig 3. Ten topics generated by the Bitcoin forum documents.

https://doi.org/10.1371/journal.pone.0177630.g003

Because mining is a means of earning Bitcoin, many users share their opinions about its efficiency. In addition, the fundamental algorithm by which the Bitcoin is operated, namely ‘blockchain,’ is often discussed. Other than mining, Bitcoin can also be earned by transactions. Therefore, it is possible to conduct transactions with investment character, in which case related concepts include ‘transaction’ and ‘investment’. Moreover, the ‘wallet’, a kind of repository in which Bitcoin can be stored and used in subsequent transactions via mining, has given rise to many opinions. In addition, it would be possible to more accurately verify users’ considerations when they use Bitcoin through ‘security’ concepts relative to the problems that may occur as a result of mining and transactions.

The ‘silkroad’, a large marketplace that uses Bitcoin as a currency, has been exploited for illegal transactions and money laundering. Security therefore not only became a popular issue on the Bitcoin forum but also resulted in social problems, leading to the closure of the site. Although the situation was resolved when the site was closed towards the end of 2013, words regarding related exchange markets and companies attracted considerable attention from users. Therefore, many opinions on illegality related with the use of Bitcoin and consequent problems were verified through the concept of ‘illegality’.

Since the emergence of Bitcoin, many types of similar cryptocurrencies have been developed and are in use. Users’ discussion on the presence and availability of other cryptocurrencies can be found through the ‘altcoin’ concept. China dominates the pricing of Bitcoin with large funds, and the related trend manifested in the forum postings can be viewed via the concept ‘China’.

Results of Granger causality test and correlation test

According to the results of the Granger causality test, the null hypothesis was rejected. This suggests, with a high level of confidence, that the time series of the gathered forum-opinion data carries predictive information about the fluctuation in Bitcoin transaction volume and price, i.e. β_{1,2,⋯,n} ≠ 0. The Granger causality test was performed on the Bitcoin transaction count and price for a time lag of 1 to 12 days. Tables 3 and 4 list the test results.

Table 3. Statistical significance (p-values) of bivariate Granger causality correlation between Bitcoin price and concepts of forum opinions.

https://doi.org/10.1371/journal.pone.0177630.t003

Table 4. Statistical significance (p-values) of bivariate Granger causality correlation between Bitcoin transaction and concept of forum opinions.

https://doi.org/10.1371/journal.pone.0177630.t004

In addition, the Pearson Correlation Coefficient between the rating of each concept and Bitcoin price and transaction is shown in Table 5.

Table 5. Pearson Correlation Coefficient result.

https://doi.org/10.1371/journal.pone.0177630.t005
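For reference, the Pearson correlation coefficient between a concept rating series and a market series can be computed as in the sketch below. The two short series are hypothetical values for illustration and are not taken from Table 5.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

concept_rating = [0.10, 0.25, 0.20, 0.40, 0.35]  # hypothetical daily concept scores
price = [600.0, 640.0, 625.0, 700.0, 690.0]      # hypothetical daily prices
r = pearson(concept_rating, price)
```

A value of r near 1 or -1 indicates a strong linear relation, while a value near 0 indicates none; it says nothing about which series leads the other, which is why the Granger test is reported separately.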

The foregoing results are partially indicative of the significance of the extracted keyword data. However, this process was only used for the purpose of verification. The entire data set was used to build the actual deep learning model for prediction.

Prediction results

We built and applied the deep learning model based on the gathered and KDE-based rating data to predict the Bitcoin transaction and price.

For the period from 1 December 2013 to 21 September 2016, 90% of the data were used for learning, with the remaining 10% used for validation. The accuracy rate, the Matthews correlation coefficient (MCC), and the F-measure were used to evaluate the performance of the proposed model.
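The three evaluation measures can all be derived from the binary confusion matrix of predicted versus actual rise/fall labels. The sketch below shows the standard definitions; the label vectors are hypothetical, and the F-measure is taken to be the balanced F1 score, which is an assumption.

```python
import math

def confusion(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, with 1 = rise and 0 = fall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def f_measure(tp, tn, fp, fn):
    """Balanced F1 score: harmonic mean of precision and recall."""
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, in [-1, 1]."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom

# Hypothetical actual and predicted labels for ten validation days.
y_true = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0]
counts = confusion(y_true, y_pred)
acc, f1, m = accuracy(*counts), f_measure(*counts), mcc(*counts)
```

Unlike accuracy, the MCC uses all four cells of the confusion matrix, so it remains informative when rises and falls are unevenly represented in the validation period.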

Table 6 presents the prediction results for the different layer and learning data structures. The most accurate prediction model for the Bitcoin price (accuracy rate = 80.39%) is based on the three-layer neural network and the previous twelve-day learning data. The most accurate prediction model for Bitcoin transaction (accuracy rate = 81.37%) is based on the two-layer neural network and the previous twelve-day learning data. With three or more hidden layers and cumulative learning data for 12 days or longer, the differences in accuracy were negligible. Fewer than two hidden layers and cumulative learning data for fewer than 7 days proved to be insufficient for learning and compromised the prediction accuracy. Conversely, overfitting could possibly occur, with the prediction accuracy failing to significantly improve, if more than five hidden layers and cumulative data for over 12 days were used.

Table 6. Experimental results of predicted Bitcoin fluctuation.

https://doi.org/10.1371/journal.pone.0177630.t006

Discussion

We analysed the user comments posted on a Bitcoin online forum to predict the fluctuation in the Bitcoin price and transaction count. Based on the easily accessible online data, the proposed method predicted the Bitcoin price fluctuation with an accuracy rate of over 80%. Moreover, online user postings influenced Bitcoin transactions. The proposed method shed light on some aspects of Bitcoin-related user comments affecting their decisions to buy/sell the cryptocurrency.

The causality test result indicated some topics associated with Bitcoin transactions. The Granger causality test result highlighted the concept ‘China’ as having a high causality toward the Bitcoin price with the p-value being 0.05 or less, which was significant. These findings suggest China exerts a strong influence on the Bitcoin price.

Furthermore, such concepts as ‘Blockchain’, ‘Altcoin’, and ‘Transaction’ had a high causality toward Bitcoin transaction count with the p-value being 0.05 or less, which was significant. This finding suggests that topics related to the circulation and transaction of other types of cryptocurrencies have an impact on the Bitcoin transaction volume.

In addition, the correlation test found significant linear relations in most concepts, excluding ‘Silkroad’, which showed an insignificant linear relation. Hence, the experimental findings revealed some user comments that had the most significant relationship with and effects on the fluctuation in Bitcoin price and transactions.

That said, the proposed method has a limitation in terms of its broader applicability due to the fact that the concepts were constructed for a long period of time. For instance, the correlation coefficient of the concept ‘Silkroad’ was 0 or lower even though its construction was based on topics often mentioned by users in relation to some events taking place during a certain period, which hindered the extension of the analysis of the concept to the entire sample period. Thus, appropriate subdivision of the sample period would help to obtain a more accurate understanding of the users for topic modelling and to refine the analysis with additional approaches including sentiment analysis.

Moreover, the present findings warrant further studies on the analysis of user comments relative to the characteristics of Bitcoin forums.

To increase the accuracy of prediction, it is necessary to address a few challenges. The present work is focused on analysing online forum user comments and adds some formal or structured data to predict the fluctuation in the Bitcoin price and transactions. However, it may add to the reliability of the findings if the search results and relevant content on search engines were quantitatively analysed or if the social network data were analysed, as was done in some comparable previous studies[21, 40]. Furthermore, it may be an efficient preliminary study to analyse and classify online forum users per se[41–45]. In addition, the postings may be worth filtering more meticulously [46–50] to more accurately corroborate the findings.

Information derived from online forum users seems to be well-suited for extensive research on cryptocurrencies as well as Bitcoin. In the same vein, keywords manifested in online forum user comments could be used for further in-depth analysis and understanding of cryptocurrency transactions. Online forum users’ propensities could also be a cue to identify the characteristics inherent in each cryptocurrency. Moreover, online forums are great sources of abundant informal and formal information, which serves to appreciate cryptocurrencies from diverse perspectives including money laundering, which is closely associated with cryptocurrencies [51–54].

Conclusion

With the increasing circulation of Bitcoin, its acceptability has drawn much attention in many ways [2, 3, 5, 14]. The present study is noteworthy in that it analysed the topics often mentioned by Bitcoin users and linked their meanings to Bitcoin transactions. The proposed method for predicting the fluctuation in the Bitcoin price and transactions based on user opinions on online forums is conducive to understanding a range of cryptocurrencies other than Bitcoin and increasing their usability, although it needs to be reinforced. In addition, the present approach to the salience of user comments on online forums is likely to yield more significant results in many other fields.

Supporting information

S1 File. Results of crawling Bitcoin forum.

https://doi.org/10.1371/journal.pone.0177630.s001

(ZIP)

S2 File. Python-based crawler source code for Bitcoin forum data collection.

https://doi.org/10.1371/journal.pone.0177630.s002

(ZIP)

Author Contributions

Conceptualization: YBK CHK.
Data curation: JL JC NP.
Formal analysis: YBK JL JC CHK.
Investigation: YBK JL JC.
Methodology: YBK JL JC CHK.
Project administration: YBK JC CHK.
Software: JL NP JC.
Supervision: JC CHK.
Validation: YBK NP.
Visualization: YBK JL JHK.
Writing – original draft: YBK JL NP JC CHK.
Writing – review & editing: YBK JL JC JHK.

References

1. Nakamoto S. Bitcoin: A peer-to-peer electronic cash system. 2008.
2. Böhme R, Christin N, Edelman B, Moore T. Bitcoin: Economics, technology, and governance. The Journal of Economic Perspectives. 2015;29(2):213–38.
3. Grinberg R. Bitcoin: An innovative alternative digital currency. Hastings Sci & Tech LJ. 2012;4:159.
4. Barber S, Boyen X, Shi E, Uzun E. Bitter to better—how to make bitcoin a better currency. International Conference on Financial Cryptography and Data Security. 2012:399–414.
5. Reid F, Harrigan M. An analysis of anonymity in the bitcoin system. Security and privacy in social networks. 2013:197–223.
6. Kim YB, Kim JG, Kim W, Im JH, Kim TH, Kang SJ, et al. Predicting Fluctuations in Cryptocurrency Transactions Based on User Comments and Replies. PLOS ONE. 2016;11(8):e0161197. pmid:27533113
7. Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, et al. Tensorflow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467. 2016.
8. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436–44. pmid:26017442
9. Matta M, Lunesu I, Marchesi M, editors. Bitcoin Spread Prediction Using Social and Web Search Media. UMAP Workshops; 2015.
10. Kaminski J. Nowcasting the Bitcoin Market with Twitter Signals. arXiv preprint arXiv:1406.7577. 2014.
11. Kristoufek L. BitCoin meets Google Trends and Wikipedia: Quantifying the relationship between phenomena of the Internet era. Scientific reports. 2013;3.
12. Kristoufek L. What are the main drivers of the Bitcoin price? Evidence from wavelet coherence analysis. PloS one. 2015;10(4):e0123923. pmid:25874694
13. Yelowitz A, Wilson M. Characteristics of Bitcoin users: an analysis of Google search data. Applied Economics Letters. 2015;22(13):1030–6.
14. Athey S, Parashkevov I, Sarukkai V, Xia J. Bitcoin pricing, adoption, and usage: Theory and evidence. 2016.
15. Bernstein MS, Monroy-Hernández A, Harry D, André P, Panovich K, Vargas GG. 4chan and /b/: An Analysis of Anonymity and Ephemerality in a Large Online Community. ICWSM. 2011:50–7.
16. Hau YS, Kim Y-G. Why would online gamers share their innovation-conducive knowledge in the online game user community? Integrating individual motivations and social capital perspectives. Computers in Human Behavior. 2011;27(2):956–70.
17. Kim YB, Lee SH, Kang SJ, Choi MJ, Lee J, Kim CH. Virtual world currency value fluctuation prediction system based on user sentiment analysis. PloS one. 2015;10(8):e0132944. pmid:26241496
18. Panzarasa P, Opsahl T, Carley KM. Patterns and dynamics of users' behavior and interaction: Network analysis of an online community. Journal of the American Society for Information Science and Technology. 2009;60(5):911–32.
19. Sing CC, Khine MS. An analysis of interaction and participation patterns in online community. Journal of Educational Technology and Society. 2006;9(1):250.
20. Fleder M, Kester MS, Pillai S. Bitcoin transaction graph analysis. arXiv preprint arXiv:1502.01657. 2015.
21. Cohen-Charash Y, Scherbaum CA, Kammeyer-Mueller JD, Staw BM. Mood and the market: can press reports of investors' mood predict stock prices? PloS one. 2013;8(8):e72031. pmid:24015202
22. Bollen J, Mao H, Zeng X. Twitter mood predicts the stock market. Journal of Computational Science. 2011;2(1):1–8.
23. Linton M, Teo EG, Bommes E, Chen CY-H, Härdle WK. Dynamic Topic Modelling for Cryptocurrency Community Forums. 2016.
24. Blei DM, Ng AY, Jordan MI. Latent dirichlet allocation. Journal of machine Learning research. 2003;3(Jan):993–1022.
25. Lee DD, Seung HS. Learning the parts of objects by non-negative matrix factorization. Nature. 1999;401(6755):788–91. pmid:10548103
26. Wang C, Blei DM. Collaborative topic modeling for recommending scientific articles. Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. 2011:448–56.
27. Wang Y, Agichtein E, Benzi M. TM-LDA: efficient online modeling of latent topic transitions in social media. Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining. 2012:123–31.
28. Xianghua F, Guo L, Yanyan G, Zhiqiang W. Multi-aspect sentiment analysis for Chinese online social reviews based on topic modeling and HowNet lexicon. Knowledge-Based Systems. 2013;37:186–95.
29. Bohr J, Bashir M. Who uses bitcoin? an exploration of the bitcoin community. Privacy, Security and Trust (PST), 2014 Twelfth Annual International Conference on. 2014:94–101.
30. Choi H, Varian H. Predicting the present with Google Trends. Economic Record. 2012;88(s1):2–9.
31. Preis T, Moat HS, Stanley HE. Quantifying trading behavior in financial markets using Google Trends. Scientific reports. 2013;3.
32. Kristoufek L. Can Google Trends search queries contribute to risk diversification? Scientific reports. 2013;3.
33. Kang M, Zhong H, He J, Rutherford S, Yang F. Using google trends for influenza surveillance in South China. PloS one. 2013;8(1):e55205. pmid:23372837
34. Kim YB, Park N, Zhang Q, Kim JG, Kang SJ, Kim CH. Predicting Virtual World User Population Fluctuations with Deep Learning. PLOS ONE. 2016;11(12):e0167153. pmid:27936009
35. Moat HS, Curme C, Avakian A, Kenett DY, Stanley HE, Preis T. Quantifying Wikipedia usage patterns before stock market moves. Scientific reports. 2013;3.
36. Mestyán M, Yasseri T, Kertész J. Early prediction of movie box office success based on Wikipedia activity big data. PloS one. 2013;8(8):e71226. pmid:23990938
37. Mikolov T, Chen K, Corrado G, Dean J. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781. 2013.
38. Granger CW, Huang B-N, Yang C-W. A bivariate causality between stock prices and exchange rates: evidence from recent Asian flu. The Quarterly Review of Economics and Finance. 2000;40(3):337–54.
39. Schmidhuber J. Deep learning in neural networks: An overview. Neural Networks. 2015;61:85–117. pmid:25462637
40. Matta M, Lunesu I, Marchesi M. Bitcoin spread prediction using social and web search media. Proceedings of DeCAT. 2015.
41. Jue W, Jian-ping Z, Bao-hua Z, Cheng-rong W. Online Forum Opinion Leaders Discovering Method Based on Clustering Analysis [J]. Computer Engineering. 2011;5:017.
42. Li N, Wu DD. Using text mining and sentiment analysis for online forums hotspot detection and forecast. Decision support systems. 2010;48(2):354–68.
43. Coles BA, West M. Trolling the trolls: Online forum users constructions of the nature and properties of trolling. Computers in Human Behavior. 2016;60:233–44.
44. Benevenuto F, Rodrigues T, Cha M, Almeida V. Characterizing user behavior in online social networks. Proceedings of the 9th ACM SIGCOMM conference on Internet measurement conference. 2009:49–62.
45. Maia M, Almeida J, Almeida V. Identifying user behavior in online social networks. Proceedings of the 1st workshop on Social network systems. 2008:1–6.
46. Rout J, Dalmia A, Choo K-KR, Bakshi S, Jena S. Revisiting Semi-supervised Learning for Online Deceptive Review Detection. IEEE Access. 2017.
47. Feng S, Banerjee R, Choi Y, editors. Syntactic stylometry for deception detection. Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers-Volume 2; 2012: Association for Computational Linguistics.
48. Shin Y, Gupta M, Myers S, editors. Prevalence and mitigation of forum spamming. INFOCOM, 2011 Proceedings IEEE; 2011: IEEE.
49. Shin Y, Gupta M, Myers SA, editors. The Nuts and Bolts of a Forum Spam Automator. LEET; 2011.
50. Hayati P, Potdar V, editors. Toward spam 2.0: an evaluation of web 2.0 anti-spam methods. Industrial Informatics, 2009 INDIN 2009 7th IEEE International Conference on; 2009: IEEE.
51. Choo K-KR. Designated non-financial businesses and professionals: A review and analysis of recent financial action task force on money laundering mutual evaluation reports. Security Journal. 2014;27(1):1–26.
52. Choo K-KR. Cryptocurrency and Virtual Currency: Corruption and Money Laundering/Terrorism Financing Risks? Handbook of Digital Currency: Bitcoin, Innovation, Financial Instruments, and Big Data. 2015:283.
53. Bryans D. Bitcoin and money laundering: mining for an effective solution. Ind LJ. 2014;89:441.
54. Stokes R. Virtual money laundering: the case of Bitcoin and the Linden dollar. Information & Communications Technology Law. 2012;21(3):221–36.
