Expert Systems With Applications

Blockchain is a technology in which data cannot be arbitrarily manipulated. In addition, it is a highly reliable technology because network members continuously authenticate the data. Since 2008, when the first paper on Bitcoin was published (Nakamoto, 2008), the increased attention to and understanding of this powerful technology has generated great repercussions around the world. The development of blockchain technology has generated three major innovations, commonly referred to as Blockchain 1.0, 2.0, and 3.0. Blockchain 1.0 refers to the evolution of currency and digital payment systems such as cryptocurrencies like Bitcoin. Blockchain 2.0 is the application of blockchain technology to the financial sector more broadly. Blockchain 3.0 goes further still by applying the technology to sectors beyond currency or finance (Swan, 2015). In Blockchain 1.0 and 2.0 especially, cryptocurrency transactions and blockchain architecture have become major issues (Peters, Panayi, & Chapelle, 2015; Zheng et al., 2017). However, the advent of Blockchain 3.0 has seen new value continually added to fields of interest to Industry 4.0, such as the Internet of Things (IoT), smart contracts, ecosystems, and storage systems, as well as to the fields of healthcare, finance, privacy, and security (Alharby & van Moorsel, 2017; Dagher, Mohler, Milojkovic, & Marella, 2018; Fan, Wang, Ren, Li, & Yang, 2018; Miraz & Ali, 2018). While previous iterations of blockchain technology related specifically to virtual currency or financial transactions, recent developments track the broader applications of blockchain technology. These trends indicate that blockchain-related research will be of interest to every sector in Industry 4.0, and thus the importance of predicting future applications of blockchain technology cannot be overemphasized.

Accordingly, studies on blockchain trend analysis have recently been conducted, and our study shares this purpose. The most common approaches to analyzing blockchain trends can be summarized as follows: (1) screening reviews and (2) bibliometric analysis of the relevant papers. In the first approach, Lu (2019) and Zheng et al. (2017) show overall blockchain research lines. In specific fields, Alonso, Arambarri, López-Coronado, and de la Torre Díez (2019) and McGhin, Choo, Liu, and He (2019) conducted a frequency analysis of publications related to blockchain technology and discussed a number of potential research opportunities in eHealth and overall healthcare. Considering the social and economic aspects of blockchain technology and associated environmental issues, Giungato, Rana, Tarabella, and Tricase (2017) present current trends concerned with the sustainability of Bitcoin. In the second approach, the bibliometric method in the blockchain domain is a statistical analysis of trends using papers or book publications related to blockchain (Dabbagh, Sookhak, & Safa, 2019; Miau & Yang, 2018), simply capturing bibliographic information or using statistical frequency analysis. Yli-Huumo, Ko, Choi, Park, and Smolander (2016) proposed a systematic mapping study, which is able to find relevant papers through keywording based on the abstract. Identifying keywords and categories manually for the mapping of the papers, they summarize the challenges and positions and provide recommendations for future research directions. Zeng and Ni (2018) used term-frequency-based textual analysis and social networks to present blockchain research topics and researcher-level co-authorship on the basis of the Ei Compendex and China National Knowledge Infrastructure databases between 2011 and 2017. However, these studies utilized short-term papers or limited databases, and concrete evaluation of their methods remains a challenging task. In brief, the previous studies are based on traditional and naive approaches which merely review relevant literature or perform simple frequency analysis without providing insights beyond the revealed information about blockchain trends; a comprehensive and in-depth trend analysis of blockchain technology is therefore urgently needed.

Therefore, of particular interest to our study is trend analysis through a text mining approach focusing on topic modeling; we can identify an author's opinion or intention by extracting latent topics from the text. In general, early trend analysis was conducted as simple pattern analysis of 1-dimensional time series data (Kivikunnas, 1998). However, recent developments in text analysis techniques have enabled trend analysis using text data, including user reviews, newspaper articles, papers, and patents, through keyword analysis that analyzes the main words in specific documents and social network analysis that can examine the association and impact among users (Hung, 2012; Hung & Zhang, 2012; Kim, Jo, & Shin, 2015; Kim & Delen, 2018; Terachi, Saga, & Tsuji, 2006; Tseng, Lin, & Lin, 2007). In particular, topic modeling has recently gained a lot of attention from researchers in trend analysis, since the main purpose of trend analysis based on text data is to detect the up and down trends in the frequency of each topic in the target documents (Kang, Kim, & Kang, 2019).

Specifically, topic modeling identifies and classifies the latent topics of each document. This method has coevolved with advances in machine learning and text mining techniques. Probabilistic latent semantic analysis (PLSA), one of the most widely used techniques of topic modeling, is a probabilistic topic model also known as aspect modeling, which is a latent variable model based on the term-document matrix of co-occurrence data (Hofmann, 1999). The superiority of PLSA was demonstrated by comparison with k-means and Latent Semantic Analysis (LSA) (Newman & Block, 2006). As a variant or extension of PLSA, Latent Dirichlet Allocation (LDA) uses a Bayesian approach for parameter estimation to complement the incompleteness of PLSA in its topic probability distribution (Blei, Ng, & Jordan, 2003). However, it is difficult to interpret LDA without prior knowledge of the latent topics and hyperparameters. Alghamdi and Alfalqi (2015) presented a comparison of topic modeling techniques such as LSA, PLSA, and LDA.

The aforementioned probability-based statistical topic modeling techniques fail to capture the entire context of a document because they usually use a uni-gram representation that considers each word independently (Lu & Zhai, 2008). Alternatively, it is possible to use an n-gram representation, which considers multiple words simultaneously, but the efficiency of the model decreases rapidly due to the curse of dimensionality (Bengio, Ducharme, Vincent, & Jauvin, 2003). Accordingly, Word2vec quantifies each word as a vector considering its context to overcome the limitations of these representations (Mikolov, Chen, Corrado, & Dean, 2013a). In other words, it creates representations of words so that similar words are located in a similar space. Although this new representation is widely used and its performance has been demonstrated in recent text analyses (Asghari, Sierra-Sosa, & Elmaghraby, 2018; Van Hooland, Coeckelbergs, Hengchen, & Rizza, 2017; Zhang, Xu, Su, & Xu, 2015), very few attempts have been made to develop a new topic model based on Word2vec.

In short, the two types of existing studies on blockchain trend analysis have their own limitations. Screening-review-based studies require a great deal of time and effort to screen and summarize all the literature. Bibliometric-analysis-based studies are not suitable for discovering underlying patterns in blockchain-related fields. Furthermore, the topic models commonly used for trend analysis in other fields are generally based on uni-gram word vector representations, which are non-contextual and sparse. In this paper, to overcome these problems, we propose a new topic modeling approach called Word2vec-based latent semantic analysis (W2V-LSA), which makes use of Word2vec, a contextual word embedding algorithm, along with spherical k-means clustering. This technique allows one to quantify a word's contextual meaning in a vector format and to group words by cosine similarity. We use the proposed method to perform blockchain-specific trend analysis, which can serve as an advanced and useful alternative for extracting meaningful topics from the current trends of blockchain research.

Our contributions. The major contributions of this study are summarized as follows:

• As blockchain technology becomes more popular and the number of related technologies and studies increases, the topics of blockchain research become more diverse and precise. Our contribution lies in the fact that we provide a different perspective from the bibliometric method on blockchain trends, capturing the topics of the available literature with a new topic model. Specifically, this study also shows the characteristics of blockchain technology trends in several leading countries in blockchain-related research.

• We propose a novel approach, combining contextual embedding and clustering in a harmonized way, to extract topics more relevant to blockchain research than existing studies. First, we adopt a neural network-based word embedding algorithm which can generate representations of words that capture the context of the documents. Next, we use a cosine similarity-based clustering method to construct topic clusters. Finally, we propose a new topic allocation method via a document vector construction and similarity calculation procedure between the topic clusters and the documents.

• We demonstrate the performance of the proposed method to show its usefulness on real text data related to blockchain technology. The results show that our method can produce highly coherent topics and reveal to what extent the topics contain the core meaning of the documents, as measured by topic coherence measures and the keyword matching score, respectively. We also present a qualitative evaluation on the actual documents to confirm the accuracy of topic detection. Our findings can help researchers determine future directions for their studies. In addition, our proposed method is an informative tool for anyone responsible for strategic decision-making in blockchain-related industries. By capturing the trends of various technology fields using blockchain, our method can be utilized to identify the prospects and marketability of each field.

This paper is organized as follows. In Section 2, we present the material and data pre-processing work. A new topic modeling method that complements the limitations of existing techniques is proposed and discussed in detail in Section 3. Section 4 presents the results of the trend analysis and compares them with the previous method. Sections 5 and 6 contain a discussion of the results and the conclusions of this study.


Material

Fig. 1 shows the process of data collection and preprocessing used to conduct topic modeling on blockchain. For blockchain technology trend analysis, we collected abstracts of blockchain-related papers from six paper databases: Scopus, ScienceDirect, Web of Science, IEEE Xplore, Google Scholar, and the Korean Citation Index. A total of 763 abstracts were collected from papers, published from 2014 to August 2018, whose keywords and abstracts contain words such as 'Blockchain,' 'Block chain,' and 'Block-chain.' Conference papers were excluded. To ensure a minimum amount of information in the text for topic modeling, we selected the abstracts whose character count is greater than 180, and a total of 231 abstracts were utilized for the experiments. For the collected data, we performed preprocessing by excluding numbers and basic verbs and applying stemming and lemmatization. Specific words such as 'Blockchain' and 'Technology,' which are not meaningful as topic indices in blockchain trend analysis, were designated as stopwords and excluded from the analysis. Based on the frequency of words in the corpus, we employed Zipf's law to remove words that are either too common or too rare. For each country, a final vocabulary set of about 60 to 100 unique words was constructed for use in the analysis. Fig. 2 represents the number of blockchain-related papers published per year and by country. Since 2016, the number of published papers has risen sharply, and the growth rate of papers in 2017 is about 52%. The number of papers in the top three countries (Korea, the US, and China) accounts for about 69% of the total. In particular, the growth rate of papers in the second quarter of 2018 is 34% in the US and 62% in China.


Methodology


Word2vec

Word2vec, a neural network-based model, represents the words in a corpus as vectors with contextual comprehension (Mikolov, Chen, Corrado, & Dean, 2013a). In the vector space, the closer the distance between two vectors, the higher the similarity of the two words. The result of Word2vec depends on two user-defined parameters: the dimensionality (i.e., size) of the vector representation, m, and the maximum distance (i.e., window) between a word and the words around it within a sentence, δ. Word2vec is configured in two ways: skip-gram and continuous bag of words (CBOW). The major difference is that skip-gram is intended to predict the surrounding words from the reference word, whereas CBOW predicts the current word using the surrounding words.
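The role of the window parameter δ can be illustrated with a minimal sketch of how skip-gram training pairs are generated; this is an illustrative toy, not the authors' implementation (Word2vec then learns vectors by predicting the context word from the center word):

```python
def skipgram_pairs(tokens, window):
    """Generate (center, context) training pairs as in skip-gram.

    `window` corresponds to the maximum distance (the paper's delta)
    between the center word and a context word within one sentence.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        pairs.extend((center, tokens[j]) for j in range(lo, hi) if j != i)
    return pairs

sentence = ["blockchain", "enables", "secure", "transactions"]
pairs = skipgram_pairs(sentence, window=1)  # adjacent-word pairs only
```

Enlarging `window` admits more distant context words into the training pairs, which tends to capture broader topical similarity rather than strictly local syntax.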


Spherical k-means clustering

Because it quantifies the degree to which two vectors point in the same direction, cosine similarity has been widely used in text data analysis (Dhillon & Modha, 2001). Each word vector x_i ∈ R^m, i = 1, ..., N, is derived from Word2vec, and the inner product of two word vectors represents their semantic similarity as a cosine. Each word vector x_i is assigned a cluster c(i) ∈ {1, ..., C}, and the cluster prototypes p_q, q = 1, ..., C, are computed from the assigned vectors. The objective is to find the cluster assignment that minimizes the cosine distance between x_i and p_q, where σ_iq is a constraint defining whether x_i is assigned to cluster q (i.e., σ_iq = 1) (Buchta, Kober, Feinerer, & Hornik, 2012):

min Σ_{i,q} σ_iq (1 − cos(x_i, p_q))   s.t.   σ_iq = 1 if c(i) = q, and 0 otherwise.   (1)
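A compact NumPy sketch of the alternating minimization behind Eq. (1) might look as follows; the farthest-point initialization is our own simplification, not a detail from the paper:

```python
import numpy as np

def spherical_kmeans(X, C, iters=50):
    """Minimize sum of (1 - cos(x_i, p_q)) over cluster assignments.

    X: (N, m) word vectors; C: number of clusters.
    Returns assignments c(i) and unit-norm prototypes p_q.
    """
    X = X / np.linalg.norm(X, axis=1, keepdims=True)   # project onto unit sphere
    # farthest-point initialization (deterministic simplification)
    P = [X[0]]
    for _ in range(1, C):
        sims = X @ np.array(P).T
        P.append(X[sims.max(axis=1).argmin()])
    P = np.array(P)
    for _ in range(iters):
        c = (X @ P.T).argmax(axis=1)                   # assign by cosine similarity
        for q in range(C):
            members = X[c == q]
            if len(members):
                mean = members.mean(axis=0)
                P[q] = mean / np.linalg.norm(mean)     # renormalized centroid
    return c, P
```

Because all vectors live on the unit sphere, maximizing the dot product is equivalent to minimizing the cosine distance in Eq. (1).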


Proposed method

In this paper, we propose W2V-LSA, a new topic modeling method combining Word2vec and spherical k-means clustering in a harmonized manner. It can significantly increase the quality of topic modeling by overcoming the drawbacks of existing uni-gram representation-based probabilistic-statistical models, which are ill-suited to satisfactorily consider the context of documents. Fig. 3 shows the overall procedure in detail.

Step 1: Each word in the corpus is vectorized as an m-dimensional word vector x_i ∈ R^m by Word2vec.

Step 2: Word clustering is performed by applying spherical k-means clustering to the extracted x_i. Each x_i is assigned the number of the closest cluster by comparing its cosine similarity with each cluster prototype p_q. The name of each cluster is defined by considering the characteristics of the words assigned to it, and it is treated as a topic.

Step 3: Each document-specific vector l_j, j = 1, ..., D, representing the characteristics of the document, is generated by matrix multiplication between the word vectors x_i and the N × D term-document matrix. Fig. 4 is a graphical representation of how l_j is generated.

Step 4: The cosine similarity between the x_i in each cluster and l_j is calculated. The final similarity between a cluster and a document is the average of the cosine similarities over the top t words of the cluster. The topic of the cluster with the highest final similarity is assigned as the topic of the document. This process is illustrated in Fig. 5.
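Steps 3 and 4 can be sketched in NumPy as follows; the variable names (`td` for the term-document matrix, `t` for the number of top words) are our own, and this is a simplified reading of the procedure rather than the authors' code:

```python
import numpy as np

def cos_sim(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def allocate_topics(X, td, clusters, prototypes, t=3):
    """Assign each document the topic of its most similar word cluster.

    X: (N, m) word vectors; td: (N, D) term-document matrix;
    clusters: (N,) cluster index per word; prototypes: (C, m) centers.
    """
    L = X.T @ td                                   # Step 3: document vectors l_j
    topics = []
    for j in range(td.shape[1]):
        l_j = L[:, j]
        scores = []
        for q in range(len(prototypes)):
            idx = np.where(clusters == q)[0]
            # top-t words of cluster q, ranked by similarity to its prototype
            rank = np.argsort([cos_sim(X[i], prototypes[q]) for i in idx])[::-1]
            top = idx[rank[:t]]
            # Step 4: average similarity between the top-t words and l_j
            scores.append(np.mean([cos_sim(X[i], l_j) for i in top]))
        topics.append(int(np.argmax(scores)))
    return topics
```

Each document vector l_j is simply a frequency-weighted sum of the vectors of the words it contains, so a document dominated by words from one cluster ends up closest to that cluster's top words.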


Results

In this section, we compare W2V-LSA with PLSA, a representative probabilistic topic model. This section consists of three parts. First, we show the blockchain trend analysis results from PLSA and W2V-LSA. We then evaluate the performance of W2V-LSA quantitatively and qualitatively in terms of the accuracy of topic allocation and the relevance of the words in each topic, respectively.


Blockchain trend analysis


PLSA

We applied PLSA to the uni-gram-based term-document matrix of each country. The Bayesian information criterion (BIC) is used to minimize the overfitting problem caused by increasing the number of parameters while maximizing the log-likelihood function (Schwarz, 1978). The number of topics for each country was determined where the BIC value is smallest; the groups for Korea, the US, China, and the etc. group have 7, 9, 6, and 9 topics, respectively. The topics with the highest probability are designated as the main topic of each document. The PLSA results are presented in Table 1.
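The BIC-based selection of the number of topics can be sketched as follows; the PLSA parameter count (roughly K·(V−1) + D·(K−1) for K topics, V vocabulary words, and D documents) and the log-likelihood values below are illustrative assumptions, not figures reported in the paper:

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: smaller values are preferred."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

def plsa_n_params(K, V, D):
    # K topic-word distributions (V - 1 free parameters each) plus
    # D document-topic distributions (K - 1 free parameters each)
    return K * (V - 1) + D * (K - 1)

# pick the candidate K with the smallest BIC (placeholder likelihoods)
loglik = {5: -12000.0, 6: -11800.0, 7: -11750.0}
V, D = 100, 231
n_obs = V * D                  # observed word-document cells
best_K = min(loglik, key=lambda K: bic(loglik[K], plsa_n_params(K, V, D), n_obs))
```

BIC trades the likelihood gain from more topics against the ln(n)-scaled penalty for the extra parameters, which is how the criterion curbs overfitting as K grows.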

It is noteworthy that in Table 1 the ratios are uniformly distributed. There are some characteristic results in each group. In the case of Korea, we found 'Fintech' and 'Regulation,' which are absent from the others. 'Healthcare/Privacy' accounts for a large share in the US and China. Also, the etc. group has a wider variety of topics than the others, as well as one unique topic, 'Real Estate.'

Table 2 shows the results of using PLSA to identify topics that change over time in each country. In Korea, topics such as 'Security/Network,' 'Finance/Fintech,' and 'Virtual Currency/Bitcoin' were prominent in 2016 and 2017, while topics related to application fields in Blockchain 3.0, including 'Service/Trade,' 'IoT,' and 'Energy/Transaction,' comprised a large part of the topic ratio in 2018. In the US, only the 'Distributed Ledger' and 'Bitcoin/Transaction' topics appear in 2015, but other topics such as 'Healthcare/Privacy' and 'IoT/Smart Contract' rose to dominance in 2016. Except for a few specific topics such as 'Energy/Cryptocurrency' and 'Cloud,' various topics are uniformly distributed across 2017 and 2018. In China, 'Storage/Cloud' takes up a big share in 2016, but it is replaced by 'Transaction/Bitcoin' in 2017. From 2016 to 2018, documents related to 'Healthcare/Privacy' have consistently been predominant. In the etc. group, beginning with a document on 'Bitcoin' in 2014, studies in various fields were carried out through 2018.


W2V-LSA

To create unique word vectors for each country, we applied Word2vec to the documents of each country. We used the skip-gram method and set m and δ to 100 and 12, respectively. When implementing spherical k-means clustering on x_i ∈ R^100, the optimal number of clusters k was decided by the silhouette measure, which can estimate k considering the distance and density of the clusters (Rousseeuw, 1987); the values of k for Korea, the US, China, and the etc. group were determined to be 6, 6, 7, and 7, respectively. In this study, we set t = 3 in Step 4 to calculate the final similarity between the clusters and the documents. Table 3 shows the results of topic modeling using W2V-LSA. The top-ranked topics in Table 3 are distributed differently by country, denoting their characteristics. In Korea, there is a preponderance of papers related to 'Virtual Currency,' 'Regulation,' 'Economy,' and 'Fintech,' which reflects Korea's interest in the financial sector. In the other countries, there are various topics, including 'Healthcare' and 'Cloud,' not seen in Korea. Especially noteworthy are unique topics such as 'Real Estate' in the etc. group.
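The silhouette-based choice of k can be sketched with cosine distance, to match the spherical k-means setting; this is our own minimal implementation (the `labels` argument is assumed to be a NumPy integer array), not the paper's code:

```python
import numpy as np

def silhouette_cosine(X, labels):
    """Mean silhouette coefficient under cosine distance d = 1 - cos."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    D = 1.0 - Xn @ Xn.T                       # pairwise cosine distances
    ks = np.unique(labels)
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False                       # exclude the point itself
        if not same.any():
            continue                          # skip singleton clusters
        a = D[i][same].mean()                 # mean intra-cluster distance
        b = min(D[i][labels == k].mean() for k in ks if k != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

# the optimal k maximizes the mean silhouette over candidate clusterings, e.g.:
# best_k = max(range(2, 10), key=lambda k: silhouette_cosine(X, fit(X, k)))
```

Scores near 1 indicate tight, well-separated clusters; scores near 0 or below suggest k is poorly chosen for the data.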

Table 4 shows the results of W2V-LSA over time. In Korea, from 2016 to 2018, 'IoT/Network/Smart Contract' proved to be of continual interest, as were topics related to the background of blockchain and the finance fields, such as 'Industry 4.0/Economy,' 'Virtual Currency/Regulation,' and 'Finance.' In the US, 'Bitcoin/Cryptocurrency/Transaction' was prevalent for much of 2015, but interest in the topic began to wane after 2016. 'IoT/Economy/Privacy' was especially popular, accounting for about 43% of topics in 2016, while 'Energy/Healthcare' has consistently occupied a large portion since 2016. In China, unlike Korea, topics such as 'Smart Contract/Energy/Trade,' 'Healthcare,' and 'Security/Signature' began to trend after 2016. In the etc. group, topics from a variety of fields appeared over the period.


Quantitative evaluation

Topic coherence evaluation

Perplexity is a key evaluation measure in probabilistic topic modeling. However, perplexity cannot capture the semantic coherence of the words in each topic, and it does not apply to non-probabilistic models (Chang, Gerrish, Wang, Boyd-Graber, & Blei, 2009). Alternatively, topic coherence can measure the quality of a topic by how often the words of a topic co-occur within the same documents or by the semantic similarity among the words in the topic (Aletras & Stevenson, 2013; Li, Wang, Zhang, Sun, & Ma, 2016; Mimno, Wallach, Talley, Leenders, & McCallum, 2011). The higher the topic coherence score, the more the words of each topic cohere. To evaluate our topic model, we calculate two coherence measures: (1) UMass (Mimno et al., 2011) and (2) normalized pointwise mutual information (NPMI) (Lau, Newman, & Baldwin, 2014). We compute the coherence as we increase T, the number of words per topic. The top-T words are those with the largest weight, meaning the highest probability in PLSA and the highest cosine similarity in W2V-LSA, respectively. Fig. 6 shows the UMass- and NPMI-based average topic coherences for each model, PLSA and W2V-LSA, with T varied from 3 to 14. As T increases, the coherence scores decrease for both W2V-LSA and PLSA. For all values of T, W2V-LSA outperforms PLSA. Specifically, the NPMI score gap between W2V-LSA and PLSA is largest at T = 3 and smallest at T = 14.
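NPMI over a topic's top words can be estimated from document co-occurrence counts; the sketch below is a common formulation of the measure (Lau et al., 2014), not necessarily the exact implementation used in the paper:

```python
import math
from itertools import combinations

def npmi_coherence(top_words, documents, eps=1e-12):
    """Average NPMI over word pairs in a topic's top-T words.

    P(w) and P(wi, wj) are estimated from document-level co-occurrence.
    NPMI ranges from -1 (never co-occur) to 1 (always co-occur).
    """
    docs = [set(d) for d in documents]
    D = len(docs)
    def p(*ws):
        return sum(all(w in d for w in ws) for d in docs) / D
    scores = []
    for wi, wj in combinations(top_words, 2):
        p_ij = p(wi, wj)
        if p_ij == 0:
            scores.append(-1.0)              # pair never co-occurs
            continue
        pmi = math.log(p_ij / (p(wi) * p(wj) + eps))
        scores.append(pmi / (-math.log(p_ij) + eps))
    return sum(scores) / len(scores)
```

The normalization by −log P(wi, wj) is what bounds the score, making topics comparable even when their words have very different corpus frequencies.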


Keyword matching evaluation

To measure the accuracy of the topics allocated to documents, existing studies have used data for text classification that was already categorized or assigned to topics. Since there are no exact labels for our data, we propose a quantitative evaluation method: the keyword matching score (KMS). Unlike the ambiguous results of existing topic modeling evaluations, this approach has the advantage of measuring accuracy numerically. To compute the KMS, we gathered the keywords of each document and counted how many of the top-T words of the topic exactly match the keywords.

KMS is defined as:

KMS = Σ_{t=1}^{T} u_t   (2)

where u_t is the number of words that exactly match the keywords. The weighted KMS is:

Weighted KMS = Σ_{t=1}^{T} w_t u_t   (3)

where w_1, w_2, ..., w_T are the weights assigned to the top-T words for each topic in the case of the top-T weighted KMS. The KMS was computed for each model, PLSA and W2V-LSA, for several values of T (Fig. 7). The KMS of W2V-LSA is larger than that of PLSA before top-4 and after top-12. For the top-5 weighted KMS, in which weighted scores are given only to the top-5 words, the score of W2V-LSA is significantly larger than that of PLSA across all top-T words.
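A direct reading of Eqs. (2) and (3) in code might look like the following; the example topic words, keywords, and weights are hypothetical:

```python
def kms(top_words, keywords, weights=None):
    """Keyword matching score: Eq. (2) with unit weights,
    or the weighted KMS of Eq. (3) when `weights` is given."""
    if weights is None:
        weights = [1.0] * len(top_words)           # plain KMS
    kw = set(keywords)
    return sum(w for word, w in zip(top_words, weights) if word in kw)

top5 = ["healthcare", "privacy", "record", "ledger", "iot"]   # hypothetical
doc_keywords = ["healthcare", "blockchain", "privacy"]        # hypothetical
plain = kms(top5, doc_keywords)                    # two exact matches
weighted = kms(top5, doc_keywords, weights=[5, 4, 3, 2, 1])
```

Rank-decreasing weights reward a model for placing matching words near the top of the topic, which is what the top-5 weighted KMS in Fig. 7 emphasizes.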


Qualitative evaluation

The results show that W2V-LSA is able to extract more detailed topics than PLSA for the documents. This also means that W2V-LSA has the advantage of assigning more suitable topics to each document. For example, the paper in Fig. 8 is related to blockchain in the healthcare industry; PLSA and W2V-LSA assigned it to the topics 'Service/Trade' and 'Healthcare,' respectively. This is because words such as "healthcare" or "medical" barely appear in the entire corpus of Korea compared with the word "service," and PLSA, as a word frequency-based topic modeling technique, struggles to capture such precise information.

Fig. 9 is one of the documents from the US, and its main content is the application of a decentralized network system based on cryptocurrency to the security of banking and monetary technocracy. For the same reason as in the previous example, PLSA identified 'Network' as the topic for the document shown in Fig. 9, while W2V-LSA identified 'Bitcoin/Cryptocurrency.'


Discussion

There are several methodological implications of this study. First of all, the use of a context-embedding representation is the considerable advantage of W2V-LSA over other topic models based on uni-gram representations. To the best of our knowledge, most studies on natural language models have demonstrated empirically that contextual embedding methods outperform classical n-gram-based models (Mikolov, Yih, & Zweig, 2013b; Schnabel, Labutov, Mimno, & Joachims, 2015; Sharma, Anand, Goyal, & Misra, 2017; Mikolov, Chen, Corrado, & Dean, 2013a). W2V-LSA resolves the sparseness and high dimensionality issues of the n-gram representation, and unlike probability-based statistical topic models, it does not require any distributional assumptions, which often degrade algorithm performance. Secondly, we demonstrated the feasibility and usefulness of W2V-LSA by comparing it with PLSA in quantitative and qualitative ways and confirmed that W2V-LSA has a relative advantage over PLSA in finding precise topics for documents. W2V-LSA can extract topics not found by PLSA, and the topics derived from W2V-LSA are usually more detailed and definite than those of PLSA; the words assigned to each topic are also more relevant and meaningful. PLSA tends to place topics with high-frequency words above other topics, while W2V-LSA captures diverse and distinct topics appropriately, and it also forces each word to belong to exactly one cluster. This may be because W2V-LSA can learn a word's periphery using cosine similarity, making it feasible to derive the main words of a document even when their frequency is low across the whole corpus. Finally, W2V-LSA can be used universally for any other studies using topic modeling, as well as for technology trend analysis, which creates added value for the field of semantic expert systems.

Furthermore, this study has several managerial implications. First, it provides a data-driven text mining approach, W2V-LSA, that allows trends in blockchain technology to be discovered effectively and efficiently without anyone having to investigate the full text of every document. As a content-based analysis technique, W2V-LSA can extract new information not found in existing blockchain trend analyses at both the national and temporal levels. Second, we can provide valuable insights to blockchain-related academia and industry and point to the future application fields of blockchain technology. In early blockchain research, virtual currency like Bitcoin attracted much attention as a promising field of future research. This was particularly so in Korea, where virtual currency and its regulation were discussed nationwide in 2017.

Recently, global research on blockchain technology has been oriented toward other applications beyond virtual currencies, such as healthcare, smart contracts, energy, cloud, and IoT. These emerging fields of study promise to have a significant impact in the near future. Besides, security still remains an important research topic because of attack attempts against blockchain technology itself. Therefore, research on the security of decentralized networks, which is one of the advantages of blockchain, is also valuable. Lastly, this study provides direction for industrial sectors such as new technology-based firms (NTBFs, i.e., technology-based startups) willing to leverage blockchain, helping them occupy a potential blockchain-based application domain early. From the perspective of investment, it can bring real business value to NTBFs and promote crowdfunding and investment by venture capital firms (Fiedler & Sandner, 2017).


Conclusions

This paper proposed a novel technique for topic modeling, W2V-LSA, based on Word2vec and spherical k-means clustering. We collected 231 blockchain-related documents and applied our method to analyze blockchain trends by country and time. We then presented the current trends in blockchain technology and demonstrated the usefulness of the new method by comparing it with PLSA from quantitative and qualitative perspectives. The significance of this study lies in developing a new topic modeling method as well as providing an indicator of the future direction of blockchain research.

Although this study makes several contributions to technology trend analysis, it also has several limitations. We conducted the experiments on a limited scale to serve as a proof of concept; we compared our proposed method only with PLSA on a small number of documents. We plan to expand the scope to a larger-scale analysis of other advanced technologies, along with comparisons to several other methods. In addition, it should be noted that the optimal values of the user-defined parameters are data-dependent, which makes them hard to select a priori. There is definitely a need to study this problem using a principled approach.

Several directions for future research are worth investigating. This study can be applied to trend analysis in domains other than blockchain. In addition, it is expected to be widely applicable to research in which text data are collected from different sources, such as patent analysis (Lee, Yoon, & Park, 2009; Noh, Jo, & Lee, 2015; Xie & Miyazaki, 2013), customer online review analysis (Jung & Suh, 2019; Korfiatis, Stamolampros, Kourouthanassis, & Sagiadinos, 2019), and text-based recommendation systems (dos Santos et al., 2018), which have recently been attracting attention. Further, it would be interesting to investigate the performance of our proposed method when Word2vec in W2V-LSA is replaced by state-of-the-art word embedding methods (Devlin, Chang, Lee, & Toutanova, 2018; Pennington, Socher, & Manning, 2014).


Declaration of Competing Interest
None
Fig. 1. Process of data collection and preprocessing.

Fig. 2. Growth of the number of blockchain-related papers; Q2 means the second quarter of a calendar year. The detailed information of the etc. group is represented in the Appendix.

Fig. 4. Example of document vector construction.

Fig. 5. Topic modeling of W2V-LSA.

Fig. 6. UMass and NPMI scores for each model.

Fig. 7. KMS and top-5 weighted KMS for each model.

Fig. 8. Example for comparison of PLSA (marked in light gray) and W2V-LSA (marked in dark gray).

Fig. 9. Example for comparison of PLSA (marked in light gray) and W2V-LSA (marked in dark gray).

Fig. B.1. Example for comparison of PLSA (marked in light gray) and W2V-LSA (marked in dark gray).