A Keyword Based Scientometric Analysis of Semantic Web Research in Tourism

The economies of many nations depend on tourism, which is an information-intensive industry that requires semantic handling of data. Semantic web research is an area that deals with technologies for semantic handling of data. This study presents the results of an investigation into the extent of semantic web research in the field of tourism. For this purpose, data was retrieved from the Scopus database. Scientometric methods that support keyword-based analysis were applied to the data using the R software environment.


INTRODUCTION
According to the United Nations World Tourism Organization (UNWTO), [1] "Tourism is a social, cultural and economic phenomenon which entails the movement of people to countries or places outside their usual environment for personal or business/professional purposes. These people are called visitors (which may be either tourists or excursionists; residents or non-residents) and tourism has to do with their activities, some of which involve tourism expenditure." Naturally, tourism is a data-intensive industry that requires semantic handling of data. Semantic Web technologies help in creating Web data stores, building vocabularies and writing rules for data handling. [2] Thus, applications of Semantic Web technologies have made inroads into the tourism domain. To understand the state of semantic web research in the tourism domain, a quantitative analysis of scholarly publications is needed, because quantitative analysis helps in predicting research topics at a macro level. [3] Therefore, a keyword-based scientometric analysis of research publications in the domain was carried out. A scientometric study involves quantitative methods for studying the development of science. [4][5] This study presents the results of the analysis and illustrates important implications for the domain under consideration. In the following parts, the section titled Literature Review discusses the different purposes for applying bibliometric/scientometric methods and relevant results from studies that are mostly related to 'Tourism' or 'Semantic Web'; the section titled Data and Methodology discusses the processes by which data was collected and analyzed; the section titled Results and Analysis discusses the observations made from the analysis of the data; and the Conclusion section closes the discussion with the implications of the presented study.

LITERATURE REVIEW
Garfield [6] has traced the evolution of scientometrics while discussing the contributions of Nalimov and Price. It is one of the earliest studies to demonstrate the process of visualizing a domain using software designed for bibliometric analysis, and it has influenced several similar studies. Shtovba and Petrychko [7] have demonstrated the use of the Jaccard index for assessing the similarity of research areas and for identifying influential pairs of research areas. The Eigenfactor, [8] the Audience factor, [9] the h-index [10] and the local citation score [11] are some of the indicators used in scientometric analysis to get an overview of a domain under consideration. Many studies, such as Vazquez et al., [12] have presented scientometric analyses of literature on the application of fields like artificial intelligence to other domains. Liu et al. [13] have done a scientometric analysis to predict topics and trends related to tourism forecasting and have found an upward trend of research output in the domain, with tourism demand related models being the hot topics. Sharafuddin and Madhavan [14] have used a combination of scientometrics and citation-based and theme-based systematic literature review for assessing the research patterns and thematic evolution of research on blue tourism. The study notes substantial research growth in blue tourism and also sheds light on the conceptual relationships in the domain. Zhu et al. [15] have used co-word analysis procedures, including keyword clustering and strategic analysis, for mapping the progress of ontology research and have also proposed the disciplinary incidence index. The study shows that 'applications in semantic web research' is among the frequently used keywords in the domain. Fang et al. [16] have presented the results of a scientometric analysis of publications on climate change and tourism, showing that the domain has become multi-disciplinary with a large growth in publications. Through a scientometric study of publications about cruise tourism, Vega-Muñoz et al. [17] have identified important research areas related to cruise tourism. Yu et al. [18] have presented a knowledge map for pro-poor tourism built using a scientometric approach. Qian et al. [19] have studied co-occurrence and collaboration networks and presented a scientometric review of research on travel websites, thereby identifying the research hotspots of the domain and shedding light on the importance of the websites. Zhang et al. [20] have used semantic network analysis of subjects and social network analysis of networks together for mapping the knowledge domain of China's smart tourism research and revealing the research trends. Influential institutes and scholars related to smart tourism research have been identified. Johnson and Samakovlis [21] have examined the development of concepts belonging to smart tourism knowledge through collaborative networks.
Ding's [22] study presents a bibliometric analysis of publications on the Semantic Web retrieved from Scopus and Web of Science. Niknia and Mirtaheri [23] have employed cluster analysis, network analysis and co-word analysis to study publications on Linked Data retrieved from Scopus. The study has revealed the new concepts and also the research trend in the domain after the contributions from librarians and information scientists. Mika et al. [24] have shown in their study on social network analysis of the sciences that the use of semantic web technologies can be beneficial for dealing with heterogeneity of data. Gandon [25] has identified the main research trends by conducting a survey of the research field of Semantic Web, Linked Data and Web of Data. Li et al. [3] have presented a bibliometric analysis of geo-ontology research with visual analysis of research performance revealing the patterns of collaboration. An analysis of keywords and term words has also been presented. Bhattacharya [26] has used co-word analysis and links between keywords in a bibliometric analysis of the machine learning domain. St-Germain and Mongeon [27] have assessed the contribution of the information science discipline to the field of Semantic Web research by studying the evolution of publications and discussion of topics within different disciplines while also carrying out citation analysis.

DATA AND METHODOLOGY
The Semantic Web has evolved from the field of Artificial Intelligence, and research related to semantics and ontology constitutes the core of the Semantic Web research field. [22] Though ontological research started in the 1980s, research on the Semantic Web gained pace from the late 1990s due to the support of the European Commission and the United States. [28] In 2001, Tim Berners-Lee indicated that the Semantic Web [29] is the harbinger of the next wave of transformation through enabling content-based access interoperability. So, for the current study, documents published after 2001 and up to 2020 were considered. Some terms for concepts that were emerging at that time, for example SPARQL, RDF, OWL, OIL and DAML+OIL, were not included in the search query for Ding's study in 2010. Some of them were included in the search query for the present study since they are quite popular now. Others were avoided because, as Ding also noted, using terms like OWL, OIL and DAML in the search query retrieves noisy data. The search was restricted to areas related to computer science and engineering. Conference reviews and editorial materials were excluded. The results were downloaded from Scopus on May 21, 2021. The results were reviewed manually and some articles were filtered out using a strategy similar to that of Yu et al. [18] for filtering items from the corpus of study. Items were removed when the words used in the search query appeared only in the keywords provided by Scopus or when the abstract did not have any relationship with the semantic web. The analysis was done using the Bibliometrix package [30] for R. [31] Information about the downloaded data is presented in Table 1. The annual scientific production is illustrated in Figure 1. It can be seen that scientific production in the domain has increased from 2004 onwards.

RESULTS AND ANALYSIS
The authorship-related information about the documents was analyzed to get an overview of collaboration in the domain. 92.6% of the documents in the corpus are multi-authored, so research in the field under consideration is mostly collaborative. Further, using the information from Table 2, the value of the Collaboration Index [30][31][32] is found to be 2.71. The index is determined by dividing the total number of authors of multi-authored documents by the total number of multi-authored documents. Thus, collaborative works in the field mostly have a team of 2 to 3 authors per multi-authored document. Figure 2 shows the network [30,32] formed by collaborations [33] between countries. According to the degree of the vertices in the network, Italy, Spain, China, France and Germany are among the top collaborating countries. Italy and Spain are also the top two countries in terms of total number of citations received.
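The Collaboration Index described above can be sketched in a few lines of Python. This is only an illustration of the stated definition, not the Bibliometrix implementation used in the study; the function names and the toy author counts are hypothetical and are not taken from Table 2.

```python
def collaboration_index(author_counts):
    """Total authors of multi-authored documents divided by the
    number of multi-authored documents (the definition given in the text)."""
    multi = [n for n in author_counts if n > 1]
    if not multi:
        raise ValueError("no multi-authored documents in the corpus")
    return sum(multi) / len(multi)


def multi_authored_share(author_counts):
    """Percentage of documents that have more than one author."""
    if not author_counts:
        raise ValueError("empty corpus")
    multi = sum(1 for n in author_counts if n > 1)
    return 100.0 * multi / len(author_counts)


# Hypothetical corpus: one entry per document, giving its number of authors.
TOY_AUTHOR_COUNTS = [1, 2, 3, 3, 4, 2]

print(collaboration_index(TOY_AUTHOR_COUNTS))   # 14 authors / 5 documents
print(multi_authored_share(TOY_AUTHOR_COUNTS))  # 5 of 6 documents
```

Single-authored documents are excluded from both the numerator and the denominator, which is why the index can never fall below 2.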
Tracking the changes in research can help researchers gain insights into the evolution of a research domain. [16] Research status and development trends can be studied using keywords. [3] The rest of the analysis is based on keywords associated with the articles. Both the Authors' keywords and the Scopus assigned keywords were used for the study. Table 3 presents the top ten Authors' keywords and Scopus assigned keywords associated with the documents in the corpus. Since keywords act as a concise label for a document, keyword co-occurrence can effectively point out emerging topics of research. [16] The network formed by co-occurrence of Scopus assigned keywords is therefore presented in Figure 3. The sizes of the nodes in the network correlate with the list of top Scopus assigned keywords. Figure 4 displays the keyword growth graph constructed from the Scopus assigned keywords of the documents. The graph clearly shows that the keywords 'ontology' and 'semantic web' are used far more than the other keywords by the documents in the corpus. To gain a better understanding of the usage of the keywords, the formulas for calculating the Popularity Index [34] and the Promising Index [34] of entities proposed by Li et al. [34] have been applied to selected keywords. The Popularity Index of Li et al. [34] determines the percentage of publications discussing an entity among all publications within a specific period, and the Promising Index determines the change in popularity of an entity in the research domain between two consecutive periods. According to the approach of Li et al., [34] the appearance of the entity is to be checked in the title, abstract and keywords of the publications. Because of the simplicity of these indicators, it is envisaged that the formulas for the Popularity Index and the Promising Index will also help in quantifying the importance of the keywords.
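A minimal Python sketch of the two indicators follows. The Popularity Index is computed as described in the text: the percentage of publications in a period whose title, abstract or keywords mention the entity. The text does not give the exact formula for the Promising Index, so the sketch assumes it is the difference in popularity between two consecutive periods; the formula of Li et al. [34] may differ (for example, it may be a ratio). The record layout, function names and toy records are hypothetical, and the simple substring matching is an assumption.

```python
def searchable_text(doc):
    """Join title, abstract and keywords into one lowercase string."""
    parts = [doc.get("title", ""), doc.get("abstract", "")]
    parts.extend(doc.get("keywords", []))
    return " ".join(parts).lower()


def popularity_index(docs, term, period):
    """Percentage of documents in `period` (inclusive years) mentioning `term`."""
    start, end = period
    in_period = [d for d in docs if start <= d["year"] <= end]
    if not in_period:
        return 0.0
    hits = sum(1 for d in in_period if term.lower() in searchable_text(d))
    return 100.0 * hits / len(in_period)


def promising_index(docs, term, earlier, later):
    """ASSUMPTION: change in popularity taken as a simple difference
    (later period minus earlier period), in percentage points."""
    return popularity_index(docs, term, later) - popularity_index(docs, term, earlier)


# Hypothetical records, for illustration only.
TOY_DOCS = [
    {"year": 2003, "title": "Ontology for hotels", "abstract": "", "keywords": ["ontology"]},
    {"year": 2004, "title": "Travel portal", "abstract": "", "keywords": ["semantic web"]},
    {"year": 2012, "title": "Linked data for POIs", "abstract": "uses an ontology", "keywords": ["linked data"]},
    {"year": 2013, "title": "Ontology mapping", "abstract": "", "keywords": ["ontology"]},
]

print(popularity_index(TOY_DOCS, "ontology", (2002, 2005)))
print(promising_index(TOY_DOCS, "ontology", (2002, 2005), (2011, 2015)))
```

Under this assumption, a positive value means the entity is discussed by a larger share of publications in the later period than in the earlier one.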
Figure 5 displays the results of a co-word analysis that was done to map the conceptual structure based on the word co-occurrences in the corpus of documents under consideration. The data is presented in a two-dimensional graphical form. On the left of the graph, articles related to ontology creation are plotted. At the top, articles related to tourism applications can be found. On the right, topics related to tourism services can be found. At the bottom, topics related to Linked Data and mapping are plotted. The clusters are formed based on the closeness of keywords that are mentioned together in articles. [36] Three big clusters can be noticed. The biggest cluster is formed by co-occurrences of concepts like software agents, intelligent systems, automation, context, location-based services etc. The next biggest cluster is formed by co-occurrences of concepts like point of interest, data integration, linked open data, ontology mapping etc. The third big cluster is made up of co-occurrences of concepts like taxonomies, named entities, machine learning techniques, question answering systems etc. By their size, these clusters indicate the areas where research in the domain under consideration is focused. Articles that mention words like context, intelligent systems, information sources and knowledge graph (the last belonging to a different cluster than the others) are close to the middle of the graph and represent the centre of the research field.
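Co-word analysis rests on counting how often two keywords are attached to the same document. The Python sketch below shows only that counting step, under the assumption that keywords are compared case-insensitively and counted once per document. It does not reproduce the Bibliometrix procedure that produced the figures in this study, and the function name and toy keyword lists are hypothetical.

```python
from collections import Counter
from itertools import combinations


def keyword_cooccurrence(keyword_lists):
    """Count, for every unordered keyword pair, the number of documents
    in which both keywords appear together."""
    pairs = Counter()
    for keywords in keyword_lists:
        # Normalise case and whitespace, and drop duplicates within a document.
        unique = sorted({k.strip().lower() for k in keywords if k.strip()})
        pairs.update(combinations(unique, 2))
    return pairs


# Hypothetical keyword lists, one list per document.
TOY_KEYWORDS = [
    ["Ontology", "Semantic Web", "Tourism"],
    ["ontology", "tourism"],
    ["Linked Data", "Semantic Web"],
]

counts = keyword_cooccurrence(TOY_KEYWORDS)
for pair, n in counts.most_common(3):
    print(pair, n)
```

The resulting pair counts can serve as edge weights of a co-occurrence network, with keywords as nodes; clustering or dimensionality reduction would then be applied on top of these counts.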
Next, the thematic evolution [37] of the field was studied. For this analysis, the time span 2002-2020 was split at the years 2005, 2010 and 2015.
The maps based on thematic evolution using the Authors' keywords are presented in Figures 6a, 6b, 6c and 6d. During 2002-2005 (see Figure 6a), semantic web was a generic theme in the research field. The works of Silva and Rocha [38] and Chiu and Leung [39] are among the top cited articles of this period. In the period 2006-2010 (see Figure 6b), research related to recommendation systems and Twitter-based studies targeted at opinion mining were very well developed. The work of Cambria et al. [40] is one of the top cited works of this period. Semantic web continued as a generic theme. Some other generic themes were ontology construction and context modeling. During 2011-2015 (see Figure 6c), cultural heritage research was among the developed themes. The works of Goy et al., [41] O'Keefe and Benyon [42] and Fermoso et al. [43] are the notable works of this period. During 2016-2020 (see Figure 6d), e-tourism, recommender systems, big data etc. were among the generic themes. The works of Chu et al. [44] and Shi et al. [45] are some of the works related to the generic themes of this period.

CONCLUSION
The study includes a review of selected scientometric studies on tourism and semantic web research. Building upon the reviewed methodologies, the study presents some important findings by analyzing data about studies on the applications of semantic web technologies in the tourism domain. The study highlights countries based on collaboration networks and points out important areas of research based on keyword-based analysis, and can thus provide direction to researchers intending to apply semantic web technologies in the tourism domain. The study shows that semantic web started as a generic theme in the domain but specific semantic web technologies gained prominence in later years. Research in the domain was found to be mostly collaborative. Intelligent systems, context, location-based services, point of interest, data integration, linked open data, ontology mapping, knowledge graph, taxonomies, machine learning techniques and question answering systems are some of the concepts around which research clusters can be found in the studied domain. The major sources of the documents included in the corpus of the present study are Lecture Notes in Computer Science, CEUR Workshop Proceedings, ACM International Conference Proceeding Series and Communications in Computer and Information Science. This means that most of the documents are conference publications. A limitation of this study is that it presents the results of an analysis of data collected from only one database. If data collected from other citation and indexing databases are also studied, a better understanding of the research scenario of the domain under consideration may be obtained.