Introduction

The scientific production of tourism papers has expanded significantly over recent years. In 2005, the number of papers within this field in the Web of Science Core Collection was around 5000. As of January 7, 2021, a search for the term “tourism” in this database retrieved 91,866 documents, of which 63,316 were articles published in over 8800 academic journals. Most of these articles were published in journals within the “Hospitality, Leisure, Sport & Tourism” category (43.22%). The remainder were published in journals from other categories: “Environmental Studies” (10.1%), “Management” (9.4%), “Environmental Sciences” (8.6%), “Geography” (6.3%), “Economics” (5.5%), “Sociology” (5.0%), “Green Sustainable Science Technology” (4.8%), “Business” (4.2%), and “Ecology” (2.4%). These results reflect the interdisciplinarity of tourism research (Oviedo-García, 2016).

Given the high number and variability of academic journals in the field of tourism research, the proper selection of a journal is particularly important when submitting a paper, for two main reasons: (1) the submission and review process of a manuscript is time consuming; and (2) a manuscript may be submitted to only one journal at a time (Knight & Steinbach, 2008). An inappropriate choice of journal may therefore delay the dissemination of research results.

Two important factors stand out when selecting a journal for the submission of a manuscript: (1) the journal impact factor; and (2) the fit between the journal and the topic of the manuscript (van Teijlingen & Hundley, 2002; Knight & Steinbach, 2008; Walters & Markgren, 2019). The impact factor is currently considered a benchmark for the prestige or reputation of a journal and is used to rank and compare journal quality, on the premise that a higher citation rate indicates higher quality (Klinger et al., 2005; Kurmis, 2003). Publishing in high-impact journals has a double benefit. At an individual level, researchers who wish to pursue an academic career are expected to publish in these journals; at an institutional level, the university or research center is likely to attract greater financial resources and to occupy a prominent position in international rankings.

The concept of the journal impact factor was introduced in 1955 by Dr. Eugene Garfield and its calculation is based on a ratio: the numerator is the number of citations in the current year to items published over the previous two years; the denominator is the number of citable materials (source items) published over the same two years (Garfield, 2006). Several variants of Garfield's index exist, depending on the formulation of the ratio and of its numerator and denominator. The best-known indices within the international scientific community are the Journal Citation Reports (JCR), provided by the Web of Science (WOS) database of Clarivate Analytics, and the Scimago Journal Rank (SJR), provided by the Scopus database of Elsevier. The differences between these indices are mainly due to the different scientific databases from which the citations are sourced and the methodologies that are employed for their estimation (Falagas et al., 2008).
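Following Garfield's definition, the two-year impact factor of a journal for year y can be written as the ratio below (a standard formulation consistent with Garfield, 2006):

\[
\mathrm{JIF}_{y} = \frac{C_{y}(y-1) + C_{y}(y-2)}{N_{y-1} + N_{y-2}}
\]

where \(C_{y}(y-k)\) is the number of citations received in year \(y\) by items the journal published in year \(y-k\), and \(N_{y-k}\) is the number of citable items it published in year \(y-k\).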

According to Knight and Steinbach (2008), the fit between the journal and the manuscript is probably the most important point in selecting a journal. In fact, if a paper is not in line with the journal content, some editors will reject it outright without a peer-review process (desk rejection). A manuscript presents a good fit if the research topic reflects the research area of the journal and complies with its style and other formal requirements (e.g., article length, coherence of the literature review, statistical analyses, and practical value for practitioners) (Knight & Steinbach, 2008). However, assessing whether a research topic fits the requirements of an academic journal is not as straightforward as it might appear. As a first approximation, authors should consult the scope and the range of topics noted on the journal website. If the journal defines its lines of research in a generic way, authors adopting an interdisciplinary approach may have doubts over whether the topic of the manuscript fits the editorial line. Other journals specify the range of topics with more precision, but authors may still be unaware of the weight attributed to one topic relative to the others that the journal publishes.

The objective of this investigation is twofold. First, a tool will be proposed to quantify the fit of a research topic with a series of Tourism and Hospitality journals, according to the frequency of occurrence of that topic. Second, a guide will be set out to the most appropriate journals based on the JCR impact factor and the topic-journal fit. To do so, text-mining techniques will be used to perform a bibliometric analysis of the abstracts of all the articles published in the highest-impact (Q1) Tourism and Hospitality journals indexed in the Web of Science within the “Hospitality, Leisure, Sport & Tourism” category. The literature review will be presented in Sect. 2; in Sect. 3, the research methodology will be described; the results will then be discussed in Sect. 4; and the conclusions will finally be presented in Sect. 5.

Literature review

The term bibliometrics was first introduced by Pritchard (1969) as “the application of mathematical and statistical methods to books and other means of communication” (p. 349). Different tools are used to perform a bibliometric analysis—some using very simple mathematical operations and others of greater complexity, which require greater knowledge of statistics and computing. Koseoglu et al. (2016) identified three different traditional bibliometric methods: (1) review techniques, (2) evaluative techniques, and (3) relational techniques.

Review techniques refer to structured literature reviews, systematic literature reviews, and meta-analyses/reviews (Zupic & Cater, 2015). Evaluative techniques can be classified into three groups (Hall, 2011): (1) productivity measures (e.g., number of papers per author, journal, institution, or country); (2) impact metrics (e.g., number of citations per paper, author, and journal); and (3) hybrid metrics, such as the average number of citations per paper or the h-index. Relational techniques can be divided into four categories: co-citation analysis, co-word analysis, co-authorship analysis, and bibliographic coupling (Zupic & Cater, 2015). Co-citation analysis seeks to determine the similarity between documents, authors, and journals (McCain, 1990); a co-citation occurs when two papers, authors, or journals are jointly cited in the same paper. Co-word analysis, or co-occurrence of words, quantifies the extent to which two words occur in the same title, abstract, or keyword list (Zupic & Cater, 2015). Co-authorship analysis examines the networks that authors create when collaborating on scientific articles (Acedo et al., 2006). Finally, bibliographic coupling uses the number of references shared by two documents as a measure of their similarity (Zupic & Cater, 2015).

Most bibliometric studies in the Tourism and Hospitality literature are based on evaluative techniques, and their objective is to offer rankings of authors, articles, journals, and academic institutions using productivity and impact indicators, mainly the number of articles and the number of citations (Table 1). Bibliometric analyses based on relational techniques are less common and have focused mainly on co-citation analysis and co-word analysis. Co-citation analyses generate graphs with VOSviewer© software (Van Eck & Waltman, 2010) that represent the network connections of the journals, authors, or documents cited in the references. Co-word analyses, performed with keywords, identify those used with the highest frequency.

Table 1 Bibliometric studies in tourism research

In recent years, the bibliometric analysis of literature has improved with the use of various text-mining techniques (Yang et al., 2018). Specifically, text-mining techniques in most bibliometric studies are applied to abstracts in a way that can identify research topics through word frequency analysis, relationships between words, and topic modelling. For example, Jiang et al. (2016) analyzed 1726 articles related to hydropower research between 1994 and 2013. They identified the terms with a high frequency of occurrence, as well as the main research topics, and their development. Gómez-Barroso et al. (2017) collected all papers published between 1976 and 2016 in the journal Telecommunications Policy. Text-mining techniques were used to identify the key topics, the dominant combinations of concepts, and the main areas of research. Ding et al. (2018) analyzed 1600 research articles on building energy savings between 1973 and 2016. The results identified the research trends at different stages using various timestamps. Yang et al. (2018) collected 1000 abstracts from Google Scholar, 249 full texts from ScienceDirect, and six articles published in 2016 or 2017 on solar forecasting. The results showed the most important and the most frequent abbreviations, as well as key innovations for recent advances in solar forecasting. Demeter et al. (2019) analyzed 566 papers on inventory research in the International Journal of Production Economics between 1994 and 2016, identifying the main topics, terms, research areas and their evolution over time. More recently, Correia et al. (2020) performed a bibliometric analysis of the last 20 years of research on virtual reality and augmented reality in tourism contexts. The full texts of 56 journal papers and 325 conference proceedings were analyzed using a topic modeling approach with the statistical package R. The most important latent topics that emerged from the journal papers and the conference proceedings were shown separately.

The use of text-mining techniques also plays an important role in analyzing the text of online consumer reviews, an important type of electronic word-of-mouth (eWOM) recommendation (Litvin et al., 2008). These comments on products and services, whether positive or negative, are used by other consumers to reduce the risks associated with purchase decisions and have a great influence on consumer shopping behavior (Mauri & Minazzi, 2013; Sparks & Browning, 2011; Zhang et al., 2010). In the tourism industry in particular, online consumer reviews are a very important source of information, due to the unique characteristics of hospitality services (e.g., intangibility, heterogeneity, and inseparability), especially when the consumer has no previous experience (Litvin et al., 2008). In this context, previous studies have analyzed the influence of online reviews on firm performance (e.g., sales and prices of hotel rooms) (Ogüt & Tas, 2012; Ye et al., 2009), the determinants of online review credibility (Munzel, 2016; Xie et al., 2011), and how firms manage responses to consumer reviews (Mauri & Minazzi, 2013; Xie et al., 2014).

One of the most common techniques used to analyze the text of online reviews is topic modeling, especially the Latent Dirichlet Allocation (LDA) algorithm. This methodology is used to discover the underlying topics in massive volumes of unstructured text (Guo et al., 2017). In the hospitality and tourism literature, several researchers have employed LDA to analyze online reviews. For example, Guo et al. (2017) collected 266,544 online reviews of 25,670 hotels located in 16 countries with the objective of identifying the most important factors influencing consumer satisfaction. The LDA algorithm uncovered 19 dimensions, such as hotel location, breakfast, hotel staff, car parking, and check-in and check-out. The authors also demonstrated that the relative importance of the dimensions differed according to the gender and the age of the reviewers. Bi et al. (2019) analyzed 24,276 TripAdvisor reviews of two five-star hotels with the LDA algorithm. The results showed that the most important attributes of hotel performance were value, location, room, cleanliness, service, food/drink, check-in/out, facility, and Wifi/Internet. Song et al. (2021) collected 20,476 online reviews from 18,387 TripAdvisor users to understand how visitors experience public spaces on the Las Vegas Strip. The results revealed 30 distinct topics, the most important of which were exploring different hotels, night scenes, people watching, and walking long distances.

Other researchers have used a text-mining method called the Structural Topic Model (STM) (Roberts et al., 2014), a variant of the LDA model. For example, Hu et al. (2019) analyzed 27,864 TripAdvisor reviews of New York City hotels with this algorithm to identify the antecedents of hotel customers' dissatisfaction. The most negative topics in the reviews were severe service failure, dirtiness, booking and cancellation, and noise. The research also revealed how customer complaints varied across different hotel grades. Finally, Korfiatis et al. (2019) applied STM to 557,208 TripAdvisor reviews from airline passengers. The analysis was performed with R statistical software and the results suggested 20 topics relating to customer satisfaction, such as business class, value for money, low cost, delays, staff, check-in, and airport experiences.

Research methodology

Text mining is an emerging sub-discipline of data mining, which aims to extract useful information and knowledge from unstructured and semi-structured text (Miner et al., 2012). The idea behind text mining is the transformation of text into numerical data (Silge & Robinson, 2017). There is a wide range of text-mining analyses—some relatively simple and others based on more complex algorithms, as previously mentioned—to perform various interrelated tasks: information retrieval, document clustering, information extraction, natural language processing, sentiment analysis, and social network analysis (Kwartler, 2017).

This paper uses R software to perform the text-mining analysis for three main reasons. Firstly, this software enables the rapid processing of massive texts. Secondly, text-mining analyses generally require a programming language that can be adapted to the characteristics of the database and the nature of the research. Thirdly, there are packages and libraries in this programming language specifically designed for processing text databases. Figure 1 presents the research methodology used to perform the analyses, which follows the usual text-mining workflow, and details the main functions and packages used for the purposes of this study. The statistical software package SPSS v. 26 and Microsoft Excel were also used to analyze the data.

Fig. 1 Text-mining analysis flowchart. Source: Adapted from Silge and Robinson (2017)

Data collection

The first step of the data-collection procedure involved the compilation of the articles from the following academic journals: Annals of Tourism Research (ATR), Current Issues in Tourism (CIT), International Journal of Contemporary Hospitality Management (IJCHM), International Journal of Hospitality Management (IJHM), Journal of Destination Marketing & Management (JDMM), Journal of Hospitality & Tourism Research (JHTR), Journal of Hospitality Marketing & Management (JHMM), Journal of Sustainable Tourism (JST), Journal of Travel & Tourism Marketing (JTTM), Journal of Travel Research (JTR), Tourism Management (TM) and Tourism Management Perspectives (TMP). All the articles under study were extracted from the Scopus database, because it holds records of all articles published in these journals since their establishment; in contrast, the Web of Science only indexes articles from the year in which a journal was first indexed in that database. The articles were downloaded on 1 October 2020, giving a total of 20,381 records, and only those articles with an available abstract were selected. The result was a database of 18,798 articles (Table 2). The abstracts of the articles were compiled as the primary data for text mining, because they represent the main contents of a paper better than a few keywords, while the time and resources required for full-text analysis ruled out that option (Gómez-Barroso et al., 2017; Jiang et al., 2016).
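As an illustration of this step, the following minimal R sketch shows how Scopus records exported as CSV might be loaded and filtered to keep only articles with an available abstract. The file name and the column labels (Title, Abstract, Source title, Year) are assumptions, since they depend on the export options chosen in Scopus.

```r
# Minimal sketch of the data-collection step (assumed file and column names)
library(readr)
library(dplyr)

records <- read_csv("scopus_export.csv")  # hypothetical export file

# Keep only records that include an abstract; Scopus exports often flag
# missing abstracts with the placeholder "[No abstract available]"
articles <- records %>%
  filter(!is.na(Abstract), Abstract != "[No abstract available]") %>%
  select(Title, Abstract, `Source title`, Year)

nrow(articles)  # should approximate the 18,798 abstracts reported in Table 2
```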

Table 2 Number of articles in the database

Abstract cleaning

The process of abstract cleaning involved upper-to-lower case conversion and the removal of numbers, punctuation marks, whitespace, line breaks, and stop words (Miner et al., 2012). The stop-word filtering compared each word against a stop-word list and removed the most trivial words, such as pronouns, prepositions, and articles (e.g., “the”, “for”, “these”, “in”), as well as words that are common in abstracts but whose meanings were not relevant for the purposes of this study (e.g., “article”, “results”, “show”, “data”, “research”, “literature”, “sample”). The main functions used in this phase were str_replace_all, tolower, stripWhitespace, removeWords and stri_trim, from various R packages.
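A condensed sketch of this cleaning phase is given below. It chains the functions named above (from the stringr, tm, and stringi packages); the custom stop-word vector is only an illustrative subset of the list actually used.

```r
library(stringr)   # str_replace_all
library(tm)        # stripWhitespace, removeWords, stopwords
library(stringi)   # stri_trim

# Illustrative subset of the custom stop-word list used in the study
custom_stopwords <- c("article", "results", "show", "data",
                      "research", "literature", "sample")

clean_abstract <- function(x) {
  x <- str_replace_all(x, "[0-9]+", " ")        # remove numbers
  x <- str_replace_all(x, "[[:punct:]]+", " ")  # remove punctuation marks
  x <- tolower(x)                               # upper-to-lower case conversion
  x <- removeWords(x, stopwords("english"))     # remove trivial stop words
  x <- removeWords(x, custom_stopwords)         # remove abstract-specific stop words
  x <- stripWhitespace(x)                       # collapse whitespace and line breaks
  stri_trim(x)                                  # trim leading/trailing spaces
}

articles$clean_abstract <- clean_abstract(articles$Abstract)
```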

LDA topic modelling

Latent Dirichlet Allocation is an unsupervised learning method and one of the most common algorithms for topic modelling (Guo et al., 2017; Köseoglu, 2020). It treats each document as a mixture of topics, and each topic as a mixture of words (Blei et al., 2003). The algorithm estimates two probability matrices: (1) a matrix of term-topic probabilities where each entry is the probability of term i being generated from topic j; and (2) a matrix of document-topic probabilities where each entry is the probability of terms from document i being generated from topic j. The LDA model was estimated using the collapsed Gibbs sampling method, following the process described by Kwartler (2017).
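A minimal sketch of this estimation step with the topicmodels package is shown below, assuming that a Document-Term Matrix dtm has already been built from the cleaned abstracts (e.g., with tm::DocumentTermMatrix); the control parameters are illustrative, not those of the original study.

```r
library(topicmodels)

k <- 18  # number of topics (set beforehand; see the next subsection)
lda_model <- LDA(dtm, k = k, method = "Gibbs",
                 control = list(seed = 1234, burnin = 500, iter = 2000))

# The two probability matrices described above
term_topic <- posterior(lda_model)$terms   # term-topic probabilities (k x V)
doc_topic  <- posterior(lda_model)$topics  # document-topic probabilities (D x k)
```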

The number of desired topics has to be entered into the LDA algorithm before the model is run. The function FindTopicsNumber in the ldatuning package (Nikita, 2020) calculates the value of the metric “CaoJuan2009” for different numbers of topics, taking the Document-Term Matrix (DTM) as its input (a matrix in which each row represents one abstract, each column represents one token, and each value is the number of times that token occurs in that abstract). Small values of this metric indicate a possible optimal number of topics. Finally, the LDAvis package (Sievert & Shirley, 2014) was used to visualize the estimated LDA models.
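The sketch below illustrates how these two packages might be combined: ldatuning computes the CaoJuan2009 metric over a range of candidate topic numbers, and LDAvis renders the interactive visualization of the model fitted in the previous sketch. The candidate range and control settings are assumptions made for illustration.

```r
library(ldatuning)
library(LDAvis)

# Evaluate the CaoJuan2009 metric for several candidate numbers of topics
tuning <- FindTopicsNumber(
  dtm,
  topics  = seq(6, 30, by = 2),
  metrics = "CaoJuan2009",
  method  = "Gibbs",
  control = list(seed = 1234)
)
FindTopicsNumber_plot(tuning)  # small metric values suggest candidate optima

# Prepare the fitted LDA model (lda_model, from the previous sketch) for LDAvis
json <- createJSON(
  phi            = posterior(lda_model)$terms,
  theta          = posterior(lda_model)$topics,
  doc.length     = slam::row_sums(dtm),
  vocab          = colnames(dtm),
  term.frequency = slam::col_sums(dtm)
)
serVis(json)  # opens the interactive inter-topic distance map
```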

In this study, the LDA algorithm was applied to the entire corpus of abstracts with the purpose of finding the main latent research topics in leading Tourism and Hospitality journals. The latent research topics identified were labeled using the words most likely to be generated by each topic. The matrix of document-topic probabilities was used to create a contingency table showing the number of articles from each journal assigned to the different research topics.
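As an illustration, the assignment of each abstract to its most probable latent topic and the construction of the contingency table could be carried out as follows (building on the objects defined in the earlier sketches; the journal labels are assumed to be stored in the Source title column):

```r
# Assign each abstract to the latent topic with the highest probability
doc_topic <- posterior(lda_model)$topics            # 18,798 x 18 matrix
articles$main_topic <- apply(doc_topic, 1, which.max)

# Contingency table: articles per journal assigned to each research topic
topic_journal <- table(articles$`Source title`, articles$main_topic)
```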

Contingency tables analysis

The contingency tables analysis was used to determine whether there was a statistically significant relationship between the topics and the journals, given the categorical nature of the variables (Hair et al., 1999). The Chi-square test of independence contrasts the hypothesis of the independence of the variables against the alternative hypothesis that the distribution of one variable differs across the levels of the other (Hair et al., 1999). In turn, the corrected standardized residuals indicate whether the difference between the observed frequency and the expected frequency is statistically significant. At a 95% confidence level, residuals greater than 1.96 indicate that the observed frequency is greater than would be expected if the variables were independent, while residuals less than −1.96 indicate that it is lower than expected under independence. The corrected standardized residuals were taken as the reference measure of the degree of fit of the research topics in each journal.
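For reference, the corrected (adjusted) standardized residual of cell (i, j) in the contingency table takes the standard form

\[
r_{ij} = \frac{O_{ij} - E_{ij}}{\sqrt{E_{ij}\,\bigl(1 - n_{i\cdot}/N\bigr)\bigl(1 - n_{\cdot j}/N\bigr)}}
\]

where \(O_{ij}\) and \(E_{ij}\) are the observed and expected frequencies, \(n_{i\cdot}\) and \(n_{\cdot j}\) are the row and column totals, and \(N\) is the total number of articles; under independence, these residuals are approximately standard normal, which justifies the ±1.96 threshold.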

Results

Main latent research topics in leading Tourism and Hospitality journals

Once the cleaning process of the abstracts was complete, the LDA algorithm was used to identify the main latent research topics. A document-term matrix was created for use as the input of the FindTopicsNumber function, which calculates the values of the metric “CaoJuan2009” for different numbers of topics (Fig. 2). However, determining an optimal number of topics is not straightforward, as the metric “CaoJuan2009” can point to several candidate solutions. Furthermore, choosing too small a number generates very generic topics, while too high a number can produce overly specific topics. The LDA algorithm was therefore run with 18, 24, and 28 topics. Following the recommendations of Sievert and Shirley (2014), the 18-topic solution was chosen because it was easy to interpret and its topics were clearly distinguishable without overlaps. Finally, the 18 topics were plotted with the LDAvis package for their visualization.

Fig. 2 Selecting the optimal number of topics

Figure 3 shows the latent research topics identified (left-hand side) and the most representative terms of the topic ‘social media’ (right-hand side). The topics are represented as circles, and the inter-topic distances are computed using multidimensional scaling and projected onto the first two principal components (PCA). The area of each circle indicates the prevalence of the topic, and the topics are sorted in descending order of prevalence. The most useful terms for interpreting a topic appear when that topic is selected: the overall frequency of each term (grey bars) and its estimated frequency within the selected topic (red bars) define the most representative terms of that topic. However, Sievert and Shirley (2014) noted that if the meaning of each topic is interpreted from its most probable terms (red bars), common terms in the abstracts will appear near the top of multiple topics, making it hard to differentiate their meanings. These authors therefore proposed a measure called “relevance” for ranking the terms, suggesting λ = 0.6 as the optimal value for topic relevance. The latent research topics were labeled with the words that were most likely to be linked with each topic (Table 3).
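For completeness, Sievert and Shirley (2014) define the relevance of term w to topic k, given the weight parameter λ, as

\[
\mathrm{relevance}(w, k \mid \lambda) = \lambda \,\log(\phi_{kw}) + (1-\lambda)\,\log\!\left(\frac{\phi_{kw}}{p_{w}}\right)
\]

where \(\phi_{kw}\) is the probability of term w in topic k and \(p_{w}\) is the marginal probability of term w in the corpus. With λ = 1, terms are ranked purely by their within-topic probability, while smaller values of λ penalize terms that are frequent across the whole corpus; λ = 0.6 is the value suggested by those authors.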

Fig. 3 Main latent research topics in leading Tourism and Hospitality journals

Table 3 Main latent research topics and most relevant terms

The LDA algorithm also estimated a document-topic probabilities matrix with 18,798 rows and 18 columns. This matrix was used to assign each article to the latent research topic with the highest probability of belonging. Figure 4 shows the probability distributions of the articles from each journal over the different topics. As an example, most of the articles from ATR were assigned to four topics: sustainable tourism, tourism economics, cultural tourism, and tourist experience. The total number of articles from each journal assigned to each of the eighteen research topics appears in Table 4.

Fig. 4 Probability distributions of the articles on the latent research topics

Table 4 Contingency table of the latent research topics and the journals

Measuring the topic-journal fit

The degree of association between a research topic and a journal cannot be based solely on its observed frequency, since a topic may have a higher frequency in one journal than in another simply because the first publishes a greater number of articles than the second. Therefore, the corrected standardized residuals from the contingency table analysis were used to measure the topic-journal fit. The contingency table analysis reported a statistically significant chi-square value (χ2: 12,324.187; p: 0.000), suggesting a relationship of dependency between the research topics and the journals. The analysis of the standardized residuals pointed to the nature and degree of this statistical relation: the higher the value of a residual, the better the fit between the research topic and the journal, while negative residual values indicated a poor fit.
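Although parts of the analysis were run in SPSS, the same quantities can be reproduced in R with a short sketch such as the following, where topic_journal is the contingency table built in the earlier sketch:

```r
# Chi-square test of independence between research topics and journals
test <- chisq.test(topic_journal)
test$statistic   # chi-square value
test$p.value     # associated p-value

# Corrected (adjusted) standardized residuals used to measure topic-journal fit
fit_residuals <- test$stdres
round(fit_residuals, 2)
```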

As shown in Table 5, papers related to sustainable tourism, ecotourism, and climate change showed a close fit in JST. Similarly, papers on hotel performance, the restaurant industry, job satisfaction, or Hotel, Restaurant and Institutional Education (HRIs) fitted well in IJCHM, IJHM, and JHTR, and papers on destination image fitted well in JTTM, JTR, and JDMM. The excellent fit of cultural tourism in ATR also stood out. Furthermore, the high residual variance values indicated that these research topics simultaneously presented a good fit in those journals and a poor fit in the rest.

Table 5 Standardized residuals analysis between latent research topics and the journals

From another perspective, the variance of the residuals in each journal is an indicator of their degree of specialization. In this sense, the journals of this study can be divided into two groups. On the one hand, the publications with the greatest variability of fit and greatest specialization were: IJHM, JST, ATR and IJCHM (in that order). On the other hand, the publications with less variance of fit and a multidisciplinary approach were: TMP, JDMM, CIT, JHMM, TM, JTTM, JTR and JHTR (in that order).
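Under the same assumptions as the earlier sketches, this degree of specialization could be approximated as the variance of each journal's residuals across topics; the line below assumes that journals are the rows of the residual matrix:

```r
# Variance of the standardized residuals per journal:
# higher variance suggests a more specialized editorial profile
specialization <- apply(fit_residuals, 1, var)
sort(specialization, decreasing = TRUE)
```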

Finally, taking as a reference the values of the residuals (the topic-journal fit) and the JCR impact factor (2019), a guide to the most appropriate journals for the main research topics in Tourism and Hospitality is shown in Fig. 5.

Fig. 5 Journal selection based on the topic-journal fit and the JCR impact factor

Discussion of results and conclusions

The use of text-mining analysis in the bibliometric field is gaining ground due to the new possibilities it offers. In this paper, this methodology has been used to quantify the fit of the main research topics found in the abstracts of twelve leading journals in Tourism and Hospitality. This quantitative measure of the topic-journal fit, together with the journal impact factor, was designed so that authors may accurately assess journal suitability before submitting a paper. The results are discussed below and the main conclusions are then presented.

Firstly, the fit is a measure that depends on both the research topic of each article and the set of journals considered for submission. The fit of the research topics will probably differ for other academic journals; for example, the fit of the topic “Tourism demand” in Tourism Economics and of “Medical tourism” in Globalization and Health is likely to be higher than the values reported in this study. In contrast, the JCR impact factor is an indicator that depends exclusively on the publication to which it refers.

Secondly, the fit of a research topic becomes useful when it is known for several journals, and its importance relative to the impact factor depends on how much the fit varies from one journal to another: the greater the variance of the topic-journal fit, the more relevant the fit becomes compared with the impact factor. Conversely, the impact factor gains importance when the variance of the topic-journal fit decreases. In such cases, it may be advisable to choose journals with a higher impact factor and a lower fit. For example, although papers related to medical tourism fit more closely in JTR, they could be sent to TM, which has a higher impact factor.

Thirdly, the focus of a journal also influences the fit of the topics: the variance of the topic fit increased in unidisciplinary publications and decreased in multidisciplinary ones. Specialized journals are therefore recommended when the topic presents a close fit, and should be avoided when the fit is very low. The choice among multidisciplinary journals may be conditioned more by the impact factor, since no topic fits them closely.

As with all research, this study also has some limitations. First, the use of the impact factor has been questioned, as it may be manipulated by intentionally increasing the citations that a journal receives (Kurmis, 2003; Vanclay, 2012). Second, authors may consider other factors when choosing a journal, such as its visibility, the speed with which it responds and the time between submission and publication, philosophical and ethical issues, the journal language, and the monetary cost of publication, among others (Knight & Steinbach, 2008; van Teijlingen & Hundley, 2002). Third, the temporal stability of the results cannot be guaranteed, because the journals incorporate new content in each issue and the research topics published may change under various scenarios, such as a new editorial line or emerging topics such as Covid-19. In this sense, Mazanec (2017) demonstrated that the content of a journal's abstracts can vary significantly over time. The results are therefore a good representation of the time period of the study, but these analyses should be repeated in the future to detect new research trends. Moreover, the journal impact factor can also change over time; however, a simple update of the metric would be sufficient to update the information in Fig. 5.

Our own method also presents some difficulties and limitations. Text-mining analyses can be more complex than numerical data analysis due to the nature of language; this is a general limitation of the methodology, and the results may sometimes not be entirely satisfactory. The text-cleaning process must therefore be a priority step before performing any text-mining analysis, because it can significantly influence the quality of the results. In general, it is advisable to eliminate words that have no meaning in themselves (prepositions, conjunctions, articles, adverbs, pronouns, etc.). It may also be necessary to eliminate high-frequency words whose meaning contributes nothing to the purposes of the study (e.g., article, results, objectives, authors, show, obtained, etc.). Sometimes there are terms within the documents that can distort the results; in this study, all words referring to property rights or copyright (e.g., Elsevier, Science, Ltd., all rights reserved, etc.) also had to be removed. Furthermore, the results obtained can reveal terms that should have been removed in the text-cleaning phase, which means that the entire analysis must be repeated. The text-cleaning phase can be thorny, given the diversity of words that exist in any language, but it guarantees results that contribute to the fulfillment of the objectives of the study.

Finally, it has been demonstrated that it is possible to quantify the topic-journal fit, a concept of a qualitative nature that is difficult to measure. Researchers from other knowledge areas may therefore also apply the method that has been followed in this study.