Gamification for behavior change: a scientometric review

Gamification, which refers to the use of game design elements in non-game contexts, provides experiences and motivations similar to those of games; this makes gamification a useful approach to promote positive behaviors. In recent years, the volume of scientific publications focusing on gamification has increased. It has been applied to different fields (e.g. learning and training, mental health, positive behavior and behavior change, personnel selection, employee training, etc.); in this way, the scientific community has spread out over different domains and with different aims. Gamification has also turned out to be an excellent method to provide a sense of community, encouraging social interaction in both in-person and online contexts and increasing competencies in work and educational settings. This has also made gamification a useful tool during the COVID-19 pandemic, in which it has been used to promote an active and interactive experience in the educational environment. Because gamification keeps users motivated, engaged and active, there is wide interest in adopting gamification solutions for supporting and promoting positive behaviors and behavior change (e.g. quitting smoking, ecological behaviors, food choices, civic engagement, mental healthcare, sustainability, etc.). However, the development of this research area has proceeded without a unique theoretical approach or a clear concept of the field; even though several studies analyzing the literature have been conducted, a literature mapping of gamification applied to behavior change is still missing. In this study, we use the CiteSpace software to examine 447 publications and their 20608 unique references on gamification applied to behavior change. The corpus of studies was downloaded from the Scopus database and refers to studies published between 2012 and 2021. Several methods were used to analyze these data: (1) document co-citation analysis (DCA) was performed to identify the pivotal studies and the research areas; (2) author co-citation analysis (ACA) was performed to identify the main authors; (3) country collaboration and institution network analyses were performed to identify the countries and institutions that contribute the most; and finally, (4) keyword analysis was performed to detect the most influential keywords and their change over time. Overall, we discuss the findings and the need for a more cooperative and united community, in order to make the use of gamification applied to behavior change more effective, faster and goal-oriented. Hence, we introduce some future challenges to promote an improvement in the quality of publications in this research domain, as well as in other gamification fields.


Introduction
The impact of video games has been studied in parallel with the development of the game industry (Dale, Joessel, Bavelier, & Green, 2020). Since the early '70s, a growing body of research has been investigating the effects of video games on brain functions and behaviors and how they can affect our motivation and engagement (Reid, 2012). With the development of serious games, video game features started to be used in non-playful contexts in order to increase the motivation and engagement of users (Djaouti, Alvarez, & Jessel, 2011).
Scientometrics can help us identify the structure behind the literature by providing a map of the scientific field and by measuring the quality and impact of its documents.
In the present study, a body of documents was identified in the Scopus database. The articles and their references were classified by co-citation techniques in the CiteSpace software (C. Chen, 2014; C. Chen & Morris, 2003). Their content was then analyzed through network and timeline analyses. Using the CiteSpace software, documents were represented graphically in interactive maps. Parameters and metrics implemented in the software estimate the impact of documents, authors, keywords, countries, and institutions on a certain cluster or on the whole network. This is useful to identify the most influential documents over time in the literature on gamification applied to behavior change and positive behaviors.
Our aim is to provide an accurate overview of the literature's structure and to describe in a structured and systematic fashion the developments and trends behind the gamification-related literature in the domain of behavior change, reporting the most influential documents, authors, keywords, countries and institutions. Considering the aim of our research, we state the following research questions:

RQ1. Which are the most influential documents in the gamification for behavior change field?

RQ2. Who are the most influential authors contributing to the research of behavior change?

RQ3. Which countries and institutions contribute the most to the scientific outputs regarding behavior change?

RQ4. Which are the different application domains in the behavior change field?

RQ5. How have research trends in gamification applied to behavior modification changed over time?

The article is organized as follows. We start with the presentation of the study protocol adopted to guide our scientometric review (see Section 2). In Section 3 we present the results according to the different analyses included in our work. We dedicate Section 4 to the discussion of the results, and Section 5 to the future research directions and our research challenges. Section 6 concludes the paper.

Materials and Methods
In this section, we present the method adopted in our study, which is visually represented in Figure 1. It is composed of two macro-steps: (1) the literature search and (2) the data analysis and visualization. The following sections present the details needed to understand the study protocol and possibly replicate it.

Literature search
The data used for the analyses include 447 publications on gamification and its application in the field of behavior change and positive behavior published between January 1st 2012 and September 30th 2021, with 20608 unique references downloaded from the Scopus database. The time range of publications depended uniquely on Scopus' availability and no a-priori temporal exclusion criterion was applied. No language restrictions were applied. The search code used was "( TITLE-ABS-KEY ( gamification ) OR TITLE-ABS-KEY ( gamif* AND element ) OR TITLE-ABS-KEY ( gamif* AND component ) AND TITLE-ABS-KEY ( "behav* change" ) OR TITLE-ABS-KEY ( "positive behav*" ) )". The search terms gamif* and behav* were chosen because they take into account all possible forms derived from the root (i.e. gamif* also covers gamification and the verb gamify in all its forms; behav* covers behavior, behavioral, behaviour, and behavioural). Gamification is a recent discipline and a large part of its literature is made up of book chapters and conference papers. The reason for choosing Scopus over other databases was its coverage of books, book chapters, reference books, and scientific publications (C.-K. Huang et al., 2020; Pranckutė, 2021).
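The effect of the truncated search terms can be illustrated with a minimal sketch. It assumes that a trailing `*` behaves like "any run of word characters", which only approximates how Scopus expands wildcards; the helper `wildcard_to_regex` and the candidate word list are hypothetical and are not part of the study protocol.

```python
import re

def wildcard_to_regex(term: str) -> re.Pattern:
    """Turn a term such as 'gamif*' into a case-insensitive regular expression.

    Assumption: '*' stands for zero or more word characters.
    """
    escaped = re.escape(term).replace(r"\*", r"\w*")
    return re.compile(rf"\b{escaped}\b", re.IGNORECASE)

gamif = wildcard_to_regex("gamif*")
behav = wildcard_to_regex("behav*")

# Hypothetical candidate words, used only to show which forms are covered.
candidates = ["gamification", "gamify", "gamified", "game",
              "behavior", "behavioural", "behave"]
matches = {w: bool(gamif.fullmatch(w) or behav.fullmatch(w)) for w in candidates}

for word, hit in matches.items():
    print(f"{word:14s} {'covered' if hit else 'not covered'}")
```

Under this reading, a bare term such as behav* would also match unrelated forms like "behave", which is one reason the query embeds it in the phrases "behav* change" and "positive behav*".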
Considering the document typologies defined by the Scopus database, most of the documents fell under the categories "Articles" (43.6%) and "Conference paper" (42.7%). They were followed by the categories "Review" (5.9%), "Book chapter" (3.3%), "Conference review" (2%), and a few other categories with a low incidence. Figure 2 presents the results of the frequency analysis performed on the sample, revealing the number of documents by year and the most productive institutions, authors and countries. Figure 2a presents the total number of documents by year. Overall, an exponential growth in the research domain can be observed, except for the last two years. The lower level of publication in these two years can be interpreted as a consequence of the COVID-19 pandemic. We expect that in the next few years it will return to the same growth trend. The Scopus database presents only 2 publications in the first year of appearance and 16 publications the following year; publications reach their peak in 2019 with 79 documents, which correspond to 17.7% of the total production in the Scopus database. Figure 2b presents the 10 most productive institutions. Among them, Fondazione Bruno Kessler was the most prolific with 12 documents, followed by The University of Oulu and the Politecnico di Milano with 11 documents each. Other prolific institutions were The University of Waterloo (10 documents), Queensland University of Technology (9 documents), The University of Auckland (7 documents), Università degli Studi di Torino (6 documents), The University of Texas at Austin (6 documents), University of Saskatchewan (6 documents) and The University of Sydney (6 documents). Figure 2c presents the 10 most productive authors. A. Marconi and R. Orji were the most prolific authors with 11 publications each. Other prolific authors were F. Cellina (7 documents), P. Fraternali (7 documents), E. Loria (6 documents), J. Novak (6 documents), A. Rapp (6 documents), A. E. Pizzoli (6 documents), L. E. Nacke (5 documents), and H. Oinas-Kukkonen (5 documents). Figure 2d presents the most prolific countries. The most prolific was the United States (79 documents), followed by the United Kingdom (65 documents), Germany (43 documents), and Italy (43 documents). Other prolific countries were Austria (40 documents), Switzerland (32 documents), Spain (31 documents), Canada (29 documents), the Netherlands (27 documents), and Greece (24 documents). Figure 2e shows the different areas of application according to the Scopus database division. The biggest area corresponded to the computer science domain (31%); this is understandable, as most gamification is implemented in software and mobile applications. The second application domain was medicine (16%), followed by engineering (12%), social sciences (10%), and mathematics (8%). Other areas of application were energy (4%), environmental sciences (3%), psychology (3%), and health professions (3%). The remaining application areas account for 10% of the documents.

Data Analysis and Visualization
The data from the Scopus database were converted to a CiteSpace-friendly format (C. Chen, 2014) with the information related to each of the 447 publications retrieved. At this point, we used the CiteSpace software (version 5.8.R3) to analyze the data. Out of the 447 publications only 446 have been considered as valid. 19587 of the 20608 (95%) total references cited in the collected papers were considered valid (Figure 1). A small loss of references is due to data irregularities that cannot be processed by CiteSpace. This percentage of unprocessed references can be considered as a negligible loss of data (C. Chen, 2016).

Settings
To generate and analyze the networks with CiteSpace, we set the time span from 2012 to 2021, with time slicing set at 1 year per slice. For the node selection, we compared the g-index with scale factors of 25 and 10. The g-index is an improvement of the h-index that measures the global citation performance of a set of articles. It is the "(unique) largest number such that the top g articles received (together) at least g² citations" (Egghe, 2006). Overall, we selected a g-index criterion with a scale factor K of 25 because it provided better silhouette and modularity indexes. Furthermore, to obtain the best possible network, we set the CiteSpace parameters "Link Retaining Factor" and "Maximum Links per node" as unlimited. After a first check, we decided to set "Look back years" to "80" to remove the few outlier values related to references to internet sites with wrong temporal information: the year of these references had been set to 1900, which altered cluster identification and information as well as the timeline representation. In complex networks, a large number of links may prevent users from recognizing salient links. Link reduction algorithms, such as Minimum Spanning Trees and Pathfinder networks, may help in capturing the salient relationships between concepts (C. Chen & Morris, 2003). After a first visualization, we decided to apply the Pathfinder function to optimize the arrangement of the network nodes, as Pathfinder tends to better preserve the evolution of networks. An overview of the settings is presented in Figure 1.
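The g-index definition quoted above can be made concrete with a short sketch. With `k = 1` the function follows Egghe's definition directly. The optional scale factor `k` is an assumption about how the threshold is relaxed for node selection (g² ≤ k times the cumulative citations); it is not taken from the study, and the citation counts are hypothetical.

```python
def g_index(citations, k=1):
    """Largest g such that the top g articles together have at least g**2 / k citations.

    k = 1 reproduces Egghe's (2006) definition; larger k is an assumed
    relaxation of the threshold, mimicking a scale factor.
    """
    ranked = sorted(citations, reverse=True)
    total = 0
    g = 0
    for rank, count in enumerate(ranked, start=1):
        total += count  # cumulative citations of the top `rank` articles
        if k * total >= rank * rank:
            g = rank
    return g

# Hypothetical citation counts for the articles of one time slice.
slice_citations = [10, 8, 5, 4, 3]
print(g_index(slice_citations))        # Egghe's g-index
print(g_index([3, 1, 0], k=25))        # relaxed threshold keeps more nodes
```

A larger scale factor admits more nodes per time slice, which is consistent with the comparison between K = 10 and K = 25 described above.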

Analysis
Document co-citation analysis (DCA) was performed to examine the frequency with which multiple documents have been cited together in later publications (Aryadoust, 2020; Carollo et al., 2021; C. Chen, Song, Yuan, & Zhang, 2008). The study of co-citation networks focuses on interpreting the nature of clusters of co-cited documents (C. Chen, Ibekwe-SanJuan, & Hou, 2010). If two documents are frequently co-cited, they are likely to be thematically connected (Bar-Ilan, 2008). Author co-citation analysis (ACA) was performed to identify how often authors were cited together. It allows the identification of higher-order connectivity patterns between authors (C. Chen et al., 2008). Keyword analysis was carried out to detect the most influential keywords and their change over time. It can provide information about the core content of the articles (X. Chen & Liu, 2020); the analysis of keywords and their co-occurrence can help us find hot and frontier topics (Xie, 2015).
Then, country collaboration and institution network analysis was employed to identify existing relationships and collaborations among different institutions and countries (Liu, Li, Shen, Yang, & Luo, 2018). Besides producing a cluster view, the CiteSpace software can also generate a timeline view. For all the analyses mentioned above, this provides co-citation information as a function of time (Xie, 2015).
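The counting step behind a co-citation analysis can be sketched in a few lines. This is only an illustration of the principle, not CiteSpace's implementation; the citing papers and reference labels (`doc_X`, `doc_Y`, `doc_Z`) are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(reference_lists):
    """Count how often each pair of references appears in the same reference list."""
    counts = Counter()
    for refs in reference_lists:
        # Each unordered pair of distinct references in one paper is one co-citation.
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return counts

# Hypothetical citing papers and the references they cite.
citing_papers = {
    "paper_A": ["doc_X", "doc_Y", "doc_Z"],
    "paper_B": ["doc_X", "doc_Y"],
    "paper_C": ["doc_Y", "doc_Z"],
}
pairs = cocitation_counts(citing_papers.values())

for (a, b), n in pairs.most_common():
    print(f"{a} - {b}: co-cited {n} time(s)")
```

The resulting pair counts are the edge weights of a co-citation network. The same counting applied to cited authors or to keywords gives the author co-citation and keyword co-occurrence networks.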

Metrics
To examine the properties of the networks and clusters, several temporal and structural metrics of co-citation were adopted. The parameters considered to detect the structural quality of the network were betweenness centrality, modularity Q index and average silhouette, while citation burstness and sigma (Σ) were considered temporal and hybrid metrics (Carollo et al., 2021; C. Chen, 2014; C. Chen et al., 2009, 2010). An overview of the metrics used is presented in Figure 1.
Betweenness Centrality. The betweenness centrality is defined for each node in the network. It measures the extent to which the node is part of a path that connects other nodes in the network (C. Chen et al., 2010; Freeman, 1977). A high betweenness centrality identifies a node connecting two or more large groups of nodes (Gaggero et al., 2020). Hence, high values can identify documents or journals with great influence in the network.
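As an illustration of this definition, the following sketch computes normalised betweenness centrality with Brandes' algorithm for an undirected, unweighted graph. It is a generic textbook implementation, not the routine used inside CiteSpace, and the toy graph (two triangles joined through a bridging node `d`) is hypothetical.

```python
from collections import deque

def betweenness_centrality(adjacency):
    """Brandes' algorithm for an undirected, unweighted graph, normalised to [0, 1]."""
    nodes = list(adjacency)
    score = dict.fromkeys(nodes, 0.0)
    for source in nodes:
        # Breadth-first search counting shortest paths from `source`.
        order = []
        predecessors = {v: [] for v in nodes}
        n_paths = dict.fromkeys(nodes, 0)
        distance = dict.fromkeys(nodes, -1)
        n_paths[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if distance[w] < 0:  # first visit
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:  # v lies on a shortest path to w
                    n_paths[w] += n_paths[v]
                    predecessors[w].append(v)
        # Accumulate pair dependencies, starting from the farthest nodes.
        dependency = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in predecessors[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    n = len(nodes)
    # Every unordered pair was counted twice, so divide by (n - 1)(n - 2).
    scale = 1 / ((n - 1) * (n - 2)) if n > 2 else 1.0
    return {v: s * scale for v, s in score.items()}

# Hypothetical network: triangles {a, b, c} and {e, f, g} linked through d.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"),
         ("d", "e"), ("e", "f"), ("f", "g"), ("e", "g")]
adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)

centrality = betweenness_centrality(adjacency)
bridge = max(centrality, key=centrality.get)
print(bridge, round(centrality[bridge], 2))
```

In this toy graph the bridging node obtains the highest score because every shortest path between the two triangles passes through it, which mirrors the role of a document that connects otherwise separate groups of references.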
Modularity Q Index. The modularity of a network measures the extent to which a network can be divided into multiple blocks. It ranges from 0 to 1, where low values suggest that the network cannot be reduced to clusters with clear boundaries. High values, instead, indicate a well-structured network that is clearly divided into distinct groups. However, values close to one can suggest that components are simply isolated from one another (Aryadoust, Tan, & Ng, 2019; Carollo et al., 2021; C. Chen et al., 2010; Gaggero et al., 2020).
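A minimal sketch of how a modularity score can be computed for a given partition is shown below. It assumes the standard Newman formulation for an undirected, unweighted graph, in which each community contributes its share of internal edges minus the share expected at random; whether CiteSpace uses exactly this variant is not stated in the text, and the toy graph and partitions are hypothetical.

```python
def modularity(edges, community_of):
    """Newman modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    internal = {}    # number of edges with both ends in the same community
    degree_sum = {}  # sum of node degrees per community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())

# Hypothetical network: two triangles connected by a single edge.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]

split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
single = dict.fromkeys("abcdef", 0)

q_split = modularity(edges, split)
q_single = modularity(edges, single)
print(round(q_split, 3), round(q_single, 3))
```

Separating the two triangles yields a clearly positive Q, whereas placing every node in one community yields Q = 0, matching the interpretation that low values indicate the absence of clear cluster boundaries.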
Silhouette. The silhouette metric indicates the homogeneity of a cluster. Its value ranges between -1 and 1, where values above 0 represent higher homogeneity. When its value is high, the cluster can be considered internally consistent and distinct from other clusters (Carollo et al., 2021; C. Chen et al., 2010).
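The general silhouette computation can be sketched as follows, assuming the usual definition s = (b - a) / max(a, b), where a is the mean distance of an item to the other members of its cluster and b is the mean distance to the nearest other cluster. The 2-D points and the Euclidean distance are illustrative assumptions; CiteSpace derives its distances from the co-citation data.

```python
from math import dist
from statistics import mean

def silhouette_scores(points, labels):
    """Per-item silhouette values; requires at least two clusters."""
    scores = []
    for i, (p, own_label) in enumerate(zip(points, labels)):
        same = [dist(p, q) for j, (q, lab) in enumerate(zip(points, labels))
                if lab == own_label and j != i]
        if not same:
            scores.append(0.0)  # common convention for single-item clusters
            continue
        a = mean(same)  # cohesion: mean distance within the item's own cluster
        b = min(        # separation: mean distance to the nearest other cluster
            mean([dist(p, q) for q, lab in zip(points, labels) if lab == other])
            for other in set(labels) if other != own_label
        )
        scores.append((b - a) / max(a, b))
    return scores

# Hypothetical items: two tight, well-separated groups.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_scores(points, [0, 0, 1, 1])
bad = silhouette_scores(points, [0, 1, 0, 1])
print(round(mean(good), 2), round(mean(bad), 2))
```

A labelling that follows the natural groups yields an average silhouette close to 1, while a labelling that mixes the groups yields a negative average.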

Burstness. The burstness refers to a sudden increase in the number of citations for a node or a cluster during a short time interval within the overall time period (C. Chen et al., 2010; Kleinberg, 2003). This metric reflects the most active research areas or the rising trends in the literature (Aryadoust et al., 2019).
Sigma. Sigma (Σ) is a measure of scientific novelty. It comes from the combination of betweenness centrality and citation burstness. Its value ranges from 0 to 1, where higher values indicate works with higher influential potential (C. Chen et al., 2009, 2010; Gaggero et al., 2020).
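The way the two components are combined can be sketched under one assumption: that sigma follows the formulation commonly attributed to C. Chen et al. (2009), Σ = (centrality + 1)^burstness. The text above does not state the formula, so the function below should be read as an illustration of that assumed form only.

```python
def sigma(centrality: float, burstness: float) -> float:
    """Assumed form of the sigma metric: (centrality + 1) ** burstness."""
    return (centrality + 1) ** burstness

# Inputs taken from the values reported for Hamari (2013) in this review.
value = sigma(centrality=0.16, burstness=3.41)
print(round(value, 2))
```

With the burst strength (3.41) and betweenness centrality (0.16) reported for Hamari (2013), this assumed form gives a value of about 1.66, close to the sigma of 1.67 reported for that document; the small difference may reflect rounding of the inputs. Under this form, a node with zero centrality or no burst has a sigma of exactly 1.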

Clustering
We used the clustering function in CiteSpace in order to identify clusters and their connections. Cluster labels are selected from noun phrases and index terms following three different algorithms: Log-Likelihood Ratio (LLR) (C. Chen, 2014), Mutual Information (MI) (Zheng, 2019), and Latent Semantic Indexing (LSI) (Deerwester, Dumais, Furnas, Landauer, & Harshman, 1990). Cluster labelling was conducted automatically using the LLR algorithm to compare the occurrences of terms in the citing articles. The clusters obtained through the LLR algorithm were numbered in descending order according to their size. This approach is supported by the software creator (C. Chen, 2014), since LLR labelling provides the best results in terms of unique labels with sufficient coverage. Then, we used two different CiteSpace visualization methods: the cluster view, which displays a spatial representation of the network (Figure 3), and the timeline view, which displays a network by arranging its clusters along horizontal timelines (Figure 4). In the cluster view, the thickness of a node reflects the number of cited references inside the clusters. The passage of time is represented by color shading from the oldest (purplish) to the newest (yellowish). In addition, multi-colored rings reflect burstness (red) and betweenness centrality (purple). In the timeline view, the major clusters are arranged along a horizontal timeline, in which the oldest nodes are placed on the left and the newest on the right. Items in the timeline are connected by links whose thickness is proportional to the strength of co-citation.

Results
In this section, we provide a list of results according to the adopted metrics for each CiteSpace analysis used. Hence, we describe each cluster found through cluster analysis. The DCA provided a network with 399 nodes and 1624 links, showing a modularity Q index of 0.7211 and an average silhouette metric of 0.8529, suggesting both high modularity and high homogeneity (Figures 3, 4). As the modularity Q score represents how well the network is split into independent elements (C. Chen et al., 2010), the score suggested that the network and its clusters were well-structured. Likewise, the average silhouette suggested that the clusters were internally homogeneous.

Figure 3: Cluster view of the document co-citation analysis (DCA) generated using CiteSpace Version 5.8.R3. Modularity Q = 0.7211; average silhouette = 0.8529. Colored shades indicate the passage of time, from the past (purplish) to the present (yellowish). Colored tree rings refer to the nodes with high betweenness centrality (purple tree rings) and burstness (red tree rings).

Document Co-citation Analysis (DCA)
DCA resulted in the identification of 12 clusters (Table 1), sorted from the largest in size (cluster #0 = "Game-based approach", size = 57, silhouette = 0.724, mean year = 2010) to the smallest (cluster #12 = "Chronic illnesses", size = 15, silhouette = 0.944, mean year = 2011). Overall, cluster duration ranged from 10 to 68 years, presenting several overlaps. Cluster #3 = "Life change" has the longest duration (68 years), followed by cluster #7 = "Online gamified service" (64 years) and cluster #8 = "Theoretical basis" (63 years). Cluster #9 = "Chinese male smoker" has the shortest duration (10 years). Within the different clusters, there is homogeneity regarding the influential documents. Looking at the mean year of publication, cluster #7 seems to be the oldest one (mean year = 2000), while cluster #8 and cluster #9 are the most recent ones (mean year = 2013). However, it is worth noting that the mean year of publication of some clusters may have been largely biased by older publications. For example, cluster #7's mean year of publication is 2000, but this cluster is among those with a longer duration (64 years, from 1949 to 2013) and a smaller size (25). Since the mean is strongly affected by extreme values, older papers, even if few, may have drastically lowered the mean year of publication. For this reason, we chose to group and describe clusters by sorting them by size (Table 1), rather than by year.
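The sensitivity of the mean year to a few old references can be shown with a small numerical sketch. The publication years below are hypothetical and do not reproduce the content of any cluster in Table 1.

```python
from statistics import mean, median

# Hypothetical cluster: mostly recent references plus two much older ones.
years = [1949, 1955, 2008, 2010, 2011, 2012, 2012, 2013, 2013, 2013]

mean_year = mean(years)
median_year = median(years)
recent_only = mean(y for y in years if y >= 2000)

print(f"mean = {mean_year:.1f}, median = {median_year:.1f}, "
      f"mean without the two oldest = {recent_only:.1f}")
```

In this example, two old references out of ten pull the mean year back by more than a decade, while the median stays close to the bulk of the references. This is the kind of distortion that motivated sorting the clusters by size rather than by mean year.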
Cluster #0, "Game-based approaches", is the largest in size, but also the least homogeneous (silhouette = 0.724). It contains 56 cited references written between 1977 and 2019 (mean year = 2010), some of which contributed to the definition and the development of the gamification domain. It collects early and recent documents about gamification and its application to several different domains, from both a theoretical and an applied point of view. The documents with the highest citation frequency are Hamari, Koivisto, and Sarsa (2014), which appears 2 times in the list, with a frequency of 37 and 14 respectively, Seaborn and Fels (2015) with a frequency of 35, C. [...] with a frequency of 26, and Deterding, Sicart, et al. (2011) and Baranowski, Buday, Thompson, and Baranowski (2008) with a frequency of 13.

Cluster #1, "Persuasive technology", is the second largest. It contains 42 cited references published between 1984 and 2018 (mean year = 2008). This cluster collects theoretical documents about persuasive techniques, game design (mostly in educational and medical domains), and the definition of gamification. It contains one document with a high citation frequency, Deterding, Dixon, et al. (2011) with 72 citations, and then a series of documents with a much lower citation frequency, such as Huotari and Hamari (2012) with 9 citations, Deterding (2012) with a frequency of 8, and Hunicke, LeBlanc, and Zubek (2004) with a frequency of 7.

Cluster #2, "Gamification user type", contains 36 cited references written between 1970 and 2017 (mean year = 2007). It is mainly composed of documents about game design, persuasive techniques, motivation, and reinforcement learning. Most of these are shared with other clusters. The documents with the highest citation frequency are Hunter and Werbach (2012) with a frequency of 21, Domínguez et al. (2013) and Alahäivälä and Oinas-Kukkonen (2016) with 8 citations each, [...] with 7 citations, and Hamari and Koivisto (2013) with 6 citations.

Cluster #3, "Life change", is composed of 28 cited references from 1946 to 2014 (mean year = 2006). It is the cluster with the longest duration overall (68 years) and the oldest one according to its starting year (1946). This cluster is composed of documents dealing with gamified environments and gamification design, mainly applied to the medical domain. Among the most influential cited references we find Zichermann and Cunningham (2011) with 24 citations, Zuckerman and Gal-Oz (2014) with a frequency of 13, and Ryan et al. (2006) with 8 citations.

Cluster #4, "Social incentives", has a size of 27 documents from 1979 to 2018. It collects several documents involving behavior change and gamification, mostly focusing on promoting healthy behavior. The documents with the highest citation frequency are Michie et al. (2013) with a frequency of 13, Edwards et al. (2016), which appears 2 times, with a frequency of 11 and 8 respectively, and Sardi, Idri, and Fernández-Alemán (2017) with 8 citations.

Cluster #5, "Evaluation process", and Cluster #6, "Breaking information barrier", both have a size of 26 cited references, with the former ranging from 1971 to 2019 and the latter ranging from 1991 to 2017. Cluster #5 contains theoretical documents involving motivation and behavior change, as well as more applied documents focusing on the design of intervention methods. Here, the three most influential publications predate 2011 (when gamification was mentioned for the first time): Ryan and Deci (2000) with 12 citations, Abraham and Michie (2008) with 7 citations, and Prochaska and Velicer (1997) with 6 citations. Cluster #6 is composed of both theoretical and applied documents on gamification and computer games, with some documents involving environmental behavior.
Among the most influential cited references we find Cugelman (2013) [...]

Among the 18 citation bursts computed using DCA, Table 2 reports the strongest 10. The publication of Hamari (2013) has the strongest burst of the network, with a strength of 3.41, and it was the burst with the longest duration (3 years), along with the publication of McGonigal (2011). The oldest burst in the network started in 2013 (Bogost, 2007) and lasted 2 years, while the newest started in 2019 (D. Johnson et al., 2017; Michie et al., 2013; Sardi et al., 2017).

Within our network, the publication of Hamari (2013) has a higher sigma value (1.67) than the other publications, whose values do not differ much from 1. Regarding betweenness centrality, values range from 0 to 0.17 (Table 3). The highest value belongs to Hunter and Werbach (2012).

Table 3 (surviving rows; columns appear to be reference, betweenness centrality, and cluster ID): Seaborn and Fels (2015), 0.11, 0; Fogg (2002), 0.10, 7.

Author Co-citation Analysis (ACA)
Through author co-citation analysis, we can find influential authors in the field of gamification applied to behavior change. The size of each node represents the author's citation count, and the distance between two nodes represents the co-citation frequency of the two authors. A bigger node suggests an important author for the network; a smaller distance between two nodes indicates a high co-citation frequency and a closer research topic and direction (X. Chen & Liu, 2020). The network obtained through the ACA contains 454 authors and 1817 collaboration links (Figure 5). The network has a wide range of collaborations, which reflects the interdisciplinary nature of gamification and the several domains in which behavior change can be utilized. Table 4 shows the top 10 authors according to citation frequency. The largest node represents the author Deterding S with a citation frequency of 190 and a centrality value of 0.11, followed by Hamari J with a citation frequency of 140 and a centrality of 0.02. The third author by citation frequency is [Anonymous], which is not of interest because it might be a combination of many references without explicit author information (Gong, Jiang, Yang, & Wei, 2013).

Keyword Co-occurrence Analysis
The keyword co-occurrence analysis is an important aid to elucidate the structure of scientific knowledge and discover research trends (Su, Li, & Kang, 2019). Keyword detection identifies the words that are frequently used or whose use is concentrated in a short period. The keyword co-occurrence analysis provided a network with 385 nodes and 1542 links, showing a modularity Q index of 0.5753 and a silhouette value of 0.8041, suggesting a moderate modularity and a high silhouette. Table 6 lists the top 20 keywords with the strongest bursts. In terms of burst strength, the top-ranked keyword is "user interface" with a burst of 3.13, followed by "behavioral research" with a burst of 3.82, "human computer interaction" with a burst of 3.52, "user centered design" with a burst of 3.46 and "design" with a burst of 3.30. All the other keywords have a value above 3. "Behavioral research", "human computer interaction", "technology", "sustainable development" and "persuasive technology" have the earliest burst onsets, while "mobile app" and "mobile phone" have the latest burst onsets; their bursts end in 2021 only because this was the year of our search. It is reasonable to expect that these bursts will continue in future years, increasing their duration. The keyword with the longest burst duration is "human computer interaction", with a span of 3 years.
According to the beginning and the end of the bursts, we can discover how the topics in the field have changed over time. In the early stages, "technology", "sustainable development", "behavioral research", "persuasive technology", and "human computer interaction" were the mainstream trends, followed by "user interface", "human", "engineering", "design", and "behavior change support". After them, "health care", "game design", "virtual reality", "information use", "user centered design", "social support", "behavior therapy", "survey", and "information system" became the trends in the literature. Finally, "mobile app" and "mobile phone" have become the research frontier in recent years.

Country collaboration and institution network analysis
Country collaboration and institution network analysis can provide valuable information that helps researchers find where their colleagues work in different parts of the world. This should help to establish future collaborations (Xie, 2015). Country collaboration analysis provided a network of 88 nodes and 70 links. The network showed a modularity Q index of 0.5053 and a silhouette of 0.8041, suggesting a moderate modularity and a high silhouette. The top countries by number of publications are the United States, the United Kingdom, Germany, Australia, Italy, the Netherlands, and Canada. In particular, it is worth noting that Austria is not among the countries with the most publications, but its citation burst is 5.14. According to bursts, it is followed by Greece (2.49), the United Kingdom (2.47), and Switzerland (2.19). These data show that, although most publications came from the United States, Austria has made a significant contribution to the field. In terms of node centrality, the United States (0.25) and Italy (0.18) have played a key role in the field.
Institution network analysis provided a network of 231 nodes and 20 links. This suggests that, although several institutions work in the field of gamification for behavior change, there are only a few collaborations among them. The network showed a modularity of 0.769 and a silhouette of 0.9667, suggesting a high level for both metrics. According to our network, Fondazione Bruno Kessler is the institution that contributed the most, with 6 publications. The University of Saskatchewan and the University of Oulu are the most influential institutions according to citation burstness, with 1.71 and 1.68 respectively.

Discussion
In this section we answer the research questions we initially defined. Our aim is to provide a structured and systematic description of gamification's literature applied to behavior change. Thus, we outlined the main outcomes we found during the analysis and we propose some directions for future studies. In each section, a single research question is discussed based on the findings described in the results section.

Which are the most influential documents in the gamification for behavior change field?
To answer this question, we focused on DCA only, since it contains all the information needed to respond. To identify the most influential documents we followed two different approaches: (1) on the one hand, we looked at the documents with the highest burst strength and betweenness centrality (Tables 2 and 3); (2) on the other hand, we took the most frequently cited documents contained in the largest clusters (Table 1).
Considering burst strength and betweenness centrality, the most influential paper is Hamari (2013), which ranks first in burst strength (3.40) and second in betweenness centrality (centrality = 0.16). This paper describes a large gamified experiment with 3234 participants. According to our review, it has been a cornerstone paper in the field of gamification applied to behavior change, but its citation burst ended in 2018 (burst started in 2015 and ended in 2018). This could mean that Hamari (2013) has been an important and popular document in this field but has recently been overlooked. In contrast, the citation bursts of Michie et al. (2013), Sardi et al. (2017), and D. Johnson et al. (2017), ranked second (3.26), third (2.86), and fourth (2.65) in terms of burst strength respectively, began more recently (2019) and may not have ended yet (the bursts end in 2021, which is the year this review was written). In detail, Michie [...]

[...] Hunter and Werbach (2012) (centrality = 0.17), followed by (as already reported) Hamari (2013), then [...] (centrality = 0.14), Seaborn and Fels (2015) (centrality = 0.11), and finally Fogg (2002) (centrality = 0.10). Thus, Hunter and Werbach (2012) is the document with the highest influence on the network of documents selected for this review. Interestingly, Hunter and Werbach (2012) is a book (not a paper) and this might have affected its top ranking in betweenness centrality. Since books generally contain more information than scientific papers, they are very likely to be cited more often and in more domains. However, it is important to note that the maximum betweenness centrality value (0.17) is not high, indicating that the literature in this field is fragmented and needs points of connection between domains.
Considering the most frequently cited documents in the four largest clusters, we find [...] and Seaborn and Fels (2015) as the most cited papers in cluster #0 (i.e., the largest cluster), with 51 and 35 citations respectively. These papers are, respectively, a review and a survey focusing on both theoretical and applied aspects of gamification. Therefore, it is not surprising that they have been grouped in cluster #0, which collects the most important documents concerning general theoretical and applicative information on the gamification domain. In cluster #1, [...] stands out from all other documents in terms of citation frequency, with 72 citations. It is one of the first documents that defined the concept of gamification, describing the design of a typical gamified paradigm. In cluster #2 we again find Hunter and Werbach (2012), with 21 citations, which is also the document with the highest betweenness centrality in the network. Not surprisingly, this book is among the most influential in cluster #2, and is among the documents that best describe a gamified design. Finally, in cluster #3 we find Zichermann and Cunningham (2011), with 24 citations, a book that describes the potential of implementing game mechanics in web and mobile apps.
Interestingly, the most important documents in the field of gamification applied to behavior change are almost never reports of original experimental studies (Hamari (2013) being the exception). This suggests that the field possesses some strong theoretical works (mainly books and reviews), but lacks corroborated experimental support. Future studies should focus more on this second aspect.

Who are the most influential authors contributing to the research of behavior change?
To address this research question, we rely on the results of the ACA. Tables 4 and 5 give an overview of the most influential authors according to citation frequency and burst strength.
Considering burst strength, the most influential author in the field is Fogg BJ (burst strength = 6.88). This author's documents are not primarily related to gamification, but to persuasive technology and persuasive design (Fogg, 2002, 2009), which have a primary role in a behavior change approach. The high sigma value (Σ = 1.48) suggests that the author's contributions to the literature were novel. He was an influential author in the first phase of the field's development, but his citation burst started in 2013 and ended in 2016. This may mean that Fogg BJ helped to shape the earliest part of the literature and has since been overlooked. In second place in terms of burst strength, we find Zichermann G (burst strength = 6.05). His documents deal with gamification design (Zichermann & Cunningham, 2011). His burst lasted three years (it started in 2013 and ended in 2016), and the sigma value suggests that he has been an influential author in the field of gamification for behavior change (Σ = 1.29). In third and fourth place in terms of burst strength, we find McGonigal J (burst strength = 4.38; Σ = 1.30), whose most cited document (McGonigal, 2011) deals with the use of serious games to change people and the world, and Farzan R (burst strength = 4.35; Σ = 1.13), whose most cited documents (Farzan & Brusilovsky, 2006; Farzan et al., 2008) discuss how to enhance user participation. Looking at the timeline, the authors with the greatest burst strength are also the earliest in chronological order, whereas the most recent authors show much lower values. This suggests that recent authors have so far influenced the literature to a lesser extent.
Interestingly, when exploring the research fields of the most influential authors by burst strength, those with the highest burst values, who are also those whose bursts began earliest (Fogg BJ, McGonigal J, Taylor TL, and Gee JP), have a background other than gamification or behavior change. Conversely, the most recent authors within the top 25 authors by burst strength (Edwards EA, Johnson D, Dicheva D, Lister C, Patel MS, and Chou Y-K) have a research interest focused more directly on gamification, and a much lower burst value.
Overall, the ACA results suggest that the literature on gamification applied to behavior change was initially guided by the documents of Fogg BJ, Zichermann G, and McGonigal J, resulting in an initially cohesive structure. Then, over time, as the field spread into multiple domains, the structure became more fragmented and lost its reference points.

Which countries and institutions contribute the most to the scientific output regarding behavior change?
To answer this question, we rely on country collaboration and institution network analysis. The country analysis showed that the geographical distribution of documents is strongly centered on the four main anglophone countries (United States, United Kingdom, Australia, and Canada). Specifically, authors in anglophone countries provided almost a third of the analysed data. The rest of the distribution is located in Europe (Germany, Italy, Netherlands, Switzerland, Spain, Greece, Austria, Finland, Portugal, Sweden) and South America (Brazil). As shown in Figure 6, the links between the United States, Italy, Germany, Finland, United Kingdom, Switzerland, Sweden, and Brazil suggest that they cooperate intensively. The three nodes with purple rings in Figure 6 (the United States, Italy, and Germany) indicate that these three countries play a pivotal role in the cooperation network among 88 countries.
Considering institution network analysis, the small number of links between institutions (20) compared to the high number of nodes (231) suggests that, although a large number of institutions contribute to the development of the field, most of the research has been carried out within single organizations. In terms of the number of publications, Fondazione Bruno Kessler is the center that contributes the most, while in terms of citation burstness, the University of Saskatchewan and the University of Oulu are the most influential organizations. The maximum number of publications by a single organization is 6, which is a relatively small number for one institute, indicating that individual institutes' research on gamification for behavior change has not yet gone deep enough.
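The sparsity of the institution network can be quantified with a short calculation. The sketch below uses only the two figures reported above (231 nodes and 20 links) and assumes an undirected network without self-links or duplicate links; the variable names are illustrative.

```python
nodes = 231   # institutions in the network (reported above)
links = 20    # collaboration links between institutions (reported above)

# Number of distinct institution pairs that could be linked.
possible_links = nodes * (nodes - 1) // 2

# Density: share of possible links that actually exist.
density = links / possible_links

# Each link joins two institutions, so at most 2 * links of them
# can have a collaboration partner at all.
max_connected = min(nodes, 2 * links)
min_isolated = nodes - max_connected

print(f"possible links: {possible_links}")
print(f"density: {density:.5f}")
print(f"institutions without any link: at least {min_isolated} of {nodes}")
```

Under these assumptions, fewer than one in a thousand possible institutional pairs is connected, and at least 191 of the 231 institutions have no collaboration link at all, which is in line with the interpretation given above.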
Overall, these findings suggest the need for future collaborations to create a more cohesive and collaborative community.

Which are the different application domains in the behavior change field?
To answer this question, we rely on DCA and keyword analysis. We extracted the most frequent keywords for each cluster identified by the LLR algorithm and the keywords with the highest burst strength within the network (Table 6).
According to the LLR algorithm, some prevalent application domains emerged, the most frequent being the physical and mental health domain. This domain cuts across most of the clusters, with a particular prevalence in clusters #0, #3, #4, #9, and #11. The keywords that emerged involve mental health (clusters #0 and #3), physical exercise (cluster #4), health behavior (clusters #0, #4, and #9), and chronic diseases (cluster #11). Thus, it appears that gamification applied to behavioral change is widely employed in health and medical fields, in order to motivate patients to engage in health-oriented behaviors (Chow et al., 2020; [...]). Another prevalent domain concerns sustainable behavior and environmental awareness. In particular, clusters #5, #6, #7, and #8 collect documents that address gamified approaches in this domain, showing evidence that gamification has been used to encourage environmentally sustainable behaviors. Finally, a last important application domain concerns persuasive and motivational techniques, collected mainly in clusters #1, #2, #4, #7, and #10. Thus, it seems that gamification has a fundamental role in motivating or persuading people in the behavioral change field.
Regarding keywords with the highest burst strength, we find "user interface" at the top of the list (see Table 6), with the highest burst strength in the network (4.13). This suggests that user interfaces have been a leading domain for gamification applied to behavioral change, even though the keyword burst lasted only one year (2014 to 2015). Moreover, in third and fourth place in terms of burst strength, we find "human computer interaction" (3.52) and "user centered design" (3.46), which are keywords closely related to "user interface". Therefore, human-computer interaction appears to be an important field of application, and gamification appears to be largely concerned with making user interfaces more accessible.
A final keyword of possible interest is "mobile app" (burst strength = 2.96), which signals the recent implementation of gamification applied to behavioral change within mobile phone apps (the burst ended in 2021, so it could still be in progress).
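As a rough illustration of how a log-likelihood ratio (LLR) test can single out a term that characterizes a cluster, the sketch below computes the G² statistic on a 2×2 contingency table (term present or absent, inside or outside the cluster). The function `llr` and all counts are invented for illustration; the exact labeling procedure applied by CiteSpace may differ.

```python
import math

def llr(k_in, n_in, k_out, n_out):
    """G^2 statistic for a term that occurs in k_in of n_in documents inside
    a cluster and in k_out of n_out documents outside it."""
    observed = [
        [k_in, n_in - k_in],
        [k_out, n_out - k_out],
    ]
    total = n_in + n_out
    row_sums = [n_in, n_out]
    col_sums = [k_in + k_out, total - k_in - k_out]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            # Expected count if the term were independent of the cluster.
            e = row_sums[i] * col_sums[j] / total
            if o > 0:  # a zero cell contributes nothing (0 * log 0 := 0)
                g2 += o * math.log(o / e)
    return 2.0 * g2

# Hypothetical counts: a cluster of 100 documents against 200 other documents.
specific_term = llr(30, 100, 5, 200)     # frequent inside, rare outside
ubiquitous_term = llr(90, 100, 180, 200)  # equally frequent everywhere

print(f"specific term:   {specific_term:.1f}")
print(f"ubiquitous term: {ubiquitous_term:.1f}")
```

A term that is about equally common inside and outside the cluster scores close to zero, however frequent it is, whereas a term concentrated in the cluster scores high. This is why cluster labels of this kind tend to surface domain terms such as those listed above rather than words shared by the whole corpus.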

How have research trends in gamification applied to behavior modification changed over time?
To answer this question, we rely on an overview of how keywords changed over time. Examining the keywords' burst strength (Table 6), we derived a timeline of research trends based on the years in which the bursts began. This analysis shows that trends have changed over time, following technological development. The first trends that appeared in the field of gamification applied to behavioral change are "behavioral research" (begin year = 2013), "human computer interaction" (2013), "technology" (2013), "sustainable development" (2013), "persuasive technology" (2013), "design" (2014), "human engineering" (2014), "user interface" (2014), and "behavior change support" (2015). This suggests that the first research trends were linked to an initial, general design stage. The trend then changed, showing interest first in video game design ("game design" and "virtual reality"), then in the practical application of gamified interventions ("behavior therapy", "social support", "information system", and "information use"). The latest trends are related to the technological development of mobile devices ("mobile app" and "mobile phone"). Interestingly, the burst end year of these two keywords is 2021, which means that the bursts may still be ongoing.
Overall, it seems that trends have changed considerably over time, with more and more resources directed toward app- and mobile-based gamified interventions.
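The timeline described above can be obtained by grouping keywords by the year in which their burst began. The minimal sketch below does this for the keywords whose begin years are stated in the text; later keywords are omitted because their begin years are not reported here, and the variable names are illustrative.

```python
from collections import defaultdict

# Burst begin years as stated in the text above.
burst_begin = {
    "behavioral research": 2013,
    "human computer interaction": 2013,
    "technology": 2013,
    "sustainable development": 2013,
    "persuasive technology": 2013,
    "design": 2014,
    "human engineering": 2014,
    "user interface": 2014,
    "behavior change support": 2015,
}

# Group keywords by the year their burst started.
timeline = defaultdict(list)
for keyword, year in burst_begin.items():
    timeline[year].append(keyword)

# Print the groups in chronological order.
for year in sorted(timeline):
    print(year, "|", ", ".join(sorted(timeline[year])))
```

Reading the groups in order makes the early, design-oriented phase (2013 to 2015) visible at a glance.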

Future challenges
So far, we have presented an overview of the literature on gamification applied to behavior change, how it is structured, its development over time, and some relevant issues. The existence of many theoretical papers, each suggesting how gamification should be applied, has meant that no single guideline has been adopted (for either development or application). Moreover, the lack of consistency in methodological rigor and measurements suggests the need for studies with corroborated experimental support. Interestingly, these issues seem to exist in other domains of gamification as well (Fitz-Walter, 2015; Koivisto & Hamari, 2019; Martí-Parreño et al., 2016; Morschheuser, Hamari, Werder, & Abe, 2017; Seaborn & Fels, 2015; Trinidad, Ruiz, & Calderón, 2021).
Therefore, in this section we describe a set of future challenges that we consider critical for the general domain of gamification, in order to: (1) make gamification applications more effective and goal-oriented, and (2) achieve stronger communication and collaboration between researchers and institutions.

Holistic model
A first step toward these goals could be the development or selection of a general model or guideline for designing gamification. Several gamification researchers agree that gamification design should follow a holistic and standard procedure (Fitz-Walter, 2015; Martí-Parreño et al., 2016; Morschheuser et al., 2017). There are plenty of models and frameworks guiding gamification design and development in different domains and with different goals (Choi & Choi, 2021; B. Huang & Hew, 2018; J. T. Kim & Lee, 2015; Morschheuser et al., 2017; Szegletes, Koles, & Forstner, 2015; Wells et al., 2014; Wongso, Rosmansyah, & Bandung, 2014). However, most of them are not detailed enough to provide sufficient practical guidance, and some of them lack a detailed description of the design phase (Morschheuser et al., 2017).

Personalization and adaptation
The presence of a guideline is not enough to make gamification applications more effective. Several authors point to the need to introduce personalization elements to increase gamification effectiveness (Bucchiarone et al., 2021; Hassan, Habiba, Khalid, Shoaib, & Arshad, 2019). Hence, standardized systems providing personalized and adaptive components for (1) feedback, (2) gamified elements, and (3) design could enhance the outcomes and effectiveness of gamification applications, and should be considered in the model development.

Methodological rigor
Unlike other disciplines, the gamification field contains many non-empirical papers, especially in the area of behavior change (Sardi et al., 2017). Although the number of studies presenting quantitative data to support their results is increasing (Koivisto & Hamari, 2019), these studies rarely employ controlled experimental research methods, which leads to inconsistency in measurement instruments and in the selection of sample sizes. In order to have adequate data supporting researchers' findings, in-depth analyses should be encouraged (D. Johnson et al., 2017; Koivisto & Hamari, 2019).

Open science
In order to make the research process, the data collection, and the analyses available to other researchers, it would be useful to adopt an open science process. The concept of transparency at all stages of the research process, matched with free and open access to data, code, and papers, creates what is called "open science". According to Woelfle, Olliaro, and Todd (2011), open science is an accelerator for the research process, creating an iterative cycle composed of: (1) problem identification, (2) preliminary solution, (3) open appeal to the wider community, and (4) receiving community input. Hence, the Open Science Framework (OSF) and registered reports could be useful tools to review the study methodology and the statistical analysis, and to promote cooperation between researchers (Chambers & Tzavella, 2020; Foster & Deardorff, 2017).

Conclusions
Gamification is experiencing continuous growth in disparate application contexts (e.g. education, training, health, and so forth), especially in those that promote positive behavior change (Adrián & Elena, 2019). Indeed, gaming, as a motivating and engaging activity, makes it easier to convince people to break their bad habits and change their behavior.
This scientometric study analysed research works on gamification to promote behavior change or positive behaviors, based on publications available in the Scopus database. We identified the most influential documents, authors, countries, and institutions, and we investigated how trends changed over time. Finally, we presented some future challenges to improve the quality of scientific production on gamification.
Overall, what emerges most is that the research community is growing continuously; nevertheless, it remains anchored to a small set of older and heterogeneous theoretical documents, which continue to be the starting point of new publications, so that newer documents are neglected. Moreover, due to the lack of a shared method or guideline, the field emerges as fragmented and dispersed; this leads to few international collaborations and to difficulties in communication between researchers and institutions.

Declaration of competing interest
None