Soil erosion modelling: A bibliometric analysis.

Soil erosion can present a major threat to agriculture due to loss of soil, nutrients, and organic carbon. Therefore, soil erosion modelling is one of the steps used to plan suitable soil protection measures and detect erosion hotspots. A bibliometric analysis of this topic can reveal research patterns and soil erosion modelling characteristics that can help identify steps needed to enhance the research conducted in this field. Therefore, a detailed bibliometric analysis, including investigation of collaboration networks and citation patterns, should be conducted. The updated version of the Global Applications of Soil Erosion Modelling Tracker (GASEMT) database contains information about citation characteristics and publication type. Here, we investigated the impact of the number of authors, the publication type and the selected journal on the number of citations. Generalized boosted regression tree (BRT) modelling was used to evaluate the most relevant variables related to soil erosion modelling. Additionally, bibliometric networks were analysed and visualized. This study revealed that the selection of the soil erosion model has the largest impact on the number of publication citations, followed by the modelling scale and the publication's CiteScore. Some of the other GASEMT database attributes, such as model calibration and validation, have negligible influence on the number of citations according to the BRT model, although studies that conducted calibration received, on average, around 30% more citations than studies where calibration was not performed. Moreover, the bibliographic coupling and citation networks show a clear continental pattern, although the co-authorship network does not show the same characteristics. Therefore, soil erosion modellers should conduct an even more comprehensive review of past studies and focus not just on the research conducted in the same country or continent.
Moreover, when evaluating soil erosion models, an additional focus should be given to field measurements, model calibration, performance assessment and uncertainty of modelling results. The results of this study indicate that these GASEMT database attributes had a smaller impact on the number of citations, according to the BRT model, than anticipated, which could suggest that these attributes should be given additional attention by the soil erosion modelling community. This study provides a kind of bibliographic benchmark for soil erosion modelling research papers, as modellers can estimate the influence of their paper.

Introduction
Systematic bibliometric analyses can be useful analytical tools to gain a better understanding of research patterns (e.g., journal, author, country) and characteristics of research fields (Wu et al., 2015). Recent applications have shown that such analyses can be used to recognize emerging topics (Small et al., 2014), study cooperation networking in research (Wagner et al., 2015) or gain in-depth insight into a research topic (Tang et al., 2020). Moreover, a joint search in the SCOPUS database for article titles, abstracts and keywords containing "bibliometric analysis" or "citation analysis" in January 2021 yielded over 40,000 documents, with a clear upward trend in the number of published items in recent years.
Literature analysis as a tool is gaining popularity among interdisciplinary academic fields such as earth sciences. For instance, Liu et al. (2012) performed a bibliometric study of earthquake research during 1900, Wu et al. (2015) performed a bibliometric analysis in order to study global research trends in landslides during 1991, and Emmer (2018) studied research on natural hazards worldwide during 1900-2017. Gariano and Guzzetti (2016) reviewed published papers that investigated the past, current, and future (expected, projected) impact of climate change on landslides. Moreover, Reichenbach et al. (2018) conducted a critical review of statistical methods for landslide susceptibility modelling and associated terrain zonation. They used a database of 565 articles published in peer-reviewed international journals from January 1983 to June 2016 and identified by a systematic search of the Web of Science database using a set of keywords and criteria as evidence. A recent bibliographic review of landslide susceptibility provides insights on the trends and journal performance in the field of geomorphology (Pourghasemi et al., 2018). These studies indicate that different disciplines within the earth sciences can gain knowledge about their own field from these kinds of analyses. Moreover, the analyses can also identify steps forward.
The research topics of soil degradation and erosion in the field of earth sciences are studied from many points of view and are highly relevant to a wide audience of researchers. They range from the climate change perspective (Lal, 2019) to sustainable agriculture production (Tarolli et al., 2019) to understanding sediment transport, water fluxes and extreme storm events at catchment scales (Keesstra et al., 2018; Lizaga et al., 2019) to investigating the impact of soil erosion on biogeochemical cycling (Quinton et al., 2010; Tan et al., 2020), or the modelling of soil erosion (Batista et al., 2019; Borrelli et al., 2018; Panagos and Katsoyiannis, 2019; Ricci et al., 2018). Moreover, there are other emerging topics such as the use and abuse of biocides on soil erosion, agricultural and forest management practices to reduce soil erosion rates or experimental studies at small scales. A literature review on research trends and hotspots in soil erosion from 1932 to 2013 was performed by Zhuang et al. (2015) using the Science Citation Index (SCI) database. According to this study, soil research has rapidly increased since 1990, with major contributions from the USA and Europe before 2001, and additionally from China and Australia since 2001. They also discovered through co-citation analysis that soil erosion research mainly focuses on three aspects: soil erosion modelling, soil erosion estimates using caesium-137 and the impact of soil erosion on the environment. Niu et al. (2014) used a keyword analysis to discover that "evolution", "water", "soil(s)", and "model" were consistent hotspots in sediment-related research in earth science during 1992-2011. To investigate how soil erosion model evaluation is approached in soil erosion research, Batista et al. (2019) compiled a database of 550 papers published between 1958 and 2018 that were selected by querying the Web of Science using the query "soil erosion model". However, Batista et al. (2019) did not conduct a detailed bibliometric investigation and focused on a much smaller number of papers than the GASEMT database (Borrelli et al., 2021) that was used in this study.
Therefore, to extend these studies, we performed a bibliometric analysis based on the enhanced version of the GASEMT database (Borrelli et al., 2021). The main goal of this paper was to investigate how soil erosion modelling study characteristics (i.e., study scale, mathematical model used, validation/calibration etc.) and related bibliometric characteristics (number of co-authors, country of affiliation, book chapter vs. journal paper, etc.) influence the impact of a given publication measured by the number of citations. Moreover, potential bibliometric networks (i.e., journals, countries) that are part of the constructed database were also analysed. Specifically, we evaluated the following questions: a) How are the number and geographic origin of the authors and the publication's CiteScore related to the number of citations? b) Which mathematical models are widely applied and used as a reference when cited in the literature, and how do the other modelling framework characteristics affect the impact of the publication as measured by the number of citations? c) How can a study of citation patterns and clusters help recognize interrelated countries and determine which countries and journals lead in publishing research results in the soil erosion modelling field?

Methods
Bibliometric analyses require extensive datasets that contain a sufficient number of records and cover a sufficiently long period. To gain a better understanding of the global application of soil erosion prediction models, a group of more than 60 soil erosion scientists from more than 20 countries all around the world comprehensively reviewed relevant peer-reviewed research literature on soil erosion prediction modelling in the 1994-2018 period (Borrelli et al., 2021). As a result, the 'Global Applications of Soil Erosion Modelling Tracker (GASEMT)' database was created (Borrelli et al., 2021). Additional information about the constructed database and results of the study can be found in Borrelli et al. (2021). The GASEMT database is available to users as part of the publication (Borrelli et al., 2021).

GASEMT database enhancement
In this study, the analysis of the GASEMT database was enhanced by investigating the relationship between soil erosion modelling and bibliometric characteristics. For this purpose, the number of citations from the Scopus database was added for the 1697 publication entries (3030 modelling records) that are included in the GASEMT database. The number of citations indicates the citation status in September 2019, when the data were downloaded. Additionally, the Scopus CiteScore 2018 was added to the database for all sources with a CiteScore in 2018; in the CiteScore definition, "citations" and "publications" mean the number of citations and citable items published in a specific year, respectively. The number of authors of each publication was also added to the database. Moreover, for each document type (i.e., journal, conference proceeding or book series), the main (i.e., listed first) sub-subject area from the Scopus database was extracted. This information was semi-automatically extracted from the Scopus database based on matching paper titles in GASEMT and Scopus. The GASEMT database includes studies published between 1994 and 2018. To account for the different number of years that have passed since the publishing date, the normalized number of citations was used, which was calculated for each publication as:

$$\text{Normalized citations} = \frac{\text{Total number of citations}}{\text{Number of years from the year when the study was published}} \qquad (2)$$

Therefore, we have added the following attributes to GASEMT: CiteScore 2018, total number of citations, number of authors, normalized citations, document type and the main Scopus sub-subject area. The enhanced GASEMT database including the bibliographic data is available in the European Soil Data Centre (ESDAC; Panagos et al., 2012).
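
The normalization in Eq. (2) can be illustrated with a minimal Python sketch. The reference year of 2019 is an assumption based on the stated download date (September 2019); the text does not say how the partial year was handled. The function name and the records are hypothetical and are not taken from GASEMT.

```python
def normalized_citations(total_citations, publication_year, reference_year=2019):
    """Citations per year since publication, following Eq. (2).

    reference_year=2019 is an assumption (citations were downloaded in 2019).
    """
    years = reference_year - publication_year
    if years <= 0:
        raise ValueError("publication year must precede the reference year")
    return total_citations / years


# Hypothetical records, for illustration only (not GASEMT values)
records = [
    {"title": "Study A", "year": 2009, "citations": 120},
    {"title": "Study B", "year": 2017, "citations": 30},
]
for record in records:
    record["normalized"] = normalized_citations(record["citations"], record["year"])
    print(record["title"], record["normalized"])
```

In this toy case the older study has four times as many total citations, yet the more recent one has the higher normalized value, which is the effect the normalization is meant to expose.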

Generalized boosted regression trees (BRT)
To investigate the impact of different soil erosion modelling characteristics on the number of citations received, the generalized boosted regression trees (BRT) model was used. BRT is a machine learning tool that is able to estimate the relative impact of different variables on the target variable. This model has been used successfully in different fields for activities such as calculating the relative impact of variables on evapotranspiration (Maček et al., 2018), investigating the impact of different meteorological variables on rainfall interception variables (Zabret et al., 2018) or predicting topsoil organic carbon (Veronesi and Schillaci, 2019). A detailed description of the method is provided by Elith et al. (2008) and Ridgeway (2019). The BRT modelling was conducted using the 'gbm' package (Greenwell et al., 2019) in the statistical software R (R Core Team, 2017). In our case, the target variable was the normalized number of citations, which was calculated using Eq. (2). The following variables were used as an input for the BRT model: number of authors, publication's CiteScore in 2018, publication type, Scopus sub-subject category, and, from the GASEMT database, erosion agent (e.g., water, wind, water and wind, etc.), name of the soil erosion model used, modelled period (e.g., present, past, future), time resolution (e.g., daily, monthly, annually), continent of model application, modelled area (e.g., forest, arable land), scale of the study (e.g., plot, hillslope, catchment), type of field soil sampling, model calibration and model validation.
For the BRT analysis, the following parameters were used: a) the minimum number of trees was 1,500, b) the minimum number of observations in the terminal target node was 10, c) the learning rate was set to 0.005, d) the number of cross-validation folds was 5, and e) the Gaussian distribution was used as a loss function. As a result, the BRT model calculated the relative impact of input variables. The relative impact was determined by considering the number of times that the variable was used for splitting trees and weighted by squared improvement of the model as a result of the splitting procedure that was averaged over all of the trees (Elith et al., 2008;Friedman et al., 2000).
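
How a relative impact of this kind arises can be illustrated with a minimal pure-Python sketch. It is not the 'gbm' implementation used in the study. As simplifying assumptions, it boosts one-split trees (stumps) with a squared-error loss, uses fewer trees and a larger learning rate than listed above, and runs on synthetic data. All function and variable names are hypothetical.

```python
import random


def best_split(X, residuals, min_leaf):
    """Find the single split with the largest reduction in squared error."""
    n = len(X)
    total = sum(residuals)
    best = None
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += residuals[order[k - 1]]
            if X[order[k]][j] == X[order[k - 1]][j]:
                continue  # no threshold separates equal values
            if k < min_leaf or n - k < min_leaf:
                continue  # respect the minimum number of observations per node
            right_sum = total - left_sum
            # squared-error improvement of the split over the unsplit node
            gain = left_sum ** 2 / k + right_sum ** 2 / (n - k) - total ** 2 / n
            if best is None or gain > best[0]:
                threshold = (X[order[k - 1]][j] + X[order[k]][j]) / 2
                best = (gain, j, threshold, left_sum / k, right_sum / (n - k))
    return best


def relative_influence(X, y, n_trees=300, learning_rate=0.05, min_leaf=10):
    """Boost stumps on residuals; sum each variable's improvements, scaled to 100."""
    prediction = [sum(y) / len(y)] * len(y)
    influence = [0.0] * len(X[0])
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, prediction)]
        split = best_split(X, residuals, min_leaf)
        if split is None:
            break
        gain, j, threshold, left_mean, right_mean = split
        influence[j] += gain
        for i, row in enumerate(X):
            step = left_mean if row[j] <= threshold else right_mean
            prediction[i] += learning_rate * step
    total = sum(influence)
    return [100 * value / total for value in influence] if total else influence


# Synthetic data: the target depends strongly on x0, weakly on x1, not on x2
random.seed(0)
X = [[random.random(), random.random(), random.random()] for _ in range(200)]
y = [10 * a + 2 * b + random.gauss(0, 0.5) for a, b, _ in X]
influence = relative_influence(X, y)
print([round(value, 1) for value in influence])
```

The variable that drives most of the variation in the target accumulates most of the summed improvement, which is the sense in which one input can have the "largest relative impact" on the normalized number of citations.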

Bibliometric networks
To analyse the bibliometric networks, the VOSviewer software was used (van Eck and Waltman, 2010; VOSviewer, 2019; Waltman et al., 2010). VOSviewer is freely available software that can be used for visualizing bibliometric networks that include journals, individual publications, authors' affiliations, etc. (VOSviewer, 2019). To visualize bibliometric networks, the part of the GASEMT database that also appears in the Clarivate Analytics Web of Science database was used (i.e., the overlap between Scopus and Web of Science was approx. 70%). Moreover, Schillaci et al. (2018) also found approximately 60% agreement between Scopus and the Web of Science as a result of the systematic search. The reason for selecting part of the GASEMT database was to take into consideration only more eminent publications, since Scopus also covers journals that are not indexed in Web of Science and other document types such as conference proceedings. The following analyses were conducted (VOSviewer, 2019): a) the citation, bibliographic coupling and co-citation analysis of sources (e.g., journals), b) the citation, co-authorship and bibliographic coupling analysis among countries, c) the citation and bibliographic coupling analysis of the most frequently used soil erosion models.
Co-authorship analysis investigates the relatedness of items based on the number of co-authored documents (VOSviewer, 2019). Moreover, citation analyses define the relatedness of items based on the number of times they cite each other (VOSviewer, 2019). Furthermore, bibliographic coupling expresses the relatedness of items based on the number of shared references (VOSviewer, 2019). Finally, co-citation analyses determine the relatedness of items based on the number of times they are cited together (VOSviewer, 2019). The difference between bibliographic coupling and co-citation is that the former links two items that both cite the same document while the latter links two items that are both cited by the same document (VOSviewer, 2019).
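
The difference between bibliographic coupling and co-citation can be made concrete with a small Python sketch. It is not VOSviewer code; the reference lists and function names are hypothetical and serve only to show which direction of citing each measure counts.

```python
from collections import Counter
from itertools import combinations

# Hypothetical reference lists: citing document -> set of cited documents
references = {
    "A": {"X", "Y", "Z"},
    "B": {"Y", "Z"},
    "C": {"Z"},
}


def bibliographic_coupling(references):
    """Link two citing documents by the number of references they share."""
    strength = Counter()
    for a, b in combinations(sorted(references), 2):
        shared = len(references[a] & references[b])
        if shared:
            strength[(a, b)] = shared
    return strength


def co_citation(references):
    """Link two cited documents by the number of documents citing both."""
    strength = Counter()
    for cited in references.values():
        for pair in combinations(sorted(cited), 2):
            strength[pair] += 1
    return strength


print(bibliographic_coupling(references))
print(co_citation(references))
```

Here documents A and B are coupled with strength 2 because both cite Y and Z, while Y and Z are co-cited with strength 2 because both A and B cite them together.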
Full counting was used and documents with a large number of authors were not ignored. Full counting means that each co-authorship, co-citation, etc. has the same weight (VOSviewer, 2019). To improve the readability of the network visualization, we used certain thresholds to remove less frequent entries (specific selected threshold values are mentioned in section 3). For example, in the case of co-authorship among countries, there were many countries with few entries that would worsen the readability of the network. For visualization, the network visualization style was used, where items are represented by a label and a circle. The size of the circle indicates the weight of an item (i.e., the larger the circle, the higher the weight and vice versa). Additionally, the colour of the item indicates the cluster to which the item belongs. A detailed description of the clustering techniques in VOSviewer is provided by Waltman et al. (2010). Lines represent links among items. We used a maximum of 1000 lines, which means that the 1000 strongest connections are shown (VOSviewer, 2019). Furthermore, the distance between items also shows their relatedness: the closer the items are together, the stronger their relatedness (VOSviewer, 2019).
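
Full counting and the use of a threshold can be sketched for the country co-authorship case as follows. This is a minimal illustration and not VOSviewer's implementation. The documents, the function name and the threshold on the number of documents per country are hypothetical, and the actual threshold values are those given in section 3.

```python
from collections import Counter
from itertools import combinations

# Hypothetical documents: countries of the authors' affiliations
documents = [
    ["Italy", "Belgium"],
    ["Italy", "Belgium", "China"],
    ["China", "USA"],
    ["Italy", "China"],
    ["Slovenia"],
]


def coauthorship_links(documents, min_documents=2):
    """Full counting: each co-authored document adds 1 to a country pair's link."""
    doc_count = Counter(country for doc in documents for country in set(doc))
    # Threshold: keep only countries with enough documents (readability)
    kept = {country for country, n in doc_count.items() if n >= min_documents}
    links = Counter()
    for doc in documents:
        for pair in combinations(sorted(set(doc) & kept), 2):
            links[pair] += 1
    return links


links = coauthorship_links(documents)
strongest = links.most_common(1000)  # cap on displayed lines, as in the text
print(strongest)
```

With the hypothetical threshold of two documents, the USA and Slovenia drop out of the network, so the China-USA link is not drawn even though the co-authored document exists.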

Results and discussion
Using the enhanced GASEMT database and the methodology described in section 2, the impact of different variables on the total and the normalized number of citations was investigated. In section 3.1, the differences among different document types, the Scopus sub-subject categories, and the relationship between the number of authors and the publication's (i.e., source) CiteScore are discussed. In section 3.2, the relative impact of the different variables on the normalized number of citations is estimated using the BRT model. In section 3.3, a detailed evaluation of the most cited papers is performed, and in the last section various characteristics of the bibliometric networks are visualized and discussed (section 3.4).

Impact of publication type, journal selection and number of authors
It is evident that most (i.e., 89%) of the soil erosion modelling papers that are included in the Scopus database were published in peer-reviewed journals (Table 1). Moreover, journal publications also receive, on average, a considerably larger number of citations than book series and conference proceedings (Table 1). Accordingly, the average normalized number of citations for journal publications, book series and conference proceedings is 2.78, 0.42 and 0.22, respectively. The mean number of citations of the journal articles was 5.4-fold that of book series and 11.2-fold that of conference proceedings; the variation was similar but slightly more pronounced for the normalized citations (6.6-fold and 12.6-fold, respectively). A similar relationship was also observed by other researchers. For example, a difference between the citation rates of papers published in journals and in books or conference proceedings was also observed by Mikoš (2018), who studied 3426 book chapters from 52 landslide-related books published by Springer Nature from 2005 to 2018 in the earth sciences category; he also observed that articles in conference proceedings were not cited as often as journal articles. The reported average number of citations in these 52 books was 0.86 citations per year and chapter.
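
The fold differences for the normalized citations follow directly from the three means quoted above, as a short Python check shows. Only the normalized means are used because the mean total citations per document type are given in Table 1 and not in the text; the variable names are hypothetical.

```python
# Mean normalized citations per document type, as reported in the text
mean_normalized = {
    "journal": 2.78,
    "book series": 0.42,
    "conference proceedings": 0.22,
}

# Ratio of the journal mean to the mean of each other document type
fold = {
    doc_type: round(mean_normalized["journal"] / value, 1)
    for doc_type, value in mean_normalized.items()
    if doc_type != "journal"
}
print(fold)
```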
There are 23 journals that have more than 10 papers. Regarding the journals receiving the highest number of normalized citations, we found Science of the Total Environment, followed by Geomorphology, Journal of Hydrology, Land Degradation & Development, Environmental Modelling & Software and CATENA. Furthermore, Scopus also relates journals to primary Scopus sub-subject categories. Fig. 1 shows Scopus sub-subject categories where more than 50 publications (per category) were found in the database. There are ten categories, and most of the papers were published in the "Water Science and Technology" (e.g., Journal of Hydrology, Hydrological Processes) and "Earth-Surface Processes" categories (e.g., CATENA, Geomorphology). These two categories had approximately 200 publications each, the "Geography, Planning and Development" category had approximately 130 publications and the remaining categories had from 50 to 90 papers each. Fig. 1 also shows the relationship between the mean number of normalized citations per publication and the mean CiteScore in 2018 of the category, where the mean was calculated considering CiteScores for all journals in a specific category. It is interesting to note that there is no clear relationship between the average category CiteScore and the mean normalized citations in the field of soil erosion modelling. It seems that a soil erosion modelling paper published in a sub-subject category that is not a primary focus of the researchers publishing in this field receives, on average, fewer citations (e.g., "General Environmental Science" sub-subject category). Thus, other topics in this category seem to be more influential than soil erosion modelling.
Additionally, articles that were published in journals such as SOIL that are included in the "Soil Science" category have, on average, fewer citations than articles included in the "Water Science and Technology" and "Earth-Surface Processes" categories. This observation is partly due to the higher visibility of a published paper in a more focused journal than in a general one. Researchers that publish their papers in the SOIL journal focus on other aspects of soil erosion and not purely on modelling. Thus, such papers are not included in GASEMT, since the focus of the database is on modelling (Borrelli et al., 2021) (i.e., this journal has fewer than 10 entries in the database). It is also true that the average CiteScores for these categories are relatively similar and range between 1.2 and 1.8. Similarly, Mikoš (2017) performed a comparison between the top 20 journals in 2016 from the SCI-expanded category "Engineering, geological" and their ranking in the CiteScore metrics in the category "Geotechnical Engineering and Engineering Geology". Using the Web of Knowledge tool Essential Science Indicators, the annualized expected citation rates for papers in three selected research fields for all years (average) were as follows: for Engineering 6.82 citations/paper, for Geosciences 11.34 citations/paper, and for Multidisciplinary 13.29 citations/paper. Therefore, other scientometric studies have also shown that differences among scientific disciplines exist.
Furthermore, the relationship between the number of citations and the CiteScore of the publication name (i.e., source) was also analysed (Fig. 2). As expected, papers that are published in journals with higher CiteScore metrics also have, on average, more citations (Fig. 2). However, this dependence is rather weak (R² between the normalized number of citations and the source's CiteScore is 0.2, where a value of 1 would indicate a perfect linear dependence between these two variables), yet statistically significant at the selected significance level of 0.05 (p-value < 0.0001). Papers with a very high number of normalized citations such as Panagos et al. (2015) (i.e., the highest normalized number of citations) or Cerdan et al. (2010) were published in journals with CiteScore values in the range of 3-6, while others appeared in journals with high CiteScore values > 6 (Borrelli et al., 2017; Quinton et al., 2010; Van Oost et al., 2007). Likewise, papers published in journals with very low impact (i.e., CiteScore below 1.5) did not receive more than five citations per year. Furthermore, there is no paper with more than 60 normalized citations per year in the analysed GASEMT database (Fig. 2). Articles published in journals with a CiteScore between 1.5 and 3 can have either a low number or a medium number of normalized citations (Fig. 2). Therefore, we agree with Seglen (1998) that scientific papers receive their citations largely independently of the journals in which they appear, i.e., the journal impact is determined by the articles, not vice versa. However, we found only 2 articles having > 20 normalized citations in the CiteScore range 1.5-3. Nevertheless, we cannot exclude that the soil erosion modelling scientific community may have a prejudice against considering articles from journals with low CiteScores. For the majority of scientific disciplines, the citability of publications increases with the number of co-authors (e.g., Abramo and D'Angelo, 2015).
Therefore, the relationship between the number of publication authors and the normalized number of citations according to the Scopus database was investigated (Fig. 3). In general, studies on soil erosion modelling are typically conducted in groups of 2-6 co-authors (Fig. 3). Moreover, only a few papers were co-authored by more than ten researchers (Fig. 3). It seems that in the soil erosion modelling field, a large number of authors does not necessarily guarantee a large number of citations, and no clear relationship between the number of authors and citations per year could be found (Fig. 3). More specifically, when only studies with 1-12 co-authors are taken into account (there are only a few studies with more than 12 co-authors), the mean normalized number of citations per number of authors gradually increases from one to eight co-authors and then decreases again. The maximum number of mean normalized citations was found in publications with 8 co-authors (i.e., on average such publications received 4.6 citations per year), while much smaller values can be seen for publications with one or two authors: 1.4 and 1.9, respectively. Furthermore, the 7 highly cited publications with more than 30 citations per year had between 2 and 19 authors (i.e., Borrelli et al., 2017; Cerdan et al., 2010; Fu et al., 2011; Panagos et al., 2015; Quinton et al., 2010; Syvitski and Milliman, 2007; Van Oost et al., 2007). All other publications included in the database received fewer than 20 normalized citations per year (Fig. 3). Moreover, for the 30 most cited studies in GASEMT the number of authors ranges from 2 to 19 with a mean of 5.7. All single-authored articles have fewer than ten normalized citations per year (Fig. 3).
In addition, 8.5% of the papers included in the GASEMT database (Borrelli et al., 2021) have not yet received any citation. This value is close to the value reported by Van Noorden (2017), who showed that approximately 10% of all published papers are uncited. Moreover, Ioannidis et al. (2019) and Van Noorden and Singh Chawla (2019) pointed out that the median self-citation rate in their global database was approximately 12.7%. According to the GASEMT and the Web of Science (WoS) databases, 12% of the citations were attributed to self-citations, which corresponds well to this median self-citation rate. Therefore, both the share of uncited papers and the share of self-citations of soil erosion modelling studies are close to the overall statistics of all papers published in the WoS.
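
The coefficient of determination reported earlier in this section for the relationship between the normalized number of citations and the source's CiteScore (R² = 0.2) measures how much of the variation is captured by a straight line. The following minimal Python sketch shows the calculation for a simple linear regression. The data points and the function name are hypothetical and are not taken from GASEMT.

```python
def r_squared(x, y):
    """R² of a simple linear regression (squared Pearson correlation)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Hypothetical values, for illustration only
cite_score = [1, 2, 3, 4, 5]
normalized = [1, 3, 2, 5, 4]
print(r_squared(cite_score, normalized))

# A perfect linear dependence gives a value of 1
print(r_squared([1, 2, 3], [2, 4, 6]))
```

A value of 0.2 therefore means that most of the variation in normalized citations is not explained by the CiteScore of the source, even when the relationship is statistically significant.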

Confounding factors for the number of citations in soil erosion modelling
The impact of different variables included in the enhanced GASEMT database (Borrelli et al., 2021) on the normalized number of citations was also studied. For this purpose, the boosted regression trees (BRT) model was applied. The variables that were included in the BRT model are listed in section 2. Quite surprisingly, the soil erosion model selection clearly has the largest relative impact on the normalized number of citations. Model selection is followed by the soil erosion modelling scale, publication's CiteScore, Scopus sub-subject category, continent and number of authors (Table 2). Other considered variables have, according to the results of the BRT model, no significant impact on the normalized number of citations (Table 2). The sum of the relative impact of the variables soil erosion model used, modelling scale, and publication's CiteScore explained 86.9% of the total variable importance. The following sub-sections discuss the impact of these variables. The impact of the publication's CiteScore, Scopus sub-subject category and number of authors was already discussed in section 3.1.

Soil erosion model
It is evident that the studies with the largest maximum number of citations include RUSLE, WaTEM/SEDEM and USLE applications. However, [...] (Bakker et al., 2008; Feng et al., 2010; Quinton et al., 2010; Van Oost et al., 2000; Van Rompaey et al., 2005); however, all of them before 2010. The WaTEM/SEDEM is followed by the STREAM (e.g., Simonneaux et al., 2015), RHEM (e.g., Nearing et al., 2011), RUSLE2 (Sahoo et al., 2016), EROSION 3D (e.g., Routschek et al., 2014), EPIC (e.g., Gao et al., 2017), and PESERA (e.g., Kirkby et al., 2008) models. STREAM, RHEM, RUSLE2, EROSION 3D, EPIC and PESERA are used by fewer than 1.5% of studies/catchments included in the database. If one compares the USLE and RUSLE models, the (R)evised USLE model receives, on average, 0.8 more normalized citations per year than the original version. It should also be noted that the SWAT model is relatively widely used (i.e., in approximately 6% of papers in the database) and on average articles using this model receive more citations than those using the RUSLE and USLE models (Table 3). Moreover, Borrelli et al. (2021) showed that SWAT has become more popular among soil erosion modellers in recent years. Papers with the highest number of annual normalized citations (i.e., > 13) using the SWAT model are Betrie et al. (2011), Gessesse et al. (2015) and Yesuf et al. (2015).
In absolute numbers, it can also be seen that the RUSLE model has the largest number of total citations (i.e., multiplying normalized citations and percent of database entries), followed by the WaTEM/SEDEM, USLE, SWAT and WEPP models. Moreover, the maximum number of normalized citations for the RUSLE and USLE models is also high. Therefore, many studies apply these models, but in many cases these studies are not very well cited. Therefore, the mean normalized number of citations is lower as in case of some other models. Additionally, Borrelli et al. (2021) concluded that the number of RUSLE model applications is increasing. The same also applies for the total number of studies in the GASEMT database as the number of soil erosion model studies in the post-2010 is increasing. It should also be noted that some of the highest erosion rated were predicted by the RUSLE and USLE models (Borrelli et al., 2021). Additionally, the median erosion rates predicted by these two models are also larger than for some other model (e.g., WaTEM/SEDEM) (Borrelli et al., 2021). Moreover, we investigated if the higher average number of citations depends on the self-citations of authors that are using specific models. The comparison was performed for the WaTEM/SEDEM, SWAT, RUSLE, USLE and WEPP models. However, self-citation in the case of specific models was similar where the maximum value was characteristic of the RUSLE model with approximately 8%. Other models had a self-citation rate of approximately 5%. Moreover, there are also some differences among the Scopus sub-subject categories and the most frequently used models. For example, the most frequently used models in the Water Science and Technology category are RUSLE and USLE, whereas the WaTEM/SEDEM model is only used in a small number of studies included in this category. A similar pattern can be seen for the Forestry, Geography, Planning and Development and General Earth and Planetary Sciences categories. 
On the other hand, in the Earth-Surface Processes category the RUSLE and USLE models are used less frequently. Additionally, some differences also exist among publication types. For example, the WaTEM/SEDEM model is only included in journal publications. Moreover, it seems that the USLE model is used in almost half of the publications that are published as book series and in approximately 40% of conference proceedings publications, whereas in the case of journals the USLE is used by 27% of publications. A similar pattern can also be seen for the RUSLE model. Therefore, one could argue that since USLE and RUSLE only account for the gross soil erosion rates, these types of models are more frequently published in book series and conference proceedings, and therefore have a smaller outreach. On the other hand, models that also account for sediment deposition and transport, such as the WaTEM/SEDEM model, could have a larger outreach since more processes are incorporated within the model.
A comparison of models used for soil erosion assessment in the Chinese Loess Plateau (Li et al., 2017) that used eleven empirical and process-based models showed that even for regional studies many different models are applied. Batista et al. (2019) investigated soil erosion models from the performance perspective and found out that different models do not systematically outperform each other. Validation or uncertainty evaluation is in many cases as important as the choice of a soil erosion model. Therefore, differences in the mean number of citations shown in Table 3 cannot be explained with better model performance of a specific model.
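The ranking by total citations mentioned above (mean normalized citations multiplied by the percentage of database entries) can be illustrated with a minimal sketch. The values are those quoted in the Conclusions for four models; SWAT is omitted because its figures are not quoted in the text, and the function and variable names are illustrative assumptions rather than part of the original analysis.

```python
# Mean normalized citations per year and share of GASEMT entries (%),
# as quoted in the Conclusions. SWAT is left out because its values
# are not given in the text.
models = {
    "WaTEM/SEDEM": (8.9, 4.7),
    "RUSLE": (3.1, 17.1),
    "USLE": (2.3, 13.9),
    "WEPP": (2.8, 6.4),
}


def citation_weight(mean_citations, share_percent):
    """Proxy for total citations: mean normalized citations x share of entries."""
    return mean_citations * share_percent


# Sort model names from the largest to the smallest proxy value.
ranking = sorted(models, key=lambda m: citation_weight(*models[m]), reverse=True)

for name in ranking:
    print(f"{name}: {citation_weight(*models[name]):.1f}")
```

With these inputs the order is RUSLE, WaTEM/SEDEM, USLE and WEPP, which agrees with the order reported in the text: a widely used model can collect the most citations overall even when its mean per paper is comparatively low.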

Scale and continent impact
According to the BRT model, the scale of the study and the investigated continent have an impact on the normalized number of citations. As one could expect, global studies, on average, receive many more citations than studies that deal with a specific local catchment or even perform soil erosion modelling on a regional scale (Tables 4 and 5), for example Yang et al. (2003). As pointed out by Borrelli et al. (2021), global scale studies were published both in the mid-nineties (e.g., Batjes, 1996) and in recent years. Moreover, examples of highly cited [...] (2015), Cerdan et al. (2010), and Panagos et al. (2015). Furthermore, performing modelling on a global or continental scale does not guarantee a high number of citations since there are also studies with a relatively low normalized number of citations (e.g., Batjes, 1996; Borrelli et al., 2015). When comparing the mean normalized number of citations for different continents, it is evident that studies that focused on Europe, on average, receive more citations than studies that focused on catchments/areas located in other continents, even though most studies are conducted in Asia (Table 4). The co-citation investigation results are presented in section 3.4 and based on these one could also assume that the higher average values shown in Table 4 are the result of co-citations. Moreover, Borrelli et al. (2021) also showed that higher erosion rates are generally characteristic of articles focused on Africa, Asia or even South America, where areas with very high soil erosion rates can be found, although some extremely high erosion rates can also be found in Europe (Borrelli et al., 2021).

Table 3. Mean and maximum normalized number of citations where different soil erosion models were used. Only models that were used in more than 15 publications are shown. Models are sorted based on the percentage of entries in the database.
Therefore, the calculated erosion rate obviously does not have a direct impact on the normalized number of citations. It should also be noted that the GASEMT database (Borrelli et al., 2021) only included publications that were written in English. Thus, the actual number of soil erosion studies focusing on Asia is probably even higher (i.e. publications written in Chinese language). Quite interestingly, studies that focused on a regional and national scale do not, on average, receive more citations than studies that focused on a specific watershed or even those with a plot or hillslope scale (Table 5). It should be noted that the percentage of database records could impact the mean normalized number of citations in cases when these percentages are low.

Other variables with negligible impact according to the BRT model
Several other variables were also used as an input to the BRT model, but according to the model results, these variables do not have an impact on the normalized number of citations (Table 2). It is evident that papers focused on tillage and harvest erosion, on average, have slightly more citations than studies focused on water or wind erosion (Table 6). Multiple examples of highly cited papers focused on these two erosion agents can be found (De Alba et al., 2004;Quinton et al., 2010;Verstraeten et al., 2002). However, it is also true that tillage and harvest erosion are only investigated in less than 2% of the publications included in the database, and this limits the information that is needed to establish efficient management factors in agriculture, that preserve soil, yield and profit at the same time. As pointed out by Borrelli et al. (2021) these low percentages could be a result of the Scopus search criteria used.
On average, papers receive more citations if they address both the future and present or the present and past than papers that only address the present or the future (Table 7). Therefore, it seems that discussing two time periods yields, on average, more citations than investigating only one period. Moreover, one can also notice that the present, future and past all yield a relatively similar normalized number of citations (Table 7). This is a relatively surprising result because "climate change" and "future projections" are hot scientific topics. For example, a Web of Science search for the topic "climate change" shows that the number of papers that mention this topic is increasing markedly (i.e., 241 in 1990, 2655 in 2000, 11,630 in 2010 and 33,814 in 2018). A similar trend can also be found with the search "future projections" in the Web of Science. However, in the field of soil erosion modelling, focusing on future projections does not yield, on average, more citations than focusing on the past or present (Table 7).
Surprisingly, additional field activity or soil sampling does not have a significant impact on the mean number of normalized citations of publications included in the database according to the BRT model (Fig. 4), although the mean normalized number of citations is about 15% higher when soil sampling activities were conducted. As discussed by Borrelli et al. (2021), in-situ soil erosion measurements are the most common field activity related to modelling. Moreover, publications where the soil erosion model was calibrated receive, on average, 0.8 more normalized citations per year than publications with no model calibration. This corresponds to almost 30% more citations for studies that use calibration methods than for those that do not include calibration. It should be noted that only one third of GASEMT entries reported model calibration (Borrelli et al., 2021), which can be regarded as a relatively low number. Although recent studies have argued that model calibration seems to be the main method for model improvement in the soil erosion modelling field (Batista et al., 2019), the soil erosion modelling community should in future give more attention to model calibration, evaluation and uncertainty assessment.
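The two calibration figures quoted above (0.8 more normalized citations per year, and almost 30% more citations) can be cross-checked with a short calculation. The sketch below assumes that both figures describe the same pair of group means; the implied means are back-calculated for illustration only and are not values reported in the study.

```python
def implied_means(absolute_gain, relative_gain):
    """Back-calculate two group means from an absolute and a relative gain.

    If calibrated = baseline + absolute_gain and
    calibrated = baseline * (1 + relative_gain), then
    baseline = absolute_gain / relative_gain.
    """
    baseline = absolute_gain / relative_gain
    return baseline, baseline + absolute_gain


# 0.8 more normalized citations per year, taken here as a 30% increase.
# The text says "almost 30%", so the actual baseline would be slightly higher.
no_calibration, with_calibration = implied_means(0.8, 0.30)
print(round(no_calibration, 1), round(with_calibration, 1))
```

Under this assumption, the uncalibrated studies would average somewhat under 3 normalized citations per year, which is of the same order as the means quoted elsewhere in the paper for the most common models.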
Different types of studies can have diverse ways of model calibration (e.g., sediment flux data at the system outlet, remote sensing data). Moreover, Borrelli et al. (2021) emphasized that model calibration is most frequently performed with LISEM, SWAT, WaTEM/SEDEM and MMF (e.g., Bezak et al., 2015). These are also models that account for sediment delivery and not only gross erosion rates. SWAT, WaTEM/SEDEM and MMF are also models that, on average, receive more citations than the more frequently used USLE or RUSLE models (Table 3). However, as already discussed, USLE and RUSLE studies can also be highly cited (Table 3), but the average values are lower because of the low citation rate of book series and conference proceedings where those two models are used quite frequently. Additionally, Batista et al. (2019) pointed out that the focus on model validation should be replaced with uncertainty assessment or model evaluation, since no model can be completely valid because all models are only simplified representations of environmental processes. This of course also applies to all other environmental models (e.g., Beven and Young, 2013). However, since model validation is terminology still used in the field, it was included in the global review and statistical analysis performed by Borrelli et al. (2021). One could argue that model validation does not have a significant impact on the mean number of normalized citations (Fig. 4). Moreover, validation is performed in about half of the studies included in the GASEMT database (Borrelli et al., 2021). Plot-scale studies show a higher level of validation (evaluation) and calibration (Borrelli et al., 2021). Therefore, the focus should always be on model validation and in-depth discussion of the results, as the incorrect use of model parameters can also lead to incorrect conclusions.
However, it should be noted that the absolute number of studies that report model calibration, validation or evaluation is increasing, while the proportion of these studies in the GASEMT database has been decreasing (Borrelli et al., 2021). Two additional variables, temporal model resolution and modelled area, were used as an input to the BRT model. Table 8 shows a comparison between the mean normalized number of citations for the different model temporal resolutions. Regarding the model temporal scale, it is evident that papers that use a daily time step, on average, receive more normalized citations than publications where the model is applied on an annual or monthly time scale (Table 8). These differences can be related to the results shown in Table 3 because, for example, the SWAT model can only be used with a daily time step, and the RUSLE and USLE models should be used for annual resolution. The same applies to the WaTEM/SEDEM model, which can only estimate long-term average soil erosion rates (Borrelli et al., 2021). As pointed out by Govers (2011), care should be taken when performing soil erosion modelling because, for example, the USLE model was developed for long-term annual soil loss assessments and not for short time period calculations. Gessesse et al. (2015) is an example of a study that used a daily time step model and has a large number of citations.

Most cited papers
The 20 most cited papers (i.e. top 1%) included in the database were analysed in more detail (Bakker et al., 2008; Benavides-Solorio and MacDonald, 2001; Betrie et al., 2011; Borrelli et al., 2017; Cerdan et al., 2010; Fu et al., 2011; Ganasri and Ramesh, 2016; Gessesse et al., 2015; Haregeweyn et al., 2017; Leh et al., 2013; Panagos et al., 2015; Parras-Alcántara et al., 2016; Prasannakumar et al., 2012; Quinton et al., 2010; Syvitski and Milliman, 2007; Van Oost et al., 2000; Van Rompaey et al., 2005; Viglizzo et al., 2011; Yang et al., 2003). The threshold for the top 1% of cited papers in soil erosion modelling is 14 normalized citations. The most cited papers were selected based on the normalized number of citations. These papers were published in an almost 20-year time window. The number of authors ranges from 2 to 19 with an average of 6.4. Moreover, these papers were published in 17 different journals, which indicates that none of the journals has a dominant impact in the publishing of the most cited papers. If one investigates the affiliations (countries) of the authors of these 20 most cited papers, it is evident that they are mostly from Europe or the United States. This presence of EU countries (e.g., Italy, Spain, Belgium, United Kingdom, Netherlands, Germany, etc.) could partly explain the higher normalized citations of publications that investigated EU areas (Table 4), as EU authors focus more on EU catchments/areas than on other places. It should be noted that some of the 20 most cited papers focus on global scale modelling, which means that authors from the EU took more initiatives to address this issue at the global scale. Additional networking analysis is shown in section 3.4.
Moreover, an investigation of the most frequently used words in the titles of the 20 most cited papers about soil erosion modelling revealed the words "land", "soil", "erosion", and "model", which could be expected since the focus is on soil erosion modelling. However, words such as "change", "impact", "risk" and "assessment" indicate that the most cited papers focus either on change/variability assessment or on risk or impact evaluation. Furthermore, only one of the 20 most cited papers investigated a combination of present and future, while the other papers mostly focused on the present or on the present and past.
We also investigated whether any of the papers mentioned above were defined as either a highly cited paper or a hot paper according to the Essential Science Indicators by Clarivate Analytics. Hot papers, by definition, are papers published in the past two years that received enough citations in May/June 2019 to put them in the top 0.1% of papers in each of the 22 academic fields. Highly cited papers, on the other hand, received enough citations as of May/June 2019 to be in the top 1% of the specific academic field based on the field threshold and publication year. It should also be noted that there are some differences in these thresholds for different fields (Mikoš, 2017). Borrelli et al. (2017) is defined as a hot paper by the above definition. Moreover, there are five highly cited papers included in the list of the 20 most cited papers in the soil erosion GASEMT database (Borrelli et al., 2021). These are Borrelli et al. (2017), Panagos et al. (2015), Cerdan et al. (2010), Fu et al. (2011) and Quinton et al. (2010) in the Environment/Ecology, Environment/Ecology, Geosciences, Environment/Ecology and Geosciences fields, respectively. Moreover, Wang et al. (2012) is a highly cited paper in the field of Agricultural Sciences and, according to the soil erosion modelling database, is among the top 30 most cited papers based on the normalized number of citations. This indicates that papers focusing on soil erosion modelling are among the most highly cited and top papers in these fields, which shows the relevance of this topic for the wider scientific community (e.g. agriculture, ecology, geosciences).

Investigation of the relationship among papers about soil erosion modelling (VOSviewer)
Additionally, bibliometric networks using the methodology described in section 2.3 were analysed. The next two sub-sections present bibliometric networks from the perspective of journals and countries. As mentioned in section 2.3, only part of the database that is included in the Web of Science database was used as an input for the VOSviewer software.

Fig. 4. Mean number of normalized citations per publication based on the field activity, soil sampling activity, calibration attempt and validation attempt. Numbers written at the top of bars indicate the percentage of entries in the database. "Yes" means that specific step was done, "No" means that this step was not carried out and "Unknown" means that it was not possible to determine if the step was done or not based on the information provided in the article.

Journals
A citation analysis of the journals that are included in the soil erosion modelling GASEMT database indicates the relatedness of journals based on the number of times that they cite each other (Fig. 5; VOSviewer, 2019). Six different clusters have been identified (i.e., indicated by different colours in Fig. 5). Quite surprisingly, CATENA, where most of the papers included in the database are published, is on the edge of the central soil erosion cluster and lies more towards Climatic Change and Agricultural and Forest Meteorology. This observation confirms the assumptions made in section 3.3 that soil erosion modelling papers are also cited in other fields, since these two journals are not among the 23 journals mentioned in section 3.1. At the same time, this also means that articles published in CATENA often cite papers published by these two journals. However, CATENA has the strongest connection (i.e., line width) with the Journal of Hydrology, Hydrological Processes and Geomorphology. Moreover, CATENA receives, on average, fewer citations than some other journals such as Science of the Total Environment or Geomorphology (section 3.1). Furthermore, it is also evident that journals are not clustered in the same way as they are categorised based on the Scopus sub-subject categories. For example, Hydrological Processes is clustered together with Land Degradation & Development, Landscape Ecology and Soil Science Society of America Journal and not, for example, with the Journal of Hydrology or Hydrological Sciences Journal. A similar conclusion can be made for some other clusters/journals. Additionally, it is evident that journals whose title starts with the word "environment" are clustered together (i.e., dark blue cluster group in Fig. 5).
A co-citation investigation, which reveals the relatedness of journals based on the number of times that journals are cited together (VOSviewer, 2019), identifies three different clusters (Fig. 6). Stronger connections exist between CATENA and Journal of Hydrology, Hydrological Processes, Earth Surface Processes and Landforms, Geomorphology and surprisingly also with Journal of Soil and Water Conservation (Fig. 6). The latter journal has relatively strong connections with Journal of Hydrology and Transactions of the ASABE.
The bibliographic coupling of journals with more than 250 citations, which shows the relatedness of journals based on the number of shared references (VOSviewer, 2019), reveals four different clusters (Fig. 7). For example, CATENA has strong connections with Geomorphology, Hydrological Processes, Journal of Hydrology and Environmental Earth Sciences (Fig. 7). Otherwise, some of the identified connections are similar to those shown in Fig. 6.
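The three relatedness measures used in this section (direct citation, co-citation and bibliographic coupling) can be illustrated on a toy citation matrix. The sketch below uses invented items and links and only counts raw link strengths; it is not the VOSviewer computation, which additionally normalizes link strengths and clusters the items.

```python
# Toy data (hypothetical): cites[i][j] = 1 if item i cites item j.
items = ["A", "B", "C", "D"]
cites = [
    [0, 0, 1, 1],  # A cites C and D
    [0, 0, 1, 1],  # B cites C and D
    [0, 0, 0, 1],  # C cites D
    [0, 0, 0, 0],  # D cites nothing
]
n = len(items)


def bibliographic_coupling(i, j):
    """Number of references shared by items i and j."""
    return sum(cites[i][k] * cites[j][k] for k in range(n))


def co_citation(i, j):
    """Number of items that cite both i and j."""
    return sum(cites[k][i] * cites[k][j] for k in range(n))


def direct_citation(i, j):
    """Number of citation links between i and j, in either direction."""
    return cites[i][j] + cites[j][i]


a, b, c, d = range(n)
print(bibliographic_coupling(a, b))  # A and B share two references (C and D)
print(co_citation(c, d))             # C and D are cited together by A and B
print(direct_citation(c, d))         # C cites D once
```

In this example A and B never cite each other, yet they are strongly coupled because their reference lists overlap, which is the kind of relationship the bibliographic coupling maps are meant to expose.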

Countries
Bibliographic coupling of countries with more than 12 documents in the database was investigated for their relatedness of shared references (Fig. 8). It is evident that three clusters have been identified, whereas one of the clusters only includes two members (i.e., Japan and Ethiopia). Quite interestingly, all European countries, except Turkey, which is partly in Europe and partly in Asia, are clustered together. This means that authors from Europe usually cite similar references, and these are at least to some extent different than the ones that authors from other countries are citing. Moreover, some regional European patterns can also be seen (i.e., position of the countries in the plot). For example, Italy and Greece or Belgium and the Netherlands are located close together. Moreover, the connection of the USA with China is stronger than the connection with European countries. Bibliographic coupling of organizations was also tested and three major clusters appear; first, there is a cluster with European organizations (mostly from Belgium and Netherlands), second, there is one with mainly Chinese organizations and third, there is one with mainly organizations in the USA. Therefore, it seems that reference lists in the field of soil erosion modelling are very regionally focused.
Additionally, citation analysis from the country perspective, which shows the relatedness of papers based on the number of times that they cite each other, identifies two clusters (Fig. 9). It is evident that one cluster includes all European countries (except Turkey) and the other cluster contains all other countries with more than 12 documents. Therefore, the pattern is very similar to the one shown in Fig. 8, which indicates that not only do European authors use similar references, but these papers are also cited by each other. Therefore, this kind of pattern could partly explain the results shown in Table 4, which show that papers focused on European areas/catchments, on average, receive more citations. It seems that papers focused on other continents often also cite papers from different continents, whereas for Europe this is more regionally based.
The co-authorship of papers from the country perspective indicates a relatively strong connection between the USA and China (Fig. 10). Moreover, four clusters are identified, one of which is composed only of European countries. However, France, Germany and the Netherlands are located in a different cluster than most of the European countries. Therefore, co-authorship of documents is slightly more international, but still, as one would expect, some strong regional connections can be detected. A similar investigation was also performed from the organizations' point of view, and in this case, different organizations were more regionally clustered (e.g., Belgian and Dutch organizations together, Chinese organizations together, etc.). Such results likely depend on the research funding in each nation, language of origin, similarity among environments and ability of the researchers to access given research funds.

Models
The citation and bibliographic coupling networks of the 12 most frequently used soil erosion models were also investigated (Figs. 11 and 12). It is evident that USLE, RUSLE, USLE-SDR and RUSLE-SDR are clustered into one group. This means that publications that discuss or apply these models often cite similar literature, and often cite each other, and this might be related to an inability of the authors to link their results to the newer models. This is an expected result since these interrelated models have the same theoretical background and were all developed based on the USLE model. The WaTEM/SEDEM model is, in both cases (i.e., citation and bibliographic coupling analysis), clustered into a different group, although soil loss calculations in this model are based on the RUSLE equation (Van Rompaey et al., 2001). In the case of the bibliographic coupling analysis (Fig. 11), the MUSLE model also forms a single-member cluster, while in the case of the citation analysis this model is part of a cluster with more models (Fig. 12). The other larger groups of models mostly contain physically based models such as WEPP, LISEM or RHEM. Therefore, it seems that in terms of citations and bibliographic coupling some differences exist between more empirically based and more physically based soil erosion models. Moreover, one could also expect that models that only account for the gross soil erosion would be clustered together and models that also account for sediment delivery would be in a different group. This is not the case since, for example, (R)USLE and (R)USLE-SDR are clustered together (Figs. 11 and 12).

Conclusions
We evaluated 3030 model applications published in 1697 articles included in the GASEMT database (Borrelli et al., 2021) in a rigorous bibliometric investigation. This study can be used as a metric benchmark for future erosion modelling studies, as potential authors can measure the impact of their paper by comparing it with the metrics proposed here. However, it should be noted that the results presented in the scope of this paper should not be regarded as a guideline to prepare a highly-cited paper or to propose specific journals, models or other practices related to soil erosion modelling. These should be selected based on the aims of the study.
The largest percentage of studies (i.e. around 13% per category) were published in the Scopus categories "Earth-Surface Processes" and "Water Science and Technology" and these papers have, on average, a higher number of normalized citations (i.e. more than 3 normalized citations per paper). We observed that the soil erosion modelling community mostly published its studies in journals such as CATENA, Land Degradation & Development, Journal of Hydrology, Hydrological Processes or Geomorphology (i.e. in total around 20% of all studies in GASEMT). However, soil erosion studies are published in a wide range of journals. The journal CiteScore has no substantial impact on the normalized number of citations, as their correlation is rather weak, although statistically significant (R² = 0.2; p-value < 0.0001). On the contrary, we noted that the model selection and the scale have an impact on the normalized number of citations. For instance, the WaTEM/SEDEM model received the highest number of normalized citations (i.e. 8.9 compared to 3.1 for RUSLE, 2.3 for USLE and 2.8 for WEPP). However, WaTEM/SEDEM is applied in only 4.7% of the studies in the GASEMT database compared to 17.1% for RUSLE, 13.9% for USLE and 6.4% for WEPP. The insights emerging from our investigation suggest that studies using more empirically based (e.g., USLE) and more physically based models (e.g., WEPP) do not cite each other and use different references.
Furthermore, the WaTEM/SEDEM model is clustered into a different group than the remaining most frequently used soil erosion models.
Regarding the scale, papers evaluating the global scale generally receive considerably more citations than papers focused on a continental, national, or smaller scale. However, we also observed that national scale studies, on average, do not receive more citations compared to local or watershed ones. Additionally, European studies have more citations than publications targeting other continents. European countries have high levels of co-citations and shared references, which could partly explain higher citation values.
The proportion of non-cited papers (i.e. 8.5%) and the share of self-citations (i.e. around 10%) of soil erosion modelling community are in line with the shares for all papers in Scopus. We observed that journal publications, on average in the field of soil erosion modelling, receive 6-12 times more citations than book series and conference proceedings. Soil erosion modelling publications are mostly co-authored by 2-6 people. Single-authored publications receive, on average, fewer citations. Concerning the co-authorship of publications, we observed some connections among some neighbouring countries (e.g., Belgium and Netherlands) while some connections were not expected.

Fig. 11. Citation network of 12 most frequently used soil erosion models where the network shows the relatedness of soil erosion models based on the number of times that these cite each other. The size of the circle indicates a weight of the item, the lines indicate the links among items, the distance among items shows their relatedness and different colours indicate clusters.

Fig. 12. Bibliographic coupling network of 12 most frequently used soil erosion models where the network shows the relatedness of models based on the number of shared references. The size of the circle indicates a weight of the item, the lines indicate the links among items, the distance among items shows their relatedness and different colours indicate clusters.
Regarding the impact of field activity, model calibration and validation, the conducted investigations demonstrated that these attributes are associated with an increase in normalized annual citations of up to 30%. However, these attributes were not recognised as influential in the BRT model, where the impact of other attributes (e.g., model selection) was larger.
In a nutshell, this review reveals that soil erosion modelling is an important scientific topic, which attracts citations/readership from different fields. Additionally, this review identifies that field activity/measurements, model calibration and evaluation using long-term measurements are to some extent appreciated by the scientific community, but additional focus should be given to these aspects in future. Moreover, different sources of uncertainty (e.g., Beven and Young, 2013) or study limitations should be presented in relation to soil erosion modelling, which can be regarded as a way forward to better studies that also receive more citations.