Topics and trends in Mountain Livestock Farming research: a text mining approach

Pasture-based and small-scale livestock farming systems are the main source of livelihood in the mountain primary sector, ensuring socioeconomic sustainability and biodiversity in rural communities throughout Europe and beyond. Mountain livestock farming (MLF) has attracted substantial research efforts from a wide variety of scientific communities worldwide. In this study, text mining and topic modelling analysis were used to draw a detailed picture of the main research topics dealing with MLF and their trends over the last four decades. The final data corpus used for the analysis comprised 2 679 documents, of which 92% were peer-reviewed scientific publications. The number of scientific outputs in MLF has doubled every 10 years since 1980. Text mining found that milk, goat and sheep were the terms with the highest weighted frequency in the data corpus. Ten meaningful topics were identified by topic analysis: T1-Livestock management and vegetation dynamics; T2-Animal health and epidemiology; T3-Methodological studies on cattle; T4-Production system and sustainability; T5-Methodological studies; T6-Wildlife and conservation studies; T7-Reproduction and performance; T8-Dairy/meat production and quality; T9-Land use and its change and T10-Genetic/genomic studies. A hierarchical clustering analysis was performed to explore the interrelationships among topics, and three main clusters were identified: the first focused on sustainability, conservation and socioeconomic aspects (T4, T6 and T9), the second was related to food production and quality (T7 and T8) and the last one considered methodological studies on mountain flora and fauna (T1, T2, T3, T5 and T10). The 10 topics identified represent a useful starting source of information for further and more detailed analysis (e.g. systematic review) of specific research or geographical areas.
A truly holistic and interdisciplinary research approach is needed to identify drivers of change and to understand current and future challenges faced by livestock farming in mountain areas.


Introduction
Mountains occupy a quarter of the Earth's solid surface and host almost one billion people (Wymann von Dach et al., 2018). Human activities have been shaping mountain environments for millennia, leading to the transformation of natural ecosystems into agro-pastoral ecosystems (Faccioni et al., 2019). Mountain agro-pastoral ecosystems regulate water flows and quality, mitigate the consequences of natural hazards and support the delivery of material and non-material benefits such as food, biodiversity, attractive landscapes and mental well-being (Grêt-Regamey et al., 2012; Martín-López et al., 2019). It is increasingly recognized that low-input farming practices, such as pastoralism and livestock grazing, often in combination with grain and vegetable cultivation, maintain and enhance diversity of species, habitats and landscapes (O'Rourke et al., 2016). In mountain areas, low temperatures, short growing seasons, limited sun exposure, topography and lack of organic soil further constrain the possibility of carrying out many types of agricultural activities in an intensive and disruptive way (Briner et al., 2013). Indeed, these environments currently do not offer many profitable agronomic alternatives to forage production and the utilization of local grassland resources (Cozzi and Bizzotto, 2004).
Despite the above-mentioned constraints and the related marginal economic value of mountain primary production as compared to that in the lowlands (Wymann von Dach et al., 2013), the positive interdependence that exists between mountain agroecosystems and livestock farming has attracted research efforts from a wide variety of academic communities worldwide. However, this variety can also become a constraint when trying to take full account of these research outcomes. This study aims at mapping mountain livestock farming (MLF) research by carrying out an in-depth analysis of topics and trends in the available scientific literature over the last 40 years, to inform future research actions and collaborations. Systematic literature reviews are useful tools for understanding the current state of an issue and for informing further studies (O'Connor and Sargeant, 2015). However, they involve the identification, appraisal and synthesis of all relevant studies investigating a defined theme and require significant resources and time. Text mining and topic modelling analysis represent suitable alternatives (Li et al., 2016) for reducing the burden associated with document screening, as they produce, in a fully unsupervised way, a structured map of textual knowledge (Park and Kremer, 2017) by uncovering recurrent topics and latent themes in large sets of documents.
The present paper summarises the main outcomes of a text mining and topic modelling analysis of the available scientific literature carried out by the European Federation of Animal Production (EAAP) across commissions working group on MLF. By identifying the main explored topics and their trends, this work will update the scientific community on the developments of research on MLF in the different geographic areas of the planet, and it will serve as a knowledge platform for future research actions and collaborations.

Identifying relevant papers
A literature search protocol was set up to identify the peer-reviewed papers dealing with mountain livestock farming using Scopus®, the abstract and citation database of Elsevier©. The bibliographic search was developed starting from three terms (i.e. "mountain", "livestock" and "farm*"). A few additional terms were added according to the expertise of the authors in order to keep the search as broad as possible and capture the largest number of papers related to MLF (Fig. 1). The terms were searched for in the article title, abstract and keywords fields. Some descriptive statistics of the selected records were calculated to profile the scientific corpus, based on information retrieved from the Scopus and SCImago databases, which provide an overview of international research output and scientific influence, respectively. The information considered was publication year, publication source and its quartile within the scientific journal ranking in the year of publication. The geographic localisation of each record was set based on the affiliation of the corresponding author or first author.

Text mining
Text mining analysis was performed in order to identify the main words of the data corpus and their associations. This technique converts the text into numeric information and analyses the word frequency distributions (Sebastiani, 2002).
To pre-process the text data, words were converted to lowercase; any stop words, punctuation, blanks and numerical digits were excluded. In addition, the main words used for the paper selection procedure (mountain, livestock, farm) were also removed from the dataset to avoid poor discriminative information due to their presence in almost all the abstracts retrieved. A term frequency-inverse document frequency (TF-IDF) technique was applied to weight the number of times a word appears in an abstract, adjusted for how common or rare the term is across all abstracts (Salton and Buckley, 1988). This approach aims at reflecting how important a given term is in the whole collection of documents. This first text mining step provided the infrastructure for constructing a corpus of documents and transforming it into a document-term matrix, which is the input data for topic modelling.
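As an illustration of the pre-processing and weighting steps described above, the following minimal Python sketch lowercases the text, removes punctuation, digits, stop words and the search terms, and computes one TF-IDF weight per term and abstract. It is not the code used in this study: the stop-word list, the function names and the specific weighting variant (relative term frequency multiplied by log(N/df)) are assumptions made for the example, and no stemming is applied.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (the list actually used is not reported).
STOP_WORDS = {"the", "of", "in", "and", "a", "on", "to", "is", "for", "was", "were"}
# Search terms removed because they occur in almost every abstract.
SEARCH_TERMS = {"mountain", "livestock", "farm"}


def preprocess(text):
    """Lowercase, drop punctuation and digits, then remove stop words and search terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS and t not in SEARCH_TERMS]


def tf_idf(documents):
    """Return one {term: weight} dict per document, with weight = tf * log(N / df)."""
    tokenised = [preprocess(d) for d in documents]
    n_docs = len(tokenised)
    # Document frequency: number of documents in which each term occurs.
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    weights = []
    for tokens in tokenised:
        counts = Counter(tokens)
        doc_weights = {}
        for term, count in counts.items():
            tf = count / len(tokens)
            idf = math.log(n_docs / doc_freq[term])
            doc_weights[term] = tf * idf
        weights.append(doc_weights)
    return weights


# Invented example abstracts, used only to exercise the functions.
abstracts = [
    "Milk yield of goat herds in a mountain farm.",
    "Sheep and goat grazing on 3 alpine pastures.",
    "Genetic diversity of sheep breeds.",
]
for doc_weights in tf_idf(abstracts):
    print(sorted(doc_weights.items(), key=lambda kv: -kv[1])[:3])
```

With this variant, a term that occurs in every abstract receives a weight of zero, which mirrors the rationale for removing the search terms beforehand.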

Topic modelling and hierarchical clustering
Topic modelling analysis is a tool to uncover the structure of meaningful themes among collections of documents as well as to discover hidden textual patterns. Latent Dirichlet Allocation (LDA), one of the most popular approaches to topic modelling, was applied to the corpus of abstracts. LDA is a Bayesian probabilistic approach that discovers a set of thematic topics from words that tend to occur together in a document (Grün and Hornik, 2011). A single topic can be described as a multinomial distribution of words, and a single document can be described as a multinomial distribution of latent topics. This model gives both topic representations of all the documents and word distributions of all the topics, in an iterative process implemented using Gibbs sampling. At the end of the iterative process, a posterior distribution was calculated to estimate the proportion of words assigned to each topic within a document and the proportion of words associated with each topic in all documents.
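The iterative process described above can be sketched with a small collapsed Gibbs sampler for LDA. This is an illustrative pure-Python version, not the implementation used in this study: the function name `lda_gibbs`, the hyperparameter values (`alpha`, `beta`), the number of iterations and the toy documents are all assumptions, and the estimates are taken from the final sampling state only.

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenised documents.

    Returns (doc_topic, topic_word, vocab): per-document topic proportions and
    per-topic word distributions estimated from the final sampling state.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    # Count tables: document-topic, topic-word and tokens per topic.
    n_dk = [[0] * n_topics for _ in docs]
    n_kw = [[0] * n_words for _ in range(n_topics)]
    n_k = [0] * n_topics

    # Random initial topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Unnormalised conditional probability of each topic for this token.
                probs = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=probs)[0]
                assignments[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    doc_topic = [
        [(n_dk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        [(n_kw[t][v] + beta) / (n_k[t] + n_words * beta) for v in range(n_words)]
        for t in range(n_topics)
    ]
    return doc_topic, topic_word, vocab


# Invented toy corpus of already tokenised abstracts.
toy_docs = [["milk", "cheese", "milk"], ["land", "forest", "land"]]
doc_topic, topic_word, vocab = lda_gibbs(toy_docs, n_topics=2, n_iter=50, seed=1)
print(vocab)
print(doc_topic)
```

Each row of `doc_topic` and `topic_word` sums to one, so the rows can be read directly as the two multinomial distributions mentioned in the text. Fixing the seed makes the run reproducible.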
We used the LDA function with the Gibbs sampling option of the topicmodels package in R (Grün and Hornik, 2011). The default parameters supplied by the LDA function were used; Gibbs sampling parameters were only set to obtain reproducible results and to avoid correlation between subsequent runs. The number of topics needed to be fixed a priori. Because the number of topics is in general not known, models with several different numbers of topics were fitted, and measures of evaluation were calculated (log-likelihood and perplexity), selecting 10 topics as a cut-off. Each document was assigned to the topic with the highest probability.
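To make the evaluation and assignment steps concrete, the sketch below computes perplexity from a fitted model and assigns a document to its most probable topic. It is a hedged illustration in Python rather than the R workflow used here: the function names, the hand-written probability matrices and the choice to evaluate on the same documents (rather than on held-out data, which the text does not specify) are assumptions.

```python
import math


def perplexity(docs, doc_topic, topic_word):
    """Return exp(-log-likelihood / number of tokens) under a topic model.

    docs: list of documents, each a list of word indices
    doc_topic[d][k]: probability of topic k in document d
    topic_word[k][v]: probability of word v in topic k
    """
    log_lik = 0.0
    n_tokens = 0
    for d, doc in enumerate(docs):
        for v in doc:
            # Probability of the word, marginalised over topics.
            p = sum(doc_topic[d][k] * topic_word[k][v] for k in range(len(topic_word)))
            log_lik += math.log(p)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)


def dominant_topic(doc_topic_row):
    """Index of the topic with the highest probability for one document."""
    return max(range(len(doc_topic_row)), key=lambda k: doc_topic_row[k])


# Invented two-word vocabulary and two candidate models.
docs = [[0, 0, 0], [1, 1]]
one_topic = ([[1.0], [1.0]], [[0.5, 0.5]])
two_topics = ([[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.1, 0.9]])

for name, (dt, tw) in {"k=1": one_topic, "k=2": two_topics}.items():
    print(name, round(perplexity(docs, dt, tw), 3))
print([dominant_topic(row) for row in two_topics[0]])
```

Lower perplexity indicates a better fit, so comparing the value across candidate numbers of topics is one way to support the choice of a cut-off.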
To explore the relationship between topics, hierarchical clustering analysis was then performed and a cluster dendrogram was generated. Topics were clustered based on the topic-word matrix, which was transformed to binary data with a 1/0 to indicate presence/absence of a word in a given topic. The distance among topics was calculated based on the Jaccard distance, and the average linkage method was applied with an agglomerative clustering algorithm to generate the cluster dendrogram. The automatic truncation that ultimately assigned each topic to a cluster was based on entropy (i.e. the lower the entropy, the more stable the cluster). The analysis was performed with XLSTAT (Addinsoft, v 2014).
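The clustering step can be sketched as follows, assuming each topic is represented by the set of words marked as present (1) in the binary topic-word matrix. The code computes Jaccard distances and merges clusters agglomeratively with average linkage. It is a pure-Python illustration and not the XLSTAT procedure: the function names and the four example topics are hypothetical, and the entropy-based truncation is not reproduced.

```python
def jaccard_distance(a, b):
    """One minus the share of words common to both sets among all words in either set."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_linkage(items):
    """Agglomerative clustering with average linkage.

    `items` maps a label to a set of words. Returns the merge history as
    (cluster_a, cluster_b, height) tuples, where clusters are tuples of labels.
    """
    clusters = [(label,) for label in items]

    def cluster_distance(c1, c2):
        # Average of all pairwise distances between members of the two clusters.
        total = sum(jaccard_distance(items[x], items[y]) for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    merges = []
    while len(clusters) > 1:
        # Find the closest pair among the current clusters.
        height, i, j = min(
            (
                (cluster_distance(clusters[i], clusters[j]), i, j)
                for i in range(len(clusters))
                for j in range(i + 1, len(clusters))
            ),
            key=lambda t: t[0],
        )
        merges.append((clusters[i], clusters[j], height))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return merges


# Hypothetical topics, each described by the words present in it.
example_topics = {
    "A": {"milk", "cheese", "quality", "fat"},
    "B": {"milk", "cheese", "meat", "quality"},
    "C": {"land", "forest", "change", "policy"},
    "D": {"land", "forest", "wildlife", "policy"},
}
for left, right, height in average_linkage(example_topics):
    print(left, right, round(height, 2))
```

The merge heights are the values that a dendrogram would display on its vertical axis; cutting the tree below the last merge in this example yields two clusters, {A, B} and {C, D}.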

Results
The string used for the literature search identified 2 893 records. These items were then subjected to a manual screening to exclude incomplete records. Reasons for discarding records were no abstract available (n = 39), no author reported (n = 8), no title source available (n = 3), duplicated (n = 18) and document-type erratum (n = 2). The database was further filtered by only considering the available publications from years 1980 to 2018. Finally, 2 679 documents were retained (Fig. 1), of which 92% were scientific publications.
The majority of papers (61%) were published in scientific journals ranked in the top quartile (Q1) of their subject category in the year of publication, while the documents ranked as Q2, Q3 and Q4, according to the quartile of the journal, accounted for 22, 13 and 4%, respectively. Small Ruminant Research (Q2) was the most represented journal, with more than 100 published documents, followed by Journal of Dairy Science (Q1) with almost 80 documents, PLoS ONE (Q1) with more than 40 documents and Mountain Research and Development (Q2) with 40 documents.
The text mining exercise kept 1 441 relevant words for subsequent analysis according to the TF-IDF weighting scheme, with milk being the most frequent word (Fig. 2) followed by goat, sheep, population, genetic, product and graze, which were assigned a TF-IDF value of at least 25. According to the affiliation of the corresponding or first author, the majority of documents were produced in Europe (54%), with Italy being the most frequently represented country (214 documents). Asia and North America each accounted for 18% of the scientific output. South America, Africa and Oceania produced 5, 3 and 2% of the documents, respectively.
Beyond recurrent themes linked to the search string such as T3-Methodological studies on cattle and T10-Genetic/genomic studies, the topic analysis (Table 1) highlighted issues such as T1-Livestock management and vegetation dynamics; T2-Animal health and epidemiology; T4-Production system and sustainability; T5-Methodological studies; T6-Wildlife and conservation studies; T7-Reproduction and performance; T8-Dairy/meat production and quality and T9-Land use and its change.
The most frequent topic in MLF research was related to production system and sustainability (T4) and was represented by 363 documents, followed by T10, which was defined by 351 papers investigating genetics/genomics. The least popular was related to methodological studies on cattle (T3), which comprised 101 documents. Considering the dendrogram produced by cluster analysis (Fig. 3), topics were sorted into three main clusters at a height (indicating the distance between clusters) of 0.98: cluster 1 grouped sustainability, conservation and socio-economic topics (i.e. T4, T6 and T9); cluster 2 grouped topics related to livestock performance and food production and quality (i.e. T7 and T8); and cluster 3 grouped methodological studies on mountain flora and fauna (i.e. T1, T2, T3, T5 and T10). Cluster analysis grouped topics that had different trends throughout the study period. Clusters 1 and 3 were composed of topics that had both increasing and stable trends, while cluster 2 encompassed only topics with steady trends.
The affiliation of the first or corresponding author was used to define the country in which the studies were carried out. Affiliates of European research institutions were the most active on all topics, while the scientific contribution from other continents was more focused on specific topics (Fig. 4). More than 80% of the publications allocated to T3-Methodological studies on cattle were produced by European academics, and only T6-Wildlife and conservation studies saw less than 50% of the scientific production originating from European research communities. North American affiliates contributed more than 20% of the scientific production in T2-Animal health and epidemiology, T5-Methodological studies and T6-Wildlife and conservation studies. Asian academics were more productive in T4-Production system and sustainability and T10-Genetic/genomic studies, publishing about 25% of the scientific production allocated to those topics. Affiliates of South American, African and Oceanian research institutes contributed an average of 10% of publications across all topics.

Discussion
This paper aimed at mapping available scientific knowledge on MLF. Although this approach can be seen as methodologically oriented, such identification of the most explored research topics and their trends over the last 40 years represents a useful starting base of information to scientists for further studies. Once the main topics are identified within a large number of documents using our approach, a systematic review of the papers allocated to each topic could then be carried out for further in-depth analysis. Indeed, literature reviews are useful tools for understanding the current state of an issue and for informing further studies (O'Connor and Sargeant, 2015). When performing systematic literature reviews, scholars identify a specific research question to be answered through document reading and manual data extraction. However, when broader research questions are asked and larger document collections need to be screened, such as in this paper, text mining and topic modelling analysis represent suitable methods for information retrieval (Park and Kremer, 2017).
Table 1. The most probable 25 words defining the 10 topics that emerged with Latent Dirichlet Allocation (LDA). Classification of topics according to hierarchical clustering is also shown.
Similar to global bibliometric trends (Bornmann and Mutz, 2015), the number of scientific outputs in MLF has doubled every 10 years since 1980. In addition to the widespread use of English for publication purposes even among non-native speakers, a reason for this sharp increase has been linked to the adoption of bibliometric indicators for the evaluation of scientific output (Fanelli and Larivière, 2016) and to the emergence of new academic communities in countries of the Global North and South, respectively. In fact, according to the Indicators Report of the National Science Foundation (National Science Foundation (NSF), 2018), in BRIC countries (i.e. Brazil, Russia, India and China), the number of scientific publications is growing at a fast pace due to the availability of research funds and to the increasing science and technology capacities. The increasing share of publications on T4 and T10 from Asian scholars is in line with this hypothesis (Yuan et al., 2018). Pastoralism and extensive rearing of domestic herbivores are the most frequent farming systems found in harsh mountainous environments (Battaglini et al., 2014); therefore, it is not surprising that the text mining analysis identified milk, goat and sheep as terms with the highest weighted frequency in the data corpus. Additionally, the majority of the papers came from Europe, so the agricultural policies over that same period probably need to be considered, as research priorities and industry concerns would have been somehow aligned. For example, reproduction and performance (T7) and land use and its change (T9) attracted most of the European research interest between 1980 and 1990.
Indeed, the sheep and goat support regime within the EU Common Agricultural Policy (CAP) was implemented in 1980 (Gordon et al., 1993), following the 1975 Less Favoured Areas directive (Brassley and Lobley, 2003), which targeted challenged mountain areas in particular. This could explain the increasing number of studies relating to these topics. In most recent years, a relevant number of studies have been related to emerging topics of agricultural and environmental sciences, such as production systems and sustainability (T4) and livestock management and vegetation dynamics (T1). This points at the essential role played by livestock farming systems in the management and protection of a fragile environment such as that of mountains (Grêt-Regamey et al., 2012). Agricultural and rural policy orientations as well as associated research priorities for these areas can also explain this emergence, especially at European level. Indeed, the CAP reform in 1992 (Gardner, 1996) and the Agenda 2000 reform saw the introduction of agri-environmental measures and rural development programme support measures (Ackrill, 2000). In particular, the latest iteration of the CAP focused on protecting food quality, preserving the landscape and biodiversity and enabling vibrant rural areas (European Commission, 2018). Whilst this may not explain all the trends observed in these research topics, agricultural policies at European level would certainly have had an impact on the publication trend. For instance, in Europe, many of the animal-derived foods from mountain regions are dairy products, often safeguarded and promoted by official EU or national quality schemes that protect their traditional production method and origin (Zuliani et al., 2018). New analytical technologies (Coppa et al., 2012; Segato et al., 2019), such as those aiming at assessing product authenticity, are often applied to European niche products, centering and almost limiting the scientific production on dairy/meat production and quality (T8) to Europe, especially in the years that followed the launch of Agenda 2000 and food-quality schemes.
Additionally, in recent decades, the growing number of scientific articles has coincided with the establishment of more international collaborations (Bornmann and Mutz, 2015). In this study, the affiliation of the first or corresponding author was used to define the country in which studies were carried out. In the context of increased international collaborations, this assumption could have led to a geographic misrepresentation of MLF scientific output in favor of universities of the Global North, which may carry out international research projects also in countries of the Global South. Nonetheless, collaborations between academic communities of different geographical areas may have brought new perspectives and tools to established research approaches and contributed to shifting the interest of MLF research from traditional topics such as reproduction and performance (T7) and methodological studies on cattle (T3) to innovative ones such as production system and sustainability (T4). Moreover, cluster analysis may suggest how different disciplines and expertise could work in an interdisciplinary manner to get a better picture of new challenges and opportunities for mountain livestock farming. For example, scientists focused on farming systems and sustainability (T4) could benefit from collaborations with experts working on land use and its change (T9) and wildlife and conservation (T6). Similarly, traditional MLF studies (e.g. T3-Methodological studies on cattle) could use new approaches derived from genetic/genomic studies (T10) to optimize livestock robustness and resilience in a changing environment (Friggens et al., 2017). In addition, given the rich heritage of practices and traditions of mountain communities worldwide, the use of participatory approaches and the integration of traditional knowledge into scientific frameworks would further strengthen the impact and relevance of future research efforts on MLF.
Fig. 4. Scientific production (n. of publications) by topic T (Table 1), continent and year (1980-2018).
A. Zuliani, B. Contiero, M.K. Schneider et al. Animal 15 (2021) 100058

Conclusion
The use of text mining and topic modelling analysis made it possible to draw a unique and detailed picture of the main research topics dealing with MLF and their trends over the past 40 years. The 10 topics identified represent a starting source of information for further and more detailed analysis (e.g. systematic review) of specific research or geographical areas. Topics were grouped in three clusters: the first focused on sustainability, conservation and socioeconomic aspects, the second covered food production and quality, and the last one was related to methodological studies on mountain flora and fauna.
New collaborations between academic communities and disciplines, as well as changing policy orientations and research priorities, may have contributed to shifting the interest of MLF research from traditional topics such as reproduction and performance and methodological studies on cattle to innovative ones such as production system and sustainability. Based on the knowledge platform set by this work, further integration between research communities and disciplines may provide a sound interdisciplinary perspective for a deep understanding of current and future challenges faced by mountain livestock farming worldwide and help communicate its role in the conservation of mountain agroecosystems and the well-being of communities far beyond mountain areas.

Ethics approval
Not applicable.

Data and model availability statement
None of the data/models were deposited in an official repository but are available upon request.

Declaration of interest
None.