Finding the context indigenous innovation in village enterprise knowledge structure: a topic modeling

Indigenous people have deep local knowledge of environmental sustainability and natural resource utilization, which is a source of innovations that often drive economic growth in rural areas. This study explores the knowledge structure of indigenous innovation in village enterprises through content analysis of research publications. The resulting knowledge structure can be used to set up a roadmap for studies on village enterprise and, in a broader context, to build metadata as a foundation for an evaluation system of village enterprise. The authors deploy topic modeling and co-word analyses to scrutinize 775 village enterprise research articles from the Scopus database and 665 papers from ScienceDirect. In the topic modeling, topic models of village enterprises are set up. The topics found are local ownership (such as market and property), land, services (housing, health care), economy and public policy, financial service micro-credit, environmental pollution control, local business sustainability, social entrepreneurship, household income, bioenergy-based electrification, and BUMDes management. Four sectors of the natural resource-based indigenous economy were identified: traditional food production, bio-energy for fuel and electricity, agriculture, and tourism. The topic models are used to comprehend the knowledge structure of village enterprises, with the focus on uncovering the context of indigenous village enterprise and its state of the art.

To determine a suitable number of topics, several topic models are trained using different numbers of topics and evaluated with measures such as log-likelihood (Wallach et al., 2009), perplexity (Blei et al., 2003), or the semantic coherence of the topics (Mimno et al., 2011; Newman et al., 2009).
This paper attempts to find the structure of knowledge in village enterprise research publications and its linkage to indigenous innovation. Few studies take indigenous village enterprise as their research locus, so it is necessary to map the development of global empirical studies. To this end, this study applies topic modeling to provide more insight into the knowledge structure of village enterprises based on publications from the Scopus database and ScienceDirect, with the end goal of building metadata on village enterprises. Because village enterprises are diverse, metadata is needed to better evaluate their development. The rationale for developing metadata is that the diverse indigenous knowledge generated by villages and village enterprises will grow into indigenous innovation practices, which need an information system to manage that knowledge effectively.

Studies of indigenous knowledge and indigenous innovation
Indigenous knowledge has been used in various innovation studies (Appelbaum et al., 2016; Baskaran & Mehta, 2016; Capel, 2014; Huang et al., 2018; Jauhiainen & Hooli, 2017; Mika et al., 2017). Empirical evidence from sub-Saharan Africa shows that, for most of the population, indigenous knowledge forms the basic foundation of innovation and invention (Ezeanya-Esiobu, 2019). Existing indigenous knowledge studies focus not only on increasing the economic development of a region but also on how an area's economy can progress while maintaining its culture and the condition of its natural resources. Indigenous knowledge therefore also helps to solve social issues in the region (Baskaran & Mehta, 2016; Capel, 2014; Padilla-Meléndez & Ciruela-Lorenzo, 2018). Indigenous knowledge studies have since been expanded to define indigenous communities (Blackman & Veit, 2018; Curry et al., 2016; Karanasios & Parker, 2018; Makondo & Thomas, 2018; Padilla-Meléndez & Ciruela-Lorenzo, 2018). In addition, some studies emphasize the importance of local traditional values and culture in the field of entrepreneurship and start-ups (Capel, 2014; Curry et al., 2016; Padilla-Meléndez & Ciruela-Lorenzo, 2018), and in the field of new and renewable energy, where they discuss the attitude of local communities toward accepting or rejecting foreign technology in their area. The issues are carbon emissions, renewable energy, land, water, and forest, and how local communities and governments manage energy and technology independently in their regions (Karanasios & Parker, 2018). These last issues provide a place for indigenous knowledge to contribute to addressing global warming in various countries (Makondo & Thomas, 2018), including the role of indigenous knowledge in maintaining forests in Latin American countries (Ecuador, Colombia, Brazil, and Bolivia) to limit the increase in world carbon emissions (Blackman & Veit, 2018).
In Turkana, a rural area of Kenya, the use of indigenous knowledge that takes local culture and values into account tends to maintain local resources and preserve natural environments while stimulating economic development based on natural and local tourism. The challenge for local communities is to adopt new knowledge and technology in a way that is aligned with existing indigenous knowledge (Ng'Asike & Swadener, 2015).
Page 4 of 15 Kusumastuti et al. Journal of Innovation and Entrepreneurship (2022) 11:19
Various studies have shown that government involvement is important in encouraging indigenous knowledge and indigenous innovation. Likewise, existing institutional patterns, whether formed by the government or by collective agreement, influence or maintain indigenous knowledge and indigenous innovation practices in an area (Li-Ying & Wang, 2015; Yang et al., 2014; Zhao et al., 2015). The successful indigenous practices reported in several studies cannot necessarily be adopted directly. Each region and society has different local characteristics (and thus many attributes), historical factors, and various other highly complex factors, all of which are reasons not to transfer indigenous practices unchanged from one place to another. For this reason, it is important to carry out indigenous learning intensely and thoroughly, with standards adapted to the local context. Research that is driven by local characteristics, covers many cases, and brings out each community's uniqueness is important before indigenous practices from elsewhere are implemented in new places (Nelson-Barber & Johnson, 2016). For example, Padilla-Meléndez and Ciruela-Lorenzo (2018), who study female indigenous entrepreneurs, found that entrepreneurial practice in indigenous women's groups depends not only on the common interests and goals of these groups but also on social capital ties and on the individual motives within the group. Although indigenous communities are important, the indigenous factors that encourage the development of economic entrepreneurship also need to be considered. Even though local culture contributes to the development of female indigenous entrepreneurs, the role of its values needs to be qualified by the individual motives within these groups, which form the bonds of cooperation among their members.

Materials and methods
Global research publications related to village enterprise were searched in the Scopus database and ScienceDirect. The phrases "village enterprise", "rural enterprise", and "bumdes" were searched in the topic field (title/abstract/keyword) of each database. The search covered publications up to October 2020 and used the following search script: TITLE-ABS-KEY ("village enterpr*" OR "bumdes" OR "rural enterpr*" OR "rural busine*" OR "village fund*" OR "village owned enterprise*"). Titles and abstracts of all publications were carefully considered for relevance to village enterprise. Publications that were not related to village enterprise, as well as duplicate publications, were excluded from the results of the search query. As a result, a total of 1440 publications were retained for the next stage. Before analysis, text is commonly preprocessed in three steps: tokenization, stop word removal, and stemming. Tokenization divides the content of each text into a sequence of character strings called tokens. Here each token is a single word, and the tokens are then used to build the word vector. Stop word removal eliminates frequently used filler words, called stop words, which do not add value to the analysis. Stemming removes word endings so that words are reduced to their root form, which shrinks the vocabulary (Porter, 2006). Depending on the objectives of a study, stemming can improve results or increase errors (Manning et al., 2008). In this study, stemming is not applied, so that the results remain more straightforward to interpret.
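The preprocessing steps described above can be illustrated with a minimal Python sketch that uses only the standard library. It is not the pipeline used in the study: the stop word list, the tokenization pattern, and the helper names `tokenize`, `preprocess`, and `document_term_counts` are assumptions made for illustration. In line with the choice described above, no stemming is applied.

```python
import re
from collections import Counter

# Illustrative stop word list only (an assumption); the list used in the study is not given.
STOP_WORDS = {"a", "an", "and", "are", "as", "by", "in", "is", "of", "on", "that", "the", "to"}


def tokenize(text):
    """Split text into lowercase word tokens, keeping inner hyphens and apostrophes."""
    return re.findall(r"[a-z0-9]+(?:[-'][a-z0-9]+)*", text.lower())


def preprocess(text, stop_words=STOP_WORDS):
    """Tokenize and drop stop words; no stemming is applied."""
    return [tok for tok in tokenize(text) if tok not in stop_words]


def document_term_counts(docs):
    """Return one term-frequency Counter per document (the rows of a sparse document-term matrix)."""
    return [Counter(preprocess(doc)) for doc in docs]


# Two made-up sentences standing in for abstracts.
docs = [
    "The village enterprise is owned by the community.",
    "Rural enterprises and village-owned funds support the community.",
]
dtm = document_term_counts(docs)
for row in dtm:
    print(dict(row))
```

Because stemming is skipped, "enterprise" and "enterprises" remain separate terms, which keeps the top terms of a topic readable at the cost of a larger vocabulary.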
Topic modeling
LDA (latent Dirichlet allocation), originally developed by Blei et al. (2003), is the most widely used topic modeling method. It places Dirichlet priors on the document-topic and topic-word distributions, encoding the intuition that a document covers a number of topics and that a topic uses a set of words. The model can reveal the main topics of a corpus, which can potentially be used to build knowledge structures in a domain of interest. This quantitative method does not offer the depth of contextual understanding that qualitative methods do. Starting from a data set (a collection of documents, or corpus), LDA works backwards to infer which topics would have generated the documents. The corpus is represented as a document-term matrix (DTM), which is generally very sparse. Reducing the dimensions of the matrix can improve the topic modeling results. For this purpose, preprocessing is necessary so that syntactically close words can be merged into a single base term. Figure 1 shows a graphical model representation of LDA in plate notation, which illustrates the dependencies between the model parameters. The boxes are plates representing replicates. The outer plate represents the documents, while the inner plate represents the repeated choice of topics and words within a document.
The total probability of the corpus $D$ can be calculated by the formula:

$$p(D \mid \alpha, \beta) = \prod_{d=1}^{M} \int p(\theta_d \mid \alpha) \left( \prod_{n=1}^{N_d} \sum_{z_{dn}} p(z_{dn} \mid \theta_d)\, p(w_{dn} \mid z_{dn}, \beta) \right) d\theta_d$$

The LDA model is represented as a probabilistic graphical model in Fig. 1. There are three levels to the LDA representation. M represents the total number of documents in the corpus, while N represents the number of words in a document (written $N_d$ for document d). The parameters α and β are corpus-level parameters, assumed to be sampled once in the process of generating the corpus: α is the parameter of the Dirichlet prior on the per-document topic distribution, and β is the parameter of the Dirichlet prior on the per-topic word distribution. The variable $\theta_d$ is a document-level variable that represents the topic distribution of document d and is sampled once per document. Finally, the variables $z_{dn}$ (the topic of the nth word in document d) and $w_{dn}$ (the specific word) are word-level variables and are sampled once for each word in each document. Topic model selection is based on the minimum perplexity value, which is defined as

$$\mathrm{perplexity}(D) = \exp\left\{ - \frac{\sum_{d=1}^{M} \log p(\mathbf{w}_d)}{\sum_{d=1}^{M} N_d} \right\}$$

where $\mathbf{w}_d$ denotes the words of document d. In information theory, perplexity is a measure of how well a probability model predicts a sample, and it is used here to determine the statistical goodness of fit of a topic model (Blei et al., 2003). It can be used to compare probability models. A low perplexity value indicates that the probability distribution predicts the sample well, which is often expected to make the results easier to interpret. However, Chang et al. (2009) showed that models which achieve better predictive perplexity often have less interpretable latent spaces.
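As a small worked illustration of the perplexity measure (Blei et al., 2003), the following Python sketch computes perplexity from per-document log-likelihoods and document lengths. The function name `perplexity` and the toy inputs are assumptions made for illustration; in practice the log-likelihoods would come from a fitted topic model.

```python
import math


def perplexity(doc_log_likelihoods, doc_lengths):
    """Compute exp(-sum_d log p(w_d) / sum_d N_d) for a set of documents."""
    if len(doc_log_likelihoods) != len(doc_lengths):
        raise ValueError("one log-likelihood and one length are needed per document")
    total_tokens = sum(doc_lengths)
    if total_tokens <= 0:
        raise ValueError("the documents must contain at least one token")
    return math.exp(-sum(doc_log_likelihoods) / total_tokens)


# Sanity check with a hypothetical model that assigns every word the same
# probability 1/V: then log p(w_d) = -N_d * log(V), and the perplexity equals V.
V = 50
lengths = [10, 25, 5]
uniform_log_likelihoods = [-n * math.log(V) for n in lengths]
print(perplexity(uniform_log_likelihoods, lengths))
```

The sanity check shows why lower values are better: a model that is no more informative than uniform guessing over V words has perplexity V, and any model that predicts the words better scores below that.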
The pursuit of a better method to tackle the interpretability issues of topic models and the choice of topic size has also turned to evaluating the semantic coherence of the topic models. Semantic coherence is a measure of the co-occurrence of highly probable words in a topic and has been shown to correlate with expert judgments of topic quality (Mimno et al., 2011). Newman et al. (2009) proposed a scoring model using pointwise mutual information (PMI) with external data to evaluate the semantic coherence of topic models. The model was motivated by the observation that a topic often has some odd words in its list of top-ten words. The coherence of a topic is calculated from the PMI of all word pairs among the given top words, using a sliding window of size 10. The coherence is the arithmetic mean of these PMI values. The PMI of a given word pair ($w_i$, $w_j$) is calculated as

$$\mathrm{PMI}(w_i, w_j) = \log \frac{p(w_i, w_j)}{p(w_i)\, p(w_j)}$$

where the probabilities are estimated from co-occurrence counts within the sliding windows. In this paper, the selection of the models is based on the triangulation of the perplexity and the semantic coherence following Newman et al. (2009). The selected individual topics are evaluated and compared in terms of their interpretability and theoretical concept (Bonilla & Grimmer, 2013; Maier et al., 2018). The dynamics of the topics are calculated using labeled LDA (Ramage et al., 2009), which can classify a document with multiple labels and is useful for studying how the topics evolve over time.
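The PMI-based coherence score can be sketched in Python as follows. This is a minimal illustration and not the implementation used in the study: the helper names `window_counts` and `pmi_coherence`, the small smoothing constant `eps` for pairs that never co-occur, and the choice to skip words that are absent from the reference text are all assumptions.

```python
import math
from itertools import combinations


def window_counts(tokens, top_words, window=10):
    """Count the sliding windows that contain each top word and each pair of top words."""
    vocab = set(top_words)
    n_windows = max(len(tokens) - window + 1, 1)
    single = {w: 0 for w in top_words}
    pair = {frozenset(p): 0 for p in combinations(top_words, 2)}
    for start in range(n_windows):
        present = vocab.intersection(tokens[start:start + window])
        for w in present:
            single[w] += 1
        for p in combinations(sorted(present), 2):
            pair[frozenset(p)] += 1
    return single, pair, n_windows


def pmi_coherence(tokens, top_words, window=10, eps=1e-12):
    """Arithmetic mean of PMI over all pairs of (distinct) top words."""
    single, pair, n = window_counts(tokens, top_words, window)
    scores = []
    for wi, wj in combinations(top_words, 2):
        p_i, p_j = single[wi] / n, single[wj] / n
        if p_i == 0 or p_j == 0:
            continue  # assumption: skip words that never occur in the reference text
        p_ij = pair[frozenset((wi, wj))] / n
        # eps (assumption) keeps the logarithm finite when a pair never co-occurs
        scores.append(math.log((p_ij + eps) / (p_i * p_j)))
    return sum(scores) / len(scores) if scores else float("nan")


# Made-up reference text; a real evaluation would use a large external corpus.
reference = "forest conservation supports biodiversity while forest tourism supports local income".split()
print(pmi_coherence(reference, ["forest", "conservation", "biodiversity"], window=4))
```

A pair of words that co-occurs more often than expected under independence receives a positive PMI, so topics whose top words tend to appear together in the reference text obtain a higher mean score.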

Results and discussion
In the selection process of the topic model, a grid search over several topic sizes was carried out, namely k = 5, 10, 15, 20, 30, 50, 60, and 75. This was done to gain knowledge about the granularity of large topics. From Fig. 2 (left), the optimal perplexity values were selected at three points, namely the topic models with topic sizes 15, 30, and 60. These three topic models are supported by the coherence values (right) and are then used as the basis for interpreting the knowledge structure of village enterprise.
In this study, the focus is on finding concepts in the text (publications) related to indigenous rural enterprise. These concepts may be hidden in the large volume of research documents. For this reason, the concepts were sought by modeling in several ways, such as with several topic sizes (topic size k). The modeling also seeks these hidden concepts by using a sufficiently large top-term count. The top-term count is the number of main words that make up a topic and is usually chosen according to need. If the problem statement focuses on extracting a theme or concept, a higher number is advisable; if it focuses on extracting a feature or term, a low number is recommended. Table 1 presents the topic model with k = 15, which has meaningful topics such as traditional food production, local ownership (such as market and property), land, public services (housing, health care, retirement), economic policy, financial micro-credit, environmental pollution control, employment, local business sustainability, electricity, women (gender) and household income, BUMDes management, and public policy. Using the topic model with k = 30, some subtopics related to topics in the k = 15 model can be constructed. For each topic, the intensity is calculated by summing the probability of that topic over all documents.
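The intensity calculation can be illustrated with a short Python sketch. It assumes a document-topic matrix in which each row holds the topic probabilities of one document; the function name `topic_intensity` and the numbers in the example matrix are hypothetical and are not taken from the study.

```python
def topic_intensity(doc_topic):
    """Sum each topic's probability over all documents."""
    n_topics = len(doc_topic[0])
    return [sum(row[k] for row in doc_topic) for k in range(n_topics)]


# Hypothetical document-topic probabilities for 3 documents and 3 topics
# (each row sums to 1); the values are made up for illustration.
theta = [
    [0.70, 0.20, 0.10],
    [0.10, 0.80, 0.10],
    [0.50, 0.25, 0.25],
]
intensity = topic_intensity(theta)
print(intensity)
```

Because every row sums to 1, the intensities sum to the number of documents, so a topic's intensity can be read as the number of "document equivalents" devoted to it.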
In the topic of traditional food production (A00), terms such as management, supply chain, quality, and cost are aspects of the topic. The topic of local ownership includes aspects such as market, government, reform, privatization, property, rights, collective, institutional firms, political, control, and governance. The topic of public/community services (A02) covers aspects of housing, healthcare, and retirement. Topic 03 is about the village economy, whose aspects are the market in the agricultural sector, policy, growth, employment, population, and labor. Topic 06 is about electricity, in which generation from biomass and biogas is dominant in a rural area's renewable energy generation. This topic has subtopics, as shown in Table 2, namely the electricity supply and demand system for households, business, and community (B17), biomass-based renewable energy technology for electricity (B03), and biogas production (B22). In the business topic (A07), the issues are access, sustainability, and resilience related to the environment. Topic A08 on the environment is closely related to water pollution and the river. Topic A12 on BUMDes management considers aspects of the community, financial system, technology support, planning, sustainability, challenges, and policy. Topic A05 on household income is related to women, education, and inequality. Table 2 also lists subtopics of business (A08) from the topic model in Table 1, which are related to innovation (B18) and entrepreneurship (B24). The context "indigenous" was found to be related to sustainability, as in topic B13. The terms in sequence reflect increasing weighted value: forest, landscape, ecological, natural resources, tourism, social, environmental, planning, conservation, species, cultural, land, and biodiversity.
The interpretation would be that forest, landscape, ecological natural resources, biodiversity conservation, and embedded cultural and social values should be integrated into the planning. The term indigenous is closely related to forest conservation, in the contexts of policy, deforestation, and sustainability (Table 3). This would reflect that deforestation has an impact on indigenous people, so that policy is needed for their sustainability. The knowledge structure based on the topic modeling includes the term indigenous in the topic of conservation and forest in the topic model with k = 60 (C28), as shown in Table 3, while the term forest itself is also embedded in the topics on "environment" and "climate change".

Table 2 (excerpt)
Topic | Top terms | Intensity
… | Government, housing, urban, family, planning, women, jobs, million, township, capital, local, workers, residents, urbanisation, 1980s, conditions, market, homes, especially, provide | 36.83
B12 | Social, networks, network, capital, local, process, digital, internet, knowledge, access, mobile, relationships, resources, communities, population, operation, central, innovation, transformation, regional | 29.33
B13 | Sustainable, forest, sustainability, landscape, ecological, natural, tourism, social, environmental, planning, conservation, forests, species, cultural, resources, cover, land, private … | …

The dynamics of the topic models are illustrated using the topic model with size k = 60, as shown in Fig. 3, whereby the intensity of each topic is calculated using the terms' frequency.
Topics of forest, tourism, biomass, land-use change, and electrification from renewable energy are compared, with the lists of their top terms given in Table 3.

Table 3 (excerpt)
Topic | Top terms | Intensity | Coherence
… | …, electrification, solar, renewable, access, demand, benefits, supply, community, India, generation, biomass, communities, households, micro-enterprises, cost, approach, consumption, grid, mini-grids | 20.856 | 0.912
C45 | Land, industrial, urban, spatial, cultivated, transition, city, construction, pattern, structure, changes, types, industrialization, land-use, change, patterns, towns, Beijing, built-up, residential | 18.794 | −0.986
C49 | Tourism, life, residents, examines, living, homes, farmers, location, form, hosts, typical, opportunities, partnership, demand, improving, tourists, understanding, scale, space, home | 15.447 | −1.295
C59 | Biomass, climate, change, carbon, environmental, sustainable, fuel, emissions, crop, ecosystem, sources, production, resources, global, changes, vulnerability, renewable, wood, increase, crops | 17.543 | 1.808
C26 | Innovation, business, firms, role, networks, institutional, products, support, innovations, product, marketing, businesses, external, performance, trust, jatropha, technology, orientation, factors, findings | 17.034 | −0.599
Most of the selected topics have a coherence higher than the mean coherence of the topic model (−0.542). Two selected topics are below the mean value but have higher relevance and are easier to interpret. In Fig. 3, the topic of tourism (C49) showed significant intensities in the 1990s, but since 2003 its intensity has continued to decrease. On the other hand, the topic of electrification based on renewable energy (C35), especially solar and biomass, continues to develop. The topic of forest (C28) contains aspects of conservation, biodiversity and the utilized species, indigenous, population, forestry enterprise, private ownership, and deforestation. By investigating some selected full-text papers, a relationship can be identified between forestry enterprise and indigenous innovation, namely through innovation in developing viable businesses for non-timber forest products (de Beer & McDermott, 1989; Matias et al., 2018; Meinhold & Darr, 2019). The term "innovation" in the topic model is embedded in the topic of business innovation (C26), whose keywords include network, product marketing, and technology. These keywords reflect the research goal of investigating sustainable indigenous rural enterprise through the development of networks, institutional arrangements, product marketing, and the use of technology. This finding reflects that indigenous people respond to their vulnerability to climate change, and seek a resilient livelihood, by preserving natural resources and landscape (environment) and by conserving forests. The development of tourism in forestry areas is closely related to indigenous people's efforts to support forest conservation against land conversion and to maintain biodiversity in the forest, especially the utilized species. In addition, by preserving indigenous traditional food production, they can set up a sustainable market for local products and diversification.
This potential emerges as they developed indigenous knowledge in the use and management of such natural resources. Diversification from farming (agriculture) can lead to indigenous food production, local tourism services, and the local generation of electricity (such as from biogas and biomass). Indigenous innovation in traditional food production is a form of innovation that strengthens the food security of the surrounding community by involving most of the community. Innovations in the forestry and land sectors generally aim at preserving nature and at conserving, and even using, forest products by relying on existing local wisdom. When the community is involved in public services, the main goal is not only to bring services closer but also to improve the welfare of the local community, usually by streamlining costs and expanding access to services. In the empowerment of small community businesses, studies revolve around access to capital, networks, increasing the skills and business knowledge of business actors, and the empowerment and sustainability of local businesses. In terms of basic community needs, indigenous innovation is seen in the form of procurement, electricity, water, and the need for employment. In addition, women's empowerment in business is an interesting issue, alongside leadership, succession in family enterprises, household income, and employment. Government policies and programs that can be used as leverage in regional development have also received considerable attention.
The vision of viewing local resources as an opportunity for building indigenous village enterprises should be adopted by all residents, businesses, and local government, with technology-supported innovation as the key to growth and sustainability. The study's results show that government involvement is important in encouraging indigenous knowledge and indigenous innovation through related policy and program development, such as setting up a microgrid based on renewable energy. The local community maintains sustainable and profitable operations from day-to-day electricity use to support a village enterprise. To expand social entrepreneurship, the surplus can be used to develop local ecotourism and increase people's welfare. In developing a village enterprise, efforts to maintain business and environmental sustainability cover land, forest, water, and river, and the key terms are technology, innovation, and entrepreneurship. All of the key terms described above are identified in Fig. 4, which attempts to visualize a knowledge structure from the extracted abstracts of the collected articles based on the results of topic modeling.
The results show that topic modeling provides the knowledge structure of the content of research on village enterprises. The results also show various aspects of indigenous village enterprise, where the state of the art is on forest conservation and the related environment, ecology, biodiversity, and species sustainability because of deforestation. Policy development is critical, and one of its aspects is the ownership issue (see Table 3). The hidden knowledge that this study found is the identification of economic potentials related to natural-resource-based indigenous innovations in the sectors of agriculture, traditional food production, ecotourism, and renewable energy electrification. To this end, research on village enterprise should consider and focus on these potentials to elevate the rural economy.

Conclusion
Innovation is a critical factor not only for economic development and social life but also for environmental sustainability. The high interest in village enterprise studies should be effectively managed by looking at indigenous innovation, which is one of the critical factors. There is a gap in the literature in mapping indigenous innovation within the study of the village enterprise. To this end, the authors deploy topic modeling to analyze village enterprise research articles from the Scopus and ScienceDirect repositories. The hidden knowledge that this study found is the identification of economic potentials related to natural-resource-based indigenous innovations in the sectors of agriculture, traditional food production, ecotourism, and renewable energy electrification. This finding reflects that indigenous people respond to their vulnerability to climate change, and seek a resilient livelihood, by preserving natural resources and landscape (environment) and by conserving forests. This potential emerges as they developed indigenous knowledge in the use and management of such natural resources. Research on village enterprise should consider these potentials to elevate the rural economy. Text analytics using topic modeling produces a knowledge structure of indigenous innovation in village enterprise through content analysis of research publications from the Scopus and ScienceDirect repositories. This provides a clear and structured picture that empirical studies of indigenous innovation range from traditional food production, local ownership (such as market and property), land, public services (housing, health care, retirement), economic policy, financial micro-credit, environmental pollution control, employment, local business sustainability, electricity, women