Predicting determinant factors and development strategy for tourist villages

The tourist village program is a priority program for rural development. Despite numerous opportunities to develop tourist villages, such as the availability of natural resources and the recent high demand for tourist villages, challenges remain, especially in a developing country such as Indonesia. Governance problems, infrastructure, and effective partnership are among the factors that remain challenging in developing tourist villages. This study attempts to identify the factors that determine the state of tourist villages in Indonesia and to determine appropriate strategies for better tourist village development. Using the case of tourist villages in Kedung Ombo, Central Java, whose attractions are water based, this study combines machine learning with a multi-criteria approach, Promethee, to address its objectives. The study shows that government support, application of information technology, infrastructure, local participation, partnership, and variety of attractions are among the determinant factors that affect tourist village development. The study also reveals that the appropriate strategies for tourist village development include improving infrastructure, institutional strengthening, and capacity building. This study could be used to assist national as well as sub-national governments to manage tourist villages in Indonesia effectively.


Introduction
Rural tourism is becoming a trend, especially in developing countries, as a manifestation of community-based tourism, which is believed to counteract the negative impacts of mass tourism related to social equality, environmental degradation, and efforts to save community culture (Muganda et al., 2013; Khalid et al., 2019). Rural tourism serves as a vector of sustainable development capable of generating jobs and income, combating rural exodus, supporting socio-economic networks, saving and enhancing cultural and natural heritage, improving the quality of life of local residents (Rodrigues et al., 2021; Powell et al., 2018), and developing communities (Basile et al., 2021). Rural tourism is also considered able to support development in rural areas that are structurally weak (Neumeier & Pollermann, 2014). Supporting this, Gohori & van der Merwe (2020) state that there is a reciprocal relationship between rural tourism and poverty alleviation and community development. With regard to sustainable tourism, Sharpley & Roberts (2004) stated that rural tourism is synonymous with sustainable tourism in its nature, scale, character, and development process.
There is no single definition of a tourist village, and there is little agreement on what the term means (Tang, 2022). Researchers from different countries develop their own definitions based on the unique experiences or contexts they encounter (Nair et al., 2015). As with the definition, there are no standard and universal guidelines for the classification or typology of tourist villages or for the parameters of their development (Tang, 2022). Rural tourism differs in scale, character, and function between countries (Sharpley & Roberts, 2004). However, several studies explain parameters of tourist village progress that can be used as a basis for analysis in tourism village research. Amin and Ibrahim (2015), Yu et al. (2018), and Kantsperger et al. (2019) stated that community involvement is a key factor in tourist village development (TVD). The community must be fully involved in the decision-making process (Powell et al., 2018). TVD must be owned, managed, and fully controlled by the community (Mtapuri & Giampiccoli, 2013). Another determining factor is the role and support of the government (McLennan et al., 2014). The government acts as a driver in developing partnerships among stakeholders and in developing and overseeing the strategic direction of tourist villages (Koopmans et al., 2018; Liu et al., 2020). To succeed, tourism villages require public partnerships in both local and global contexts (Purbasari & Manaf, 2018). Kristianto et al. (2019) found that the critical success factors for developing a tourist village include attractions, facilities, accessibility, image, human resources, and tourism prices. Another factor is infrastructure, including homestays (Bhalla et al., 2016).
In line with the development of the internet, ICT (Information and Communication Technology) has become a decisive factor, both as a promotional medium and as a transaction facility (Waghmode & Jamsandekar; Hidayatullah et al., 2018; Kamel & Atiya, 2008).
In Indonesia, rural tourism is manifested in the form of tourism village development, which since 2021 has been set by the Coordinating Ministry for the Economy as the tourism development direction for boosting economic growth for society's welfare, eradicating poverty, overcoming unemployment, preserving nature, the environment, and resources, and promoting culture in rural areas. The development of tourist villages is expected to accelerate integrated village development and to encourage the social, cultural, and economic transformation of the village. The success of a tourist village will be a lever for the village and regional economy, which will ultimately encourage national economic growth. One of the areas in Indonesia where tourist villages have been developed is the Kedung Ombo area of Central Java, which has eight tourist villages. Against the background of the limited benefits of the reservoir for communities in the upstream areas, several community groups take advantage of the reservoir panorama as a tourist attraction. However, although they have been developed over the years, these efforts have not yet shown significant progress. Instead of creating alternative employment opportunities, these villages have not even been able to attract sufficient visitors.
One of the suspected causes is a pattern of development based on a conventional approach that focuses on in situ characteristics. Although this approach has advantages in identifying local needs, it is weak in uncovering the hidden factors that are very likely to determine the development of tourist villages. In addition, there is no consistently available strategy on which to base long-term development directions. This study aims to analyze the determinants of the success of tourist villages and to find the right strategy for the development of tourist villages, especially in the Kedung Ombo area and in Indonesia in general.

Materials and methods
This research is predictive and uses machine learning for data analysis. The research data, consisting of profiles of 134 tourist villages in Indonesia, were obtained from big data using Google Search. Each tourism village profile includes the seven attributes in Table 1. Using data mining, the data were processed by machine learning in the Orange 3.3.0 software. Data mining is a data acquisition method and information collection methodology that can guide decision making efficiently by extracting and analyzing accumulated datasets (big data) in order to obtain useful knowledge (Adekitan et al., 2019). Machine learning has been widely used in the field of tourism, including to predict tourism demand (Ahmed et al., 2007; Li, 2022; Yu & Chen, 2022), to design marketing strategies for rural tourism (Xie & He, 2022), and to recommend smart tourism strategies (Ho, 2022). For tourism in Indonesia its use is still limited; examples include predicting international tourist arrivals during the Covid-19 period (Andariesta & Wasesa, 2022) and estimating the number of international tourists (Purnaningrum & Athoillah, 2021). The data mining workflow through machine learning is shown in Fig. 1. The decision tree algorithm in Orange 3.3.0 was applied in this study to identify the determinants of the progress of tourist villages. The tourism village development strategy for the Kedung Ombo area was determined using multi-criteria decision analysis (MCDA) with the Promethee technique (Preference Ranking Organization Method for Enrichment Evaluation).
Fig. 1. Data mining through machine learning. Source: Orange 3.3.0 software output

Decision Tree
A decision tree is a method that applies a tree structure, or decision hierarchy, to a classification problem. According to Anggarwal (2015), the model represents a set of decisions as a hierarchy, called a decision tree, in which each node tests a particular variable. The results provide a clear analysis of the generated predictions. The gain ratio value of the decision tree determines which variable is used for each split. Decision rules derived from decision trees are frequently precise and simple to understand (Yuliawan et al., 2022; Witten et al., 2017). The ID3, C4.5, and CART variants of the decision tree method are frequently used. The Iterative Dichotomiser 3 (ID3) method has an iterative basic structure and divides the data on the basis of its features at each stage. Starting at the root of the tree and producing candidate decisions, this technique creates a classification in the shape of a decision tree. This is in line with the description by Quinlan (1992), who later created the C4.5 algorithm as an improved version of the earlier technique. Breiman (2001) also created the CART (Classification and Regression Tree) technique, which splits a data set into two subsets at each node. According to Anggarwal (2015), the CART computation involves the following stages:
-Let S be a set of data points and let p be the fraction of points in S that belong to the dominant class. The error rate of S is 1-p.
-The error rate of an r-way split of S into the sets S1…Sr is the weighted average of the error rates of the individual sets Si, where the weight of Si is |Si|. The alternative split with the lowest error rate is chosen.
-The Gini index G(S) of a set S is defined in terms of the class distribution p1…pk of the training data points in S:

$$G(S) = 1 - \sum_{j=1}^{k} p_j^2$$

The overall Gini index for an r-way split of S into S1…Sr is the weighted average of the Gini index values G(Si) of each Si, where the weight of Si is |Si|.
The split with the lowest Gini index is selected from the alternatives. The CART algorithm uses the Gini index as the split criterion.
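The Gini-based split selection described above can be illustrated with a minimal Python sketch. It is not the Orange implementation used in the study; the function names, attribute names, and toy data are hypothetical and serve only to show the calculation of G(S) and of the weighted Gini index of a split.

```python
from collections import Counter

def gini(labels):
    """Gini index G(S) = 1 - sum_j p_j^2 for a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_gini(subsets):
    """Weighted average of G(S_i) over the subsets S_1..S_r, weighted by |S_i|."""
    total = sum(len(s) for s in subsets)
    return sum(len(s) / total * gini(s) for s in subsets)

def best_split(rows, labels, attributes):
    """Return the attribute whose split has the lowest weighted Gini index."""
    scores = {}
    for attr in attributes:
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[attr], []).append(label)
        scores[attr] = split_gini(list(groups.values()))
    return min(scores, key=scores.get), scores

# Hypothetical toy data, not taken from the study.
rows = [
    {"government_support": "strong", "ict": "good"},
    {"government_support": "strong", "ict": "poor"},
    {"government_support": "weak", "ict": "good"},
    {"government_support": "weak", "ict": "poor"},
]
labels = ["advanced", "advanced", "developing", "developing"]
best, scores = best_split(rows, labels, ["government_support", "ict"])
print(best, scores)
```

In this toy case the split on government support yields two pure subsets, so its weighted Gini index is 0 and it is selected over the ICT split.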

Performance Test
Cross-validation was used in the performance test to validate the learned model. This approach evaluates which method gives the best classification probabilities and can therefore be used to make predictions; the test scores and prediction scores reflect these outcomes. Cross-validation sampling was chosen because it helps prevent chance effects, particularly when data are limited, and it is also recommended by Witten et al. (2017). Performance is measured with the Area Under Curve (AUC) and values derived from the confusion matrix, as shown below.
(1) The Area Under Curve (AUC) describes how accurately the model classifies. It is the area under the Receiver Operating Characteristic (ROC) curve. A very good model has an AUC value close to 1. The accuracy of the predicted values was assessed using the criteria developed by Gorunescu (2011) in Table 2.
(2) Classification Accuracy (CA) is the proportion of instances whose predicted class matches the actual class. As with AUC, the accuracy of the model's predictions increases as the CA value approaches 1.
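As an illustration of the two measures, the following Python sketch computes CA and AUC for a single set of predictions. It is a simplified stand-in for the Orange test procedure: it does not perform cross-validation, the AUC is computed through its rank interpretation (the probability that a randomly chosen positive instance scores higher than a randomly chosen negative one), and the labels and scores are hypothetical.

```python
def classification_accuracy(actual, predicted):
    """Share of instances whose predicted class equals the actual class."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

def auc(actual, scores, positive):
    """Area under the ROC curve via pairwise comparison of scores.

    Counts the positive/negative pairs in which the positive instance has
    the higher score; ties count as one half.
    """
    pos = [s for a, s in zip(actual, scores) if a == positive]
    neg = [s for a, s in zip(actual, scores) if a != positive]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities of the class "advanced".
actual = ["advanced", "advanced", "developing", "developing", "advanced"]
scores = [0.9, 0.6, 0.4, 0.7, 0.8]
predicted = ["advanced" if s >= 0.5 else "developing" for s in scores]

ca_value = classification_accuracy(actual, predicted)
auc_value = auc(actual, scores, positive="advanced")
print(ca_value, auc_value)
```

Both values lie between 0 and 1, and values closer to 1 indicate better predictions.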

Promethee
Promethee is a multi-criteria decision analysis (MCDA) method based on an outranking approach. The basic principle of outranking is that there is no single best alternative in a decision, but a set of alternatives that are "dominating" or "dominated". An alternative dominates another when it performs better on at least one criterion and is not worse on any other criterion, whereas it is dominated when it performs worse on at least one criterion and is not better on any other criterion (Hersh, 2006). Decision making using the Promethee method is based on relevant criteria that are weighted by priority, taking into account the preferences of decision makers (Brans et al., 1986). The Promethee method has been widely used in various fields, including determining renewable energy scenarios in the UK (Kolios et al., 2016), predicting the competitiveness index of travel and tourism in Middle East countries (Nazmfar et al., 2019), ranking tourism sustainability in Iran (Ghasemi et al., 2021), establishing a strategy for the development of rail lines in Italy (Bottero et al., 2019), determining the model of poverty alleviation policy in Indonesia (Ariyani et al., 2016), assessing the environmental impact of tourism (Tian et al., 2020), evaluating the attractiveness of tourism in Vietnam, and assessing the tourism competitiveness index in Portugal (Lopes et al., 2018). Promethee was adopted in this study because its concept and application are simple, its results are stable, and its interpretation is easy, especially in relation to the multidimensional phenomenon of tourism. After setting the criteria, a preference function P(a, b) must be defined for alternatives a and b in order to rank the alternatives with the Promethee method. The merits of alternatives a and b are compared on each criterion through the difference
$$d_j(a, b) = f_j(a) - f_j(b)$$
where $f_j(a)$ and $f_j(b)$ are the values of criterion j for alternative a and alternative b.
In order to determine the best preference, Promethee takes into account both the indifference and preference thresholds. The indifference threshold is the largest difference that the decision maker regards as negligible. The preference threshold, on the other hand, is the smallest difference considered sufficient to establish a full preference. The preference is therefore defined as follows, taking values between 0 (no preference) and 1 (strict preference):
$$P_j(a, b) = F_j\left[d_j(a, b)\right], \qquad 0 \le P_j(a, b) \le 1$$
The expressions π(a, b) and π(b, a) indicate the degree to which alternative a is preferred to alternative b, and vice versa. Preference functions in Promethee are typically divided into six types: Usual Criterion, Quasi Criterion, Linear Preference, Level Criterion, Linear Preference with Indifference, and Gaussian. Since Promethee is an MCDA outranking approach, the outranking is accomplished by computing preference indices as follows:
$$\pi(a, b) = \sum_{j=1}^{k} w_j P_j(a, b)$$
where $P_j(a, b)$ represents the preference between the previously mentioned alternatives a and b on criterion j, and $w_j$ is the weight of criterion j, with the weights normalized to sum to 1. Following the calculation of the preference indices, Promethee computes the outgoing flow (Φ+) and incoming flow (Φ−) as well as the difference between the two, or net flow (Φ), using the following formulas:
$$\Phi^{+}(a) = \frac{1}{n-1} \sum_{x \ne a} \pi(a, x), \qquad \Phi^{-}(a) = \frac{1}{n-1} \sum_{x \ne a} \pi(x, a), \qquad \Phi(a) = \Phi^{+}(a) - \Phi^{-}(a)$$
where n is the number of alternatives.
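The flow computation can be sketched in a few lines of Python. The sketch assumes the standard Promethee II definitions: a weighted preference index π(a, b) summed over criteria, outgoing and incoming flows averaged over the other n-1 alternatives, and the net flow as their difference. It uses the linear preference function with indifference threshold q and preference threshold p for every criterion, which is an assumption; the study does not state which preference function was chosen. The alternatives, scores, and weights are hypothetical.

```python
def linear_preference(d, q=0.0, p=1.0):
    """Linear preference with indifference threshold q and preference threshold p."""
    if d <= q:
        return 0.0
    if d >= p:
        return 1.0
    return (d - q) / (p - q)

def promethee_flows(scores, weights, q=0.0, p=1.0):
    """Return {alternative: (phi_plus, phi_minus, phi_net)}.

    scores maps each alternative to a list of criterion values (higher is
    better); weights are assumed to sum to 1.
    """
    names = list(scores)
    n = len(names)

    def pi(a, b):
        # Aggregated preference index of a over b.
        return sum(
            w * linear_preference(fa - fb, q, p)
            for w, fa, fb in zip(weights, scores[a], scores[b])
        )

    flows = {}
    for a in names:
        plus = sum(pi(a, x) for x in names if x != a) / (n - 1)
        minus = sum(pi(x, a) for x in names if x != a) / (n - 1)
        flows[a] = (plus, minus, plus - minus)
    return flows

# Hypothetical alternatives scored on two criteria.
scores = {"A": [5, 3], "B": [3, 3], "C": [1, 5]}
weights = [0.6, 0.4]
flows = promethee_flows(scores, weights, q=0.0, p=2.0)
ranking = sorted(flows, key=lambda a: flows[a][2], reverse=True)
print(flows, ranking)
```

Ranking by the net flow gives the complete Promethee II order, while comparing the outgoing and incoming flows separately gives the partial Promethee I order.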

Determinants of Tourism Village Progress
The machine learning process with the decision tree method produced an overview of the attributes/conditions predicted to determine the progress of the tourist village (Fig. 2). The decision tree method begins with the formation of the root (located at the very top); the data are then divided on the appropriate attributes into leaves, which are connected through branches. The decision is determined from the resulting tree, and the factors sought are identified by tracing from the root to the leaves.

Fig. 2. Determinant Factors of Tourism Village Progress
From Fig. 2, government support is the most decisive factor for the progress of a tourist village because it has the highest gain value. Government support is the first branch and determines the other determinants in the following pattern:
-If government support is strong, the next determining factor for village progress is infrastructure.
-If the infrastructure is complete, village progress is determined by the level of community participation. However, if the infrastructure is not complete, it is determined by ICT.
-If the community actively participates, the progress of the tourist village is determined by the management pattern. On the other hand, if the community does not participate, the tourism village will not progress.
-If the management pattern is collaborative, the progress of the tourism village is determined by the partnership.
-If the partnership is multi-stakeholder, the progress of the tourist village is determined by tourist attractions, both mono attractions and varied attractions.
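The rule pattern above can be read as a nested sequence of conditions. The following Python sketch traces which factors are consulted for a given village profile. It encodes only the branches spelled out in the text; the dictionary keys and attribute values are hypothetical labels, and the actual tree in Fig. 2 may contain further branches and leaf classes that are not reproduced here.

```python
def trace_determinants(village):
    """Return (factors consulted in order, outcome stated by the rules or None)."""
    path = ["government support"]
    outcome = None  # the text states an explicit outcome for one branch only
    if village.get("government_support") == "strong":
        path.append("infrastructure")
        if village.get("infrastructure") == "complete":
            path.append("community participation")
            if village.get("participation") == "active":
                path.append("management pattern")
                if village.get("management") == "collaborative":
                    path.append("partnership")
                    if village.get("partnership") == "multi-stakeholder":
                        path.append("tourist attractions")
            else:
                outcome = "no progress"
        else:
            path.append("ICT")
    return path, outcome

# Hypothetical village profile, for illustration only.
example = {
    "government_support": "strong",
    "infrastructure": "complete",
    "participation": "active",
    "management": "collaborative",
    "partnership": "multi-stakeholder",
}
full_path, full_outcome = trace_determinants(example)
print(full_path, full_outcome)
```

Tracing a profile in this way shows how far down the hierarchy a village reaches and which factor is decisive at the point where the path stops.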
Based on the decision rules, the status of tourist villages in the Kedung Ombo area can be predicted. Table 3 shows that all villages in this area are not yet developed. Next, the factors that most determine the progress of the tourist village were identified based on the parameters Info.Gain, Gain ratio, Gini, and χ². From the results of the decision tree prediction, the order of attributes that most influence the progress of the tourist village is shown in Table 4. Table 4 shows that government support is the first factor predicted to determine the progress of a tourist village, with the highest score on all parameters, followed by the implementation of ICT, completeness of infrastructure, community participation, partnerships, and attractions. The management factor is less decisive because its contribution is very small. According to the decision tree, the management pattern only has an effect if the tourism village infrastructure is incomplete but the application of ICT is good and the partnership pattern includes multiple parties (Fig. 2). Given that these findings are predictive in nature, it is necessary to know the accuracy of the prediction results. To determine the accuracy of the predictions of the status of tourist villages in the Kedung Ombo area and of the determinants of tourist village progress, the values of AUC, CA, F1, Precision, and Recall were examined. According to the classification of prediction accuracy values developed by Gorunescu (2011), this prediction falls in the excellent class. Similarly, the attributes used to predict tourist village status, namely community participation, variety of attractions, government support, partnerships, infrastructure, management style, and ICT, are appropriate.
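Two of the ranking parameters mentioned above, information gain and gain ratio, can be illustrated with a short Python sketch. It follows the usual entropy-based definitions (gain ratio is information gain divided by the entropy of the attribute's own value distribution) and is not the Orange code used in the study; the attribute values and class labels are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (base 2) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def info_gain_and_ratio(attribute_values, labels):
    """Return (information gain, gain ratio) of one attribute for the labels."""
    n = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    # Expected entropy of the labels after splitting on the attribute.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(attribute_values)
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio

# Hypothetical data: four villages, two attributes.
labels = ["advanced", "advanced", "developing", "developing"]
attributes = {
    "government_support": ["strong", "strong", "weak", "weak"],
    "management": ["collaborative", "single", "collaborative", "single"],
}
results = {name: info_gain_and_ratio(vals, labels) for name, vals in attributes.items()}
order = sorted(results, key=lambda name: results[name][0], reverse=True)
print(results, order)
```

Sorting the attributes by information gain reproduces the kind of ordering reported in Table 4, with the most informative attribute first.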

Tourism Village Development Strategy
This study proposes four alternative strategies for tourist village development, i.e., (1) digital tourism village development, (2) infrastructure strengthening, (3) institutional strengthening, and (4) social empowerment (capacity building). The infrastructure strengthening strategy emphasizes the development of infrastructure networks, tourism facilities, and accessibility in realizing advanced and sustainable tourism villages. This strategy is directed at increasing the quantity and quality of tourism village infrastructure. The basic infrastructure of rural tourism includes: (1) public infrastructure, including roads, clean water, electricity, telecommunications facilities and internet networks, as well as environmental safety; and (2) rural tourism facilities, including dining and restaurant facilities, gift shops, public toilets, prayer rooms, parking lots, photo spots, and homestays.
Institutional strengthening strategy is a strategy directed at developing good tourism governance that involves all stakeholders. Tourism is an activity that is multi-sectoral and borderless (no administrative boundaries), therefore its development requires coordination between various parties and the integration of various policies. Good governance will regulate the main tasks and functions of each party and establish coordination and formal relationships that benefit all parties.
Capacity building is a strategy that focuses on increasing the capacity of the community to have the ability to manage and develop tourism destinations professionally. This strategy is carried out through a series of programs aimed at improving tourism services, increasing community entrepreneurial abilities, and increasing the habituation of local communities to cultural differences, behaviour, and attitudes of tourists.
The development of digital tourism villages is a strategy directed at implementing digital technology in the management and development of tourist villages. This strategy has been adopted by advanced tourist villages in line with efforts to meet the wishes of tourists who need accurate information and fast service. Digitalization will help optimize service processes internally and externally, work automation, and cost efficiency so that the management of tourist villages is more efficient and effective. Digital transformation must be carried out simultaneously between technology and the readiness of people/humans.
The determination of the four alternative strategies is based on the existing conditions of the tourist village and on the likely future demands of the environment and needs of tourists. The criteria used to evaluate the strategies are the determinant factors found in the decision tree process, with the Info.Gain values used as the criterion weights. Following the Promethee concept, the tourism profile data for the Kedung Ombo area (Table 2) are assessed on a Likert scale of 1-5. Fig. 3 shows the ranking of the alternative strategies produced by the Promethee software. In the partial ranking (left image), the left bar shows the ranking based on the greatest strengths of the alternative strategies (outgoing flow, Φ+), while the right bar shows the ranking based on the fewest weaknesses (incoming flow, Φ-). Together these form the partial ranking (Promethee I). In Fig. 3, both the outgoing flow (Φ+) and the incoming flow (Φ-) place the infrastructure strengthening strategy at the top. The infrastructure strengthening strategy therefore dominates the other strategies, followed by capacity building, institutional strengthening, and digital tourism village development. The ranking is reinforced by the Promethee II illustration, which shows the net flow (the difference between the outgoing and incoming flows) for each strategy: infrastructure strengthening has the highest score of 0.3038, followed by capacity building at 0.1413, institutional strengthening at 0.0124, and digital tourism village development at -0.4575. The negative value for the digital tourism village development strategy indicates that this strategy has not been properly implemented in tourist villages in the Kedung Ombo area.
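The reported net flows can be checked with a few lines of Python. In Promethee II the net flows of all alternatives sum to zero, because every preference counted in one alternative's outgoing flow is counted in another's incoming flow; the sketch below uses the four values reported above to confirm this property and to reproduce the ranking. The variable names are illustrative.

```python
# Net flows as reported in the text for the four strategies.
net_flow = {
    "infrastructure strengthening": 0.3038,
    "capacity building": 0.1413,
    "institutional strengthening": 0.0124,
    "digital tourism village development": -0.4575,
}

# Promethee II ranking: highest net flow first.
ranking = sorted(net_flow, key=net_flow.get, reverse=True)

# Consistency check: net flows should sum to (approximately) zero.
total = sum(net_flow.values())

for position, strategy in enumerate(ranking, start=1):
    print(position, strategy, net_flow[strategy])
print("sum of net flows:", round(total, 4))
```

The three positive values add up to 0.4575, which exactly offsets the negative value for digital tourism village development, so the reported figures are internally consistent.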
The findings of the strategy ranking are influenced by the existing conditions of the research object on each factor as shown in Fig. 4. This figure shows that the infrastructure strategy (top right image) has the most advantages, indicated by factors that are above the horizontal line, followed by a capacity building strategy, and institutional strengthening. Meanwhile, the digital tourism village strategy shows that all factors are below the horizontal line, which means that there are no supporting factors to implement the strategy.

Fig. 4. Contribution of Each Attribute to Strategy
This finding explains the importance of adding and developing infrastructure in the tourist villages in Kedung Ombo. Infrastructure is a factor that determines the quality of tourism village development in terms of accessibility and telecommunication networks. Adequate infrastructure will become a locomotive for tourist arrivals and will improve the quality of life and welfare of the local community by increasing consumption values, labor productivity, and access to job opportunities. This strategy is needed because the Kedung Ombo area is quite far from the main road, has poor road conditions, and has limited electricity and ICT facilities. Technically, the strengthening of infrastructure in the Kedung Ombo area includes road access, signposts, lighting, and internet networks as tourism infrastructure. The facilities that must be provided are eating/restaurant facilities, public toilets, prayer rooms, souvenir stalls, homestays, and other facilities needed by visitors. The capacity building strategy will improve the ability of managers and the community to manage tourism villages. This strategy can be carried out through training for managers and the community, including training on tourism services, digital marketing, and entrepreneurship. Training can be carried out through visits to developed tourist villages or through on-the-job training in collaboration with educational institutions. Institutional strengthening strategies are important in this area because many parties are involved in both licensing and development. These parties are reservoir management institutions, forest management institutions, local governments, and the private sector. Strengthening coordination and regulations involving these parties will greatly help the progress of tourist villages.

Fig. 5. Tourism Village Development Strategy Network
The digital tourism village strategy is nonetheless important for the Kedung Ombo area because, as Fig. 5 shows regarding the relationships between strategies, all strategies lead to village readiness to develop digital tourism villages. The Promethee method is a decision-making method that uses weights on evaluation criteria to reflect the assessment priorities of the decision maker. Such weighting allows weights to differ from one decision maker to another, so the results may seem less consistent. In addition, bounded rationality and incomplete information in allocating weights can make the ranking results less stable. To test the robustness of the ranking results of the Promethee algorithm, it is necessary to perform an interval stability test (or sensitivity test) to determine the effect of variations in the criterion weights on the final ranking. The ranking results are said to be robust if the weight variation does not change the ranking order (Schwartz & Gothner, 2009). To test the ranking of tourism village development strategies in this study, the weights were manipulated using absolute weight scenarios. The results show that, of the seven criteria, changing the infrastructure weight does not change the order of the strategies (infrastructure strengthening, capacity building, institutional strengthening, and digital tourism village), even when the infrastructure weight is changed from 16% at baseline to 100% (Fig. 6). The same holds when the management weight is changed from 7% to 100%. Attraction, by contrast, is relatively sensitive: when its weight is increased by more than 10%, capacity building becomes the best strategy. On this basis, the ranking results are considered robust.
Fig. 7 presents the Promethee GAIA plane, a Principal Component Analysis projection of the criteria and strategies. The blue squares represent the strategies, while the dark blue diamonds represent the criteria. The red bar indicates the good and bad directions for both strategies and criteria. Strategies located toward the northeast corner are the best strategies with respect to the criteria. For example, management, partnership, and attraction are best suited to the capacity building strategy, while infrastructure and management are best suited to the strategies of strengthening infrastructure, institutions, and the digital village.
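The weight sensitivity test described above can be sketched in Python. The sketch relies on the fact that the Promethee II net flow of an alternative is the weighted sum of its single-criterion net flows, so the ranking can be recomputed quickly for any set of weights. One criterion's weight is raised step by step from its baseline to 100% while the remaining weights are rescaled proportionally, which is an assumed rescaling rule. All names and numbers are hypothetical and do not come from the study.

```python
def reweight(weights, criterion, new_weight):
    """Set one criterion's weight and rescale the others so all weights sum to 1."""
    rest = sum(w for c, w in weights.items() if c != criterion)
    scale = (1.0 - new_weight) / rest
    return {c: (new_weight if c == criterion else w * scale) for c, w in weights.items()}

def rank(unicriterion_flows, weights):
    """Rank alternatives by net flow, the weighted sum of single-criterion flows."""
    net = {
        alt: sum(weights[c] * flows[c] for c in weights)
        for alt, flows in unicriterion_flows.items()
    }
    return sorted(net, key=net.get, reverse=True)

def stable_over(unicriterion_flows, weights, criterion, steps=20):
    """True if the ranking is unchanged as the weight rises from baseline to 1."""
    baseline = rank(unicriterion_flows, weights)
    w0 = weights[criterion]
    for i in range(1, steps + 1):
        w = w0 + (1.0 - w0) * i / steps
        if rank(unicriterion_flows, reweight(weights, criterion, w)) != baseline:
            return False
    return True

# Hypothetical single-criterion net flows for two strategies.
flows = {
    "strategy X": {"infrastructure": 0.5, "attraction": -0.2},
    "strategy Y": {"infrastructure": -0.5, "attraction": 0.2},
}
weights = {"infrastructure": 0.6, "attraction": 0.4}

infra_stable = stable_over(flows, weights, "infrastructure")
attraction_stable = stable_over(flows, weights, "attraction")
print(infra_stable, attraction_stable)
```

In this toy case the ranking is insensitive to the infrastructure weight but changes once the attraction weight grows large enough, which mirrors the kind of contrast reported for the two criteria in the study.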

Conclusions
The development of rural tourist villages faces complex and challenging issues. Hence, a paradigm shift in determining the right rural tourism policy is needed. Machine learning will be critical in helping to develop more objective, science-based, and data-driven policies in the future, while decision making with an outranking approach is a suitable method for accommodating the situation and potential of various tourist villages.
In this study, the decision tree method proved to be a suitable predictive method for describing the profile of tourist villages and the determinant factors of their progress. Similarly, the attributes of community participation, variety of attractions, government support, partnerships, infrastructure, management, and ICT are appropriate as determinant factors of tourist village progress. This study finds that government support is the main factor predicted to determine the progress of tourist villages. In addition, good application of ICT, complete infrastructure, community participation, and multi-stakeholder partnerships must also be developed. Similarly, varied tourist attractions are another essential factor that must be developed in an integrated way because they determine tourist village development.
From the Promethee analysis, the right strategies for developing tourist villages in the Kedung Ombo area are, in order, infrastructure strengthening, capacity building, institutional strengthening, and digital tourism village development. Implementing these four strategies in an integrated manner will optimize efforts to develop a tourist village into an advanced tourist village. Complete infrastructure support, accompanied by qualified community capabilities in managing tourism businesses, strong institutions with strengthened partnerships and networks, and digital implementation in services and promotion, will lead tourist villages to meet tourist demands and keep pace with technological advances.