A Survey of Data Mining Techniques for Smart Museum Applications

A Survey of Data Mining ... | Puspasari, S., Ermatita, 33 – 42
Shinta Puspasari1, Ermatita2*
Department of Informatics, Universitas Indo Global Mandiri, Palembang, Indonesia
Department of Computer Science, Universitas Sriwijaya, Palembang, Indonesia
1shinta@uigm.ac.id, 2*corr-author: ermatita@unsri.ac.id

Abstract
This research aims to identify which data mining techniques are effectively implemented in museums and which application trends are currently used to improve museum performance in the move towards modern museums based on intelligent system technology. The review covered a number of articles found in journals and proceedings from the 2004-2020 period. The majority of data mining techniques were found to be implemented in museum virtual guide applications, recommender systems, collection clustering and classification systems, and visitor behaviour prediction applications. Data classification, clustering, and prediction techniques are commonly used for museum applications. Collections with historical and artistic value contain a great deal of knowledge, which makes data mining an important technique to include in various museum applications so that they can contribute to the achievement of museum goals not only in education and culture but also in economics and business.


I. INTRODUCTION
Museums store collections of objects of historical, artistic and cultural value. A museum exists to preserve the culture and history of the past so that they remain known to future generations. The museum collection is valuable data and an attraction that draws visitors who come to learn about it. Museum collection data can take the form of pictures, text, objects, etc., all of which contain information for visitors [1]. If the information presented to visitors is monotonous and not in line with expectations, visitors will not be interested in exploring the museum, and will not even recommend it to their colleagues or revisit it [2]. Therefore, modern museums are currently competing to change their business strategy to improve the visitor experience [3] by adding a touch of information and communication technology in the digital era [4], one approach being the implementation of data mining techniques for smart museum applications.
Data mining is a technique that aims to extract important information for specific purposes [5]. In museum applications, data mining extracts information from data held by museums for various purposes that improve museum performance. These include exploring visitors' interest in museum collections [6], which can be used to recommend museum exhibitions to visitors, and predicting the level of museum visits in a certain period so that events that increase visitor interest can be prepared [7][8][9]. Other applications use an artificial intelligence approach to make collections interactive [10], augmented reality that lets visitors interact virtually with collections so that they seem to be in the past where the collection originated, and engaging games so that visitors do not get bored while in the museum [11]. Modern museums thus become an interesting place to study as well as to travel [12], in part by utilizing applications based on data mining techniques [13].
In this paper, we discuss a number of data mining techniques that have been implemented in museums for various application purposes. The review identifies which data mining techniques extract information in museums for use by museum management and for presentation to museum visitors, so that the performance of the museum can be optimized in terms of education, economics and business.
The rest of this paper is organized as follows: Section 2 provides a brief literature review; Section 3 discusses the methodology that was followed in this study; Section 4 presents the findings on the implementation of data mining techniques for museum applications; and finally Section 5 concludes the paper.

A. Review of Data Mining Techniques
Data mining is carried out on data sets in various formats, with the general process stages shown in Fig. 1 [14]. The data set stored in the data warehouse undergoes selection and preprocessing at the beginning of the data mining process in order to extract information or knowledge with minimum bias. The techniques used at each stage of the data mining process are adjusted to the data format in the data warehouse.
In museums, collections are not only in the form of image documents but can also be in the form of text, spatial data, and video, along with various other data related to visitors and museum management, thus forming big data [15][16]. Data mining techniques therefore need to be included in smart museum applications in accordance with the data to be mined and the purpose of the information extraction [17]. Predicting visitor interest and classifying museum collections, for example, have different targets even though they may use the same dataset. As a result, various data mining techniques have emerged that are effectively implemented for various museum applications.

1) Association Rule:
As the most mature and widely used technique in data mining, the association rule reflects the correlation among data in a dataset [18]. The technique generates association rules from the dataset that satisfy a minimum support and a minimum confidence, where the support and confidence of an association rule represent the rule's effectiveness and reliability. The process contains two steps: finding the item sets whose transaction support is greater than the minimum support (the frequent item sets), and generating strong association rules from those frequent item sets. How the frequent item sets are found affects the performance of association rule mining [19]. Finding the best associations that meet the requirements in the dataset is a high-cost process. Optimizations of association rule mining have been proposed to improve its performance [19][20][21].
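The two steps described above can be illustrated with a minimal sketch. It is restricted to rules between single items, and the visit records, item names, and thresholds are hypothetical values chosen for illustration only; they do not come from the surveyed studies.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / support(antecedent, transactions)


def strong_rules(transactions, min_support, min_confidence):
    """Step 1: keep frequent item pairs. Step 2: keep rules that are strong."""
    items = sorted(set().union(*map(set, transactions)))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue  # pair is not frequent, so no rule is generated from it
        for lhs, rhs in ((a, b), (b, a)):
            conf = confidence({lhs}, {rhs}, transactions)
            if conf >= min_confidence:
                rules.append((lhs, rhs, pair_support, conf))
    return rules


# Hypothetical data: the collections each visitor viewed during one visit.
visits = [
    {"painting", "statue"},
    {"painting", "statue", "coin"},
    {"painting", "coin"},
    {"statue"},
]
rules = strong_rules(visits, min_support=0.5, min_confidence=0.9)
# rules == [("coin", "painting", 0.5, 1.0)]
```

Here the only strong rule is "coin implies painting": every visitor who viewed the coin collection also viewed a painting. Checking every pair exhaustively is what makes the search costly on large datasets, which is the motivation for the optimizations cited above.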
2) Decision Tree:
This data mining technique has a tree-like structure where the top is the root, each node holds an attribute to be analysed, each branch represents a possible outcome for that attribute, and each leaf gives the resulting output. The structure is simple but reliable for processing multi-dimensional data [22] and is widely used for prediction and classification [23][24][25]. The challenge with this method is that the analysis needed to find the optimum tree structure under given conditions is repeated, which affects application performance.
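The repeated analysis mentioned above is, in many decision tree algorithms, the search for the attribute that best splits the data at each node. The sketch below assumes information gain as the criterion (as in ID3); the source does not name a criterion, and the visitor attributes and labels are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Reduction in entropy obtained by splitting the rows on `attribute`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels):
    """Attribute with the highest information gain; it becomes the next node."""
    return max(rows[0].keys(), key=lambda a: information_gain(rows, labels, a))


# Hypothetical visitor records and the visit length to be predicted.
rows = [
    {"age": "child", "guided": "yes"},
    {"age": "adult", "guided": "yes"},
    {"age": "child", "guided": "no"},
    {"age": "adult", "guided": "no"},
]
labels = ["long", "long", "short", "short"]
root = best_attribute(rows, labels)  # "guided" separates the labels perfectly
```

A full tree would apply `best_attribute` again to each subset of rows under the chosen node, which is where the repeated computation arises.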

3) K-Nearest Neighbour:
This technique looks for the K labelled data points closest to a given data point, which is then assigned to the class that is most common among them. The Euclidean distance, as in (1), or another distance measure is used as the basis for determining the classification. This method is widely applied for data classification and pattern recognition [26]. KNN is a high-cost technique because it has to compute the distances between the data point and all other data [27] in order to find the K closest ones, which represent the similarity of the data to a certain group.
$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \quad (1)$$
where d(x,y) is the Euclidean distance between x and y, and x_i and y_i are the i-th of their n attribute values.
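A minimal sketch of nearest-neighbour classification with the Euclidean distance referred to in (1) is given below. The feature values (taken here to be two numeric measurements per object) and the class names are hypothetical.

```python
from collections import Counter
from math import sqrt


def euclidean(x, y):
    """Euclidean distance between two equal-length numeric vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def knn_classify(query, samples, labels, k=3):
    """Assign `query` the most common label among its k nearest samples."""
    # Every sample is compared with the query, which is the costly part.
    order = sorted(range(len(samples)), key=lambda i: euclidean(query, samples[i]))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labelled collection items described by two measurements.
samples = [(10, 12), (11, 13), (12, 11), (40, 60), (42, 58), (41, 61)]
labels = ["coin", "coin", "coin", "painting", "painting", "painting"]

predicted = knn_classify((39, 59), samples, labels, k=3)  # "painting"
```

Using an odd K with two classes avoids tied votes; the choice of K itself is an assumption here and would normally be tuned on the data.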

4) Hierarchical Clustering:
The data are grouped using a hierarchical approach to form a tree-like structure in which each level of the tree shows a different grouping. A cluster can be split or merged at a certain level depending on whether the similarity criteria between clusters are met. The agglomerative hierarchical clustering approach begins with separate data points, which are grouped gradually until they form a single group (a bottom-up process) [28][29], while the divisive hierarchical clustering approach works top-down, starting from one cluster that is split down to the smallest elements of the data set. Hierarchical clustering has also been combined with other data mining methods in hybrid approaches for input optimization purposes [30].
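The bottom-up process can be sketched as follows. The sketch assumes single linkage (the distance between two clusters is the distance between their closest members) and stops at a requested number of clusters rather than merging down to one group; both choices, and the sample points, are assumptions for illustration.

```python
from math import sqrt


def euclidean(x, y):
    """Euclidean distance between two equal-length numeric vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def single_linkage(c1, c2):
    """Distance between two clusters: that of their closest pair of points."""
    return min(euclidean(a, b) for a in c1 for b in c2)


def agglomerative(points, target_clusters):
    """Merge the two closest clusters until `target_clusters` remain."""
    clusters = [[p] for p in points]  # start with every point on its own
    while len(clusters) > target_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into i
        del clusters[j]
    return clusters


# Hypothetical two-dimensional feature vectors of collection items.
points = [(1, 1), (1, 2), (8, 8), (8, 9), (20, 20)]
groups = agglomerative(points, target_clusters=3)
```

Recording the merges made at each iteration, instead of only the final grouping, would give the tree-like structure described above.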

5) Fuzzy Classifier:
The fuzzy classifier overcomes constraints on linguistic variables. By applying fuzzy logic, this method is able to model uncertainty and so form adequate boundaries to separate data groups [31]. However, combining learning algorithms with fuzzy logic leads to high computational cost, which becomes a challenge in the era of big data and industrial applications that demand accurate real-time analysis results [32].
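As an illustration of how fuzzy logic handles a linguistic variable, the sketch below assumes triangular membership functions and assigns a value to the term with the highest membership degree. The variable (visit duration in minutes), the terms, and their boundaries are hypothetical, and the learning component mentioned above is not shown.

```python
def triangular(x, left, peak, right):
    """Degree (0 to 1) to which x belongs to a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def memberships(x, terms):
    """Membership degree of x in every linguistic term."""
    return {name: triangular(x, *shape) for name, shape in terms.items()}


def classify(x, terms):
    """Linguistic term in which x has the highest membership degree."""
    degrees = memberships(x, terms)
    return max(degrees, key=degrees.get)


# Hypothetical linguistic terms for visit duration, as (left, peak, right).
TERMS = {
    "short": (0, 15, 40),
    "medium": (20, 45, 80),
    "long": (60, 90, 120),
}

degrees = memberships(35, TERMS)  # partly "short", mostly "medium"
label = classify(35, TERMS)       # "medium"
```

A 35-minute visit belongs to both "short" and "medium" to different degrees, which is how overlapping sets model the uncertainty at class boundaries.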

6) Artificial Neural Network:
An Artificial Neural Network (ANN) adopts the way biological neural systems work, processing input into target outputs based on a learning process. A neural network architecture consists of a set of nodes arranged in connected layers that send signals to each other, trained with various learning methods and algorithms. ANNs have proven effective for classification, pattern recognition, and prediction in various applications [33][34][35].
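How signals pass between connected layers can be sketched with a small feed-forward network. The sketch assumes fully connected layers with a sigmoid activation; the weights and biases are hypothetical fixed values, and the learning process that would normally adjust them is not shown.

```python
from math import exp


def sigmoid(z):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + exp(-z))


def layer(inputs, weights, biases):
    """Output of one layer: each node sums its weighted inputs plus a bias."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]


def forward(inputs, network):
    """Pass the signal through every layer in turn."""
    signal = inputs
    for weights, biases in network:
        signal = layer(signal, weights, biases)
    return signal


# Hypothetical network: 2 inputs -> 2 hidden nodes -> 1 output node.
network = [
    ([[0.5, -0.2], [0.1, 0.4]], [0.0, 0.1]),  # hidden layer weights, biases
    ([[1.0, -1.0]], [0.0]),                   # output layer weights, biases
]
output = forward([1.0, 2.0], network)  # a single value between 0 and 1
```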

7) Support Vector Machine:
Support Vector Machine (SVM) is a supervised learning method that is widely used for pattern recognition, classification and prediction in various fields of application, on both text and image data [36]. SVM is used for linear and non-linear data: the input is transformed by a chosen method, and the SVM then looks for a hypersurface in the input data set that is able to separate the data into two groups. SVM processes high-dimensional data effectively and in less time, for example in text analysis [37].
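Once a separating hypersurface has been found, assigning a data point to one of the two groups is simple. The sketch below assumes the linear case, where the hypersurface is a hyperplane defined by a weight vector w and an offset b; the values of w and b are hypothetical, and the training step that an SVM uses to find them is not shown.

```python
def decision_value(x, w, b):
    """Signed score w . x + b; its sign tells which side of the hyperplane x is on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(x, w, b):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if decision_value(x, w, b) >= 0 else -1


# Hypothetical hyperplane in two dimensions: x1 - x2 = 0.
w, b = (1.0, -1.0), 0.0

group_a = predict((2.0, 1.0), w, b)  # +1
group_b = predict((1.0, 2.0), w, b)  # -1
```

For non-linear data, the same decision rule is applied after the input has been transformed, as described above.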

B. Research Steps
This study is a systematic literature review that provides evidence as the result of the study [38]. The review seeks to answer questions about data mining techniques applied to museum applications. Traditional museums that have transformed into modern ones by adopting information and communication technology in museum operations, combined with the application of intelligent systems, promise a more interesting experience for visitors exploring museums and improved museum performance [4]. This paper begins with research questions that must be clear, concise, focused, significant, feasible, and answerable, so that each stage is formulated to answer the research problems. The research begins with formulating the objective of the review and developing the dataset, and then proceeds with a review process using a quantitative approach.

1) Research Questions:
Q1: What are the data mining techniques for museum applications?
Q2: What is the latest trend in museum applications implementing data mining techniques?
2) Development of Dataset:
In this study, a data set was built containing a collection of articles from the search results of two databases. The stages of developing the data set are as follows:
• Keywords Identification: Searching the database literature with inaccurate keywords results in a very large data set, so that more effort is needed to select from the data obtained. The keywords in this study are "DATA MINING TECHNIQUE" AND "MUSEUM". The keyword-based search method was adjusted to the search engine of each database used. To review the current state of research on data mining techniques in museum applications, a bibliometric mapping analysis was carried out with the VosViewer [39] tool. Based on the results of data processing, an overlay visualization was obtained, as shown in Fig. 2. The bigger the circle, the more publications there are regarding the term [40].
The word MUSEUM is related to the term DATA MINING TECHNIQUE, with the density indicated by the color distribution and the size of the circle, which represents the weight of the occurrence of words in the data set related to the above keywords. "DATA MINING" is related to the words "APPLICATION" and "SYSTEM", and all three are directly related to the word "MUSEUM". These results indicate that there are previous studies on the application of data mining techniques in museums, but with a low publication density compared to publications relating to museums in general.
This research carries out a deeper search to establish the facts behind the analysis based on the VosViewer results.
• Database selection: In this study, the IEEE Xplore and ResearchGate databases were used, on the consideration that IEEE Xplore contains a collection of trusted and up-to-date publications, while ResearchGate contains a larger, freely accessible collection for finding publications relevant to the research objectives. A constraint arose in selecting the retrieved articles one by one, because the total number of retrieved documents is unknown and many publications are not relevant to the purpose of the study. Therefore, for ResearchGate, the top 500 results were used as the initial data set. Meanwhile, IEEE Xplore returned 87 documents relevant to the same query, "DATA MINING MUSEUM". A total of 587 articles formed the data set, which then underwent the stages shown in Fig. 3 and the review process.
• Articles Selection: At this stage, the articles returned by the search in both databases were assessed through a quick review of the titles and abstracts of articles from the 2004-2020 period to determine whether they were appropriate to the research questions. The result of this stage is a dataset containing articles that are ready to be reviewed to answer the research questions. The article selection process takes a quantitative approach to the articles in the dataset, following PRISMA as used by [41], with the steps shown in Fig. 3.

Fig. 3 Data set development phases for articles review
III. RESULTS AND DISCUSSION
This study reviewed the dataset extracted from the IEEE Xplore and ResearchGate databases with the keyword "Data Mining Museum". A total of 17 articles were included in the analysis of data mining techniques for museum applications in order to answer research questions Q1 and Q2.

A. Trend of Museum Applications
The museum is an institution that stores a collection of ancient objects of historical and cultural value. The mindset that museums must remain old-fashioned in order to maintain that identity is being abandoned along with the development of information and communication technology in the digital era. The development of data mining applications in museums can be seen from the publications reviewed in this study, covering the 2004-2020 period.

1) Distribution of Publication Years:
The 17 articles in this research data set consist of conference and journal articles from the 2004-2020 period, of which 24% are from reputable international journals and 76% from indexed international conference proceedings. The distribution of publication years was grouped into two periods, namely 2004-2010 and 2011-2020, with the percentages shown in Fig. 4. The chart in Fig. 4 shows that the development of museum applications with data mining has increased rapidly since 2010, with 71% of the publications published in the last 10 years. This fact shows that data mining for museum applications has improved technically, especially in the last 3 (three) years, as seen from the articles published in journals in 2018.

2) Type of Dataset:
The museum maintains a collection of historical objects including tools, paintings, books, and photos, which can be digitized in image, video, sound, and text formats. The results of the literature review in this study show that classification and clustering are carried out on digital museum collections, that virtual museums are created for museum exploration, and that data mining is applied to text analysis for recommendations and prediction of visitor behaviour. Of the reviewed studies, 32% use image data and 68% use alphanumeric text data. Data in museums has even developed into big data for various analysis purposes as the virtual museum continues to develop in the digital era [42].

3) Objective of the research study:
The analysis of the articles in the data set shows that the applications of data mining techniques in museums can be grouped by purpose, namely museum virtual guides, collection classification and clustering systems, recommender systems, and prediction of museum visitor behaviour. Interestingly, the articles published in the first period cover virtual guide applications, collection classification/clustering, and visitor behaviour prediction, whereas the development of recommender systems started in the second period. The distribution of the number of articles across the four categories of data mining applications in museums is illustrated in Fig. 5, while the distribution per period for each category is shown in Fig. 6. These findings answer research question Q2 on the trend of museum applications implementing data mining techniques: recommender systems and applications for predicting visitor behaviour are the current trend. Most of them mine knowledge that is used to attract visitors' engagement with the museum, which in turn affects museum performance.

4) Data Mining Techniques:
The review found that the data mining techniques implemented for museum applications are Association Rule, Neural Network, Fuzzy Logic, Support Vector Machine, Hierarchical Clustering, K-Means, and Nearest Neighbour, and that some studies use a hybrid approach combining a number of data mining methods with other methods. Research question Q1 is answered by the detailed information in Table I.

B. Challenge in Implementing Data Mining for Museum Applications
The current trend in implementing data mining techniques in museums is recommender systems and visitor behaviour prediction applications; 64% of articles from the 2016-2020 period took these two as research topics. The purpose of implementing data mining in both application categories is to increase visitor interest in the museum, to help visitors explore the museum exhibition more easily, and to manage the museum by arranging showrooms and collections in accordance with visitor interests, so that the museum is valuable not only historically and culturally but also economically and commercially.
The variety of visitor behaviour and characters is a challenge that museums must study in order to optimize their role in the digital era. The rapid development of the internet demands data mining techniques that can process big data effectively, especially in museums with a high number of visits, so that the extracted information can be used for modern museum management. Even a museum with an unchanged collection can still attract visitors to return, which results in a big data set whose use can be optimized by implementing management strategies based on information extracted by data mining applications [43].

IV. CONCLUSION
This research is a literature study that aims to find which data mining techniques are effectively implemented for museum applications and which application trends based on data mining are currently used by museums, as reported in published articles and journals. Data were collected from the IEEE Xplore and ResearchGate databases and put through a selection process to obtain a data set. The final dataset includes 17 journal and proceedings articles published in the period 2004-2020. The results of the review show that the commonly used data mining techniques are the association rule and techniques for classification and clustering, including K-Means, K-Nearest Neighbour, Hierarchical Clustering, Neural Network, and SVM, which are applied to virtual guide systems, recommender systems, collection classification/clustering, and prediction of museum visitor behaviour. Recommender systems and applications for predicting visitor behaviour are the latest trend in museum applications implementing data mining techniques. The results indicate that data mining techniques are effective in extracting information in the form of knowledge that is presented in a form suitable for the application's purpose in a museum, and that their implementation in museum applications has shown an increasing trend during the last 3 years. The challenge for future work is to examine the effectiveness of data mining techniques for big data in museums, considering that the large number of museum collections and of visitors with various cultural and character backgrounds will produce a fast-growing data set to be analysed to produce knowledge that can be used by modern museums.