USING SIMILARITY DEGREES TO IMPROVE FUZZY MINING ASSOCIATION RULE BASED MODEL FOR ANALYZING IT ENTREPRENEURIAL TENDENCY


ABSTRACT: Higher education has great potential for producing new startups in the IT (Information Technology) field. Many factors influence students to become IT entrepreneurs. Association rule mining can be used to analyse data and derive rules for a model of IT entrepreneurship among students, but association algorithms have disadvantages in handling large datasets. We propose reducing candidate itemsets using degrees of fuzzy similarity. The membership function of a fuzzy set can be used to measure the quality of the rules obtained. The purpose of this study is to improve the algorithm by evaluating the similarity of candidate itemsets in order to obtain good-quality rules. The method has two phases: (1) calculating membership functions from itemset similarity and (2) applying fuzzy association rule mining. Phase 1 consists of preparing a transaction database, building a taxonomy, and identifying similar itemsets. Phase 2 consists of defining membership functions, fuzzy mining, and generating fuzzy association rules. In this study, a questionnaire was distributed to 1225 students who were members of the IT entrepreneurship program. The 1225 itemsets were reduced to 823, from which an IT entrepreneurship rule model was produced.

INTRODUCTION
Data mining technology is widely used to analyse large amounts of data through tasks such as classification, clustering, and association. The Apriori algorithm is a classic and widely used association algorithm. Classic association algorithms look for relationship patterns between one or more itemsets in a dataset, and they are known for finding high-frequency patterns. These frequent itemsets are used to compile association rules [1]. Frequent itemsets therefore play an important role in determining association patterns, and they can be used to detect patterns in a dataset [2]. The key to the association algorithm is its iterative passes over the database: each iteration counts occurrences to obtain the support value of each candidate itemset. An improved association algorithm can reduce the number of candidates whose support must be calculated by removing unnecessary objects and attributes to obtain a minimal attribute set, and it performs better than the classic association algorithm [3].
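The level-wise search described above (count support in one database pass, keep the frequent itemsets, join them into larger candidates) can be sketched in Python as follows. This is a minimal, unoptimized illustration of the classic Apriori idea, not the implementation used in this study; the function name `apriori` and the toy transactions are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset.

    Support is the fraction of transactions that contain the itemset.
    """
    n = len(transactions)
    # Level 1: every single item is a candidate.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # One scan of the database counts the support of every candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-itemsets, then prune candidates
        # that have an infrequent k-subset (the Apriori property).
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical toy transactions (the first mirrors the MD, AI, EB example).
transactions = [
    frozenset({"MD", "AI", "EB"}),
    frozenset({"MD", "SE", "WD"}),
    frozenset({"MD", "AI", "WD"}),
    frozenset({"SE", "D"}),
]
frequent = apriori(transactions, min_support=0.5)
```

With a minimum support of 0.5, the frequent itemsets are MD, AI, SE, WD, {MD, AI} and {MD, WD}; the candidate {MD, AI, WD} is pruned without a database scan because its subset {AI, WD} is infrequent.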
Several studies have sought to improve the performance of association algorithms, for example by reducing the time needed to compute candidate itemsets by reusing the frequent itemsets to build the next candidate itemsets [4]. Common approaches to improving the association algorithm include [5][6][7][8]: (1) transaction reduction, in which transactions that do not contain any frequent itemset are removed from subsequent scans; (2) hash tables, which count itemset occurrences; and (3) partitioning, which relies on the fact that any itemset that is frequent in the whole database must be frequent in at least one partition. Fuzzy theory can also be used to improve the association algorithm. The motivation is that dividing quantitative values into crisp sets causes values near a set boundary to be underemphasized or overemphasized. Fuzzy sets solve this problem by allowing an element to be a member of several sets with different degrees of membership. Reference [9] applied fuzzy association rule mining to discover association relationships. Mining fuzzy rules may lead to the discovery of more general and important knowledge from the data [10][11][12].
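Transaction reduction, the first approach above, amounts to a single filtering step between database scans. The sketch below is a minimal illustration assuming transactions and itemsets are stored as Python frozensets; the function name and data are hypothetical.

```python
def reduce_transactions(transactions, frequent_k):
    """Keep only transactions containing at least one frequent k-itemset.

    A transaction with no frequent k-itemset cannot contain a frequent
    (k+1)-itemset, so it can be skipped in every later scan.
    """
    return [t for t in transactions if any(itemset <= t for itemset in frequent_k)]


# Hypothetical data: transactions and the frequent 2-itemsets found so far.
transactions = [
    frozenset({"MD", "AI", "EB"}),
    frozenset({"MD", "SE", "WD"}),
    frozenset({"SE", "D"}),
]
frequent_2 = [frozenset({"MD", "AI"}), frozenset({"MD", "WD"})]
remaining = reduce_transactions(transactions, frequent_2)
```

Here the transaction {SE, D} contains neither frequent pair, so only two transactions remain for the next scan.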
The focus of this study is reducing the data by measuring the similarity of the items in each transaction. Similarity is measured as the average distance between items. Distance-based similarity measures are used for decision making based on certain criteria, so the similarity measure must be able to distinguish between fuzzy sets so that they do not overlap [13,14]. After similarity is measured, an α-cut is used to reduce the dataset. The α-cut contains the members whose membership value is not less than α. The choice of α affects the results; an appropriate α is selected based on the previous experience of the decision maker and/or the case under study [15][16][17]. Association algorithms are used to analyse data to obtain a model, but they have disadvantages in handling large datasets. We therefore propose reducing candidate itemsets using fuzzy similarity degrees computed from the distances between items in the dataset.

METHOD
There are two phases in this research: computing the membership functions and applying the fuzzy mining association rule. The first phase consists of (1) preparing the transaction database, (2) the taxonomy process, and (3) identifying similar items; the second phase consists of (4) defining membership functions (MFs), (5) fuzzy mining, and (6) generating fuzzy association rules. Figure 1 describes the research method.

Transaction Database
In this process, historical data are collected and used to find order, patterns, or relationships in the dataset. The dataset used in this study consisted of 1225 itemsets from questionnaire survey data of students who took part in entrepreneurship programs at Muria Kudus University. In Table 1, T1, T2, T3, ..., T1225 are the IDs of the participating students. Each student was surveyed on the knowledge they would use to start out as an independent IT entrepreneur. The choices were Mobile Development (MD), Software Engineering (SE), Database (D), Web Development (WD), Artificial Intelligence (AI), Robotics (R), Mechanics (M), Electric (E), e-marketing (EM), Graphics Design (DG), Photography (P), e-learning (EL), and e-business (EB).

Taxonomy Process
Relevant taxonomy items are usually predetermined and can be represented as tree hierarchies. Figure 2 shows the IT entrepreneurship taxonomy. This taxonomy was built by combining computer science taxonomies [18][19][20][21] with the concept of entrepreneurship [22,23]. Through this taxonomy, engineering students can see opportunities to use IT and engineering to be more innovative, to share knowledge, and to use technology to develop their businesses.
Fig. 2: Taxonomy of IT-Entrepreneurship.
https://doi.org/10.31436/iiumej.v20i2.1096
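A taxonomy stored as a tree hierarchy can be represented as a parent-to-children mapping. The sketch below uses a hypothetical grouping of the 13 knowledge areas, because the actual structure of Fig. 2 is not reproduced in this text; only the item codes come from the paper, and the category names and helper functions are illustrative assumptions.

```python
# Hypothetical parent categories: NOT the authors' Fig. 2 structure.
TAXONOMY = {
    "IT-Entrepreneurship": ["Software", "Hardware", "Digital Business", "Creative Media"],
    "Software": ["MD", "SE", "D", "WD", "AI"],
    "Hardware": ["R", "M", "E"],
    "Digital Business": ["EM", "EL", "EB"],
    "Creative Media": ["DG", "P"],
}


def parent_map(tree):
    """Invert the parent -> children mapping into child -> parent."""
    return {child: parent for parent, children in tree.items() for child in children}


def ancestors(item, parents):
    """Return the path from an item up to the root of the taxonomy."""
    path = []
    while item in parents:
        item = parents[item]
        path.append(item)
    return path


parents = parent_map(TAXONOMY)
ai_path = ancestors("AI", parents)  # ["Software", "IT-Entrepreneurship"]
```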

Similarity Identification
The Euclidean distance formula (1) is used to measure the distance between items. Euclidean distance is the most commonly used distance measure: it is the square root of the sum of the squared differences between the coordinates of a pair of objects. If $p = (p_1, p_2, \ldots, p_n)$ and $q = (q_1, q_2, \ldots, q_n)$, then the distance is given by:
$$d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2} \qquad (1)$$
The average distance among several points is obtained with formula (2). If $d_1, d_2, \ldots, d_k$ are the distances and $k$ is the number of samples, then the average distance is given by:
$$\bar{d} = \frac{1}{k} \sum_{j=1}^{k} d_j \qquad (2)$$
For example, to calculate the item similarity in Table 2, T1 is measured against the cases MD, AI, and EB. The distances from MD to AI, from MD to EB, and from AI to EB are calculated using the Euclidean formula, and the three distances are then averaged.
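The following sketch applies formulas (1) and (2) to the T1 example. It assumes, hypothetically, that each item is described by a vector of averaged 1-9 questionnaire scores (interest, difficulty, market need), as suggested by the Processing Data section; the numbers are made up and are not taken from Table 4 or Table 5.

```python
import math
from itertools import combinations

# Hypothetical item profiles: (interest, difficulty, market need) averages on
# the 1-9 questionnaire scale. Illustrative values, not the paper's data.
PROFILES = {
    "MD": (7.0, 5.0, 8.0),
    "AI": (6.0, 8.0, 7.0),
    "EB": (5.0, 4.0, 6.0),
}


def euclidean(p, q):
    """Formula (1): square root of the summed squared coordinate differences."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def average_distance(items, profiles):
    """Formula (2): mean of the pairwise distances between the items."""
    pairs = list(combinations(items, 2))
    return sum(euclidean(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)


# T1 contains MD, AI and EB: distances MD-AI, MD-EB and AI-EB are averaged.
t1_distance = average_distance(["MD", "AI", "EB"], PROFILES)
```

With these profiles the three distances are √11, 3 and √18, so the average is about 3.52.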

Fuzzy Membership Function
Once the similarity between elements is known, the next step is to calculate the degree of fuzzy membership of each similarity. A fuzzy set is a set whose elements have varying degrees of membership, and its fuzziness is characterized by its membership function. The membership function is a curve that maps each input to a membership level between 0 and 1. Through the membership function, input values become fuzzy information that is used in later fuzzy processing. The membership function $\mu_A$ of a fuzzy set A is defined as:
$$\mu_A : X \rightarrow [0, 1] \qquad (3)$$
where [0,1] denotes the interval of real values from 0 to 1. Here $x$ denotes an element of fuzzy set A and $\mu_A(x)$ its degree of membership in A. A is usually represented as:
$$A = \{(x, \mu_A(x)) \mid x \in X\} \qquad (4)$$
This study uses triangular and linear membership functions. Membership functions in fuzzy sets can take different curve shapes [24]; a fuzzy set can also be defined by assigning a continuous function that describes the membership either analytically or graphically. Some commonly used membership functions are shown in Fig. 3. The triangular membership function, with parameters $a < b < c$, can be expressed as:
$$\mu(x) = \begin{cases} 0, & x \le a \text{ or } x \ge c \\ \dfrac{x - a}{b - a}, & a \le x \le b \\ \dfrac{c - x}{c - b}, & b \le x \le c \end{cases} \qquad (5)$$
The linear curve has two forms: linear down and linear up. In the linear-down form, the straight line starts at the domain value with the highest degree of membership on the left and moves down to the domain value with a lower degree of membership:
$$\mu(x) = \begin{cases} 1, & x \le a \\ \dfrac{b - x}{b - a}, & a \le x \le b \\ 0, & x \ge b \end{cases} \qquad (6)$$
In the linear-up form, the line starts at the domain value with zero membership degree and moves to the right towards domain values with higher degrees of membership:
$$\mu(x) = \begin{cases} 0, & x \le a \\ \dfrac{x - a}{b - a}, & a \le x \le b \\ 1, & x \ge b \end{cases} \qquad (7)$$
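The triangular, linear-down and linear-up curves translate directly into code. This is a minimal sketch with hypothetical function names; the breakpoints a, b and c are parameters because the paper does not list the values it used.

```python
def triangular(x, a, b, c):
    """Triangular curve: rises from a to the peak b, then falls to c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def linear_down(x, a, b):
    """Linear-down curve: membership 1 at or below a, falling to 0 at b."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    return (b - x) / (b - a)


def linear_up(x, a, b):
    """Linear-up curve: membership 0 at or below a, rising to 1 at b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)
```

For example, `triangular(3.5, 2, 5, 8)` is halfway up the rising edge and returns 0.5, while `linear_down(2, 1, 3)` and `linear_up(2, 1, 3)` both return 0.5 at the midpoint of their slopes.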

Fuzzy Mining
Process mining is a technique for extracting process models from an execution log, and fuzzy mining can be applied to very large and unstructured logs [24,25]. In this stage, the fuzzy distance-similarity data are collected. In Table 2, item similarity is represented as near, middle, or far.

Fuzzy Association Rule
Fuzzy logic is used in data mining for calculations based on prediction and clustering. Fuzzy data mining algorithms can not only analyse data but also produce accurate results that are easy to implement. The threshold α is applied to the membership values of each domain, and the α-cut has two forms: the weak α-cut, $A_\alpha = \{x \mid \mu_A(x) \ge \alpha\}$, and the strong α-cut, $A_\alpha^{+} = \{x \mid \mu_A(x) > \alpha\}$.
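The two kinds of α-cut differ only in whether the comparison is inclusive. A minimal sketch, assuming a fuzzy set is stored as a dict from element to membership degree; the function name, transaction IDs and degrees are hypothetical.

```python
def alpha_cut(fuzzy_set, alpha, strong=False):
    """Return the elements of a fuzzy set that pass the α-cut.

    Weak α-cut keeps mu(x) >= alpha; strong α-cut keeps mu(x) > alpha.
    """
    if strong:
        return {x for x, mu in fuzzy_set.items() if mu > alpha}
    return {x for x, mu in fuzzy_set.items() if mu >= alpha}


# Hypothetical "near" membership degrees of four transactions.
near = {"T1": 0.8, "T2": 0.3, "T3": 0.0, "T4": 0.55}
weak = alpha_cut(near, 0.3)                  # contains T1, T2 and T4
strong = alpha_cut(near, 0.3, strong=True)   # contains T1 and T4
```

An element whose degree equals α exactly (T2 here) is kept by the weak cut and dropped by the strong one, which is why the choice of cut affects how far the dataset is reduced.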
The Fuzzy Association Rule algorithm used in this paper is:
1. Candidate itemsets are generated based on similarity.
2. Fuzzy operations are carried out to obtain the fuzzy mining values.
3. Candidates are reduced with an α-cut.
4. Fuzzy rules are built using fuzzy support and confidence.
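Step 4 needs a definition of fuzzy support and confidence, which the text does not spell out. The sketch below assumes one commonly used definition with the minimum as the intersection operator: the fuzzy support of an itemset is the average, over all transactions, of the minimum membership degree of its items, and confidence is the support of the whole rule divided by the support of the antecedent. Function names and data are hypothetical.

```python
def fuzzy_support(itemset, fuzzy_transactions):
    """Average over transactions of the minimum membership of the itemset's items.

    Items missing from a transaction count as membership 0.
    """
    total = sum(
        min(t.get(item, 0.0) for item in itemset) for t in fuzzy_transactions
    )
    return total / len(fuzzy_transactions)


def fuzzy_confidence(antecedent, consequent, fuzzy_transactions):
    """Support of antecedent ∪ consequent divided by support of the antecedent."""
    both = fuzzy_support(set(antecedent) | set(consequent), fuzzy_transactions)
    ante = fuzzy_support(antecedent, fuzzy_transactions)
    return both / ante if ante else 0.0


# Hypothetical fuzzy transactions (item -> membership degree).
data = [
    {"SE": 0.9, "R": 0.7},
    {"SE": 0.6, "R": 0.8, "MD": 0.4},
    {"SE": 0.2, "MD": 0.9},
    {"R": 0.5, "MD": 0.3},
]
support_se_r = fuzzy_support({"SE", "R"}, data)       # (0.7 + 0.6) / 4 = 0.325
confidence_se_r = fuzzy_confidence({"SE"}, {"R"}, data)
```

The rule SE ⇒ R has the same antecedent/consequent shape as the rules reported in the first experiment, but the numbers here are illustrative only.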

Processing Data
The subjects of the study were 1225 engineering faculty students (560 women and 665 men) aged 19-22 years. Students were required to give a value between 1 and 9 for each questionnaire item used in entrepreneurship. They were asked to list the material they had mastered and to rate their interest in it, its level of difficulty, and the market need for it. The students had participated in an entrepreneurship program held by the university for one year. The questionnaire results were averaged, as shown in Table 4.

Similarities by Calculating Distance
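In this step, the distance between the items of each record is calculated with the Euclidean distance formula. Table 5 describes the results of calculating the distance similarity between the data.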

Degree of Membership
From the calculated distance similarities, membership values are then determined using fuzzy set theory. The distance variable is divided into three fuzzy sets: near, middle, and far. Table 6 shows the resulting fuzzy values, which represent itemset similarity.
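A minimal sketch of how an average distance could be mapped to the three terms, assuming (hypothetically) a linear-down curve for near, a triangular curve for middle, and a linear-up curve for far, with made-up breakpoints a = 2.0, b = 3.5 and c = 5.0; the breakpoints behind Table 6 are not given in the text. The clamped expressions below give the same values as the triangular, linear-down and linear-up curves for a < b < c.

```python
def fuzzify_distance(d, a=2.0, b=3.5, c=5.0):
    """Return (near, middle, far) membership degrees for a distance d.

    The breakpoints a < b < c are hypothetical, not the paper's values.
    near:   linear down, 1 for d <= a, 0 for d >= b
    middle: triangle rising from a to a peak at b, falling to 0 at c
    far:    linear up, 0 for d <= b, 1 for d >= c
    """
    near = min(1.0, max(0.0, (b - d) / (b - a)))
    far = min(1.0, max(0.0, (d - b) / (c - b)))
    middle = max(0.0, min((d - a) / (b - a), (c - d) / (c - b)))
    return near, middle, far


halfway = fuzzify_distance(2.75)   # (0.5, 0.5, 0.0)
peak = fuzzify_distance(3.5)       # (0.0, 1.0, 0.0)
```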

Fuzzy Mining Process
After the degree of membership has been calculated from the distance similarity, the next process performs an intersection operation, as described in step 2.6. Distance is represented by the terms near, middle, and far. In this case, the intersection operation is applied to near and middle; the far term is not used because it represents items that are far apart. Table 7 shows the result of the fuzzy operations in fuzzy mining. The fuzzy intersection operation finds the minimum value among the itemset membership degrees.
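The intersection described above is the standard minimum operator. A tiny sketch with hypothetical membership degrees; the function name and values are illustrative only.

```python
def fuzzy_intersection(*degrees):
    """Standard (Zadeh) fuzzy intersection: the minimum membership degree."""
    return min(degrees)


# Hypothetical "near" membership degrees for the item pairs of one itemset.
pair_degrees = {("MD", "AI"): 0.6, ("MD", "EB"): 0.8, ("AI", "EB"): 0.4}
itemset_degree = fuzzy_intersection(*pair_degrees.values())  # 0.4

# Intersecting the near and middle degrees of a single value.
near, middle = 0.5, 0.7
near_and_middle = fuzzy_intersection(near, middle)  # 0.5
```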

Fuzzy Association Rule
Before running the algorithm, the minimum support (the minimum proportion of transactions in which a rule occurs) must be set. The minimum confidence (the minimum proportion of transactions containing the antecedent that also contain the consequent) must also be set. Both values are chosen by the researcher. Rules are presented in the form antecedent ⇒ consequent.
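Given itemset supports, rules of the form antecedent ⇒ consequent can be generated and filtered by the two thresholds. A minimal sketch, assuming supports are stored in a dict keyed by frozenset that includes every subset of each itemset (as Apriori output does); the function name, supports and thresholds are hypothetical.

```python
from itertools import combinations


def generate_rules(supports, min_support, min_confidence):
    """Return a list of (antecedent, consequent, support, confidence) rules."""
    rules = []
    for itemset, sup in supports.items():
        if len(itemset) < 2 or sup < min_support:
            continue
        # Every non-empty proper subset is tried as the antecedent.
        for r in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), r):
                antecedent = frozenset(ante)
                confidence = sup / supports[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, sup, confidence))
    return rules


supports = {
    frozenset({"SE"}): 0.8,
    frozenset({"R"}): 0.6,
    frozenset({"SE", "R"}): 0.5,
}
rules = generate_rules(supports, min_support=0.5, min_confidence=0.7)
```

Here R ⇒ SE passes (confidence 0.5 / 0.6 ≈ 0.83), while SE ⇒ R is rejected (0.5 / 0.8 = 0.625), so the same frequent itemset can yield a rule in one direction only.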

First Implementation
In the first experiment, all 1225 itemsets were used. After several trials, a minimum support of 0.5 and a minimum confidence of 0.4 were selected. The discovered rules are shown in Fig. 4. Three of the rules have Software Engineering as the antecedent and Robotics as the consequent. These rules express associations (similar to correlations), not necessarily causal relationships, so association rules must be interpreted with care.

Second Implementation
The second experiment applied a strong α-cut with threshold α = 0.3. After several trials, a minimum support of 0.6 and a minimum confidence of 0.5 were selected. The α-cut reduced the initial 1225 itemsets to 823 itemsets. In this study, we focused on using the α-cut to reduce the number of itemsets passed to the Apriori algorithm. The discovered rules are shown in Fig. 5.
Fig. 5: Results of the fuzzy mining association rule algorithm.
Fig. 5 shows the five rules that the algorithm learned from the dataset. The number next to the antecedent is its absolute coverage in the dataset (here, out of a possible total of 823). The number next to the consequent is the absolute number of instances that match both the antecedent and the consequent. No rule has a confidence (the number of instances matching both the antecedent and the consequent divided by the number matching the antecedent) below 0.96.

CONCLUSION
Knowledge extraction in databases is a way of extracting knowledge in the form of rules. These rules express association relationships in the data, in which certain data items are associated with other data items. In this study, Euclidean distance was used to measure the proximity, or similarity, of itemsets. Fuzzy membership functions were used to obtain fuzzy values for fuzzy mining, and an α-cut was applied to reduce the size of the dataset. As a result, the dataset was reduced from 1225 itemsets to 823 itemsets, and an IT entrepreneurship rule model was produced.

Fig. 4: Results of the fuzzy mining association rule algorithm.

Table 1: The dataset of participants

Table 2: Fuzzy mining of itemsets

Table 4: Average values of the questionnaire results

Table 5: Results of calculation of the similarity of distance between data

Table 6: Results of calculation of fuzzy values

Table 7: The result of fuzzy operations on fuzzy mining