BASIC CONCEPT PYTHAGORAS TREE FOR CONSTRUCT DATA VISUALIZATION ON DECISION TREE LEARNING

Decision Trees are frequently used in Data Mining to learn patterns by interpreting data. The hierarchical tree model commonly used to visualize a Decision Tree quickly fills the available space. An alternative model is the Pythagoras Tree. The Pythagoras Tree in this study is based on the Pythagorean Theorem: it represents a binary hierarchy with a fractal technique in which squares form the branches and enclose a right triangle. In the fractal Pythagoras Tree, the dataset is split into subsets that form the trunks and leaves. Constructing the fractal depends on the angle θ, which builds the branches, followed by the square area. The Pythagoras Tree model is an easy way to understand the dataset based on the size of the squares: the smaller the square, the fewer instances it contains. Data associations are also easily traced when the squares are filled with color.


INTRODUCTION
Data Mining is useful for learning patterns in big data to understand the flow of a problem as the basis for further decisions. The pattern-analysis techniques used as data mining tasks are descriptive, predictive, and prescriptive. One of the models often used for prediction in Data Mining is the Decision Tree. It predicts by using attributes as branches, with the target forming the leaves. The Decision Tree covers classification and regression, with visualization for interpreting data in decision analysis. Decision Tree Learning is represented by a tree model that produces IF-THEN rules, or associations. The tree model serves as a visualization that describes how the dataset is branched. Branches therefore make it easier to trace interconnected data until the dataset becomes leaves that cannot be split again.

According to [1], the commonly used hierarchical tree model runs out of space for visualization. The model becomes full and complex as the branches grow deeper. An alternative is the Pythagoras Tree, which is based on the Pythagorean Theorem. The Pythagoras Tree was first introduced by [2] as a tree model that uses a binary hierarchy with a fractal technique in which squares form the branches and enclose a right triangle. In the fractal Pythagoras Tree, the dataset is split into subsets that form the trunks and leaves. The fractal is recursive until the whole dataset has been separated. Therefore, we examine the Pythagoras Tree in Decision Tree Learning based on the Pythagorean Theorem.

The Pythagorean Theorem has over 371 proofs [3], several of whose visualizations can be used to build a tree from a dataset, as in the study of [4], which uses the central square theory [5] to form branches. The branches themselves use Pythagoras' and Plato's geometric visualizations to construct the data. In contrast, the tree in this study constructs the data visualization from two proofs: Pythagoras' geometric proof, which uses a right-angled triangle [6], and the algebraic proof [7], in which a square is made with four right triangles. Both proofs are applied to Decision Tree training. The geometric proof is used to separate the dataset and determine an angle in a right-angled triangle, which shapes the tree model. The algebraic proof is used to measure the number of instances in a square. The data visualization therefore depends on the size of the squares and on an angle in a right-angled triangle, and the result is a fractal Pythagoras Tree.

The Pythagoras Tree is determined using a Decision Tree algorithm, for which several methods are available: ID3 [8], CART [9], and C4.5 [10]. In this study, we used ID3 with Standard Deviation Reduction (SDR) for regression, because the dataset is numerical. SDR determines the branch separation used to construct the Pythagoras Tree as a data visualization.

METHODOLOGY

Pythagorean theorem
As the name suggests, the theorem is attributed to Pythagoras and concerns the right triangle, which has two legs and one hypotenuse. The two legs of a right triangle are the opposite and the adjacent. The opposite lies across from the angle θ, the adjacent lies next to the angle θ, and the hypotenuse is the longest side. Also, the length of the hypotenuse stays the same when the angle θ changes, because the angle facing the hypotenuse is always 90 degrees. The equation for the sides of a right triangle is Equation (1). The Pythagorean Theorem can be represented in terms of area: the area of the square on the hypotenuse is equal to the sum of the areas of the squares on the opposite and the adjacent. It is illustrated in Figure 1. The relationship between hypotenuse (c), opposite (a), and adjacent (b) is:

$$c^2 = a^2 + b^2 \qquad (2)$$

Equation 2 is proved with the square areas in Figure 1.

The standard deviation is useful for quantifying the spread of a dataset, while the mean (μ) is used as a measure of where the dataset is concentrated and can describe the data within the range of the dataset. The mean cannot measure concentration for nominal and ordinal data types. The steps carried out with SDR are as follows:
• Calculate the standard deviation of the target attribute as the basis of the decision.
• Separate the dataset to examine the relationship between two attribute variables, target and predictor, with σ(T,P). The predictor can be used to determine the values of the hypotenuse and the two legs (opposite and adjacent).
• Compute the standard deviation for each branch.
• Subtract the resulting standard deviation from the standard deviation before the separation to obtain the SDR.
• Choose the attribute with the largest SDR value as the best choice.
• Construct the decision tree recursively until the Coefficient of Variation (CV) becomes smaller than the chosen threshold.
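The SDR steps can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the use of the population standard deviation, and the toy data are all assumptions made here, and σ(T,P) is taken to be the instance-weighted average of the per-branch standard deviations.

```python
import math


def std_dev(values):
    """Population standard deviation (assumed; a sample estimator would divide by n - 1)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def coefficient_of_variation(values):
    """CV in percent: standard deviation relative to the mean."""
    mean = sum(values) / len(values)
    return std_dev(values) / mean * 100.0


def sdr(targets, predictor):
    """Standard deviation before the split minus the weighted deviation after it."""
    branches = {}
    for target, label in zip(targets, predictor):
        branches.setdefault(label, []).append(target)
    n = len(targets)
    after_split = sum(len(b) / n * std_dev(b) for b in branches.values())
    return std_dev(targets) - after_split


# Hypothetical toy data: one numeric target and two candidate predictors.
targets = [10, 10, 20, 20]
candidates = {
    "A": ["x", "x", "y", "y"],  # separates the target values perfectly
    "B": ["x", "y", "x", "y"],  # does not separate them at all
}
best = max(candidates, key=lambda name: sdr(targets, candidates[name]))
print(best, sdr(targets, candidates[best]), coefficient_of_variation(targets))
```

A branch would then be split again while its CV stays above the threshold, and turned into a leaf otherwise.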

RESULT AND DISCUSSION
The data visualization of the Decision Tree is made by separating the dataset using SDR. The Pythagoras Tree is constructed from the numbers of instances, which are used as the hypotenuse and the legs. Table 2 is an example of a data sample in which the total number of instances is n = 114, with the SDR selected as the best choice; the attribute names are given as alphabetical initials. As an example for understanding SDR, some raw data are given in Table 1. The Decision Tree is trained by comparing SDR values and choosing the greater one. When the CV is higher than 10%, the dataset needs to be subdivided; otherwise, the dataset becomes a leaf. In this case, the instances are 108 and 6, which are used as the legs of a right triangle, as shown in Table 2. Table 2 shows that the SDR value obtained depends on the number of instances in the dataset. At depth one, the overall dataset is n = 114, and the legs have 108 and 6 instances. The dataset n is the hypotenuse, which is the sum of the two legs: 108 + 6. If the dataset is treated as a square area under the Pythagorean Theorem, then c² = 114, with a² = 108 and b² = 6. Constructing the fractal Pythagoras Tree also depends on the angle θ, which builds the branches, followed by the square area. It is therefore essential to determine which of the two legs is the opposite and which is the adjacent. For example, to construct the tree trunk, the adjacent is used because it is closer to the hypotenuse than the opposite.
According to Equation 1, the value used is not the square area but the length of each side, and the fractal is built until the instances are homogeneous (Figure 6). Fundamentally, the instances of the dataset (represented as black dots) are classified within the square area, as in Figure 5. This means that the size of the square area is equal to the amount of data. This is what is called constructing a data visualization. The Pythagoras Tree is an easy way to understand the dataset based on the size of the squares: the smaller the square, the fewer instances it contains. Data associations are also easily traced when the squares are filled with color, as in Figure 6, which shows the IF-THEN relationships of the data based on the μ values. Based on the concept of [14], a gradient color scale is established from hex color 0023BF (μ = 0) to hex color D1E100 (μ = 31500). It is used to present the μ values, as shown in Figure 6. For example, μ = 6138.816 corresponds to hex color 1436AB. The Pythagoras Tree therefore presents a view that is simple and understandable to read.
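A mapping from μ to a color on a two-stop gradient might look like the sketch below. The function name `mu_to_hex` and the plain linear interpolation in RGB space are assumptions made here for illustration. The exact scale of [14] is not given in this text, so intermediate colors produced this way may differ from the values quoted above; only the two end points are taken from the text.

```python
def mu_to_hex(mu, mu_min=0.0, mu_max=31500.0, start="0023BF", end="D1E100"):
    """Map a mean value to a hex color by linear RGB interpolation (assumed scheme)."""
    # Position of mu on the scale, clamped to the range [0, 1].
    t = (mu - mu_min) / (mu_max - mu_min)
    t = min(1.0, max(0.0, t))
    # Split both end colors into their red, green, and blue channels.
    start_rgb = [int(start[i:i + 2], 16) for i in (0, 2, 4)]
    end_rgb = [int(end[i:i + 2], 16) for i in (0, 2, 4)]
    # Interpolate each channel independently and format as two hex digits.
    rgb = [round(s + t * (e - s)) for s, e in zip(start_rgb, end_rgb)]
    return "".join(f"{channel:02X}" for channel in rgb)


print(mu_to_hex(0.0), mu_to_hex(31500.0))
```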

Figure 1: Square area of the right triangle [11]

In the algebraic proof, the square is surrounded by four right triangles, as illustrated in Figure 2, which is similar to Equation 5. The Decision Tree is therefore obtained from the dataset c² and the subsets a² and b², which are split based on SDR. SDR is based on the standard deviation (σ), which is calculated on a numerical dataset until the dataset contains instances with completely homogeneous values. The σ equations are [12]: (6), (7)
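For reference, the textbook definitions of the mean, the standard deviation, and the coefficient of variation are given below. These are assumed standard forms (population standard deviation over n instances x_i) and are not claimed to match the numbered equations of the original.

```latex
\mu = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \mu\right)^2},
\qquad
CV = \frac{\sigma}{\mu} \times 100\%
```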

Figure 2: Square constructed with four right triangles [7]

Figure 3: Construct the right triangle

In Figure 2, the side length of each square is the square root of its area, and this length is the corresponding side of the right triangle. The hypotenuse is therefore c = √114 = 10.6770782520313. The adjacent is taken from the larger leg: since 108 > 6, adjacent a = √108 = 10.3923048454133 and opposite b = √6 = 2.4494897427832. The angle θ can be calculated from any one of sin θ, cos θ, or tan θ; the result is the same because the triangle is right-angled. Using the sine, sin θ = 2.4494897427832 / 10.6770782520313, or θ = sin⁻¹(2.4494897427832 / 10.6770782520313), which gives 13.2626760083048°. According to Figure 3, ∠BAC = 13.2626760083048° and ∠ACB = 90°. To obtain ∠ABC, subtract ∠BAC from ∠ACB: 90° − 13.2626760083048° = 76.7373239916952°. According to Figure 1, c² is the area of the square, 114, so the side length of the square is √114, as illustrated in Figure 4. The same applies to the squares a² and b² on the legs. The instances of the remaining legs are then recalculated in the same way. For example, the instances 108 and 6 are split until there are no more legs. The 6 instances divide into 4 and 2, where the square area is c² = 6, hypotenuse c = 2.4494897427832, adjacent a = 2, opposite b = 1.4142135623731, and θ = 35.2643896827547°. The fractal Pythagoras Tree is built recursively until the values become homogeneous instances (Figure 6).
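The side lengths and the angle θ for a split into two instance counts can be reproduced with a short Python sketch. The function name `triangle_from_instances` is hypothetical; the sketch assumes, as in the text, that each square area equals an instance count and that the larger leg is the adjacent.

```python
import math


def triangle_from_instances(left, right):
    """Return (hypotenuse, adjacent, opposite, theta in degrees) for one split.

    Square areas are the instance counts, so side lengths are their square roots.
    """
    hypotenuse = math.sqrt(left + right)
    adjacent = math.sqrt(max(left, right))  # larger leg, as in the text
    opposite = math.sqrt(min(left, right))
    theta = math.degrees(math.asin(opposite / hypotenuse))
    return hypotenuse, adjacent, opposite, theta


# Depth one (114 = 108 + 6) and the further split of the 6 instances (6 = 4 + 2).
for left, right in [(108, 6), (4, 2)]:
    c, a, b, theta = triangle_from_instances(left, right)
    print(f"c={c:.6f} a={a:.6f} b={b:.6f} theta={theta:.6f} other={90 - theta:.6f}")
```

Computing θ from cos θ = a / c or tan θ = b / a instead of the sine gives the same angle up to floating-point rounding.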

Figure 4: Pythagoras Tree at depth one

Figure 5: The dataset in the square area

Figure 6: Pythagoras Tree with the μ color

Table 2: Hypotenuse (n) and two legs based on SDR over three depths