FEATURE IMPORTANCE OF THE AORTIC ANATOMY ON ENDOVASCULAR ANEURYSM REPAIR (EVAR) USING BORUTA AND BAYESIAN MCMC

A retrospective study of abdominal aortic aneurysm (AAA) patients treated with EVAR. A third party collected the data from twelve vascular centres in Indonesia during 2012-2017. Patient demographics and computed tomography data were evaluated with OsiriX MD software. Over five years, 148 EVAR cases were performed using the Endurant stent graft (Medtronic). In this paper, we perform Bayesian modelling together with feature selection by Boruta. Before fitting the models, we determine the choice of dependent variable, starting with Age, Class, and Sex, to establish which variables should serve as dependent and which as independent. The difference between the Bayesian and the classical method is the introduction of prior information in the form of probability distributions. In the Bayesian method, parameters are determined from probability statements: parameter estimation is no longer a point estimate but, on the contrary, a statistical distribution. In other words, the Bayesian view treats a parameter as a variable that has a distribution. Bayesian methods have become popular in modern statistical analysis and are applied across a broad spectrum of scientific and research fields. Bayesian data analysis involves learning from data using probability models for the observations together with prior information about the quantities to be studied; in other words, statistical models are analysed by combining prior knowledge about the model or its parameters with the data. In a nutshell, the simulations yielded an R² of 87.52% for Bayesian-ZIP-MCMC and 88.28% for Bayesian-Boruta.

Available online at http://scik.org. Commun. Math. Biol. Neurosci. 2020, 2020:22, https://doi.org/10.28919/cmbn/4584, ISSN: 2052-2541. CARAKA, NUGROHO, TAI, CHEN, TOHARUDIN, PARDAMEAN.


INTRODUCTION
Abdominal aortic aneurysm screening, broadly known as AAA screening, is an examination procedure that aims to check for abnormal widening of the aorta [1]. The examination is recommended from the onset of an abdominal aortic aneurysm, because if it is detected too late, the aorta can widen further and burst. Moreover, AAA is predominantly asymptomatic, which is why AAA screening is recommended for high-risk patients aged >65 years with a family history of AAA [2]. An abdominal aortic aneurysm (AAA) is an abnormal widening of the aorta in the abdomen. The aorta is the main artery that carries blood out of the heart to supply the whole body.
The exact cause of an abdominal aortic aneurysm is not yet known, but several factors are thought to contribute: smoking, high blood pressure (hypertension) [3], trauma due to accidents [4], hereditary diseases [5], as well as infections and thickening of the arteries (atherosclerosis). The characteristics of the aorta differ between individuals and, in particular, between countries: [6] studied the characteristics of abdominal aortic aneurysm (AAA) in Korean patients, and [7] performed an epidemiology study of AAA patients in the South-East Asian state of Sarawak on Borneo Island, showing that AAA in this Asian population is not uncommon, with an incidence comparable to the Western world. It is therefore important to analyse the characteristics of AAA in Indonesia [8], [9], [10].
An abdominal aortic aneurysm (AAA) is defined as a pathologic dilatation of the abdominal aortic segment; due to this dilatation, the aorta is prone to expand and rupture, which leads to a higher mortality rate [11]. Usually, an abdominal aortic diameter >30 mm is diagnosed as AAA [12].
The progression of AAAs differs between male and female patients. It is hypothesised that estrogen plays a role in modulating the immune system, producing a protective effect in women by reducing macrophage MMP production. Nevertheless, the expansion and rupture rates are greater in women, presumably because the aneurysm process begins at a smaller aortic diameter in women. The incidence of AAA is also higher in patients with a smoking history, reaching 222 (95% CI 129-355) per 100,000 persons/year. When treatment of AAA is indicated, endovascular treatment is the gold standard, although open aortic repair (OAR) remains an option. Endovascular treatments include EVAR (endovascular aneurysm repair), TEVAR (thoracic endovascular aortic repair), ch-EVAR (chimney EVAR), f-EVAR (fenestrated EVAR), and b-EVAR (branched EVAR), depending on the aortic segment involved in the aneurysmal sac.
The success of endovascular treatment can be surveyed through re-intervention rates, 30-day post-operative mortality rates, and follow-up mortality rates over the years. For the EVAR procedure, a meta-analysis of four studies (EVAR-1, DREAM, OVER, and ACE) reports re-intervention rates of 5.1-8.5 [17].
Feature selection is one of the most important techniques and is often used in data mining preprocessing [18], especially for knowledge discovery and discovery science. This technique reduces the number of features involved in determining the value of a target class, removing features that are irrelevant, redundant, or misleading with respect to the target class. Selecting the best features also yields high accuracy and a parsimonious model that is easy to apply, because not all variables need to be involved in constructing the model. In addition, discussions with experts in the health field will be held to improve insight, so that a genuinely appropriate solution can be produced [19].
The main task in feature selection is to determine which features are selected and used in the context of forecasting a target attribute. Features are considered relevant if their values vary systematically with category membership. Feature selection is also useful for reducing data dimensionality.
In addition, Boruta is widely used in fields that require a strong feature selection algorithm, where it is important to create a ranking evaluation framework based on the similarity between the classifiers used as the core of the feature selection algorithm. Basically, the subset obtained from feature selection is a ranking order of all features selected by each algorithm.
In this paper, we analyse the proposed feature selection using Boruta, followed by Bayesian regression, to build models of the aortic characteristics for endovascular aneurysm repair in the Indonesian population over five years of experience.
Regression analysis is used to determine the effect of one or more independent variables on a dependent variable [20], [21]. Currently, regression analysis mostly uses a classical approach that does not include prior information [22], [23]. Bayes regression fills this gap [24], [25]: the Bayes approach allows researchers to combine prior information with information obtained from a sample, and then use both together to estimate the posterior parameters.
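As a minimal illustration of combining prior and sample information (not the model fitted later in this paper), consider the normal mean with known variance: the posterior precision is the sum of the prior and data precisions, and the posterior mean is a precision-weighted compromise between the prior mean and the sample mean. All numbers below are hypothetical.

```python
def normal_posterior(m0, s0sq, ybar, ssq, n):
    """Known-variance normal model: prior theta ~ N(m0, s0sq), data
    y_1..y_n ~ N(theta, ssq). The posterior is N(m1, s1sq), where the
    precision adds and the mean is a precision-weighted average."""
    prec = 1.0 / s0sq + n / ssq          # posterior precision
    m1 = (m0 / s0sq + n * ybar / ssq) / prec
    return m1, 1.0 / prec

# Hypothetical numbers: a vague prior centred at 50, then 25
# observations with sample mean 55 and known variance 25.
m1, s1sq = normal_posterior(50.0, 100.0, 55.0, 25.0, 25)
print(round(m1, 2), round(s1sq, 3))      # posterior pulled toward the data
```

With this much data the posterior mean sits close to the sample mean, which is exactly the "prior plus sample" compromise described above.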
Moreover, this method is also useful for understanding the importance of stability in feature selection algorithms, which can then serve as a basis for deciding which algorithm suits a given problem domain. The remainder of the paper is organised as follows. Section II provides the methodology. Section III presents the five years of study experience. The feature selection and Bayesian regression results are presented in Section IV. Finally, conclusions and future research directions are given in Section V.

METHODS
In this section, we introduce the Boruta feature selection algorithm and Bayesian regression.

Feature Selection
Feature selection is the process of identifying and removing irrelevant and redundant variables. Features are considered relevant if their values vary systematically with category membership. This process is essential in machine learning [26]. Many machine learning algorithms suffer a decrease in accuracy when the number of variables is large but not optimal [27], [28]. In addition, a large number of variables slows down algorithm performance and takes up too many resources [28]. Boruta is a relevant-feature selection algorithm that can work with any classification method that measures variable importance; by default, Boruta uses Random Forest. The steps of Boruta's algorithm for finding all relevant variables are as follows.
1. Add shadow attributes, i.e., randomly shuffled copies of all attributes.
2. Run a random forest classifier and obtain Z scores.
3. Find the maximum Z score among the shadow attributes (MZSA).
4. Mark each attribute with importance lower than the MZSA as 'Rejected' and delete it from the system.
5. Mark each attribute with importance higher than the MZSA as 'Confirmed'.
6. Remove all shadow attributes.
7. Repeat the procedure until only Confirmed attributes are left, or until the specified iteration limit is reached.
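The steps above can be sketched in a dependency-free way. Note two simplifying assumptions: absolute correlation with the target stands in for the random-forest Z-score importance, and only a single round is run, whereas real Boruta repeats the round many times and tests significance.

```python
import random
import statistics

def importance(xs, y):
    # Proxy importance: absolute correlation with the target.
    # (Real Boruta uses random-forest Z-scores; correlation keeps
    # this sketch dependency-free.)
    mx, my = statistics.fmean(xs), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, y))
    sx = sum((a - mx) ** 2 for a in xs) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return abs(cov / (sx * sy)) if sx and sy else 0.0

def boruta_round(features, y, rng):
    # Steps 1-5 for a single round: shuffled shadow copies, the
    # maximum shadow importance (MZSA), then confirm/reject.
    shadows = []
    for xs in features.values():
        sh = list(xs)
        rng.shuffle(sh)                  # shadow attribute = shuffled copy
        shadows.append(sh)
    mzsa = max(importance(sh, y) for sh in shadows)
    confirmed = [n for n, xs in features.items() if importance(xs, y) > mzsa]
    rejected = [n for n, xs in features.items() if importance(xs, y) <= mzsa]
    return confirmed, rejected

rng = random.Random(0)
n = 200
relevant = [rng.gauss(0, 1) for _ in range(n)]
noise = [rng.gauss(0, 1) for _ in range(n)]
y = [2.0 * r + rng.gauss(0, 0.1) for r in relevant]   # y depends on one feature
confirmed, rejected = boruta_round({"relevant": relevant, "noise": noise}, y, rng)
print(confirmed)
```

On this synthetic data the truly informative feature clears the maximum shadow importance, while the pure-noise feature generally does not survive repeated rounds.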

Bayesian Regression
The posterior density function in the Bayes rule is proportional to the product of the prior distribution and the data distribution [24], [29], [30]. The Bayes rule for establishing the posterior distribution of the data is

p(θ | y) = p(y | θ) p(θ) / p(y).

An equivalent form is obtained by dropping p(y), since it does not depend on θ and can therefore be treated as a constant:

p(θ | y) ∝ p(y | θ) p(θ),

which is the construction posterior ∝ likelihood × prior. Furthermore, in Bayes regression, the Bayes estimator is taken to be the expected value of the posterior distribution.

With the Bayes approach, the regression parameters β0, β1, …, βK−1 are considered random variables [30], and estimating these regression parameters takes prior information into account. There are two prior scenarios for the regression parameter vector (β, σ²): a diffuse improper prior and an informative conjugate prior. The regression model in this study assumes a normal distribution [31]. A noninformative prior appropriate for the normal assumptions is uniform on (β, log σ), which is equivalent to the diffuse improper prior

p(β, σ²) ∝ 1/σ²,

where the regression coefficients can take all real values, −∞ < βk < ∞ for k = 0, 1, …, K−1, and the variance σ² > 0. Multiplying the likelihood by the prior produces the posterior of the model parameters. The posterior distribution of β when σ² is known is

β | σ², y ~ N(β̂, (X′X)⁻¹σ²),

where β̂ = (X′X)⁻¹X′y is the OLS estimator and (X′X)⁻¹σ² is the covariance matrix of β̂. The posterior distribution of σ² is of scaled inverse-χ² form. To characterise β without reference to σ², the marginal distribution of β must be obtained; in practice, σ² is an unknown parameter, so this is done from the joint posterior distribution.

By integrating the joint posterior over σ², the unconditional posterior distribution of β is obtained, which is a multivariate Student-t distribution, β | y ~ t(β̂, s²(X′X)⁻¹), with degrees of freedom v = n − K. Integrating out σ² makes the distribution heavy-tailed, as it should, to reflect the uncertainty about σ²: although the mean vector of β is unchanged, the variance is inflated by the factor v/(v − 2), where v = n − K is the degrees of freedom of the multivariate Student-t distribution. In conclusion, for a single regression coefficient βk under the diffuse improper prior, the standardised coefficient has a Student-t distribution with n − K degrees of freedom, because the marginal posterior distribution is

(βk − β̂k) / (s √hkk) ~ t(n − K),

where hkk is the k-th diagonal element of (X′X)⁻¹ and β̂k is the OLS estimator of βk. The marginal posterior distribution of σ² is of inverse-χ² form.

(Table note: the numbers represent the measured diameter or length of the corresponding anatomical site in millimetres, mean ± SD mm.)
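These posterior formulas can be illustrated on the simplest case, a one-predictor regression (synthetic data, not the paper's 19-predictor model): the marginal posterior of the slope is Student-t with n − K degrees of freedom, centred at the OLS estimate with scale s √hkk.

```python
import random
import statistics

# Synthetic data for y = b0 + b1*x + e; under the diffuse prior
# p(beta, sigma^2) ∝ 1/sigma^2, the marginal posterior of the slope
# is Student-t, centred at the OLS estimate.
rng = random.Random(1)
n, K = 50, 2
x = [rng.uniform(0, 10) for _ in range(n)]
y = [1.0 + 2.0 * xi + rng.gauss(0, 1) for xi in x]

mx, my = statistics.fmean(x), statistics.fmean(y)
sxx = sum((xi - mx) ** 2 for xi in x)
b1_hat = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
b0_hat = my - b1_hat * mx                  # OLS estimate = posterior centre
v = n - K                                  # posterior degrees of freedom
rss = sum((yi - b0_hat - b1_hat * xi) ** 2 for xi, yi in zip(x, y))
s2 = rss / v                               # s^2, the posterior scale factor
scale_b1 = (s2 / sxx) ** 0.5               # s * sqrt(h_kk) for the slope
print(round(b1_hat, 2), v, round(scale_b1, 3))
```

The slope's posterior is thus t with 48 degrees of freedom here; its heavier tails relative to the normal encode the remaining uncertainty about σ².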

ANALYSIS
The dataset used in this paper can be seen in Figure 2.

Figure 2. Feature Selection Based on Class
Then, a second simulation performs feature selection when considering age. Figure 3 shows that there is only one important attribute, renal inv, while 18 attributes are confirmed unimportant. Based on the three simulations using feature selection, the best choice is the variable Class (1 EVAR, 2 TEVAR, and 3 both), because this simulation retains much more important-variable information than using age or sex alone. Next, Bayesian regression is built with MCMC and the ZS-null prior.
Bayesian and MCMC methods are difficult to separate [32]. For a very complicated Bayesian posterior, determining the marginal posterior of the parameters requires a challenging integration process, so an alternative numerical approach is needed.
Another way to perform MCMC is the Gibbs sampler. The Gibbs sampler facilitates a numerical solution by generating random draws of Θ = (β, σ², …), a random vector with a specified distribution, and estimating the value ḡ(Θ) of a function g(Θ) from those draws [33]. Since all parameters Θ = (β, σ², …) in the Bayesian approach are treated as random variables, inference about the estimates is based on the posterior distribution of Θ given the data, with posterior summaries approximated by averages over the Gibbs draws.

The Gibbs sampler is a very efficient generator, so it is often used to generate random variables in data analysis with the MCMC method [34], [35]. The Gibbs sampler can be defined as a simulation technique for generating random variables from a particular distribution indirectly, without having to calculate the density function of the distribution. Assume several random variables X0, X1, X2, …, Xk with a joint distribution that is considered known. To obtain the distributional characteristics of X0, often called the marginal distribution f(x0), a multiple integral must be carried out over all the other random variables in the joint distribution:

f(x0) = ∫ … ∫ ∫ f(x0, x1, x2, …, xk) dx1 dx2 … dxk.

This integration will be very difficult, or even impossible, if the joint function is very complicated. To solve this problem, we can use the Gibbs sampler, provided the conditional distribution of each variable involved is known. With this method, the characteristics of each marginal random variable in the joint function become known without calculating the shape of the marginal function itself. The marginal distribution characteristics are approached with the Gibbs sampler through numerical data generation with Monte Carlo simulation, using the conditional distribution of each random variable given all the remaining random variables under the joint distribution function. To find the characteristics of the marginal distribution f(x0), the Gibbs sampler with the Monte Carlo method generates data X0 from f(x0 | x1, x2, …, xk), and the marginal distribution characteristics are estimated from the Monte Carlo draws. With huge amounts of data, the values obtained will reflect the condition of the population. Data generation for each random variable in the joint density function using the Gibbs sampler proceeds as follows.

⋮
Step d. Generate xk^(1) from f(xk | x1^(1), x2^(1), …, xk−1^(1)).

The same procedure is repeated to obtain the generated values in the second iteration, and so on. As mentioned in the previous section, the data generated in each iteration depend only on the random values generated in the immediately preceding iteration, not on values from two, three, or more iterations before. The model that can be produced is explained below:

yi = β0 + β1(x1,i − x̄1) + β2(x2,i − x̄2) + ⋯ + β19(x19,i − x̄19) + εi,  1 ≤ i ≤ n.

Then, we compared the accuracy of models using all available variables against models using the features selected by Boruta. Table 3 shows that the best model is model 3, with an R² of 87.52%, compared to the other models. We likewise place a truncated beta-binomial distribution on the model size; this assigns zero prior probability to models with more than 2 coefficients, which restricts the number of candidate models. Bayesian model selection is then performed with Bayesian Model Averaging (BMA); at this stage, we obtain predictions and a combination of estimates from the resulting models. Figure 4 also shows that the linear line (red line) fits the residuals (dots), so the model can be assumed good enough to use for prediction.
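The iteration scheme above can be sketched on the simplest nontrivial case, a bivariate normal with correlation ρ, whose two full conditionals are univariate normals (a stdlib sketch, not the paper's 19-predictor posterior):

```python
import random
import statistics

# Bivariate normal with correlation rho: both full conditionals are
# univariate normals, x | y ~ N(rho*y, 1 - rho^2) and symmetrically,
# so the sampler never evaluates the (here known) marginal density.
rho = 0.8
sd = (1 - rho ** 2) ** 0.5
rng = random.Random(7)
x = y = 0.0
draws = []
for i in range(20_000):
    x = rng.gauss(rho * y, sd)   # step: draw x from f(x | y)
    y = rng.gauss(rho * x, sd)   # step: draw y from f(y | x)
    if i >= 1_000:               # discard burn-in iterations
        draws.append(x)
# The retained draws of x should approach its N(0, 1) marginal.
print(round(statistics.fmean(draws), 2), round(statistics.pstdev(draws), 2))
```

Each draw uses only the values from the immediately preceding step, exactly the Markov dependence described in the text, yet the retained draws recover the marginal distribution.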

CONFLICT OF INTERESTS
The authors declare that there is no conflict of interests.

SUPPLEMENTARY MATERIALS
The dataset is available on request from the corresponding author.

APPENDIX
When the target density π, or a full conditional in the Gibbs sampler, has a nonstandard form, a latent variable can be used to obtain a Gibbs sampler whose full conditionals are all easily sampled, standard distributions. The target density is augmented with a positive latent variable U, constructing the joint density of θ and U. In this way, the marginal density of θ is again given by π, and the Gibbs sampler is extended to include an additional full conditional for U.
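This augmentation idea can be sketched as a "slice sampler" for a concrete target, assuming f(θ) = exp(−θ²/2) (a standard normal up to a constant) and a flat q:

```python
import math
import random
import statistics

# Target pi(theta) ∝ f(theta) with f(theta) = exp(-theta^2 / 2).
# The augmented joint pi(theta, u) ∝ 1(u < f(theta)) has two
# standard full conditionals:
#   u | theta ~ Uniform(0, f(theta))
#   theta | u ~ Uniform(-s, s), where s = sqrt(-2 log u)  (the slice)
rng = random.Random(11)
theta = 0.0
draws = []
for i in range(30_000):
    u = rng.uniform(0.0, math.exp(-theta * theta / 2.0))
    s = math.sqrt(-2.0 * math.log(u))    # boundary of {f(theta) > u}
    theta = rng.uniform(-s, s)
    if i >= 1_000:                       # discard burn-in iterations
        draws.append(theta)
print(round(statistics.fmean(draws), 2), round(statistics.pstdev(draws), 2))
```

Neither conditional requires the normalising constant of π; the retained draws of θ recover the standard normal marginal, which is the point of the augmentation.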
Suppose that we wish to sample from a density π given by π(θ) ∝ q(θ) f(θ), where q is a density of known form and f is a non-negative invertible function. With the introduction of a latent variable U: Ω → ℝ⁺, the joint density can be written as follows: π(θ, u) ∝ q(θ) 1(u < f(θ)). Integrating out u recovers π(θ), so the augmentation is valid. A prior p(θ) with parameter a, coming from a family of distributions D, is said to be conjugate to p(y | θ) if the resulting posterior p(θ | y) also belongs to the family D. Let Yi ~ Poisson(θ), i = 1, 2, …, n, with the likelihood function
It is assumed that the prior p(θ) has a gamma distribution with parameter a = (a, b), i.e., θ ~ Gamma(a, b), so that it can be written as follows,
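Assuming, as the appendix setup suggests, a Poisson(θ) likelihood with a Gamma(a, b) prior in shape-rate form, the conjugate posterior has the closed form Gamma(a + Σyi, b + n). A sketch with illustrative (hypothetical) numbers:

```python
def poisson_gamma_update(a, b, ys):
    """Conjugate update: Gamma(a, b) prior (shape a, rate b) with
    Poisson counts y_1..y_n gives the posterior Gamma(a + sum(y), b + n),
    so the posterior stays in the gamma family."""
    return a + sum(ys), b + len(ys)

ys = [3, 1, 4, 2, 0, 2]                      # hypothetical count data
a_post, b_post = poisson_gamma_update(2.0, 1.0, ys)
post_mean = a_post / b_post                  # Gamma(shape, rate) mean
print(a_post, b_post, round(post_mean, 2))   # prints: 14.0 7.0 2.0
```

Because the posterior is again a gamma distribution, its full conditional in a Gibbs sampler is a standard distribution that can be sampled directly.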