Bayesian Maximal Information Coefficient (BMIC) to Discover Causal Correlations from Large Dataset in Random System

Bayesian network (BN) is a probability inference model that describes explicit cause-and-effect relationships, which may examine the complex system of rice price with data uncertainty. However, discovering the optimized structure from a super-exponential number of graphs in the search space is an NP-hard problem. In this paper, the Bayesian maximal information coefficient (BMIC) is proposed to uncover the causal correlations from a large dataset in a random system by integrating a probabilistic graphical model (PGM) and the maximal information coefficient (MIC) with Bayesian linear regression (BLR). First, MIC captures the strong dependence between predictor variables and a target variable, reducing the number of variables during Bayesian network structural learning. Second, BLR assigns orientation in a graph, resulting in a posterior probability distribution. This conforms to what BN needs to acquire a conditional probability distribution, given the parents of each node, by Bayes' theorem. Third, the Bayesian information criterion (BIC) is treated as an indicator to determine how well the model explains its data to ensure correctness. The proposed method obtains the highest score compared to the two traditional learning algorithms. Finally, the BMIC is applied to discover the causal correlations from a large dataset on Thai rice price by identifying the causality change in the paddy price of Jasmine rice.


Introduction
With data uncertainty, one important issue arises as data keeps growing in volume, variety, and velocity. The uncertainty resembles noise, contaminating the observed dataset and leading to deviations in correctness. Nevertheless, the observed data is a valid input to the computational and statistical analysis necessary to obtain knowledge and utilize it further in prediction. Price is regarded as a key element with a high potential to drive the economy through domestic and international trading. An analysis of price is widely in demand in economic (Baharom et al., 2009; Ghoshray, 2008) and agricultural (Sujjaviriyasup, 2018; Shao and Dai, 2018) topics, requested to know the mechanism, characteristics, and future trends. The price prediction model is a popular study, yet the data analysis process is quite tricky due to non-stationary and noisy data. To achieve the prediction model, the relationships among relevant data variables, especially causality, become crucial for analyzing data. The price domain is affected by many uncontrollable attributes gathered in time-series data. Thus, the observed dataset simultaneously presents data variability and genuine incomprehension of data attribute relationships in the system. In this paper, agricultural price is selected for study, particularly the price of Thai rice, due to price variation and global trading competition. The price of Thai rice can fluctuate with many factors such as uncertain productivity, climatic conditions, competitors (import and export trading), and domestic and world situations (Pandey et al., 2010). Understanding causative relationships among factors can reduce obstacles in pricing and avoid unexpected future situations. Therefore, the interpretation of causative relationships is better described via a graphical model.
Probabilistic Graphical Model (PGM) (Marloes et al., 2013) illustrates causal relationships by probability and graph. It (Koller and Friedman, 2009) is a robust framework to handle uncertainty predictions. The correlated and tuple uncertainty and their relationships are demonstrated by a joint probability distribution (Singh et al., 2008). Bayesian network (BN) presents a directed acyclic graph (DAG) on a joint probability distribution using a conditional probability of variables from the Bayes' theorem along with machine learning. Much research has underlined the probabilistic forecasting of uncertain data with BN in a specific domain, including medical analysis (Helong et al., 2009), ecology system (Liu et al., 2015;Aguilera et al., 2011), agricultural forecasting (Chawla et al., 2016;Nuvaisiyah et al., 2018), and economic application (Alvi, 2018). Unfortunately, PGM's structured learning process encounters the NP-hard problem (Chickering, 1996) when all observed variables are required to construct the DAG. It implies that the super-exponential number of graphical models is in the search space. The more graphs in the search space, the more time, memory, and cost are required to discover the best network structure.
Consequently, this paper aims to use a data-dependence measure to scope the observed variables in the sample space. The dependence measure is calculated by the maximal information coefficient (MIC) (Reshef et al., 2011). The MIC widely detects the relationships between pairwise variables in a large dataset. The MIC score is equal to the coefficient of determination, the R-square (R²), of data related to the regression. Equitability, one of the main properties of MIC, is a key reason for choosing this algorithm: for noiseless functional relationships it maximizes R² to 1 when the function is tested with different samples, noise models, and noise levels. Even so, the correlated data is only connected through its relationships as an undirected graph. Edges without a direction between data variables cannot represent causality. Based on the reasons above, this paper highlights the synergy of correlated data and PGM, aiming to diagnose the direction of the relationship by applying statistical inference. Bayesian linear regression (BLR) formulates linear regression with Bayesian inference, implying the variables' dependency in a linear function. Therefore, in this paper, BLR is preferred to identify the directional relationship of correlated pairwise variables through their dependence by using a probability distribution. The correlated pairwise variables can manifest in the linear relationship between the target (y) and predictor variables (x). BLR declares the linear regression model using a probability distribution from a normal (Gaussian) distribution rather than a single estimated value.
Moreover, the target result of BLR draws a posterior probability for model parameters that can describe the conditional probability of correlated pairwise variables. The highest posterior probability determines the direction of correlated pairwise variables. The pairwise variables, x₁ and x₂, will each be tested as both target and predictor variables, meaning that when x₁ is the target variable, x₂ will be the predictor variable, and vice versa. Consequently, alternating the target and predictor roles will obtain different posterior probabilities. Selecting the highest posterior probability determines whether x₁ or x₂ will be the target or the predictor variable. The result can then be interpreted as a directional relationship and conditional dependency between x₁ and x₂.
Related methods
PGM and MIC are the related methods for background knowledge.

Probabilistic Graphical Model
The Probabilistic Graphical Model (PGM) framework conditionally establishes the network model as either a Bayesian network (BN) or a Markov network (MN). Both models strongly support modelling a joint probability distribution as a directed or undirected graph. BN illustrates a directed PGM, presenting the captured conditional dependencies of associated attributes through Bayes' theorem, beneficially describing a causality relationship. BN is also applicable to predictive modelling of uncertain data, explaining dependent attributes using the occurrence probability of their relationships. On the other hand, the MN is an undirected PGM that shows only the nodes' association without any ordered causality. BN is a model representation based on graph and probability theories. Graph theory (Ruohonen, 2013) defines a set of vertices and edges, G = (V, E). V indicates the vertices or nodes, while E indicates the edges in a graph. An edge displays the connection between associated vertices. The cardinalities of vertices |V| and edges |E| in a graph are called the order and the size of the graph, respectively. In BN, nodes and edges represent random variables and their relationships, respectively, avoiding self-connections and cyclic relationships. A basic BN consists of at least two nodes connected by one edge. BN's edge representation draws a directed edge, using an arrow to give the clearest understanding of the causal relationship. Therefore, BN is a directed acyclic graph (DAG) that embodies the fundamental cause-and-effect relationship concept. In the example in Fig. 1, an arrow directly connects node A to node B, implying that node A is a cause of node B, or that node B is affected by node A. Conceptually, the joint distribution of stochastic variables is a conditional probability distribution (CPD). Bayes' theorem provides a probability value to determine the dependency level among events in uncertain situations.
However, real-world situations like medical, agricultural, and environmental ones must handle many relationships among random variables (factors). The CPDs become an answer that helps PGM interpret complex relationships with intuitive models. Moreover, samples of discrete and continuous values return different CPD results. Discrete values give the conditional probability in tabular form as a conditional probability table (CPT). In contrast, continuous values correspond with a mean and variance to obtain the conditional probability as a conditional density function that normally relies on the normal distribution.
Generally, three basic rules must be accepted for conditional probability. Let the sample space S contain events θₙ, ..., θₙ₊₁, over which a probability distribution P is measured. First, the probabilities over all possible events in S sum to at most 1. Second, the probability of an event cannot be negative. Third, when two events θₙ and θₙ₊₁ in S are mutually disjoint, the probability of their union is the sum of their individual probabilities.
• P(S) = 1
• P(X) ≥ 0, when X ∈ S
• When θₙ ∩ θₙ₊₁ = ∅, then P(θₙ ∪ θₙ₊₁) = P(θₙ) + P(θₙ₊₁)
To find the conditional probability of two events (event X and event Y), the dependency between the two events must be realized. If the two events are independent, the conditional probability of event Y given event X equals the probability of Y, P(Y). On the contrary, two dependent events manifest the conditional probability when the first event's outcome affects the second event's outcome. It implies the sequence of cause and effect for those two events. For example, the probability that event Y occurs given the knowledge that event X has already occurred is denoted P(Y | X), read as the probability of event Y given X. Therefore, P(Y | X) equals the joint probability of events X and Y over the probability of X, P(X), which is indeed the basis of Bayes' theorem.
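The axioms and the conditional probability P(Y | X) = P(X ∩ Y) / P(X) above can be checked with a minimal sketch. The joint counts below for two binary events are purely hypothetical (not from the paper's dataset); rational arithmetic keeps the check exact.

```python
from fractions import Fraction

# Hypothetical joint counts over two binary events X and Y
# (illustrative numbers only, not from the paper's dataset).
counts = {("x", "y"): 30, ("x", "not-y"): 10,
          ("not-x", "y"): 20, ("not-x", "not-y"): 40}
total = sum(counts.values())

# Axioms: every probability is non-negative, and the disjoint outcomes sum to 1.
probs = {k: Fraction(c, total) for k, c in counts.items()}
assert all(p >= 0 for p in probs.values()) and sum(probs.values()) == 1

# P(X), P(X ∩ Y), and the conditional P(Y | X) = P(X ∩ Y) / P(X).
p_x = probs[("x", "y")] + probs[("x", "not-y")]
p_x_and_y = probs[("x", "y")]
p_y_given_x = p_x_and_y / p_x
```

Here P(Y | X) = (30/100) / (40/100) = 3/4, consistent with the definition above.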
Bayes' theorem (Joyce, 2003) focuses on a probabilistic hypothesis given conditional knowledge of the occurred dependent event(s). Bayes' theorem can invert the conditional probability between two dependent events, stated mathematically as
P(X | Y) = P(Y | X) P(X) / P(Y)    (2)
Equation (2) is the Bayes' Theorem equation. P (X | Y ) is a conditional probability that is measured by a likelihood of event X given a likelihood of event Y . P (Y | X) is a conditional probability that is measured by a likelihood of event Y given a likelihood of event X. And P (X) and P (Y ) are the probability of event X and event, respectively. In prediction terminology, the Bayes' Theorem equation variables should rely on the hypothesis and evidence. The result of the hypothesis's directed probability is calculated by given observed evidence. The equation is changed as given below.
where, P (H | E) is a posterior probability. P (E | H) is a likelihood probability. P (H) and P (E) are a prior probability and a marginal probability, respectively. The P (E | H) explains the probability value of data given a hypothesis (Joyce, 2003). Consequently, PGM draws edges among random variables to represent their dependencies by using structural learning of the observed data sample. There are two learning approaches, parameter learning and structure learning, emphasizing dependent attributes beyond the conditional probability estimation process. Both learning algorithms aim to construct the DAG, but each model will be done via a different method and target data sample. PGM does not only learn through the pure data sample, but the existing relationships can also be learned to construct the network model.
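A small numeric sketch of the posterior update above (all numbers are hypothetical): given a prior P(H), a likelihood P(E | H), and an assumed false-alarm rate P(E | ¬H) to expand the marginal P(E) by total probability, Bayes' theorem gives the posterior directly.

```python
# Hypothetical inputs for P(H | E) = P(E | H) P(H) / P(E).
prior_h = 0.01      # P(H): prior probability of the hypothesis
likelihood = 0.9    # P(E | H): probability of the evidence under the hypothesis
false_alarm = 0.05  # P(E | not H): assumed here, needed to expand the marginal

# Marginal probability of the evidence by the law of total probability.
p_e = likelihood * prior_h + false_alarm * (1 - prior_h)

# Posterior probability of the hypothesis given the evidence.
posterior = likelihood * prior_h / p_e
```

Observing the evidence raises the hypothesis's probability from 1 percent to roughly 15 percent.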

Parameter learning
The parameter learning approach requires sample data with an existent DAG for capturing the conditional probability of individual variables. The conditional probability is estimated by one of two components: maximum likelihood estimation (MLE) and Bayesian estimation. MLE gives the maximized conditional probability of the data given the model, P(Data | Model). In contrast, Bayesian estimation incorporates a prior conditional distribution.
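MLE's counting view can be sketched in a few lines. Assuming a hypothetical two-node network A → B and a toy discrete sample, the MLE of each conditional probability P(B = b | A = a) is simply the count ratio N(a, b) / N(a), which maximizes P(Data | Model).

```python
from collections import Counter

# Hypothetical observations for a two-node network A -> B.
samples = [("a0", "b0"), ("a0", "b0"), ("a0", "b1"),
           ("a1", "b1"), ("a1", "b1"), ("a1", "b0")]

pair_counts = Counter(samples)                  # N(a, b)
parent_counts = Counter(a for a, _ in samples)  # N(a)

# MLE of the CPT: P(B = b | A = a) = N(a, b) / N(a).
cpt = {(a, b): n / parent_counts[a] for (a, b), n in pair_counts.items()}
```

Each row of the resulting CPT sums to 1, e.g. P(b0 | a0) = 2/3 and P(b1 | a0) = 1/3.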

Structure learning
The structure learning approach captures the maximized probability of variable dependencies without knowing prior knowledge. The network model is obtained from the posterior probability distribution given its data sample, P (Graph | Data). According to the Bayesian statistical decision theory, the number of graphs can grow in a super-exponential number of DAGs. Then, only the most optimized structure, which is measured by a network score, will be selected. Although structure learning encounters a limitation of unknown prior knowledge, the solution is to apply the uniform prior, which holds equal probability. It is generally implied that the P (Graph | Data) is proportional to P (Data | Graph). Hence, two learning algorithms, constraint-based learning and score-based learning, can help determine the network's arcs.
(a) Constraint-based learning
A constraint-based learning approach is structuralized intuitively from conditional independencies that do not overlook the concept of the Bayesian network. The conditional independence test is done by Pearson's χ²-test, Fisher's Z-test, and the t-test instead of testing the probability. This constraint-based learning approach considers three steps to model the network: conditional independence identification, skeleton learning of undirected relationships, and arc direction learning. In (Alvi, 2018), the Markov blanket was first learned to optimize the number of candidate nodes of the DAG. The conditioned edges of the Markov blanket can return the conditional independence of every set of nodes. It benefits skeleton learning by identifying the undirected edges in the DAG. Then, the undirected edges are assigned directions, yielding a completed partial DAG. The results imply the causal relationship between nodes.
(b) Score-based learning
The score-based learning approach emphasizes applying a network score to evaluate the best BN which fits the data. The maximal score returns the highest posterior probability of a graph (nodes and edges) given its observed dataset.
Since the number of possible DAGs (BNs) grows super-exponentially in the number of nodes in the search space, O(n!·2^(n(n−1)/2)) (Bari, 2011), where n is the number of nodes in the network, heuristic search algorithms take responsibility for reducing the number of DAGs to find the optimal BN structure. The scoring function works as an indicator that can be calculated by AIC, BIC (Burnham and Anderson, 2004), K2 (Cooper and Herskovits, 1992), and BDeu (Heckerman, 1995) for the heuristic search algorithm. The search algorithms also offer selections for different purposes, such as hill-climbing search, tabu search, the genetic algorithm, and greedy equivalence search (GES). In (Beretta et al., 2018), the focus was on operational efficiency, comparing each scoring method under each heuristic search algorithm. The outcome found that variable types and domains of interest can affect learning performance, and only a few heuristic search algorithms operate efficiently with scoring methods. However, the score-based mechanism can demonstrate the whole structural model's dependencies, avoiding the failure of individual conditional independence tests (Koller and Friedman, 2009). As a result, the dependencies among random variables can be described by PGM, which helps to depict relationships by supporting conditional probability. The hard work of identifying the optimized model encounters an NP-hard problem, which loads all work onto heuristic search algorithms for comparing all networks' scores in the search space. The model which holds the highest score will be determined as the best model.
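The score-based principle can be made concrete with a minimal sketch. Under the assumption of two binary variables, the BIC below (log-likelihood minus a (k/2) log N penalty on the k free parameters) compares the empty graph against the single-edge graph A → B on strongly dependent toy data. This is an illustration of the scoring idea only, not the paper's implementation.

```python
from collections import Counter
from math import log

def bic_independent(samples):
    """BIC of the empty graph: A and B modeled as independent binary variables."""
    n = len(samples)
    ll = 0.0
    for i in (0, 1):
        marg = Counter(s[i] for s in samples)
        ll += sum(c * log(c / n) for c in marg.values())
    return ll - 0.5 * 2 * log(n)   # two free parameters (one per binary node)

def bic_edge(samples):
    """BIC of the graph A -> B, factored as P(A) P(B | A)."""
    n = len(samples)
    a_counts = Counter(a for a, _ in samples)
    pair = Counter(samples)
    ll = sum(c * log(c / n) for c in a_counts.values())
    ll += sum(c * log(c / a_counts[a]) for (a, b), c in pair.items())
    return ll - 0.5 * 3 * log(n)   # three free parameters: P(a1), P(b1|a0), P(b1|a1)

# Strongly dependent toy data: B nearly copies A.
data = ([("a0", "b0")] * 40 + [("a1", "b1")] * 40 +
        [("a0", "b1")] * 5 + [("a1", "b0")] * 5)
```

On this sample, bic_edge(data) exceeds bic_independent(data), so a score-based search would keep the edge: the likelihood gain from modeling the dependence outweighs the penalty for one extra parameter.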
Although the NP-hard problem in searching for the best network is time-consuming, PGM is still a robust framework that provides functions for illustrating the causal relationship model. Especially, BN can increase users' comprehension through intuitive model interpretation on a domain (Liu et al., 2012).

Maximal information coefficient
Maximal information coefficient (MIC) (Reshef et al., 2011) statistically measures the dependency between pairwise variables on a large dataset and is described as an exploratory data analysis (EDA) tool. The correlation level is defined as the MIC score to determine the dependence strength of pairwise variables. The MIC score ranges from 0 to 1, an R-squared (R²) evaluation normalized from the mutual information. A MIC score of 0 means an independent relationship. An increased MIC score, up to 1, depicts the strength of dependence. The MIC score's absolute value can be interpreted as a magnitude of the correlation coefficient as in Table 1. These are usually divided into 5-6 levels (Cai et al., 2019; Liang et al., 2019), including no-correlation and perfect-correlation levels. The MIC algorithm returns a significant score, when compared with other algorithms, because of its two heuristic properties: generality and equitability. The generality property supports unlimited access to relationship types. MIC can detect both linear and non-linear relationships, while other algorithms, such as the distance correlation (Székely and Rizzo, 2009) and Pearson's R (Benesty et al., 2009), perform only on the linear function. MIC also performs more equitably than the mutual information in almost all noise models. The equitability property gives a similar MIC score to different relationship types with a similar noise level. The correlation coefficient in Fig. 2 (Reshef et al., 2011) demonstrates the score comparison between MIC and other current functions on noiseless models.
It is found that MIC detects an equal score of 1.00 on different functions of noiseless models, except the random function. The definition of MIC (Reshef et al., 2011, 2013) is denoted for measuring a maximal coefficient of variables' dependence. Let D|G be the distribution induced by a finite set D over the cells of a grid G, and let I denote mutual information.

Definition 1 (Characteristic matrix)
Let the characteristic matrix M(D) of a finite set D of ordered pairs, x and y, be an infinite matrix with elements (i, j) over grids G:
M(D)_{i,j} = I*(x, y, D, i, j) / log min(i, j)
where I*(x, y, D, i, j) is the maximum mutual information of the ordered pairs, x and y, over all grids G with i columns and j rows: max_G I(x, y, D|G, i, j).

Definition 2 (Maximal information coefficient (MIC))
Let the finite set D contain a set of ordered pairs, x and y. Then, let I*(D, x, y) = max_G I(D|G), where the maximal mutual information is obtained over all x-by-y partitions of a grid G. Therefore, the MIC of pairwise variables over G with sample size n can be defined as
MIC(D) = max_{ij < B(n)} M(D)_{i,j}
The grid size i × j should be less than B(n), where ω(1) < B(n) ≤ O(n^(1−ε)) for some 0 < ε < 1, and B(n) is set equal to n^0.6. The symmetric relation of pairwise variables, due to the mutual information basis, equalizes the MIC score: MIC(x, y) = MIC(y, x).
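A toy approximation of the definitions above can be written directly. The sketch below uses equal-width bins, whereas the published algorithm optimizes the partition boundaries on each grid, so scores here are only a lower bound; it still normalizes each grid's mutual information by log min(i, j) and maximizes over grids with i·j at most n^0.6.

```python
from collections import Counter
from math import log

def mutual_info(pairs, i_bins, j_bins):
    """Mutual information of the pairs binned on an i-by-j equal-width grid."""
    n = len(pairs)
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]

    def bin_idx(vals, k):
        lo, hi = min(vals), max(vals)
        w = (hi - lo) / k or 1.0
        return [min(int((v - lo) / w), k - 1) for v in vals]

    bx, by = bin_idx(xs, i_bins), bin_idx(ys, j_bins)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum(c / n * log((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in joint.items())

def mic_approx(pairs):
    """Approximate MIC: max over grids with i*j <= n**0.6 of MI / log min(i, j)."""
    n = len(pairs)
    b = max(int(n ** 0.6), 4)
    best = 0.0
    for i in range(2, b + 1):
        for j in range(2, b // i + 1):          # enforce i * j <= B(n)
            best = max(best, mutual_info(pairs, i, j) / log(min(i, j)))
    return best

# A noiseless linear relationship should score (approximately) 1.
linear = [(x, 2 * x + 1) for x in range(100)]
```

The noiseless linear sample scores 1, matching the equitability property; since MI is bounded by log min(i, j), the normalized score can never exceed 1.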
MIC carries powerful benefits for revealing associations among variables: robustness to outliers, no assumptions on variable distributions, and ease of interpretation. It is widely applied across research domains such as language recognition, image processing, search, spatial data analysis, and medicine. Wang et al. (Wang et al., 2018) improved the MIC methodology with iMIC, which searches intelligently for the association relationships of variables in a system. The agriculture domain has also selected the MIC method to find the impact factors of rice price variation (Surapunt et al., 2017).

Methodological approach
The Bayesian maximal information coefficient (BMIC) is proposed to uncover the causal correlations from a large dataset in a random system by integrating PGM and MIC with BLR. To identify cause and effect, the synergy finds the best causality model while minimizing runtime execution.

Search the correlated variables
MIC narrows down the number of variables in the search space through its generality and equitability, which are beneficial in unlocking the accessibility of many function types and in manipulating noisy data. Since the MIC algorithm provides a score for ranking the correlation coefficient, a high MIC score indicates strong dependence between pairwise variables for establishing the structural model. Therefore, the execution time is reduced while searching for an optimized model in the structure learning process. Unfortunately, MIC returns an undirected model of the variables' dependency: a symmetrical relationship. This means that the MIC score of the relationship x → y is equal to that of y → x, which cannot represent causality. Using MIC for causal discovery requires a directional arrow between any two nodes. As a result, there is interest in studying the orientation of pairwise variables for identifying the cause-and-effect relationship.

Identify the causal relationship
PGM is the robust framework that provides functions for identifying the causal relationship model. BN is a representation of the PGM framework, constructing a DAG by capturing the knowledge of probabilistic numerical information. The conditional probability distribution denotes the variables' dependency, which considers the posterior probability value; these dependencies represent the edges in a particular network. The posterior probability is the basis of any inference that requires the integration of prior knowledge and new information. Bayes' theorem captures the posterior probability from the Bayesian perspective. Therefore, Bayesian inference is a traditional method to detect random variables' causality. It is an influential statistical theory that can determine the probability of uncertain data.

Determine the posterior probability
The posterior probability values are determined from the linear regression model under the Bayesian view. BLR (Clyde et al., 2007) applies Bayes' theorem to the linear regression model for posterior probability estimation. BLR builds on the linear regression model, which captures the linear relationship between input (feature) variable(s) and an output (target) variable. The linear function is formulated with a probability distribution rather than a point estimate. The output y is the response of a normal distribution, controlled by a mean and variance as y ∼ N(βᵀx, σ²). The mean is the product of the parameter vector βᵀ and the input variable x, while the variance is the square of the standard deviation σ. Once the y variable of the linear regression model is computed, the posterior distribution for the model parameters given the x and y variables is obtained with the support of Bayes' theorem as in Equation (8). The model parameters follow the fundamentals of Bayesian inference.
where P(β | y, x) is the posterior probability distribution, P(y | β, x) is the likelihood of the data, P(β | x) is the prior probability, and P(y | x) is the normalizing constant.
Since continuous values are intractable to parameterize for posterior distribution evaluation, the BLR model uses the Markov chain Monte Carlo (MCMC) algorithm to draw samples that approximate the posterior distribution. The Markov chain represents the next sample value drawn from the previous sample value, and Monte Carlo refers to the technique of drawing random samples. The concept of MCMC emphasizes that the more posterior distribution samples are drawn, the closer the approximated posterior distribution converges to the real one. PyMC3 (Salvatier et al., 2016) is a Python library used for implementing the Bayesian model by the MCMC method. The Bayesian inference constructs the Bayesian linear models from the formulated linear function. It is provided by the generalized linear models (GLM) module, which can generate the formula from the input variable(s), x, and an output variable, y.
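The MCMC idea can be illustrated without any library. The sketch below is a hypothetical stand-in for PyMC3's sampler: a plain Metropolis random walk over the slope β of y = βx + ε with a Gaussian prior, on synthetic data rather than the paper's variables. Each step proposes a nearby β and accepts it with probability min(1, posterior ratio), so the chain's samples approximate the posterior P(β | y, x).

```python
import random
from math import exp

random.seed(0)

# Synthetic data around a true slope of 2 (a stand-in for the paper's variables).
xs = [i / 10 for i in range(50)]
ys = [2.0 * x + random.gauss(0, 0.5) for x in xs]

SIGMA = 0.5       # assumed known noise scale
PRIOR_SD = 10.0   # weakly-informative Gaussian prior on the slope

def log_posterior(beta):
    """log P(beta | y, x) up to the normalizing constant P(y | x)."""
    ll = sum(-(y - beta * x) ** 2 / (2 * SIGMA ** 2) for x, y in zip(xs, ys))
    log_prior = -beta ** 2 / (2 * PRIOR_SD ** 2)
    return ll + log_prior

# Metropolis sampling: propose a nearby beta, accept with prob min(1, ratio).
beta, samples = 0.0, []
for _ in range(5000):
    prop = beta + random.gauss(0, 0.1)
    delta = log_posterior(prop) - log_posterior(beta)
    if delta >= 0 or random.random() < exp(delta):
        beta = prop
    samples.append(beta)

posterior_mean = sum(samples[1000:]) / len(samples[1000:])  # discard burn-in
```

The posterior mean of the retained samples settles near the true slope of 2, and drawing more samples tightens the approximation, as the convergence argument above suggests.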
BLR does not restrict priors to informative ones; a non-informative prior can be drawn from the normal distribution with the mean and variance of the observed data. Another benefit is the ability to quantify the model's uncertainty from the resulting posterior probability distribution. The level of certainty depends on the data quantity; the model's uncertainty decreases as the quantity of data increases. Therefore, BLR has become popular in statistical training and is widely applied in various domains, mainly to examine the relationship between model variables. As in the psychology study of (Baldwin and Larson, 2017), BLR can outperform frequentist statistical methods in examining the relationship between an electroencephalogram (EEG) and anxiety from clinical data.
BLR has also been proposed in predictive modelling, corresponding with a predicted value and a confidence (probability) interval (CI) (Kong et al., 2020). The predicted value is the distribution of the target yᵢ given a set of features xᵢ: P(yᵢ | xᵢ). Moreover, the posterior probability distribution alone might not be enough for Bayesian decision-making. The CI helps to confirm the possible values in a range for the model parameters. The higher the CI percentage, the stronger the relationship between pairwise variables is expected to be. The geology study in (Ghosh and Chakraborty, 2020) is based on a seismic fragility demand prediction model that estimates the uncertainty of the fragility curve. BLR provides more accurate results than other benchmarks.
The CI (Hespanhol et al., 2019) is an uncertainty measurement in frequentist statistics that can also declare a degree of uncertainty. It comprises the lower and upper limits for the result estimation, which vary with the sample size of the observed data and the standard deviation (heterogeneity). The width of an interval can be interpreted as the precision of the result estimates. When the data sample is large and has a narrow CI width, a low degree of uncertainty is indicated. The heterogeneity is directly proportional to the degree of uncertainty: low heterogeneity refers to low uncertainty, resulting in a narrow CI. Hence, the narrower the width of an interval, the more precise the effect estimates. The CI is preferable to the p-value, especially in the health science area, because of the misinterpreting, misusing, and overestimating of hypothesis testing. The p-value usually sets a boundary with a significance level of 0.05 (5 percent) for accepting or rejecting the defined null hypothesis. Many researchers have interpreted the p-value as the probability of accepting (rejecting) the null hypothesis, P(H₀ | y), which is the probability of H₀ given the observed data. In fact, the p-value measures the probability of the actual result given the null hypothesis, P(y | H₀). Moreover, researchers also tend to oversimplify p-value interpretation in practice, separating the statistically significant from the non-statistically significant with a p-value threshold of 0.05. Such practice ignores the sample size and the variability of the result estimation. Thus, the CI is promoted as an alternative to the p-value, describing the variability of the estimate through the interval's width. The interval's width indicates the precision of the result and is usually set at a 95 percent confidence level.
In the Bayesian approach, the uncertainty estimation of the posterior probability distribution is called the Bayesian credible interval, abbreviated as CrI. The CrI performs under the same principle as the CI. Estimating the CrI in Bayesian inference evaluates the posterior distribution with two types of CrIs: the equal-tail CrI and the highest posterior density (HPD) interval. The equal-tail CrI is a direct threshold on the posterior probability, an interval of the posterior probability distribution that is easy to calculate. For example, an upper bound of 0.975 means 97.5 percent of the quantile distribution of the posterior probability. However, an asymmetric posterior probability can yield an estimate with a lower probability inside the interval than outside it. The HPD interval also determines a threshold on the posterior probability, indicating an interval with the probability mass around the distribution's center. When the 95 percent HPD interval is −4.0 to −1.0, it implies that the mean difference mostly lies between −4.0 and −1.0 given the observed data; the value with the highest posterior probability would be between −4.0 and −1.0. A symmetric posterior probability makes the HPD CrI equal to the equal-tail CrI, but the HPD CrI is more complicated to compute than the equal-tail CrI method.
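Both interval types can be computed from posterior samples with a short sketch (a simplified illustration on a deliberately skewed toy sample, not the paper's posterior): the equal-tail CrI cuts an equal probability mass from each side, while the HPD interval is the narrowest window containing the requested mass.

```python
def equal_tail_interval(samples, mass=0.95):
    """Equal-tail credible interval: cut (1 - mass) / 2 from each side."""
    s = sorted(samples)
    lo = s[int(len(s) * (1 - mass) / 2)]
    hi = s[int(len(s) * (1 + mass) / 2) - 1]
    return lo, hi

def hpd_interval(samples, mass=0.95):
    """HPD interval: the narrowest window of samples holding `mass`."""
    s = sorted(samples)
    k = int(len(s) * mass)
    width, start = min((s[i + k - 1] - s[i], i) for i in range(len(s) - k + 1))
    return s[start], s[start + k - 1]

# Right-skewed toy samples: squares of an evenly spaced grid on [0, 1).
draws = [(i / 1000) ** 2 for i in range(1000)]
```

On this right-skewed sample the HPD interval is narrower than the equal-tail interval; on a symmetric posterior the two coincide, as noted above.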

Case Study on Thai Rice Price
As a case study, the proposed BMIC is applied to discover the causal correlations from a large dataset on Thai rice price by identifying the causality change in the paddy price of Jasmine rice. It properly determines the causality of changes in the price of Thai rice due to its uncertainty and related impacts.

Thai rice background
Thai rice price is an interesting topic for studying the effects of price change because rice is a well-known globally traded product. Many countries are willing to import Thai rice despite its high price. Thailand is the second-largest rice exporter, following only India (Workman, 2019). Thus, Thai rice is an imperative agricultural product for expanding the country's revenue. Rice is not a monopoly market because many countries have the ability to export, such as China, India, Thailand, Vietnam, Pakistan, Australia, and the USA (Office of Agricultural Economics, 2017). However, competitors are free to adjust their rice prices following the Thai rice price under the FOB price criteria. FOB stands for Free on Board (or Freight on Board), the cost of shipping goods aboard an aircraft or ship, charged by the seller. Every country that intends to export rice must quote a rice price, and this quote acts as a floor price indicating the world rice price. The FOB price is defined to control rice pricing in global trading, which varies coherently with current world situations.
Unfortunately, Thai rice encounters problems from global and domestic situations that cause its price to fluctuate. Many agriculturists decide to plant alternative crops to gain more profit, which consequently causes the rice yield to drop. The food crisis in 2008 [39] was a severe situation that caused a spike of 50-100 percent in raw material and food prices. The surge in demand, decrease in stocks and farming area, lack of agricultural infrastructure and investment, oil price, and exchange rate influenced the rising price, impinging on demand and supply in global markets (Kha and Trinh, 2017). Additionally, Thailand faced domestic problems with the Paddy Price Pledging scheme in 2011. The Paddy Price Pledging is the political machinery that supports agriculturists' income. The government allowed pledging an unlimited quota of paddy and guaranteed an obtainment of 50 percent above the market price. This policy gradually increased the stock of Thai rice while the real market kept decreasing. It made the price of Thai rice suddenly more expensive than that of other competitors, and Thailand lost the opportunity to trade with partners. In the study of (Wasawong, 2018), a linear model was developed to reveal the impact of Thailand's rice-pledging policy. The global trading rice price is an indirect cause of setting the price of various types of Thai rice within the country. Because rice exporters are the first to know the price direction from the global market, they attempt to negotiate with millers in setting the rice price. The domestic rice price is forced to a level similar to the export price.
During the same period as the launch of the Paddy Price Pledging policy, Thailand encountered an emergency disaster, the devastating flooding of 2011, which moved the first rank of global rice exporter from Thailand to India because of the rising Thai rice price. The high price attracted farmers to grow more rice, so both the real market and the stock held a massive quantity of Thai rice. Unluckily, the revenue from Thai rice decreased, as trading partners had the advantage in bargaining, knowing that Thai rice needed to be sold. Thailand was under pressure to sell in 2014; the Thai rice price dropped to 385.91 US dollars per ton in 2015 (Thai Rice Exporters Association, 2019), and all activity finally paused in 2016. The Thai rice price was then adjusted back to a normal state relative to the price of Vietnamese rice (Hoang and Meyers, 2015). Therefore, Thai rice price variation depends not only on the change of time but also on impact factors (BBC News, 2017). This paper aims to reveal those impact factors with a probabilistic model, determining the causes of variation in the price of Thai rice. Relevant quantities, such as the quantity of Thai rice for export, the Thai rice price, the exchange rate, and other countries' rice prices, should be considered in detecting the variations in rice price (Maneejuk et al., 2016). These results contribute to forecasting the Thai rice price for all stakeholders.

Data sample
Thus far, we have implicitly taken a BN to demonstrate the causality of variation in the price of Thai rice under its many surrounding impacts. Every directed edge in a BN denotes a causal association between a parent and a child node. The BN can further be used as a decision-making tool to deal with upcoming situations.
The BN requires relevant data for Thai rice price model extraction. The Thai paddy price, particularly for the Thai Jasmine (Hom Mali) species, is the target for observing the causality of price fluctuation. There are two cultivation seasons that produce different yields. Eighty-five percent of rice grows between May and October and is harvested from August to April of the following year; this season is termed major rice and yields jasmine rice (Hom Mali rice) and parboiled rice. The remaining fifteen percent grows from January to no later than April; it is called second rice and yields glutinous rice, non-glutinous rice, and indigenous species (Office of Agricultural Economics, 2017). Accordingly, this paper selects the variables related to major rice to study the paddy price of jasmine rice.
This study's sample collects 11 years of data, from 2008 to 2018, containing all relevant variables. As the price of Thai rice has become an interesting topic in both the economic and data science fields, the variable selection follows criteria studied in the related literature. In economic studies, researchers focused on the relationships among rice prices or the competitive nature of rice in the market. Various models (Baharom et al., 2009; Ghoshray, 2008) presented the asymmetric volatility of the rice price, considering case studies in Thailand and Vietnam, and Kha and Trinh (2017) studied the issues surrounding the Thai rice price between 2003 and 2013. In the data science area, many studies have evaluated statistical techniques (Sujjaviriyasup, 2018; Shao and Dai, 2018; Co and Boosarawongse, 2007; Gao and Lei, 2017); the ARIMA model has been a popular tool for prediction, focusing on the export quantity of Thai rice, the Thai rice price, the exchange rate, and the price of rice in other countries (Maneejuk et al., 2016). Based on the reasons above, there are 14 variables (including the paddy price) with continuous values, collected with no missing values for the variables of interest. They are the plant area of major rice, the yield of major rice, the minimum income of Thai citizens, the paddy price of Jasmine rice, the Jasmine rice price, the export quantity of Jasmine rice, the export price of Jasmine rice, the export price of 5 percent white rice in Thailand, the export price of 5 percent white rice in Vietnam, the export price of 5 percent white rice in Pakistan, the gold price, the domestic oil price, the US exchange rate with the Thai currency, and Thailand's GDP of agricultural products. Since we learned that the paddy price can fluctuate due to domestic and global impacts, these 14 variables are collected as the relevant attributes.
Domestic factors determine the relevant attributes in Thailand's economic mechanism, while global factors highlight competitors' prices for the same category of rice. Table 2 gives, for each variable, its abbreviation (Variable name) and a longer definition (Meaning).

Exploratory Analysis with MIC
In this paper, we assume a probability model, a BN, to understand the cause-and-effect relationships of the factors that make the Thai paddy price change. The observed data, including target and predictor variables, is required by the structure learning process to produce the complete BN. The structure model is learned by measuring the probability of the variables' dependencies without prior knowledge, and the structure with the highest posterior probability distribution given its data is returned to confirm the relationships. However, many random variables must be learned to construct the model, which causes super-exponential growth of the number of DAGs in the search space, making structure learning an NP-hard problem.
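To make the super-exponential growth concrete, the number of candidate DAGs can be counted with Robinson's recurrence. The short sketch below is an illustration only, not part of the proposed method:

```python
from functools import lru_cache
from math import comb

# Robinson's recurrence for the number of labeled DAGs on n nodes:
#   a(n) = sum_{k=1}^{n} (-1)^(k+1) * C(n, k) * 2^(k*(n-k)) * a(n-k)
@lru_cache(maxsize=None)
def num_dags(n: int) -> int:
    if n == 0:
        return 1
    return sum((-1) ** (k + 1) * comb(n, k)
               * 2 ** (k * (n - k)) * num_dags(n - k)
               for k in range(1, n + 1))

print(num_dags(3))   # 25 DAGs for just 3 variables
print(num_dags(14))  # astronomically many for the 14 variables used here
```

Even at 14 variables the count exceeds 10^30, which is why exhaustive search is infeasible and variable reduction matters.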
Our sample contains 14 variables prepared for constructing the model. The constraint-based and score-based structure learning algorithms are executed on the observed data to identify edges between variables, and the optimized graph is judged by holding the highest posterior probability distribution. We found that the search took a long time to find the optimized model because of the NP-hard problem. Therefore, we first apply the MIC algorithm to narrow down the variable space, selecting only the relationships with the strongest dependence on the paddy price of Jasmine rice. In exploratory data analysis, the coefficients are ranked by measuring the dependence of pairwise variables; the correlation coefficient is returned as the MIC score in Table 3. The results keep the first ten ordered variable pairs, because perfect and strong relationships (as in Table 1) have a high impact on the target variable. Unfortunately, MIC results cannot directly imply a cause-and-effect relationship, because the measure is symmetric: the MIC score when variable x connects to variable y is identical to the score when y connects to x. The MIC results can nevertheless be used preliminarily to connect the relevant variables with undirected edges, as in Fig. 3. According to the results in Table 3, the MIC score shows a perfect correlation between the paddy price of Jasmine rice and two selected factors, the Jasmine rice price and the export price of Jasmine rice, meaning that the paddy price of Jasmine rice relates directly to both. The remaining MIC scores are not impractical either: they can indicate an indirect relationship to the paddy price of Jasmine rice. However, a perfect MIC score is also found between the plant area and the yield of major rice, despite their being unrelated to the paddy price of Jasmine rice.
Although the export prices of 5 percent white rice in Thailand, Vietnam, and Pakistan obtain high MIC scores, they are clearly irrelevant to the paddy price of Jasmine rice. These preliminary MIC results demonstrate the connections with the paddy price of Jasmine rice conclusively; they are beneficial for limiting a variable's scope and retaining only the real relationships to the target domain.
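As an illustration of the grid-based dependence measure, the sketch below implements a simplified MIC using equal-width bins only. The full statistic of Reshef et al. also optimizes non-uniform grid boundaries at each resolution, so this is an approximation, and all variable names are illustrative:

```python
import numpy as np

def grid_mi(x, y, bx, by):
    # Mutual information (in bits) of the joint distribution induced
    # by an equal-width bx-by-cell grid over the two series.
    counts, _, _ = np.histogram2d(x, y, bins=(bx, by))
    p = counts / counts.sum()
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float((p[nz] * np.log2(p[nz] / (px @ py)[nz])).sum())

def mic_sketch(x, y):
    # Scan grid resolutions with at most B = n^0.6 total cells and keep
    # the largest normalized mutual information; MI is normalized by
    # log2(min(bx, by)) so the score lies in [0, 1].
    n = len(x)
    B = int(n ** 0.6)
    best = 0.0
    for bx in range(2, B // 2 + 1):
        for by in range(2, B // bx + 1):
            best = max(best, grid_mi(x, y, bx, by) / np.log2(min(bx, by)))
    return best

rng = np.random.default_rng(0)
x = rng.normal(size=500)
strong = mic_sketch(x, 2 * x + 1)          # near-perfect relationship
weak = mic_sketch(x, rng.normal(size=500))  # independent noise
```

A deterministic linear relationship scores near 1, while independent noise scores much lower, mirroring the ranking step used in Table 3.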
Since the MIC score cannot detect a cause-and-effect relationship, it is necessary to assign orientations to all edges in the diagram. The BN representation has been chosen to depict the variation of the paddy price of Jasmine rice in a graphical model. The correlated data is a parameterization of the BN used for parameter and structure learning (Bae et al., 2016). Zhang et al. (2013) improved the heuristic search of structure learning using MIC; they also encountered the triangular-loop relationship problem, eliminating the loops with the d-separation rule, and a conditional independence test then handled the orientation assignment in their work. Therefore, our work improves on and contributes a new method to the MIC algorithm, assigning directions to the undirected edges to determine the cause-and-effect relationships.

Proposed BMIC (method of PGM and MIC with BLR)
Assigning the orientations of the edges in the MIC diagram in Fig. 3 follows the BN construction rules. Since the causal analysis of the variation of the paddy price of Jasmine rice operates on continuous variables, both target and predictor variables are time-series data. We attempt to learn the cause-and-effect relationship between nodes from the MIC result; however, the symmetry of the MIC results makes both directions of a variable pair imply the same association.
For the analysis, we use PyMC3, an open-source Python package for Bayesian statistics and probabilistic machine learning with gradient-based MCMC algorithms. BLR supports Bayesian inference well, and its posterior distribution is usually summarized by a central tendency (mean) and an uncertainty estimate (variance). The response y of a linear regression model is generated from a normal distribution of the predictor variables mentioned above. The posterior distribution determines the conditional probability of Bayes' theorem from the linear regression parameters. The linear model formulas are created from the MIC results, assuming both x ∼ y and y ∼ x. After that, MCMC simulation of the model performs the parameter estimation; the results are returned as a normal distribution (mean and variance) for the formula's intercept and predictor coefficients. Besides, measuring the uncertainty of the effect estimate requires a CI (confidence interval) or CrI (credible interval). This experiment uses the 94 percent HPD interval, from the 3rd to the 97th percentile, to determine the most probable parameter values. The interval's width represents the degree of uncertainty of the estimated model, so the most precise linear model reveals the narrowest HPD interval.
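The paper's experiments run PyMC3's MCMC; as a lightweight, self-contained illustration of the same narrowest-interval criterion, the sketch below approximates the slope's posterior under vague priors in closed form and compares the 94 percent interval widths of the two candidate orientations. The Gaussian approximation and all names here are our assumptions, not the paper's exact setup:

```python
import numpy as np
from statistics import NormalDist

def slope_hpd_width(x, y, level=0.94):
    # Fit y = a + b*x by least squares and approximate the posterior of
    # the slope b under vague priors as Gaussian with scale s / sqrt(Sxx),
    # where s^2 is the residual variance.  For a Gaussian posterior the
    # central 94% interval coincides with the HPD interval.
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (len(x) - 2)
    sd = np.sqrt(s2 / ((x - x.mean()) ** 2).sum())
    z = NormalDist().inv_cdf(0.5 + level / 2)  # ~1.88 for a 94% interval
    return 2 * z * sd

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 2 * x + rng.normal(scale=0.5, size=200)
w_xy = slope_hpd_width(x, y)  # candidate orientation x -> y
w_yx = slope_hpd_width(y, x)  # candidate orientation y -> x
# Following the BMIC rule, the orientation with the narrower interval
# would be kept for this edge.
```

Note that the widths depend on the scales of the series, which is why the paper evaluates both directions for every pair rather than assuming one.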
All ten relationships are used to compute the posterior probability from the linear models. The MCMC algorithm is responsible for simulating the Bayesian model, which is parameterized from the identified linear formula; the observed data for the parameters of the linear function are assumed to follow a normal distribution. The direction of each pairwise relationship has been oriented by choosing the minimum HPD CrI according to the results in Table 4. The HPD CrIs of the two directions of a pair do not coincide, owing to their respective highest posterior distributions. Fig. 4 has been constructed to illustrate the direct and indirect effects on the paddy price of Jasmine rice following the HPD CrIs of Table 4. We found four connections, divided into two groups of factors that are not relevant to the target: the first group is the plant area and yield of major rice, which return a perfect relationship, and the second group is the gold price and the export prices of 5 percent white rice in Thailand, Vietnam, and Pakistan. However, we also found that the paddy price of Jasmine rice is clearly affected by the export price of Jasmine rice. The paddy price and export price of Jasmine rice both impact the domestic Jasmine rice price, and simultaneously the US exchange rate with the Thai currency and the domestic oil price indirectly impact the paddy price of Jasmine rice.

The traditional structure learning experiment

A BN, provided by either a traditional or an applied method, constructs the model. The observed dataset is a crucial attribute realized through the PGM framework's learning process, which returns the model from the data. The connected nodes are captured by the conditional probability of Bayes' theorem, determining the dependency values. However, there is an issue of time complexity, an NP-hard problem: BN candidates grow super-exponentially in the search space, requiring more time to find the optimized model.
Searching for the best model is done using heuristic algorithms that attempt to find a model with an optimal score. This study selects the Hill Climbing search, a well-known and most straightforward algorithm, for comparing networks with the baseline. The Hill Climbing search has been noted as a fast searching algorithm: for a maximum of n variables there are n(n − 1)/2 possible edges and 2^(n(n−1)/2) subsets of edges in the sample space.
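For the paper's n = 14 variables, these counts can be evaluated directly:

```python
from math import comb

n = 14
possible_edges = comb(n, 2)          # n(n-1)/2 = 91 candidate edges
edge_subsets = 2 ** possible_edges   # 2^(n(n-1)/2) subsets of edges
print(possible_edges, edge_subsets)
```

With 91 candidate edges there are over 10^27 edge subsets, which is why a greedy heuristic such as Hill Climbing is used instead of exhaustive enumeration.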

The traditional experiment is tested on both constraint-based and score-based learning, with all variables taken into consideration in the structure learning. Firstly, the constraint-based algorithm produces the BN in Fig. 5. Conditional independence is assessed by correlation analysis using the Chi-square dependency test to estimate the model skeleton. Orientations are then assigned to each connection between nodes according to the information in the separating sets: sets containing the conditional independencies of each pair of indirectly connected nodes. The skeleton model and the separating-set information yield a partial DAG (PDAG), in which a relationship may still carry both directions, x → y and y → x. Since extending a PDAG must not introduce new v-structures or cycles, the BN is then constructed as a DAG from the PDAG. The BN of the selected case shows that the export price and the domestic price of Jasmine rice control the paddy price of Jasmine rice; in depth, the domestic price of Jasmine rice is impacted by the export price of Jasmine rice before affecting the paddy price. It can be noticed that no other effect connects to the export price of Jasmine rice, so we can state that the export price of Jasmine rice is an initiator of the variation in the paddy price of Jasmine rice. Secondly, the score-based algorithm produces the structure in Fig. 6. A Bayesian estimator estimates the best model of each learning process to return the optimal score. The BN considers the Bayesian Information Criterion (BIC), a log-likelihood score with Dirichlet priors that manifests how well the given observed data describe a model; the method adds a penalty for network complexity to avoid overfitting. The score-based structure describes that the paddy price of Jasmine rice can be changed by the export price of 5 percent white rice in Pakistan.
The dependency relationship forms a chain from the export price of 5 percent white rice in Pakistan through the domestic oil price, the US exchange rate with the Thai currency, and the domestic price, ending at the export price of Jasmine rice. Therefore, the two traditional PGM learning algorithms return different models for the change of the paddy price of Jasmine rice.
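The constraint-based skeleton step described above relies on the Chi-square independence test. A minimal sketch for continuous series, discretized into quartile bins (the binning choice and the 0.05 threshold are our assumptions, and scipy is assumed available):

```python
import numpy as np
from scipy.stats import chi2_contingency

def chi2_dependent(x, y, alpha=0.05):
    # Discretize both continuous series into quartile bins, build the
    # 4x4 contingency table, and reject independence when p < alpha
    # (i.e. keep the edge in the skeleton).
    qx = np.digitize(x, np.quantile(x, [0.25, 0.5, 0.75]))
    qy = np.digitize(y, np.quantile(y, [0.25, 0.5, 0.75]))
    table = np.zeros((4, 4))
    for i, j in zip(qx, qy):
        table[i, j] += 1
    _, p, _, _ = chi2_contingency(table)
    return p < alpha

rng = np.random.default_rng(2)
x = rng.normal(size=400)
y = 2 * x + rng.normal(scale=0.5, size=400)  # strongly dependent pair
```

Quartile binning keeps every row and column of the table populated, so the Chi-square expected frequencies stay well away from zero.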
From the results of the proposed BMIC and the traditional PGM, the three models in Fig. 4 to Fig. 6 indicate different impact factors that can cause the paddy price of Jasmine rice to change. It is noticeable that pairwise nodes have similar connections but distinct directions. The proposed BMIC can reduce the number of relevant variables by selecting only perfect and high-strength relationships; the benefit of choosing only relevant variables for learning the BN structure is to mitigate the NP-hard problem. The result is entirely satisfactory, conforming with other research. The traditional methods were then tested to compare their performance and model accuracy with the proposed BMIC. In constraint-based learning, ten connections are contained in the best-selected model; the graph is quite similar to the (undirected) MIC result, with only one extra relationship, between the GDP of agricultural products and the minimum income of Thai citizens. After assigning the orientations, the differing directions of the cause-and-effect relationships imply opposite effects on the change of the Jasmine rice paddy price. In addition, score-based learning gives 13 relationships in the constructed BN; the dependency relationships are more complicated than in the proposed BMIC and constraint-based learning, and the result reveals that the change in the paddy price of Jasmine rice can occur through the forwarding of many effects.
As a result, each algorithm's explicit model reveals a different BN pattern. Therefore, the best BN model should be the most appropriate representation manifesting the cause-and-effect factors when the paddy price of Jasmine rice changes. We select the BIC score to measure how well the constructed BN is described given its data. The BIC score is a log-likelihood calculation with an additional penalty to prevent overfitting, returned as a negative value; the best model is the one with the highest score. The BIC scores in Table 5 show that the proposed BMIC returns the highest score compared with the other two learning algorithms of traditional PGM (constraint-based learning: -1333.3; score-based learning: -1324.67). Hence, the proposed BMIC obtains the best-fitted BN for learning the structure of the paddy price of Jasmine rice.
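The network BIC score decomposes into per-node terms. The paper scores the learned discrete networks, but for continuous price series the analogous linear-Gaussian node score can be sketched as below; this is our own illustrative implementation, not the exact scorer used in the experiments:

```python
import numpy as np

def node_bic(child, parents):
    # BIC contribution of one linear-Gaussian node: maximized
    # log-likelihood of child given its parents, minus a penalty of
    # (k/2) * log(N) for the k free parameters (coefficients, intercept
    # and residual variance).  Higher (less negative) is better.
    n = len(child)
    X = np.column_stack([np.ones(n)] + list(parents))
    beta, *_ = np.linalg.lstsq(X, child, rcond=None)
    resid = child - X @ beta
    s2 = max(float(resid @ resid) / n, 1e-12)
    loglik = -0.5 * n * (np.log(2 * np.pi * s2) + 1)
    k = X.shape[1] + 1
    return loglik - 0.5 * k * np.log(n)

# The network score is the sum over nodes; candidate structures are
# compared and the graph with the highest total BIC is preferred.
rng = np.random.default_rng(3)
x = rng.normal(size=200)
y = 2 * x + rng.normal(scale=0.5, size=200)
```

Adding a genuine parent raises a node's score despite the extra penalty, which is what lets the BIC comparison in Table 5 rank the three structures.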

Conclusion
This paper utilizes the BN framework's ability to interpret a model. In this study, the BN framework describes the causal model of the Thai rice price situation, focusing on changes in the paddy price of Jasmine rice. Since data uncertainty can be detected in the price of Thai rice and its relevant attributes, BN probability inference gives a better explanation. A BN fundamentally establishes a model that relies on Bayes' theorem through structural learning from the observed dataset. The NP-hard problem is usually encountered in the structure learning process because of the super-exponential number of graphs in the search space. We scoped the number of variables using the MIC algorithm to capture only high-strength relationships in the Thai rice price system. However, the MIC result cannot explain any causality in the target domain, because the relationships in MIC results are undirected. Therefore, the edge orientation is determined by the BLR model, emphasizing the returned posterior probability distribution. The BLR model assumes normally (Gaussian) distributed samples to formulate the linear regression; its model parameters, calculated under Bayes' theorem, are given the inputs and outputs of the linear model, and Bayes' theorem handles the probabilistic, non-deterministic model. Since a BN demonstrates the independence relationships of a given joint probability distribution, each connected node presents a conditional probability that results in a posterior probability. The posterior distribution is computed in both directions on the dependent variables, x → y and y → x, from the undirected MIC results, and the direction with the highest posterior probability distribution is kept. The significance of the posterior distribution is evaluated by the Bayesian credible interval, with the HPD interval selected as the criterion: the orientation prefers the direction of the dependent variables holding the minimum HPD interval width.
From the estimated edge-direction results, the proposed BMIC reveals the causes and effects of the paddy price of Jasmine rice in the form of a BN model. We found that the export price of Jasmine rice can change the paddy price of Jasmine rice. This detail is confirmed in the study of (Wasawong, 2018), which describes exporters' power to foreknow the direction of the global rice market. Moreover, the model also indicates that the paddy price of Jasmine rice can influence the domestic Jasmine rice price. This becomes supportive information for launching policies and coping with future situations, because we can monitor trends in the paddy price of Jasmine rice and its directed effects. The reliability of the proposed BMIC is evaluated by comparing network scores with the traditional PGM; the network score indicates how well-explained the model is and how well it fits. The score shows that the proposed BMIC obtains the highest score, while the two traditional learning algorithms' scores are lower. Hence, this study improves optimized model selection with a new BN structure learning that uses MIC results with the BLR model. Now that the cause-and-effect relationships of the paddy price of Jasmine rice are identified in a BN, this provides a predictive model for our future work.