Framework for inferring empirical causal graphs from binary data to support multidimensional poverty analysis

Poverty is one of the fundamental issues that mankind faces. To solve poverty issues, one needs to know how severe the issue is. The Multidimensional Poverty Index (MPI) is a well-known approach that is used to measure a degree of poverty issues in a given area. To compute MPI, it requires information of MPI indicators, which are binary variables collecting by surveys, that represent different aspects of poverty such as lacking of education, health, living conditions, etc. Inferring impacts of MPI indicators on MPI index can be solved by using traditional regression methods. However, it is not obvious that whether solving one MPI indicator might resolve or cause more issues in other MPI indicators and there is no framework dedicating to infer empirical causal relations among MPI indicators. In this work, we propose a framework to infer causal relations on binary variables in poverty surveys. Our approach performed better than baseline methods in simulated datasets that we know ground truth as well as correctly found a causal relation in the Twin births dataset. In Thailand poverty survey dataset, the framework found a causal relation between smoking and alcohol drinking issues. We provide R CRAN package‘BiCausality’ that can be used in any binary variables beyond the poverty analysis context.


Introduction
B-SCM TRANSITIVE CAUSAL GRAPH INFERENCE PROBLEM: Binary MPI indicators are linearly associated with MPI index. More positive values in indicators implies more poverty issues, which results in higher MPI index. However, changing one MPI indicator might cause other MPI indicators to change.

Given binary data of indicators, the goal is to infer -SCM causal relations between variables, which can explain that whether any binary indicator causes other binary indicators to change.
Poverty is one of the fundamental issues that mankind faces. More than 100 million people are back into the extreme poverty line by living under the 1.25 USD per day during COVID19 pandemic [1]. Ending poverty in all its forms everywhere has also been recognized as the greatest global challenge in the 2030 Agenda for Sustainable Development. However, poverty alleviation often requires comprehensive measures depending on the ground-truth realities and the extent of each region's capability to tackle poverty issues. The first crucial and challenging step is to understand factors associated with poverty, and then to identify the root cause(s) of issues that give rise to poverty. One of the well-known measures for poverty is "Multidimensional Poverty Index (MPI)" [2,3], which has been proposed for estimating the degree of poverty in specific areas and populations. The MPI measures poverty beyond the aspect of monetary issues by including other factors such as deficiency in health, inadequate education, and truncated standard of living. The principle of MPI allows poverty-related factors and their weights to be adjusted according to the ground-truth realities in each region. The MPI index is an aggregate of MPI binary indicators that represent different aspect of poverty. Higher MPI implies more severe poverty issues in a given area. The value in MPI binary indicator is one if there is an issue and is zero when there is no issue. Typically, if a specific poverty issue is alleviated, then the corresponding MPI indicator is changed to zero, which makes MPI index has a lower value.
Despite the usefulness and flexibility of MPI, the focus has been primarily on 1) the degree of poverty from multiple indicators and 2) the contributions of each indicator toward poverty without any information regarding causal relations among indicators; changing one MPI indicators might cause other indicator to changes. In the worst scenario, solving one MPI indicator might cause other indicators more issues; which results in having higher MPI index.
Since MPI works only on binary data of MPI indicators and there are few studies concerning causal inference among MPI indicators, in this work, we focus on developing the framework to infer binary causal relations among binary variables to answer B-SCM TRANSITIVE CAUSAL GRAPH INFERENCE PROBLEM; whether changing one indicator causes others to change.

Related works
The scope of poverty issues is beyond monetary [4,3,5]. Poverty can relate with other factors such as social capital, homogeneity of population, in multiple ways [6].
To solve poverty issues, one needs to know how strong the issues are. Hence, MPI [2,3] was developed to measure the degree of poverty issues. The MPI is one of the well-known tools that supports policy makers (e.g. poverty measure for policy assessment [7]) to combat poverty in many countries (e.g. South Africa [8], China [9], Iran [10], Latin America [11]).
The MPI index can be measures using binary MPI indicator, which represents different aspects of poverty issues. MPI indicators typically measure factors that might cause poverty. Poverty can be caused by many factors such as health issue [12,13], education issue [14], income inequality [15], etc. Understanding causal relationships is a key step for designing effective policies to combat poverty [13].
Solving one aspect of MPI indicator might make the MPI index decreases. However, it is not the case if solving one MPI indicator causes other MPI indicators to be active, which might even cause MPI index increases [16]. Typically, many MPI indicators are correlated [16], but it is not clear whether how indicators interact with each other or whether they are compliments or substitutes [17][18][19]. To decide which indicators should be alleviated, policy makers must understand causal relations among MPI indicators and poverty issues. Nevertheless, it is still no consensus regarding how to infer causal relations among poverty variables [16].
Currently, in the era of big data, the massive amount of data is used to alleviate poverty issue [20]. One of the fields that utilizes big data to get insight from data is Causal inference. Causal inference plays a key role for explanation, prediction, decision making, etc. [21,22]. It reveals causal relations between variables/factors, which leads to the understanding of influence among variables. In policy making, causal inference can be used to estimate outcomes of policy change [23] and to support policy designing [13].
In the recent works of causal inference on binary variables, the work in [24] uses frequent pattern mining to infer causal relations called "causal rule" from discrete variables using the concept of odd ratios. The framework is consistent with the potential outcome framework [25,26] in the causal inference [24]. However, the framework assumes that directions of causal relations are given. In the related field, Bayesian network [27][28][29][30], the work in [28,29] provides a software in a form of R package "bnlearn" in the Comprehensive R Archive Network (CRAN) [31] that can be used to learn network structures in general, which is suitable for inferring causal networks.
Recently, the work in [16] inferred Bayesian networks from census data to analyze causal relations between multidimensional poverty components and violence. They also used "bnlearn" to infer causal graphs.
To the best of our knowledge, there is still no work of causal-inference framework based on structural causal models on binary variables utilizing estimation statistics, which are able to provide magnitudes of difference between groups (e.g. cause and effect) [32], and is capable of inferring causal directions with degree of causal direction in form of confidence intervals. By knowing a confidence interval of degree of causal-direction, not only we know the causal relation, but we also know how strong the causal relation is.

Our contributions
To fill the gap, in this work, we formalized the definition of structural causal model on binary variables and proposed a framework to infer causal relations from binary data using estimation statistics technique. Our framework is capable of: • Inferring the causal graph: inferring causal relations among binary variables in a form of a causal graph using frequent pattern mining on non-parametric hypothesis testing; and • Inferring magnitude of difference in term of confidence intervals: inferring dependency, association, and degree of causaldirection in forms of confidence intervals using estimation statistics. We validated our framework on simulation data by comparing the proposed method with baseline approaches. We demonstrated the application of our framework on inferring causal relations of mortality, birth weights, and other risk factors in the U.S.twins dataset and causal directions of poverty indicators from the datasets of hundred thousands of Thailand households to support data analysis in poverty from two provinces. Although, the results we provided in this work are from two provinces, the framework is able to perform the analysis in every province in Thailand. Since the data structure of variables of MPI indicators are similar across the nation although the issues and related information for each region might be different, the framework has no issue to analyze data from any region. The proposed framework can be utilized on binary data beyond the field of poverty causal inference.

Objective and hypotheses
In this work, the main objective is to develop a framework to infer causal relations among binary variables. The framework is designed to be applied in the poverty analysis in order to find casual relations of MPI indicators. We have two research questions with two pairs of null/alternative hypotheses we need to address by using our framework as follows.
• Does each pair of binary variables have dependency? 0 : there is no dependency. 1 : there is dependency.
• If a pair of binary variables has dependency, then, does this pair of binary variables also has a causal relation? 0 : there is no causal relation. 1 : there is a causal relation.

Surveys of poverty of Thailand
The survey of poverty used in this paper was from the work in [5]. The survey was taken in 2018 by Ministry of Interior of Thailand. The main purpose of the survey is to collect information on poverty issue that represent by MPI indicators along with other information that can be used later by policy makers. The data is currently utilized under the Thai People Map and Analytics Platform (www.TPMAP.in.th) project under the collaboration of National Electronics and Computer Technology Center and Office of the National Economic and Social Development Council to address three questions: 1) where poor people are, 2) what issues the poor people face, and 3) how policy makers can help them.
The number of household for Chiang Mai province in the survey was 378,466 households, while it was 353,910 households for Khon Kaen province. The survey was conducted for the purpose of analyzing of multidimensional poverty index (MPI) [2,3].
In the aspect of MPI, the surveys collected 31 MPI indicators that represent five main aspects of poverty to compute MPI index 0 (see some indicators in Table 1). For each individual, if he/she has an issue with a given indicator (e.g. he/she has less income than a specific threshold for an income indicator), then an indicator has value 1, otherwise it is 0. The surveys were processed by transforming each answer of a specific issue in the surveys using a set of criteria to be a binary MPI indicator. For example, in one of the indicators of "Health", the question is "Did the newborns in the house weigh above 2.5 Kg?". If the answer is "No", then the corresponding MPI indicator is one. Otherwise, it is zero. For more details regarding criteria for other indicators, please visit www.TPMAP.in.th.
The degree of individual deprivation is computed by counting a number of indicators that a person has poverty issues divide by a number of total indicators. Then, if a person has greater than a specific threshold (varying from country to country), then is considered to be a deprived person. Given 0 ∈ [0, 1] is a ratio of deprived people within total populations, 0 is average degree of individual deprivation within a deprived population. The MPI index can be computed as follows: The index 0 ∈ [0, 1] in Eq. (1) represents the degree of poverty deprivation in a given population. MPI is close to 0 when there is no poverty in any indicators, while it is close to 1 if everyone has issues in almost all indicators. Hence, lower MPI is better.
After knowing the MPI index of each area, policy makers can realize how severe deprivation issues each area is for the entire nation by analyzing MPI indices. The policy makers also know which aspect of deprivation each area has from MPI indicators. With MPI index and MPI indicators from the entire nation, policy makers have answers for 1) where poor people are, and 2) what issues the poor people face. Then, they can plan to solve the last question: 3) how policy makers can help them.
By having policy to alleviate a specific MPI indicator, the poverty can be alleviated in a specific aspect, which results in reducing MPI index 0 . However, the impacts of solving one MPI indicator among other MPI indicators still remain; whether solving one indicator causes another indicator to be solved/ to have more issue. In this work, we focus on the remaining question of causal relations among MPI indicators.

Twin births of the United States
Infant mortality can be a predictor of poverty [33]. By understanding causal factors of infant mortality, policy makers might be able to understand more regarding poverty situation in areas. This dataset consists of several variables regarding pairs of twins, birth weights, the mortality outcome, etc., from the Twin births of the United States in 1989-1991. There are 71,345 pairs of twin in the dataset. The dataset was used in [34], which was included in the literature survey work in [35].
In this work, since we are interested only in inferring of causal relations in binary variables, we reformat the dataset and use only binary variables: birth weights of twins, and the mortality outcome along with other risk variables. For the birth weight, the value is one if at least one of the twin has the weight below or equal 1000 grams. Otherwise, it is zero. For the mortality outcome, one represents the twin being death and zero represents being alive. There are also other parent's risk-factor variables we included in the analysis: alcohol use, Anemia, Cardiac, chronic hypertension, Diabetes, Eclampsia, Hemoglobinopathy, Herpes, Incompetent cervix, Lung, Preqnancy-associated hypertension, tobacco use, and Uterine bleeding. All risk-factor variables are one if there is any risk, otherwise, they are zero.
Our goal is to use the dataset to evaluate whether the framework is able to reveal the causal relation of birth weight and twin mortality.

Methods
In this section, the details of proposed framework for inferring causal relations among binary variables are provided. The reasons we choose to study and develop the framework for binary variable rather than other types of variables because MPI index requires only binary indicators for computing the index; MPI cannot take multinomial or real-number variables as MPI indicators. However, it is not clear how each MPI indicator impacts each other. Hence, the main focus on this work is to develop the framework of causal inference that works on binary variables.
Given a dataset  = { ⃗ 1 , … , ⃗ } where ⃗ = ( ,1 , … , , ) is an th vector of realizations of random variables 1 , … , , the main purpose of this work is to provide a solution for B-SCM TRANSITIVE CAUSAL GRAPH INFERENCE PROBLEM 4 by inferring a transitive causal graph ̂= ( , ̂) from . In the context of poverty analysis,  can be represented as an × matrix where rows represent households and columns represent poverty factors or MPI indicators. The output of the framework is the adjacency matrix of a causal graph among poverty factors. In the context of MPI, the matrix contains information of causal relations between MPI indicators; which indicators cause other indicators to changes when they change.
is an th vector of realizations of random variables 1 , … , in ℭ output : 4 Suppose a binary vector ⃗ has a decimal number 3 Counting a number of binary vectors in  * s.t. th-′ th bits are equal to , … , ′ . 4 Divide the counting number above by the size of  * and keeps this ratio as ̂( = | = ). 5 Return ̂( = | = ) Fig. 1 illustrates an overview of the proposed framework. In the first step, the framework performs "Bootstrapping" to generate . Then, it aligns data in  using Algorithm 1. The purpose of these two steps is to infer patterns of strong association relations among binary variables and to prepare data for the next step.
Afterwards, the framework infers a transitive causal graph ̂u sing Algorithm 3, which deploys several statistics that derived from . The core of statistical estimation in the framework is the estimation of conditional probability using Algorithm 2. This step infers causal relations from asymmetry of association direction between binary variables; changing one variable can change another but not vice versa.
To increase readability of notations, in a directed graph, we use ← ← → to represent that there is a directed edge from to , and ← ← ← for a directed edge from to . We also use ← ← → to represents causes in a causal graph. We also write ⟂ ⟂ if , are statistically independent as well as using ⟂ ∕ ⟂ to represent that , are statistically dependent.

Inferring empirical conditional dependency and probability
In this part, we build a function to estimate conditional dependency and probability among binary variables. To infer whether two variables , are statistically independent given or ⟂ ⟂ | , we can check the following statement: In Eq. (2), if | ( , | ) − ( | ) ( | )| = 0, then we can conclude that ⟂ ⟂ . Otherwise, ⟂ ∕ ⟂ . However, in real datasets, if the distributions that generate the data are unknown, we cannot access to compute the probability ( ) directly. In the data mining community, the concept of support and confidence [36][37][38] might be used to estimate the probability of any given event. Before computing conditional probability using support and confidence, we need to align dataset  using the Algorithm 1. After aligning vectors, we can compute estimate probability and conditional probability using the Algorithm 2.
Let ̂( = ) be an estimate probability of = estimated by support and ̂( = | = ) be an estimated conditional probability of = given = estimated by confidence. We can have the following equation to compute the degree of dependency between , given .
Where , , are any possible binary values. For a degree of conditional dependency, we can estimate it using the equation below.̂( Where () is an absolute function. In both Eq. (3) and Eq. (4), , are independent if the value is close to zero.

Inferring empirical association
In this part, we assess whether changing one binary variable turning other variables to change in which of three directions: positive, no change, or negative. If it is a positive direction, then two binary variables trend to have the similar values. If it is negative, it implies two variables trend to have an opposite binary value. No change implies there is no pattern whether having a specific value for one variable implies having a specific value in another variable. To find a direction of association between variables, the first method is the Odd Ratio.
The second method is called the Odd Difference, which is an alternative of the odd ratio in Eq. (5), can be defined below.
has a negative association. There is no association if oddDif f ( , ) = 0

Inferring empirical causal direction
After we check that there is no variable s.t. ⟂ ⟂ | using Eq. (3). In Algorithm 3, the next step to check whether ← ← → is to check their estimated conditional probability.We approximate the probability below.

Hypothesis tests and estimation statistics
In this part, we focus on inferring dependency, association direction, and causal relations among binary variables using both hypothesis testing and estimation statistics.
In the aspect of hypothesis testing, suppose the null hypothesis 0 ∶ = 0 while the alternative hypothesis 1 ∶ > 0, we can test either 0 or 1 is supported by ′ using the sets of data from bootstrapping ′ : ′ 1 , … , ′ . However, there are several disadvantages of using the hypothesis testing alone as follows: 1) the hypothesis testing provides only either 0 or 1 is supported by data, but there is no information regarding the magnitude of summary statistics we estimate [41], 2) the hypothesis testing always rejects 0 in some system even the effect might be too small [42], 3) the hypothesis testing faces the problem of repeatability [43].
To address these issues, "estimation statistics" has been developed, which is considered as a methodology that is more informative than the hypothesis testing [44][45][46]32].
In the aspect of estimation statistics, the sets of data from bootstrapping ′ : ′ 1 , … , ′ can be used to estimate 100 * (1 − )% confidence interval (CI) of . Moreover, if we have two datasets ′ and ′ , we can compare the magnitude of difference between ′ and ′ using mean-difference CI. Given is an th vector of realizations of random variables 1 , … , , and a number of bootstrap replicates , we generate  = { ′ 1 , … ,  ′ } from bootstrapping. Then, we use  to estimate the following quantities.

Dependency between ,
given : For each pair of variables , , we can infer Let ℑ be the expectation of ℑ. The null hypothesis 0 ∶ ℑ = 0, while the alternative hypothesis 1 ∶ ℑ > 0. We use Mann-Whitney test [47], which is a nonparametric test, to determine whether we can reject 0 with the significance level = 0.05. If 0 is rejected, then we can conclude that ⟂ ∕ ⟂ | . In the aspect of estimation statistics, we report the 95%-CI of ℑ .
Let be the expectation of . We use Mann-Whitney test [47] to determine whether we can reject 0 ∶ = 0. If 0 is rejected, then we can conclude that the alternative hypothesis 1 ∶ ≠ 0 is supported. After rejecting 0 , , has a positive association if > 0, otherwise, for < 0, , has a negative association. We also report the 95%-CI of . 3. Causal direction causalDir( , ): We compute = {causalDir 1 ( , ), … , causalDir ( , )} on Eq. (7) from . Let be the expectation of . The null hypothesis 0 ∶ = 0, while the alternative hypothesis 1 ∶ ≠ 0. If we cannot reject 0 , then there is no conclusion regarding the causal direction of , . In contrast, suppose 0 is successfully rejected, We also report the 95%-CI of .

The proposed algorithm for inferring binary causal relation from binary indicators
After having all functions we need to estimate causal relations, we propose Algorithm 3 to solve B-SCM TRANSITIVE CAUSAL GRAPH INFERENCE PROBLEM. Specifically, given binary data of indicators, the goal of the problem is to infer -SCM causal relations between variables, which can explain that whether any binary indicator causes other binary indicators to change. See Theorem A.4 for details of the proof that the algorithm provides the solution for the problem.
Briefly, Algorithm 3 takes binary data to assess association relations and directions among binary variables using the methods in Section 4.1 and Section 4.2 respectively. Then, the algorithm assesses statistical significance of these association relations and directions using methods in Section 4.4. After having significant association relations and directions, the causal relations are estimate using the function causalDir( , ) in Section 4.3. Afterwards, the inferred causal relations are tested for the statistically significance by the method in Section 4.4. Finally, the algorithm reports all outputs that are related to causal relations and their by-product results.

Time complexity
Given is a number of data points, is a number of dimensions, and is a number of bootstrap replicates, for the Vector alignment in Algorithm 1 and the Conditional probability estimation in Algorithm 2, both require ( ).
To check whether ⟂ ∕ ⟂ and any independence check, it requires ( ) = ( ) for the bootstrapping approach of which its replicates are needed to estimate the conditional probability in Eq. (3). The is typically considered as a constant number. In the Algorithm 3, it requires ( 2 ) for line 1-6. For the line 7-16, it also requires ( 2 ) since the number of edges is bounded by ( 2 ) and the operation of Independence checking is ( ). For the line 17-26, it also requires ( 2 ), which has the same reason for the number of edges and the operation to compute the conditional probability requires ( ). Hence, the Algorithm 3 has the time complexity as ( 2 ).

Simulation data
In the first simulation, there are 10 poverty indicators. Let 1 , … , 10 be random variables of poverty indicators, be a probability of a random variable being 1, and is a random variable that has ( = 1) = . The following equations (Eq. (8), (9), (10), and (11)) represent the directed causal relations of these random variables.
For each individual, if the value is one in the indicator , it means this individual has a poverty issue in the indicator . In the first simulation, data is generated using ∈ {0.5, 0.3, 0.1, 0.05}, which has 500 individuals for each value. In the second simulation, data is generated varying number of individuals ∈ {50, 100, 150, 300, 500, 750, 1000}, which has = 0.3.

Baseline methods and performance measure
To the best of our knowledge, there is no direct method that deals with causal inference from binary variables using frequent pattern techniques except the work in [24]. It uses the Frequent Pattern Mining to infer causal relations called "Causal rule" from discrete variables using the concept of odd ratios. The framework is consistent with the potential outcome framework [25,26] in the causal inference [24]. However, the causal-rule framework in [24] assumes that the causal directions are given. Therefore, we modified the causal rule framework to be able to infer causal direction using the same approach as our framework.
For the Bayesian network, we deploy the PC algorithm [30], which is a first practical constraint-based structure learning algorithm from the "bnlearn" package in R [28,29]. The PC algorithm is designed for inferring causal structure from data, which is suited in our task of causal inference in this paper.
Another baseline approach is the Frequent-pattern approach that can be applied in data from binary variables. This approach utilizes the support and confidence in association rule mining directly to find causal relations. For example, if the confidence of Y given X is higher than X given Y, then X causes Y.
We compare all methods with the tasks of 1) inferring Transitive causal graph and 2) inferring Directed causal graph. In the task of inferring the Transitive causal graph, if X causes Y and Y causes Z, then inferring that X causes Z is acceptable. However, in the 2) task, all methods must be able to infer that X causes Y directly but X does not cause Z directly.
We measure the performance of all methods using simulation datasets by comparing the inferred causal graphs from both tasks with the ground truth graph using precision (Pre), recall (Re), and F1 score. The true positive (TP) is the case when a causal relation or causal edge (e.g. X causes Y) exists in both inferred and ground-truth graphs. The false positive (FP) is the case when the causal edge exists in the inferred graph but never exists in the ground-truth graph. The false negative (FN) is the case when the causal edge exists in the ground-truth graph but never exists in the inferred graph. The precision is a ratio of TP/(TP+FP), the recall is a ratio of TP/(TP+FN), and F1 score is a ratio of 2(Pre*Re)/(Pre+Re).

Results
In this section, the results of our proposed approach were reported using several datasets in order to illustrate that 1) our framework performed well against baseline approaches (Section 6.1), 2) our framework was able to retrieve causal relations in a real-world dataset (Section 6.2), as well as 3) our framework was able to infer none-trivial causal relations of MPI indicators in Thailand poverty surveys (Section 6.3).
In Section 6.1, the results of performance of our framework compared to several baseline approaches using simulation datasets that the ground truth was known were reported. Then, in Section 6.2, the Twin-births-of-the-United-States dataset was used to illustrate that our framework was able to retrieve causal relations, which are consistent with the ground truth in the literature. Finally, in Section 6.3, the Thailand datasets of poverty surveys are used to demonstrate the application of our framework that can support policy makers to alleviate poverty issues by inferring causal relations among MPI indicators.

Simulation results
In this part, the results of inferring causal relations in simulation datasets are reported. Briefly, the results were from four methods: 1) Causal rule method, 2) Frequent pattern, 3) PC algorithm, and 4) Proposed method. There were two tasks for measuring performance of causal inference: A) inferring a transitive causal graph and B) inferring a directed causal graph.
For inferring transitive causal graphs, the task is to infer whether any X and Y variables have any directed and/or indirected causal relations. In contrast, the task of inferring directed causal relations considers to find whether any X and Y have only directed causal relations.

Table 2
The result of inferring transitive causal graphs by frequent pattern, Causal Rule, Bayesian Network, and proposed methods in simulation varying with = 500. The red color is the better results in term of F1 between two methods with the same simulation dataset.  Table 3 The result of inferring directed causal graphs by frequent pattern, Causal Rule, Bayesian Network, and proposed methods in simulation varying with = 500. The red color is the better results in term of F1 between two methods with the same simulation dataset. According to the results, the Frequent pattern performed slightly better than others in the task of inferring transitive causal graphs while our proposed method performed better than others in the task of inferring directed causal graphs. Below are elaborate details of the results.
Results of performance of four approaches in simulation with different levels of (the probability of variable being 1) are in the Table 2 and 3. For the task of interring transitive causal graphs (Table 2), based on the F1 scores, the Frequent pattern approach performed the best, while the second and third performers were our approach and Causal Rule respectively. The last performer was the PC method. In the high value of , all approaches performed the best; the F1 score is equal to 1. However, when the decreases, only Frequent pattern approach performed well.
In the task of inferring directed causal graphs (Table 3), however, the Frequent pattern approach performed the worst, while our approach performed the best. When the decreases, only our approach performed well.
Results of performance of four approaches in simulation with different number of individuals are in the Fig. 3 and 4. For the task of interring transitive causal graph (Fig. 3), based on the F1 scores, the Frequent pattern approach performed the best, while  the second performer was our approach. The third performer was the Causal rule method. The last one was the PC algorithm. In the high value of , all approaches performed the best; the F1 score is equal to 1. However, when the decreases, only Frequent pattern approach performed well. In the task of inferring directed causal graph, however, the PC algorithm and Frequent pattern approach performed poorly, while our approach performed the best. When the decreases, only our approach performed well. The result in Fig. 4 is consistent with the result in Table 3. Fig. 2 illustrates the results of inferring directed causal graphs from four methods. The proposed method (Fig. 2 D.) inferred the correct directed causal graph. The Frequent pattern method (Fig. 2 B.) inferred a causal graph that cannot distinguish between directed and indirected causal relations. For example, in Eq. (11), 6 is directly caused by 1 , 4 and indirectly caused by 2 , 3 (Eq. (9)) and 2 , 5 (Eq. (10)). However, in Fig. 2 B., all types of causal relation appear in the graph inferred by the frequent pattern method. In Fig. 2 A. and C., the inferred directed causal graphs of Causal rule method and PC algorithm are shown. Both methods were able to distinguish between directed and indirected causal relations. Nevertheless, the Causal Rule missed two causal relations:

Case studies: twin births of the United States
Given is a variable of status of twin birth weights (one if the weight of either child below 1000 grams and zero otherwise), and is a variable of twin mortality status (one if both children are dead and zero otherwise) along with other parent's risk-factor variables, the result of causal inference of the proposed framework is below. Briefly, 1) the dependency of , was report that it existed, 2) the association direction of , was reported that , was positively correlated, and 3) the causal relation ← ← → was found in the dataset.
In the aspect of dependency, only the causal relation of birth weight and the mortality of twins exists. There is a dependency between and . The 95th percentile confidence interval of the degree of dependency ̂( ⟂ ∕ ⟂ ) in Eq. (3) is [0.018, 0.020]. The Mann-Whitney test reject the 0 that ̂( ⟂ ∕ ⟂ ) = 0 with the significance threshold at 0.05, which implies there exists a dependency between and . In the aspect of correlation direction, the Mann-Whitney test reject the 0 that oddDif f ( , ) = 0 with the significance threshold at 0.05. The 95th percentile confidence interval of oddDif f ( , ) in Eq. (6) is [0.019, 0.021], which implies a positive association.
It implies that almost all mortality in twins had issues of low birth weights, but not all low-birth-weight twins were died. No other risk variables have strong causal relations. This result is consistent with the work in [49] that the low-birth-weight issue has smaller effect on twin mortality than previous belief; it is not a sole cause of birth mortality. While the low-birth-weight issue plays a key role in twin mortality, other confounding factors (e.g. genetic) might contribute significant effect on twin mortality [49].

Case studies: Thailand poverty surveys
In this section, the Thailand poverty surveys were used to find causal relations among 31 MPI indicators from two provinces: Khon Kaen province and Chiang Mai province.
In the aspect of dependency, briefly, among 31 MPI indicators, only dependency of smoking cigarette and drinking alcohol was found.
In Khon Kaen province, there is a sole dependency between smoking cigarette 25  In the aspect of correlation direction, the Mann-Whitney test reject the 0 that oddDif f ( 24 , 25 ) = 0 with the significance threshold at 0.05. The 95th percentile confidence interval of oddDif f ( 24 , 25 ) in Eq. (6) is [0.060, 0.062], which implies a positive association.
In the aspect of causal direction of Chiang Mai province, the Mann-Whitney test reject the 0 that causalDir( 25 , 24 ) = 0 with the significance threshold at 0.05. The 95th percentile confidence interval of causalDir( 25 , 24 ) in Eq. (7)  This implies smoker trends to drink alcohol but not vice versa in Chiang Mai province. The MPI of Khon Kaen is 0.018 while the MPI of Chiang Mai is 0.024. This implies Chiang Mai has a higher degree of poverty than Khon Kaen's. According to the result of Chiang Mai province, since smoking causes drink alcohol, by alleviating the smoking issue, the alcoholic consumption issue might be alleviated, which makes MPI index decreases. For Khon Kaen province, since there is a dependency between smoking and alcohol drinking but no causal relation, there might be confounding factors of both variables that were unable to be measured and existed outside the dataset. Hence, in Khon Kaen, by alleviating either issue of smoking or alcohol drinking, it might not alleviate another issue.
In literature, it is not surprised that smoking associates with drinking alcohol [50,51]. However, due to the nature of results from exploratory data analysis, the smoking and drinking alcohol causal relation in this study can be considered as a guideline of possible causal relation and it is needed to be validated in an experimental study.

Discussion and limitation
In the simulation, our proposed method performed well compared against several baseline approaches.
Briefly, the results indicated that the Frequent pattern is a proper method for the task of interring transitive causal graphs, which is simpler than the task of inferring directed causal graphs, while our proposed approach is more appropriate for the task of inferring directed causal graphs. If the task is about inferring directed causal relations, our approach should be used in binary data.
Frequent pattern method infers causal relations by only using patterns of pairs of variables either being active together or being the opposite in data without any mechanism to check confounding factors or checking the robustness of inferred relations. Even though the method is simple, it performed well in the task of inferring transitive causal graphs. On the other hand, other methods that are more sophisticated performed slightly poorly compared against the Frequent pattern method. This implies that there is no need for complicated mechanism to detect transitive causal relations.
Since the task of inferring directed causal graphs is more challenging than the task of inferring transitive causal graphs, it is no wonder that the simple method like the Frequent pattern was unable to perform well in this task. To detect a direct causal relation, it requires that we have to know whether there are any confounding factors between two variables that are correlated. If it is a case, then, two variables might not have causal relation; they are just associated via their confounding factors. This is why our method equips the confounding-checker mechanism (Algorithm 3 line 7-16). Additionally, estimation statistics supports the robustness of inferring any kinds of relations. PC algorithm, Causal Rule, and our proposed method have confounding-checker mechanism. However, only our method utilizes estimation statistics to enhance the robustness of our statistical inference. By utilizing both confounding-checker mechanism and estimation statistics, our method performed the best in this task.
In the twin of the USA dataset, the results indicated that almost all mortality in twins had issues of low birth weights, but not all low-birth-weight twins were died, which is consistent with the work in [49] that the low-birth-weight issue has smaller effect on twin mortality than previous belief. It is not a sole cause of birth mortality. While the low-birth-weight issue plays a key role in twin mortality, other confounding factors (e.g. genetic) might contribute significant effect on twin mortality [49].
In the Thailand poverty surveys, the results indicated that, among 31 MPI indicators, there was only a dependency of smoking cigarette indicator and drinking alcohol indicator in both Khon Kaen and Chiang Mai provinces, which is consistent with the literature that smoking associates with drinking alcohol [50,51]. Only a causal relation of the smoking causes a drinking alcohol issue was found in Chiang Mai province. The existence of dependency of smoking and alcohol consumption without its causal relation in Khon Kaen might imply that there were confounding factors of both MPI indicators existed outside the dataset.
For Chiang province, the policy makers might attempt to de-couple both issues by expanding smoke-free areas around places that sell alcohols (e.g. bars, pubs, restaurants) [51]. By not allowing smoking in public areas, among moderate-and-heavy-drinking smokers, the smoke-free policy was associated with the reducing of drinking behavior in pubs [52]. By solving the smoking issue, the drinking alcohol issue might also be alleviated in Chiang Mai, which results in reducing of MPI index.
In term of limitation, causal relations inferred by this work are not the real causal relations. They are empirical causal relations that needed to be validated and incorporated to support policy making process. We also made many assumptions to make it possible to infer causal relations, which might not be true in some situations. See Section A.1 for more details of related assumptions in causal inference that we made. Hence, our main goal of this research is to develop an exploratory data analysis tool to pinpoint possible causal relations to support researchers before the validation in the field studies to find real causal relations.

Conclusion
MPI is a well-known poverty measure that covers multidimensional aspects of poverty beyond monetary. MPI index requires binary MPI indicators that represent different aspects of poverty in order to compute its value. While focusing on each MPI indicator might reduce MPI index, however, solving a specific MPI indicator might lead to changing other MPI indicators or even causing MPI index increases. Moreover, there is no consensus regarding how to infer causal relations among binary indicators.
In this work, we proposed an exploratory-data-analysis framework for finding possible causal relations among factors that contribute to poverty from similar data sources that are used in MPI analysis. By combining causal graph and MPI, not only we know how severe the issue of poverty is, but we also know the causal relations among poverty factors, which can help us to target the right issues to solve poverty effectively.
We evaluated the proposed framework with several baseline approaches in simulation datasets varying degree of noise and number of data points. Our framework performed better than baselines (Frequent pattern and Causal rule methods) in most cases.
The first case study of Twin births of the United State revealed that almost all mortality cases in twins had issues of low birth weights but not all low-birth-weight twins were died. The second case study revealed that smoking was associated with drinking alcohol in both provinces. While there was no causal relation in Khon Kaen province, there was a causal relation of smoking causes drinking alcohol in Chiang Mai province.
Note that the causal relations inferred by this work are not the real causal relations; they are empirical causal relations that needed to be validated. Our main goal is to develop an exploratory data analysis tool to pinpoint possible causal relations to support researchers before the validation in the field studies to find real causal relations.
The framework can be applied beyond the poverty context. Lastly, the framework in this work has already been implemented in R programming language [31] in a form of R package "BiCausality" [53]. The official link for BiCausality at the Comprehensive R Archive Network (CRAN) can be found at https://cran .r -project .org /package =BiCausality.

CRediT authorship contribution statement
Chainarong Amornbunchornvej: Conceived and designed the experiments; Performed the experiments; Analyzed and interpreted the data; Wrote the paper.
Navaporn Surasvadi: Conceived and designed the experiments; Analyzed and interpreted the data; Wrote the paper. Anon Plangprasopchok: Analyzed and interpreted the data; Contributed reagents, materials, analysis tools or data; Wrote the paper.
Suttipong Thajchayapong: Analyzed and interpreted the data; Wrote the paper.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability
Data will be made available on request. In other words, from the Definition 2, every variable and all other variables are -separated by 's directed causes or 's parent nodes in a causal graph.
This assumption guarantees that only directed causes are all we need to know to understand behaviors of an effect variable.

A.1.2. Faithfulness assumption
In the Causal Markov assumption, all pairs of variables are independent given their -separation set that blocks all paths between them. However, there might be some independence relations that are not the results of -separation. To guarantee that we can find all causal relations in a causal graph from data, we need Faithfulness assumption. has an independence relation, only independence relations inferred by -separation in exists in the probability distribution over . This implies that an independence relation of , can be found by -separation.
This assumption guarantees that all independence relations can be found from data by -separation. This eliminates independence relations that might occur by chance that are not complied with the structure of a causal graph.

A.1.3. Causal sufficiency assumption
For the Causal Markov assumption, we guarantee that -separation can be used to find independence relation. It links the connection between a causal graph and a probability distribution. For Faithfulness assumption, only independence relations found by -separation exists, which implies that there are no any other independence relations that are not complied with a causal graph . The Causal sufficiency links the connection between a causal graph and what we can measure and exists in data.

Definition 4 (Causal sufficiency)
. Given an SCM model ℭ = ( , ) with a directed graph = ( , ) where = { 1 , … , } is a set of variable nodes and is a set of causal directions. For any variable ∈ , all variables of 's directed causes are measured and exist in data.
Even though this assumption is strong for many cases, without real experience, the Causal Sufficiency is a crucial one that makes it possible to infer causal relations from data. Without this assumption, there are many possible causal graphs that exist and conflict with a given causal graph that can yield the similar result of statistical inference. The Definition 6 represents a case when one of many root causes can impact the effect significantly. For example, in poverty, only one of many factors, such as lack of education, disability, lack of accessing health care can harm poor people significantly. This inspires us to use Definition 6 in this work.

A.2. Problem formalization
Suppose we have a dataset  = { ⃗ 1 , … , ⃗ } that was generated from b-SCM ℭ where ⃗ = ( ,1 , … , , ) is an th vector of realizations of random variables 1 , … , in ℭ. However, the equations in of ℭ are unknown to us. In this work, we are interested in finding both directed and indirected causes of any variable . Hence, we define a transitive causal graph to represent this idea. Assuming that there is no confounding factors outside variables in ℭ, we can formalize the following problem for inferring the transitive causal graph ̂o f ℭ.
In the backward direction, given any ̂= ( , ̂) of ℭ, we show that ̂i s inferred by Algorithm 3. Suppose there is a transitive causal graph ̂′ = ( , ̂′ ) of ℭ s.t. ̂≠̂′ and Algorithm 3 cannot infer ̂′ . Assuming that the variables of two graphs are the same. There is only one possibility: ̂≠̂′ . Suppose ( , ) ∈̂but ( , ) ∉̂′. This means causes in ̂b ut is not a cause of in ̂′ . However, this is impossible since both ̂, ̂′ are transitive causal graphs of the same ℭ; if ( , ) ∈̂, then ( , ) must be in ̂′ . Hence, it establishes the contradiction that ̂=̂′ and the Algorithm 3 provides the unique solution. □