DEFINING AND ANALYSIS OF MULTIMORBIDITY PATTERN OF DISEASES USING MARKOV RANDOM FIELD APPROACH: A COMPARATIVE ANALYSIS

. Aim: Multi-morbidity remains poorly understood due to the multifactorial complexity of this phenomenon and the lack of a standardized methodology for building and analysing Multimorbidity network. A comparative analysis of methods of modeling Multimorbidity network in literature may help to understand the pros and cons of these methods, then to facilitate a consensus about a standardized methodology. We propose to study two approaches for building Multimorbidity network focusing in their technical speciﬁcities. Subject and Methods: We propose to model Multimorbidity using Ising Model, a Markov Random ﬁeld based approach, and to compare its performance to the approach consisting in building a network of co-occurence using pairwise association strength estimated by Multimorbidity Coefﬁcient. Besides, we illustrate how to use network science techniques to extract structural knowledge from Multimorbidity network. Results: The results show that the Ising model is able to detect a similar structural patern as the approach of computing Multimorbidity coefﬁcient for all paires of diseases. An evaluation of the stability and precision of the obtained comorbidity network has proved its reliability. Conclusion: Deﬁning methods and algorithms of detecting Multimorbidity network in formal language may help interdisciplinary cooperative research. Ising Model is a machine learning based on a probabilistic formalism capable of detecting the same pattern as traditional approaches in Multimorbidity research literature. Understanding how diseases co-occur at the same time will help physicians to reason on multimorbidity burden as a complex system rather than reasoning on diseases as single and isolated entities.


INTRODUCTION
With the increase of the average life expectancy, the aging phenomenon has led to a substantial increase in chronic diseases, therefore rising the prevalence of multimorbidity. Multimorbidity, i.e. two or more than two diseases in the same patient are diagnosed at the same time [37], is a significant health problem in modern medicine. It has been associated with poor prognosis, lower quality of life [29], increased health care costs, polypharmacy and the risk of premature death [11]. The management of multimorbidity is a complex process, and has become an emerging priority for public healthcare professionals. Unfortunately, multimorbidity remains not well understood due to its multifactorial complexity aspects, and to the health systems that are still designed in a single disease paradigm rather than multimorbidity. However, the transition from disease-centered care, to patient-centered care is ongoing [37]. Recently, increasing initiatives to exploit data-driven techniques and the increasing amount of electronic healthcare data [33] are taken to get more insight into this phenomenon.
There is a debate in the literature which has called for consistent and replicable methodology for the study of multimorbidity [18,32,45]. In a recent review, Jones et al. [19] highlighted the lack of a consensus in defining and measuring Multimoridity which results in no recommended standard method for calculating networks in multimorbidity. One way to facilitate a consensus about a standard methodology is to express the problem in a formal langage. In this work we try to adress this problem by proposing a formal definition of building a network of Multimorbidity, formal definitions helps in expliciting definitions and hypothesis about the problem and thus building common terminologies for researchers that may facilitate communicating their findings. Further, comparative analysis of different approaches in literature can reveal better understanding of pros and cons of each approach.
In particular, we propose to model comorbidity pattern detection using Ising model, a pairwise Markov random field-based approach (pMRF). It is a machine learning algorithm that estimates from binary data an undirected network of conditional dependences using regularization techniques. In addition, we compare the proposed approach with a baseline algorithm based on estimating Multimorbidity Coefficient (MC) between all pairs of diseases (will be denoted MC − Algorithm in the paper). The studied methods will be applied on a case study of real medical data to detect comorbidity pattern of some selected valvular heart related diseases.
After performing the two proposed approaches, we will compare the outcomes of their corresponding algorithms. Besides, we will analyze structural information characteristics revealed by the obtained networks. The results show that Ising and MC − Algorithm outputted the same Comorbidity Disease Network. Further, we will illustrate how a mesoscopic analysis of the obtained multimorbidity network/pattern can be used to suggest an individual care strategy to manage patients who suffer from multimorbidity.
In section 2 we review literature related to the multimorbidity modeling. Section 3 presents an overview of the studied diseases in this work. Section 4 is devoted to data and methods used in this work. We define mathematically automatic detection of Multimorbidity pattern problem, and then develop its corresponding algorithm. We present and discuss the obtained results in section 5 before concluding in section 6.

RELATED WORKS
Recently, deep studies in medical literature were conducted to tackle multimorbidity burden, to explore its risk factors, its impact on quality of life in terms of mortality, costs and healthcare utility [46]. More technically, the methods and models differ either on the data, (cross sectional, temporal dimension, etc.), or whether the goal of the model is to explain, explore or to predict.
Earlier medical research relied on regression models, which were applied on single diseases and which ignore the hidden structure of the multimorbidity complexity. Recently, combinations of traditional data analysis and machine learning were proposed as multimorbidity research methods. In [49] the authors used Classification/regression trees and random forest applied to data of elderly adults to model and identify how specific combinations of chronic conditions, functional limitations, and geriatric syndromes affect costs and inpatient utilization.
In [47] applied non-hierarchical cluster analysis based on k-means on cross-sectional study using electronic health records of patients aged between 45 and 64 years to identify and separate population groups from others. In [50] added fuzziness upon k-means algorithm to estimate clusters of patients as well as membership matrix indicating the membership degree of a patient to a given cluster. In [48] a multilevel analysis of the influence of individual and area level factors on patterns of physical-mental multimorbidity and healthcare used in the general population. Applying this method allows detecting the isolated and combined influence of variables of each level on the outcome variables.
Other approaches in literature focused on probabilistic formulation and longitudinal data.
In [51], Lappenschaar et al. summarize and classify some terminologies used in definitions of concepts of multimorbidity. They proposed a probabilistic framework to model these concepts using causal Bayesian network [54]. In [52], the authors proposed Bayesian network structure learning methods for modeling the interactions between risk factors explaining co-occurrences of malignant tumors in oncological area. This model was extended with a temporal dimension in [53]. Authors in [55] proposed a latent-based approach to model multimorbidity related events in temporal electronic health records. They introduced the notion of clusters of hidden states allowing the exploration of multiple dynamics that underlie events in data.
Network science is a relatively new approach to investigate Multimorbidity. To build Multimorbidity network, researchers estimate association strengths between diseases like Salton Cosine Index [20], odd ratio [1], Relative Risk [35], the standardized lift and confidence scores of the association rules as a probabilistic measuring of how conditionally the diseases are related [17]. Then the obtained network can be analyzed to reveal some structural caracteristics using for instance weighted degree, closeness and betweenness centrality [20], clustering coefficient, Page Rank and degree centrality [1], community detection algorithms [17].
In this work, we propose to study two methods to build Multimorbidity network. The first based on estimating Multimorbidity Coefficient (MC) as association strength of all paires of diseases. The second is based on estimating an Ising Model for the co-occurence of the diseases.
Ising model was used as a data analytic model to estimate dependencies between binary variables [14]. This method is becoming frequentely in psychology [38]. For example, it was used to model theoretical assumptions of political beliefs, attitudes, and depression [6,9]; and as an analytic tool in psychometric network [5,22,43]. We apply the two methods to build network co-occurence of valvular heart diseases, then we investigate if they output the same Multimorbidity network.

COMORBID VALVULAR HEART DISEASES OVERVIEW
The human heart is viewed as a pump consisting of four chambers and four valves keeping enough blood flowing in a one-way direction: mitral, pulmonary, tricuspid and aortic as in this makes heart muscles become overworked and cannot pump properly. This may generate other problems like pulmonary hypertension, heart failure, stroke and others [16,26,28,39,41].
In Figure 1 there are two types of heart valve dysfunction: valvular stenosis (Figure 1a), and regurgitation or insufficiency (Figure 1b). Valvular stenosis refers to narrowing in the valve, which does not open enough and blood flow is therefore slowed. Insufficiency (leakage) occurs when the valve doesn't close properly and so the heart has to work harder to work properly.
Valvular heart diseases can have various causal patterns: degenerative in origin, inflammatory or bacterial infection, like streptococcus pyogenes which can cause, in long term, rheumatic valvular heart diseases [10]. These functional disorders are accompanied by dilatation and cardiac fatigue: shortness of breath and risk of edema of the lower extremities, malaise, sometimes loss of consciousness, palpitations, congestive heart failure. Furthermore, multimorbid patients will suffer from increasing burden and lowering quality of life. Therefore, understanding the tendency of comorbidity between these diseases can help healthcare systems to anticipate valvular heart disease patients' needs and reduce unnecessary charges in managing multimorbid patients profiles. We encourage the reader to read the appendices (Section 7) to take an overall view of the notations and abreviations used in this methodological section.
Let us start with the following assumptions: Let |S| denotes the number of elements of a set S. Let R be a k-ary relation over Cartesian product sets D k . The diseases d 1 , d 2 . . . d k are related by the relation R if and only if they satisfy a predefined condition that depends on the context of the study. It can be for example the fact of being correlated, causally related, being in the same category or conditionally related. In this work R represent multimorbidity relation over k diseases. The relation R defines an undirected weighted hypergraph G = (D, R) such that the vertices D are the nodes/diseases and hyperedges R represent the multimorbid diseases. The weights of the hypergraph are a measure of the strength of co-occurrence of these k multimorbid diseases in data. Each hypergraph G represents a Multimorbidity pattern. We define the Automatic Multimorbidity Pattern Detection problem (AMPD) as the conceptualization and the study of algorithms that automatically detect hidden Multimorbidity patterns given a medical data set.
We use in this work a probabilistic framework to model AMPD problem. Let X = {X 1 , X 2 , . . . , X |X| } represent the set of |X| patients. Each patient is characterized by a set of a tuple of diagnoses X i = (x 1,i , x 2,i , .., x |X i |,i ) where x j,i ∈ D is a random variable for the i th patient of the j th diagnosis. We suppose that X are random variables that are independent (in general it is supposed in the case in cross-sectional data like our data of application) and identically distributed (i.i.d) samples (the data are governed by the same underlying Multimorbidity Mechanism described by the joint probability distribution noted P * over the variables X). The data X can be indexed by diseases X = {X d |d ∈ D} such that X d is a binary random variable of the presence (or absence) of disease d ∈ D over the patients (In the following the random variable X d and the node/disease d will be used interchangeably).
Statistically, R is estimated in function of observations X. We consider the hypothesis space H = {R θ (X)|θ the parameters of the model.}. Given a family of models R θ (X), our task is to learn some models R * θ (X) ∈ H that best fit the distribution P * from which our data X were sampled. This is done by minimizing an expected loss function L(X, R θ ) which measures the loss that a model distribution R θ makes on input observation X.
In this work, we focus on the special case of learning Comorbidity Disease Network (CDN) (thus learning an ordinary graph with ordinary edges). We define the binary relation between comorbid diseases using two approaches: Multimorbidity Coefficient based on strength computation approach and probabilistic dependency-based approaches. We will use the first approach as a baseline to evaluate the second proposed approach.  [42]: if d 1 has occured, then d 2 will be more likely to occur than what would be expected just by chance. We consider d 1 and d 2 are in positive comorbidity, i.e., they tend to appear together, if P(d 1 , d 2 ) > P(d 1 )P(d 2 ). If P(d 1 )P(d 2 ) = P(d 1 , d 2 ) we consider that the co-occurrence of the two diseases is what would be expected just by chance. The final case P(d 1 , d 2 ) < P(d 1 )P(d 2 ) can be interpreted as d 1 andd 2 are in protective comorbidity (for instance, myopia may be protective against diabetic retinopathy [24]).
To measure how strongly disorders are associated, a multimorbidity coefficient (MC) is calculated. MC is commonly used method for measuring pairwise association [2]. MC is defined as the division of observed rate of comorbidity (multimorbidity) by the rate which is expected under the null hypothesis of no association between the separate disorders. Using the Table 1 notations, the MC score for Disease 1 and Disease 2 is equal to:

4.1.3.
Conditional dependence based approach. Another possible definition of the binary relation R to model structure of CDN, rather than the association strength concept, is to fit interdiseases probabilistic dependences/independences structures and search for the optimal structure given a Loss measure.
Let P(D) be the power set of diseases/binary random variables. Given the graph G = (D, R), the set of random variables (X d ) d∈D form a Markov Random field in respect to G if any two subsets of variables are conditionally independent given a separating subset: Where every path from a node in A to a node in B passes through S. Let P(X = x) be the probability of finding that the random variables X take on the particular value x. We suppose that the hidden distribution P * underlying our multimorbid data can be factorized in the form: where R denotes the set of hyperedges of G, and each factor Ψ c is a non-negative function over the variables in a hyperedges/multimorbid diseases c from R. The partition function is a normalizing constant that ensures that the distribution sums to one.

Markov Random Fields (MRFs) can compactly represent independence assumptions that
Bayesian network (directed acyclic graph) cannot represent and visualize a probability distribution in undirected graph terminology: A Pairwise Markov Random Field (pMRF) is a network in which nodes represent variables, connected by undirected edges indicating conditional dependence structure. Two variables that are not connected (i.e. no edge between them) verify the Markov property: two nodes are conditionally independent given the set of all other nodes in the network. Further, pMRFs are well defined and have no equivalent models, unlike Bayesian network [25]. Therefore, they facilitate a clear interpretation of the edge-weight parameters as strength of unique associations between variables, which in turn may highlight potential causal relationships. Besides, Conditional independencies are also to be expected in many causal structures [34].
A pMRF can be parameterized as a product of strictly positive potential functions Ψ i for all nodes i and j from D such that i = j [31]: is the node potential function that can map a unique potential for every possible This results in: Where Z is a normalizing constant and defined as: We can view the Ising model as a probability distribution that is governed by main effects α i and pairwise interactions/edges ω i, j . The model can be represented by an undirected weighted graph G = (D, R) such that edge weight ω i, j are a real valued measures of dependence between nodes/diseases i and j: The higher (lower) ω i, j becomes, the more nodes X i and X j prefer to be in the same (different) state (they tend to be both present or both absent in the same patient at the same time) and α i can be viewed as a threshold parameter that denotes the tendency for node i to be in some state (the tendency to be present or absent in a patient). Ravikumar et al. used L1 regularized logistic regression [36] to estimate the structure of the Ising model (known as the least absolute shrinkage and selection operator [40]). The pseudo likelihood is used to approximate the full likelihood. For each node i, the expression which is maximized is [15]: The pseudolikelihood PL approximates the likelihood with the product of univariate conditional likelihoods: Where ω i is the i th row (or column due to symmetry) of the Ising structure network ω. The λ is the regularization tuning parameter and Pen(ω i ) denotes the penalty function, which is defined in terms of the LASSO as follows: With a pseudo-likelihood estimation, we first estimate the neighborhood of each diseases node. This is performed through one logistic regression per node (here, we consider a simple three-node example). We combine the neighborhoods into a single network model whose weight matrix is ω, through the AND rule: an edge is present if both β i, j and β j,i are non-zero. This step is necessary because each node is both the dependent and independent variables; hence, we have two β estimates: β i, j and β j,i .
In Figure 2, this conditional probability is shown for each node of a simple three-node network. More importantly, if x 1 , x 2 and x 3 are observed variables, this equation translates directly into a logistic regression. Hence, we can estimate β and α with one regression per node. For example, in Figure 2 network estimation is performed on three nodes by three regressions: (14) Regression 1 : A caveat is that we will have two estimates per edge, β i, j and β j,i , because each node serves both as the dependent and independent variables. However, they will converge as sample size goes to infinity. To solve this and complete the weight matrix ω, researchers in [43] used the and-rule: any edge is the average β if both β i, j and β j,i are non-zero, otherwise ω i, j = 0. This multiple logistic regression approach inflates the rate of false positives, because of performing multiple testing. Authors in [43] used LASSO regularization (Least Absolute Shrinkage and Selection Operator) to overcome this problem. The LASSO procedure shrinks estimates towards zero, lowering the amount of detected edges and their strengths [40], and thus dropping out of the model the spurious edges letting just interpretable and important edges [43], limiting effect on over-fitting for smaller samples and leading to better out-of-sample generalizability [12]. The amount of shrinkage depends on a hyper-parameter γ which determines whether false positives or negatives are preferred. Lower γ favours more edges, while higher γ favours stronger shrinkage and hence, sparser networks. Therefore, a low γ increases the false-positive rate, while a high γ inflates the false-negative rate. This tuning parameter γ can be selected by minimizing the Extended Bayesian Information Criterion (EBIC) [7], such that (17) EBIC = −2Likelihood(X) + |ω i |ln(|X|) + 2γ|ω i |ln(|D| − 1) in which |ω i | is the number of nonzero parameters in ω i and |X| is the number of observations.
The EBIC has been shown to be consistent for model selection (e.g. in psychometric domain [3,43]), and to perform best with hyper parameter γ = 0.25 for the Ising model [3].

Multimorbidity Coefficient based Algorithm (MC-Algorithm).
In this section we define and implement the pairwise association strength computation methodology for building Comorbidity Disease Network which will be considered as a benchmark to compare to. We will call this algorithm MC − Algorithm.
Let If the MC is significantly higher than 1 then the algorithm considers that these two diagnoses are in comorbidity. If the MC is significantly less than 1 then we consider that these two diagnoses are in protective comorbidity. The bigger this number is, the stronger the association is considered. We will focus our analysis on positive comorbidity only. if H 0 : "W expected ≥ W observed " is rejected at risk α then  If we conceptualize the outputted network as a distance matrices such that the distance in each pair of diseases corresponds to similarities in their level of co-occurrence, ranging from 0 (does not co-occur) to 1 (co-occur almost always), then we can perform Mantel test to measure the similarity of the two outputted networks [23].
The Mantel test is used for correlation between two proximity matrices ω Ising and ω MC−Alg and tests the null hypothesis H 0 : "proximity among diseases in the matrix ω Ising are not linearly related to the corresponding proximity in the matrix ω MC−Alg " against the alternative hypothesis After that, we assessed how accurate networks are estimated, and how stable inferences from the network structure are. We estimate the accuracy of edge weights using bootstrapped Confidence intervals (CI) [13]. Then we estimated if edge-weights/centralities significantly differ.
This can be done by checking if zero value is in the constructed bootstrapped CI of differences of edge-weights/centralities [8]. Once a network is computed, centrality indices can give information on the importance of each node. We will consider three main centrality indices. 1) Strength centrality is defined as the sum of the absolute weights of the edges incident to a node, a node has a high strength if it has strong connections with many others. have been suggested to indicate sufficient stability and good stability [13]. The stability analysis indicates if the order of centrality indices does not change after re-estimating the Comorbidity network using less data.

RESULTS AND DISCUSSION
We performed MC-Algorithm on the valvular heart diseases data mentioned in section 4.3.
Visualizations are performed using R Studio software. We see in Figure 3 the weighted graph of detected comorbidity disease network for male patients (older adulthood more than 65 years).
The MC-Algorithm outputted a graph with 12 non-zero edges over 36 possible edges. The thickness of edges shows visual differences of calculated MC weights between each pair of diseases. If an edge is absent between two pairs, this means either their association is random or not significant (the null hypothesis was not rejected at risk 0.01) or both cases. In Figure   3 for example, Non rheumatic tricuspid insufficiency co-occurs with non-rheumatic mitral in-   Table 2 for the values).  Fig. 3. Since the built graph is undirected, the weight matrix is symmetric (just the lower triangular is filled). The cases with "-" symbols mean that either the MC score is equal to one, or the null hypothesis is not rejected, or both.  To assess the accuracy of the obtained CDN we used bootstrapped methods to construct confident intervals (CIs). Figure 6b shows Similarly, MC-Algorithm outputted these edges to be either not significant or MC score was equal to 1, i.e. they randomly co-occur. We can interpret these results by supposing that these edges being spurious edges and their occurrences can be due to chance or other con unmeasured  Other edges have low weights edges but their confidence interval contains zero, which suggests that they should be interpreted as weak edges (visually weak thickness in Figure 4).   because it has the weakest connection towards other nodes (MC=3.14 ) which makes her almost an isolated node. According to strength centrality, I27.2 and I34.0 were the most central nodes, followed by I36.1. Their centrality indicates that these nodes were those more likely to affect or to be affected directly by other nodes in the network. Nodes strength has gradually increasing scores which makes the interpretation of differences in these scores are not straightforward.
To investigate more the strength scores, we performed hypothesis testing -similar to edge weights difference test-. The null hypothesis in which we set α = 0.05, H i, j 0 : " strength centrality scores of node i and j are equal" for all nodes with i and j. Figure 7b shows the results.   influence in the other nodes in the network. Thus, they should be prioritized to be targeted in a therapeutical operation whenever a patient case exhibits these comorbid diseases. Targeting this highest centrality is a strategy to read off the comorbidity pattern and thus help to deconstruct these comorbid components and alleviates the multimorbidity burden put upon the patients.
While assessing accuracy of centrality measure can result in biased results, researchers in [13] suggest assessing the stability of centralities by the coefficient of stability CS. Figure 8a shows the results of the average correlation with the original sample for strength centrality of our data. Ising Model present some advantages for reducing spurious edges, and the regularization technique estimates a statistical model while including a penalty for model complexity has been shown to converge to the generating network structure under the assumption that the network is sparse [36] and simulation studies reported that the LASSO has a low likelihood of false positives [21].Thus, The LASSO yields a more parsimonious graph (fewer connections between nodes) that reflects the most important empirical relationships hidden in the data. However, the nature of the relationship represented as an edge needs to be further investigated and interpreted (the edge could represent a direct causal pathway between nodes, or it could reflect the common effect of a latent variable not included in the network model).
As a conclusion, regularization technique reduces over-fitting and removes false positive edges, to exclude spurious relationships and to make networks more parsimonious, robust, and interpretable. It is however, important to consider that the detected morbidity can be affected by the number and the nature of the selected diseases. For example a strong relationship detected between two diseases can be caused by a third disease acting as a latent variable not included in the analyzed network. Thus the replication of this study by other researchers on more diseases is important and crucial for investigating such limits.

CONCLUSIONS AND PERSPECTIVES
In this work, we proposed pairwise Markov Random field approach to detect the comorbidity pattern of some valvular heart diseases. Using Ising Model suited for binary data; the obtained results suggest that this model can detect potential causal links among diseases and provide Comorbidity Disease Network comparable to widespread and traditional approach in multimorbidity research. We applied these methods on a case study of a real dataset. An assessment of the stability and accuracy of the obtained Network suggests that the obtained network is reliable and the degree of confidence with which edge weight and centrality rankings can be interpreted is meaningful.  Table 3 some abbreviations used in this paper.  Table 4.  x j,i ∈ D The diagnosis j for the i th patient.
P * The joint probability distribution over X.
The data X indexed by diseases d ∈ D.
X d Binary random variable of the presence of disease d ∈ D over the patients of the dataset.
R θ (X) Family of models estimated from the data X.
H The hypothesis space defined by a parametrization of the relation R θ (X).
R * θ (X) ∈ H The model that best fit the distribution P * .
L(X, R θ ) The expected loss function which measures the loss that a model distribution R θ makes on input observation X.  Table 6. of the disease number i.

P(d i )
The probability of observing d i .
The probability of observing d i and d j at the same time.

P(D)
The power set of diseases/binary random variables.
P(X = x) The probability that the random variables X take on the particular value x.
Ψ c Non-negative function over the variables in a hyperedges c.
Z Normalizing constant that ensures that the distribution sums to one.
The node potential function for a realization of x i .   β The set of estimte parameters β i, j .
ω i, j Edge weight of the Ising structure network that links nodes i and j. ω The weight matrix of the Ising structure network.
ω i The i th row (or column due to symmetry) of the Ising structure network ω. λ The regularization tuning parameter.
Pen(ω i ) The penalty function which is defined in terms of the LASSO. γ The parameter of shrinkage.
The subset of size k diseases from D.

P(D)
The power set of D.
f : I → {D 1 , D 2 , ..., D N diag } The application that maps every patient i to his recorded The upper bound computational complexity.
W expected Current expected Weight of d 1 and d 2 .
W observed Current observed Weight of d 1 and d 2 .
The weighted edge in the CDN/ the graph G( V, E) that maps two nodes/diseases d 1 and d 2 .