A survey on exponential random graph models: an application perspective

The uncertainty underlying real-world phenomena has attracted attention toward statistical analysis approaches. In this regard, many problems can be modeled as networks. Thus, the statistical analysis of networked problems has received special attention from many researchers in recent years. Exponential Random Graph Models, known as ERGMs, are one of the popular statistical methods for analyzing the graphs of networked data. An ERGM is a generative statistical network model whose ultimate goal is to present a subset of networks with particular characteristics as a statistical distribution. In the context of ERGMs, these graph characteristics are called statistics or configurations. Most of the time they are counts of subgraphs repeated across the graph. Some examples include the number of triangles or the number of cycles of an arbitrary length. Any other census of the graph, such as the edge density, can also be considered one of the graph's statistics. In this review paper, after explaining the building blocks and classic methods of ERGMs, we review recently presented approaches and research papers. Further, we conduct a comprehensive study of the applications of ERGMs in many research areas which, to the best of our knowledge, has not been done before. This review paper can be used as an introduction for scientists from various disciplines who aim to use ERGMs on networked data in their field of expertise.

Hence, in Section 4, most of the state-of-the-art ERGM estimation methods are discussed. Section 5 is a review of ERGMs' applications in multiple fields. In Section 6, we introduce some of the state-of-the-art libraries and tools for ERGM estimation. Finally, in Section 7, we summarize the survey and give some ideas for future work on ERGMs.
Survey Methodology

To find related research articles, we used two different approaches:
1. Searching for related keywords in the Google Scholar search engine.
2. Starting from an initial pool of articles and then moving back and forth between their citations and references.

In the first approach, we searched for related keywords such as "ERGM", "Exponential Random Graphs", and "Exponential Random Graph Models" in the Google Scholar search engine and selected related articles by reading their abstracts.

In the second approach, which was our main methodology throughout the work, we started with a number of seminal works which were found in one of the following ways.
3. Papers extracted from the first approach which had a good citation count or were published in journals with high impact.

After finding the initial seed of articles by one of the mentioned methods, we checked the related publications that they reference and the publications that cite them. We continued until there were no more related articles. In situations in which there were too many related articles, our selection criteria were mostly based on the citation count and the journals' impact factor.
Precise Definition of ERGMs

In this section, we give a brief overview of the overall ERGM scheme. According to (Snijders et al., 2006; Robins et al., 2007a), the first work that treated ERGMs as a separate field of study was (Frank & Strauss, 1986). Although the models were called Markov graphs at that time, they had essentially the same characteristics. An interested reader can refer to (Robins et al., 2007a; Lusher, Koskinen & Robins, 2012) for more details on both the history and the mathematical background of this topic.

In an ERGM, each graph is associated with a probability. This probability indicates how likely that particular graph is under the probability distribution of a class of graphs. There are also two other essential elements in ERGMs, known as graph configurations and their corresponding parameters. Each configuration or statistic (we will use both names throughout the text) is composed of some nodes and ties repeated in the graph. For example, a triangle consisting of three nodes and three edges can be taken as a configuration. The authors of the seminal work (Frank & Strauss, 1986) were the first to argue that these configurations can be considered as sufficient statistics for a log-linear model. Sufficient statistics are features of an i.i.d. dataset which are sufficient for modeling the probability distribution of the data, such that adding another feature does not add any more insight to the model (Fisher, 1922). So, ERGMs are a representation of graphs by their configurations. A particular exponential function is defined to represent the relationship between these configurations and the probability distribution of the graphs.
This formula is a variation of logistic regression, extended so that it can handle dependent observations rather than being applicable only to independent ones, as is the case for logistic regression (Lusher, Koskinen & Robins, 2012). We will use the notations presented in Table 1 throughout our work.

Note that throughout this work, graphs are represented by their adjacency matrices. For example, in a matrix $A$, $A_{ij} = 1$ indicates that there is an edge between nodes $i$ and $j$, while $A_{ij} = 0$ if no edge exists between these two nodes.

Using the notation introduced in Table 1, the ERGM probability function can be expressed as follows:

$$P(G) = \frac{1}{Z} \exp\left(\sum_{i=1}^{p} \theta_i \, s_i(G)\right) \quad (1)$$

Here $\theta_i$ is the coefficient of the $i$-th network statistic and $s_i(G)$ is its count in the graph $G$. $Z$ is the normalizing factor, which is the sum of the exponential term of Eq. 1 over the set $\mathcal{G}$ of all possible graphs with the same number of nodes:

$$Z = \sum_{G' \in \mathcal{G}} \exp\left(\sum_{i=1}^{p} \theta_i \, s_i(G')\right) \quad (2)$$

If we summarize the results, this leads to:

$$P(G) = \frac{\exp\left(\sum_{i=1}^{p} \theta_i \, s_i(G)\right)}{\sum_{G' \in \mathcal{G}} \exp\left(\sum_{i=1}^{p} \theta_i \, s_i(G')\right)} \quad (3)$$

The GWDC equation takes the matrix whose GWDC value we want to compute and the number of nodes in the graph. It involves the number of nodes with degree $k$ and a decaying factor which ensures that the nodes with higher degrees have higher impacts.

Geometrically Weighted Stars Counts (GWSC): this measure is an extension of star counts combined with geometric degree discounts in the computation of the statistic. Its expression involves the number of stars with $k$ edges ($k$-stars) and a decaying factor which ensures that the stars with a higher degree have a greater impact.
Transitivity by Altering $k$-Triangles (TAT): this measure is an extension of triangle counts combined with geometric discounts in the computation of the statistic. Its expression involves the number of $k$-triangles and a decaying factor which ensures that the triangles with a higher degree have a more substantial impact. Figure 2 (in Supplemental Files) displays a description of $k$-triangles.

Altering Independent Two-Path (AI2P): this measure is an extension of 2-paths combined with geometric discounts in the computation of the statistic. Its expression involves the number of $k$-independent 2-paths and a decaying factor which ensures that the 2-paths with higher degrees have higher impacts. In Figure 3 (in Supplemental Files), you can see a description of $k$-independent 2-paths.

The authors of (Wilson et al., 2017) addressed one of the significant drawbacks of ERGMs. As can be seen in Tables 2 and 3, the weights of the graphs are missing. In other words, the statistics are only applicable to unweighted graphs, and to use them in the context of weighted graphs, the weights must be omitted. However, much useful information lies in the weights of a graph, and in most domains it is crucial to consider them to model the graph accurately. Following this idea previously discussed in (

We aim to find the values of the vector $\theta$ in Eq. 3 which maximize the probability of the observed data. More formally, we want to solve the following equation:

$$\hat{\theta} := \arg\max_{\theta \in \mathbb{R}^{p}} P(G \mid \theta) \quad (10)$$

where $P$ is the same probability function as in Eq. 3 and $\mathbb{R}^{p}$ represents all possible real values over a $p$-dimensional space. Note that $\theta$ is a vector of coefficients rather than a single value; thus, its domain is a vector space. Different methods exist for solving such optimization problems. Here, we name a few of them which are mostly used in ERGM-related works. We also present a number of state-of-the-art methods that have been published after 2016.
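To make Eq. 10 concrete, the following is a minimal sketch that evaluates the ERGM probability exactly on a very small graph by enumerating every graph with the same number of nodes, and then picks the best coefficient vector from a finite grid. It assumes just two statistics (edge count and triangle count); the function names `graph_stats`, `all_graphs`, `log_prob`, and `grid_mle` are hypothetical and are not taken from any ERGM library. Full enumeration is only feasible for a handful of nodes, which is exactly why the sampling-based estimation methods discussed in this survey are needed.

```python
import itertools
import math


def graph_stats(edges, n):
    """Return (edge count, triangle count) of an undirected graph.

    `edges` is an iterable of pairs (i, j) with i < j over nodes 0..n-1.
    """
    present = set(edges)
    triangles = sum(
        1
        for a, b, c in itertools.combinations(range(n), 3)
        if (a, b) in present and (a, c) in present and (b, c) in present
    )
    return (len(present), triangles)


def all_graphs(n):
    """Yield every undirected graph on n labelled nodes as a tuple of edges."""
    pairs = list(itertools.combinations(range(n), 2))
    for mask in itertools.product((0, 1), repeat=len(pairs)):
        yield tuple(pair for pair, bit in zip(pairs, mask) if bit)


def log_prob(observed, theta, n):
    """Exact log-probability of `observed` for coefficients `theta`."""
    def score(graph):
        # Weighted sum of the statistics: the exponent of the ERGM formula.
        return sum(t * s for t, s in zip(theta, graph_stats(graph, n)))

    # Normalizing factor: sum of exp(score) over all 2^(n(n-1)/2) graphs.
    log_z = math.log(sum(math.exp(score(g)) for g in all_graphs(n)))
    return score(observed) - log_z


def grid_mle(observed, n, grid):
    """Brute-force stand-in for Eq. 10: best (edge, triangle) pair on a grid."""
    return max(
        itertools.product(grid, repeat=2),
        key=lambda theta: log_prob(observed, theta, n),
    )
```

For example, `grid_mle(((0, 1), (0, 2), (1, 2)), 4, [-2.0, -1.0, 0.0, 1.0, 2.0])` searches 25 candidate vectors for a 4-node graph containing one triangle. The grid is an illustrative simplification; the true search space is continuous.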

• Predicting the prior distribution of the graphs for Bayesian learning models

So, there is a need for sampling methods to draw a sample from the given graph distribution. In this section, we present some of the sampling methods that have been used extensively in the literature.

The Markov Chain Monte Carlo sampling method, abbreviated to MCMC (Metropolis et al., 1953), is a well-known sampling method which has been used in many works. Here, we only discuss it in the context of graph generation. In this method, we start with an initial graph, which can also be an empty graph. Then, in each iteration, a new graph is generated by making a small change to the graph from the last step. The form of this "change" differs from work to work. The most straightforward change is adding or removing a tie. The procedure is as follows: two nodes are chosen randomly, and the state of their connection is flipped (if they are already connected, they become disconnected; if they are not connected, they become connected). In the next step, the probability of the generated graph is computed according to Eq. 3. This probability is compared to that of the graph from the previous step. Then, we accept or reject the new graph based on the comparison of these two probabilities. If the new graph is more probable, it is more likely to replace the old graph in the next iteration. Note that having a higher probability score alone does not guarantee that the graph gets chosen; it only increases the chance of selection. This outlines the general scheme shared by MCMC methods.
However, the details, including how many ties are altered in each iteration and how the probabilistic selection between the old graph and the new one is made, differ across the literature. We present a quick introduction to the Metropolis-Hastings sampling method, which is the one mostly used in ERGM-related literature. Figure 1 displays the overall procedure of an MCMC method.

Metropolis-Hastings (Metropolis et al., 1953) is the most widely used MCMC variant in ERGM studies. In the context of graph generation, it works as follows. Initially, as we explained in the general MCMC scheme, we start with an empty or random graph. Our goal is to generate samples from the distribution of graphs, implying that we want to generate a sequence of graphs $G_1, G_2, \ldots$ We choose two random nodes at each step and change the tie situation between them. The probabilities of the newly generated graph and of the graph from the last step are then combined using the following formula:

$$\alpha = \min\left(1, \frac{P(G_{\text{new}})}{P(G_{\text{old}})}\right)$$

where both probabilities are given by Eq. 3, so the normalizing factor cancels in the ratio. This formula gives the probability of accepting the new move; otherwise, the graph from the last step is kept as the new one.
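The tie-toggling procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of any particular ERGM package: it assumes an ERGM with two statistics (edge count and triangle count), a symmetric single-tie proposal, and recomputes the statistics from scratch at every step for clarity rather than speed. The names `count_stats`, `toggle`, and `sample_graph_stats` are hypothetical.

```python
import itertools
import math
import random


def count_stats(adj, n):
    """Return (edge count, triangle count) for a graph stored as adjacency sets."""
    edges = sum(len(neighbours) for neighbours in adj.values()) // 2
    triangles = sum(
        1
        for a, b, c in itertools.combinations(range(n), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )
    return (edges, triangles)


def toggle(adj, i, j):
    """Add the tie i-j if it is absent, remove it if it is present."""
    if j in adj[i]:
        adj[i].remove(j)
        adj[j].remove(i)
    else:
        adj[i].add(j)
        adj[j].add(i)


def sample_graph_stats(n, theta, steps, seed=0):
    """Run a tie-toggling Metropolis-Hastings chain; return the statistics per step."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}  # start from the empty graph
    current = count_stats(adj, n)
    trace = []
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)  # two distinct random nodes
        toggle(adj, i, j)               # proposal: flip the tie between them
        proposed = count_stats(adj, n)
        # log of P(new) / P(old); the normalizing factor cancels in the ratio.
        log_ratio = sum(t * (p - c) for t, p, c in zip(theta, proposed, current))
        if rng.random() < math.exp(min(0.0, log_ratio)):
            current = proposed          # accept the new graph
        else:
            toggle(adj, i, j)           # reject: restore the previous graph
        trace.append(current)
    return trace
```

Calling `sample_graph_stats(10, (-1.0, 0.2), 5000)` returns the (edges, triangles) pair after each step, which can be averaged after discarding an initial burn-in portion. A production sampler would typically update the statistics incrementally instead of recounting them.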
Classic methods

So far, we have reviewed the necessary preliminaries. Now, we can review the most widely used methods in the literature for estimating the values of the statistical parameters ($\theta$ in Eq. 4) that best represent the observed data. In other words, our aim is to solve Eq. 10. Most of the methods use the following steps. First, they start with an initial value for the parameter vector. Second, the distribution of the graphs is generated by one of the sampling methods. Next, the difference between the statistics of the generated distribution and those of the observed data is computed. If the difference is satisfactory, the learning process is halted and the current parameter vector is taken as the final answer which best fits the observed data; otherwise, based on the learning method, the algorithm moves to the subsequent values of $\theta$ and goes back to step 2.

The goal, as we said, is to minimize the expected difference between the observed statistics and the ones generated by the ERGM. The aim is to use the Maximum Likelihood approach that we discussed before to find the value of the vector $\theta$ which maximizes the right-hand side of Eq. 10. One possible approach is to search over all possible values in the search space and try them one by one. But since the search space is very large and the values are continuous, this approach is not practical. Instead of such a brute-force algorithm, one of the methods for ERGM parameter estimation is inspired by the general ML estimation framework for dependent data introduced by (Geyer & Thompson, 1992). The main idea is that, instead of generating all possible graphs for a particular vector $\theta$, we can draw a large sample of graphs and treat it as a representation of all possible graphs at each iteration. This sample is generated from the current value of the vector $\theta$ using Eq. 3 and is used in the formula to compute the updated value and then to measure how close this value is to the previous one.

As can be seen in Eq. 12, most parts of the formula for TERGM are similar to the normal ERGM. However, the time snapshots are now considered, and each new time snapshot depends on its previous one. Also, the normal count of the network statistics has been substituted with a temporal potential count over two consecutive snapshots. For more information see (Hanneke et al., 2010), and for information about btergm, which is a library for temporal ERGMs, see (Leifeld, Cranmer & Desmarais, 2017).

Most real-world networks are associated with a value on their edges; these are referred to as weighted graphs in graph theory. A plethora of research has been done to bring these types of networks into the general ERGM schema. GERGM (Desmarais & Cranmer, 2012) and the model proposed by (Krivitsky, 2012) are the two most well-known models which have incorporated the networks' edge weights into the model. The normalizing factor in Eq. 3, which is the denominator of Eq. 4, is not guaranteed to converge when the network statistics in Eq. 4 range over an infinite set, as with continuous-valued edges. GERGM is a model aimed at overcoming this issue by using a probability model for such continuous values. Its authors build a transformed version of the original ERGM formula that no longer suffers from the mentioned problem. (Krivitsky, 2012) has also extended the previous binary version of ERGM, which only models the existence of edges rather than their values, into a model which is capable of capturing the information of weighted graphs. However, his method is restricted to natural-valued weights on the edges.

In network science there is a special kind of network called a multiplex or multilayer network. These are networks whose nodes are connected in the context of more than one attribute.
For example, in a social relation network, actors might have several relations between them, such as friendship or co-working. Each of these relations can be abstracted as a layer in a network model. Also, in some situations, there is a hierarchical structure in the data, as when modeling the relations inside a university: there are schools, which are divided into groups, lecturers, and students. An extension of ERGM which is applicable to such scenarios in multilevel networks has been proposed. The model considers the relations between the nodes in each level and also the inter-level relations. For example, consider a two-layer network and an imaginary layer between the two layers whose purpose is to model the inter-level relations between them. Then Eq. 1 is re-written as:

In order to address the limitations of the descriptive analysis of brain neural networks, the authors of (Sinke et al., 2016) used ERGMs to model the observed network using the joint contribution of network structures. They also compared the changes in brain network statistics across different ages. This study was conducted to examine the effects of aging over a lifetime on the brain's global and local structures. Graphs were extracted from brain images obtained from diffusion tensor imaging (DTI). Four network statistics were used to model these networks:
• The geometrically weighted edgewise shared partner (Hunter, 2007)
• The geometrically weighted non-edgewise shared partner (Hunter, 2007)
• The hemispheric node match: a binary indicator which shows whether two nodes are in the same hemisphere of the brain.

The Bayesian learning schema from (Caimo & Friel, 2011) was used to fit the model.

In a recent work, (Dellitalia et al., 2018) employed ERGMs to study the structure of neural networks of the brain.
They aimed to increase the chance of recovery for unconscious and injured patients by analyzing brain functional data. In their work, they overcame four shortcomings of previous methods by incorporating ERGMs into their study. For example, ERGMs gave them the ability to assess the dynamics of the network over time. They used Separable Temporal ERGMs (STERGMs) (Krivitsky & Handcock, 2014) for their modeling. One aspect of their work that ERGMs handled successfully was that the network structures they chose did not need to be independent. This restriction was one of the main drawbacks of previous methods.

Functional Magnetic Resonance Imaging (fMRI) is a method for observing brain activities and their changes over time. There are components in fMRI images which can be explained using network analysis methods. Nodal signals, network architecture, and network function are the three essential network properties in building fMRI-based networks (Solo et al., 2018). ERGMs are one of the most important network analysis methods which have been used to explain such networks. The authors of a review paper (Solo et al., 2018) introduced the most critical efforts aimed at explaining these brain networks. Note that there are plenty of works which used ERGM as their

If we look at this issue from a macro perspective, we can see that many health-related problems can be alleviated by analyzing their corresponding inter-related actors. For example, in epidemiology, there is a direct connection between patient relationships and the extent to which a disease can spread. In most cases, these relations between the actors result in the formation of a network. This network can be analyzed using ERGMs to answer different questions underpinning its formation and dynamics.
This kind of analysis has already been done extensively by researchers in the healthcare community.

Analyzing the inter-hospital patient referral network is a significant problem which (Caimo, Pallotti & Lomi, 2017) recently investigated using ERGMs. They used a combination of the edges and nodes of the network and utilized the Bayesian approach introduced in (Caimo & Friel, 2011) to fit their model. The implementation was done with the BERGM (Caimo & Friel, 2014) R language package.

Another work (Baggio, Luisier & Vladescu, 2017) shed light on the relationship between social isolation and mental health. The connection between these two subjects was investigated by analyzing the network of Romanian adolescents using ERGM modeling. They concluded that there is a strong link between the two concepts.

There are many theories on how these networks form and evolve and how they depend on immigrants and country backgrounds (e.g. ethnicity, wealth, religion). (Windzio, 2018) applied ERGMs in order to examine theories and hypotheses about the creation and evolution of these networks. He used both the graph structure and node attributes in a large number of statistics.

Global tourism and its corresponding network, the Global Tourism Network (GTN), is yet another field of study, given the tremendous financial importance of the tourism market. As mentioned in (Lozano & Gutiérrez, 2018), it is essential to gain insight into the connections between its components. In the same article, an ERGM approach was used to find the critical local substructures of the GTNs.

Handling the budget and resources during crises is always a challenging task for humanitarian organizations. There is a need for a tradeoff between the use of asset supplies for the current crises and for the usual ongoing projects. This problem has been formulated in the form of asset supply networks.
(Stauffer et al., 2018) used ERGMs as an empirical model to understand the asset flows during a crisis.

The applications of ERGMs have even been extended to the analysis of online drug distribution networks. In a recent work, (Duxbury & Haynie, 2018) conducted such research on a dataset from an online drugstore on the dark web. They studied these networks with respect to their topological dynamics, suppliers, and customer demand, as well as their resistance to disruptions.

Does economic partnership between professionals result in further trust and solidarity? This is the central question of (Bianchi, Casnici & Squazzoni, 2018). They developed a multiplex network model from a collaboration network and a number of other attributes and then analyzed it using multivariate ERGMs to examine social support and trust for each of the network statistics.

Political Science

A large number of articles in the political science community have used ERGMs for their modeling. This enthusiasm toward ERGMs among political science scholars suggests that it is among the most popular mathematical modeling approaches in the field. Here we introduce a handful of these articles.

Sustainable development policy is a major concern both for the government and for the private sector. It is only achievable through interaction among individuals. In particular, the connection between funding sectors and those in need of money to carry out their projects plays an important role. This is the central problem of (Gallemore & Jespersen, 2016). The dataset consisted of 91 donor organizations. The role of ERGM in this work was to model the donor-agent relationship networks.

Another major issue that has been addressed through ERGMs is collaborative governance between different sectors and individuals of multiple organizations.
In (Ulibarri & Scott, 2016), the authors used ERGMs to test their hypotheses about what should be observed in low-collaboration vs. high-collaboration networks. Four simple ERGM configurations were used, including:

In a more recent work, (Scott & Thomas, 2017) addressed the same problem. However, they used different datasets and network statistics. (Hamilton & Lubell, 2018) also took the same ERGM modeling approach in discussing collaborative governance, in the special domain of climate change adaptation.

In an interesting work, (Li et al., 2017) investigated the effectiveness of military alliances in making peace between states. They used temporal random graph models for longitudinal network data of alliances. They employed two different sets of network statistics and developed two models upon them.

Communication via online social networks has helped people take a huge step forward. People from multiple backgrounds and societies are engaged in conversations that were never possible before the widespread popularity of online social networks. In the case of political conversations in social networks, there is always the dilemma of whether this freedom has resulted in more communication between people with different ideologies or, conversely, has caused people with the same viewpoints to dominate most of the conversation, thereby reinforcing the same way of thinking. (

The set of all possible graphs with the same number of nodes.
The variable that indicates the presence of a particular graph from the distribution.
The probability distribution function of graphs.
The set of all network statistics presented in the model.
Some particular statistics of the network.
The set of all count functions of the network configurations.
The count function of some particular statistics of the network.
The set of all network statistics coefficients.
Some particular statistics' coefficient of the network.
The normalizing factor, the sum of all configurations.