Exploring the step function distribution of the threshold fraction of adopted neighbors versus minimum fraction of nodes as initial adopters to assess the cascade blocking intra-cluster density of complex real-world networks

We first propose a binary search algorithm to determine the minimum fraction of nodes in a network to be used as initial adopters ($f_{IA}^{\min}$) for a particular threshold fraction (q) of adopted neighbors (related to the cascade capacity of the network) leading to a complete information cascade. We observe the q versus $f_{IA}^{\min}$ distribution for several complex real-world networks to exhibit a step function pattern wherein there is an abrupt increase in $f_{IA}^{\min}$ beyond a certain value of q ($q_{step}$); the $f_{IA}^{\min}$ values at $q_{step}$ and the next measurable value of q are represented as $\underline{f_{IA}^{\min}}$ and $\overline{f_{IA}^{\min}}$ respectively. The difference $\overline{f_{IA}^{\min}} - \underline{f_{IA}^{\min}}$ is observed to be significantly high (a median of 0.44 for a suite of 40 real-world networks studied in this paper), such that we claim the $1 - q_{step}$ value of a network (we propose to refer to $1 - q_{step}$ as the Cascade Blocking Index, CBI) could be perceived as a measure of the intra-cluster density of the blocking cluster of the network, a cluster that cannot be penetrated without including an appreciable number of its nodes in the set of initial adopters (justifying a relatively larger $\overline{f_{IA}^{\min}}$ value).


Introduction
Information cascade in complex networks is a phenomenon by which one or more nodes in a network adopt a decision based on the decisions made by their neighbor nodes. Information cascade in complex networks is typically initiated through a set of nodes called the Initial Adopters (considered to have adopted a particular decision), and the cascade proceeds in a sequence of iterations. We follow a threshold-based complex contagion model (Watts 2002) for sequential information cascade: a node (that has hitherto not adopted a decision) adopts a decision in a particular iteration if the fraction of its neighbors who have adopted the decision is greater than or equal to a threshold fraction (q). The iterations stop if all the nodes have adopted some decision or no new node adopts a decision in the latest iteration. An information cascade is said to be "complete" (Easley and Kleinberg 2010) if all the nodes in the network arrive at a unanimous decision, starting with a set of initial adopters who also adopt the same decision. We focus on complete information cascade (also referred to as global cascade in the literature) in this paper. For a given set of initial adopters, the largest value of the threshold fraction (q) of adopted neighbors that would lead to a complete information cascade is called the cascade capacity of the network (Easley and Kleinberg 2010). The larger the set of initial adopters and the more central its nodes, the larger the cascade capacity of the network. We choose as initial adopters the nodes with larger degree centrality (DEG) (Newman 2010) or betweenness centrality (BWC) (Freeman 1977). These two centrality metrics have been observed (Jalili and Perc 2017) to be effective in speeding up the adoption process for the rest of the nodes in the network.
Clusters are perceived as the major bottleneck for complete information cascade (Easley and Kleinberg 2010). A cluster in a network is a subset of the nodes that have more links among themselves than with the rest of the nodes in the network (Newman 2010). The intra-cluster density of a cluster is a measure of the number of links connecting two nodes within the cluster, and the inter-cluster density of a cluster is a measure of the number of links connecting the nodes in the cluster to nodes outside the cluster. The intra-cluster density of a cluster is computed as the minimum of the intra-cluster densities of the bridge nodes (nodes that have edges with nodes both inside and outside the cluster) of the cluster (Easley and Kleinberg 2010). The intra-cluster density of a bridge node is the ratio of the number of its neighbors inside the cluster to its total number of neighbors.
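A minimal Python sketch of this definition follows. It assumes an undirected graph stored as a dictionary of adjacency sets, and the function name `intra_cluster_density` is hypothetical; returning 1.0 for a cluster without bridge nodes is our own convention, not something the definition above specifies.

```python
from typing import Dict, Set

Graph = Dict[int, Set[int]]  # undirected graph as adjacency sets (illustrative representation)


def intra_cluster_density(adj: Graph, cluster: Set[int]) -> float:
    """Minimum intra-cluster density over the bridge nodes of `cluster`.

    A bridge node has at least one neighbor inside and one outside the cluster;
    its intra-cluster density is (#neighbors inside) / (#neighbors).
    """
    densities = []
    for u in cluster:
        inside = len(adj[u] & cluster)
        outside = len(adj[u]) - inside
        if inside > 0 and outside > 0:  # u is a bridge node
            densities.append(inside / len(adj[u]))
    # Assumption: a cluster without bridge nodes has no outside links; treat it as fully dense.
    return min(densities) if densities else 1.0


# Two triangles {1, 2, 3} and {4, 5, 6} joined by the single edge 3-4.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
print(intra_cluster_density(adj, {1, 2, 3}))  # bridge node 3 has 2 of its 3 neighbors inside
```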
We begin our research in this paper by seeking a solution to the following problem: for a given threshold fraction (q) of adopted neighbors, what is the minimum number of nodes in a network that need to be used as initial adopters ($IA_{\min}$) to obtain a complete information cascade? An efficient binary search approach for this problem is not currently available or used in any of the related works in the literature. With a search space of (0, …, N], where N is the number of nodes in the network, we propose a binary search algorithm to find the actual value of $IA_{\min}$ for a network for a given q. The minimum fraction of nodes as initial adopters ($f_{IA}^{\min}$) corresponding to $IA_{\min}$ is then computed as $IA_{\min}/N$. Using this binary search algorithm, we proceed further and build a distribution of the q versus $f_{IA}^{\min}$ values. To our surprise, we observe the q versus $f_{IA}^{\min}$ distribution to exhibit a step function pattern for 37 of the 40 complex real-world networks analyzed in this research, wherein there is an abrupt increase in $f_{IA}^{\min}$ beyond a certain value of q ($q_{step}$); the $f_{IA}^{\min}$ values at $q_{step}$ and the next measurable value of q are represented as $\underline{f_{IA}^{\min}}$ and $\overline{f_{IA}^{\min}}$ respectively. We developed a second binary search algorithm (which makes use of the algorithm to determine $IA_{\min}$) to determine the actual value of $q_{step}$ for each of the 40 complex networks and each of the two centrality metrics: DEG and BWC. The differences between the $\overline{f_{IA}^{\min}}$ and $\underline{f_{IA}^{\min}}$ values were observed to be significantly high for several complex networks. We claim that the value of $1 - q_{step}$ for a network could be perceived as a measure of the intra-cluster density of the blocking cluster of the network, which could not be penetrated unless we include one or more nodes of this blocking cluster in the set of initial adopters (justified by the sharp increase in the $f_{IA}^{\min}$ values from $\underline{f_{IA}^{\min}}$ to $\overline{f_{IA}^{\min}}$ in the vicinity of $q_{step}$).
A cluster is referred to as a blocking cluster if the fraction of adopted neighbors for the bridge nodes of the cluster is less than the threshold fraction (q) of adopted neighbors, so that information cascade cannot penetrate through the cluster. The sharp increase in the $f_{IA}^{\min}$ values from $\underline{f_{IA}^{\min}}$ to $\overline{f_{IA}^{\min}}$ in the vicinity of $q_{step}$ for a network suggests that there is at least one such blocking cluster (of intra-cluster density measured in the form of $1 - q_{step}$) for which starting with a $\underline{f_{IA}^{\min}}$ fraction of initial adopters is not sufficient, and one or more nodes (typically the bridge nodes and some internal nodes) from this cluster need to be part of the set of initial adopters (corresponding to the $\overline{f_{IA}^{\min}}$ fraction) to accomplish complete information cascade. We propose that $1 - q_{step}$ be called the Cascade Blocking Index (CBI) of a network, a quantitative measure of the difficulty in penetrating through the blocking cluster(s) of the network. The larger the CBI value for a network, the larger the intra-cluster density of the blocking clusters of the network, and vice versa. While a lower CBI value is preferred for positive information to be seamlessly adopted by the nodes in the network, a larger CBI value is preferred for a network to keep epidemics/pandemics from infecting and spreading through the nodes.
As information cascade is observed to be more successful in clustered networks than in non-clustered networks (Ikeda et al. 2010), the problem of inferring the community structure of complex networks solely on the basis of information cascades (i.e., when the underlying network structure is not known) has gained attention in the literature in recent times (Rodriguez et al. 2014; Ramezani et al. 2018; Prokhorenkova et al. 2019). To the best of our knowledge, there has been no work in the literature that quantifies the maximum (or the minimum, or the distribution) of the intra-cluster densities of the clusters without actually determining the clusters. The pioneering book chapter (Chapter 19) in Easley and Kleinberg (2010) relates intra-cluster density with the threshold fraction (q) of adopted neighbors and refers to a cluster as a blocking cluster if its intra-cluster density exceeds 1 − q. Other than this, no work reported in the literature relates intra-cluster density with the threshold fraction of adopted neighbors for information cascade. The CBI value for a network (proposed in this paper) could be considered an upper bound for the intra-cluster density (also a measure of the clusterability of the network) that could be expected of the clusters determined by community detection algorithms.
The proposed methodology has advantages over clustering, because with the latter approach one has to first determine all the clusters of a network and then identify the blocking cluster (the cluster with the largest intra-cluster density) of the network. There are a multitude of clustering algorithms in the literature, each with a different time complexity, and the output (the set of clusters) of one clustering algorithm might differ from that of another for the same graph. In Sect. 5.2, we demonstrate a strong correlation (both rank-wise and prediction-wise) between the CBI values and the largest of the intra-cluster density values (with the clusters determined using the Louvain algorithm) of the real-world networks. We thus claim that our work of quantifying the intra-cluster density of the blocking cluster for information cascade (without running any clustering algorithm) by using the threshold fraction of adopted neighbors and the minimum fraction of nodes as initial adopters is a novel and significant contribution to the literature.
The rest of the paper is organized as follows: Sect. 2 presents an iterative algorithm used in this paper to conduct information cascade in a network for a given set of initial adopters and threshold fraction of adopted neighbors. Section 3 presents the proposed binary search algorithm to determine the $f_{IA}^{\min}$ value for a network for a given threshold fraction (q) of adopted neighbors. Section 4 first presents the procedure used to build the step function distribution for q versus $f_{IA}^{\min}$ as well as a binary search algorithm (which makes use of the algorithm of Sect. 3) to determine the $q_{step}$ value and the corresponding $\underline{f_{IA}^{\min}}$ and $\overline{f_{IA}^{\min}}$ values for a network. Section 4 also explains how the $1 - q_{step}$ value (referred to as the Cascade Blocking Index, CBI) for a network could be related to the intra-cluster density of the blocking cluster of the network, and provides a qualitative analysis of the significance of CBI from the standpoints of information cascade and infection spread. Section 5 first introduces a suite of 40 real-world networks analyzed in this research, presents the ($q_{step}$, CBI, $\underline{f_{IA}^{\min}}$, $\overline{f_{IA}^{\min}}$) values for these networks with respect to the DEG and BWC centrality metrics, and discusses the results from an information cascade standpoint as well as from an infection spread standpoint. Section 5 also runs the well-known Louvain community detection algorithm (Blondel et al. 2008) on the 40 real-world networks, evaluates the intra-cluster densities of the clusters/communities determined for each of these networks, presents a visual comparison of these values with the CBI values observed for these networks, and analyzes the distribution of the initial adopter nodes in the Louvain clusters of the real-world networks and its impact on the $q_{step}$, $\underline{f_{IA}^{\min}}$ and $\overline{f_{IA}^{\min}}$ measures. Finally, Sect. 5 demonstrates the scalability of the algorithms proposed in Sects. 2-4 by measuring and modeling the computation times to determine the $q_{step}$, $\underline{f_{IA}^{\min}}$ and $\overline{f_{IA}^{\min}}$ measures for the real-world networks. Section 6 discusses related work in the literature. Section 7 concludes the paper and outlines plans for future work. Throughout the paper, the terms 'node' and 'vertex', 'edge' and 'link', 'network' and 'graph', 'cluster' and 'community', and 'metric' and 'measure' are used interchangeably.

Iterative algorithm for information cascade
We use an iterative algorithm (refer to Algorithm 1 for the pseudo code) to conduct information cascade in a network for a given set of initial adopters (IA) and threshold fraction (q) of adopted neighbors. The algorithm runs until all the nodes in the network become part of the set IA (i.e., have adopted the unanimous decision, leading to a complete information cascade) or no new node adopts the decision in the latest iteration (tracked through the boolean variable CascadeProgress and the set LatestAdoptedVertices, which are reset to false and φ respectively at the beginning of each iteration). In each iteration, we go through the vertices that have not yet adopted the decision and determine the fraction (f) of their neighbors that have adopted the decision. For any such node u whose f ≥ q (i.e., the fraction of adopted neighbors of the node is greater than or equal to the threshold fraction of adopted neighbors), we add node u to the set LatestAdoptedVertices and set CascadeProgress to true. If CascadeProgress stays false at the end of an iteration, no new node adopted the decision during the iteration, and the algorithm ends prematurely with the information cascade declared to be not complete. If CascadeProgress is true at the end of an iteration, at least one node adopted the decision during the iteration, and all the vertices in the set LatestAdoptedVertices are appended to the set IA. Figure 1 illustrates the working of the algorithm.
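The iteration described above can be sketched in Python as follows, assuming an undirected graph stored as adjacency sets. The function name `iterative_cascade` and the choice to skip isolated nodes (which have no neighbors and so can never reach the threshold) are illustrative assumptions, not the pseudo code of Algorithm 1 itself. New adopters are collected in a separate set and merged only at the end of each pass, so every node in an iteration sees the same set IA.

```python
from typing import Dict, Set, Tuple

Graph = Dict[int, Set[int]]  # undirected graph as adjacency sets


def iterative_cascade(adj: Graph, initial_adopters: Set[int], q: float) -> Tuple[bool, int]:
    """Synchronous threshold cascade; returns (complete, number_of_iterations)."""
    adopted = set(initial_adopters)  # the growing set IA
    iterations = 0
    while len(adopted) < len(adj):
        latest_adopted = set()  # LatestAdoptedVertices, reset every iteration
        for u, neighbors in adj.items():
            if u in adopted or not neighbors:  # isolated nodes can never adopt
                continue
            f = len(neighbors & adopted) / len(neighbors)  # fraction of adopted neighbors
            if f >= q:
                latest_adopted.add(u)
        if not latest_adopted:  # CascadeProgress stayed false: cascade is not complete
            return False, iterations
        adopted |= latest_adopted  # append this iteration's adopters to IA
        iterations += 1
    return True, iterations


path = {1: {2}, 2: {1, 3}, 3: {2}}
print(iterative_cascade(path, {1}, 0.5))  # (True, 2): node 2 adopts, then node 3
```

Each pass touches every adjacency set once, which matches the per-iteration O(|E|) cost used in the complexity argument.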
The algorithm needs to be run for at most |V| − |IA| iterations (a scenario in which just one vertex adopts the decision per iteration), and in each iteration we go through at most |E| edges of the graph to determine the fraction of adopted neighbors for each vertex in the set V − IA. Thus, the overall time complexity of the iterative algorithm is O(|V||E|), simply written as O(VE). As seen in the example of Fig. 1 and in the analysis of the real-world networks in Sect. 5, the number of iterations needed to accomplish complete information cascade need not be as large as |V| − |IA|. In Fig. 1, we accomplish complete information cascade in just 2 iterations (whereas |V| − |IA| = 6).

Fig. 1
Execution of the iterative information cascade algorithm for a given set of initial adopters and the threshold fraction (q) of adopted neighbors

Binary search algorithm to determine the minimum number of initial adopters for a threshold fraction of adopted neighbors
In this section, we present our proposed binary search algorithm (refer to Algorithm 2 for the pseudo code) to determine the minimum number of initial adopters ($IA_{\min}$) needed to operate a network with a threshold fraction (q) of adopted neighbors to accomplish complete information cascade. The minimum fraction of nodes as initial adopters ($f_{IA}^{\min}$) for a network is the minimum number of initial adopters ($IA_{\min}$) determined by the binary search algorithm divided by the number of nodes (N) in the network. The nodes constituting the set of initial adopters are chosen based on a centrality metric (DEG or BWC). As nodes with larger DEG or BWC values were observed to be effective in speeding up the cascade process (Jalili and Perc 2017), in each iteration of the binary search algorithm we pick the top nodes (those with the largest centrality metric values) as initial adopters, as many as the Middle Index value of that iteration (explained below and in the pseudo code of Algorithm 2). Any ties in choosing the nodes based on the centrality metric are broken arbitrarily. We observed this tie-breaking policy not to result in any significant difference in the results ($q_{step}$, $\underline{f_{IA}^{\min}}$ and $\overline{f_{IA}^{\min}}$ values of a network) across multiple runs of the algorithm in Sect. 5.
Binary search (Cormen et al. 2009) is a classical divide-and-conquer algorithm design strategy of logarithmic time complexity: the search space (spanning from a Left Index to a Right Index) is halved in each iteration, and the algorithm stops when the difference between the Right Index and the Left Index falls to or below a termination threshold. The search space for $IA_{\min}$ is the range (0, …, N], where N = |V| is the number of nodes in the network. We maintain the following invariant throughout the execution of the algorithm for a given q: the minimum number of initial adopters needed to accomplish complete information cascade lies in the range (Left Index … Right Index]. That is, the information cascade will not be complete if the number of nodes used as initial adopters corresponds to the Left Index (initially set to 0), and the information cascade will be complete if the number of nodes used as initial adopters corresponds to the Right Index (initially set to N).
At the beginning of each iteration of the binary search algorithm, we determine the Middle Index as the (integer) average of the Left Index and the Right Index. We then build a set IA of initial adopters consisting of the top-ranked nodes with respect to a particular centrality metric C, such that |IA| equals the value of the Middle Index. We then conduct information cascade on the network (by running the Iterative Information Cascade algorithm of Sect. 2) with the set IA of initial adopters and the threshold fraction (q) of adopted neighbors. If the cascade for the set IA corresponding to the Middle Index and q is complete, it implies that the cascade will also be complete for |IA| values (i.e., numbers of initial adopters) greater than the Middle Index. Hence, in such a case, we move the Right Index to the left (as part of the binary search strategy of halving the search space in each iteration) and set the Right Index = Middle Index. If the cascade for the set IA corresponding to the Middle Index and q is not complete, it implies that the cascade will also not be complete for |IA| values lower than the Middle Index. Hence, in such a case, we move the Left Index to the right and set the Left Index = Middle Index. We stop the iterations when the difference between the Right Index and the Left Index equals 1, and we consider the latest value of the Right Index as the minimum number of initial adopters ($IA_{\min}$) needed to operate the network with a threshold fraction (q) of adopted neighbors leading to a complete information cascade. We return $f_{IA}^{\min} = IA_{\min}/N$ as the minimum fraction of nodes needed as initial adopters to operate the network with threshold q.
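A compact Python sketch of Minimum_Initial_Adopters follows, under the same adjacency-set assumption. The helper names `cascade_complete` and `rank_by_degree` are hypothetical, DEG ties are broken by node id instead of arbitrarily, and any other ranking (for example, by BWC) can be passed through the `ranked` parameter. Like the algorithm above, it relies on cascade completeness being monotone in the number of top-ranked adopters, and it never evaluates the Right Index value N, where every node is already an adopter.

```python
from typing import Dict, List, Optional, Set, Tuple

Graph = Dict[int, Set[int]]  # undirected graph as adjacency sets


def cascade_complete(adj: Graph, seeds: Set[int], q: float) -> bool:
    """Iterative (synchronous) threshold cascade; True if every node ends up adopting."""
    adopted = set(seeds)
    while len(adopted) < len(adj):
        latest = {u for u, nb in adj.items()
                  if u not in adopted and nb and len(nb & adopted) / len(nb) >= q}
        if not latest:
            return False
        adopted |= latest
    return True


def rank_by_degree(adj: Graph) -> List[int]:
    """DEG ranking, largest first; ties broken by node id (the paper breaks ties arbitrarily)."""
    return sorted(adj, key=lambda u: (-len(adj[u]), u))


def minimum_initial_adopters(adj: Graph, q: float,
                             ranked: Optional[List[int]] = None) -> Tuple[int, float]:
    """Binary search for IA_min over (0, N]; returns (IA_min, f_IA_min = IA_min / N)."""
    ranked = rank_by_degree(adj) if ranked is None else ranked
    n = len(adj)
    left, right = 0, n  # invariant: IA_min lies in (left, right]
    while right - left > 1:
        middle = (left + right) // 2
        if cascade_complete(adj, set(ranked[:middle]), q):
            right = middle  # complete with `middle` adopters: IA_min <= middle
        else:
            left = middle   # not complete: IA_min > middle
    return right, right / n


path = {1: {2}, 2: {1, 3}, 3: {2}}
print(minimum_initial_adopters(path, 0.6))                    # DEG ranking picks node 2 first
print(minimum_initial_adopters(path, 0.6, ranked=[1, 3, 2]))  # a poorer ranking needs more adopters
```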
The number of iterations of the proposed binary search algorithm is about $\log_2 |V|$, where |V| is the number of nodes in the network and the size of the search space at the beginning of the first iteration. The overall time complexity of the algorithm depends on the number of iterations and on the time complexity of the Iterative Information Cascade algorithm run in each iteration, which is $O(|V| \cdot |E|)$. The overall time complexity of the proposed Minimum_Initial_Adopters binary search algorithm is then $O(|V| \cdot |E| \cdot \log_2 |V|)$, simply denoted as O(EV log V). Figure 2 presents an example to illustrate the working of the Minimum_Initial_Adopters binary search algorithm. The example graph has 8 vertices, and we seek the minimum number/minimum fraction of initial adopters needed to operate the network at a threshold fraction q = 2/3 of adopted neighbors for a complete information cascade. Assume the centrality metric used is the degree centrality (DEG) metric, which is the number of neighbors of a node. The initial values of the Left Index (LI) and Right Index (RI) are 0 and 8 respectively. In the first iteration, the Middle Index (MI) value = (0 + 8)/2 = 4. The top four vertices with the largest DEG centrality are 2, 6, 4 and 5. With these four vertices as initial adopters, the fraction of adopted neighbors for each of the other four vertices 1, 3, 7 and 8 is 2/3, which is equal to q. Hence, all four vertices 1, 3, 7 and 8 adopt the decision of their neighbors, leading to a complete information cascade, and we move the Right Index to the left and set the Right Index = Middle Index = 4.
In the second iteration, the value of the Middle Index is (0 + 4)/2 = 2. The top two vertices with the largest DEG centrality are 2 and 6. With these two initial adopters, the fraction of adopted neighbors is 2/4 = 1/2 for each of vertices 4 and 5, and 1/3 for each of the other four vertices 1, 3, 7 and 8. None of these fractions of adopted neighbors is greater than or equal to q = 2/3. Hence, the information cascade is not complete, and we move the Left Index to the right by setting Left Index = Middle Index = 2. In the third iteration, the value of the Middle Index is (2 + 4)/2 = 3. The top three vertices with the largest DEG centrality are 2, 6 and 4 (we could pick either 4 or 5; we break the tie arbitrarily in favor of 4). With these three initial adopters, vertices 1 and 3 will have 2/3 of their neighbors adopted, which is equal to q, and hence they get added to the set of adopters. However, the fractions of adopted neighbors for vertices 5, 7 and 8 are not affected by this and remain less than 2/3, as shown in Fig. 2. Hence, the information cascade has to be declared not complete, and we set Left Index = Middle Index = 3. At the end of the third iteration, the difference between the Right Index and the Left Index has reached 1 and we exit the algorithm. The minimum number of initial adopters needed to operate the network at a threshold fraction q = 2/3 of adopted neighbors is 4 (the latest value of the Right Index), and the corresponding minimum fraction of nodes as initial adopters is 4/8 = 0.5.
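The walk-through above can be replayed with the Python sketch below. Since the graph of Fig. 2 is not reproduced in this text, the edge list is a hypothetical 8-vertex graph that we constructed to match every fraction quoted above (vertices 2 and 6 of degree 5, vertices 4 and 5 of degree 4, the rest of degree 3); it may differ from the actual graph in Fig. 2. The function names are also hypothetical. Ties are broken by node id, which picks vertex 4 over vertex 5 as in the text.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[int, Set[int]]

# Hypothetical graph consistent with the fractions quoted for Fig. 2 (not necessarily identical to it).
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (2, 5), (2, 6), (3, 4),
         (4, 6), (5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8)]


def build_graph(edges: List[Tuple[int, int]]) -> Graph:
    adj: Graph = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def cascade_complete(adj: Graph, seeds: Set[int], q: float) -> bool:
    adopted = set(seeds)
    while len(adopted) < len(adj):
        latest = {u for u, nb in adj.items()
                  if u not in adopted and nb and len(nb & adopted) / len(nb) >= q}
        if not latest:
            return False
        adopted |= latest
    return True


def traced_search(adj: Graph, q: float) -> Tuple[int, List[Tuple[int, int, int, bool]]]:
    """Minimum_Initial_Adopters with DEG ranking, recording (LI, RI, MI, complete) per iteration."""
    ranked = sorted(adj, key=lambda u: (-len(adj[u]), u))  # 2, 6, 4, 5, 1, 3, 7, 8
    left, right, trace = 0, len(adj), []
    while right - left > 1:
        middle = (left + right) // 2
        complete = cascade_complete(adj, set(ranked[:middle]), q)
        trace.append((left, right, middle, complete))
        if complete:
            right = middle
        else:
            left = middle
    return right, trace


adj = build_graph(EDGES)
ia_min, trace = traced_search(adj, q=2 / 3)
for row in trace:
    print(row)
print(ia_min, ia_min / len(adj))  # 4 0.5
```

The recorded rows follow the (LI, RI, MI) sequence (0, 8, 4), (0, 4, 2), (2, 4, 3) described in the walk-through.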

Analysis of the relationship between the threshold fraction of adopted neighbors versus the minimum fraction of nodes as initial adopters and the intra-cluster density of blocking cluster
In this section, we first empirically analyze the relationship between the threshold fraction (q) of adopted neighbors for a complete information cascade and the minimum fraction of nodes to be used as initial adopters ($f_{IA}^{\min}$). We then analyze the relationship between the intra-cluster density of the blocking cluster of a network and the above two parameters (q and $f_{IA}^{\min}$).

Analysis of the relationship: q versus $f_{IA}^{\min}$
For each of the 40 complex real-world networks analyzed in this paper and each centrality metric (DEG or BWC), we ran the binary search algorithm of Sect. 3 for q values ranging from 0.05 to 0.95 in increments of 0.05 and recorded the $f_{IA}^{\min}$ values. We plotted the q versus $f_{IA}^{\min}$ values in a two-dimensional coordinate system and observed the distribution to exhibit a step function pattern for 37 of the 40 complex real-world networks considered, for both centrality metrics. As we increase the q value, one or more spikes appear in the $f_{IA}^{\min}$ values. We refer to the spike that has the "largest" increase in the $f_{IA}^{\min}$ value as the jump zone (see Fig. 4), which is of width 0.05 (spanned by the q values $q_L$ and $q_R$); its top and bottom along the $f_{IA}^{\min}$ axis are the $f_{IA}^{\min}$ values when $q = q_R$ and $q = q_L$ respectively.

Fig. 3
Step function patterns observed for the q versus $f_{IA}^{\min}$ distribution for real-world networks with respect to degree centrality

Figure 3 shows the typical step function patterns that we noticed for the real-world networks analyzed in this research: (a) there is only one jump zone, and the $f_{IA}^{\min}$ values appear not to change much before and after the jump zone; (b) there are two or more spikes, and the first spike in the $f_{IA}^{\min}$ value corresponds to the jump zone; (c) there is only one spike, but the $f_{IA}^{\min}$ value gradually increases (with increase in q) before and/or after the jump zone; (d) there are two or more spikes, and the second spike in the $f_{IA}^{\min}$ value corresponds to the jump zone. The Copperfield network exemplifies that the first spike in the $f_{IA}^{\min}$ value (unlike in the Karate network) need not always correspond to the jump zone.
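Locating the jump zone on the 0.05-spaced sweep amounts to finding the consecutive pair of q values with the largest increase in $f_{IA}^{\min}$. A minimal Python sketch follows; the function name `find_jump_zone` is hypothetical, and the $f_{IA}^{\min}$ values are made up for illustration (they mimic a pattern in which the largest spike is not the first one), not measurements from this paper.

```python
from typing import List, Tuple


def find_jump_zone(qs: List[float], f_min: List[float]) -> Tuple[float, float]:
    """Return (q_L, q_R): consecutive sampled q values with the largest increase in f_IA^min."""
    if len(qs) != len(f_min) or len(qs) < 2:
        raise ValueError("need at least two aligned (q, f_min) samples")
    # Index of the largest one-step increase; max() keeps the first index on ties.
    i = max(range(len(qs) - 1), key=lambda k: f_min[k + 1] - f_min[k])
    return qs[i], qs[i + 1]


qs = [round(0.05 * k, 2) for k in range(1, 20)]  # 0.05, 0.10, ..., 0.95
# Made-up f_IA^min values (not results from the paper): a first spike between 0.15 and 0.20,
# smaller increases elsewhere, and the largest spike between 0.50 and 0.55.
f_min = [0.1, 0.1, 0.1, 0.2, 0.2, 0.2, 0.2, 0.25, 0.25, 0.25,
         0.7, 0.7, 0.75, 0.75, 0.75, 0.8, 0.8, 0.8, 0.8]
print(find_jump_zone(qs, f_min))
```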
Our next step in the analysis is to zoom into the jump zone and determine the exact value of q (referred to as $q_{step}$) and the corresponding $f_{IA}^{\min}$ values (referred to as $\underline{f_{IA}^{\min}}$ and $\overline{f_{IA}^{\min}}$, the values at $q_{step}$ and at the next measurable value of q), which lie within the jump zone along the $f_{IA}^{\min}$ axis, with $q_L \le q_{step} < q_R$ along the q axis. Figure 4 presents a visualization of the variables $q_{step}$, $\underline{f_{IA}^{\min}}$ and $\overline{f_{IA}^{\min}}$ for a jump zone of width $q_R - q_L$ and height equal to the difference between the $f_{IA}^{\min}$ values at $q_R$ and $q_L$. The absolute value of the difference between $q_{step}$ and the next measurable value of q is less than or equal to ε (ε = 0.001 in this paper), a parameter referred to as the termination threshold of the binary search algorithm that we now describe to determine the exact value of $q_{step}$.
The binary search algorithm (referred to as Jump Zone Analyzer; pseudo code in Algorithm 3) to determine the exact value of $q_{step}$ has a search space of [$q_L$, …, $q_R$). Appropriately, the algorithm starts with the Left Index set to $q_L$ and the Right Index set to $q_R$, whose corresponding $f_{IA}^{\min}$ … This will be the case for any q value ranging from the Middle Index to the Right Index. Hence, we move the Right Index to the left and set the Right Index = Middle Index. We continue the iterations as long as the absolute difference between the Right Index and the Left Index stays greater than or equal to a termination threshold (ε). We stop the algorithm when the absolute difference between the Right Index and the Left Index becomes less than ε, and declare the latest values of the Left Index, $f_{IA}^{\min,\mathrm{LeftIndex}}$ and $f_{IA}^{\min,\mathrm{RightIndex}}$ as $q_{step}$, $\underline{f_{IA}^{\min}}$ and $\overline{f_{IA}^{\min}}$ respectively. With a search space of width 0.05 and a termination threshold ε = 0.001, the number of iterations of the Jump Zone Analyzer algorithm is $\lceil \log_2(0.05/0.001) \rceil = 6$. As we run the binary search algorithm Minimum_Initial_Adopters of Sect. 3 in each of the six iterations, the overall time complexity of the Jump Zone Analyzer is the same as that of Minimum_Initial_Adopters, up to a constant factor. Table 1 illustrates the execution of the Jump Zone Analyzer for the jump zone of the Karate Club network of Fig. 3b, whose $q_L$ and $q_R$ values are 0.30 (initial Left Index) and 0.35 (initial Right Index) respectively, with the termination threshold ε = 0.001. For ease of accommodating multiple columns in Table 1, we refer to the Left Index, Right Index and Middle Index using the acronyms LI, RI and MI respectively. Based on the initial LI ($q_L$) and RI ($q_R$) values, the initial values for $f_{IA}^{\min,LI}$ … Table 1 (note that there is no rounding of numbers in the executions conducted on real-world networks in Sect. 5).
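The description above does not spell out how the $f_{IA}^{\min}$ value at the Middle Index is compared against the two ends, so the following Python sketch fills that step with an explicit assumption: $f_{IA}^{\min}$ is treated as non-decreasing in q across the jump zone, and the Middle Index is considered to lie before the jump if its $f_{IA}^{\min}$ value is closer to that of the Left Index than to that of the Right Index. The callable `f_min` stands in for a call to Minimum_Initial_Adopters, and `toy_f_min` is a hypothetical step function, not data from Table 1.

```python
from typing import Callable, Tuple


def jump_zone_analyzer(f_min: Callable[[float], float], q_left: float, q_right: float,
                       eps: float = 0.001) -> Tuple[float, float, float, int]:
    """Binary search for q_step inside a jump zone [q_left, q_right).

    f_min(q) stands in for running Minimum_Initial_Adopters at threshold q.
    Returns (q_step, f_lower, f_upper, iterations).
    """
    f_left, f_right = f_min(q_left), f_min(q_right)
    iterations = 0
    while q_right - q_left >= eps:
        q_mid = (q_left + q_right) / 2
        f_mid = f_min(q_mid)
        # Assumed update rule: the middle index is still before the jump if its value
        # is closer to the left end's value than to the right end's value.
        if f_mid - f_left < f_right - f_mid:
            q_left, f_left = q_mid, f_mid
        else:
            q_right, f_right = q_mid, f_mid
        iterations += 1
    return q_left, f_left, f_right, iterations


def toy_f_min(q: float) -> float:
    """Hypothetical step function standing in for a real network's f_IA^min(q)."""
    return 0.12 if q <= 0.3237 else 0.59


print(jump_zone_analyzer(toy_f_min, 0.30, 0.35))
```

With a zone of width 0.05 and ε = 0.001, the loop runs six times, in line with the iteration count stated above.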
As noticed in Table 1, there are a total of 6 complete iterations, and the algorithm stops at the beginning of the 7th iteration (with the difference between RI and LI becoming less than 0.001). The algorithm outputs the latest values of LI, $f_{IA}^{\min,LI}$ …

Clusters are expected to have a larger intra-cluster density and a lower inter-cluster density; such clusters are referred to as modular clusters (Easley and Kleinberg 2010; Newman 2010). Modular clusters have the potential to block information cascade from penetrating to the nodes inside the cluster. For information cascade to penetrate into a cluster and result in a complete cascade, the bridge nodes of the cluster need to first adopt the unanimous decision: the internal nodes of the cluster can adopt a decision only if at least the threshold fraction of their neighbor nodes (primarily the bridge nodes) have adopted the decision. For a bridge node to adopt a decision, its fraction of adopted neighbors (which are outside the cluster) should be greater than or equal to the threshold fraction (q) of adopted neighbors for the cascade. For a cluster with a high intra-cluster density, the bridge nodes of the cluster are also expected to have a larger intra-cluster density, i.e., the fraction of adopted neighbors (outside the cluster) of the bridge nodes is expected to be lower. Hence, for information cascade to penetrate into such a cluster with high intra-cluster density, the threshold fraction (q) of adopted neighbors must be lower. Conversely, the larger the threshold fraction (q) of adopted neighbors with which we can accomplish complete information cascade in a network, the lower the intra-cluster density of the clusters in the network. For a cluster of intra-cluster density ρ, the fraction of adopted neighbors (outside the cluster) possible for a bridge node is at most 1 − ρ.
For a bridge node to adopt the unanimous decision needed for complete information cascade, we need 1 − ρ ≥ q, i.e., the intra-cluster density ρ ≤ 1 − q. If 1 − ρ < q (i.e., ρ > 1 − q), the cluster will not be penetrable (i.e., its bridge nodes cannot be made to adopt the decision from outside) and will become a blocking cluster. Hence, to be able to penetrate such blocking clusters of high intra-cluster density, it becomes imperative to include one or more bridge nodes of these blocking clusters in the set of initial adopters. In Sect. 4.1, we noticed that the minimum fraction of nodes as initial adopters needed for complete information cascade increased in a step function pattern from $\underline{f_{IA}^{\min}}$ to $\overline{f_{IA}^{\min}}$, and $q_{step}$ (corresponding to $\underline{f_{IA}^{\min}}$) is the last measurable value of q beyond which we needed to increase the minimum fraction of nodes as initial adopters from $\underline{f_{IA}^{\min}}$ to $\overline{f_{IA}^{\min}}$ to accomplish complete information cascade. That is, when the intra-cluster densities (ρ values) of the clusters in the network were all less than or equal to $1 - q_{step}$, we were able to accomplish complete information cascade with $f_{IA}^{\min} \le \underline{f_{IA}^{\min}}$. Any further increase in q (beyond $q_{step}$) would make the intra-cluster density of at least one cluster greater than 1 − q; we would not be able to penetrate such a cluster with a $\underline{f_{IA}^{\min}}$ fraction of initial adopters and would need to increase the minimum fraction of initial adopters to $\overline{f_{IA}^{\min}}$, with $\overline{f_{IA}^{\min}}$ appreciably greater than $\underline{f_{IA}^{\min}}$ for most of the networks. Hence, in order to capture the intra-cluster density of such blocking clusters, we refer to the jump zone (see Figs. 3 and 4) as the bounded area in the q versus $f_{IA}^{\min}$ distribution that encounters the largest increase in the $f_{IA}^{\min}$ values. The value of $1 - q_{step}$ for a network could thus be used to assess the intra-cluster density of the blocking cluster(s) of the network.
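The penetrability condition derived above can be summarized as follows, writing $a_b$, $i_b$ and $d_b$ for the number of adopted neighbors, neighbors inside the cluster C, and total neighbors of a bridge node b; ρ is the minimum of $i_b/d_b$ over the bridge nodes of C. The numbers in the last line are illustrative only.

```latex
\begin{aligned}
&\text{Bridge node } b \text{ of cluster } C \text{ adopts iff } \frac{a_b}{d_b} \ge q;\\
&\text{before } C \text{ is entered, } a_b \le d_b - i_b \text{ and } \frac{i_b}{d_b} \ge \rho
  \;\Rightarrow\; \frac{a_b}{d_b} \le 1 - \frac{i_b}{d_b} \le 1 - \rho;\\
&\text{so some bridge node can adopt only if } 1 - \rho \ge q \iff \rho \le 1 - q,
  \text{ and } C \text{ is a blocking cluster if } \rho > 1 - q;\\
&\text{illustration: } \rho = 0.7,\; q = 0.35 \;\Rightarrow\; 1 - \rho = 0.3 < 0.35
  \;\Rightarrow\; C \text{ blocks the cascade.}
\end{aligned}
```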
We propose that the $1 - q_{step}$ value for a network be called the Cascade Blocking Index (CBI) of the network (whose values range from 0 to 1): a quantitative measure of the intra-cluster density of the blocking cluster(s) of the network. The larger the CBI value of a network, the larger the intra-cluster density of its blocking cluster(s), and vice versa.
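The penetrability condition ρ ≤ 1 − q and the CBI definition translate into two small helpers. This is a hypothetical sketch: the function names and the per-cluster density values are illustrative assumptions, where each cluster's density is taken to be the minimum intra-cluster density over its bridge nodes.

```python
from typing import Dict, List


def cascade_blocking_index(q_step: float) -> float:
    """CBI = 1 - q_step, a value in [0, 1]."""
    if not 0.0 <= q_step <= 1.0:
        raise ValueError("q_step must lie in [0, 1]")
    return 1.0 - q_step


def blocking_clusters(density: Dict[str, float], q: float) -> List[str]:
    """Clusters whose bridge nodes cannot be made to adopt at threshold q.

    A bridge node of a cluster with intra-cluster density rho has at most a
    fraction 1 - rho of its neighbors outside the cluster, so the cluster is
    penetrable only if 1 - rho >= q, i.e. rho <= 1 - q.
    """
    return sorted(c for c, rho in density.items() if rho > 1.0 - q)


densities = {"A": 0.50, "B": 0.60, "C": 0.80}  # hypothetical cluster densities
print(blocking_clusters(densities, q=0.30))    # ['C']       (1 - q = 0.70)
print(blocking_clusters(densities, q=0.45))    # ['B', 'C']  (1 - q = 0.55)
print(round(cascade_blocking_index(0.35), 2))  # 0.65
```

Raising q shrinks 1 − q, so more clusters become blocking; this is the mechanism behind the step in $f_{IA}^{\min}$ once q passes $q_{step}$.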

Cascade Blocking Index: information cascade versus infection spread
It is important to note that the CBI metric can be used to decide on suitable values for the threshold fraction of adopted neighbors for accomplishing a complete information cascade. For networks with a lower CBI value, one could impose a larger value for the threshold fraction of adopted neighbors needed for a node to adopt a decision and still accomplish a complete information cascade (the bridge nodes of the clusters will have a larger fraction of neighbors outside the clusters, and if these neighbors adopt a decision, the bridge nodes will also be in a position to adopt it). For such networks, it is thus possible to have the nodes consensually adopt the unanimous decision (needed for a complete information cascade) only after a majority of their neighbors have adopted the same decision (i.e., the nodes need not rush to adopt the decision when only a minority of their neighbors have adopted it). On the other hand, for networks with a larger CBI value, one would have to operate at relatively lower values for the threshold fraction of adopted neighbors to accomplish a complete information cascade. That is, for networks with a larger CBI value, nodes (especially the bridge nodes) may be forced to adopt the unanimous decision needed for a complete information cascade even if a majority of their neighbors have not yet decided.
From an epidemic standpoint, we do not want a virus/disease to be able to penetrate the clusters of a network and spread. In this context, if the bridge nodes are, say, vaccinated (to be immune to the disease; i.e., operating at q values close to 1), then unless all of their neighbors become infected, the bridge nodes are not vulnerable to infection and the virus/disease cannot penetrate the clusters. For clusters with lower intra-cluster density (i.e., lower CBI values), the bridge nodes are expected to have a larger fraction of neighbors outside the cluster. If the bridge nodes are not immune to the disease (i.e., the network operates at a lower q value) and the network has a lower CBI value, the bridge nodes can get infected even if only a small fraction of their outside neighbors are infected, and the disease can easily penetrate the clusters. Thus, if the network has a lower CBI value, it is essential to immunize the bridge nodes of the clusters. On the other hand, if a network has a larger CBI value (i.e., a lower fraction of the neighbors of the bridge nodes are outside the clusters) and the bridge nodes are immune to the disease (i.e., the network operates at a higher q value, close to 1), it will be difficult for the infection to penetrate a cluster from outside through these bridge nodes.

Quantitative analysis of the real-world networks
We analyzed a suite of 40 real-world networks from diverse domains. Table 2 lists the 40 real-world networks along with their numbers of nodes and edges, references, domains and the three-character codes used to refer to them later in the paper. The networks span several domains (the numbers in parentheses indicate the number of networks analyzed from each domain): coappearance networks (7), biological networks (4), collaboration networks (3), literature networks (2), employee networks (3), transportation networks (2), game network (1), geographical network (1), citation network (1), social networks (13) and web networks (3). All the networks are treated as undirected graphs and are connected (i.e., every vertex is reachable from every other vertex, and the graph exists as a single component). Table 3 (obtained by running the Jump Analyzer binary search algorithm of Sect. 4) presents the CBI = $1 - q_{step}$, $\underline{f_{IA}^{\min}}$ and $\overline{f_{IA}^{\min}}$ values for the 40 real-world networks with respect to the degree (DEG) and betweenness (BWC) centrality metrics (used to choose the initial adopters). Figure 5 plots the difference $\overline{f_{IA}^{\min}} - \underline{f_{IA}^{\min}}$ in the $f_{IA}^{\min}$ values in the vicinity of the $q_{step}$ values in the jump zones observed for the 40 real-world networks. We observe an appreciable difference (of 0.14 or more) in the $\overline{f_{IA}^{\min}} - \underline{f_{IA}^{\min}}$ values (with respect to both DEG and BWC) for 37 of the 40 real-world networks. The median of the $\overline{f_{IA}^{\min}} - \underline{f_{IA}^{\min}}$ values is 0.44 for the DEG-based analysis and 0.32 for the BWC-based analysis. We observe the difference to be as large as 0.70, justifying our association of the increase in the $f_{IA}^{\min}$ values in the vicinity of $q_{step}$ with the cascade capacity of the blocking cluster(s) of the networks. […] networks) and the CBI(BWC) values to be 0.50 or more for 24 of the 40 networks (i.e., for 60% of the networks).
Thus, for about two-thirds of the real-world networks, we need to operate at lower values for the threshold fraction of adopted neighbors (q = 1 − CBI ≤ 0.50) and the corresponding fraction $\underline{f_{IA}^{\min}}$ of initial adopters to accomplish a complete information cascade. When we apply the qualitative discussion of Sect. 4.3 (CBI: Information Cascade vs. Infection Spread) to the results observed in this section, we can say that for about two-thirds of the 40 real-world networks analyzed in this paper (those with a larger CBI value), a bridge node is likely to have to adopt the unanimous decision (needed for a complete information cascade) when fewer than half of its neighbors have adopted the decision. On the other hand, from an epidemic standpoint, these networks (with a larger CBI value) are not vulnerable to infection spread if the bridge nodes of the clusters are vaccinated against the related virus/disease.

Analysis of the CBI values
With regard to the difference in the CBI values with respect to the DEG and BWC metrics, we observe no difference between the CBI(DEG) and CBI(BWC) values for 11 of the 40 real-world networks (i.e., for 28% of the networks) and a difference of more than 0.10 for only 10 of the 40 real-world networks (i.e., for only 25% of the networks). The median difference in the CBI values is 0.049. On the basis of the raw values, we observe the CBI(DEG) values to be numerically larger than the CBI(BWC) values for 20 of the 40 real-world networks (i.e., for 50% of the networks). We can thus conclude that the choice of the centrality metric has at most a moderate impact on the CBI value of a network. The relatively lower CBI(BWC) values for certain networks could be due to the larger BWC values of the bridge nodes of the clusters and their preferential inclusion in the set of initial adopters by the Minimum Initial Adopters algorithm. As a result, for networks with relatively lower CBI(BWC) values, there is a relatively better chance of accomplishing a complete information cascade with larger q values (lowering the $1 - q_{step}$ values) when nodes with larger BWC values are considered for inclusion in the set of initial adopters.

CBI values versus intra-cluster densities of the clusters
We ran the Louvain community detection algorithm (Blondel et al. 2008) on the 40 real-world networks and determined the intra-cluster densities of the clusters it identified in these networks. The composition of the clusters/communities of a network could vary with the community detection algorithm used to determine them. We chose the Louvain algorithm as it is well known, computationally inexpensive, and commonly used in the literature and in software packages [like Gephi (Gephi 2020)] to determine modular clusters. The Louvain algorithm is a hierarchical community detection algorithm that, starting with each node in its own community, recursively merges communities so as to greedily maximize the modularity of the partition (the sum of the modularity contributions of the communities). A cluster/community contributes a larger modularity score if its intra-cluster density is significantly larger than its inter-cluster density. Recall that the intra-cluster density of a cluster is computed as the minimum of the intra-cluster densities of the bridge nodes of the cluster, and the intra-cluster density of a node (including a bridge node) in a cluster is the fraction of its neighbor nodes that are within the same cluster. Figure 7 plots the distribution of the intra-cluster densities of the clusters (smaller yellow circles) in the 40 real-world networks along with the CBI(DEG) and CBI(BWC) values (larger red and green circles, respectively). If the CBI(DEG) and CBI(BWC) values for a network are the same, the red and green circles are placed on top of each other.
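The two density definitions recalled above (node-level and cluster-level) can be computed directly from an adjacency structure and a cluster assignment. In this minimal sketch the partition is supplied by hand; in practice it would come from a community detection routine such as Louvain (for example, `networkx.community.louvain_communities` in recent NetworkX releases). The helper names and the convention of assigning 1.0 to a cluster with no bridge node are assumptions for illustration.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]  # undirected graph as adjacency sets


def build_graph(edges: Iterable[Tuple[Hashable, Hashable]]) -> Graph:
    g: Graph = {}
    for a, b in edges:
        g.setdefault(a, set()).add(b)
        g.setdefault(b, set()).add(a)
    return g


def node_intra_density(graph: Graph, cluster_of: Dict[Hashable, int], v: Hashable) -> float:
    """Fraction of v's neighbors that lie in v's own cluster."""
    nbrs = graph[v]
    if not nbrs:
        return 1.0
    same = sum(1 for u in nbrs if cluster_of[u] == cluster_of[v])
    return same / len(nbrs)


def cluster_intra_density(graph: Graph, cluster_of: Dict[Hashable, int]) -> Dict[int, float]:
    """Per-cluster density: minimum node density over the cluster's bridge nodes.

    A bridge node has at least one neighbor in another cluster. A cluster with
    no bridge node (possible only in a disconnected graph) is assigned 1.0.
    """
    result: Dict[int, float] = {c: 1.0 for c in set(cluster_of.values())}
    for v, nbrs in graph.items():
        if any(cluster_of[u] != cluster_of[v] for u in nbrs):  # v is a bridge node
            c = cluster_of[v]
            result[c] = min(result[c], node_intra_density(graph, cluster_of, v))
    return result


# Two triangles joined by the bridge edge (2, 3).
g = build_graph([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])
clusters = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(cluster_intra_density(g, clusters))  # each cluster: 2/3
```

Taking the minimum over bridge nodes means a single weakly embedded bridge node determines how easily the whole cluster can be entered from outside.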
With regard to DEG versus BWC, we observe CBI(DEG) to be a better choice for quantifying the intra-cluster densities of the blocking clusters of a network: the CBI(DEG) values appear as an upper bound for the intra-cluster densities for about 32 of the 40 networks (i.e., for 80% of the networks), whereas the CBI(BWC) values appear as an upper bound for about 27 of the 40 networks (i.e., for slightly more than two-thirds of the networks). Either way, for at least two-thirds of the real-world networks, the largest of the intra-cluster densities of the clusters (these are the blocking clusters that decide the success or failure of an information cascade for a given threshold fraction of adopted neighbors) is less than or equal to the CBI values (with respect to both DEG and BWC) reported in our research. Figure 8 plots the largest of the intra-cluster densities of the Louvain clusters (which could be considered the intra-cluster density of the blocking cluster) of the real-world networks versus the CBI(DEG) and CBI(BWC) values. Table 4 presents the correlation coefficient values observed between these measures: the upper diagonal presents Pearson's (P) linear correlation coefficient values and the lower diagonal presents Spearman's (S) rank-based correlation coefficient values. We observe a strong positive correlation (correlation coefficient values > 0.7) between the CBI(DEG) values of the 40 real-world networks and the largest of the intra-cluster densities of the Louvain clusters in these networks. A strong positive correlation also implies that we could predict the intra-cluster density of the blocking cluster of a network from its CBI value, and that the ranking of the networks based on their CBI values would be almost the same as the ranking based on the intra-cluster densities of their blocking clusters.
Fig. 7 Comparison of the intra-cluster densities of the clusters (determined using the Louvain algorithm) of the real-world networks with the CBI(DEG) and CBI(BWC) values
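The two coefficients reported in Table 4 can be computed as follows. This is a self-contained sketch in plain Python (in practice `scipy.stats.pearsonr` and `scipy.stats.spearmanr` compute the same quantities); Spearman's coefficient is obtained as Pearson's coefficient on the ranks, with tied values given their average rank. The sample values are hypothetical, not taken from Table 4.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson's coefficient on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical CBI values and largest Louvain intra-cluster densities.
cbi = [0.40, 0.55, 0.60, 0.75, 0.90]
largest_density = [0.35, 0.50, 0.62, 0.70, 0.88]
print(round(pearson(cbi, largest_density), 3), spearman(cbi, largest_density))
```

A Spearman value of 1.0 here reflects identical rankings, which is the property the text relies on when it argues that ranking networks by CBI approximates ranking them by blocking-cluster density.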
We observe a moderately positive correlation (correlation coefficient values in the range 0.5 to 0.7) between the CBI(BWC) values of the 40 real-world networks and the largest of the intra-cluster densities of the Louvain clusters in these networks. We also observe a moderately positive correlation between the CBI(DEG) and CBI(BWC) values of the 40 real-world networks, supporting our observation in Sect. 5.1 that the choice of the centrality metric has at most a moderate impact on the CBI values of the networks. Putting together the observations in Sects. 5.1 and 5.2, we conclude that CBI(DEG) is the more suitable choice for quantifying as well as predicting/ranking the intra-cluster densities of the blocking clusters of complex real-world networks.

Distribution of the initial adopters in the Louvain clusters
For each real-world network and centrality metric (DEG and BWC), we determine how the initial adopter nodes corresponding to the $\underline{f_{IA}^{\min}}$ value (whose threshold fraction of adopted neighbors is $q_{step} = 1 - \mathrm{CBI}$) are distributed across the clusters computed using the Louvain algorithm. Table 5 displays the number of initial adopters (corresponding to the $\underline{f_{IA}^{\min}}$ value) in each of the Louvain clusters of the real-world networks. We define the "fraction of IA clusters" ($f_{IA}^{Clusters}$) as the ratio of the number of clusters that have at least one initial adopter node to the total number of clusters.
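The fraction of IA clusters defined above reduces to a count of distinct clusters touched by the initial adopters. This is a minimal sketch with a hypothetical cluster assignment and adopter set; the function name is illustrative.

```python
from typing import Dict, Hashable, Iterable


def fraction_of_ia_clusters(cluster_of: Dict[Hashable, int],
                            initial_adopters: Iterable[Hashable]) -> float:
    """Number of clusters with >= 1 initial adopter divided by the number of clusters."""
    all_clusters = set(cluster_of.values())
    ia_clusters = {cluster_of[v] for v in initial_adopters}
    return len(ia_clusters) / len(all_clusters)


# Hypothetical assignment of six nodes to four clusters.
clusters = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 2, "f": 3}
print(fraction_of_ia_clusters(clusters, ["a", "b", "e"]))  # 0.5
```

Two adopters in the same cluster count once, so the measure reflects the spread of the initial adopters rather than their number.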
We use the notations $f_{IA,DEG}^{Clusters}$ and $f_{IA,BWC}^{Clusters}$ to indicate the fraction of IA clusters observed with respect to the DEG and BWC centrality metrics, respectively. Figure 9i plots the $f_{IA,DEG}^{Clusters}$ and $f_{IA,BWC}^{Clusters}$ values in decreasing order for the real-world networks: these values were equal to 1.00 (implying there was at least one initial adopter in each cluster) for 19 and 27 of the 40 real-world networks, respectively, and […]

[Fig. 8 panels: CBI(DEG) and CBI(BWC) versus the largest of the intra-cluster densities of the Louvain clusters]
Figure 10 shows the distribution of the nodes (nodes with larger DEG and BWC values are drawn bigger) in the different Louvain clusters of the four sample real-world networks that were shown in Fig. 3 (Step Function Patterns Observed for the q versus $f_{IA}^{\min}$ Distribution for Real-World Networks with respect to Degree Centrality). We observe the nodes with larger DEG (Fig. 10-i) and BWC (Fig. 10-ii) values (which are candidates to be selected as initial adopter nodes) to be distributed across all the Louvain clusters. Though we present only a sample of the real-world networks in Fig. 10-(i) and (ii), from Figs. 9 and 10 we can reasonably conclude that the initial adopters chosen on the basis of the DEG or BWC metrics are not concentrated in just one cluster but are spread over multiple clusters of the real-world networks.
We now analyze the impact of the fraction of IA clusters on the CBI, $\underline{f_{IA}^{\min}}$ and $\overline{f_{IA}^{\min}}$ values observed for the real-world networks. Figure 11 presents plots of $f_{IA,DEG}^{Clusters}$ versus CBI(DEG) and $f_{IA,BWC}^{Clusters}$ versus CBI(BWC), wherein we observe at most a moderate correlation (Pearson's correlation coefficient less than 0.60) between the measures. This implies that the CBI value of a real-world network is not heavily dependent on the presence or absence of one or more initial adopters in any particular cluster. Likewise, Fig. 12 presents plots of the fraction of IA clusters versus the $\underline{f_{IA}^{\min}}$ and $\overline{f_{IA}^{\min}}$ values for DEG- and BWC-based initial adopter selection (denoted as $\underline{f_{IA,DEG}^{\min}}$, $\underline{f_{IA,BWC}^{\min}}$, $\overline{f_{IA,DEG}^{\min}}$ and $\overline{f_{IA,BWC}^{\min}}$ in Fig. 12), wherein we observe only a weak-to-moderate correlation (Pearson's correlation coefficients in the range 0.4 to 0.6) in the case of DEG and no correlation (Pearson's correlation coefficients in the range 0.0 to 0.1) in the case of BWC.
Fig. 10 Distribution of the nodes in the Louvain clusters of the real-world networks

Analysis of the computation times of the algorithms for the real-world networks
In this subsection, we discuss the computation times observed for the binary search approach of Algorithm 3 to determine the $q_{step}$ value of a real-world network with respect to a centrality metric (used to choose the initial adopters), as well as for the binary search approach of Algorithm 2 (which uses Algorithm 1 to run the iterative information cascade) to determine the $\underline{f_{IA}^{\min}}$ and $\overline{f_{IA}^{\min}}$ values needed to accomplish a complete information cascade at $q_{step}$ and at the next measurable value of q, respectively. These computation times are reported in Table 6. All three algorithms (Algorithms 1, 2 and 3) are implemented in Java and run on a Windows 7 desktop computer (Intel i7-2620M CPU @ 2.70 GHz with 8 GB RAM). The computation times […] Table 6.
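To show how Algorithms 1 and 2 fit together, here is a compact Python sketch; it is not the authors' Java implementation. It assumes synchronous updates, initial adopters taken in decreasing order of degree (ties broken by node label, a hypothetical choice), and a binary search over the number k of top-ranked adopters. The binary search is valid because the threshold model is monotone: the top-(k+1) set contains the top-k set, so adding adopters never prevents adoption.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]


def build_graph(edges: Iterable[Tuple[Hashable, Hashable]]) -> Graph:
    g: Graph = {}
    for a, b in edges:
        g.setdefault(a, set()).add(b)
        g.setdefault(b, set()).add(a)
    return g


def complete_cascade(graph: Graph, initial: Set[Hashable], q: float) -> bool:
    """Algorithm 1 (sketch): a node adopts once at least a fraction q of its
    neighbors have adopted; stop when an iteration adds no new adopters."""
    adopted = set(initial)
    while len(adopted) < len(graph):
        newly = {v for v in graph if v not in adopted and graph[v]
                 and sum(u in adopted for u in graph[v]) / len(graph[v]) >= q}
        if not newly:  # no progress: the cascade cannot complete
            return False
        adopted |= newly
    return True


def min_initial_adopters(graph: Graph, q: float) -> int:
    """Algorithm 2 (sketch): smallest k such that the top-k nodes by degree
    trigger a complete cascade at threshold q."""
    ranking: List[Hashable] = sorted(graph, key=lambda v: (-len(graph[v]), str(v)))
    lo, hi = 1, len(ranking)  # all |V| nodes as adopters trivially succeed
    while lo < hi:
        mid = (lo + hi) // 2
        if complete_cascade(graph, set(ranking[:mid]), q):
            hi = mid
        else:
            lo = mid + 1
    return lo


k4 = build_graph([(a, b) for a in range(4) for b in range(a + 1, 4)])
path = build_graph([(0, 1), (1, 2), (2, 3)])
print(min_initial_adopters(k4, 0.5) / len(k4))  # f_IA^min = 0.5
print(min_initial_adopters(path, 0.6))          # 2
```

Each probe of the binary search runs a full cascade, so the cost per probe grows with the number of cascade iterations; this is consistent with the text's explanation of why runs near $q_{step}$ take longer.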
For both centrality metrics and for each of the 40 real-world networks, we observe the computation time for $\underline{f_{IA}^{\min}}$ to be greater than that for $\overline{f_{IA}^{\min}}$. Though the number of iterations of Algorithm 2 (Binary Search Algorithm to Determine the Minimum Number of Initial Adopters for a Given Threshold Fraction of Adopted Neighbors: q) to compute $\underline{f_{IA}^{\min}}$ and $\overline{f_{IA}^{\min}}$ is O(log V), where V is the number of vertices in the network, the relatively larger computation time for $\underline{f_{IA}^{\min}}$ could be attributed to the larger number of iterations of Algorithm 1 (Iterative Algorithm to Conduct Information Cascade for a Given Set of Initial Adopters: IA and a Threshold Fraction of Adopted Neighbors: q) for each iteration of Algorithm 2. Recall that in Algorithm 1, we conclude that a complete information cascade is not accomplishable for a particular value of q and IA if there are no newly adopted vertices in the latest iteration. When operated at q values less than or equal to $q_{step}$ and with a smaller set of initial adopters, it is more likely that the number of newly adopted vertices in each iteration of Algorithm 1 is greater than 0, leading to relatively more iterations before we can conclude whether a complete information cascade is accomplishable for a particular value of q. On the other hand, when operated at q values above $q_{step}$ and with a larger set of initial adopters, we can decide whether a complete information cascade is accomplishable for a particular value of q after relatively fewer iterations of Algorithm 1. When operated with a search space of 0.05 and a termination threshold ε = 0.001, Algorithm 3 (Binary Search Algorithm to Analyze the Jump Zone of a q versus $f_{IA}^{\min}$ Distribution) runs Algorithm 2 exactly six times (see Sect. 4.1 for more details). Of course, as discussed above, the computation time of Algorithm 2 depends on the q value considered.
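The fixed count of six runs follows from halving arithmetic: the search interval starts at width 0.05 and is halved until its width drops below ε = 0.001, i.e. ⌈log₂(0.05/0.001)⌉ = ⌈log₂ 50⌉ = 6 halvings. The sketch below assumes (as the text states) one run of Algorithm 2 per halving; the function name is hypothetical.

```python
def bisection_runs(search_space: float, epsilon: float) -> int:
    """Number of halvings of an interval of width search_space until the
    width falls below epsilon (assumed: one Algorithm 2 run per halving)."""
    width, runs = search_space, 0
    while width >= epsilon:
        width /= 2
        runs += 1
    return runs


print(bisection_runs(0.05, 0.001))  # 6
```

The widths visited are 0.025, 0.0125, 0.00625, 0.003125, 0.0015625 and 0.00078125, matching the six complete iterations reported in Table 1.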
Hence, we expect the computation time of Algorithm 3 for any real-world network and centrality metric to be greater than the computation time reported for $\underline{f_{IA}^{\min}}$ (corresponding to one run of Algorithm 2) for that network, but less than or equal to six times the computation time for $\underline{f_{IA}^{\min}}$. An interesting observation in Table 6 is that, for more than half of the real-world networks, the computation times reported for each of the three measures ($q_{step}$, $\underline{f_{IA}^{\min}}$, $\overline{f_{IA}^{\min}}$) are relatively lower when the BWC metric, rather than the DEG metric, is used to choose the initial adopters. This is because the $q_{step}$ values for at least 50% of the real-world networks are observed to be larger with respect to the BWC metric than with respect to the DEG metric. Based on the earlier discussion in this subsection, Algorithm 1 takes relatively less time to decide whether or not a complete information cascade is accomplishable when operated at a larger q value. It is thus logical to observe relatively lower computation times for the $q_{step}$, $\underline{f_{IA}^{\min}}$ and $\overline{f_{IA}^{\min}}$ values of the real-world networks when the initial adopters are chosen with respect to the BWC metric. Note that, as mentioned earlier, we did not take into consideration the computation times of the centrality metrics (DEG or BWC) used to choose the set of initial adopters. It is well known in the literature that BWC is a computationally heavy metric (Meghanathan 2017b) and DEG is a computationally light metric. If we were to include the computation times of the centrality metrics in the computation times of the $q_{step}$, $\underline{f_{IA}^{\min}}$ and $\overline{f_{IA}^{\min}}$ measures, it would skew the values and we would not be able to make any meaningful comparison. We seek to model the computation times for each of the $q_{step}$, $\underline{f_{IA}^{\min}}$ and $\overline{f_{IA}^{\min}}$ measures reported in Table 6 as a function of the number of nodes (V) in the network.
After trying different models, we observe the model t = a·V^b for the computation time to be the closest fit, with R² values above 0.90, wherein the coefficients a and b vary depending on the measure ($q_{step}$, $\underline{f_{IA}^{\min}}$ or $\overline{f_{IA}^{\min}}$) and the centrality metric (DEG or BWC) considered. Figure 13-(i) and (ii) plot the number of vertices (V) versus the computation time (t) for each of the above three measures with respect to DEG and BWC, respectively; the models fitted for each of these six plots are shown below the plots. The models for the computation times (power functions of the number of vertices with an exponent b less than 2) indicate the scalability of the binary search-based Algorithms 2 and 3 proposed in this research.
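One common way to fit a model of the form t = a·V^b is ordinary least squares on (ln V, ln t), since ln t = ln a + b·ln V is linear. This sketch assumes that approach; the paper does not state its fitting procedure, and the R² returned here is computed in log space, which can differ from an R² computed on raw times. The data are synthetic, not the values in Table 6.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(V: Sequence[float], t: Sequence[float]) -> Tuple[float, float, float]:
    """Fit t = a * V**b by least squares on (ln V, ln t).

    Returns (a, b, r2), where r2 is the coefficient of determination of the
    straight-line fit in log-log space.
    """
    xs = [math.log(v) for v in V]
    ys = [math.log(y) for y in t]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    ln_a = my - b * mx
    ss_res = sum((y - (ln_a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return math.exp(ln_a), b, r2


# Synthetic timings generated from t = 0.002 * V**1.4 (no noise).
V = [100, 500, 1000, 5000, 10000]
t = [0.002 * v ** 1.4 for v in V]
a, b, r2 = fit_power_law(V, t)
print(a, b, r2)
print(a * 100_000 ** b)  # extrapolated time for V = 100,000 (same units as t)
```

The fitted exponent b is what the scalability argument rests on: an exponent between 1 and 2 means that doubling V multiplies the running time by a factor between 2 and 4.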
The theoretical time complexities of well-known clustering algorithms in the literature range from O(E) for the Louvain algorithm (Blondel et al. 2008) to O(E²V) for the Girvan-Newman algorithm (Newman 2004), where V and E are respectively the number of vertices and edges in a network. For dense graphs, E = O(V²), and for sparse graphs, E = O(V) (Cormen et al. 2009). The theoretical time complexity of the proposed binary search Algorithms 2 and 3 (see Sects. 2-4) is O(E·V log V), which falls in between the extremes of O(E) and O(E²V). Moreover, the empirical models of the computation times of Algorithms 2 and 3 (power functions with an exponent greater than one but less than two, as indicated in Fig. 13) are all observed to be much lower than the theoretical upper bound of E·V log V. Thus, the empirical models of the computation times for the $q_{step}$, $\underline{f_{IA}^{\min}}$ and $\overline{f_{IA}^{\min}}$ measures presented in this subsection demonstrate the suitability of the proposed binary search algorithms for very large networks (with many more nodes than the ones studied in this paper) and support our claim that the CBI value can be considered a (relatively computationally light) quantitative measure of the intra-cluster density of the blocking clusters of a network. Even for a network of 100,000 nodes, the fitted models project the computation times (on a regular desktop computer with an Intel i7-2620M CPU @ 2.70 GHz and 8 GB RAM) for the $q_{step}$, $\underline{f_{IA}^{\min}}$ and $\overline{f_{IA}^{\min}}$ values to be close to 25 min, 9 min and 3 min, respectively, for DEG-based selection of initial adopters, and close to 25 min, 10 min and 2 min, respectively, for BWC-based selection of initial adopters.
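Substituting the two edge-density regimes quoted above into the three bounds makes the "in between" claim concrete for both sparse and dense graphs:

```latex
\begin{aligned}
\text{Sparse } (E = O(V)):\quad & O(E) = O(V), \quad O(E \cdot V\log V) = O(V^{2}\log V), \quad O(E^{2}V) = O(V^{3})\\
\text{Dense } (E = O(V^{2})):\quad & O(E) = O(V^{2}), \quad O(E \cdot V\log V) = O(V^{3}\log V), \quad O(E^{2}V) = O(V^{5})
\end{aligned}
```

In both regimes the bound for Algorithms 2 and 3 sits strictly between the Louvain and Girvan-Newman bounds, and the empirical exponents (between 1 and 2) fall well below even the sparse-graph bound of V² log V.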

Related work
In Jalili and Perc (2017), the authors quantified the spreading influence of a node as the fraction of nodes that adopt a decision based on the decision adopted by that node. They observed a positive correlation between the degree (DEG) and betweenness (BWC) centrality metrics and the spreading influence of a node. In Yang and Leskovec (2010), the authors observed that for effective dissemination of information, the connections of the initial adopter nodes are more important than the number of initial adopters. In Ghasemiesfeh et al. (2013), the authors observed that for the complex contagion model that is also used in this paper (a node adopts a decision only when the fraction of its neighbors that have adopted the decision is greater than or equal to a threshold), the diffusion speed depends on the distribution of the weak ties [nodes with larger BWC are considered to be weak ties in a network (Easley and Kleinberg 2010)]. In Bakshy et al. (2012), the effect of tie strength on information sharing and diffusion in Facebook was studied. It was observed that strong ties (friends who interacted more frequently) were effective for information cascade within clusters, whereas weak ties (friends who interacted less frequently) were effective for information cascade across clusters. It was also observed that the probability of a user sharing a link increases with the number of friends who have shared it. In Watts (2002), it was observed that increased heterogeneity in the q values (the threshold fraction of adopted neighbors needed for a node to adopt a decision) makes a network more vulnerable to global cascades (note that we use the same value of q for all the nodes in the network), whereas increased heterogeneity among the nodes with respect to degree makes a network relatively less vulnerable to global cascades.
In a prior work (Meghanathan 2019), we observed that real-world networks are more disassortative with respect to the remaining degree of the vertices (one less than the degree of a vertex) and the betweenness centrality metric, and more assortative with respect to the eigenvector (Bonacich 1987) and closeness (Freeman 1979) centrality metrics. As a result, it is unlikely that the strategy of selecting nodes in decreasing order of their DEG or BWC values to form the set of initial adopters would lead to all the nodes being selected from the same cluster. Using all of the above observations as guidelines, we decided to use the DEG and BWC metrics as the basis for choosing the initial adopters.
Though the terms "information diffusion" and "information cascade" are sometimes used interchangeably in the literature, a recent work (Buskens 2019) clearly distinguished information diffusion from information cascade [referred to as information appreciation in Buskens (2019)] and explained the issue of selecting initial adopters for these two phenomena. Information diffusion is a phenomenon in which the information only has to reach all the nodes in the network, whereas information cascade is a phenomenon wherein the nodes are required to adopt/accept the information and propagate it further. The complex contagion model used in this paper is one of the commonly used models for information cascade. For information cascade, there needs to be some redundancy in the channels through which information propagates (to ensure that a threshold fraction of the neighbors of a node have adopted/accepted the information before the node does so), whereas for information diffusion, redundancy in the number of channels through which information propagates could actually slow down the spread. For faster information diffusion with minimal redundancy, it would be more prudent to choose initial adopters spread over different clusters. On the other hand, for a complete information cascade (the focus of this paper), due to the need for redundancy to accept the information and propagate it further, it would be more effective to include one or more bridge nodes of the clusters in the set of initial adopters.
In Chesney (2017), the author proposed the notion of the "cascade capacity of a node", which in the context of this paper is the maximum q value that we could employ for a network and still accomplish a complete information cascade starting with that node as the only initial adopter. Like centrality metrics, each node would have a different cascade capacity depending on its position in the network relative to its neighbors. A simulation-based iterative procedure was proposed in Chesney (2017) to determine the cascade capacities of the individual nodes in a network. The cascade capacity of a node was observed to be weakly correlated with centrality metrics such as DEG and BWC that are typically used to choose the initial adopters. Like the DEG and BWC metrics, the cascade capacity of a node is valuable information that could also be used to choose the set of initial adopters, and we intend to explore this in our future research. However, the time complexity of determining the cascade capacity of a node is not formally evaluated in Chesney (2017). As the time complexity would be overwhelming if one were to adopt a brute-force approach to determine the cascade capacities of the nodes on a node-by-node basis, as part of future work we plan to investigate the relationship between the cascade capacity of the nodes and the CBI value of a network, and accordingly develop binary search approaches that exploit any such relationship to determine the cascade capacities of the nodes efficiently.
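To make the cost of a node-by-node approach concrete, the sketch below computes one node's cascade capacity (as defined above) by bisection on q. This is a hypothetical brute-force procedure, not Chesney's simulation-based method; it relies on the fact that lowering q never stops a node from adopting, so success is monotone in q, and at q = 0 every node with a neighbor adopts immediately. The function names and the tolerance are illustrative assumptions.

```python
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]


def completes(graph: Graph, seed: Hashable, q: float) -> bool:
    """Threshold cascade from a single seed; True if every node adopts."""
    adopted = {seed}
    while len(adopted) < len(graph):
        newly = {v for v in graph if v not in adopted and graph[v]
                 and sum(u in adopted for u in graph[v]) / len(graph[v]) >= q}
        if not newly:
            return False
        adopted |= newly
    return True


def cascade_capacity(graph: Graph, seed: Hashable, tol: float = 1e-4) -> float:
    """Largest q (to within tol) at which seed alone triggers a complete cascade."""
    if completes(graph, seed, 1.0):
        return 1.0
    lo, hi = 0.0, 1.0  # lo: known success (connected graph), hi: known failure
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if completes(graph, seed, mid):
            lo = mid
        else:
            hi = mid
    return lo


# Star with center 0 and four leaves: a leaf seed needs q <= 1/4 at the center.
star: Graph = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(cascade_capacity(star, 0), cascade_capacity(star, 1))  # 1.0 0.25
```

Each node costs about log₂(1/tol) full cascade simulations, so repeating this for all V nodes multiplies that cost by V, which is the overhead the text hopes to avoid by exploiting a relationship with the CBI.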
In Watts and Dodds (2007), it was observed that high-degree nodes play a critical role in choosing an appropriate value for the fraction of initial adopters needed to induce a complete information cascade in random networks. Though that work was done primarily for random networks, the observation made in Watts and Dodds (2007) also appears to hold for the real-world networks analyzed in our research. The high-degree bridge nodes of the clusters appear to be the stumbling block for an information cascade to penetrate the clusters, and we need to ramp up the fraction of initial adopters when the threshold fraction (q) of adopted neighbors exceeds $q_{step}$. As a result, inclusion of high-degree nodes in the set of initial adopters turns out to be a relatively more effective strategy (compared to the inclusion of nodes with high BWC) for quantifying the intra-cluster density of the blocking clusters of the real-world networks. In another related work (Watts and Dodds 2007) on random networks, the authors observed that heterogeneous thresholds (different q values for the nodes) are more likely to result in a complete information cascade. As part of future work, we plan to evaluate the impact of heterogeneous threshold values for the fraction of adopted neighbors on the CBI values of real-world networks.
Information cascade has been observed to play a significant role in sequential voting mechanisms such as presidential primaries and roll-call voting (Knight and Schiff 2010), wherein information about the previously cast votes is revealed to each voter, who accordingly finalizes his/her choice of candidate, which could even differ from his/her personal preference/initial choice. In a recent work (Tump et al. 2020), the authors modeled the dynamics of the social decision-making process leading to information cascade and observed that the drift rate from the personal choice to the majority choice exhibited a convex increase for smaller majority sizes and a concave increase for medium and larger majority sizes. The impact of the decisions made by the initial adopters on the final outcome of a sequential, unanimous decision-making information cascade was also studied experimentally with several test subjects in Anderson and Holt (1997). Each test subject was asked to predict the predominant color of a collection of balls in an urn after looking at the color of a ball he/she drew at random and considering the predictions announced by the earlier test subjects. It was observed that if the initial few decisions coincided, the subsequent test subjects also took the same decision as the earlier test subjects, irrespective of the color of the ball they drew. Similar experiments were also designed in Alevy et al. (2007) and Mori et al. (2013). Such experiments motivated us to develop an algorithm to determine the minimum fraction of nodes to be chosen as initial adopters to accomplish a complete information cascade (i.e., adoption of a unanimous decision) for a given threshold fraction (q) of adopted neighbors.
In Hisakado and Mori (2009), voter model dynamics were studied in the context of information cascade: the authors observed that in a population mix of independent voters (who vote on their own) and copycat voters (who vote probabilistically based on the number of votes polled so far for each candidate), the distribution of the voting rate goes through a complete phase transition (i.e., from a binomial distribution to a beta distribution) only when the number of votes seen by the copycat voters is significantly high (theoretically, infinite). On the other hand, if the fraction of copycat voters is 1/2 or more, the voting rate converges more slowly. The voting style of the copycat voters in that work is similar to the adoption behavior of the nodes under the complex contagion model for information cascade. In Watts and Dodds (2007), the distribution of the cascade size (the number of nodes that adopt the decision of the initial adopters) versus the fraction of nodes chosen as initial adopters was observed to be bimodal for synthetic networks generated per the Watts model (Watts 2002), indicating that there is a threshold value for the fraction of adopted neighbors that connects the two distributions. The sudden significant increase observed in our research in the fraction of initial adopters needed for a complete information cascade as q increases beyond $q_{step}$ also resembles the phase transition and bimodal distribution reported in the above theoretical works.

Conclusions and future work
The high-level contributions of this paper are as follows. We proposed a binary search algorithm to determine the minimum fraction of nodes to be used as initial adopters ($f_{IA}^{\min}$) to accomplish a complete information cascade for a given threshold fraction (q) of adopted neighbors. We analyzed a suite of 40 real-world networks from diverse domains, observed the q versus $f_{IA}^{\min}$ distribution to exhibit a step function pattern for 37 of these networks, and identified a jump zone within which the $f_{IA}^{\min}$ value spiked from $\underline{f_{IA}^{\min}}$ to $\overline{f_{IA}^{\min}}$ (with median differences of 0.44 and 0.32 when the degree and betweenness centrality metrics, respectively, are used to choose the initial adopters). We proposed a second binary search algorithm (which makes use of the first) to determine the value of q (referred to as $q_{step}$) beyond which the minimum fraction of initial adopters must be increased from $\underline{f_{IA}^{\min}}$ to $\overline{f_{IA}^{\min}}$ to accomplish a complete information cascade. We propose that the value $1 - q_{step}$ (referred to as the Cluster Blocking Index, CBI) be considered a measure of the intra-cluster density of the network's blocking cluster: any further increase in q beyond $q_{step}$ lowers $1 - q$ below the intra-cluster density of at least one cluster (the blocking cluster). Such a cluster cannot be penetrated with a fraction $\underline{f_{IA}^{\min}}$ of initial adopters; the minimum fraction of initial adopters must be increased to $\overline{f_{IA}^{\min}}$ by including one or more nodes of the blocking cluster in the set of initial adopters. The CBI value of a network could be used to choose an appropriate threshold fraction of adopted neighbors to facilitate an information cascade or to stop the spread of an infection.
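The two binary searches summarized above can be illustrated with a minimal Python sketch. It is not the implementation evaluated in the paper and rests on several assumptions: the network is an undirected adjacency dictionary; a non-adopter adopts once the fraction of its adopted neighbors is at least q (the usual threshold rule of the complex contagion model); the initial adopters are the top-k nodes by degree, with ties broken by node label; and q is scanned on a grid of step 0.01. The function names (`cascade`, `degree_ranking`, `min_initial_adopters`, `find_q_step`) and the tolerance parameter `tol` are hypothetical, and $q_{step}$ is taken here, as a simplification, to be the largest grid value of q whose $f_{IA}^{\min}$ exceeds the value at the smallest q by at most `tol`. The variables `li`, `mi` and `ri` play the roles of LI, MI and RI.

```python
from collections import deque
from typing import Optional

Graph = dict[int, set[int]]  # undirected graph: node -> set of neighbors


def cascade(adj: Graph, q: float, seeds: list[int]) -> set[int]:
    """Final adopter set, assuming a node adopts once the fraction of
    its adopted neighbors is at least q (threshold rule)."""
    adopted = set(seeds)
    hits = {v: 0 for v in adj}  # adopted neighbors counted so far
    queue = deque(adopted)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in adopted:
                continue
            hits[v] += 1
            if hits[v] / len(adj[v]) >= q:
                adopted.add(v)
                queue.append(v)
    return adopted


def degree_ranking(adj: Graph) -> list[int]:
    """Nodes by decreasing degree; ties broken by node label (assumption)."""
    return sorted(adj, key=lambda v: (-len(adj[v]), v))


def min_initial_adopters(adj: Graph, q: float, ranking: list[int]) -> int:
    """First binary search: IA_min, the smallest k such that the top-k
    ranked nodes trigger a complete cascade. Valid because the seed sets
    are nested and a larger seed set never yields fewer adopters."""
    li, ri = 1, len(ranking)  # seeding every node is always complete
    while li < ri:
        mi = (li + ri) // 2
        if len(cascade(adj, q, ranking[:mi])) == len(adj):
            ri = mi  # complete cascade: try fewer initial adopters
        else:
            li = mi + 1  # incomplete: more initial adopters are needed
    return li


def find_q_step(adj: Graph, steps: int = 100,
                tol: float = 0.0) -> Optional[float]:
    """Second binary search over the grid q = i/steps (i = 1..steps-1)
    for the largest q whose f_IA^min exceeds f_IA^min at the smallest q
    by at most `tol` (hypothetical definition). Valid because IA_min is
    non-decreasing in q."""
    n = len(adj)
    ranking = degree_ranking(adj)
    base = min_initial_adopters(adj, 1 / steps, ranking)
    li, ri = 1, steps - 1
    while li < ri:
        mi = (li + ri + 1) // 2  # round up so that li always advances
        ia_min = min_initial_adopters(adj, mi / steps, ranking)
        if (ia_min - base) / n <= tol:
            li = mi  # still on the lower plateau
        else:
            ri = mi - 1  # already past the jump
    return None if li == steps - 1 else li / steps


if __name__ == "__main__":
    # Hypothetical toy graph: a 4-clique {0,1,2,3} attached by the edge
    # 3-4 to a triangle {4,5,6} whose intra-cluster density is 2/3.
    edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
             (3, 4), (4, 5), (4, 6), (5, 6)]
    adj: Graph = {v: set() for v in range(7)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    rank = degree_ranking(adj)
    q_step = find_q_step(adj)
    print(q_step, round(1 - q_step, 2))  # 0.33 0.67  (q_step, CBI)
    print(min_initial_adopters(adj, 0.33, rank),
          min_initial_adopters(adj, 0.34, rank))  # 1 5
```

On this toy graph, a single degree-ranked seed (node 3) suffices up to q = 0.33, but for q above 1/3 the triangle's density of 2/3 exceeds 1 − q, so its bridge node 4 must itself be seeded and IA_min jumps from 1 to 5 (i.e., from 1/7 to 5/7 of the nodes). The resulting CBI of 0.67 matches the triangle's density to the resolution of the grid.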
For networks with larger CBI values, a complete information cascade can be accomplished only with lower q values (i.e., nodes are required to adopt when only a small fraction of their neighbors have adopted), whereas the spread of an infection can be prevented by operating with larger q values (equivalent to vaccinating the bridge nodes). A majority of the 40 real-world networks have large CBI values (0.50 or more) with respect to both centrality metrics used to choose the initial adopters: degree (DEG) and betweenness (BWC). The fraction of IA clusters (the fraction of Louvain clusters containing at least one initial adopter) is 1.00 for several real-world networks and is at most moderately correlated with the CBI values of the networks.
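The fraction of IA clusters is simply the number of clusters that contain at least one initial adopter divided by the total number of clusters. The minimal sketch below computes it for a given partition and set of initial adopters; the function name is hypothetical, and the partition is passed in as a list of node sets (in practice it could come from a Louvain implementation such as NetworkX's `louvain_communities`, which is not called here).

```python
def fraction_ia_clusters(clusters: list[set[int]],
                         initial_adopters: set[int]) -> float:
    """Fraction of clusters containing at least one initial adopter."""
    if not clusters:
        raise ValueError("the partition must contain at least one cluster")
    # A non-empty intersection means the cluster holds an initial adopter.
    hit = sum(1 for c in clusters if c & initial_adopters)
    return hit / len(clusters)


if __name__ == "__main__":
    clusters = [{0, 1, 2, 3}, {4, 5, 6}]  # hypothetical partition
    print(fraction_ia_clusters(clusters, {3}))     # 0.5
    print(fraction_ia_clusters(clusters, {3, 4}))  # 1.0
```

A value of 1.00 does not by itself guarantee a complete cascade: a cluster that contains an initial adopter can still keep its other members from adopting when q is large enough.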
The CBI value of a network does not depend on any particular clustering algorithm; in fact, no clustering algorithm needs to be run to determine the clusters, evaluate their intra-cluster densities and identify the blocking cluster in order to obtain the CBI value of a network. Note that the two-phase approach explained in Sects. 3 and 4 to determine the CBI value of a network does not use any clustering algorithm. We expect the CBI value to serve as an upper bound for the intra-cluster densities of the clusters determined by any clustering algorithm. We evaluated this claim by determining the intra-cluster densities of the clusters identified in the 40 real-world networks by the well-known Louvain community detection algorithm. The CBI(DEG) values served as an upper bound for the intra-cluster densities of the Louvain clusters in 32 of the 40 real-world networks, whereas the CBI(BWC) values did so in 27 of the 40 networks. The DEG metric is thus the more effective of the two at yielding CBI values that serve as upper bounds for the intra-cluster densities of the clusters in real-world networks and that can be used to predict the intra-cluster density of a network's blocking cluster. We also validated the scalability of the proposed binary search algorithms on large network graphs by deriving empirical models of the actual time complexities (polynomial functions of degree greater than one but less than two) incurred in determining the three measures $q_{step}$, $\underline{f_{IA}^{\min}}$ and $\overline{f_{IA}^{\min}}$. We hypothesize that smaller clusters in a network are more likely to become the blocking clusters for a complete information cascade, as the fraction of alien neighbors (the fraction of neighbors that are outside the cluster) of the bridge nodes in smaller clusters is likely to be lower than that of the bridge nodes in larger clusters.
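The upper-bound check and the alien-neighbor fractions discussed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the paper: the intra-cluster density of a cluster is assumed to follow the Easley-Kleinberg definition (the smallest fraction, over the cluster's members, of a member's neighbors that lie inside the cluster); an isolated node is assumed to have an inside fraction of 1.0; a bridge node is any member with at least one neighbor outside its cluster; and all function names as well as the toy graph and partition are hypothetical.

```python
Graph = dict[int, set[int]]  # undirected graph: node -> set of neighbors


def inside_fraction(adj: Graph, cluster: set[int], v: int) -> float:
    """Fraction of v's neighbors inside `cluster` (1.0 if v is isolated,
    by assumption)."""
    nbrs = adj[v]
    if not nbrs:
        return 1.0
    return sum(1 for u in nbrs if u in cluster) / len(nbrs)


def intra_cluster_density(adj: Graph, cluster: set[int]) -> float:
    """Assumed Easley-Kleinberg density: smallest inside fraction over
    the members of the cluster."""
    return min(inside_fraction(adj, cluster, v) for v in cluster)


def bridge_alien_fractions(adj: Graph, cluster: set[int]) -> dict[int, float]:
    """Alien-neighbor fraction of each bridge node, i.e., each member
    with at least one neighbor outside the cluster."""
    fractions = {}
    for v in sorted(cluster):
        alien = sum(1 for u in adj[v] if u not in cluster)
        if alien:
            fractions[v] = alien / len(adj[v])
    return fractions


def cbi_is_upper_bound(adj: Graph, clusters: list[set[int]], cbi: float,
                       tol: float = 1e-9) -> bool:
    """True if no cluster's density exceeds the CBI value (up to a small
    floating-point tolerance)."""
    return all(intra_cluster_density(adj, c) <= cbi + tol for c in clusters)


if __name__ == "__main__":
    # Hypothetical toy graph: a 4-clique {0,1,2,3} joined by the edge 3-4
    # to a triangle {4,5,6}, with a hypothetical two-cluster partition.
    edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
             (3, 4), (4, 5), (4, 6), (5, 6)]
    adj: Graph = {v: set() for v in range(7)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    clusters = [{0, 1, 2, 3}, {4, 5, 6}]
    for c in clusters:
        print(sorted(c), round(intra_cluster_density(adj, c), 2),
              bridge_alien_fractions(adj, c))
    print(cbi_is_upper_bound(adj, clusters, 0.67))
```

With the threshold rule, degree-ranked seeding and a 0.01 grid of q, this toy graph has $q_{step} = 0.33$, i.e., a CBI of 0.67. The triangle's density (2/3) respects that bound, but the four-node cluster, which contains the highest-degree node and is therefore seeded directly, has density 0.75, so the check returns False. The bridge nodes 3 and 4 have alien-neighbor fractions of 1/4 and 1/3, respectively.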
As part of future work, we plan to investigate the role of cluster size in the intra-cluster densities and the CBI values of a network. Bridge nodes are the entry points through which a cascade penetrates a cluster. Although centrality metrics exist to quantify the "bridgeness" of the nodes in networks (Jensen et al. 2015), these metrics typically identify nodes that connect two or more clusters but do not belong to any of them. We also plan to develop a centrality metric that quantifies the extent to which a node enables an information cascade to penetrate a cluster. In addition, we plan to explore the open research problems mentioned in the Related Work section.
Abbreviations
$f_{IA}^{\min}$: Minimum fraction of nodes in a network to be used as initial adopters; q: Threshold fraction of adopted neighbors; $q_{step}$: The threshold fraction of adopted neighbors beyond which the $f_{IA}^{\min}$ value exhibits a sharp increase; $\underline{f_{IA}^{\min}}$: The $f_{IA}^{\min}$ value corresponding to $q_{step}$; $\overline{f_{IA}^{\min}}$: The $f_{IA}^{\min}$ value corresponding to the next measurable value of q beyond $q_{step}$; DEG: Degree centrality; BWC: Betweenness centrality; $IA_{\min}$: The minimum number of nodes to be used as initial adopters for a given q; CBI: Cluster Blocking Index; CBI(DEG): Cluster Blocking Index of a network with the degree centrality metric used to choose the initial adopters; CBI(BWC): Cluster Blocking Index of a network with the betweenness centrality metric used to choose the initial adopters; IA: Set of initial adopters; LI: Left index of the binary search algorithm; MI: Middle index of the binary search algorithm; RI: Right index of the binary search algorithm.