Dynamics of opinion polarization

For decades, researchers have been trying to understand how people form their opinions. This quest has become even more pressing with the widespread usage of online social networks and social media, which seem to amplify the already existing phenomenon of polarization. In this work, we study the problem of polarization assuming that opinions evolve according to the popular Friedkin-Johnsen (FJ) model. The FJ model is one of the few existing opinion dynamics models that has been validated on small/medium-sized social groups. First, we carry out a comprehensive survey of the FJ model in the literature (distinguishing its main variants) and of the many polarization metrics available, deriving an invariant relation among them. Secondly, we derive the conditions under which the FJ variants are able to induce opinion polarization in a social network, as a function of the social ties between the nodes and their individual susceptibility to the opinion of others. Thirdly, we discuss a methodology for finding concrete opinion vectors that are able to bring the network to a polarized state. Finally, our analytical results are applied to two real social network graphs, showing how our theoretical findings can be used to identify polarizing conditions under various configurations.


I. INTRODUCTION
All authors are with the Institute of Informatics and Telematics (IIT) of the National Research Council (CNR), Italy. Email: first.last@iit.cnr.it. This work is supported by the European Union - Horizon 2020 Program under the "SoBigData-PlusPlus" (Grant Agreement 871042) and "HumanE-AI-Net" (Grant Agreement 952026) projects. This work is also supported by the SAI project, funded by the CHIST-ERA grant CHIST-ERA-19-XAI-010. The work of C. Boldrini, M. Conti and A. Passarella is partly supported by PNRR - M4C2 - Investimento 1.3, Partenariato Esteso PE00000013 - "FAIR - Future Artificial Intelligence Research" - Spoke 1 "Human-centered AI", funded by the European Commission under the NextGeneration EU programme. The work of C. Boldrini is also supported by project SERICS (PE00000014) under the MUR National Recovery and Resilience Plan funded by the European Union - NextGenerationEU.

WITH the rise of social media and online social networks, online interactions have started playing an increasingly important role in how people form their opinions, to the point that news consumption itself is now often mediated by social interactions [1], [2]. Social networks, though, do not merely provide a transparent technological substrate that facilitates interactions in the online dimension. Their algorithmic personalization, aimed at highlighting content that is more interesting to each of us, effectively reinforces our cognitive biases, reducing the cognitive discomfort we experience when exposed to opinions challenging our beliefs but at the same time reducing the diversity and range of opinions we are exposed to. By reinforcing consonant opinions and downplaying, or even removing, discordant ones, social networks cradle us into curated filter bubbles and comfortable echo chambers. However, whether this leads to actual polarization [3], [4], [5] is still debated. Some argue that the very nature of social networks, i.e., the socialization of information consumption, may counteract the above effects [6], others that individual choices (to bond with similar others and to prefer concordant information) predominate over algorithmic filtering [7], others again that exposure to opposing views is more likely to backfire than to widen our perspectives [8].
To make matters worse, information may not only be partisan but also blatantly fake [9].
This quest towards a better understanding of the impact of the social algorithm [9] and misinformation on our societies is intertwined with a more general question that, even when removing the cyber-dimension, remains unsolved: how do people form their opinions? This question has fascinated sociologists and economists alike since long before the advent of the Internet, but it has recently gained new momentum, with computational sociologists and control theorists now weighing in. The literature on opinion dynamics is vast, with many models proposed to capture a variety of cognitive and social mechanisms that lead to forming an opinion, such as social influence (which determines whose opinion you are affected by), cognitive dissonance (which triggers your willingness to adapt), and anchoring to one's own opinion (which captures our prejudices). For an in-depth discussion, we refer the interested reader to recent surveys, such as [10], [11], [12].
So-called averaging models are one of the most popular classes of opinion dynamics models [13], [14], [15]. In these models, the final opinions (also known as expressed opinions) result from a repeated weighted averaging of the opinions of neighboring nodes in the influence graph. The strengths of averaging models lie in their mathematical tractability [16], their ability to capture strong opinion diversity [18], and their general flexibility (e.g., they can capture the wisdom of the crowd phenomenon [19] or include prominent agents [20], such as media sources and politicians, that may be systematically biased and not willing to change their opinion at all).
The Friedkin-Johnsen (FJ) model [15] is the most popular averaging model in the related literature. It is the only model that has been validated on small and medium-sized groups [21], [22], and even in human-AI group experiments [23]. Focusing on it, our first contribution is a comprehensive review of all the major variants of the FJ model and of the polarization metrics described in the related literature. For each of them, we highlight the key features and the mutual differences. We find that the polarization metrics are linked together through an invariant relationship. As a second contribution, we derive the conditions under which the FJ model yields polarization, for each of the polarization metrics identified before. In addition, we prove that the polarizing opinion vectors can be found analytically in most cases. Finally, all these results are exploited to identify polarizing conditions, under different configurations, on two popular datasets of real social networks.

A. Background and motivation
The simplest averaging model is DeGroot's model [13], whereby the opinion of a node is simply the average opinion of its neighbours, weighted by the strength of their social influence. This model, however, is not considered realistic, since, when it converges (i.e., if the nodes' opinions stabilize), it always leads to consensus, i.e., to a final state in which all nodes have exactly the same opinion [24]. To overcome this problem, Friedkin and Johnsen [15] proposed a variation on DeGroot's model that introduces a certain degree of stubbornness in nodes. Their hypothesis is that a personal opinion always remains at least partly anchored to the initial opinion (or prejudice), more or less so depending on the individual's propensity to be influenced by others. The Friedkin-Johnsen model does not lead to consensus (except in very particular cases [25]) and has been widely popular in the related literature [26], [27], [28], [29], [30], [16], [31], [22], [25]. Research on the FJ model has followed two main avenues. On the one hand, the derivation of the conditions for convergence or consensus has been the main focus of the research efforts from the control theory domain [32], [31], [16], [33]. On the other hand, the graph-theoretical efforts [27], [34], [30], [35], [26], [29], [36] have focused on understanding the effects of the underlying influence graph on opinion formation and polarization, and on how to interfere with the opinion formation process in order to obtain a desired outcome (e.g., shifting the opinion in a specific direction, minimizing polarization and/or disagreement). While all the above works refer to the opinion dynamics model they leverage as Friedkin-Johnsen, they often rely on a simplified version of it. Specifically, they use the more mathematically tractable version (which we refer to, later on, as rFJ), which, however, is not able to capture polarization (we discuss this point later in the paper). This has resulted in great confusion regarding which finding holds under which hypothesis.

The second gap in the related literature, and a direct consequence of the above confusion, lies in whether the FJ model is actually able to capture polarization or not. Indeed, despite opinion polarization being a fundamental feature of a realistic opinion formation process, only Gionis et al. [26] and Dandekar et al. [35] have explicitly tackled this problem. Analyzing the problem on undirected social networks, they have proved that two variants of the FJ model are neither capable of changing the average opinion of the social network nor of increasing the weighted difference of opinions among nodes of the same neighbourhood. However, what happens with the general FJ model and with other polarization metrics is as yet unknown.

II. MODELLING FRAMEWORK
We explicitly differentiate between the social graph and the influence graph. Both comprise the same set $V$ of $n$ vertices and the same set of edges $E$, but the weights of the edges are different and have different meanings. The social graph, denoted by $\mathcal{S}$, represents people (vertices) and the social relationships between them (through the edge weights $\hat{w}_{ij}$). The strength of a social relationship is typically measured in terms of the number of interactions between two people [37], and for this reason the few results on polarization in the related works assume that the social graph is undirected [35], [26]. In this paper, we consider the general case of a directed social graph, specifying how the results change in the specific case of an undirected one. The influence graph $\mathcal{I}$ describes how a node's opinion is influenced by that of its neighbours. The existence of an edge from node $i$ to node $j$ in $\mathcal{I}$ implies that node $j$ exerts an influence on the opinion of node $i$, and the strength of this influence is expressed by the edge weight $w_{ij}$. Lacking additional information, the influence graph can be derived from the social graph, leveraging the intuition that stronger social relationships exert more influence than weak ones. Specifically, starting from the social weights $\hat{w}_{ij}$, the influence $w_{ij}$ can be computed as $w_{ij} = \hat{w}_{ij} / \sum_{\ell=1}^{n} \hat{w}_{i\ell}$. Please note that this definition is the only one that allows a unique correspondence between all the variants of the FJ model. The matrix $W = (w_{ij})$ is called the influence matrix and is assumed to be row-stochastic (because it captures how the influence a node is subject to is split among its neighbours). The influence matrix is in general asymmetric (corresponding to a directed influence graph), even starting from a symmetric social matrix $\hat{W} = (\hat{w}_{ij})$, because the influence weight $w_{ij}$ expresses the relative importance of $j$ with respect to all of $i$'s social relationships. Hence, the same social relationship intensity can weigh very differently depending on the strength of the other relationships.
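The normalization above can be sketched in a few lines. This is a minimal illustration, assuming the social weights are stored in a dense NumPy array; the function name `influence_matrix` and the three-node weights are hypothetical and chosen only to show that a symmetric social matrix yields an asymmetric influence matrix.

```python
import numpy as np

def influence_matrix(W_hat):
    """Row-normalise social weights: w_ij = w_hat_ij / sum_l w_hat_il."""
    W_hat = np.asarray(W_hat, dtype=float)
    row_sums = W_hat.sum(axis=1, keepdims=True)
    if np.any(row_sums <= 0):
        # Assumption of this sketch: every node has at least one positive weight.
        raise ValueError("every node needs at least one positive social weight")
    return W_hat / row_sums

# Symmetric (undirected) social graph on three nodes, illustrative weights.
W_hat = np.array([[0.0, 4.0, 1.0],
                  [4.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0]])
W = influence_matrix(W_hat)
```

Here node 0 splits its attention between two neighbours, so the tie of strength 1 weighs 0.2 for node 0 but 1.0 for node 2, which has no other relationship.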
A. The Friedkin-Johnsen family of opinion dynamics models
A discrete-time opinion dynamics model tracks the evolution of $z_i(k)$, the opinion expressed by a node $i$ at time $k$. Opinions are generally assumed to be real-valued, i.e., continuous in a certain reference interval. Similarly to the related literature [21], [26], [27], here we assume that opinions belong to $[-1, 1]$. Thus, the extremes $-1$ and $1$ represent opposing viewpoints on an issue. For a given configuration of its input parameters, the model is said to be convergent if $z_i(k+1) \to z_i$ for all $i$ as $k$ grows to infinity. A convergent model is said to reach consensus if $z_i(k+1) \to z$ for all $i$ as $k$ grows to infinity, i.e., if all nodes converge to the same value $z$. In the Friedkin-Johnsen model family, before the opinion formation process starts, each node $i$ has an initial opinion $s_i$, often referred to as internal or fixed opinion (or prejudice). In contrast, the opinion $z_i(k)$ is often referred to as the expressed opinion at time $k$. In Table I we summarize the variants of the FJ model that can be found in the literature and we discuss them separately hereafter. We denote by $N(i)$ the neighborhood set of node $i$.
1) The generalized Friedkin-Johnsen model - gFJ: Equation (1) in Table I corresponds to the most general version of the model originally proposed by Friedkin and Johnsen [15]. The outermost weighted average depends on the parameter $\lambda_i$, corresponding to the susceptibility of node $i$ to the opinions of other nodes. The innermost weighted average depends on the influence $w_{ij}$ that node $j$ exerts on node $i$. Two main mechanisms are at play here: anchoring to node $i$'s internal opinion $s_i$, and variable susceptibility $\lambda_i$ to other nodes' opinions. Nodes with zero susceptibility are stubborn nodes and they never change their opinion. A common matrix formulation of the model is the following:

$$z(k+1) = \Lambda W z(k) + (I - \Lambda)\, s, \qquad (4)$$

where $\Lambda$ is a diagonal matrix containing the susceptibility values $\lambda_i$, while $W$ is the influence matrix. Note that the opinion of a node $i$ depends both on its initial prejudice $s_i$ (with weight $1 - \lambda_i$) and on its current opinion (with weight $\lambda_i w_{ii}$). The dependence on the current opinion vanishes only when the node is stubborn ($\lambda_i = 0$) or when $w_{ii} = 0$. $\Lambda$ and $W$ are sometimes coupled via the condition $1 - \lambda_i = w_{ii}$ [16]; however, we do not make this assumption here. The conditions under which the gFJ model achieves convergence and consensus have been thoroughly studied in the related literature [25], [32], [16]. A sufficient condition for convergence [31], which we will use often in the rest of the paper, is reported below.
Theorem 1 (Sufficient condition for the gFJ). If $\Lambda W$ is stable (i.e., all its eigenvalues lie inside the open unit disk $\{z \in \mathbb{C} : |z| < 1\}$), the gFJ model is convergent and its only stationary point $z$ (i.e., steady-state solution) is given by the following:

$$z = (I - \Lambda W)^{-1} (I - \Lambda)\, s. \qquad (5)$$

We refer the reader to the SI Appendix for a brief summary of the main findings on the topic of opinion convergence.
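As a numerical sanity check of Theorem 1, the sketch below iterates the gFJ update and compares the result with the closed-form stationary point $z = (I - \Lambda W)^{-1}(I - \Lambda)s$. The node-level rule $z_i \leftarrow (1-\lambda_i)s_i + \lambda_i \sum_j w_{ij} z_j$ is our reading of the description of gFJ given above; the function names, the weights, and the susceptibilities are illustrative assumptions.

```python
import numpy as np

def gfj_step(z, s, lam, W):
    """One synchronous gFJ update: z_i <- (1 - lam_i) s_i + lam_i sum_j w_ij z_j."""
    return (1.0 - lam) * s + lam * (W @ z)

def gfj_steady_state(s, lam, W):
    """Closed form z = (I - Lambda W)^{-1} (I - Lambda) s (needs Lambda W stable)."""
    n = len(s)
    LW = lam[:, None] * W  # Lambda W: row i of W scaled by lam_i
    if np.max(np.abs(np.linalg.eigvals(LW))) >= 1.0:
        raise ValueError("Lambda W is not stable; Theorem 1 does not apply")
    return np.linalg.solve(np.eye(n) - LW, (1.0 - lam) * s)

# Illustrative row-stochastic influence matrix and susceptibilities.
W = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
lam = np.array([0.9, 0.5, 0.2])
s = np.array([-1.0, 0.2, 1.0])

z = s.copy()
for _ in range(200):
    z = gfj_step(z, s, lam, W)
z_star = gfj_steady_state(s, lam, W)
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse explicitly, which is usually preferable numerically.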
2) The variational Friedkin-Johnsen model - vFJ: Dandekar et al. [35] and Matakos et al. [34] use a variant of FJ that we call the variational Friedkin-Johnsen model (vFJ), whose update function can be found in (2) of Table I. According to this model, the current opinion of a node is the weighted average between its prejudice and the current opinion of the other nodes. Thus, in this variant of the FJ model, the current opinion of the node itself is not taken into account. We can formulate the expressed opinion in matrix form in the following way:

$$z(k+1) = (D + \tilde{A})^{-1} \bigl( A\, z(k) + \tilde{A}\, s \bigr), \qquad (6)$$

where $D$ is the diagonal degree matrix (with $\sum_j \hat{w}_{ij}$ as the $i$-th diagonal element), $A$ is the adjacency matrix (whose $(i, j)$ element is $\hat{w}_{ij}$ and whose diagonal is null) and $\tilde{A}$ is a diagonal matrix whose $i$-th diagonal entry is equal to $\hat{w}_{ii}$. To model stubborn nodes, we can allow $\hat{w}_{ii}$ to be equal to $\infty$. In this case, matrix $\tilde{A}$ contains infinite values and (6) should be treated as discussed in the SI Appendix. The relation between vFJ and gFJ has never been explicitly discussed in the related literature, where the two are implicitly treated as interchangeable and generically referred to as the Friedkin-Johnsen model. However, the two models are not mathematically equivalent: the vFJ does not include node $i$'s current opinion $z_i$ in the averaging process, while gFJ pools both the initial opinion $s_i$ and the current opinion $z_i$. The different flexibility of the two models becomes clear when observing that, while the vFJ only features the matrix $\hat{W} = (\hat{w}_{ij})$ as parameters of the model (leading to a maximum of $n^2$ degrees of freedom, with $n = |V|$), the gFJ also includes matrix $\Lambda$, for a total of $n^2 + n$ degrees of freedom. From a practical point of view, however, the only difference between the two models is the parameter $w_{ii}$, which, in gFJ, takes into account node $i$'s opinion $z_i$ in the averaging process, as we will see in the proof of Corollary 8.
3) The restricted Friedkin-Johnsen model - rFJ: The vFJ model with $\hat{w}_{ii}$ set to 1, as in (3) of Table I, is very popular in the related literature, mainly due to its mathematical tractability; we call it the restricted Friedkin-Johnsen model (rFJ). The model has been used in [27], [26], [29], [30], [38]. The main difference between the rFJ and the vFJ model is the absence of the weight for $s_i$, so the parameters are only $\hat{w}_{ij}$ for all $i \neq j$, thus implying $n^2 - n$ degrees of freedom. Note that, since the weights $\hat{w}_{ij}$ are free to vary ($\hat{w}_{ij} \geq 0$), it is impossible to control the susceptibility (i.e., the importance of one's own initial opinion), even indirectly. A common matrix formulation of the rFJ model is the following:

$$z(k+1) = (D + I)^{-1} \bigl( A\, z(k) + s \bigr), \qquad (7)$$

where $D$ and $A$ are defined as described for vFJ. The solution to the above problem can be written as $z = (L + I)^{-1} s$, where $L = D - A$ is the Laplacian matrix. The formulation of the rFJ model is particularly convenient from a mathematical standpoint (since, for an undirected social graph, $L + I$ is symmetric and many useful matrix formulas leverage symmetry), and this is the reason why it has been so often used in the related literature.
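The closed-form rFJ solution can be checked numerically as follows. This is a minimal sketch: the function name and the social weights are hypothetical, and the node-level fixed-point rule used for the check, $z_i = (s_i + \sum_j \hat{w}_{ij} z_j)/(1 + \sum_j \hat{w}_{ij})$, is our reading of "vFJ with $\hat{w}_{ii} = 1$".

```python
import numpy as np

def rfj_steady_state(W_hat, s):
    """rFJ solution z = (L + I)^{-1} s, with L = D - A the graph Laplacian."""
    A = np.array(W_hat, dtype=float)
    np.fill_diagonal(A, 0.0)            # adjacency matrix with null diagonal
    D = np.diag(A.sum(axis=1))          # diagonal degree matrix
    L = D - A
    return np.linalg.solve(L + np.eye(len(s)), np.asarray(s, dtype=float))

# Illustrative undirected social graph (symmetric weights, no self-loops).
W_hat = np.array([[0.0, 2.0, 0.0],
                  [2.0, 0.0, 1.0],
                  [0.0, 1.0, 0.0]])
s = np.array([1.0, 0.0, -1.0])
z = rfj_steady_state(W_hat, s)

# Fixed-point check against the assumed node-level averaging rule.
z_check = (s + W_hat @ z) / (1.0 + W_hat.sum(axis=1))
```

Because the example graph is symmetric, the columns of $L$ sum to zero and the sum of the final opinions equals the sum of the prejudices.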
4) The matrix representation of the FJ model: In the whole set of FJ models, the final opinion $z$ of the opinion formation process can be expressed as $z = Hs$, where $H$ is a matrix that varies depending on the specific FJ version considered; the corresponding formulas are summarized in Table II. In the remainder of the paper, we will see that these matrices are the key to the analysis of FJ polarization.

B. Polarization metrics
In an opinion formation process, polarization is observed when a target index $\Phi$ (e.g., any of the indices in Definition 2) increases between the initial opinion and the final opinion of the nodes. A rigorous definition is provided in the following. Note that Definition 1 below is essentially an abstraction of the polarization definitions in the related literature. In fact, while related works typically focus on a specific polarization metric and define polarization based on it, here we abstract the metric into the variable $\Phi$ and we provide a general definition that holds for all the polarization metrics discussed later on in Definition 2.

Definition 1 (Polarization). For a polarization index $\Phi$, we say that the opinion formation model $M$ is $\Phi$-polarizing, or polarizing for $\Phi$, if there exists at least one initial opinion vector $s$ such that the corresponding final opinion vector $z$ satisfies the following inequality:

$$\Phi(z) > \Phi(s). \qquad (11)$$

In this case, we say that $s$ yields $\Phi$-polarization and we call it a polarizing vector or polarizing prejudice; its induced polarization is measured in terms of the polarization shift, i.e., by the function $\Delta_\Phi$ defined as:

$$\Delta_\Phi(s) = \Phi(z) - \Phi(s). \qquad (12)$$

If the model $M$ is not polarizing, we say that it is $\Phi$-depolarizing or depolarizing for $\Phi$.
Please observe that the definitions of polarizing and depolarizing model $M$ are not symmetric: to depolarize, a model $M$ should let opinions evolve in such a way that, at the end of the process, $\Phi$ has not increased for any possible choice of internal opinions $(s_i)$; instead, $M$ is polarizing if $\Phi$ increases for at least one initial opinion vector $s$. The asymmetry is justified by the importance of determining whether a model can capture the polarization phenomenon, which means that it does so in at least one case. Please note that, for the sake of brevity, in the following we may simply refer to the opinion vector as opinion, omitting the word "vector". For the polarization index $\Phi$, the related literature has explored several different metrics, each capturing a different property of an opinion vector. Below we have collected the most popular definitions, for which we provide a short discussion.

Definition 2. For an opinion $x = (x_i) \in [-1, 1]^n$ the following polarization indices are defined: NDI, GDI, $P_1$, $P_2$, $P_3$, and $P_4$.

The Network-Disagreement Index (NDI) [35], [29], [27], [30], [34] is the sum, over all nodes, of the weighted disagreement in each node pair, which represents (except for the division by $n$) the average disagreement in the network as a whole. NDI is the only topology-dependent metric, in the sense that the same opinions may give rise to a completely different NDI depending on how the vertices are connected. The Global Disagreement Index (GDI) [35] measures the conflict between all the users in the network, regardless of whether they share a social link or not. $P_1$ [29], corresponding to the mean-centered 2-norm of opinions, measures the polarization as a deviation of the opinions from the average. The definitions of $P_2$ [34] and $P_3$ [30], instead, interpret polarization as the deviation from complete neutrality, represented by the value 0 (the middle ground between the two extremes $-1$ and $1$). Finally, $P_4$ is referred to as total absolute opinion and has been introduced by Friedkin and Johnsen [21]. While all previous indices are related to 2-norms, the total absolute opinion is equivalent to the 1-norm. Similarly to $P_2$ and $P_3$, the index $P_4$ measures the "absolute total" opinion in the network and has the same semantics: it measures the deviation from neutrality (represented by 0).

While not directly a measure of polarization, the concept of choice shift caused by the opinion formation process (see definition below) is sometimes used in the related literature as an intermediate step in gauging the direction towards which opinions move.
Definition 3 (Choice shift). A choice shift occurs when the mean attitude of the group at the end is different from the mean attitude at the beginning:

$$\frac{1}{n} \sum_{i \in V} z_i \neq \frac{1}{n} \sum_{i \in V} s_i. \qquad (19)$$

The choice shift has been analyzed, for rFJ, by Gionis et al. in [26], where it is found that, if the social graph is undirected ($\hat{w}_{ij} = \hat{w}_{ji}$), changing the graph topology will not produce a choice shift. In the following, we will discuss whether this finding carries over to gFJ and under which conditions.

1) Polarization invariants: The above polarization indices have been introduced in the literature mostly as standalone metrics. In the remainder of the section, we establish equivalence relationships among them (Lemmas 1-2) and we derive a polarization invariant (Lemma 3).

Lemma 1. It holds that $GDI(x) = |V| \cdot P_1(x)$, thus the two metrics GDI and $P_1$ are equivalent.
Proof. See SI Appendix.
Lemma 2. It holds that $P_3(x) = |V| \cdot P_2(x)$, thus the two metrics $P_2$ and $P_3$ are equivalent.
Proof. Unlike Lemma 1, the proof is trivial and the claim follows directly from Definition 2.
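Lemmas 1 and 2 can be checked numerically with the sketch below. The closed forms used for GDI and $P_1$-$P_4$ are assumptions: they are our reading of the verbal descriptions in Definition 2, with normalization constants chosen so that the two lemmas hold as stated. NDI is left out because its normalization cannot be inferred from the description alone.

```python
import numpy as np

def gdi(x):
    """Assumed GDI: sum over unordered node pairs of (x_i - x_j)^2."""
    x = np.asarray(x, dtype=float)
    diff = x[:, None] - x[None, :]
    return 0.5 * np.sum(diff ** 2)      # each unordered pair counted once

def p1(x):
    """Assumed P1: squared 2-norm of the mean-centered opinions."""
    x = np.asarray(x, dtype=float)
    return np.sum((x - x.mean()) ** 2)

def p2(x):
    """Assumed P2: mean squared opinion (deviation from neutrality 0)."""
    return np.mean(np.asarray(x, dtype=float) ** 2)

def p3(x):
    """Assumed P3: squared 2-norm of the opinions."""
    return np.sum(np.asarray(x, dtype=float) ** 2)

def p4(x):
    """Assumed P4: 1-norm of the opinions (total absolute opinion)."""
    return np.sum(np.abs(np.asarray(x, dtype=float)))

x = np.array([-0.8, 0.1, 0.4, 1.0])
n = len(x)
lemma1_gap = gdi(x) - n * p1(x)
lemma2_gap = p3(x) - n * p2(x)
```

Under these assumed forms, Lemma 1 is the classical identity $\sum_{i<j}(x_i - x_j)^2 = n \sum_i (x_i - \bar{x})^2$, where $\bar{x}$ is the mean opinion.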
Leveraging the results above, we can classify the polarization indices into four main equivalence classes (Table III), in the sense that the behavior of a model is invariant within each class. The four classes capture four different concepts of polarization. However, they are linked by the following important invariant, which will be used in the next section and whose proof is given in the SI Appendix.

Lemma 3 (Polarization invariant). For all opinion vectors $x$, the following inequality holds:

From this relation, the following corollary follows, whose proof is provided in the SI Appendix.

Corollary 1. When there is no choice shift, the polarization moves in the same direction for both $P_1$ and $P_3$.

Corollary 1 says that, when there is no choice shift, polarization with $P_1$ implies polarization with $P_3$ and vice versa. A straightforward remark is that, in all the cases when the choice shift is null (for example, when opinions are positive and $W$ is symmetric, as shown by Gionis et al. [26]), the Dispersion and Absolute classes of polarization are identical and represent the only global class of polarization.
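To see why a null choice shift ties $P_1$ and $P_3$ together, it may help to write the standard decomposition of the squared 2-norm around the mean. The sketch below assumes $P_1(x) = \sum_i (x_i - \bar{x})^2$ and $P_3(x) = \sum_i x_i^2$, with $\bar{x} = \frac{1}{|V|}\sum_i x_i$ (our reading of Definition 2); it is an illustration under those assumptions and not necessarily the exact form in which Lemma 3 is stated.

```latex
\begin{align*}
P_3(x) &= \sum_{i \in V} x_i^2
        = \sum_{i \in V} (x_i - \bar{x})^2 + |V|\,\bar{x}^2
        = P_1(x) + |V|\,\bar{x}^2, \\
P_3(z) - P_3(s) &= \bigl(P_1(z) - P_1(s)\bigr) + |V|\,\bigl(\bar{z}^2 - \bar{s}^2\bigr).
\end{align*}
```

When there is no choice shift, $\bar{z} = \bar{s}$ and the last term vanishes, so the two polarization shifts coincide and, in particular, have the same sign.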

DEPOLARIZING
We start by focusing on the most general Friedkin-Johnsen model, the gFJ, and we investigate whether in this case the dynamics of the process lead to polarization or not.

A. Polarization under NDI
The first result is about the local polarization captured by the NDI index.
Theorem 2 (gFJ: local polarization with NDI). The gFJ model is always depolarizing with respect to NDI, in the sense that, for every prejudice $s$, we have that $NDI(z) \leq NDI(s)$.
Proof. As stated in Theorem 1, the gFJ model converges to the vector $z$ obtained from $z = (I - \Lambda W)^{-1}(I - \Lambda)s$. For each node $i$, consider the cost function $f_i$ in (21), which penalizes opinion $z_i$ if it is far from $s_i$ ($i$'s initial prejudice) and from $\sum_{j=1}^{n} w_{ij} z_j$ (the mean opinion of $i$'s neighborhood). We can prove that the expressed opinion $(z_i)_i$ of gFJ provided by (5) is the Nash equilibrium of cost function (21) (for details, please refer to the SI), i.e., $z_i$ minimizes $f_i$ for all $i$, so we obtain that $NDI(z) \leq NDI(s)$ and, as a consequence, it follows that the gFJ is NDI-depolarizing.
The result described above is intuitive: by definition, gFJ captures the willingness of each node to reduce the conflict (weighted by the matrix $W$) caused by the discordance of opinions with its neighbours, which is exactly what NDI measures. For this reason, the gFJ model is depolarizing in a local sense; this, however, does not imply anything about global polarization. On the contrary, we will prove that gFJ can be polarizing at the global level depending on the interplay between the social network weights and the nodes' susceptibility to the opinion of others. This is a key result, since it proves that gFJ does capture the polarization phenomenon in social networks.
B. Polarization under $P_2$, $P_3$, and $P_4$
We start by deriving the conditions under which gFJ is polarizing for the global metrics $P_2$, $P_3$, and $P_4$ (the proof is provided in the SI Appendix).
Theorem 3 (gFJ: global polarization with $P_2$, $P_3$, $P_4$). gFJ is polarizing with $P_2$, $P_3$, $P_4$ if and only if matrix $H_g$ defined in (8) is not doubly stochastic (i.e., a square nonnegative matrix, each of whose rows and columns sums to 1). Furthermore, we can distinguish the following two cases: (i) if there are naive nodes (i.e., $\exists i \in V : \lambda_i = 1$), matrix $H_g$ is never doubly stochastic and thus gFJ is polarizing; (ii) if there are no naive nodes (i.e., $\forall i \in V$, $\lambda_i < 1$), matrix $H_g$ is not doubly stochastic, and equivalently gFJ is polarizing with $P_2$, $P_3$, $P_4$, if and only if the following condition holds true for at least one node $i \in V$:

$$\sum_{j \in V} \frac{\lambda_j}{1 - \lambda_j}\, w_{ji} \neq \frac{\lambda_i}{1 - \lambda_i}. \qquad (22)$$

Intuitively, the fact that $H_g$ is not doubly stochastic is a measure of the presence of nodes that are more influential than others. This is straightforward to see in the case of naive nodes (Theorem 3.(i)), where all the non-naive nodes play the role of influencers (because they are always able to sway the naive nodes' opinions towards theirs), potentially increasing the polarization. When there are no naive nodes, the intuition behind Theorem 3 is more difficult to grasp. Let us separate the effect of social influence from that of individual susceptibility. To isolate the former, let all nodes have the same susceptibility $\lambda$. Since (22) then reduces to $\sum_{j \in V} w_{ji} \neq 1$, $W$ not being doubly stochastic becomes the condition for polarization; conversely, no polarization arises when the social influence exerted by every node $i$ equals the social influence it receives. However, in the general case, pure social influence is dampened by individual susceptibility: stubborn nodes are not swayed, regardless of the social influence they are subject to. The condition in (22) exactly captures this interplay between susceptibility and social influence.
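The criterion of Theorem 3 is easy to test numerically. The sketch below builds $H_g = (I - \Lambda W)^{-1}(I - \Lambda)$, checks whether it is doubly stochastic, and, for networks without naive nodes, compares the outcome with a column-balance condition obtained by requiring $\mathbf{1}^T H_g = \mathbf{1}^T$, namely $\sum_j \mu_j w_{ji} = \mu_i$ with $\mu_i = \lambda_i/(1-\lambda_i)$. This balance condition is our own rearrangement and should be treated as an assumption; function names and parameter values are illustrative.

```python
import numpy as np

def gfj_matrix(lam, W):
    """H_g = (I - Lambda W)^{-1} (I - Lambda)."""
    n = len(lam)
    return np.linalg.solve(np.eye(n) - lam[:, None] * W, np.diag(1.0 - lam))

def is_doubly_stochastic(H, tol=1e-9):
    """Nonnegative entries, and every row and column sums to 1."""
    return bool(np.all(H >= -tol)
                and np.allclose(H.sum(axis=1), 1.0, atol=tol)
                and np.allclose(H.sum(axis=0), 1.0, atol=tol))

def balance_holds(lam, W, tol=1e-9):
    """Check sum_j mu_j w_ji == mu_i for all i, with mu = lam / (1 - lam).
    Only meaningful without naive nodes (all lam_i < 1)."""
    mu = lam / (1.0 - lam)
    return bool(np.allclose(W.T @ mu, mu, atol=tol))

W = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])            # doubly stochastic influence matrix
lam_uniform = np.full(3, 0.5)              # same susceptibility for all nodes
lam_mixed = np.array([0.9, 0.5, 0.2])      # heterogeneous susceptibilities

H_uniform = gfj_matrix(lam_uniform, W)
H_mixed = gfj_matrix(lam_mixed, W)
sigma_max_mixed = np.linalg.norm(H_mixed, 2)   # largest singular value
```

With uniform susceptibility and a doubly stochastic $W$, $H_g$ is doubly stochastic and the model is depolarizing for $P_2$, $P_3$, $P_4$. With heterogeneous susceptibilities on the same graph, the column sums of $H_g$ deviate from 1 and its largest singular value exceeds 1, so some prejudice vector has its 2-norm amplified.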
When gFJ is depolarizing, it is also unable to produce choice shift, as the following corollary states.

Corollary 2. When $H_g$ is doubly stochastic and thus gFJ is depolarizing with $P_2$, $P_3$, $P_4$, for all opinion vectors $s$ it holds that $P_4(z) = P_4(s)$ and $\sum_i z_i = \sum_i s_i$.

1) How to find polarizing opinion vectors: In Theorem 3 we have derived the necessary and sufficient condition for gFJ to be polarizing. We can give a first characterization of the polarizing vectors with $P_2$, $P_3$, and $P_4$: they can always be chosen with concordant entries (i.e., $\mathrm{sgn}(s_i) = \mathrm{sgn}(s_j)$, $\forall i, j$). For a vector $x = (x_i)_i$, we denote by $x_{abs}$ the vector with nonnegative entries given by $x_{abs} = (|x_i|)_i$. This is stated by the following theorem.
Theorem 4 ($P_2$, $P_3$, $P_4$ polarizing vectors can be obtained with concordant entries). Whenever the model is polarizing with $P_i$, $i = 2, 3, 4$, and $s$ is a polarizing opinion vector, the vectors $\pm s_{abs}$ (which have concordant entries) are polarizing vectors inducing a polarization greater than or equal to that of $s$. Furthermore, if the network has no naive nodes and $\Lambda W$ is irreducible (i.e., the graph induced by $\Lambda W$ is strongly connected), the polarizing opinion vector that maximizes the polarization has concordant entries.
This result is important and also rather intuitive: since the $P_{2,3,4}$ polarization captures the shift from a neutral state (close to 0) to an extreme state (close to $1$ or $-1$), the polarization calculated on the same vector with all the entries made concordant must be greater or equal, because it is easier for nodes to cooperatively move toward the corresponding extreme. Instead, when entries are discordant, nodes have to mediate between discordant opinions and thus are less free to move in one of the two directions. This always occurs if the nodes are susceptible and non-stubborn; otherwise, there would be disconnected communities and the cooperation would be impossible (this is what the conditions of the second part guarantee).
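The first part of Theorem 4 can be illustrated numerically for $P_3$. The sketch below assumes $P_3(x) = \|x\|_2^2$ (our reading of Definition 2) and uses a randomly generated network, so all parameter values are hypothetical. The key fact is that $H_g$ has nonnegative entries, hence $|(H_g s)_i| \leq (H_g s_{abs})_i$ for every node $i$, while $s$ and $s_{abs}$ have the same norm.

```python
import numpy as np

def delta_p3(H, s):
    """Polarization shift for P3 (assumed squared 2-norm): ||Hs||^2 - ||s||^2."""
    return np.sum((H @ s) ** 2) - np.sum(s ** 2)

rng = np.random.default_rng(0)
n = 5

# Random row-stochastic influence matrix and susceptibilities in (0.1, 0.9).
W = rng.random((n, n))
W /= W.sum(axis=1, keepdims=True)
lam = rng.uniform(0.1, 0.9, size=n)

# H_g = (I - Lambda W)^{-1} (I - Lambda)
H = np.linalg.solve(np.eye(n) - lam[:, None] * W, np.diag(1.0 - lam))

s = rng.uniform(-1.0, 1.0, size=n)   # prejudice with mixed signs
s_abs = np.abs(s)                    # same magnitudes, concordant signs

shift_mixed = delta_p3(H, s)
shift_concordant = delta_p3(H, s_abs)
```

Whatever the random draw, `shift_concordant` is never smaller than `shift_mixed`, and the same holds for `-s_abs` by symmetry.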
We can go one step further and provide (Theorems 5-6 below) concrete cases of initial opinion vectors under which gFJ is polarizing with $P_2$, $P_3$, and $P_4$. In the specific case of $P_2$ and $P_3$, we prove that finding the prejudice vector that yields maximum polarization, i.e., the maximum of the function $\Delta_{P_{2,3}}$ defined in (12), is NP-hard, so we also discuss a possible approximation algorithm (Corollary 3).
Theorem 5 (gFJ: polarizing initial opinions for $P_2$, $P_3$). Whenever the model is polarizing for $P_2$, $P_3$ (i.e., according to the conditions of Theorem 3), the polarizing prejudices $s_{B_2(1)}$, $s_{B_2(t)}$, $s^{\max}_{P_{2,3}}$ can be derived as follows.
(i) Two polarizing prejudices $\pm s_{B_2(1)}$ correspond to the unit-norm eigenvectors associated with the largest eigenvalue of matrix $H_g^T H_g$, and they correspond to the point of local maximum for the $P_2$, $P_3$-polarization on the $L_2$-ball of radius 1, $B_2(1) = \{x \in [0, 1]^n : \|x\|_2 \leq 1\}$. In particular, it holds exactly that $\Delta_{P_3}(\pm s_{B_2(1)}) = \sigma_1^2 - 1 = \|H_g\|_2^2 - 1$.
(ii) Two further polarizing prejudices $\pm s_{B_2(t)}$ are obtained by rescaling, $s_{B_2(t)} = t \, s_{B_2(1)}$ with $t = 1 / \max_i (s_{B_2(1)})_i$, the denominator denoting the largest entry of $s_{B_2(1)}$. In particular, its polarization is exactly $t^2$ times the polarization of $s_{B_2(1)}$. Both these vectors have concordant entries.
(iii) The global maximum for $P_2$, $P_3$-polarization is achieved for the initial opinion vectors $\pm s^{\max}_{P_{2,3}} = \pm \sum_i \alpha_i v_i$, whose components $\alpha = (\alpha_i)$ can be obtained as the solution to the following optimization problem:

$$\max_{\alpha} \; \sum_{i=1}^{n} (\sigma_i^2 - 1)\, \alpha_i^2 \quad \text{subject to} \quad 0 \leq B\alpha \leq 1, \qquad (23)$$

where $\sigma_1, \ldots, \sigma_n$ are the singular values of $H_g$, $\alpha = (\alpha_1, \ldots, \alpha_n)^T$ is the vector of the coefficients that express $s^{\max}$ with respect to the basis $\mathcal{B} = \{v_1, \ldots, v_n\}$ composed of the unit-norm eigenvectors of $H_g^T H_g$, and $B$ is the matrix whose columns are the vectors of $\mathcal{B}$. The constraint guarantees that the solution $s^{\max}_{P_{2,3}}$ has nonnegative entries (and $-s^{\max}_{P_{2,3}}$, respectively, nonpositive ones) and is a proper opinion vector in $[-1, 1]^n$ with concordant entries. This optimization problem, being a non-convex quadratic program, is NP-hard.
Corollary 3 below tells us that, in case matrix $H_g$ has more than one singular value greater than one, it is possible to design sub-problems of the optimization problem described in (23) over spaces larger than $B_2(t)$ but smaller than the entire domain. These sub-problems are convex quadratic programs and can be solved in polynomial time. Depending on the size of the network, numerical solutions may still not be found. Thus, we have designed a heuristic that always finds a solution $\pm s^{heu}_{V_{>1}}$ whose polarization is greater than that of $\pm s_{B_2(t)}$. The corresponding derivations can be found in the SI Appendix.
Corollary 3 (gFJ: polarizing initial opinions for $P_2$, $P_3$ on the subspaces $V_{>1}$ and $V_{\geq 1}$). When matrix $H_g$ has more than one singular value greater than one, it is possible to design sub-problems of the optimization problem in (23) over $V_{>1}$ (the vector space generated by the eigenvectors associated with the singular values strictly greater than 1) and over $V_{\geq 1}$ (the vector space generated by the eigenvectors associated with the singular values greater than or equal to 1). These sub-problems yield polarizing vectors $s_{V_{>1}}$ and $s_{V_{\geq 1}}$, respectively, and they are convex quadratic programs (with polynomial time complexity). A heuristic that always finds a solution $\pm s^{heu}_{V_{>1}}$ is proposed.
With Theorem 5 and Corollary 3, we are able to identify the initial opinion vectors $\pm s^{\max}_{P_{2,3}}$, $\pm s_{B_2(1)}$, $\pm s_{B_2(t)}$, $\pm s_{V_{>1}}$, $\pm s_{V_{\ge 1}}$ leading to polarization maxima on the corresponding subspaces. While computing the opinion $\pm s^{\max}_{P_{2,3}}$ yielding the global maximum is an NP-hard problem (Theorem 5.(iii)), an approximate solution can be obtained using standard numerical solvers (though not in all cases, as we discuss in the Experimental Evaluation section). The local polarization maxima are found by restricting the problem to the subspaces spanned by the eigenvectors of $H_g^T H_g$ associated with singular values strictly greater than one, or greater than or equal to one. In particular, the vectors $\pm s_{B_2(t)}$ of Theorem 5.(ii) (which are scalar multiples of $\pm s_{B_2(1)}$ in Theorem 5.(i)) are the vectors that maximize the $P_2, P_3$-polarization on the space generated by the eigenvectors $\pm s_{B_2(1)}$ of $H_g^T H_g$ (also denoted by $v_1$ in Theorem 5.(iii)) that correspond to the singular value $\sigma_1 > 1$. The vectors $\pm s_{V_{>1}}$ in Corollary 3 maximize the polarization on the larger subspace $V_{>1}$ generated by all the eigenvectors that correspond to the singular values strictly greater than one. Finally, the vectors $\pm s_{V_{\ge 1}}$ in Corollary 3 maximize the $P_2, P_3$-polarization on the even larger subspace $V_{\ge 1}$ generated by all the eigenvectors that correspond to the singular values greater than or equal to one. Since these vectors are polarization maxima over nested subspaces, inequality (24), which orders their polarization shifts from $\pm s_{B_2(1)}$ up to $\pm s^{\max}_{P_{2,3}}$, follows immediately for $\Phi = P_2, P_3$. While the results in Theorem 5 and Corollary 3 do not have an immediate practical interpretation, we can get the gist of them with a simple numerical example. Consider a network composed of three nodes (a naive node A, a node B with susceptibility value equal to 0.5, and a stubborn node C) with mutual weights equal to 0.5.
Applying Theorem 5, we obtain $s_{B_2(1)} = (0, 0.30, 0.95)$ and, dividing by 0.95 as in Theorem 5.(ii), we obtain $s_{B_2(t)} = (0, 0.31, 1)$, which leads to the final opinion vector $(0.8, 0.61, 1)$. The prejudice of the naive node A is opposite to that of the stubborn node C, and A's opinion shifts significantly (from 0 to 0.8). The opinion of the intermediate node B approximately doubles. The opinion vector achieving maximum polarization, $s^{\max}_{P_{2,3}}$, is instead $(0, 0.75, 1)$, whose corresponding final opinion is $(0.95, 0.89, 1)$. In this case, the combined effect of the non-naive nodes' strong prejudices pushes A's final opinion to the opposite extreme. In a sense, it is as if $s_{B_2(t)}$ (which only takes into account one singular value of $H$) selected the prejudice that maximizes the shift by leveraging only the most influential node (node C). Instead, $s^{\max}_{P_{2,3}}$ (which yields the global maximum) is able to enforce a synergy between the non-naive nodes. In this simple case, since $H$ has only one singular value greater than 1, we cannot obtain the vectors $s_{V_{>1}}$ and $s_{V_{\ge 1}}$.
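The three-node example can be retraced numerically with the minimal sketch below. It rests on two assumptions that are not spelled out in this section: that the gFJ steady state takes the standard Friedkin-Johnsen form $z = (I - \Lambda W)^{-1}(I - \Lambda)s$, with $\Lambda$ the diagonal matrix of susceptibilities, and that "mutual weights equal to 0.5" means each node places weight 0.5 on each of its two peers and none on itself. Variable names are illustrative.

```python
import numpy as np

# Susceptibilities: A naive (1), B intermediate (0.5), C stubborn (0).
Lam = np.diag([1.0, 0.5, 0.0])
# Assumed influence matrix: weight 0.5 on each peer, no self-weight.
W = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
I = np.eye(3)
# Assumed steady-state map: z = H s with H = (I - Lam W)^{-1} (I - Lam).
H = np.linalg.solve(I - Lam @ W, I - Lam)

# Unit eigenvector of H^T H for its largest eigenvalue (eigh sorts ascending).
eigvals, eigvecs = np.linalg.eigh(H.T @ H)
s_b21 = eigvecs[:, -1]
if s_b21.sum() < 0:
    s_b21 = -s_b21               # pick the representative with positive entries
s_b2t = s_b21 / s_b21.max()      # rescale so that the largest entry equals 1
z = H @ s_b2t                    # final opinions reached from s_b2t

print(np.round(s_b21, 2), np.round(s_b2t, 2), np.round(z, 2))
print(np.round(np.linalg.svd(H, compute_uv=False), 2))  # singular values of H
```

Under these assumptions, the rounded values agree with $s_{B_2(1)}$, $s_{B_2(t)}$ and the final opinion vector quoted above, and $H$ has a single singular value above 1.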
Theorem 6 (gFJ: polarizing vectors for $P_4$). Whenever the model is polarizing for $P_4$ (i.e., according to the conditions of Theorem 3), the following hold true.
(i) Two prejudice vectors $\pm s_{B_1(1)}$ that yield $P_4$-polarization are the $j$-th vector of the standard basis in $\mathbb{R}^n$ (i.e., a vector whose components are all zero, except the $j$-th, which equals 1) and its opposite, where $j = \arg\max_j \sum_i h_{ij}$ (i.e., $j$ is the index of the column of $H_g = \{h_{ij}\}_{ij}$ with the greatest column sum). This prejudice vector is also the point of maximum of $P_4$-polarization on the 1-norm ball $B_1(1) = \{x \in [0, 1]^n : \|x\|_1 \le 1\}$, and its polarization is exactly given by $\Delta_{P_4}(\pm s_{B_1(1)}) = \|H_g\|_1 - 1$.
(ii) With the same notation as in Theorem 5, the global maximum for $P_4$-polarization is achieved for the initial opinion vectors $\pm s^{\max}_{P_4} = \pm \sum_i \alpha_i v_i$ with concordant entries, whose components $\alpha = (\alpha_1, \dots, \alpha_n)^T$ can be obtained as the solution to a dedicated optimization problem. This optimization problem is a linear program that can be solved numerically.
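The closed-form choice in Theorem 6.(i) amounts to picking the column of the final-opinion matrix with the largest sum. A minimal sketch follows; the function name `p4_basis_prejudice` and the example matrix are hypothetical, and the matrix is assumed to have non-negative entries so that the induced 1-norm coincides with the largest column sum.

```python
import numpy as np

def p4_basis_prejudice(H):
    """Return (j, s, shift) for the standard-basis prejudice of Theorem 6.(i)."""
    col_sums = H.sum(axis=0)             # sum_i h_ij for every column j
    j = int(np.argmax(col_sums))         # column with the greatest column sum
    s = np.zeros(H.shape[0])
    s[j] = 1.0                           # j-th vector of the standard basis
    shift = np.linalg.norm(H, 1) - 1.0   # induced 1-norm = max absolute column sum
    return j, s, shift

# Hypothetical row-stochastic matrix with non-negative entries.
H = np.array([[0.0, 2/7, 5/7],
              [0.0, 4/7, 3/7],
              [0.0, 0.0, 1.0]])
j, s, shift = p4_basis_prejudice(H)
print(j, s, shift)
```

A positive `shift` means that some column sum exceeds 1, which can only happen when the matrix is not doubly stochastic.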
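As observed for $P_2, P_3$-polarization, the polarization shift achieved by $\pm s_{B_1(1)}$ cannot exceed the one achieved by the global maximizers $\pm s^{\max}_{P_4}$, as stated by inequality (26).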

C. Polarization under P 1 and GDI
We conclude the analysis of gFJ by studying the polarization under P 1 and GDI. For this case, Theorem 7 asserts that whenever gFJ does not polarize in P 2 , P 3 , P 4 , it does not polarize in P 1 , GDI either. Instead, when gFJ is polarizing in P 2 , P 3 , P 4 , we can guarantee that it also polarizes in P 1 , GDI only if the sufficient condition in Theorem 7 is satisfied. Again, the proof of the theorem below can be found in the SI Appendix.
Theorem 7 (gFJ: global polarization with $P_1$, GDI). For the polarization indices $P_1$ and GDI, the following results hold: (i) if gFJ is depolarizing for $P_2, P_3, P_4$, then it is also depolarizing for $P_1$, GDI; (ii) gFJ is polarizing in $s$ if a sufficient condition on $\alpha$ holds true, where $\alpha = (\alpha_1, \dots, \alpha_n)^T$ is the expression of $s$ in terms of the basis $\mathcal{B}$ of the unit eigenvectors of $H_g^T H_g$.
D. The role of stubborn and naive nodes
We now show (Corollary 4 below, proof in the SI Appendix) a general result regarding stubborn nodes (i.e., nodes whose opinion is not at all swayed by that of their peers, which translates into $\lambda_i = 0$), which do not have a direct impact on polarization. In fact, we will see that, even if their strong anchoring attitude would intuitively suggest that they always have an effect on the final opinion, the network structure can instead invalidate it.
Corollary 4. While naive nodes tend to make polarization easier, stubborn nodes do not have a clear directional effect on the polarization with P 2 , P 3 , P 4 .
Leveraging Theorems 3 and 7, we can also study a special case involving naive nodes. This result, whose proof can be found in the SI Appendix of this paper, emphasizes the role of naive nodes (the ones with λ i =1), which essentially forget their prejudice and move their opinion towards the opinion of the other nodes.
Corollary 5. Let us assume that the set of nodes $V$ is composed of two disjoint groups, $I$ and $J$, such that all non-naive nodes have the same opinion $\tau$, while the naive nodes' opinions are free to vary in $[-1, 1]$, or equivalently: $\forall j \in J:\ \lambda_j < 1,\ s_j = \tau$. (28) Then, the final opinion $z$ is exactly the vector $z = \tau \mathbf{1}$. In addition, this configuration is never polarizing for $P_1$ and GDI, while, as long as $|s_i| < 1$ for at least one node $i$, there always exists a value of $\tau$ such that $P_2$, $P_3$, and $P_4$ are polarizing.
IV. VFJ POLARIZES WHEN GFJ DOES
As already observed, in vFJ the opinion of a node $i$ at step $k$ does not take into account its own opinion at step $k - 1$ (as happens, instead, with gFJ, which weighs it with $w_{ii}$). Thus, from a mathematical standpoint, the two models are different. However, apart from this specific contribution (i.e., in the case $w_{ii}$ is null), vFJ can be manipulated to yield exactly the same polarization as gFJ, albeit in an indirect and less intuitive way. In fact, in gFJ the susceptibility parameter directly captures the innate tendency of a node to be influenced (and to which degree) by others. In vFJ, instead, the rate at which a node is influenced by its peers is captured by: (i) the social strength of the node with all its neighbours, $\hat{w}_{ij}$; (ii) the anchoring degree of the node itself, $\hat{w}_{ii}$, i.e., the importance it assigns to its initial prejudice.
Theorem 8 below establishes a complete equivalence, in terms of polarization properties, between gFJ and vFJ.
Theorem 8 (vFJ: local and global polarization). For all polarization metrics, the vFJ model yields polarization under exactly the same conditions as gFJ. Specifically, if we replace the matrix $H_g$ with the vFJ matrix $H_v$ and we set $\hat{w}_{ii} = 0$ for naive nodes (if present), the results of Theorems 3-7 and Corollaries 5-6 hold true. In particular, the condition for $H_v$ not being doubly stochastic reduces from (22) to condition (29).
Proof. The proof consists in the derivation of vFJ from gFJ, which can be done through a mapping between the parameters of the two models. Thus, the thesis follows from the results obtained for gFJ.
Condition (29) simplifies when the social graph is undirected (which corresponds to the matrix $\hat{W}$ being symmetric). As stated in Corollary 6 below, in that case, when the self-weights are identical for all nodes (i.e., $\hat{w}_{ii} = \hat{w}$, $\forall i$), vFJ is never polarizing in any metric and the average opinion is invariant to the opinion formation process.
Corollary 6 (vFJ on undirected social graphs). When the social graph is undirected (i.e., the matrix $\hat{W}$ is symmetric), vFJ is polarizing with $P_2$, $P_3$ and $P_4$ if and only if the $\hat{w}_{ii}$ are not identical for all $i$. When $\hat{w}_{ii} = \hat{w}$, $\forall i$, vFJ is never polarizing in any metric and it holds that $\sum_i z_i = \sum_i s_i$, i.e., there is never a choice shift in the network and the average final opinion is the same as the average initial opinion.

NETWORKS
In this section, we derive the polarization results for the rFJ model. We already know from Bindel et al. [27] that rFJ does not polarize according to the local definition NDI, and, from Gionis et al. [26], that in the specific case of an undirected social graph it does not polarize according to the global definition $P_4$. Here, we generalize these findings. To this aim, note that rFJ is equivalent to vFJ after setting $w_{ii} = 1$. Thus, Theorem 8 also applies in this case. The condition for $H_r$ (the equivalent of $H_g$ but for rFJ) not being doubly stochastic reduces from (29) to a simpler one. When the social graph is undirected, we obtain an even stronger result, summarized in Corollary 7 below.
Corollary 7 (rFJ on undirected social graphs). On undirected social graphs, the rFJ model is never polarizing, in any polarization metric, for any initial opinion vector. In addition, it holds that $\sum_i z_i = \sum_i s_i$, i.e., there is never a choice shift in the network and the average final opinion is the same as the average initial opinion.
Remark: while polarization was still possible under vFJ on undirected social graph, with rFJ polarization never happens. The practical implication of this result for undirected social graphs is that polarization (in all its variations) can never be induced "naturally" by an opinion formation process following rFJ. Even more interestingly, polarization cannot be induced by altering the social graph (as long as it stays symmetric). Thus, when an initial state s is given, the final state z with rFJ can only naturally evolve towards non-polarization. Vice versa, when the social graph is directed, the above result does not hold since, in a directed graph, the opinion of nodes with stronger social power tends to steer the opinion of the others. While relationship-oriented online social networks, like Facebook, tend to feature undirected graphs, directed social graphs are common in information-driven online social networks like Twitter.
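The statements of Corollaries 6 and 7 can be checked numerically on a small undirected graph. The sketch below assumes that the vFJ update averages the initial prejudice (weighted by the self-weight $\hat{w}_{ii}$) with the neighbours' current opinions (weighted by $\hat{w}_{ij}$), which gives the steady state $z = (D_{self} + L)^{-1} D_{self}\, s$ with $L$ the graph Laplacian; this closed form, the function name `vfj_matrix`, and the path graph are assumptions of the example.

```python
import numpy as np

def vfj_matrix(W_hat, self_w):
    """Assumed vFJ steady-state map: z = (D_self + L)^{-1} D_self s."""
    L = np.diag(W_hat.sum(axis=1)) - W_hat   # Laplacian of the social graph
    D_self = np.diag(self_w)                 # anchoring weights w_hat_ii
    return np.linalg.solve(D_self + L, D_self)

# Hypothetical undirected path graph on three nodes (symmetric weights).
W_hat = np.array([[0.0, 1.0, 0.0],
                  [1.0, 0.0, 1.0],
                  [0.0, 1.0, 0.0]])
s = np.array([0.9, -0.2, 0.4])

H_r = vfj_matrix(W_hat, np.ones(3))                 # rFJ: all self-weights equal to 1
H_v = vfj_matrix(W_hat, np.array([0.2, 1.0, 3.0]))  # vFJ with unequal self-weights

print(np.isclose((H_r @ s).sum(), s.sum()))   # is the sum of opinions preserved?
print(H_r.sum(axis=0), H_v.sum(axis=0))       # column sums of the two matrices
```

With identical self-weights every column of the matrix sums to 1, so the sum of the opinions is preserved; with unequal self-weights the column sums differ, and at least one of them exceeds 1.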

VI. EXPERIMENTAL EVALUATION
In this section we analyze the theoretical results on two real social network graphs: the Karate Club graph [37] and a Facebook graph [39]. The Karate Club dataset corresponds to an unweighted graph composed of 34 members. The Facebook dataset is a Facebook snapshot comprising 4039 users. Also in this case the graph is unweighted. After discarding isolated nodes (since they do not contribute at all to the opinion formation process), we end up with a network of 1519 nodes. With these datasets we obtain the values $\hat{w}_{ij}$ that describe the social links between different users. For both graphs, we obtain the influence matrix from the social matrix $\hat{W}$ by normalizing by rows, i.e., $w_{ij} = \hat{w}_{ij} / \sum_k \hat{w}_{ik}$. To proceed with the analysis we need to set the susceptibility values of the nodes, which are not fixed by the social network. To this aim, since both networks have a few very central nodes, as displayed in the SI Appendix, we decided to use a centrality measure to set them. In the following, we show the results obtained with the PageRank centrality, which is the centrality measure that best captures the influence among nodes [16], but similar results hold with other centrality measures (betweenness, degree, eigenvector and k-shell centrality). In our experiments, if the PageRank centrality of a node $i$ is $C_i$, we assign to $\lambda_i$ the value of $C_i$ (and, respectively, of $C_i^{-1}$) rescaled to $(0, 1)$, so that the more central the nodes (and, respectively, the less central), the higher their susceptibility values. Furthermore, we also show the case in which all nodes have the same susceptibility, set to 0.8. In the SI Appendix, we provide a visualization of the social networks we consider and of the susceptibility values obtained in this way for both datasets.
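The preprocessing described above can be sketched as follows. The row normalization follows the formula in the text, whereas the exact rescaling to $(0, 1)$ is not specified: the min-max rescaling with margin `eps`, the function names, the toy graph, and the centrality scores below are all assumptions of this example.

```python
import numpy as np

def row_normalize(W_hat):
    """Influence matrix w_ij = w_hat_ij / sum_k w_hat_ik (no isolated nodes)."""
    return W_hat / W_hat.sum(axis=1, keepdims=True)

def rescale_to_open_unit(c, eps=1e-3):
    """Min-max rescale scores into [eps, 1 - eps], a subset of (0, 1)."""
    c = np.asarray(c, dtype=float)
    unit = (c - c.min()) / (c.max() - c.min())
    return eps + (1 - 2 * eps) * unit

# Hypothetical unweighted social graph and centrality scores C_i.
W_hat = np.array([[0, 1, 1, 1],
                  [1, 0, 1, 0],
                  [1, 1, 0, 0],
                  [1, 0, 0, 0]], dtype=float)
C = np.array([0.40, 0.22, 0.22, 0.16])

W = row_normalize(W_hat)
lam_direct = rescale_to_open_unit(C)        # more central -> more susceptible
lam_inverse = rescale_to_open_unit(1 / C)   # less central -> more susceptible
lam_uniform = np.full(len(C), 0.8)          # same susceptibility for all nodes
print(W.sum(axis=1), lam_direct, lam_inverse, lam_uniform)
```

Keeping the rescaled values strictly inside $(0, 1)$ avoids creating fully stubborn or fully naive nodes as a side effect of the rescaling.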
We can now search for the initial opinion vectors that yield polarization in the social network, by applying Theorems 5-6 and Corollary 3. For the sake of brevity, in the following we consider only the positive polarizing vectors, but analogous results can be obtained for the negative ones, as stated in the corollaries. We compute the polarizing vectors $s_{B_2(1)}$, $s_{B_2(t)}$, $s^{\max}_{P_{2,3}}$, $s_{V_{>1}}$, $s^{heu}_{V_{>1}}$, $s_{B_1(1)}$, $s^{\max}_{P_4}$ as described in Theorems 5-6 and Corollary 3, and we compare their polarization with that of an opinion vector $s_{unif}$ with entries randomly drawn from a uniform distribution in $[0, 1]$. Table IV shows the polarization induced by the above vectors on the Karate social network, for the three susceptibility configurations we are considering. Recall that the polarization shift $\Delta_\Phi(s)$ for a given polarization metric $\Phi$ (with $\Phi = P_1, \dots, P_4$, NDI, GDI) and initial opinion $s$ is derived as $\Phi(Hs) - \Phi(s)$. When $\Delta_\Phi(s)$ is positive, gFJ polarizes in $s$. We can see in Table IV that the theoretical results are confirmed (this is not surprising, since our theorems are obtained without any approximation). A random prejudice vector $s_{unif}$ leads to depolarization for all the polarization metrics. Instead, the prejudices from Theorem 5 and Corollary 3 yield $P_2, P_3$-polarization. As expected, according to (24), their corresponding $P_2, P_3$-polarization shifts progressively increase moving from $s_{B_2(1)}$ to $s^{\max}_{P_{2,3}}$ (because the solution is searched for in a larger domain). Note that, since the network is small, the numerical solver is able to find the solutions $s^{\max}_{P_{2,3}}$ and $s_{V_{>1}}$ (the latter is not applicable to the case $\lambda_i = 0.8$, because its $H_g$ has only one singular value greater than 1). It is interesting to observe that the solution $s^{heu}_{V_{>1}}$ found with the heuristic is, in one case, exactly equal to the one obtained numerically ($s_{V_{>1}}$) and, in the other case, extremely close to it, which confirms the validity of the heuristic.
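The polarization shift $\Delta_\Phi(s) = \Phi(Hs) - \Phi(s)$ recalled above is straightforward to evaluate once a metric $\Phi$ is fixed. In the sketch below, the squared Euclidean norm is used purely as a stand-in metric (an assumption; the actual definitions of $P_1, \dots, P_4$, NDI and GDI are not reproduced here), and the matrix $H$ is a hypothetical example.

```python
import numpy as np

def polarization_shift(H, s, phi):
    """Delta_Phi(s) = Phi(H s) - Phi(s); a positive value means polarization."""
    return phi(H @ s) - phi(s)

def squared_norm(x):
    # Stand-in metric (an assumption), not one of the paper's definitions.
    return float(np.dot(x, x))

# Hypothetical row-stochastic final-opinion matrix.
H = np.array([[0.0, 2/7, 5/7],
              [0.0, 4/7, 3/7],
              [0.0, 0.0, 1.0]])

rng = np.random.default_rng(0)
s_unif = rng.uniform(0.0, 1.0, size=3)   # random prejudice with entries in [0, 1]
_, sigma, Vt = np.linalg.svd(H)
s_top = Vt[0]                            # unit right singular vector for sigma_1

print(polarization_shift(H, s_unif, squared_norm))
print(polarization_shift(H, s_top, squared_norm), sigma[0] ** 2 - 1)
```

For the stand-in metric, the shift of the top unit singular vector equals $\sigma_1^2 - 1$, which is the same closed form that Theorem 5.(i) reports for $P_3$.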
The prejudice vectors found according to Theorem 6, instead, yield $P_4$-polarization and satisfy the inequality in (26). With respect to $P_1$, GDI-polarization, while Theorem 7 cannot tell us whether polarization is achieved in general, we can use it to predict whether $P_1$, GDI-polarization is achieved with the same prejudices that yield $P_2, P_3$ or $P_4$ polarization. We find that the condition (sufficient for polarization) of Theorem 7 is verified only for the $P_2, P_3$-polarizing prejudices and $\lambda_i \propto C_i^{-1}$. The columns $\Delta_{P_1}$ and $\Delta_{GDI}$ of Table IV confirm polarization in these cases. Finally, as expected from Theorem 2, gFJ is always depolarizing in NDI.
Similar results are obtained with the Facebook network (Table V). Two points are worth emphasizing. First, the centrality of nodes in the Facebook graph is extremely skewed, with one very central node dominating the graph. Thus, when $\lambda_i \propto C_i$, there are very few susceptible nodes and polarization is harder to achieve. The opposite effect is observed when $\lambda_i \propto C_i^{-1}$, and the polarization shifts are higher. Second, note that, since the Facebook network is large, the global solutions ($s^{\max}_{P_{2,3}}$ and $s^{\max}_{P_4}$) could not be found numerically and $s_{V_{>1}}$ could only be obtained for $\lambda_i \propto C_i$. This example showcases the importance of the heuristics derived in the previous section, which can always return a polarizing vector.
We conclude this section by having a closer look at how polarizing prejudices are structured. In Figure 1a, each arrow corresponds to one node in the Karate graph, and it starts at its prejudice and ends at its final opinion. For P 2 , P 3 , P 4 , an increase in polarization is linked, intuitively, to some opinions moving from more neutral states (close to 0) to more extreme states (close to 1). Indeed, this is what happens in all the cases presented in the figure. In particular, the vectors s max P 2,3 and s max P 4 that maximize the polarization feature the maximum number of components with initial opinion equal to 1 (with respect to the other opinion vectors): in this way, the nodes  with more extreme opinions work synergistically to push the others' opinions closer to theirs. For selecting such an optimal "cooperative" group of extreme nodes, one should be able to search for a solution to the optimization problem within the entire domain of opinions. When this is not the case, only suboptimal polarization is achieved. For example, the vectors s B2(1) , s B2(t) , s B1(1) , only manage to select one single extreme node responsible for pushing the more neutral opinions of others, while s V >1 is in an intermediate position, being able to select more extreme nodes than s B2(t) and fewer than s max P 2,3 . We can also observe that in panels A and B of Figure 1a, where the susceptibility varies across nodes, the nodes with initial opinion 1 are always the most stubborn, so that they create a field of attraction for more susceptible nodes. Effectively, the susceptibility assigned to nodes overrides their centrality in the network, hence very central nodes can become attractors or attractees depending on how stubborn they are. 
Vice versa, when the susceptibility of all nodes is the same (panel C of Figure 1a), we observe the unfiltered effect of centrality: the most polarizing prejudices are those in which the most central nodes have initial opinions close to 1, and their final opinion changes much less than the others' opinions. This also confirms that the PageRank centrality is able to capture the ability of nodes to convince the others, and thus it identifies the most influential nodes.
In Figure 1b we can see the results obtained with the Facebook network. In this case, since the network is large, it is not possible to find the global solutions $s^{\max}_{P_{2,3}}$ and $s^{\max}_{P_4}$. However, the considerations we made for the Karate graph also hold in this case. In particular, the polarizing vectors assign to more stubborn nodes initial opinions closer to 1, so that they can influence the susceptible nodes to which they are connected. In the Facebook network, though, due to the scale-free topology with just a few hubs and many poorly connected nodes, we also observe very susceptible nodes that do not change their opinions (Figure 1b, panel B). These nodes typically have a single edge towards a stubborn node sharing their opinion.
VII. CONCLUSIONS
In this work we have investigated under which conditions the popular Friedkin-Johnsen model yields polarized opinions. The first contribution of the work has been to systematize the variety of FJ models used in the literature, and the many definitions of polarization. Then, as the main contribution of the work, we have derived the conditions under which the FJ models yield polarization, for each of the polarization classes identified from the related literature. Moreover, we have identified a methodology for obtaining polarizing prejudices in most cases. When exact solutions could not be found (because the corresponding problem was NP-hard), we have defined heuristics to find a sub-optimal solution. Our theoretical results have then been tested on two real-life social networks. We have seen that both the centrality of nodes in the social network and their individual susceptibility to the opinions of other nodes play a key role in defining their influence power, hence their ability to polarize.
The results presented in this work can be used to understand under which conditions polarization of opinions will emerge for a given social network. While the application to online social networks immediately comes to mind (as showcased in Section VI, the social graph can be collected from online social network platforms such as Twitter, Facebook, Reddit, etc.), other applications can be foreseen, such as failure mode and effect analysis in reliability engineering [40]. In addition, the results presented in this paper can be exploited to design interventions to bring polarization under control. More generally, since opinions in the FJ model are abstracted as values in the $[0, 1]$ or $[-1, 1]$ domain, the FJ model could be used to study information propagation, the evolution of decision processes, and consensus/polarization on networks, as long as the mapping onto the same unidimensional domain remains appropriate.