The Effects of Individuals’ Opinion and Non-Opinion Characteristics on the Organization of Influence Networks in the Online Domain

Abstract: The opinion dynamics literature argues that the way people perceive social influence depends not only on the opinions of interacting individuals but also on the individuals’ non-opinion characteristics, such as age, education, gender, or place of residence. The current paper advances this line of research by studying longitudinal data that describe the opinion dynamics of a large sample (~30,000) of online social network users, all citizens of one city. Using these data, we systematically investigate the effects of users’ demographic (age, gender) and structural (degree centrality, the number of common friends) properties on opinion formation processes. We find that females are less easily influenced than males, that individuals of similar ages are more likely to reach a consensus, and that individuals who have many common peers reach an agreement more often. We also demonstrate that these effects are of virtually the same size and, despite being statistically significant, are far weaker than those of opinion-related features: knowing the current opinion of an individual and, even more importantly, the distance in opinions between this individual and the person who attempts to influence them is much more valuable. Next, after conducting a series of simulations with an agent-based model, we find that accounting for non-opinion characteristics may lead to small but statistically significant changes in the macroscopic predictions of the populations of opinion camps, primarily among the agents with radical opinions (≈3% of all votes). In turn, predictions for the populations of neutral individuals are virtually the same. In addition, we demonstrate that the cumulative effect of non-opinion features on opinion dynamics is strongly moderated by whether the underlying social network correlates with the agents’ characteristics. After applying the procedure of random shuffling (in which the agents and their characteristics were randomly scattered over the network), the macroscopic predictions changed by ≈9% of all votes. Interestingly, the population of neutral agents was again not affected by this intervention.


Introduction
Social influence is perhaps one of the most intriguing and fascinating phenomena that affect our daily lives. In so-called opinion formation models (also known as social influence models), the social influence effects are captured by specific mathematical rules that outline how agents' opinions (operationalized as discrete or continuous quantities standing for agents' choices or subjective attitudes toward predefined controversial issues) are changed after being exposed to peers' opinions [1,2]. These models are able to explain a huge variety of macro-scale social phenomena, such as consensus, polarization, segregation, and the formation of echo chambers [1]. However, the empirical foundation behind opinion formation models is rather limited-the majority of models have never been tested against real data, with only a few of them being validated in empirical settings [3,4].
Over the past few years, the situation has slowly changed [5][6][7][8][9][10]. Partially, this becomes possible due to the large amount of open data available from online resources. In some situations, these data allow reconstructing opinion dynamics of Internet users-social media sites give a perfect opportunity to make unobtrusively repeated measurements of users' communications with high resolution. After being carefully pretreated (for example, users' opinions should be estimated using some opinion mining techniques), such information can be used to test hypotheses regarding social influence at both the qualitative and quantitative levels [11][12][13][14][15][16][17][18].
An extremely important issue in studies of social influence and information dissemination is to understand how influential a particular individual is (that is, how effectively they influence other people) and how strongly they are attached to their own opinion [19]. Empirical research suggests that individuals are not homogeneous in influence perception [9], and the way they distribute trust over their communication networks may depend on many factors, both demographic (age, gender, education level, etc.) and structural (the number of friends, the number of common friends, betweenness centrality, etc.). Moreover, the current opinion of an individual also affects the individual's level of confidence: more radical people tend to be more stubborn and less easily influenced [5].
Despite our advances in understanding the structure of trust in social networks, we still lack a systematic investigation and comparison of the various factors affecting how effectively individuals influence each other. In this paper, we attempt to address this problem by studying empirical longitudinal data derived from an online social network. These data provide detailed information on users' opinions (estimated based on users' digital footprints) as well as their structural (social ties) and demographic (age and gender) attributes. This gives us an opportunity to rigorously investigate and compare the effects of various factors, both opinion and non-opinion, that influence the way individuals distribute their trust across communication networks.
The rest of the paper is organized as follows. Section 2 reviews the relevant literature. Section 3 briefly describes the plan of our analysis. Section 4 introduces the empirical data. Section 5 outlines our notations and terminology. In Section 6, we investigate the data using regression analysis. Section 7 conducts a series of simulation experiments to test the results of the empirical analysis from Section 6. Section 8 discusses the results and makes concluding remarks. Appendices A-C provide auxiliary information.

Literature
We start the review of the relevant literature with the classical DeGroot model [20]. In this model, agents' opinions are represented on a continuous scale (for example, [0, 1]), and an agent $i$'s opinion at the next time moment, $x_i(t + 1)$, is defined as a convex combination of their current opinion $x_i(t)$ and the opinions of the agent's peers at the previous time moment:
$$x_i(t+1) = w_{i,i}\, x_i(t) + \sum_{j \in V_i} w_{i,j}\, x_j(t). \tag{1}$$
In Equation (1), $V_i$ is the list of $i$'s peers, and $w_{i,j}$ represents how strong the influence directed from $j$ to $i$ is. In turn, $w_{i,i}$ outlines how stubborn (self-confident) agent $i$ is. Since the combination is convex, the weights are nonnegative and sum to one. The quantities $w_{i,i}$ and $\{w_{i,j}\}_{j \in V_i}$ are usually called the influence weights; the weight $w_{i,i}$ is sometimes referred to as the self-weight. The set of all influence weights defines the influence network: a directed weighted graph whose edges represent how individuals influence each other and how individuals' trust is distributed among their peers. Although this terminology is usually applied within the framework of the DeGroot model (and other models that are extensions of the DeGroot model [21]), we will use the term "influence network" in a broader context: to describe how open individuals are to the influences from their peers and how strong their attachments to their own opinions are (without assuming that opinions evolve in accordance with the DeGroot model).
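To make the update rule concrete, the following is a minimal sketch of one synchronous DeGroot step in plain Python. It assumes the standard form of the model (each agent's new opinion is the weighted sum of its own and its peers' current opinions, with weights summing to one); the function name, the data layout, and the toy numbers are illustrative assumptions and are not taken from the paper.

```python
def degroot_step(opinions, weights):
    """One synchronous DeGroot update.

    opinions: dict agent -> opinion in [0, 1]
    weights:  dict agent -> {source: weight}; weights[i][i] is the self-weight,
              and every row is assumed to sum to one (convex combination).
    """
    new_opinions = {}
    for i, row in weights.items():
        total = sum(row.values())
        if abs(total - 1.0) > 1e-9:
            raise ValueError(f"weights of agent {i} sum to {total}, not 1")
        # Weighted average of the agent's own opinion and its peers' opinions.
        new_opinions[i] = sum(w * opinions[j] for j, w in row.items())
    return new_opinions


# Toy example (hypothetical numbers): agent 0 is fairly stubborn, agent 1 is not.
opinions = {0: 0.0, 1: 1.0}
weights = {0: {0: 0.8, 1: 0.2}, 1: {1: 0.5, 0: 0.5}}
updated = degroot_step(opinions, weights)
```

In this toy setting, the stubborn agent moves only slightly toward its peer, whereas the less self-confident agent moves halfway.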
Using this terminology, we paraphrase our main objective as follows: we will study how the users' influence weights depend on the nodal and structural characteristics of the networked social system.
A large line of studies is dedicated to the extraction of the influence network from repeated measurements of individuals' opinions [22]. A different line of research investigates how influence weights are linked to the intrinsic characteristics of people. At the moment, we already know a lot about how individuals perceive their peers' opinions in information exchanges. For example, we know that individuals with radical opinions tend to be more stubborn [5,16]. Further, we know that the level of stubbornness may vary across ideological groups [9]. An important observation is that younger individuals are considered to be more vulnerable to social influence [23]. Next, according to Refs. [24][25][26], females cooperate better than males, and thus we should hypothesize that females are more easily influenced. However, such differences may stem from status inequalities [27]-for example, according to empirical studies, males tend to have more friends and thus may perceive themselves as more valuable [28]. Further, we know that the perception of a message depends on how distant the message is from the focal individual in terms of the opinion space-too distant opinions may be less attractive [8], a phenomenon referred to as bounded confidence [29][30][31]. Next, empirical studies indicate that individuals that have common non-opinion features (such as age, place of residence, or culture) display more trust toward each other even if their opinions differ significantly [32]. In addition to this, structural similarity, measured, for example, as the number of common friends, fosters the propagation of opinions and ideas [33].
Further, an individual's perception of external information and openness to influence may depend on how influential the individual's opinions were in previous discussions, even if these discussions were dedicated to completely different topics (the theory of reflected appraisals) [34]. According to this theory, individuals whose opinions contributed most to previous conversations will reinforce their self-confidence in the next conversation and so on.
In the current paper, we investigate various factors that may affect the organization of influence networks. We will consider not only the effect of individuals' opinions but also demographic (age, gender) and structural (the number of friends, the number of common friends) effects. What is even more important, we will systematically compare the strength of these effects thus trying to figure out what factors have the greatest impact.

Overview of Our Analysis
Our analysis builds upon a longitudinal dataset that describes the opinion dynamics of a sample of online social network users. We use this dataset to investigate what factors govern opinion dynamics at the microscopic level. Using regression analysis, we discern statistically significant covariates (paying specific attention to non-opinion ones) and compare their effects. Those factors that are estimated to be significant are then employed in simulations with an agent-based model. These experiments are focused on comparing two models: the first one does not account for the non-opinion features, whereas the second one includes them. Both models are calibrated on the empirical data. We check the outcomes of these two models at the macroscopic level: our main focus is the public opinion states predicted by the models. The schematic representation of our analysis is presented in Figure 1.

Data
We investigate the dataset introduced for the first time in Ref. [28]. That dataset includes two snapshots of the online network VKontakte (VK), describing friendship-type connections and the opinions of a sample of N = 29,248 VK users, all citizens of the same city. The snapshots were made in March and September 2019. Information regarding users' ages and genders is also available. Users' opinions (on a political issue) were measured on the scale [0, 1] using information on users' subscriptions to information sources (public pages and bloggers) with the help of the methodology from Ref. [35]. It is worth noting that the sample was cleaned of any accounts that were employed in the opinion estimation procedure (to facilitate the independence of estimated opinions). The sample comprises adults (age > 17) with open VK accounts that were active at least once per month during the observation period. A further filter retains only users with no fewer than 10 and no more than 200 followers (this ensures the highest accuracy of the opinion estimations). As in Ref. [28], we focus on the giant connected component, which includes approximately 95% of all vertices. As a result, we end up with a sample of N = 27,861 users. For more information regarding the dataset, we refer the reader to Ref. [28] (Sections 4 and 5, and Appendix B). Further, Figure A10 (Appendix C in the current manuscript) presents some histograms that help the reader to understand the organization of the data.

Notations and Terminology
We denote the network snapshots by $G(t_1) = (V, E(t_1), x(t_1))$ and $G(t_2) = (V, E(t_2), x(t_2))$, where $V$ represents the sample users and $E(t_1)$, $E(t_2)$ stand for the edges between them at times $t_1$ (March 2019) and $t_2$ (September 2019). Correspondingly, the vectors $x(t_1)$ and $x(t_2)$ outline the estimated users' opinions. For a user $i$, their opinion is denoted by $x_i(t) \in [0, 1]$, and their age and gender are signified as $\mathrm{age}_i \in \mathbb{N}$ and $\mathrm{gender}_i \in \{1, 2\}$ (1: females, 2: males), respectively. Throughout the paper, the set of natural numbers from 1 to $m \in \mathbb{N}$ is denoted by $[m]$. To denote the cardinality of a set, we use the notation $\#\{\ldots\}$. The number of user $i$'s friends at time $t$ is $f_i(t) = \#\{j \mid (i, j) \in E(t)\}$. The number of peers users $i$ and $j$ have in common at time $t$ is given by $f_{i,j}(t) = \#\{k \mid (i, k) \in E(t),\ (j, k) \in E(t)\}$.
Following Ref. [10], we will say that a positive opinion shift is undertaken if it is directed toward the opinion of the influence source. If an opinion shift is directed oppositely, then it is negative.
If an individual j influences an individual i, then we will say that i is an influence object (focal individual) and j is an influence source.
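The structural quantities introduced in this section (the number of friends and the number of common friends) can be computed from an edge list as sketched below. The sketch assumes that friendship ties are undirected and stored as pairs; the helper names and the toy edge list are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Turn an undirected edge list into a dict: user -> set of friends."""
    adj = defaultdict(set)
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    return adj


def degree(adj, i):
    """Number of friends of user i."""
    return len(adj[i])


def common_friends(adj, i, j):
    """Number of users that are friends of both i and j."""
    return len(adj[i] & adj[j])


# Toy network (hypothetical): a triangle 1-2-3 with a pendant user 4 attached to 3.
edges = [(1, 2), (1, 3), (2, 3), (3, 4)]
adj = build_adjacency(edges)
```

Storing neighbors as sets keeps the common-friends count a single set intersection per pair.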

Map of Opinion Shifts
We first need to answer the following question: given the opinions of two befriended users at time $t_1$, can their opinions at time $t_2$ be predicted? To answer this question, we discretize the opinion scale [0, 1] into $m$ aggregated opinion values $\Xi_1, \ldots, \Xi_m$, and then the probabilities of all possible opinion changes $\Xi_s \to \Xi_k$ (including static ones $\Xi_s \to \Xi_s$) are computed across all possible influencing opinions $\Xi_l$. These probabilities are captured by the quantities $\{p_{s,l,k}\}_{s,l,k \in [m]}$, where for given $s$, $l$, and $k$, the variable $p_{s,l,k}$ measures the probability that opinion $\Xi_s$ will be changed to $\Xi_k$ after being influenced by opinion $\Xi_l$ (see Appendix A for details on their computation). The quantities $p_{s,l,k}$ can be informatively grouped into $m$ square row-stochastic matrices $P_1, \ldots, P_m$, where $P_s = [p_{s,l,k}]_{l,k \in [m]}$ for a fixed $s$ showcases how users with opinion $\Xi_s$ react to peers' opinions. The organization of matrix $P_s$ is schematically presented in Figure 2. In Figure 3, we depict the values of $p_{s,l,k}$ estimated from our empirical data. Within such an encoding strategy, in matrix $P_s$, the $s$-th column contains the self-confidence rates of users with opinion $\Xi_s$ across different values of the influence source opinion. Among the patterns visible in these estimates are the following:
- Positive shifts tend to feature the assimilative influence mechanism, whereby more distant opinions induce positive responses with larger probabilities.
- Individuals with the right radical opinion $\Xi_5$ (see the matrix $P_5$ in Figure 3) display a tendency to distrust too distant opinions (also known as moderated bounded confidence).
It is worth noting that all these observations are in line with the previous empirical studies on opinion dynamics [3,5,8,10,15,16].
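As an illustration of how such a map of opinion shifts could be estimated, the sketch below counts, for every ordered pair of friends, the triple (object's opinion at $t_1$, source's opinion at $t_1$, object's opinion at $t_2$) and normalizes the counts row by row. This is only a plausible frequency estimator under stated assumptions: the actual procedure is described in Appendix A, and the equal-width binning, the 0-based indices, and all names below are hypothetical.

```python
def to_bin(x, m):
    """Map an opinion in [0, 1] to a 0-based index among m equal-width bins."""
    return min(int(x * m), m - 1)


def estimate_transitions(edges, x1, x2, m):
    """Estimate p[s][l][k]: share of objects with opinion s, exposed to a source
    with opinion l at t1, that hold opinion k at t2."""
    counts = [[[0] * m for _ in range(m)] for _ in range(m)]
    for i, j in edges:
        # Each edge is used twice: i as object / j as source, and the reverse.
        for obj, src in ((i, j), (j, i)):
            s, l, k = to_bin(x1[obj], m), to_bin(x1[src], m), to_bin(x2[obj], m)
            counts[s][l][k] += 1
    probs = [[[0.0] * m for _ in range(m)] for _ in range(m)]
    for s in range(m):
        for l in range(m):
            total = sum(counts[s][l])
            if total:  # rows without observations are left as zeros
                probs[s][l] = [c / total for c in counts[s][l]]
    return probs


# Toy data (hypothetical): users 0 and 2 start on the left, user 1 on the right.
edges = [(0, 1), (2, 1)]
x1 = {0: 0.1, 1: 0.9, 2: 0.2}
x2 = {0: 0.6, 1: 0.9, 2: 0.2}
probs = estimate_transitions(edges, x1, x2, m=2)
```

For each fixed `s`, the nested list `probs[s]` plays the role of the matrix $P_s$: row `l` is the estimated distribution over the new opinion `k`.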

Effects of Non-Opinion Characteristics on Opinion Shifts
In the previous analysis, we completely ignored the possible effects of non-opinion characteristics on opinion changes. As noted in the Introduction, the way individuals perceive information could largely depend on the non-opinion characteristics of the interacting agents, such as age, gender, or the number of common friends. Let us now shed light on this issue. For this purpose, we employ the quantity
$$y_{i,j} = \bigl(x_i(t_2) - x_i(t_1)\bigr)\,\operatorname{sign}\bigl(x_j(t_1) - x_i(t_1)\bigr)$$
as a dependent variable. It measures the magnitude of an opinion shift $x_i(t_1) \to x_i(t_2)$ subject to its direction: if the shift is pointing towards the source's opinion $x_j(t_1)$, then the dependent variable is positive. Otherwise, $y_{i,j}$ is negative. By doing so, we want to find out what conditions increase the likelihood that the opinion stimuli will receive a positive response (will induce a positive opinion shift).
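The dependent variable can be computed per ordered pair as sketched below. The sketch assumes the form $y_{i,j} = (x_i(t_2) - x_i(t_1)) \cdot \operatorname{sign}(x_j(t_1) - x_i(t_1))$, which matches the verbal description (the magnitude of the shift, positive when directed toward the source); the function names are illustrative.

```python
def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def directed_shift(xi_t1, xi_t2, xj_t1):
    """Signed opinion shift of object i relative to source j.

    Positive if i moved toward j's initial opinion, negative if i moved away.
    Under the assumed formula, the value is zero when both opinions coincide at t1.
    """
    return (xi_t2 - xi_t1) * sign(xj_t1 - xi_t1)


toward = directed_shift(0.2, 0.3, 0.8)   # object moves toward the source
away = directed_shift(0.2, 0.1, 0.8)     # object moves away from the source
```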
The list of independent variables is presented in Table 1, together with our intuition regarding the effects of these covariates on the dependent variable. Apart from the covariates that measure different sorts of similarity (structural, as in the case of the number of common friends, or demographic, as in the case of the differences in age or gender), we also control for various characteristics of the focal node $i$ (influence object). This allows us to discern any specific nodal-level effects.
Figure 2. Organization of matrix $P_s = [p_{s,l,k}]_{l,k \in [m]}$ for a fixed $s \in [m]$. Each of its rows sums up to one as it covers all possible alternatives. Let us consider the $l$-th row that outlines how individuals with opinion $\Xi_s$ react to opinion $\Xi_l$. Overall, there are $m$ possible alternatives: $\Xi_s \to \Xi_1, \ldots, \Xi_s \to \Xi_m$. The resulting estimated probabilities of these alternatives are described by the quantities $p_{s,l,1}, \ldots, p_{s,l,m}$. Other rows are elaborated analogously. We would like to emphasize that row $s$ stands for the situations when the influence comes from the same opinion $\Xi_s$, whereas column $s$ contains the probabilities of holding the current opinion $\Xi_s$. In this regard, the elements of the $s$-th column (the quantities $p_{s,1,s}, \ldots, p_{s,m,s}$) represent how stubborn individuals with opinion $\Xi_s$ are. According to the previous empirical research [5,9,16], we should expect that this column will dominate the others. In matrix $P_s$, the components standing for positive and negative opinion shifts are easily located. The zone of positive shifts is defined as follows: $D_p = \{(l, k) \mid l < s,\ k < s\} \cup \{(l, k) \mid l > s,\ k > s\}$, where $l$ and $k$ are the row and column indices, respectively. In other words, $D_p$ is the union of the second and fourth "quadrants", given that the origin of coordinates is located at $l = k = s$ (the component $p_{s,s,s}$). Correspondingly, the zone of negative shifts is the union of the first and third "quadrants": $D_n = \{(l, k) \mid l > s,\ k < s\} \cup \{(l, k) \mid l < s,\ k > s\}$.
To develop a corpus of observations, we take each edge $(i, j) \in E$ and then calculate the quantities introduced above by considering first $i$ as an influence object and $j$ as an influence source and then the reverse. As a result, each user $i$ may potentially appear in $2 f_i$ observations (due to the filters specified below, not all observations will participate in the further analysis).
The weakness of such an approach is that each edge (i, j) is considered (twice) in isolation, thus ignoring potential influence from the rest of the peers of i and j. However, the influences of these peers are also added to the corpus as independent observations (we will return to this confounding factor in Section 6). As in the case of the computation of the probabilities of opinion changes presented in Figure 3, this approach tacitly assumes that during the observation period, each user was influenced by each of their friends exactly once, with all these influences being independent of each other. In fact, this type of interaction (also known as one-to-one) is widely adopted in opinion formation models (see Refs. [29,36,37,38]). However, the assumption that each pairwise interaction has happened is quite strict. We will return to this point in Section 6.


Table 1. Independent variables (i: influence object, j: influence source).

| Independent Variable | Definition | Hypothesis |
|---|---|---|
| $\Delta x_{i,j}$ | The absolute difference in opinions | $\Delta x_{i,j}$ should have a positive effect on the dependent variable: from Figure 3, it follows that more distant opinions tend to be more attractive. |
| $\Delta\mathrm{age}_{i,j} = \lvert \mathrm{age}_i - \mathrm{age}_j \rvert$ | The absolute difference in age | $\Delta\mathrm{age}_{i,j}$ should have a negative effect on the dependent variable: a non-opinion similarity stimulates the decrease in opinion discrepancies [32]. |
| $\Delta\mathrm{gender}_{i,j}$ | Indicates whether two users have different genders ($\Delta\mathrm{gender}_{i,j} = 1$) or not ($\Delta\mathrm{gender}_{i,j} = 0$) | $\Delta\mathrm{gender}_{i,j}$ should have a negative effect on the dependent variable: a non-opinion similarity stimulates the decrease in opinion discrepancies [32]. |
| $f_i$ | The nodal degree of the influence object | $f_i$ may have a negative effect on the dependent variable: we hypothesize that individuals who have many friends should perceive themselves as more valuable and as having a higher social status and thus should be more attached to their own views [27]. |
| $f_{i,j}$ | The number of common friends | $f_{i,j}$ should have a positive effect on the dependent variable: strong ties are more effective in conducting social influence [33]. |
| $x_i(t_1)$ | The opinion of the influence object | $x_i(t_1)$ should have a negative effect on the dependent variable: from Figure 3, it follows that users whose opinions are close to the right endpoint of the opinion spectrum are more stubborn. |

To facilitate the comparability of the covariates' effects, we standardize the data so that each factor has zero mean and unit variance. Further, we exclude from our analysis those observations $(i, j)$ that violate the following restriction: $\lvert x_j(t_2) - x_j(t_1) \rvert < 0.05$ (the influence source's opinion should not undergo significant changes during the observation period; otherwise, we cannot precisely locate its value). In fact, if in a pair of connected vertices $(i, j)$ one vertex (say, $i$) has changed its opinion by more than 0.05, then the inverse pair $(j, i)$ will not appear in the corpus of observations. Further, if both vertices have substantially modified their opinions, then the tie is completely ignored.
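The filtering and standardization steps can be sketched as follows. The sketch keeps an ordered pair (object, source) only if the source's opinion moved by less than 0.05 between the snapshots, and rescales a covariate to zero mean and unit variance. The use of the population standard deviation, as well as all function and parameter names, are assumptions made for illustration.

```python
from statistics import mean, pstdev


def build_observations(edges, x1, x2, max_source_shift=0.05):
    """Return ordered (object, source) pairs whose source opinion is stable."""
    observations = []
    for i, j in edges:
        for obj, src in ((i, j), (j, i)):
            if abs(x2[src] - x1[src]) < max_source_shift:
                observations.append((obj, src))
    return observations


def standardize(values):
    """Rescale a covariate to zero mean and unit (population) variance."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant covariate")
    return [(v - mu) / sigma for v in values]


# Toy data (hypothetical): user 0 changes opinion substantially, user 1 does not.
x1 = {0: 0.2, 1: 0.8}
x2 = {0: 0.4, 1: 0.8}
obs = build_observations([(0, 1)], x1, x2)   # only (object 0, source 1) survives
z = standardize([1.0, 2.0, 3.0])
```

Because user 0 moved by 0.2, the pair in which user 0 acts as the influence source is dropped, in line with the restriction described above.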
As a result of such filtering, we end up with a corpus of 390,149 observations. However, preliminary analysis (see Table 2) revealed that age is highly correlated with other covariates. On this basis, we decided to exclude it from the list of independent variables.
To investigate the effects of the independent variables on $y_{i,j}$, we run an Ordinary Least Squares (OLS) regression. Table 3 shows the results of the OLS regression, and Figure 4 depicts the estimated values of the regression coefficients. We see that all the covariates, except the absolute difference in gender and the nodal degree, appear to have significant effects on the dependent variable. Further, the effects of those covariates that were estimated to be significant (at the level of 0.05) coincide with our prior intuition (see Table 1). A surprising exception is that females appeared to be less easily influenced than males, other things being equal. From Figure 4, we conclude that the opinion-related covariates ($x_i(t_1)$ and $\Delta x_{i,j}$) have the strongest effects on the opinion formation processes (the strongest effect is that of $\Delta x_{i,j}$). The contributions of the other covariates are far weaker and roughly similar to each other. The small values of the estimated coefficients stem largely from the fact that for most observations, the dependent variable is zero or small (see Figure A10 in Appendix C).
Figure 4. Estimated values of the regression coefficients (see Table 3) plotted with the corresponding 95% confidence intervals. Significance codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 ' ' 1.

Motivation
The results presented in the previous section indicate that although some non-opinion features have significant effects on the way opinions change at the microscopic level, their impacts are not very strong. As such, the question arises whether accounting for these features will change the macro-level behavior of the social system at stake. To answer this question, we perform auxiliary simulations with an agent-based opinion dynamics model using the empirical data from the dataset (see Section 4) to calibrate the model's parameters.

Agent-Based Models
As a workhorse model, we employ the one from Ref. [38]. That model was specially designed to investigate and simulate the patterns of opinion dynamics of empirical systems. In this model, agents connected by a social network are initially endowed with opinions from an abstract opinion alphabet $\Xi = \{\Xi_1, \ldots, \Xi_m\}$. At each moment $t = 1, 2, 3, \ldots$, a randomly chosen agent $i$ communicates with one of their neighbors $j$ in the social network (who is also chosen at random). Let us assume that the opinions of $i$ and $j$ are $\Xi_s$ and $\Xi_l$, respectively. As a result of the communication, agent $i$'s opinion is stochastically updated according to the distribution $(p_{s,l,1}, \ldots, p_{s,l,m})$, in which the quantity $p_{s,l,k}$ stands for the probability that the opinion shift $\Xi_s \to \Xi_k$ will occur (the subscripts are synchronized with the indices of the underlying opinions $\Xi_s$, $\Xi_l$, and $\Xi_k$). After that, the next iteration begins, and so on.
The quantities p s,l,k form a 3D mathematical object P = [p s,l,k ] s,l,k∈[m] that is called the transition matrix. In some cases, it is convenient to represent the transition matrix as a list of 2D row-stochastic matrices P 1 , . . . , P m , where P s = [p s,l,k ] l,k∈ [m] . In fact, previously, we have already worked with this kind of object-see Figures 2 and 3. However, on that occasion, we used such a construction to investigate the already existing opinion dynamics. For now, we employ the transition matrix to develop our own opinion dynamics with the aim of predicting the further evolution of the empirical system at stake.
The macroscopic behavior of the model can be captured by the variables Y 1 (t), . . . , Y m (t) that describe the populations of opinion camps Ξ 1 , . . . , Ξ m at time t. Further, we will employ the normalized versions of these quantities: y s (t) = Y s (t)/N. Below, the model introduced above will be referred to as the Basic Model.
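The update protocol of the Basic Model and the normalized camp populations $y_s(t)$ can be sketched as follows. This is a minimal illustration, not the implementation from Ref. [38]: opinions are stored as 0-based indices, the transition matrix is a nested list `P[s][l][k]`, and the function names, the seeding, and the toy network are assumptions.

```python
import random


def simulate_basic_model(opinion, neighbors, P, steps, seed=0):
    """Run `steps` one-to-one interactions and return the final opinions.

    opinion:   dict agent -> 0-based opinion index
    neighbors: dict agent -> list of neighbors
    P:         P[s][l] is the distribution over the new opinion index k
    """
    rng = random.Random(seed)
    opinion = dict(opinion)          # do not modify the caller's state
    agents = list(neighbors)
    for _ in range(steps):
        i = rng.choice(agents)       # randomly chosen influence object
        if not neighbors[i]:
            continue                 # an isolated agent cannot be influenced
        j = rng.choice(neighbors[i])  # randomly chosen influence source
        s, l = opinion[i], opinion[j]
        opinion[i] = rng.choices(range(len(P[s][l])), weights=P[s][l])[0]
    return opinion


def camp_shares(opinion, m):
    """Normalized populations of the m opinion camps."""
    n = len(opinion)
    return [sum(1 for o in opinion.values() if o == s) / n for s in range(m)]


# Toy run (hypothetical): a path 0-1-2 and a matrix under which nobody changes opinion.
start = {0: 0, 1: 1, 2: 0}
neighbors = {0: [1], 1: [0, 2], 2: [1]}
identity_P = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]]
final = simulate_basic_model(start, neighbors, identity_P, steps=100)
```

With the identity-like matrix used here, the state never changes, which gives a simple sanity check before plugging in calibrated probabilities.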
However, in its current form, the Basic Model assumes that the outcome of an interaction depends on the interacting agents' opinions only, thus ignoring the effects of non-opinion covariates, which, as reported in Section 6.2, have some impact on opinion dynamics. To account for this issue, instead of applying a single transition matrix, we will use several ones, with each transition matrix dedicated to the description of its own specific combination of non-opinion characteristics of the interacting agents.
To be more specific, motivated by the results obtained in Section 6.2, we introduce two new features of agents (we use the same notations as in Section 5): (i) age i ∈ N and (ii) gender i ∈ {1, 2} (1-females, 2-males). For now, the opinion dynamics protocol is organized as follows. Similar to the Basic Model, at each time moment t, a randomly chosen agent i is influenced by one of their friends j (chosen by chance). Let us assume that the opinions of i and j are Ξ s and Ξ l , respectively. For now, we postulate that the outcome of this interaction depends not only on the opinions of the communicating agents but also on (i) how different the agents are in terms of age (∆age i,j = age i − age j ), (ii) the number of friends i and j have in common (denoted by f i,j ), and, finally, (iii) the gender of the influence object (agent i). Depending on the values of these variables, each pair of interacting agents is assigned to one of the eight possible types (see Table 4). For each type f ∈ [8], a specific transition matrix P f = p f s,l,k s,l,k∈ [m] is recruited (which has the same properties as the transition matrix in the Basic Model).
The resulting model is called the Advanced Model. Note: the value of 5 that discretizes the range of $\Delta \mathrm{age}_{i,j}$ was chosen as the mean of this variable (the 60th percentile). (Initially, we wanted to use the median value of 3 to discretize the range of $\Delta \mathrm{age}_{i,j}$, but we decided that this threshold is too small to serve as a separator between ages. However, the results remain virtually the same with the threshold of 3.) The threshold value 1 that separates the range of $f_{i,j}$ is the median.
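The assignment of a pair to one of the eight types can be illustrated with a short sketch. It assumes that the age difference is taken in absolute value and uses the thresholds quoted above (5 for the age difference, 1 for the number of common friends). Only type 1 (age difference at most 5, female influence object, no common friends) is fixed by formula (A2); the numbering of the remaining types below is a hypothetical encoding and need not coincide with Table 4.

```python
AGE_THRESHOLD = 5      # separator for the age difference (from the text)
FRIENDS_THRESHOLD = 1  # separator for the number of common friends (from the text)


def interaction_type(age_i, age_j, gender_i, common_friends):
    """Map an interaction (agent i influenced by agent j) to a type in 1..8.

    gender_i follows the paper's coding: 1 for females, 2 for males.
    The ordering of types 2..8 is hypothetical, chosen only for illustration.
    """
    age_far = abs(age_i - age_j) > AGE_THRESHOLD
    has_common = common_friends >= FRIENDS_THRESHOLD
    male = gender_i == 2
    # Three binary flags give 2 * 2 * 2 = 8 combinations.
    return 1 + int(has_common) + 2 * int(male) + 4 * int(age_far)
```

With such a mapping, the Advanced Model differs from the Basic Model only in which transition matrix is consulted at each interaction.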

Simulation Design
Now, we compare the macroscopic predictions (captured by the variables $y_s(t)$) of the Basic and Advanced Models. We use the empirical data from the dataset to calibrate the models' parameters: (i) the initial characteristics of agents, (ii) the social network, and (iii) the transition matrices. To project the opinion scale $[0, 1]$ (on which the empirical opinions were estimated) onto the opinion alphabet $\Xi$, we discretize the range $[0, 1]$ into three subintervals $[0, 0.33)$, $[0.33, 0.66)$, $[0.66, 1]$ standing for a three-element opinion alphabet $\Xi = \{\Xi_1, \Xi_2, \Xi_3\}$ ($m = 3$). Next, we use the dataset snapshots to calibrate the transition matrices by replicating the algorithm presented in Appendix A. In the case of the Basic Model, non-opinion covariates are not accounted for, and we end up with one transition matrix (see Appendix B, Figure A1). For the Advanced Model, eight transition matrices are constructed (see Appendix B, Figures A2-A9).
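The projection of a continuous opinion onto the three-element alphabet can be sketched as below. The boundaries 0.33 and 0.66 come from the text; the function name `opinion_class` and the zero-based class indices (0, 1, 2 standing for $\Xi_1$, $\Xi_2$, $\Xi_3$) are assumptions made for the example.

```python
from bisect import bisect_right

# Interior boundaries of the subintervals [0, 0.33), [0.33, 0.66), [0.66, 1].
BOUNDARIES = (0.33, 0.66)


def opinion_class(x):
    """Return 0, 1, or 2 for an opinion x in [0, 1] (Xi_1, Xi_2, Xi_3)."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("opinion must lie in [0, 1]")
    # bisect_right places a boundary value into the interval to its right,
    # which matches the half-open intervals used in the text.
    return bisect_right(BOUNDARIES, x)
```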
For each model, we conduct 20 independent simulations. Each simulation starts from the same state defined by the first snapshot of the dataset ($y_1 = 0.45$, $y_2 = 0.47$, $y_3 = 0.08$). The network structure and the agents' characteristics are taken directly from the empirical data. To isolate the effect of the correlations between the network and the agents' features (Ref. [28] revealed that the underlying networked system is assortative with respect to the opinion, age, and gender covariates), we also randomly shuffle the nodes of the network (keeping the nodes' characteristics fixed) so that all the correlations between the nodal characteristics and the network disappear. (It is worth noting that this procedure does not suppress the correlations at the nodal level: for example, even after the shuffling, younger agents will be biased towards the right opinion $\Xi_3$.) As a result, we end up with four possible Scenarios (see Table 5). Each simulation lasts 4,000,000 iterations: pilot runs revealed that this time ensures that the system reaches equilibrium in terms of the populations of the opinion camps. (To be more specific, equilibrium is reached at $t \approx$ 2,000,000. To understand what this time span stands for, one should recall that in our case, one Monte Carlo step (30,000 iterations) corresponds to the observational period ( months). From this perspective, according to the model, the empirical social system should reach an equilibrium in ≈33.5 years. Of course, such estimates are unlikely to apply to real-life processes because the underlying empirical system is not closed and is subject to external influences that affect its development.)

Figure 5 compares the aggregated results of simulations across Scenarios 1-4. We see that when the non-opinion covariates are ignored, the shuffling procedure does not affect the macroscopic behavior of the model: in both cases, the system ends up in the equilibrium state $y_1 = 0.33$, $y_2 = 0.47$, $y_3 = 0.2$.
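One possible way to implement the shuffling procedure described above is to relabel the endpoints of every edge with a random permutation while leaving the per-agent attribute arrays untouched. The sketch below follows this idea; the edge-list representation and the names `shuffle_nodes` and `degree_sequence` are assumptions made for illustration and may differ from the authors' implementation.

```python
import random


def shuffle_nodes(edges, n_agents, rng):
    """Relabel the nodes of the network with a random permutation.

    The topology (and hence the degree sequence) is preserved, but each
    agent, whose attributes stay attached to its index, lands at a random
    position in the network.
    """
    perm = list(range(n_agents))
    rng.shuffle(perm)
    return [(perm[u], perm[v]) for u, v in edges]


def degree_sequence(edges, n_agents):
    """Return the sorted list of node degrees, used here as a sanity check."""
    deg = [0] * n_agents
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return sorted(deg)
```

Because only the labels are permuted, the marginal distributions of age, gender, and opinion are unchanged, while any assortativity of the network with respect to these attributes is destroyed on average.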
However, simulations with the Advanced Model yielded two important observations. First, without random shuffling (Scenario 3), the Advanced Model stabilizes around the distribution $y_1 = 0.35$, $y_2 = 0.46$, $y_3 = 0.19$, which differs from the prediction of the Basic Model by ≈1000 agents (4% of all votes). Second, topologies with suppressed assortativity lead to even larger deviations: simulations within Scenario 4 tend to end up in the state $y_1 = 0.4$, $y_2 = 0.45$, $y_3 = 0.15$, so the advantage of leftists over rightists increases by ≈9% of all votes compared to Scenario 3. In all Scenarios, the population of the neutral opinion camp at equilibrium remains virtually the same ($y_2 \approx 0.46$). Of course, one should keep in mind that all these differences in opinion distributions appear in the long run.
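As a quick check of the last comparison, the advantage of leftists over rightists can be computed as $y_1 - y_3$ from the rounded equilibrium values quoted above:

```latex
\begin{aligned}
\text{Scenario 3:}\quad & y_1 - y_3 = 0.35 - 0.19 = 0.16,\\
\text{Scenario 4:}\quad & y_1 - y_3 = 0.40 - 0.15 = 0.25,\\
\text{difference:}\quad & 0.25 - 0.16 = 0.09 .
\end{aligned}
```

The difference of 0.09 corresponds to the ≈9% of all votes mentioned in the text; since the inputs are rounded to two decimals, the result should be read as approximate.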

Results
Figure 5. Populations of leftists (top panels) and rightists (bottom panels) across Scenarios 1-4 (marked with different colors). For each Scenario, the colored area is formed by the upper and lower contours of the corresponding simulations, and the curve represents the trajectory averaged over simulations. The left panels depict the time span 1 ≤ t ≤ 500,000; the right panels cover the range 500,000 ≤ t ≤ 4,000,000. The final populations of leftists and rightists are depicted on the right side of the figure in absolute and normalized (in brackets) values.

Discussion and Conclusions
In this paper, using longitudinal data from an online social network, we systematically analyzed how individuals' opinion and non-opinion characteristics affect opinion dynamics. We found that females are harder to convince than males. Next, we found that individuals of similar age are more likely to reach a consensus. Finally, we report that individuals who have many common peers reach agreement more often.
In general, these results align with the literature (although the effect of gender is perhaps somewhat novel). We believe that our contribution here is the demonstration that the magnitudes of these effects are virtually the same. Further, our analysis indicates that the effects of non-opinion characteristics, despite being statistically significant, are far weaker than those of opinion characteristics: knowing the current opinion of an individual and, even more importantly, the distance in opinions between this individual and the person attempting to influence them is much more valuable than information regarding their ages, genders, and the number of peers they have in common.
To gain a better understanding of this issue, we conducted a series of agent-based experiments using the underlying empirical data to calibrate the agents' characteristics and the way they interact with each other. We found that accounting for non-opinion characteristics leads to modest but statistically significant changes in the macroscopic predictions of the populations of opinion camps in the long run, primarily among the agents with radical opinions (≈3% of all votes). In turn, predictions for the populations of neutral individuals are virtually the same. In addition, we demonstrated that the effect of non-opinion features is strongly moderated by whether the underlying social network correlates with the agents' characteristics. After applying the random shuffling procedure (in which the agents and their characteristics were randomly scattered over the network), the macroscopic predictions changed by ≈9% of all votes. Interestingly, the population of neutral agents was not affected by this intervention.
The main limitation of our analysis is that it draws upon data from a natural experiment, so we have no opportunity to control for many confounding factors. For example, we completely dismiss external effects [39]. In fact, we do not know the history of users' interactions during the observational period. Likely, not all users were active in promoting their views to their friends. Usually, individuals with radical opinions are those who broadcast their opinions more often [40]. Our assumption, in turn, was that each user influenced each of their friends. Further, we considered each pair of befriended users $i$ and $j$ as two independent observations, sequentially swapping the roles of the agents (influence object and influence source). This approach has the following weakness: it ignores the possibility that the dynamics of the opinions of $i$ and $j$ may be subject to some external effects, such as the opinions of the peers of $i$ and $j$ (external stimuli). Although this possibility is partially mitigated by the fact that we also considered the influences of the peers of $i$ and $j$ as independent observations, the assumption of independent observations is likely violated here, so our regression analysis, as well as our algorithm for the calibration of the transition matrices, may lead to inaccurate estimates. In Refs. [15,16], this issue was controlled to some extent because the influence directed at a user $i$ was estimated as the mean of the opinions of the user's friends. However, in our case, this approach is questionable because, apart from opinions, we attempt to consider other user characteristics. For a given user, the joint distribution of these characteristics among the user's friends may be quite complex and nontrivial, and simple averaging may suppress some important information.
In the end, we would like to highlight that in our analysis, we concentrated on the role of observable and easily recoverable user characteristics in opinion formation processes. Indeed, ages, genders, and the numbers of users' peers can be promptly retrieved from the Web using Application Programming Interface (API) facilities with a relatively small investment in resources. Of course, using more detailed information can lead to more precise predictions. For example, Ref. [13] reported that the allocation of a user's trust is highly correlated with how intensively the user likes their peers, with more likes indicating more trust. However, this sort of information requires employing API facilities that were unavailable to us on this occasion. Nevertheless, it would be a promising direction for further research.

Data Availability Statement: All of the data and codes can be obtained upon a reasonable request.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
The quantities $p_{s,l,k}$ were computed via the following formula:

$$p_{s,l,k} = \frac{\#\{(i,j) \in E(t_1) \mid x_i(t_1) = \Xi_s,\ x_j(t_1) = \Xi_l,\ x_i(t_2) = \Xi_k\}}{\#\{(i,j) \in E(t_1) \mid x_i(t_1) = \Xi_s,\ x_j(t_1) = \Xi_l\}}. \tag{A1}$$

It is worth noting that in formula (A1), each edge $(i, j)$ appears two times: (i) when $i$ is an influence object and $j$ is an influence source; (ii) when $i$ is an influence source and $j$ is an influence object. (This approach silently assumes that during the observation period, each user was influenced by each of their friends strictly one time, with all these influences being independent of each other. In fact, this sort of interaction (also known as one-to-one) is widely adopted in opinion formation models; see Refs. [29,35,36]. However, the assumption that each pairwise interaction has happened is quite strict.) Next, to control for the confounding factors conditioned by the dynamics of ties and possible changes in the opinion of the influence source, we discard those observations from our analysis that violate the following restrictions:
- The number of common friends $f_{i,j}$ does not change during the observation period (the number of common friends should be constant as it could have an effect on opinion dynamics).
- $|x_j(t_2) - x_j(t_1)| < 0.05$ (the influence source's opinion should not undergo significant changes during the observation period; otherwise, we cannot precisely locate its value).
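The construction of observations from two snapshots, including the two restrictions, might look as follows. This is a hedged sketch: the dictionaries `x1`, `x2` (opinions at $t_1$ and $t_2$), `f1`, `f2` (numbers of common friends at $t_1$ and $t_2$, keyed by unordered pairs), the callable `to_class`, and the function name `build_observations` are all hypothetical names introduced for the example.

```python
def build_observations(edges, x1, x2, f1, f2, to_class, max_shift=0.05):
    """Turn two dataset snapshots into (s, l, k) observations.

    edges:    iterable of undirected edges (u, v) present at time t1
    x1, x2:   dicts mapping a user to their opinion in [0, 1] at t1 and t2
    f1, f2:   dicts mapping frozenset({u, v}) to the number of common friends
    to_class: callable projecting an opinion onto an opinion-class index
    """
    observations = []
    for u, v in edges:
        pair = frozenset((u, v))
        # Each edge is used twice, with the roles of the two users swapped.
        for i, j in ((u, v), (v, u)):
            if f1[pair] != f2[pair]:
                continue  # the number of common friends has changed
            if abs(x2[j] - x1[j]) >= max_shift:
                continue  # the influence source's opinion moved too much
            observations.append((to_class(x1[i]), to_class(x1[j]), to_class(x2[i])))
    return observations
```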
The transition matrix for the Basic Model (see Figure A1 in Appendix B) is constructed using the same approach. The transition matrices for the Advanced Model are calculated in a similar fashion. For example, the components of the transition matrix that describe interactions of type 1 (see Figure A2 in Appendix B) are defined as follows:

$$p^{1}_{s,l,k} = \frac{\#\{(i,j) \in E(t_1) \mid x_i(t_1) = \Xi_s,\ x_j(t_1) = \Xi_l,\ x_i(t_2) = \Xi_k,\ \Delta \mathrm{age}_{i,j} \le 5,\ \mathrm{gender}_i = 1,\ f_{i,j} = 0\}}{\#\{(i,j) \in E(t_1) \mid x_i(t_1) = \Xi_s,\ x_j(t_1) = \Xi_l,\ \Delta \mathrm{age}_{i,j} \le 5,\ \mathrm{gender}_i = 1,\ f_{i,j} = 0\}}. \tag{A2}$$
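The counting scheme behind these ratios can be sketched in a few lines. The example assumes that the observations have already been reduced to triples `(s, l, k)` of zero-based opinion-class indices (the opinion of the influence object at $t_1$, the opinion of the influence source at $t_1$, and the opinion of the influence object at $t_2$); the function name `estimate_transition_matrix` and the use of `None` for opinion pairs without observations are choices made for illustration.

```python
from collections import Counter


def estimate_transition_matrix(observations, m):
    """Estimate P[s][l][k] as count(s, l, k) / count(s, l).

    observations: iterable of (s, l, k) triples with indices in range(m)
    Returns a nested list; P[s][l] is None when the pair (s, l) never occurs.
    """
    triple_counts = Counter()
    pair_counts = Counter()
    for s, l, k in observations:
        triple_counts[(s, l, k)] += 1
        pair_counts[(s, l)] += 1
    P = [[None] * m for _ in range(m)]
    for s in range(m):
        for l in range(m):
            total = pair_counts[(s, l)]
            if total:
                # Each estimated row sums to one, i.e. it is row-stochastic.
                P[s][l] = [triple_counts[(s, l, k)] / total for k in range(m)]
    return P
```

For a type-specific matrix of the Advanced Model, the same function can be applied to the subset of observations whose covariates satisfy the conditions of that type.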

Appendix B
Computers 2023, 12, x FOR PEER REVIEW 16 of 21
Figure A1. This transition matrix was estimated based on the empirical data (non-opinion covariates are ignored).
Figure A2. This transition matrix was estimated based on the empirical data (type 1-see Table 4 and formula (A2)).
Figure A3. This transition matrix was estimated based on the empirical data (type 2-see Table 4).
Figure A4. This transition matrix was estimated based on the empirical data (type 3-see Table 4).
Figure A5. This transition matrix was estimated based on the empirical data (type 4-see Table 4).
Figure A6. This transition matrix was estimated based on the empirical data (type 5-see Table 4).
Figure A7. This transition matrix was estimated based on the empirical data (type 6-see Table 4).
Figure A8. This transition matrix was estimated based on the empirical data (type 7-see Table 4).
Figure A9. This transition matrix was estimated based on the empirical data (type 8-see Table 4).

Appendix C
Figure A10. These histograms show the structure of the dataset. The histograms that represent the age, gender, and opinion distributions are borrowed from Ref. [28].