Measure User Intimacy by Mining Maximum Information Transmission Paths

The Internet has become an important carrier of information. Its data contain abundant information about hot events, user relations and attitudes, and so on. Many enterprises use high-impact Internet users to promote products, so it is very important to understand the mechanism of information transmission. Mining social network data can help people analyze the complex and changing relationships between users. The traditional method for doing this is to analyze information such as common interests and common friends, but such data cannot truly describe the degree of intimacy between users. What really connects different users on the Internet is the delivery of information. The algorithm proposed in this paper considers the dynamic characteristics of information transmission, finds maximum transmission paths from information transmission results, and finally calculates the intimacy degrees between users according to all the maximum information transmission paths within a certain period.


Introduction
Social network data contains a wealth of information about events, relationships, and attitudes. On the basis of fully understanding and analyzing the data, a series of technologies, such as text mining, statistical theory, association analysis, and visualization technologies, are adopted to realize emotional orientation analysis, information extraction, user influence analysis, and so on. Many current methods of computing user intimacy are designed for static networks. However, users might unfollow certain friends, and their interests might shift to new and different topics. In other words, the tie strengths between different users change over time. The algorithm proposed in this paper takes the dynamic nature of the data into account to improve information transmission analysis in social networks. Maximum transmission paths are then identified in the information transmission results, and the intimacy degrees between nodes are computed from multiple groups of maximum transmission paths. The remainder of this paper is organized as follows: Section 2 introduces related work. Section 3 proposes the concept of the information transmission matrix. Section 4 introduces the computational process of tie strength. Section 5 presents the experimental results. Section 6 gives the conclusions.

Related Works
Many enterprises use influential users to promote new products, but the mechanism of how information spreads through the network still needs to be further studied. It is very important to understand the communication mechanism of information, which can be applied in many fields, such as viral marketing, social behavior prediction, social recommendation, and community detection. These problems attract the attention of researchers from different fields, such as epidemiology, computer science, and sociology, who propose different information diffusion models to describe and simulate the process of information transmission, such as the independent cascade model, linear threshold model, and epidemic model. These models are mainly applied to influence evaluation, influence maximization, and information source detection. Most models assume that information is transmitted from a source node set and that other nodes can only obtain information from the nodes that neighbor the source node set.
Social networking service providers, such as Twitter and Facebook, have grown rapidly in recent years, with an increasing number of users sharing information with their friends. There are more than 2 million active users on Facebook every month from all over the world and about 5 billion new tweets on Twitter every day. Social network analysis can be divided into the following aspects [1,2]: (a) studying the network structure and trends [3], (b) online learning of complex networks [4], (c) comparing different models, and (d) predicting node status [1,5]. The focus of social influence study is to investigate neighbors and associations in order to predict the impact and influence of the occurrence of an action [2,6].
Researchers have examined information transfers, including the analysis of relationships [7], social action tracking [1], and other types of relationship transfer [8]. The algorithm proposed in this paper constructs a matrix based on the information transmission between users to describe the complex correlation relations. By making certain changes to the matrix, information transmission paths can be identified and the tie strengths between nodes can be calculated. Because constructing a matrix involves little computational difficulty, the algorithm proposed in this paper performs more efficiently than other algorithms.

Information Transmission Matrix
A piece of information is very valuable at one time, but after that it may be worthless. From the perspective of information transmission, the degree of interaction between users can be calculated. By analyzing information transmission paths, an information transmission tree can be generated to describe the information transmission rules and be used to analyze the dynamic changes of the correlations between users.
Definition 1. Let G be a graph with n nodes and e edges. The n × e complete incidence matrix M of graph G is called the information transmission matrix.
In this paper, information transmission data is used to construct and update M. The construction process of M is given below. Figure 1 depicts the information transmission relationships between nodes. If there is an edge between two nodes, then information has been successfully passed between them. Otherwise, no information has been passed. We construct matrix M according to Figure 1; M describes the mapping relationship between nodes and edges. If there is an association between node $N_x$ and edge $e_y$, then $a_{xy} = 1$. Otherwise, $a_{xy} = 0$.
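The construction of M from Definition 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `incidence_matrix`, the 0-based node indices, and the toy edge list are assumptions made for the example, and the toy graph is not the graph of Figure 1.

```python
import numpy as np

def incidence_matrix(n_nodes, edges):
    """Return the n x e 0/1 incidence matrix of an undirected edge list.

    edges is a list of (x, y) pairs of 0-based node indices;
    column k of the result describes edges[k].
    """
    M = np.zeros((n_nodes, len(edges)), dtype=np.uint8)
    for k, (x, y) in enumerate(edges):
        M[x, k] = 1  # node x is an endpoint of edge k
        M[y, k] = 1  # node y is an endpoint of edge k
    return M

# Hypothetical toy graph: a triangle 0-1-2 plus a pendant node 3.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
M = incidence_matrix(4, edges)
```

Each column of M contains exactly two ones (the two endpoints of that edge), and a node that never passed or received information yields an all-zero row.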
Because there is a large number of inactive nodes, most of the actions of the nodes on the Internet consist of browsing information, while actions such as commenting and forwarding are rare. Therefore, matrix M is a sparse matrix. To reduce the negative impact of a large number of meaningless zeros in the matrix on subsequent calculations, further analysis of M is required to delete redundant nodes. In Section 3.1, we describe a quick and effective way to remove redundant nodes.

Isolated Nodes
Definition 2. If the determinant of nth order matrix M is not zero, that is, |M| ≠ 0, then M is called a nonsingular matrix or full rank matrix. Otherwise, M is called a singular matrix or reduced-rank matrix.
Definition 3. Nodes in graph G are connected if and only if the rank of the complete incidence matrix is n − 1. A submatrix whose order is min{p, q} is called a large submatrix of the p × q matrix.
By calculating whether |M| is 0, we can judge whether the nodes in G are connected or not. A reduced matrix D can be obtained by deleting redundant nodes in M. D is a full rank matrix, that is, |D| ≠ 0. At this time, D is the maximum complete incidence matrix. That is, all the nodes in the new graph G formed by D are reachable, and there are no isolated nodes for information transmission.
Take matrix M in Figure 2 as an example to illustrate the process of removing isolated nodes. The rank of M is obtained by calculating the maximum number of linearly independent rows (that is, the maximum order of a nonsingular submatrix). According to this calculation, R(M) = 6. This indicates the existence of isolated nodes in M. It can be seen that rows $N_7$ and $N_8$ are zero vectors, so $N_7$ and $N_8$ are redundant, isolated nodes. Because the original data in rows $N_6$ and $N_7$ are the same and $N_7$ was determined to be an isolated node to be deleted, $N_6$ is also an isolated node. In conclusion, $N_6$, $N_7$, and $N_8$ are isolated nodes. After removing redundant nodes, it is necessary to determine whether there are redundant edges in the matrix. Because column $e_6$ is a zero vector after the redundant nodes are deleted, $e_6$ is a redundant edge that needs to be deleted.
Matrix D is obtained after deleting the redundant nodes in M. Next, whether the nodes in D are connected must be checked by computing the rank of D. The result is R(D) = 5. That is, |D| ≠ 0, so D is a full rank matrix. The conclusion is that all nodes in D are connected. In other words, there are no isolated nodes of information transmission.
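The removal of isolated nodes and the connectivity test of Definition 3 can be sketched as below. Several points are assumptions of this sketch rather than statements from the paper: the helper names `gf2_rank`, `drop_isolated`, and `is_connected` are hypothetical; only all-zero rows and columns are removed; and the rank is computed over GF(2), because for an unoriented 0/1 incidence matrix the criterion "rank n − 1 if and only if connected" holds modulo 2 but not in general over the reals (a triangle, for instance, has real rank 3).

```python
import numpy as np

def gf2_rank(A):
    """Rank of a 0/1 matrix over GF(2), via Gaussian elimination with XOR."""
    A = (np.array(A, dtype=np.uint8) & 1).copy()
    rows, cols = A.shape
    rank = 0
    for c in range(cols):
        # Find a pivot row at or below the current rank position.
        pivot = next((r for r in range(rank, rows) if A[r, c]), None)
        if pivot is None:
            continue
        A[[rank, pivot]] = A[[pivot, rank]]   # swap the pivot row into place
        for r in range(rows):
            if r != rank and A[r, c]:
                A[r] ^= A[rank]               # eliminate column c elsewhere
        rank += 1
        if rank == rows:
            break
    return rank

def drop_isolated(M):
    """Delete all-zero rows (isolated nodes), then all-zero columns (unused edges)."""
    M = np.asarray(M, dtype=np.uint8)
    keep_rows = np.flatnonzero(M.any(axis=1))
    D = M[keep_rows]
    keep_cols = np.flatnonzero(D.any(axis=0))
    return D[:, keep_cols], keep_rows, keep_cols

def is_connected(D):
    """Definition 3: the graph is connected iff rank(D) == n - 1 (rank over GF(2))."""
    return gf2_rank(D) == D.shape[0] - 1

# Hypothetical toy matrix: a triangle on nodes 0, 1, 2 and two isolated nodes 3, 4.
M = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 1],
              [0, 0, 0],
              [0, 0, 0]], dtype=np.uint8)
D, kept_nodes, kept_edges = drop_isolated(M)
```

In this toy case `is_connected(M)` is false because of the two zero rows, while the reduced matrix D passes the test.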
To discover all information transmission paths in M, it is necessary to further determine which nodes can be tentatively considered to be redundant. The deleted redundant nodes are reconstituted into a new matrix M and the abovementioned operations are repeated to obtain a matrix D. Finally, multiple matrices D are obtained.

Information Transmission Path
To study the information transmission mechanism, it is necessary to identify all the information transmission paths from the information matrix. Therefore, further processing of the set of matrices D is required.
Definition 4. Submatrix A is obtained by removing one row from the complete incidence matrix D. For A to be nonsingular, the edges that correspond to the columns of A must form a spanning tree of G.
Definition 4 provides a method for calculating all spanning trees in the connected graph G. By removing one row from matrix D and then calculating all the maximized nonsingular submatrices of the newly-generated matrix D, the edges that correspond to the columns of each nonsingular submatrix form a spanning tree of G.
The matrix D obtained in the previous section is taken as an example to illustrate the process of identifying information transmission paths according to Definition 4. Remove one row from D (row 5 is deleted here) to get a matrix A. By calculating the rank of A, we get R(A) = 4. This value indicates that the nodes in A are connected. Although all nodes are connected to other nodes, there may be redundant edges. For example, the nodes $N_1$, $N_2$, and $N_3$ in Figure 1 have three edges, and these three nodes can be completely connected to each other by two of the edges. To remove redundant edges, we apply a set of rules to the matrix and then perform row operations on each of the candidate matrices according to these rules. The first two rows in Table 1 describe the cases in which the constructed matrix does not meet the judgment condition for generating a maximum information transmission path. In the first combination, edges ($e_1$, $e_2$, $e_3$, and $e_4$) are selected. In this matrix, $M_{N_5,e_1}$, $M_{N_5,e_2}$, $M_{N_5,e_3}$, and $M_{N_5,e_4}$ are all 0, so this path does not contain $N_5$. That is, it is not a maximum information transmission path, so the combination ($e_1$, $e_2$, $e_3$, and $e_4$) is deleted and the calculation is stopped. Similarly, in the second combination, $M_{N_4,e_1}$, $M_{N_4,e_2}$, $M_{N_4,e_3}$, and $M_{N_4,e_4}$ are all 0, so the result obtained from this structure does not include $N_4$; that is, it is not a maximum information transmission path. The ranks of the third, fourth, and fifth matrices are all 4, so they are full rank matrices that satisfy the condition for generating maximum information transmission paths. The fourth column in rows 3, 4, and 5 of Table 1 shows the row transformation process. Number 1 is the lowest in the transformed matrix, so the matrix does not have redundant edges. Column 5 shows the graph structure of the matrix obtained after eliminating the redundant edges.
It can be seen from the graphs that the method proposed in this paper can be used to identify all maximum information transmission paths.
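The procedure of Definition 4 can be sketched as a brute-force enumeration: delete one row of D, then keep every choice of n − 1 columns whose square submatrix is nonsingular. This is an illustrative sketch under stated assumptions, not the row-operation rules of Table 1: the names `gf2_rank` and `spanning_trees` are hypothetical, nonsingularity is tested over GF(2) because D is an unoriented 0/1 matrix, and the toy graph is not the one in Figure 1. The enumeration is exponential in the number of edges, so it is only practical for small matrices.

```python
from itertools import combinations
import numpy as np

def gf2_rank(A):
    """Rank of a 0/1 matrix over GF(2), via Gaussian elimination with XOR."""
    A = (np.array(A, dtype=np.uint8) & 1).copy()
    rows, cols = A.shape
    rank = 0
    for c in range(cols):
        pivot = next((r for r in range(rank, rows) if A[r, c]), None)
        if pivot is None:
            continue
        A[[rank, pivot]] = A[[pivot, rank]]
        for r in range(rows):
            if r != rank and A[r, c]:
                A[r] ^= A[rank]
        rank += 1
        if rank == rows:
            break
    return rank

def spanning_trees(D, drop_row=-1):
    """Return every set of n-1 column indices of D that forms a spanning tree."""
    D = np.asarray(D, dtype=np.uint8)
    n, e = D.shape
    A = np.delete(D, drop_row, axis=0)        # reduced incidence matrix, (n-1) x e
    trees = []
    for cols in combinations(range(e), n - 1):
        if gf2_rank(A[:, list(cols)]) == n - 1:   # nonsingular (n-1) x (n-1) block
            trees.append(cols)
    return trees

# Hypothetical toy graph: triangle 0-1-2 plus pendant node 3.
# Columns: e0=(0,1), e1=(1,2), e2=(0,2), e3=(2,3).
D = np.array([[1, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 1, 1, 1],
              [0, 0, 0, 1]], dtype=np.uint8)
trees = spanning_trees(D)
```

For this toy graph the combination of the three triangle edges is rejected (it contains a loop and misses node 3), while every combination of two triangle edges plus the pendant edge is accepted.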

Tie Strength between Nodes
According to the characteristics of information transmission, it is reasonable to assume that there must be some association between the nodes in the same transmission path. Here, it is assumed that if information is transmitted frequently between two nodes, then the degree of intimacy between these two nodes is high. After a period of data accumulation, data about maximum information transmission paths is added to the correlation strength matrix (denoted as T). Because T is constructed according to information transmission flows, it keeps changing as the information transmission state changes. In matrix T, $T_{i,i}$ represents the number of occurrences of node i in the process of information transmission and $T_{i,j}$ represents the number of information transfers between nodes i and j. The weight of node i is calculated by formula (5), and the tie between nodes a and b is calculated by formula (6). According to formulas (5) and (6), the degree of intimacy between different users is calculated. The specific algorithm is shown in Algorithm 1.
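The accumulation of T described above can be sketched as follows. The counting of $T_{i,i}$ and $T_{i,j}$ follows the text; everything else is an assumption of this sketch. In particular, formulas (5) and (6) are not reproduced in this text, so `tie_strength` below uses a placeholder normalization (transfers divided by the smaller of the two occurrence counts) that is not the paper's formula. The names `accumulate_ties` and `tie_strength` and the toy paths are hypothetical.

```python
import numpy as np

def accumulate_ties(n_nodes, paths):
    """Build T from a collection of maximum transmission paths.

    Each path is a list of (i, j) edges. T[i, i] counts the paths in which
    node i occurs; T[i, j] counts the transfers observed between i and j.
    """
    T = np.zeros((n_nodes, n_nodes), dtype=np.int64)
    for path in paths:
        nodes = set()
        for i, j in path:
            T[i, j] += 1
            T[j, i] += 1          # transfers are treated as undirected here
            nodes.update((i, j))
        for i in nodes:
            T[i, i] += 1          # one occurrence per path, not per edge
    return T

def tie_strength(T, a, b):
    """Placeholder normalization; NOT formulas (5)/(6) of the paper."""
    denom = min(T[a, a], T[b, b])
    return float(T[a, b] / denom) if denom else 0.0

# Hypothetical toy input: three spanning trees of a 4-node graph.
paths = [
    [(0, 1), (1, 2), (2, 3)],
    [(0, 1), (0, 2), (2, 3)],
    [(1, 2), (0, 2), (2, 3)],
]
T = accumulate_ties(4, paths)
```

With these toy paths, the edge between nodes 2 and 3 appears in every path and therefore receives the highest placeholder score, while nodes 0 and 3 never exchange information directly and score zero.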

Experiments
Five datasets are used in this paper. For detailed information about the datasets, please refer to our earlier paper [2]. The following methods are used for comparison:
(1) PTPMF [9]: this method uses neighborhood overlap to approximate tie strength and extends the popular Bayesian Personalized Ranking (BPR) model to incorporate the distinction between strong and weak ties.
(2) TrustMF [10]: this is a model-based method that adopts a matrix factorization technique that maps users into low-dimensional latent feature spaces in terms of their trust relationships. It aims to reflect more accurately the users' reciprocal influence on the formation of their own opinions and to learn better preferential patterns of users for high-quality recommendations.
(3) SBPR: this method presents a generic optimization criterion, BPR-Opt, for personalized ranking, that is, the maximum posterior estimator derived from a Bayesian analysis of the problem.
Figure 3 shows the information transmission graph without data processing. It contains 38,501 nodes and 20,354 edges. If all nodes in the Coauthor dataset were displayed in Figure 3, then the picture would be black and the structure would not be visible. Therefore, only some of the nodes in the Coauthor dataset are shown in this figure. As can be seen in Figure 3, it is very difficult to process network data.
In the Coauthor dataset, the lengths of most information transmission paths are 2 or 3. Figure 4 shows the path with the maximum length in the Coauthor dataset.
By constructing a matrix according to the structure in Figure 4 and executing the algorithm proposed in this paper on this matrix, several groups of the largest, nonsegmented information transmission paths can be found, as shown in Figure 5. As can be seen from Figure 5, all the paths are loop free and achieve the maximum coverage of all nodes. Therefore, Figure 5 verifies the accuracy of the algorithm from the perspective of visualization. Figure 6 depicts the degree of all nodes in the maximum information propagation path. It is found that the degree of most nodes is 1, the degree of a few nodes is greater than or equal to 2, and the highest degree value is 13. Figure 6 illustrates that the algorithm achieves the maximum removal of redundant edges. The tie coefficients between different nodes are calculated according to information transmission paths. A further figure shows the tie coefficients of the nodes. In it, the darkness of the edges represents the correlation strengths between the node and the ego node. The darker the color is, the stronger the correlation is, and vice versa. The number on each edge represents the tie strength between the two connected nodes, which is the final result obtained by fusing multiple sets of maximum information transmission paths. In order to analyze the experimental results, we use the following measurement parameters [10]: Precision, calculated by P = tp/(tp + fp); Recall, by R = tp/(tp + fn); and F1-score, by F = 2 × P × R/(P + R). Here tp, fp, and fn denote the numbers of true positives, false positives, and false negatives, respectively. Table 2 compares the performance of SBPR, TrustMF, PTPMF, and TieCP on the different datasets. According to Table 2, we can conclude that TieCP has the most stable execution effect and the best result regarding F1-score.
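The three measures can be computed directly from raw counts, as in the short sketch below. The function name `precision_recall_f1`, the zero-denominator convention (returning 0.0), and the example counts are assumptions made for illustration; they are not values from Table 2.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute P = tp/(tp+fp), R = tp/(tp+fn), F = 2PR/(P+R).

    Returns 0.0 for any measure whose denominator is zero (a convention
    chosen for this sketch).
    """
    p = tp / (tp + fp) if (tp + fp) else 0.0
    r = tp / (tp + fn) if (tp + fn) else 0.0
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f

# Hypothetical counts: 8 true positives, 2 false positives, 8 false negatives.
p, r, f = precision_recall_f1(tp=8, fp=2, fn=8)
```

With these hypothetical counts, P = 0.8 and R = 0.5, so F = 2 × 0.8 × 0.5 / 1.3, which is about 0.615.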

Conclusion
The algorithm proposed in this paper calculates the intimacy degrees between users according to the information transmission matrix. Compared to some mainstream methods, our method is simple and able to identify all the maximum information transmission paths. Beyond that, our algorithm is relatively more stable when dealing with different kinds of data. Because constructing a matrix involves little computational difficulty, the algorithm proposed in this paper performs more efficiently than other algorithms.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.