Application of Novel Features in Complex Network for Analyzing Virtual Community

Abstract: Virtual communities (VCs) have arisen rapidly and influence many aspects of everyday life in the real world. Unlike traditional ways of advertising products and services, a VC also enables consumers to take part in product-related interactions via threads and to gain deeper insight into products, which improves consumer loyalty. Most extant research either does not emphasise, or lacks effective methods for, gaining a deep understanding of products and explaining the uniformity of users' importance in a VC. In this paper, based on complex network theory, the generalised variance of degree in a directed network is proposed as a new way to ascertain the uniformity of a directed network. The conclusions can guide enterprises towards a more in-depth understanding of complex network theory and its application to social network analysis (SNA) with big data streams.


Introduction
Nowadays, data from multimedia social networks provide a great deal of information, such as where users went and what they ate. Multimedia social data have not only changed the culture around users' past experience; they also boost economic activity and narrow the social gaps among people. The virtual community (VC) is a special medium that results from "Internet+" [1], [2]. It has arisen rapidly to present the functions of products and to enable consumers to engage in product innovation [3]. Unlike traditional advertising on TV or manually placed on-site posters, the proliferation of advance posts in VCs enables consumers to carry out online queries and acquire answers easily. In a commercial context, a VC is a good way for an enterprise to save the cost of operating physical shops. With the aid of VCs, enterprises can freely explore and exploit consumers' opinions and make relevant product improvements that add value to the customer experience in the future [4]. From the viewpoint of praxeology, consumers' first shopping experience may become their standard of reference for future purchases [5]. It is therefore essential for an enterprise to mine the key information from the VC's advance posts. From a technical point of view, a VC has a graph structure in which users (consumers) are regarded as nodes and their connections as edges (links), which indicates that complex network theory is a useful tool for exploring its properties.
Social, biological and technical networks consist of individuals and the interactions among them. Complex networks, as an extension of graph theory, therefore offer a practical method for analysing the internal structure and dynamic processes of such systems [6]-[8]. In Social Network Analysis (SNA), the interactions among individuals have become an interesting topic for the effective spreading of information and for uncovering important nodes, communities and motifs [9]. Nodes and edges represent different features of the data [10]. Networks are built according to the relations among nodes, and properties such as "average degree" [11], "density of network" [12] and "eigenvector centrality" [13] are calculated to describe the key information of networks. The results can also help industries to govern online services and optimize costs.
Considering the practical applications for manufacturers, this paper focuses on one question about an enterprise's VC: how can one judge whether the network holds important macro- and micro-structures?
On this basis, this paper explores the question through methodology and a case study based on big data from Pollen Club. In the methodology part, the generalised variance of degree of a directed network is proposed to judge the uniformity of node importance in a directed network. The consumer interaction network is then established for the case study. "Leaders" in the network and special communities are identified by PageRank and the entropy method, respectively.
The remainder of this paper is structured as follows. Section 1 introduces the research background and questions. Section 2 reviews related work on previous research methods. Section 3 describes the Pollen Club datasets. Section 4 proposes the generalised variance of degree. Section 5 finds the "leaders" and core communities. Section 6 concludes and elaborates the implications for enterprise management gained from the case study.

Related Work
Many studies of VCs have been conducted by researchers worldwide. The last 20 years have seen the rapid growth of the global computer network known as the internet [14]. Schrott and Beimborn [15] reported that VCs have changed people's lives and are widely accepted as an important part of corporate knowledge management. VCs release people from their locations, even if the community and its topics are limited to one place [16]. Overall, there are four main kinds of methods for studying interactions in VCs: statistics, structural equation modelling, case studies and questionnaires, and complex networks.
Among statistical methods, Khan et al. [17] analysed 1,922 brand posts from five different brands in three countries, using ordinary least squares and hierarchical moderation regression to test the important factors affecting consumers' purchasing decisions. Wan et al. [2] introduced LS-SVM into the study of the VC's function in supply chains. Some researchers use structural equation modelling to explore the important interaction factors. Islam and Rahman [18] used structural equation modelling to analyse data from a questionnaire survey of 430 Facebook users. Simon et al. [19] developed a conceptual research model based on social impact theory, social identity theory and social exchange theory, and tested it empirically through structural equation modelling. Although these studies can identify factors that influence the VC, they ignore the topological information of the community, in which the underlying mechanism may be rooted.
Other researchers derive the factors from case studies and questionnaires. Grantham and Habel [14] used a 9-point digital Likert-scale questionnaire to investigate the degree to which participants perceived their VCs to exhibit each of 10 target characteristics. Kilgour et al. [20] first employed depth interviews, followed by questionnaires, and then performed content analysis on 723 online media articles relating to social media marketing to identify semantic and conceptual relationships. However, data from case studies are easily affected by the environment and by the sample of interviewees [21].
Moreover, complex networks have been used to gain insight into VCs. Park et al. [22] used the HITS algorithm to quantify the influence exerted by opinion leaders on Twitter during the 2011 Seoul mayoral elections. Chen et al. [23] presented a simulated annealing algorithm to find important users by optimising the concentration ratio of user interests in user groups. Chiang and Wang [24] studied the interactive features of product-review networks, considering the out-degree centralisation, density and microstructure of the networks. However, these methods and traditional properties cannot pre-judge whether important users ("leaders") exist. This paper therefore proposes a new property in Section 4.

Data Sets
The dataset includes the main posts and their corresponding replies [25]. In detail, information on 129,362 users is retained, covering 57,569 main posts and 886,087 replies.
Because users play different roles in the Pollen Club [26], they can be divided into three groups: the Ordinary User Group (OUG), the Intermediary User Group (IUG) and the Enterprise User Group (EUG). The OUG is the set of consumers who bought the product and registered on Pollen Club to ask questions about it. The IUG is responsible for helping the OUG to solve problems under the guidance of the EUG. The EUG contains employees, technical instructors, salespeople and promoters of the enterprise. Each group thus undertakes a certain function in the Pollen Club. However, the groups also interact with each other, so the data are first analysed as a whole to identify key users in Pollen Club, and the level information of users is considered in the community analysis. The relevant methods are presented in Section 4.

Methodology
This section proposes a new property, based on complex network theory, to measure the uniformity of a directed network.

Problem Statement
In statistics, variance describes the spread of a random variable. Previous studies designed properties such as "Variance of degree" [27] and "Generalised Variance of degree of network" [26] for undirected networks, to judge whether a network contains "leaders", that is, nodes that hold a special position among the others. Fig. 1 gives some examples of undirected and directed networks. Compared with Fig. 1(a), the edges should have directions. In this paper, user b in Fig. 1(b) replies to 3 posts of user c, which cannot be illustrated completely in Fig. 1(c). The generalised variance of degree of an undirected network therefore cannot uncover this information. However, no existing research defines the variance of a directed network. This property is defined in Section 4.2.

Generalised Variance of Degree of Directed Network
The generalised variance of degree of a directed network specifies the uniformity of node importance. Nodes in a weighted directed network have two kinds of degree: generalised indegree and generalised outdegree. The generalised indegree of node $i$, denoted by $s_i^{\mathrm{in}}$, is the sum of the weights of the arrows from its neighbours whose end point is node $i$. Conversely, the generalised outdegree of node $i$, denoted by $s_i^{\mathrm{out}}$, is the sum of the weights of the arrows to its neighbours whose start point is node $i$.
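The two degree definitions above can be illustrated with a minimal Python sketch. The function name `generalised_degrees` and the `(source, target, weight)` edge format are assumptions made for illustration only; they do not come from the paper.

```python
from collections import defaultdict


def generalised_degrees(edges):
    """Return (indegree, outdegree) dicts for a weighted directed network.

    edges: iterable of (source, target, weight) triples (assumed format).
    """
    indeg = defaultdict(float)
    outdeg = defaultdict(float)
    for source, target, weight in edges:
        outdeg[source] += weight  # the arrow starts at `source`
        indeg[target] += weight   # the arrow ends at `target`
        # Touch the opposite maps so every node appears in both results.
        indeg[source] += 0.0
        outdeg[target] += 0.0
    return dict(indeg), dict(outdeg)


# Toy example: user b replies to 3 posts of user c, user a replies once to b.
edges = [("b", "c", 3), ("a", "b", 1)]
indeg, outdeg = generalised_degrees(edges)
print(indeg)   # generalised indegree per node
print(outdeg)  # generalised outdegree per node
```

In the toy example the arrow from b to c carries weight 3, which is exactly the information an unweighted or undirected representation would lose.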

Volume 10, Number 4, December 2020

As a result, this paper defines the generalised variance of degree of a directed network in Eq. (1). Moreover, the property in Eq. (1) degenerates into the variance of degree of an unweighted directed network, given in Eq. (2). As the decomposition shows, the generalised variance of degree excludes the covariance of the generalised indegree and outdegree. This is reasonable, because a large covariance indicates a high level of mutual exchange between pairs of nodes, so excluding it allows the generalised variance of degree to judge the differences among nodes precisely.
To judge whether the network $G$ has "leaders", a null model $G' = (V', E', W')$ is introduced to set the rule. Writing $D(\cdot)$ for the generalised variance of degree, the generalised variance of degree of $G'$ is $D(G')$, with mean $\mathrm{E}[D(G')]$ and standard deviation $\sigma[D(G')]$. The generalised variance of degree of $G'$ follows a Z distribution, which has been shown to approximate the normal distribution. In this paper, $G$ is defined as a "leader" network if its generalised variance of degree exceeds the "$3\sigma$" margin of the null model, that is, $D(G) > T(G') = \mathrm{E}[D(G')] + 3\sigma[D(G')]$; otherwise it is an "autonomy" network. By this definition, a "leader" network has significant nodes, named "leaders", that control other nodes and influence the generalised variance of degree, whereas node importance in an "autonomy" network is relatively uniform.
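The decision rule above amounts to comparing one observed value with the mean plus three standard deviations of the same statistic over null-model networks. A minimal sketch follows; the function name `is_leader_network` is hypothetical, and how the null-model values are generated is left outside the sketch because the text does not specify it. The use of the population standard deviation is also an assumption.

```python
import statistics


def is_leader_network(observed, null_values):
    """Apply the 3-sigma rule to an observed statistic.

    observed: generalised variance of degree of the real network.
    null_values: the same statistic computed on null-model networks
        (how these are generated is an assumption left to the caller).
    Returns (flag, threshold) where threshold = mean + 3 * std.
    """
    mean = statistics.fmean(null_values)
    std = statistics.pstdev(null_values)  # population std; an assumption
    threshold = mean + 3 * std
    return observed > threshold, threshold


# Toy null values; these numbers are made up for illustration.
flag, threshold = is_leader_network(382850.6910, [10.0, 11.0, 12.0])
print(flag, round(threshold, 4))
```

A value far above the threshold marks the network as a "leader" network; a value at or below it marks an "autonomy" network.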

Results
First, the user interaction network $G = (V, E, W)$ is established as follows. The nodes represent the 129,362 users. If user $i$ replied to a post of user $j$, there is an arrow from node $i$ to node $j$, whose weight is the number of such replies. In $G = (V, E, W)$, $V$ is the set of all users, $E$ is the set of links (edges) among the nodes and $W$ is the set of weights on the links. A visualisation of $G$ produced with Gephi is shown in Fig. 2. Calculated in Matlab according to Eq. (3), the generalised variance of degree of $G$ is 382850.6910, far greater than the null-model threshold of 12.9252, which shows that the network is a special network containing significantly powerful nodes. PageRank is therefore used to mine the "leaders" in $G$.
The idea of PageRank was established to rank the importance of web pages: the importance of a page is determined by the number and quality of the pages that link to it [28]. The same holds for interaction among people, since someone who has a close relation with a person tends to reply to that person frequently. Since there are 129,362 nodes, the network includes 12,936 "leaders" (10% of the nodes). Owing to the length limit of the paper, only the first 10 are listed in Table 1. It can be seen from Table 1 that the OUG and IUG account for a large proportion of these users. So, in this online club, the EUG is not the major participant in interaction. However, the IUG plays an important role in leading OUG members to discover solutions and the beauty of the phone. Overall, the Pollen Club is very active and attracts the OUG to explore and exchange opinions with others. Moreover, small communities sometimes bring together users with the same interests, which makes it convenient to learn about the topics of concern in large-scale data. The analysis can also help the enterprise decide which small group to start with when promoting the product.
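As an illustration of the ranking step, the following is a minimal weighted PageRank sketch using power iteration. The damping factor of 0.85, the uniform redistribution of rank from nodes without outgoing arrows, and the helper `top_fraction` (which takes the top 10% as "leaders", matching the 12,936 out of 129,362 reported above) are all assumptions; the paper's own computation was done in Matlab and its settings are not stated.

```python
def pagerank(edges, damping=0.85, tol=1e-10, max_iter=200):
    """Weighted PageRank by power iteration.

    edges: list of (source, target, weight) triples with positive weights.
    """
    nodes = sorted({n for s, t, _ in edges for n in (s, t)})
    n = len(nodes)
    out_weight = {v: 0.0 for v in nodes}
    for s, _, w in edges:
        out_weight[s] += w
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes with no outgoing arrows is spread uniformly.
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0.0)
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for s, t, w in edges:
            new[t] += damping * rank[s] * w / out_weight[s]
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank


def top_fraction(rank, fraction=0.1):
    """Return the highest-ranked nodes; the 10% cutoff is an assumption."""
    k = max(1, int(len(rank) * fraction))
    return sorted(rank, key=rank.get, reverse=True)[:k]


# Toy network: a and b both reply to c, and c replies to a.
edges = [("a", "c", 1), ("b", "c", 2), ("c", "a", 1)]
rank = pagerank(edges)
leaders = top_fraction(rank)
print(leaders)
```

Node c receives replies from both other users, so it ends up with the highest PageRank value in this toy network.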
Because interaction differs among users, the network is composed of a series of weakly connected components. Counting the nodes in each component shows that, apart from 2,746 isolated nodes, the other components contain two or more nodes. The biggest component (denoted $G_1$) has over 120 thousand nodes and is a typical sample of $G$. Without loss of generality, this section separates $G_1$ into different communities and combines the PageRank values and level features of the users to rank the communities. Fig. 3(a) shows the result of the community separation, in which nodes with the same color belong to the same community. It can be seen from the colors that $G_1$ contains many communities. Which community holds a special position? To answer this question, a "community network" is established by using a "big node" to represent each community, as shown in Fig. 3(b).

International Journal of e-Education, e-Business, e-Management and e-Learning
When nodes within community I send links to nodes of community J, community I has an arrow to community J whose weight is the sum of the weights of all arrows from community I to J. The topological position of a community determines its ability to spread information in the network, so betweenness centrality is used to count the shortest paths passing through each community and thereby describe whether a community holds a "bridge" position.
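The aggregation into a "community network" can be sketched as follows. The function name `community_network` and the `membership` mapping (node to community label) are hypothetical; dropping arrows that stay inside one community is an assumption based on the description of arrows "from community I to J".

```python
from collections import defaultdict


def community_network(edges, membership):
    """Collapse a node-level network into a community-level network.

    edges: iterable of (source, target, weight) triples.
    membership: dict mapping each node to its community label.
    Returns a dict {(community_i, community_j): summed weight}.
    """
    aggregated = defaultdict(float)
    for source, target, weight in edges:
        ci, cj = membership[source], membership[target]
        if ci != cj:  # assumption: intra-community arrows are ignored
            aggregated[(ci, cj)] += weight
    return dict(aggregated)


# Toy example with two communities, labelled 0 and 1.
edges = [("a", "x", 2), ("b", "x", 1), ("a", "b", 5), ("x", "a", 4)]
membership = {"a": 0, "b": 0, "x": 1}
community_edges = community_network(edges, membership)
print(community_edges)
```

Each resulting key is one arrow between "big nodes", and its value is the total reply count flowing in that direction.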
However, the weights must first be adjusted. In the "community network", a weight measures the closeness between two communities, whereas in computing betweenness centrality a weight stands for the cost between two communities. A bigger weight therefore means greater closeness and lower cost. Following this analysis, the weights are transformed by Eq. (4), where $\min(w)$ and $\max(w)$ denote the minimum and maximum weight between two communities in the "community network". The range of the transformed weight $\bar{w}$ is [1, 2]. Apart from its position in the network, the importance of a community is strongly related to the number of its nodes with high PageRank values. Accordingly, the betweenness centrality of each community, the average PageRank value of the nodes in each community, and the number of nodes whose level belongs to the OUG, IUG and EUG are calculated in Matlab. Finally, the entropy method is introduced to establish a judgement function for ranking the communities.
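One mapping consistent with this description is a linear rescaling that sends the largest closeness weight to cost 1 and the smallest to cost 2. The sketch below uses that mapping as an assumption; the exact form of Eq. (4) may differ, and the function name `closeness_to_cost` is hypothetical.

```python
def closeness_to_cost(weights):
    """Map closeness weights to costs in [1, 2], larger weight -> lower cost.

    weights: dict {(community_i, community_j): closeness weight}.
    Assumed form: cost = 2 - (w - min) / (max - min).
    """
    low, high = min(weights.values()), max(weights.values())
    if high == low:
        # All weights equal: no ordering to preserve, use the lowest cost.
        return {edge: 1.0 for edge in weights}
    return {edge: 2.0 - (w - low) / (high - low) for edge, w in weights.items()}


# Toy community weights (made up for illustration).
costs = closeness_to_cost({(0, 1): 3.0, (1, 0): 4.0, (1, 2): 5.0})
print(costs)
```

The resulting costs can then serve as edge lengths in a weighted shortest-path computation of betweenness centrality.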
The entropy method is an objective way to determine the weights of features, avoiding the subjective influence present in methods such as AHP (Analytic Hierarchy Process). Take $X$ as the data matrix of the communities. Features $x_1$ to $x_3$ are the numbers of users in each community whose levels are in the OUG, IUG and EUG, respectively; $x_4$ is the average PageRank value of the nodes in the community; and $x_5$ is the betweenness centrality of the community. Since the 5 features have different data ranges, $X$ is first normalised and the entropy weights are then calculated, as listed in Table 2. An evaluation function is then defined in Eq. (5). The entropy method derives the weights from the divergence of the features, so Table 2 indicates a significant difference in the composition of the communities, especially in the numbers of OUG and IUG users. At the same time, the average PageRank value of nodes plays an important role in discriminating between communities. The relatively small weight of betweenness centrality indicates that most communities lie on the periphery of the network. Finally, the top 5 communities are listed in Table 3. Table 3 shows that community 169 holds the most important position. Although the number of users in community 169 is close to that of community 0, community 169 has a higher average PageRank value and betweenness centrality. Moreover, community 169 is a center of information, which the enterprise should analyze when broadcasting its information in the future. The values of the evaluation function for the other communities are close to each other.
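For readers unfamiliar with the entropy method, the following sketch shows the textbook version: min-max normalisation per feature, entropy per feature, weights proportional to one minus the entropy, and a weighted sum as the evaluation score. It is an assumption that the paper's Table 2 and Eq. (5) follow this standard form; the function names and the toy matrix are hypothetical.

```python
import math


def entropy_weights(matrix):
    """Standard entropy weight method (assumed form).

    matrix: list of rows (communities), each a list of feature values.
    Needs at least two rows. Returns (weights, normalised_matrix).
    """
    m = len(matrix)
    norm_cols = []
    for col in zip(*matrix):
        low, span = min(col), max(col) - min(col)
        norm_cols.append([(v - low) / span if span else 0.0 for v in col])
    divergences = []
    for col in norm_cols:
        total = sum(col)
        if total == 0:
            divergences.append(0.0)  # a constant feature carries no information
            continue
        probs = [v / total for v in col if v > 0]
        entropy = -sum(p * math.log(p) for p in probs) / math.log(m)
        divergences.append(1.0 - entropy)
    div_sum = sum(divergences)
    if div_sum == 0:
        weights = [1.0 / len(divergences)] * len(divergences)
    else:
        weights = [d / div_sum for d in divergences]
    normalised = [list(row) for row in zip(*norm_cols)]
    return weights, normalised


def evaluate(weights, normalised):
    """Weighted sum of normalised features for each community."""
    return [sum(w * v for w, v in zip(weights, row)) for row in normalised]


# Toy data: 3 communities, 2 features (values made up for illustration).
weights, normalised = entropy_weights([[1, 10], [2, 10], [3, 40]])
scores = evaluate(weights, normalised)
print(weights, scores)
```

In the toy data the second feature is concentrated in a single community, so it has lower entropy and receives the larger weight, which mirrors the remark above that weights reflect the divergence of features.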

Conclusions
The consumer virtual community is explored through methodology and a case study based on data from the Pollen online club. In the methodology part, the generalised variance of degree of a directed network is proposed to judge the uniformity of node importance in a directed network. The consumer interaction network is then established, and the generalised variance of degree of the directed network is used to explore its uniformity. "Leaders" in the network and special communities are identified by PageRank and the entropy method, respectively.
The new property, based on complex network theory, can easily be generalised to other fields. When dealing with big data, the generalised variance of degree of a directed network can test for the presence of "leaders" ahead of time. The results of this paper also show that a consumer virtual community provides rich information, with consumers and the enterprise communicating freely with each other. Product manufacturers and service providers can provide better service and improve product and service quality according to feedback from the web. Overall, the enterprise should encourage the IUG to keep serving the OUG in order to ease management pressure.