Lobby index as a network centrality measure

We study the lobby index (l-index for short) as a local node centrality measure for complex networks. The l-index is compared with degree (a local measure) and with betweenness and Eigenvector centralities (two global measures) for a biological network (the Yeast protein-protein interaction network) and a linguistic network (Moby Thesaurus II). In both networks, the l-index correlates poorly with betweenness but correlates with degree and Eigenvector centrality. Being a local measure, the l-index has two advantages: it carries more information about a node's neighbors than degree centrality, and it requires less time to compute than Eigenvector centrality. Results suggest that the l-index produces better results than degree and Eigenvector measures for ranking purposes, making it a suitable tool for this task.


Introduction
The Hirsch index (h-index) has been thoroughly studied for scientometric purposes. It has been applied to collaboration networks of individual researchers [1,2,3,4,5], research groups [6], journals [7,8] and countries [9] obtained from citation databases. In this context, the h-index of a node in a given network is the largest integer h such that the node has at least h neighbors, each with a degree of at least h [1].
Korn et al. [10] have proposed a general index of network node centrality based on the h-index, which they named the lobby index (l). They argue that the proposed index combines properties of other well-known centrality measures. However, they have studied it mainly in the context of artificial networks like the Barabasi-Albert model [11].
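To make the definition concrete, the following is a minimal Python sketch of the lobby index, assuming the network is stored as a dictionary that maps each node to the set of its neighbors (for a directed network, its outlinks) and that every neighbor also appears as a key. The function name `lobby_index` and the toy graph are illustrative and are not taken from [10].

```python
def lobby_index(adj, node):
    """Largest k such that `node` has at least k neighbors of degree >= k."""
    # Degrees of the neighbors, largest first.
    neighbor_degrees = sorted((len(adj[v]) for v in adj[node]), reverse=True)
    k = 0
    for rank, degree in enumerate(neighbor_degrees, start=1):
        if degree >= rank:
            k = rank  # the `rank` best-connected neighbors all have degree >= rank
        else:
            break
    return k


# Toy undirected graph (hypothetical data, for illustration only).
toy = {
    "a": {"b", "c", "d", "e"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c"},
    "e": {"a"},
}
l_values = {v: lobby_index(toy, v) for v in toy}
```

In this toy graph node "a" has degree 4 but a lobby index of only 2, because just two of its neighbors have degree 2 or more once ranked; the leaf "e" has lobby index 1.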
Like l, the degree D is a local centrality measure, equal to the number of links of a given node. If the network is directed, the number of outlinks is the outdegree and the number of inlinks is the indegree. Unlike l, betweenness and Eigenvector centralities are global measures that take into account all nodes in the network. The betweenness B of a given node is proportional to the number of geodesic paths (minimal paths between node pairs in the network) that pass through it. It seems to be an important measure for networks where such minimal paths represent transport channels for information (internet, social networks), energy (power grids), materials (airport networks) or diseases (social and sexual networks). The Eigenvector centrality E_i of a node i is proportional to the sum of the centralities of the nodes to which it is connected [12]:
$$E_i = \frac{1}{\alpha} \sum_{j=1}^{n} a_{ij} E_j ,$$
where α is the largest eigenvalue of the adjacency matrix A = (a_ij) and n is the number of nodes.
In this paper, we compare l with degree, betweenness and Eigenvector centralities applied to associative (non-transport) networks to obtain the correlation between these measures.
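As an illustration of why Eigenvector centrality is a global, iterative quantity, the sketch below approximates it by power iteration for an undirected network stored as a dictionary of neighbor sets. This is one common way to obtain the leading eigenvector and is not necessarily the procedure used for the results reported here; the function name, the iteration limit and the tolerance are assumptions. Each step adds the node's own value (that is, it iterates with A + I), a standard shift that keeps the leading eigenvector unchanged while avoiding oscillation on bipartite graphs.

```python
import math


def eigenvector_centrality(adj, iterations=200, tol=1e-10):
    """Approximate the leading eigenvector of the adjacency matrix."""
    nodes = list(adj)
    x = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        # One multiplication by (A + I): own value plus the neighbors' values.
        new = {v: x[v] + sum(x[u] for u in adj[v]) for v in nodes}
        norm = math.sqrt(sum(val * val for val in new.values()))
        new = {v: val / norm for v, val in new.items()}
        if max(abs(new[v] - x[v]) for v in nodes) < tol:
            return new
        x = new
    return x


# Three-node path a - b - c (hypothetical data): the middle node is most central.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
centrality = eigenvector_centrality(path)
```

Every iteration touches all links of the network, in contrast with the lobby index, which only needs the degrees of a node's neighbors.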

Methods
We calculate the l, degree D, betweenness B and Eigenvector E centralities for the nodes of linguistic and biological networks already considered by the physics community. We also present dispersion plots of D versus l, B versus l and E versus l to verify the correlation between these measures.
We use the linguistic database Moby Thesaurus II [13], composed of 30,260 words, for which some network properties have been studied [14,15]. To construct the network, we adopt the convention that an outlink goes from a root word to a synonym. As an example, in the entry "set,assign,assign to,assigned,..." the word "set" is the root and the links go to its synonyms. We obtain the directed links "set" → "assign", "set" → "assign to" and "set" → "assigned".
The raw thesaurus contains over 2.5 million links, but many words have only inlinks, that is, they are not root words. We worked with a filtered version containing about 1.7 million links in which only root words constitute nodes. We use the outlinks to calculate the centrality measures; the minimum number of outlinks is 17 and the maximum is 1,106.
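The construction and filtering steps described above can be sketched as follows. The sketch assumes that each thesaurus entry is a comma-separated line whose first item is the root word, as in the example entry, and it interprets the filtering as dropping every link whose target never appears as a root. The function names and the second and third entries are hypothetical; the first entry is truncated to the synonyms quoted in the text.

```python
def build_thesaurus_graph(lines):
    """Map each root word to the set of synonyms it links to (its outlinks)."""
    graph = {}
    for line in lines:
        words = [w.strip() for w in line.split(",") if w.strip()]
        if not words:
            continue
        root, synonyms = words[0], words[1:]
        graph.setdefault(root, set()).update(s for s in synonyms if s != root)
    return graph


def keep_root_words_only(graph):
    """Drop links that point to words that never appear as a root."""
    roots = set(graph)
    return {root: {s for s in syns if s in roots} for root, syns in graph.items()}


entries = [
    "set,assign,assign to,assigned",  # synonyms quoted in the text
    "assign,set,allot",               # hypothetical entry
    "allot,assign",                   # hypothetical entry
]
raw = build_thesaurus_graph(entries)
filtered = keep_root_words_only(raw)
```

Here "assign to" and "assigned" receive inlinks but are not roots in this toy input, so they are removed from the filtered network, while the outdegree of each remaining word is the size of its synonym set.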
The biological network is the yeast protein-protein network downloaded from BioGRID [16], a curated repository that covers 5,433 proteins and over 150,000 unambiguous physical and genetic interactions.
The BioGRID network is composed of gene products connected by links [16]. The links include direct physical binding of two proteins, coexistence in a stable complex, or genetic interaction, as given by one or several experiments described in the literature. As an example, using the entries "YFL039C YBR243C" and "YFL039C YKL052C" extracted from the BioGRID data set, two links are created: "YFL039C" − "YBR243C" and "YFL039C" − "YKL052C". The network is undirected.
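A minimal sketch of how such pairs can be turned into an undirected network is given below. It assumes each record has already been reduced to a pair of identifiers; collapsing repeated pairs into a single link and skipping self-interactions are assumptions of this sketch, not steps stated in the text, and the function name is illustrative.

```python
def build_undirected_graph(pairs):
    """Build a dictionary of neighbor sets from (protein, protein) pairs."""
    adj = {}
    for a, b in pairs:
        if a == b:
            continue  # assumption: self-interactions are ignored
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


pairs = [
    ("YFL039C", "YBR243C"),
    ("YFL039C", "YKL052C"),
    ("YBR243C", "YFL039C"),  # same interaction reported in reverse order
]
yeast = build_undirected_graph(pairs)
degree = {protein: len(neighbors) for protein, neighbors in yeast.items()}
```

Because neighbors are stored in sets, an interaction reported by several experiments still counts as one link when the degree is computed.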

Local measure: degree
In Figure 1, we present dispersion plots of l versus D for the networks studied. The l is proportional to D (l ∝ D) in the low D regime (D ≤ 100) in both networks. However, for higher D, one observes l proportional to D^0.4 for both networks. The origin of this anomalous exponent is not clear. Although correlated, the two measures are not redundant. In the thesaurus case, words with low frequency of use or that are non-polysemous present low l but high degree.
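A scaling exponent such as the 0.4 quoted above corresponds to the slope of a straight line in a log-log plot. The sketch below shows one simple way to estimate such a slope by ordinary least squares on the logarithms; the text does not state how the exponent was obtained, so this is an assumption, and the data are synthetic values generated from an exact power law rather than measurements from either network.

```python
import math


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((a - mean_x) ** 2 for a in log_x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
    return sxy / sxx


# Synthetic data following y = 3 * x**0.4 exactly (not data from the paper).
degrees = [100, 200, 400, 800]
values = [3 * d ** 0.4 for d in degrees]
slope = loglog_slope(degrees, values)
```

For real data the fit would be restricted to the high D regime, and a fit through all points estimates a trend rather than an upper bound on l.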

Global measures: betweenness and Eigenvector
We now compare l with two standard global centrality measures, betweenness and Eigenvector. First, in Figure 2 we present the dispersion plots of l versus B. The l presents no strong correlation with B in either network. In Figure 3, we give the dispersion plot of l versus the Eigenvector centrality E for the thesaurus network. In the high E regime, the maximal l values are bounded by l ∝ E^0.4, as in the l versus D plot. We observe several nodes with high E but relatively low l (see inset). Examining these nodes individually, we find that l seems to outperform E in the ranking task, since words with high l also have high E and are basic and important polysemous words. In contrast, terms with high E can have high or low l. Those with low l are mostly phrasal verbs or multiple-word expressions derived from the words with high l.
It is difficult to assess the quality of a ranking list, but the above effect is very clear, as can be observed in Table 1 (see Appendix), which shows the top 25 words ranked by l and by E. The same occurs for other words with high E and low l.
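The comparison of two ranked lists described above can be sketched as follows: rank the nodes by each measure and list those that reach the top k by E but not by l. The function names, the tie-breaking rule (alphabetical order) and the scores are hypothetical; the placeholder labels "w1" to "w4" stand for words and carry no values from Table 1.

```python
def top_k(scores, k):
    """Nodes with the k highest scores; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [node for node, _ in ranked[:k]]


def high_e_low_l(l_scores, e_scores, k):
    """Nodes in the top k by E that are missing from the top k by l."""
    top_l = set(top_k(l_scores, k))
    return [node for node in top_k(e_scores, k) if node not in top_l]


# Hypothetical scores, for illustration only.
l_scores = {"w1": 9, "w2": 8, "w3": 2, "w4": 7}
e_scores = {"w1": 0.9, "w2": 0.5, "w3": 0.8, "w4": 0.1}
outliers = high_e_low_l(l_scores, e_scores, k=2)
```

With these toy scores the only node ranked highly by E but not by l is "w3", which is the kind of node that would then be inspected individually.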
In the case of the Yeast protein network (Figure 4), we observe a strong correlation between l and E for E > 0.2. The highest l values also seem to be bounded by an l ∝ E^0.4 behavior. The results also suggest that l could outperform E in the task of classifying relevant nodes. In the same figure, one can observe a detached cluster of nodes with low E and moderate l. We investigated these nodes and, to our surprise, they all seem to be related to ribosome assembly, meaning that, somehow, l carries information that could be useful in the detection of modules of functionally related proteins.

Discussion
In the regime relevant for ranking purposes, the biological network data show a strong correlation between the Eigenvector and lobby centralities, although the computation of the lobby index is much less demanding because it is not iterative and uses only local information. This suggests that the l centrality can be useful for ranking purposes in large databases, with results comparable to those of Eigenvector centrality. This claim could be tested in the paper citation network studied by Chen et al. [17], where the Page-Rank algorithm, whose core is the Eigenvector centrality, has given interesting results.
Local measures, such as l, seem to make more sense for non-transport networks, where path distance and channel flux have little influence and are not important aspects for defining centrality [18]. The same does not hold for some global measures, where path distance must be taken into account. Being local, l requires O(D) time to compute, which is always less than the O(NL) required to calculate B using Brandes' algorithm [19], where N is the number of nodes and L is the number of links of a given network. As l requires less computational time than E (O(N)), the high correlation between the two measures shown for the highest ranks suggests that l could be very suitable for ranking tools and search engines.
Both centrality measures make sense for studying diffusion and epidemic processes in transport networks, but the relevance of minimal paths is not so clear for linguistic or cultural networks like thesauri or, as another example, the network of culinary recipes studied by Kinouchi et al. [20], where links between ingredients represent associations but not channels. For networks similar to the linguistic one studied here, there is a strong decay of correlations: two words A and C with a minimal path of two links (that is, A − B − C) are almost uncorrelated, since this means that C is not a word semantically related to A. The paths between words may perhaps be relevant to describe associative psychological processes (say, A recalls B, which recalls C), but they are not channels in the same sense as in physical transport networks. So, the locality of l could be an advantage for ranking nodes in non-transport networks, where path distance and channel flux have little relevance [18]. We note that this could be the case for web pages, since links represent associations more than channels and users do not navigate from link to link over large distances.

Conclusions
In conclusion, we studied l in the Moby Thesaurus II and the Yeast protein-protein interaction networks. Several characteristics of this centrality index have been highlighted. The l seems to be a better local measure than the node degree D because it incorporates information about the importance of the node's neighbors. Being local, l requires O(D) time to compute, which is always less than the O(N) required to compute E and the O(NL) required to compute B.
We also found that l is more correlated with Eigenvector centrality than with betweenness centrality. Indeed, in the ranking task for words in the thesaurus, l seems even to outperform E as a centrality index, detecting basic polysemous words instead of non-polysemous words or words with low frequency of use.
Since Eigenvector centrality corresponds to the core idea behind the original Page-Rank algorithm [17], which is computationally very demanding, we suggest that l could furnish auxiliary information for ranking pages in the area of Search Engine Optimization. Because l requires less time to compute than standard global centrality measures, its use in other physical, biological and social networks promises very interesting results.

Figure 1: Log-log dispersion plot of l versus Degree centrality D for a) Moby Thesaurus II and b) Yeast network.

Figure 2: Log-log dispersion plot of l versus Betweenness B for a) Moby Thesaurus II network and b) Yeast network.

Figure 3: Log-log dispersion plot of l versus Eigenvector centrality E for the Moby Thesaurus II network. Inset: linear scale; notice the several words with high E but low l.

Figure 4: Log-log dispersion plot of l versus E for the Yeast network. The l and E centralities are well correlated for E > 0.2, where there is an l ∝ E^0.4 bound for the highest l values. Inset: linear scale; notice the cluster of ribosome proteins with high l but low E.