A measure of centrality based on the network efficiency

We introduce a new measure of centrality, the information centrality C^I, based on the concept of efficient propagation of information over the network. C^I is defined for both valued and non-valued graphs, and applies to groups and classes as well as individuals. The new measure is illustrated and compared to the standard centrality measures by using a classic network data set.


Introduction
The idea of centrality was first applied to human communication by Bavelas (Bavelas, 1948, 1950), who was interested in characterizing communication in small groups of people and assumed a relation between structural centrality and influence in group processes. Since then, various measures of structural centrality have been proposed to quantify the importance of an individual in a social network (Wasserman and Faust, 1994). Most centrality measures are based on one of two quite different conceptual ideas and can be divided into two large classes.
The measures in the first class are based on the idea that the centrality of an individual in a network is related to how near he is to the other persons. The simplest and most straightforward way to quantify individual centrality is therefore the degree of the individual, i.e. the number of its first neighbours. The most systematic elaboration of this concept is to be found in (Nieminen, 1974). A degree-based measure of individual centrality corresponds to the notion of how well connected the individual is within its local environment. The degree-based measure can be extended beyond first neighbours by considering the number of points that an individual can reach at distance two or three (Scott, 2003). A global measure based on the concept of closeness was proposed in (Freeman, 1979) in terms of the distances among the various points. One of the simplest notions of closeness is that calculated from the sum of the geodesic distances from an individual to all the other points in the graph (Sabidussi, 1966).
The second class of measures is based on the idea that central individuals stand between others on the path of communication (Bavelas, 1948; Anthonisse, 1971; Freeman, 1977, 1979). The betweenness of a point measures to what extent the point can play the role of intermediary in the interaction between the others. The simplest and most used measure of betweenness was proposed by Freeman (Freeman, 1977, 1979) and is based on geodesic paths. In many real situations, however, communication does not travel through geodesic paths only. For this reason two other measures of betweenness have been introduced more recently: the first is based on all possible paths between a pair of points (Freeman, Borgatti and White, 1991), and the second on random paths (Newman, 2003).
In this paper we propose a new measure of point centrality which combines the two main ideas of centrality mentioned above. The new measure is sensitive both to how close an individual is to the others and to how much he stands between the others. The measure is named information centrality since it is based on the concept of efficient propagation of information over the network (Latora and Marchiori, 2001). The information centrality of an individual is defined as the relative drop in the network efficiency caused by the removal of the individual from the network. In other words, we measure how the communication over the network is affected by the deactivation of the individual. The information centrality is defined for both valued and non-valued graphs, and naturally applies to groups and classes as well as individuals.
The paper is organized as follows. In Section 2 we briefly review the most widely used measures of centrality: degree, closeness and betweenness. In Section 3 we introduce point and group information centrality, while in Section 4 we discuss how the information centrality can be used to measure the centralization of the graph. Similarities and dissimilarities with respect to the standard measures are discussed and illustrated by means of simple examples in Section 5 and Section 6.

The standard centrality measures
We first review the three most commonly adopted measures of point centrality (Freeman, 1979): the degree centrality C^D, the closeness centrality C^C, and the betweenness centrality C^B. Such measures imply three competing theories of how centrality might affect group processes: respectively, centrality as activity, centrality as independence and centrality as control (Freeman, 1979). We represent a social network as a non-directed, non-valued graph G, consisting of a set of N points (vertices or nodes) and a set of K edges (or lines) connecting pairs of points. The points of the graph are the individuals, the actors of a social group, and the lines represent the social links. The graph can be described by the so-called adjacency matrix, an N × N matrix whose entry a_ij is 1 if there is an edge between i and j, and 0 otherwise. The entries on the diagonal, a_ii, are undefined, and for convenience are set equal to 0.
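As a minimal sketch of this representation, the Python snippet below builds the adjacency matrix of a small non-directed, non-valued graph from a list of edges. The function name `adjacency_matrix` and the example edge list are illustrative assumptions, not part of the original formulation.

```python
def adjacency_matrix(n_points, edges):
    """Build the N x N adjacency matrix of a non-directed, non-valued graph."""
    a = [[0] * n_points for _ in range(n_points)]
    for i, j in edges:
        if i == j:
            continue  # diagonal entries a_ii are kept equal to 0 by convention
        a[i][j] = 1
        a[j][i] = 1  # non-directed graph: the matrix is symmetric
    return a


# Hypothetical example: N = 4 points and K = 3 edges forming the path 0-1-2-3.
edges = [(0, 1), (1, 2), (2, 3)]
a = adjacency_matrix(4, edges)
k_edges = sum(sum(row) for row in a) // 2  # every edge appears twice in the matrix
print(a, k_edges)
```

Since the matrix is symmetric, the number of edges K can be recovered as half the sum of all entries.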

Degree Centrality
The simplest definition of point centrality is based on the idea that important points must be the most active, in the sense that they have the largest number of ties to other points in the graph. Thus a centrality measure for an actor i is the degree of i, i.e. the number of points adjacent to i. Two points are said to be adjacent if they are linked by an edge. The degree centrality of i can be defined as (Nieminen, 1974; Freeman, 1979):
$$C^D_i = \frac{k_i}{N-1} = \frac{\sum_{j \in G} a_{ij}}{N-1} \tag{1}$$
where k_i is the degree of point i. Since a given point i can at most be adjacent to N − 1 other points, N − 1 is the normalization factor introduced to make the definition independent of the size of the network and to have 0 ≤ C^D_i ≤ 1. The degree centrality focuses on the most visible actors in the network. An actor with a large degree is in direct contact with many other actors and, being very visible, is immediately recognized by others as a hub, a very active point and a major channel of communication.
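The definition can be illustrated with a short Python sketch that computes the degree centrality as k_i / (N − 1) from an adjacency matrix. The function name and the star example are assumptions chosen only for illustration.

```python
def degree_centrality(adj):
    """Return k_i / (N - 1) for every point i of a non-valued graph."""
    n = len(adj)
    # k_i is the sum of row i, because a_ij is 1 for neighbours and a_ii is 0
    return [sum(row) / (n - 1) for row in adj]


# Hypothetical example: a star with N = 4 points, point 0 being the centre.
star = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
scores = degree_centrality(star)
print(scores)
```

The centre is adjacent to all the other N − 1 points and therefore reaches the maximum value 1, while each leaf scores 1/(N − 1).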

Closeness Centrality
The degree centrality is a measure of local centrality. A definition of actor centrality on a global scale is based on how close an actor is to all the other actors. In this case the idea is that an actor is central if it can quickly interact with all the others, not only with its first neighbours. The simplest notion of closeness is based on the concept of minimum distance or geodesic d_ij, i.e. the minimum number of edges traversed to get from i to j. The closeness centrality of point i is (Sabidussi, 1966; Freeman, 1979; Wasserman and Faust, 1994):
$$C^C_i = (L_i)^{-1} = \frac{N-1}{\sum_{j \in G} d_{ij}} \tag{2}$$
where L_i is the average distance from actor i to all the other actors, and the normalization makes 0 ≤ C^C_i ≤ 1. C^C is to be used when measures based upon independence are desired (Freeman, 1979). Such a measure is meaningful for connected graphs only, unless one assumes d_ij equal to a finite value, for instance the maximum possible distance N − 1, instead of d_ij = +∞, when there is no path between two nodes i and j. Such an assumption will be used in Section 6 to study a non-connected graph.
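A minimal Python sketch of the closeness centrality is given below, assuming geodesic distances are obtained by breadth-first search on the adjacency matrix. The parameter `unreachable` is a hypothetical name for the finite value assigned to d_ij when no path exists; if it is left unset, the sketch refuses to handle non-connected graphs.

```python
from collections import deque


def bfs_distances(adj, source):
    """Geodesic distances (numbers of edges) from source; None if unreachable."""
    n = len(adj)
    dist = [None] * n
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in range(n):
            if adj[u][v] and dist[v] is None:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness_centrality(adj, unreachable=None):
    """Return (N - 1) / sum_j d_ij for every point i (requires N >= 2)."""
    n = len(adj)
    scores = []
    for i in range(n):
        dist = bfs_distances(adj, i)
        total = 0
        for j in range(n):
            if j == i:
                continue
            if dist[j] is None:
                if unreachable is None:
                    raise ValueError("graph is not connected")
                total += unreachable  # finite stand-in for an infinite distance
            else:
                total += dist[j]
        scores.append((n - 1) / total)
    return scores


# Hypothetical examples: a path 0-1-2, and one edge 0-1 plus an isolated point 2.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
split = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
path_scores = closeness_centrality(path)
split_scores = closeness_centrality(split, unreachable=3)
print(path_scores, split_scores)
```

With the stand-in distance set equal to N, an isolated point obtains (N − 1)/[(N − 1)N] = 1/N rather than zero.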

Betweenness Centrality
Interactions between two non-adjacent points might depend on the other actors, especially on those on the paths between the two. Therefore points in the middle can have strategic control and influence over the others. The important idea at the base of this centrality measure is that an actor is central if it lies between many of the actors. This concept can be simply quantified by assuming that the communication travels just along the geodesics. If n_jk is the number of geodesics linking the two actors j and k, and n_jk(i) is the number of geodesics linking the two actors j and k that contain point i, the betweenness centrality of actor i can be defined as (Anthonisse, 1971; Freeman, 1977, 1979):
$$C^B_i = \frac{\sum_{j \in G} \sum_{k \in G,\, k \neq j} n_{jk}(i) / n_{jk}}{(N-1)(N-2)} \tag{3}$$
In the double summation in the numerator, j and k must be different from i.
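The geodesic-based betweenness can be sketched in Python as follows. The sketch assumes that the double sum runs over ordered pairs (j, k) with j, k and i all distinct and that the result is divided by (N − 1)(N − 2), so that the centre of a star scores 1; this normalization is an assumption of the example. The number of geodesics through i is obtained from the identity n_jk(i) = n_ji · n_ik whenever d_ji + d_ik = d_jk.

```python
from collections import deque


def geodesic_counts(adj, source):
    """BFS from source returning (dist, sigma): distances and numbers of geodesics."""
    n = len(adj)
    dist = [None] * n
    sigma = [0] * n
    dist[source] = 0
    sigma[source] = 1
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in range(n):
            if not adj[u][v]:
                continue
            if dist[v] is None:
                dist[v] = dist[u] + 1
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]  # every geodesic to u extends to v
    return dist, sigma


def betweenness_centrality(adj):
    """Geodesic betweenness of every point, normalized to [0, 1] (requires N >= 3)."""
    n = len(adj)
    info = [geodesic_counts(adj, s) for s in range(n)]
    scores = []
    for i in range(n):
        d_i, s_i = info[i]
        total = 0.0
        for j in range(n):
            d_j, s_j = info[j]
            for k in range(n):
                if j == k or i in (j, k):
                    continue
                if d_j[k] is None or d_j[i] is None:
                    continue  # j and k, or j and i, are not connected
                if d_j[i] + d_i[k] == d_j[k]:
                    total += s_j[i] * s_i[k] / s_j[k]
        scores.append(total / ((n - 1) * (n - 2)))
    return scores


# Hypothetical examples: a star centred on point 0 and the path 0-1-2-3.
star = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
path = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
star_scores = betweenness_centrality(star)
path_scores = betweenness_centrality(path)
print(star_scores, path_scores)
```

This brute-force version is cubic in N and is meant only to mirror the definition; faster algorithms exist for large graphs.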
Similarly to the other centrality measures, C^B_i takes on values between 0 and 1, and it reaches its maximum when actor i falls on all geodesics. There are several extensions to the original betweenness measure proposed by Freeman. In particular, in most cases communication does not travel through geodesic paths only, and for this reason a more realistic betweenness measure should include non-geodesic as well as geodesic paths. Here we mention two other measures of betweenness that include contributions from non-geodesic paths: the flow betweenness and the random paths betweenness. We will discuss in detail the first one since it will be used in some of the examples of the following sections.
The flow betweenness was introduced in (Freeman, Borgatti and White, 1991) and is based on the concept of maximum flow. It is defined by assuming that each edge of the graph is like a pipe and can carry a unitary amount of flow (or an amount of flow equal to the edge's value in the extension to valued graphs). By considering a generic point j as the source of flow and a generic point k as the target, it is possible to calculate the maximum possible flow from j to k by means of the min-cut, max-flow theorem (Ford and Fulkerson, 1962). In general it is expected that more than a single unit of flow is exchanged from j to k by making simultaneous use of the various possible paths. The flow betweenness centrality of point i is defined from the amount of flow m_jk(i) passing through i when the maximum flow m_jk is exchanged from j to k, by the formula:
$$C^F_i = \frac{\sum_{j<k;\ j,k \neq i} m_{jk}(i)}{\sum_{j<k;\ j,k \neq i} m_{jk}} \tag{4}$$
The second betweenness was introduced very recently in (Newman, 2003; Newman and Girvan, 2003) and is based on random paths. It is suited to cases in which a message moves from a source point j to a target point k without any global knowledge of the network, and therefore at each step chooses where to go at random from all the possibilities. The betweenness of point i is equal to the number of times that the message passes through i in its journey, averaged over a large number of trials of the random walk (Newman, 2003).

Point and group information centrality
In this section we introduce the information centrality, a new measure based on the concept of efficient propagation of information over the network. The information centrality applies to points as well as to groups/classes, and is defined for both valued and non-valued graphs. For this reason we now consider a social network as a non-directed valued graph G of N points and K edges (the extension to non-symmetric data, i.e. digraphs, does not present any special problem). A valued graph is a better description of a social system if the intensity of the social relations is a relevant ingredient that one wants to take into account. In fact, the numerical value attached to each of the edges can be thought of as a measure of the social proximity between two persons. Consequently the entries a_ij of the adjacency matrix that describes G are positive real numbers when there is an edge between i and j, and 0 otherwise. The most widely adopted convention is to consider the values a_ij as proportional to the intensity of the social connection. An alternative, although equivalent, description, which we adopt here, is to consider such numbers as inversely proportional to the intensity of the social connection; for instance a_ij can be set equal to the inverse of the number of contacts between two individuals, or to the inverse of the amount of time they spend together. In our description the value of an edge can be imagined as a length associated with the edge: the stronger the intensity of the social link, the closer the two individuals are. In a valued graph, the shortest path length d_ij between i and j is defined as the smallest sum of the edge lengths over all the possible paths in the graph from i to j. When a_ij = 1 for all existing edges, i.e. in the particular case of a non-valued graph, d_ij reduces to the minimum number of edges traversed to get from i to j.
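The following Python sketch illustrates the convention adopted here for valued graphs: edge lengths are set to the inverse of a number of contacts, and shortest path lengths are then computed with Dijkstra's algorithm. The contact counts, the point labels and the function name `shortest_path_lengths` are hypothetical and serve only as an example.

```python
import heapq


def shortest_path_lengths(lengths, source):
    """Dijkstra's algorithm on a dict {point: {neighbour: edge_length}}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter path to u was already found
        for v, w in lengths[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Hypothetical numbers of contacts between three individuals.
contacts = {("a", "b"): 4, ("b", "c"): 2, ("a", "c"): 1}

# Edge length = inverse of the number of contacts (stronger tie, shorter edge).
lengths = {}
for (u, v), n_contacts in contacts.items():
    lengths.setdefault(u, {})[v] = 1.0 / n_contacts
    lengths.setdefault(v, {})[u] = 1.0 / n_contacts

d_from_a = shortest_path_lengths(lengths, "a")
print(d_from_a)
```

In this example the direct edge between a and c has length 1, but the route through b has total length 1/4 + 1/2 = 3/4, so the shortest path between two weakly tied individuals can pass through a strongly tied intermediary.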
The information centrality we are going to introduce is based on the following simple ideas: 1) information in social networks travels in parallel, in the sense that all the individuals exchange packets of information concurrently; 2) the importance of a point (group) is related to the ability of the network to respond to the deactivation of that point (group). In particular, we measure the ability of the network to propagate information among its points before and after a certain point (group) is deactivated.
In order to measure how efficiently the points of the network G exchange information, we use the network efficiency E, a quantity introduced in (Latora and Marchiori, 2001, 2003). The efficiency is a good measure of the performance of parallel systems, i.e. when all the points in the graph concurrently exchange packets of information (Latora and Marchiori, 2003). Such a variable is based on the assumption that the information/communication in a network travels along the shortest routes, and that the efficiency ε_ij in the communication between two points i and j is equal to the inverse of the shortest path length d_ij. The efficiency of G is the average of ε_ij:
$$E[G] = \frac{\sum_{i \neq j \in G} \epsilon_{ij}}{N(N-1)} = \frac{1}{N(N-1)} \sum_{i \neq j \in G} \frac{1}{d_{ij}} \tag{5}$$
and measures the mean flow-rate of information over G. The quantity E[G] is perfectly defined in the case of non-connected graphs: when there is no path between two points i and j, we assume d_ij = +∞ and consistently ε_ij = 0. For a non-valued graph E varies in the range [0, 1]. We are now ready to define point and group information centrality. It is important to say that the same ideas can be applied to define the importance of the edges of the graph (Latora and Marchiori, 2004; Fortunato, Latora and Marchiori, 2004).

Point information centrality
The information centrality of a point i is defined as the relative drop in the network efficiency caused by the removal from G of the edges incident in i:
$$C^I_i = \frac{\Delta E}{E} = \frac{E[G] - E[G'_i]}{E[G]} \tag{6}$$
where by G'_i we indicate a network with N points and K − k_i edges obtained by removing from G the edges incident in point i. The removal of some of the edges affects the communication between various points of the graph, increasing the length of the shortest paths. Consequently the efficiency of the new graph, E[G'_i], cannot exceed E[G]. The measure C^I_i is normalized by definition to take values in the interval [0, 1]. It is immediate to see that C^I is somehow correlated to all three standard centrality measures: C^D (formula 1), C^C (formula 2), and C^B (formula 3). In fact, the information centrality of point i depends on the degree of point i, since the efficiency E[G'_i] is smaller if the number k_i of edges removed from the original graph is larger. C^I_i is correlated to C^C_i since the efficiency of a graph is connected to (Σ_i L_i)^{-1}. Finally C^I_i, similarly to C^B_i, depends on the number of geodesics passing by i, but it also depends on the lengths of the new geodesics, the alternative paths that are used as communication channels once the point i is deactivated. No information about the new shortest paths is contained in C^B_i or in the other two standard measures.
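The two definitions of this section can be sketched in a few lines of Python. The sketch assumes the graph is given as a symmetric matrix of edge lengths in which 0 means "no edge" (for a non-valued graph every existing edge has length 1), computes all shortest path lengths with the Floyd-Warshall algorithm, and assumes the graph has at least one edge so that E[G] > 0. The function names are illustrative choices.

```python
import math


def shortest_paths(a):
    """Floyd-Warshall on a symmetric matrix of edge lengths (0 means no edge)."""
    n = len(a)
    d = [[0.0 if i == j else (a[i][j] if a[i][j] > 0 else math.inf)
          for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def efficiency(a):
    """E[G]: average of 1/d_ij over all ordered pairs i != j (1/inf counts as 0)."""
    n = len(a)
    d = shortest_paths(a)
    total = sum(1.0 / d[i][j] for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))


def remove_point_edges(a, i):
    """Copy of the matrix with all edges incident in point i removed (i itself stays)."""
    n = len(a)
    return [[0 if i in (r, c) else a[r][c] for c in range(n)] for r in range(n)]


def information_centrality(a):
    """Relative drop in efficiency caused by deactivating each point in turn."""
    e = efficiency(a)  # assumed > 0, i.e. the graph has at least one edge
    return [(e - efficiency(remove_point_edges(a, i))) / e for i in range(len(a))]


# Hypothetical example: a non-valued star with N = 4 points, centre at point 0.
star = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
e_star = efficiency(star)
scores = information_centrality(star)
print(e_star, scores)
```

For this star the efficiency is 0.75; deactivating the centre disconnects every pair, so its information centrality is 1, while each leaf scores 4/9.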

Group information centrality
Analogously to point centrality, the information centrality of a group of points S can be defined as the relative drop in the network efficiency caused by the deactivation of the points in S, i.e. by the removal from graph G of the edges incident in points belonging to S:
$$C^I_S = \frac{\Delta E}{E} = \frac{E[G] - E[G'_S]}{E[G]} \tag{7}$$
Here by G'_S we indicate the network obtained by removing from G the edges incident in points belonging to S. C^I_S is normalized to take values between 0 and 1.
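A minimal Python sketch of the group measure is shown below for the non-valued case, with distances obtained by breadth-first search. The function names and the path example are assumptions made for illustration, and the graph is assumed to contain at least one edge.

```python
from collections import deque


def efficiency(adj):
    """E[G] of a non-valued graph: mean of 1/d_ij over ordered pairs i != j."""
    n = len(adj)
    total = 0.0
    for s in range(n):
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        # unreachable points have infinite distance and contribute 0
        total += sum(1.0 / d for p, d in dist.items() if p != s)
    return total / (n * (n - 1))


def group_information_centrality(adj, group):
    """Relative drop in efficiency when all edges incident in the group are removed."""
    n = len(adj)
    members = set(group)
    pruned = [[0 if (r in members or c in members) else adj[r][c]
               for c in range(n)] for r in range(n)]
    e = efficiency(adj)  # assumed > 0
    return (e - efficiency(pruned)) / e


# Hypothetical example: the path 0-1-2-3-4 with the group S = {0, 1}.
path = [[1 if abs(r - c) == 1 else 0 for c in range(5)] for r in range(5)]
score = group_information_centrality(path, {0, 1})
print(round(score, 4))  # 0.6104
```

Here E[G] = 77/120 and E[G'_S] = 1/4, so the group scores 1 − 30/77 = 47/77. When the group contains a single point, the function reduces to the point information centrality.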

Graph Centralization
We have concentrated so far on the question of the centrality of a point (and of a particular group of points) in the graph. But it is also possible to examine to what extent the whole graph has a centralized structure. In fact, related to the point centrality measures is the idea of an overall index of centralization of a graph, describing to what extent the graph is organized around its most central point. Indexes of graph centralization based on the standard measures of point centrality have been proposed over the years (Freeman, 1979). Here we propose a measure of graph centralization based on the information centrality. The two properties common to all graph centralization indexes, no matter which point centrality measure they are built upon, are: 1) graph centralization should measure to what extent the centrality of the most central point exceeds the centrality of the other points; 2) graph centralization should be expressed as the ratio of that excess to its maximum possible value for a graph with the same number of points (Freeman, 1979). We define a graph centralization based on the information centrality as:
$$C^I = \frac{\sum_{i \in G} \left[ C^I_{i^*} - C^I_i \right]}{\max \sum_{i \in G} \left[ C^I_{i^*} - C^I_i \right]} = \frac{N+2}{(N+1)(N-2)} \sum_{i \in G} \left[ C^I_{i^*} - C^I_i \right] \tag{8}$$
where i* is the point with highest centrality. The normalization factor (N+1)(N−2)/(N+2) is the maximum possible value of Σ_{i∈G} [C^I_{i*} − C^I_i], which is obtained for a star with N points.
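The centralization index can be sketched in Python as a function of the list of point information centralities, using the normalization factor (N+1)(N−2)/(N+2) attained by a star. As a check, the sketch uses closed-form star scores derived here from the efficiency definition (they are not quoted from the text): a star has E = (N+2)/(2N), the centre has C^I = 1, and each leaf has C^I = 2N/[(N−1)(N+2)]. The function names are illustrative.

```python
def graph_centralization(scores):
    """Centralization index built from the point information centralities."""
    n = len(scores)
    if n < 3:
        raise ValueError("the normalization requires at least 3 points")
    top = max(scores)                         # centrality of the most central point i*
    excess = sum(top - c for c in scores)     # sum over i of [C^I_{i*} - C^I_i]
    max_excess = (n + 1) * (n - 2) / (n + 2)  # value reached by a star with N points
    return excess / max_excess


def star_scores(n):
    """Closed-form scores of a star (derived for this sketch): centre first, then leaves."""
    leaf = 2 * n / ((n - 1) * (n + 2))
    return [1.0] + [leaf] * (n - 1)


star_value = graph_centralization(star_scores(16))
flat_value = graph_centralization([0.2] * 16)  # all points equally central
print(star_value, flat_value)
```

The star reaches the maximum centralization, equal to 1 up to rounding, while any graph whose points are all equally central, such as a complete graph, has centralization 0.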

Comparing C^I to the other point centrality measures
The new measure of centrality we have introduced agrees with the three standard measures (degree, closeness, betweenness) on the assignment of extremes. For instance, it assigns the maximum importance to the central point of a star, and equal importance to the points of a complete graph. However, the agreement breaks down between these extremes. Consider, for instance, the graph sketched in fig. 1, which is composed of two main parts, graph G_1 with N_1 points and graph G_2 with N_2 points (N_1 > N_2), and of a single node i connecting G_1 to G_2. For such a simple example the information centrality contains some of the features of the betweenness (an actor is central if it lies between many of the actors). In fact the information centrality, similarly to the betweenness centrality, assigns the maximum importance to point i, which certainly plays an important role since it works as a bridge between G_1 and G_2. On the other hand, it is very unlikely that the degree or the closeness centrality would attribute the highest score to point i: the first, because G_1 and G_2 may contain points with degree larger than that of i; the second, because the point with the smallest distance to all the other points will probably be in G_1, especially if we assume N_1 ≫ N_2.
We will now illustrate similarities and dissimilarities with the three standard measures by using as an example a non-directed non-valued graph constructed ad hoc. The graph considered, drawn in fig. 2, is a tree with N = 16 points and K = N − 1 edges. The four centrality scores are reported in tab. 1, where the points are listed in decreasing order of C^I. Although the four measures show a certain overall agreement (for instance they all attribute the highest centrality to point 2), there are some differences worth noting. The information centrality assigns the top score to point 2, the second score to points 1 and 3, and the third score to points 7 and 12. But it also distinguishes points 9, 10 and 11 (fourth score) from the remaining points. The only other measure that makes such a distinction is C^C which, on the other hand, assigns the second score to points 7 and 12 and the third score to points 1 and 3, inverting the result of C^I. Neither the degree centrality C^D nor the betweenness centrality C^B has the resolution of C^I and C^C. In fact C^D assigns the top score to three points, namely points 1, 2 and 3, all having five neighbours, and the second score to points 7 and 12, both with 2 neighbours, while C^B assigns the top score to point 2 and the second score to points 1, 3, 7 and 12. Neither C^D nor C^B distinguishes points 9, 10, 11 from the remaining points: 4, 5, 6, 8, 13, 14, 15, 16. In this simple example C^I turns out to have, together with C^C, the best resolution. On the other hand, as we have seen in the example of fig. 1, C^I contains some of the features of C^B. And in the next section we will show that in some cases C^I can be strongly correlated to C^D.

Applications to the primate data
In this section we study a classical data set, the primate data collected by Linda Wolfe (Borgatti, Everett and Freeman, 1992; Everett and Borgatti, 1999), recording 3 months of interactions amongst a group of 20 monkeys, where interactions were defined as the joint presence at the river. The resulting non-directed non-valued graph is represented in fig. 3. The graph consists of 6 isolated points and a connected component of 14 points. The dataset also contains information on the sex and the age of each animal, as reported in tab. 2. Such a graph has been studied in (Everett and Borgatti, 1999), where the standard point centrality measures of degree, closeness and betweenness have been generalized to apply to groups as well as individuals. For this particular dataset we can therefore compare the measure we have introduced to the standard measures of point centrality and also of group centrality.
In table 2 we report the point centrality scores obtained for each monkey, respectively C^I, C^D, C^C and C^B. The flow betweenness centrality C^F (Freeman, Borgatti and White, 1991) is also considered. As discussed in Section 2, the flow betweenness is not based on geodesic paths as in C^B, but on all the independent paths between all pairs of points in the graph. Age and sex of each monkey are also reported in the table. Monkey 3 turns out to be the most central according to all the centrality indexes considered. Again, all the centrality measures considered assign the second rank to monkey 12 and the third rank to monkeys 13 and 15. The six isolated monkeys are the least central points according to C^I, C^D and C^C. Notice that for any of these six points C^C is equal to 0.05 and not to zero, since it is assumed that d_ij = N ∀j and therefore C^C_i = L_i^{-1} = (N − 1)/[(N − 1)N] = 1/N. On the other hand, the betweenness centrality C^B assigns a zero score to fourteen points: the six isolated monkeys and eight other monkeys, namely 4, 5, 7, 9, 10, 11, 14, 17. The latter points, although having a degree equal to or larger than one (for instance monkey 14 has four neighbours, while monkeys 4, 7, 10 and 17 have three neighbours each), do not play any role in the communication between pairs of points, in the sense that they are not present in the shortest paths between pairs of points. Of course such a result is a consequence of the assumption that communication between pairs of points takes only the shortest path. In fact, by considering the flow betweenness, only seven points have a zero score: the six isolated points and monkey 9. For the dataset considered, the ranking of the 20 points produced by C^I, C^D and C^C is the same. Nevertheless the normalized values of these measures are different, as reported in table 2 and as can be seen in Fig. 4, where we plot the centrality score for each of the 20 points. The points are ordered as a function of their score according to C^I. The behavior of C^I, C^D and C^C is similar, although for the first five points the normalized values of C^I are smaller than C^D and larger than C^C. For instance the first point in the rank, namely monkey 3, has C^D = 0.6842, C^I = 0.3751 and C^C = 0.1429. The two betweenness measures show some discrepancy with respect to the other measures. This is particularly evident in the figure for the flow betweenness: the two peaks at rank 9 and rank 12, corresponding respectively to point 8 and point 5, indicate that these two monkeys have, according to the flow betweenness, a rank larger than that assigned according to C^I.
We have also considered the different groups studied in (Everett and Borgatti, 1999): four formed by age and two formed by sex. Group 1 contains the five monkeys having age 14-16, group 2 the five monkeys having age 10-13, group 3 the six monkeys having age 7-9 and group 4 the four monkeys having age 4-5. Group 5 is made up of the five females, while group 6 is made up of the fifteen males. As illustrated in Fig. 5, among the age groups the most central one is the 10-13 years old (group 2), according to all four measures. This is the group containing monkey 3, who is the most central point also as an individual. The four age groups in decreasing order of importance are: 2, 3, 1, 4 for the information centrality; 2, 1, 3, 4 for the degree centrality; 2, 1, 4, 3 for the closeness centrality; and 2, 1, 3-4 for the betweenness centrality, which assigns a score equal to zero to the two youngest groups. The information centrality is the only measure assigning the second position to the 7-9 years old, while the other three measures assign the second position to the 14-16 years old. The information centrality assigns the last position to group 4 (age 4-5), similarly to the degree centrality and to the betweenness centrality.
Among the sex groups the most central one is the male group (which is also the largest one) for both information and degree centrality. The situation is inverted according to the betweenness centrality, while the closeness centrality attributes the same score to the two groups.
In addition to groups formed a priori, like a team in a company or the division of the individuals according to age or sex considered above, the centrality measure we have proposed in this paper can be applied to sets of individuals identified by cohesive subgroup techniques such as cliques, n-cliques, k-plexes, lambda sets etc. (Wasserman and Faust, 1994).
Another possibility is to use the centrality measure as a criterion for forming groups: we are working on an algorithm for finding the groups inside a given graph based on the concept of graph efficiency and information centrality (Fortunato, Latora and Marchiori, 2004).

Fig. 5. Centrality score for each of the six groups considered, namely 1 (age 14-16), 2 (age 10-13), 3 (age 7-9), 4 (age 4-5), 5 (males), 6 (females), and the four centrality measures.

Conclusions
In this paper we have briefly reviewed the standard measures of centrality proposed for social networks and we have introduced a new measure of centrality, the information centrality C^I, that is based on the concept of efficient propagation of information over the network. C^I is defined for both valued and non-valued graphs, and applies to groups as well as individuals. The groups can be either a set of individuals formed a priori, such as a team in a company or a group of individuals chosen according to some attribute (age, sex, income), or a set of individuals identified by cohesive subgroup methods or by positional analysis methods. We have illustrated similarities and dissimilarities with respect to the three standard measures of degree, closeness and betweenness in two non-directed non-valued graphs. It remains to be seen whether, in the light of further empirical work, the new measure can be more appropriate than the others in some applications.

Fig. 1. A graph G composed of two subgraphs G_1 and G_2 connected by node i.

Fig. 2. A non-directed non-valued tree with N = 16 points, a simple case constructed to compare the new measure of centrality we have introduced with the three standard measures: degree, closeness and betweenness.

Fig. 3. The graph of the interactions amongst a group of 20 monkeys. The dataset collected by Linda Wolfe (Borgatti, Everett and Freeman, 1992; Everett and Borgatti, 1999) also contains information on the sex and age of each animal (see table 2).

Table 1
The point centrality C^I is compared to the standard centrality measures C^D, C^C, and C^B for the graph in fig. 2. The points are ordered according to C^I.

Table 2
Individual centralities: for each monkey we report age and sex group, the information centrality C^I and the three standard centrality measures C^D, C^C and C^B. The flow betweenness centrality C^F is also reported in the last column.

Fig. 4. Individual centrality score for each of the 20 points of the graph of interactions amongst monkeys. The points are ordered according to their value of C^I (see table 2).