Ego Citation Networks Considered as Domination Networks

In this article we continue our study of power structures. A dynamic quantitative theory, including the measurement of power or dominance structures, is applied to citation networks. A curve somewhat similar to the Lorenz curve for inequality measurement is used. We calculate the D-measure, a normalized measure expressing the degree of dominance in a network. The D-measure of a citation network consisting of the ego (the original article) with n references and m received citations is obtained. When n = m the network is symmetric and the D-measure is 0.5; when the article has not yet received a citation the D-measure is n/(n+1); when the article has no references the D-measure is 1/(m+1). A real-world case is described and the evolution of its D-measure over time is shown. Our work offers one way of describing an evolving ego-citation network quantitatively.


INTRODUCTION
It goes without saying that network theory is an essential part of contemporary science. In the field of library and information science (LIS), article citation networks, author citation networks, author collaboration networks, bibliographic coupling networks and co-citation networks are among the best known. [1] When collecting data and studying networks, one may focus on one particular actor and their properties within the corresponding ego network (a network built from the perspective of one actor, the ego), or one may focus on the network as a whole, including all relations of all actors belonging to this network. Of course, one may also study global properties of ego networks. A typical example of an ego network is White's description of Eugene Garfield's research network. [2] In this contribution we study a global property of networks, called dominance, focusing on citation networks. The real-world example included here is an ego network.
This article applies the notion of dominance as introduced in an earlier article. [3] In this earlier work we discussed global dominance as a special network property based on the idea of zero-sum arrays. One may say that the 'dominance' idea is a way to operationalize ubiquitous terms such as 'top', 'leading' or 'superior'. Practically, we try to contribute to a quantitative description of the changing global structure of an evolving citation network. Our approach starts from graph theory and can be applied to real data of any kind. At this level there is no need for the data to conform to modeling assumptions such as scaling, self-organization or power-law behavior. Experience with such data and the resulting numerical values may in the future lead to a way of distinguishing between different types of evolution of citation networks. Parts of this contribution were presented during the ISSI conference in Wuhan (China). [4]

Zero-sum arrays and D-curves
In this section we briefly recall the definitions and main results of [3], as these are necessary to understand the developments presented further on.

Definition: arrays
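If X is a (finite) array, i.e. an N-tuple (N > 1), then the j-th element of X is denoted as $(X)_j = x_j$, where $x_j$ is a real number. X is called a zero-sum array if $\sum_{j=1}^{N} x_j = 0$; the set of all zero-sum arrays is denoted as Z. For an array X we put $I_+(X) = \{j : x_j > 0\}$, $I_0(X) = \{j : x_j = 0\}$ and $I_-(X) = \{j : x_j < 0\}$.
In this article components of any array are assumed to be ranked in decreasing order.
As in [3] we assume that X is not the trivial zero-array, hence $I_0 \neq \{1, \ldots, N\}$. For a zero-sum array this implies that $I_+(X)$ and $I_-(X)$ are both non-empty, although they may have different numbers of elements.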
When it is clear which array is meant, or when it does not matter, we simply write $I_+$, $I_0$ or $I_-$.
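We note that $\sum_{j \in I_+} x_j = -\sum_{j \in I_-} x_j > 0$.
Next we put $a_j = x_j \big/ \sum_{k \in I_+} x_k$, for j = 1, …, N. With each zero-sum array X, we associate a corresponding A-array, denoted $A_X$ and equal to $A_X = (a_1, \ldots, a_N)$. Also $A_X$ is a zero-sum array. Related to the array X, we will further need the array $Q_X = (q_1, \ldots, q_N)$, with $q_j = \sum_{k=1}^{j} |a_k|$, where |.| denotes the absolute value of a number; for convenience we also put $q_0 = 0$. Clearly $q_N = 2$.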
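The A- and Q-arrays are straightforward to compute. The sketch below is a minimal Python illustration, assuming the normalization $a_j = x_j / \sum_{k \in I_+} x_k$ (the one that makes $q_N = 2$); it uses exact fractions so that identities such as $q_N = 2$ hold without rounding. The function names `a_array` and `q_array` are hypothetical.

```python
from fractions import Fraction
from itertools import accumulate


def a_array(x):
    """A-array: X ranked in decreasing order, divided by the sum of its positive components."""
    xs = sorted((Fraction(v) for v in x), reverse=True)
    if len(xs) < 2 or sum(xs) != 0:
        raise ValueError("X must be a zero-sum array with N > 1 components")
    pos = sum(v for v in xs if v > 0)
    if pos == 0:
        raise ValueError("X must not be the trivial zero-array")
    return [v / pos for v in xs]


def q_array(x):
    """Q-array: partial sums of |a_j|, prefixed with q_0 = 0."""
    return [Fraction(0)] + list(accumulate(abs(a) for a in a_array(x)))


x = [3, 1, 0, -2, -2]
print([str(a) for a in a_array(x)])  # ['3/4', '1/4', '0', '-1/2', '-1/2']
print([str(q) for q in q_array(x)])  # ['0', '3/4', '1', '1', '3/2', '2']
```

The repeated value 1 in the Q-array comes from the zero component, i.e. from $I_0 \neq \emptyset$.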

Construction of a D-curve
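This curve can be described as a function, denoted as $D_X(t)$ for $t \in [0,1]$: it is the polygonal line connecting the points $(j/N, q_j)$, j = 0, …, N, i.e.
$$D_X(t) = q_{j-1} + (Nt - j + 1)\,|a_j| \quad \text{for } t \in \left[\tfrac{j-1}{N}, \tfrac{j}{N}\right],\ j = 1, \ldots, N.$$
In particular $D_X(t) = 1$ for $\#(I_+)/N \le t \le (N - \#(I_-))/N$, where # denotes the number of elements in a set. We see that a D-curve is partly concave (namely where the a's are positive) and partly convex (the part where the a's are negative), as illustrated in Figure 1.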
If $\#(I_+) = N - \#(I_-)$, that is, if $I_0 = \emptyset$, then the D-curve has no horizontal part. If $I_0 \neq \emptyset$ then it has a horizontal middle part, at height 1.
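Evaluating a D-curve amounts to linear interpolation between the points $(j/N, q_j)$. A minimal sketch in exact arithmetic, assuming the Q-array normalization $a_j = x_j / \sum_{k \in I_+} x_k$; `d_curve` is a hypothetical name. The example array has one zero component, so the curve shows the horizontal part at height 1 between t = 2/5 and t = 3/5.

```python
from fractions import Fraction
from itertools import accumulate


def q_array(x):
    """Q-array of a zero-sum array X, prefixed with q_0 = 0."""
    xs = sorted((Fraction(v) for v in x), reverse=True)
    pos = sum(v for v in xs if v > 0)  # sum of the positive components
    return [Fraction(0)] + list(accumulate(abs(v) / pos for v in xs))


def d_curve(x, t):
    """D_X(t): linear interpolation between the points (j/N, q_j), j = 0..N."""
    q = q_array(x)
    n = len(q) - 1
    t = Fraction(t)
    if not 0 <= t <= 1:
        raise ValueError("t must lie in [0, 1]")
    j = min(int(t * n), n - 1)  # t lies in the segment [j/N, (j+1)/N]
    return q[j] + (t * n - j) * (q[j + 1] - q[j])


x = [3, 1, 0, -2, -2]  # one zero component: horizontal part at height 1
print([str(d_curve(x, Fraction(k, 10))) for k in range(11)])
```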

Graph theory
We briefly recall the basic graph-theoretical terminology needed in the sequel and note that in this article the terms graph and network are considered to be synonyms. A directed graph (in short: digraph), denoted G(V,E), consists of a set of vertices or nodes, denoted as V or V(G), and a set of edges or links, denoted as E or E(G). Nodes will be denoted by lower-case letters such as i and j. An edge is an ordered pair of the form (i,j), where i and j are nodes, hence belong to V. Node i is called the initial node and node j the terminal node of edge (i,j). A directed path, or chain, from node i to node k is a sequence of edges $(v_n)_{n=1,\ldots,M}$ such that the terminal node of edge $v_n$ coincides with the initial node of edge $v_{n+1}$, node i is the initial node of edge $v_1$ and node k is the terminal node of edge $v_M$. If node i coincides with node k, the directed path is a directed circuit or loop. A directed graph is called acyclic or loopless if it does not contain directed circuits. A directed graph is weakly connected if a path exists between any two nodes in the underlying undirected graph. We will always assume that the graphs we study have a finite number of nodes (at least two) and are acyclic and weakly connected.
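The two standing assumptions, acyclicity and weak connectivity, can be checked mechanically. The sketch below uses Kahn's topological-sort algorithm for acyclicity and a breadth-first search on the underlying undirected graph for weak connectivity; representing a digraph as a list of nodes plus a list of (i, j) pairs is a choice made only for illustration.

```python
from collections import defaultdict, deque


def is_acyclic(nodes, edges):
    """Kahn's algorithm: acyclic iff every node can be removed in topological order."""
    indeg = {v: 0 for v in nodes}
    succ = defaultdict(list)
    for i, j in edges:
        succ[i].append(j)
        indeg[j] += 1
    queue = deque(v for v in nodes if indeg[v] == 0)
    removed = 0
    while queue:
        v = queue.popleft()
        removed += 1
        for w in succ[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    return removed == len(nodes)


def is_weakly_connected(nodes, edges):
    """Breadth-first search in the underlying undirected graph."""
    nbrs = defaultdict(set)
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        v = queue.popleft()
        for w in nbrs[v] - seen:
            seen.add(w)
            queue.append(w)
    return len(seen) == len(nodes)


# A citing article c1, the ego and two references r1, r2.
nodes = ["c1", "ego", "r1", "r2"]
edges = [("c1", "ego"), ("ego", "r1"), ("ego", "r2")]
print(is_acyclic(nodes, edges), is_weakly_connected(nodes, edges))  # True True
```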
In [3] we introduced a local and a global dominance theory. In this contribution we restrict ourselves to the global theory, in short GDT. In this theory we use arrays of the form $\Sigma = (\sigma_1, \sigma_2, \ldots, \sigma_N)$, whose components are the global flow numbers of the N nodes, as defined in [3].

Partial orders for zero-sum arrays

Definition: the dominance relation $\le_D$ in Z
Let X and Y be zero-sum arrays, not necessarily of the same length. We say that X is D-smaller than Y (or Y is D-larger than X), denoted as $X \le_D Y$, if $D_X(t) \le D_Y(t)$ for all $t \in [0,1]$, i.e. if the D-curve of X lies nowhere strictly above the D-curve of Y. The relation $\le_D$ determines a partial order in the set of all equivalence classes of zero-sum arrays, where two arrays are equivalent when their D-curves coincide. Formally:
$$X \sim_D Y \iff (X \le_D Y \text{ and } Y \le_D X).$$
As the dominance relation $\le_D$ is only a partial order, some arrays cannot be compared: they are said to be intrinsically incomparable.
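Because D-curves are piecewise linear, the relation $\le_D$ can be decided exactly: the difference of two D-curves is linear between consecutive break points, so it suffices to compare both curves at the union of the break points $j/N_X$ and $k/N_Y$. The sketch below assumes that $X \le_D Y$ means $D_X(t) \le D_Y(t)$ for every t in [0,1]; `d_leq` and `comparable` are hypothetical names.

```python
from fractions import Fraction
from itertools import accumulate


def q_array(x):
    """Q-array of a zero-sum array X, prefixed with q_0 = 0."""
    xs = sorted((Fraction(v) for v in x), reverse=True)
    pos = sum(v for v in xs if v > 0)
    return [Fraction(0)] + list(accumulate(abs(v) / pos for v in xs))


def curve_at(q, t):
    """Value at t of the polygonal line through (j/N, q_j)."""
    n = len(q) - 1
    j = min(int(t * n), n - 1)
    return q[j] + (t * n - j) * (q[j + 1] - q[j])


def d_leq(x, y):
    """True if the D-curve of X lies nowhere strictly above that of Y."""
    qx, qy = q_array(x), q_array(y)
    nx, ny = len(qx) - 1, len(qy) - 1
    ts = {Fraction(j, nx) for j in range(nx + 1)} | {Fraction(k, ny) for k in range(ny + 1)}
    return all(curve_at(qx, t) <= curve_at(qy, t) for t in ts)


def comparable(x, y):
    return d_leq(x, y) or d_leq(y, x)


print(d_leq([1, -1], [1, 1, -1, -1]), d_leq([1, 1, -1, -1], [1, -1]))  # True True
print(comparable([1, 0, -1], [1, -1]))                                  # False
```

The first line shows two arrays of different lengths with the same D-curve (the line y = 2t), hence equivalent; the second shows two D-curves that cross, so the arrays are incomparable.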
It is clear that the idea of a D-curve is inspired by the idea of a Lorenz curve, but it has different properties. [5][6][7]

Maximum and minimum D-curves

Maximum D-curves
For fixed N the maximum D-curve occurs when (0,0) is linearly connected to the point with coordinates (1/N,1) and then linearly connected to the endpoint (1,2). This D-curve corresponds to all zero-sum arrays of the form X = (s,-t,…,-t), with s, t > 0 and s = (N-1)t.
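This description of the maximum D-curve can be checked numerically. The sketch below builds $X = (s, -t, \ldots, -t)$ with $s = (N-1)t$ for several N and t and verifies, in exact arithmetic, that its D-curve reaches (1/N, 1), then runs straight to (1, 2), and never exceeds the line y = x + 1. Checking the knots suffices, since both the curve and the bound are linear between knots. Function names are hypothetical.

```python
from fractions import Fraction
from itertools import accumulate


def q_array(x):
    """Q-array of a zero-sum array X, prefixed with q_0 = 0."""
    xs = sorted((Fraction(v) for v in x), reverse=True)
    pos = sum(v for v in xs if v > 0)
    return [Fraction(0)] + list(accumulate(abs(v) / pos for v in xs))


def max_array(n, t=1):
    """The zero-sum array (s, -t, ..., -t) with s = (N - 1) t."""
    return [(n - 1) * t] + [-t] * (n - 1)


def check_maximum_curve(n, t=1):
    """Verify the shape of the maximum D-curve for N = n nodes."""
    q = q_array(max_array(n, t))
    if q[1] != 1:  # first segment: (0, 0) to (1/N, 1)
        return False
    for j in range(1, n + 1):
        u = Fraction(j, n)
        on_segment = q[j] == 1 + (u - Fraction(1, n)) * n / (n - 1)  # line (1/N,1)-(1,2)
        below_bound = q[j] <= u + 1                                  # line y = x + 1
        if not (on_segment and below_bound):
            return False
    return True


print(all(check_maximum_curve(n, t) for n in range(2, 20) for t in (1, 2, 7)))  # True
```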
Considering N as a variable, the line y = x + 1, passing through the points (0,1) and (1,2), is an upper bound for all these maximum D-curves.

A measure respecting the dominance relation $\le_D$ in Z
In applications of D-curves to institutes, research groups or scientists as nodes, we want to gauge the power structure that is present. The more inequality there is among nodes with a positive flow, the more powerful the order relation is. But also: the more even (in the sense of evenness [8]) the nodes with a negative flow, the more powerful the order.
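The D-measure itself is defined in [3]. As a hypothetical stand-in, the sketch below takes the area under the D-curve minus 1/2. This normalization is an assumption, chosen because it returns (N-1)/N for the maximum D-curve, 0.5 for a symmetric array and 1/(m+1) for an array (1, …, 1, -m), matching the values this article reports for an uncited article, a symmetric ego network and an article without references. It respects $\le_D$, since a curve that lies nowhere above another cannot enclose a larger area, and the bounding line y = x + 1 would give exactly 1.

```python
from fractions import Fraction
from itertools import accumulate


def q_array(x):
    """Q-array of a zero-sum array X, prefixed with q_0 = 0."""
    xs = sorted((Fraction(v) for v in x), reverse=True)
    pos = sum(v for v in xs if v > 0)
    return [Fraction(0)] + list(accumulate(abs(v) / pos for v in xs))


def d_measure(x):
    """HYPOTHETICAL normalization: area under the D-curve minus 1/2.

    The D-curve is linear on each interval [(j-1)/N, j/N], so the
    trapezoidal rule gives the area exactly.
    """
    q = q_array(x)
    n = len(q) - 1
    area = sum((q[j] + q[j + 1]) / 2 for j in range(n)) / n
    return area - Fraction(1, 2)


for n in range(2, 10):
    print(n, d_measure([n - 1] + [-1] * (n - 1)))  # maximum D-curve: (N-1)/N
print(d_measure([2, 2, 0, -2, -2]))                # symmetric array: 1/2
print(d_measure([1, 1, 1, -3]))                    # (1, ..., 1, -m) with m = 3: 1/4
```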

Dynamic aspects of networks and properties of D-curves
When a new measure is proposed, one usually derives its theoretical properties and explains the possible benefits of using it. This has been done in [3]. Studying dynamic aspects of networks and their corresponding dominance measures is the next, essential step towards potential applications in fields such as business management, politics and social interactions. Here we restrict ourselves to applications in citation analysis.
We already know that adding to a digraph a node which dominates the network source makes this new node the network source; hence it becomes a global dominance node. Linear structures are clear hierarchies, but they are not interesting in the context of power structures: such structures are always intrinsically incomparable and, because of their symmetry, their D-measures are always equal to 0.5. In this article we study some aspects of evolving ego-citation networks.
Note that the array Σ of global flow numbers is itself a zero-sum array, so the zero-sum theory applies to it. Such a zero-sum array is referred to as a global flow array; its components are the global flow numbers. With it we want to characterize a weakly connected, acyclic network in terms of how much dominance is present.
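The global flow numbers are defined in [3] and that definition is not restated here, so the sketch below uses a hypothetical stand-in: $\sigma_i$ is the total length of all directed paths starting at node i minus the total length of all directed paths ending at i. Any such path-based count gives a zero-sum array, because every path is counted once positively at its start and once negatively at its end. For an ego with four references and four citing articles this stand-in reproduces the array (9, 9, 9, 9, 0, -9, -9, -9, -9) listed later in this article, and a linear chain yields a symmetric array. It is an illustration under that assumption, not the authors' definition.

```python
from collections import defaultdict
from functools import lru_cache


def global_flow(nodes, edges):
    """HYPOTHETICAL stand-in for the global flow numbers of [3].

    sigma_i = (total length of all directed paths starting at i)
            - (total length of all directed paths ending at i).
    Assumes an acyclic digraph given as a node list and (i, j) edge pairs.
    """
    succ, pred = defaultdict(list), defaultdict(list)
    for i, j in edges:
        succ[i].append(j)
        pred[j].append(i)

    def path_totals(neighbours):
        @lru_cache(maxsize=None)
        def totals(v):
            # (number of paths, total length of paths) leaving v along `neighbours`
            count = length = 0
            for w in neighbours[v]:
                c, ln = totals(w)
                count += 1 + c        # the edge (v, w) plus every path extending it
                length += 1 + c + ln  # each extension is one edge longer
            return count, length
        return totals

    out_totals, in_totals = path_totals(succ), path_totals(pred)
    return {v: out_totals(v)[1] - in_totals(v)[1] for v in nodes}


refs = [f"r{k}" for k in range(4)]
cits = [f"c{k}" for k in range(4)]
edges = [("ego", r) for r in refs] + [(c, "ego") for c in cits]
flows = global_flow(["ego"] + refs + cits, edges)
print(sorted(flows.values(), reverse=True))  # [9, 9, 9, 9, 0, -9, -9, -9, -9]
```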

Definitions
Definition: a local source of a digraph is a node having in-degree zero and strictly positive out-degree.
If a local source can reach every other node in a digraph, it is called a network source. Since we have assumed that there are no loops in a network, a network source, if it exists, is necessarily unique; hence it is referred to as the network source.
Definition: a local sink of a digraph is a node having out-degree zero and strictly positive in-degree.
If a local sink can be reached from every other node, it is called a network sink. A network sink, if it exists, is also necessarily unique and is hence referred to as the network sink.
It is obvious that a local source has a strictly positive flow number and that a local sink has a strictly negative flow number. Moreover, adding to a digraph a new node that is linked only to the network source (the link going from the new node to the original network source) makes this new node the network source. Next we recall the following definition.
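These definitions translate directly into code. The sketch below finds local sources and sinks from in- and out-degrees, tests whether a local source reaches every other node, and obtains the network sink as the network source of the reversed digraph. It also illustrates the remark that a new node linked only to the network source becomes the network source. The function names and the list-based graph representation are hypothetical.

```python
from collections import defaultdict


def sources_and_sinks(nodes, edges):
    """Local sources (in-degree 0, out-degree > 0) and local sinks (the reverse)."""
    indeg, outdeg = defaultdict(int), defaultdict(int)
    for i, j in edges:
        outdeg[i] += 1
        indeg[j] += 1
    local_sources = [v for v in nodes if indeg[v] == 0 and outdeg[v] > 0]
    local_sinks = [v for v in nodes if outdeg[v] == 0 and indeg[v] > 0]
    return local_sources, local_sinks


def reachable_from(v, edges):
    """All nodes reachable from v by a directed path (depth-first search)."""
    succ = defaultdict(list)
    for i, j in edges:
        succ[i].append(j)
    seen, stack = set(), [v]
    while stack:
        u = stack.pop()
        for w in succ[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def network_source(nodes, edges):
    """The local source that reaches every other node, or None."""
    for s in sources_and_sinks(nodes, edges)[0]:
        if reachable_from(s, edges) >= set(nodes) - {s}:
            return s
    return None


def network_sink(nodes, edges):
    """The network sink is the network source of the reversed digraph."""
    return network_source(nodes, [(j, i) for i, j in edges])


nodes, edges = ["ego", "r1", "r2"], [("ego", "r1"), ("ego", "r2")]
print(network_source(nodes, edges))                                # ego
print(network_source(nodes + ["new"], edges + [("new", "ego")]))   # new
```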

Definition: global dominance nodes
A node with the highest global flow in a D-graph is called a global dominance node.

Networks corresponding to maximum and minimum D-curves
For fixed N (the number of nodes), the graph shown in Figure 2 is the only graph corresponding to a maximum global D-array.
Similarly, for fixed N, the graph shown in Figure 3 is the only graph corresponding to a minimum D-array.
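For these two extreme graphs the flows can be written down directly. In the sketch below the graph of Figure 2 is taken to be a star whose hub points to the N - 1 other nodes, and the graph of Figure 3 is assumed to be the reversed star, with all edges pointing to the hub. Every directed path in a star is a single edge, so each flow is taken to be out-degree minus in-degree. The D-measure uses a hypothetical normalization (area under the D-curve minus 1/2), which gives (N-1)/N and 1/N for the two stars; the hub of the outward star is the global dominance node.

```python
from fractions import Fraction
from itertools import accumulate


def d_measure(x):
    """HYPOTHETICAL normalization: area under the D-curve minus 1/2."""
    xs = sorted((Fraction(v) for v in x), reverse=True)
    pos = sum(v for v in xs if v > 0)
    q = [Fraction(0)] + list(accumulate(abs(v) / pos for v in xs))
    n = len(xs)
    return sum((q[j] + q[j + 1]) / 2 for j in range(n)) / n - Fraction(1, 2)


def star_flows(n, outward=True):
    """Flows of a star on N nodes; index 0 is the hub, flow = out-degree - in-degree."""
    hub = n - 1 if outward else -(n - 1)
    leaf = -1 if outward else 1
    return [hub] + [leaf] * (n - 1)


for n in range(2, 8):
    out_star, in_star = star_flows(n), star_flows(n, outward=False)
    dominant = max(range(n), key=out_star.__getitem__)  # node with the highest flow
    print(n, d_measure(out_star), d_measure(in_star), dominant)
```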

Terminology and meaning: hierarchies versus power (dominance)
Let us reconsider the digraph shown in Figure 2. There is not much hierarchical structure in this digraph, but it reflects a very strong power structure: one ruler and many equally powerless subordinates.

A simple example
An author chooses which articles to include in his/her reference list. For this reason one can consider a citing article to be in a dominating position with respect to the article which receives a citation. We note though that scientifically one may argue that the cited article is the superior one as the citing article recognizes it as an authority. From the abstract point of view of the network research presented here it does not really matter which point of view is taken (it is just a matter of reversing the arrows in a directed network), but, of course, socially and emotionally it does.
We consider an article that has four references and which receives step by step more and more (direct) citations, making the structure more and more top-heavy. In this simple example we only consider the position of the original article with respect to its references and articles citing it, neglecting other nodes in the citation network. The different steps are shown in Table 1. Such a citation network always begins with a maximum D-curve.
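The steps of Table 1 can be regenerated from the graph itself once the flows and the D-measure are fixed. The sketch below uses two stand-ins that are assumptions, not definitions taken from [3]: flows as total path length out of a node minus total path length into it, and the D-measure as the area under the D-curve minus 1/2. With these choices m = 0 gives the maximum value 4/5, m = 4 gives (9, 9, 9, 9, 0, -9, -9, -9, -9) with D = 0.5, and m = 8 starts with (9, …, 9, -4), in line with the Table 1 values given in this article. The helper `ego_network` is hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from functools import lru_cache
from itertools import accumulate


def global_flow(nodes, edges):
    """HYPOTHETICAL flow: total length of paths leaving a node minus that of paths entering it."""
    succ, pred = defaultdict(list), defaultdict(list)
    for i, j in edges:
        succ[i].append(j)
        pred[j].append(i)

    def path_totals(neighbours):
        @lru_cache(maxsize=None)
        def totals(v):
            count = length = 0
            for w in neighbours[v]:
                c, ln = totals(w)
                count += 1 + c        # the edge itself plus every path extending it
                length += 1 + c + ln  # each extension is one edge longer
            return count, length
        return totals

    out_totals, in_totals = path_totals(succ), path_totals(pred)
    return {v: out_totals(v)[1] - in_totals(v)[1] for v in nodes}


def d_measure(x):
    """HYPOTHETICAL normalization: area under the D-curve minus 1/2."""
    xs = sorted((Fraction(v) for v in x), reverse=True)
    pos = sum(v for v in xs if v > 0)
    q = [Fraction(0)] + list(accumulate(abs(v) / pos for v in xs))
    n = len(xs)
    return sum((q[j] + q[j + 1]) / 2 for j in range(n)) / n - Fraction(1, 2)


def ego_network(n_refs, m_cits):
    """Ego with n references and m direct citations, no other links."""
    refs = [f"r{k}" for k in range(n_refs)]
    cits = [f"c{k}" for k in range(m_cits)]
    edges = [("ego", r) for r in refs] + [(c, "ego") for c in cits]
    return ["ego"] + refs + cits, edges


for m in range(9):  # the article has four references and gains citations step by step
    nodes, edges = ego_network(4, m)
    x = sorted(global_flow(nodes, edges).values(), reverse=True)
    print(m, len(nodes), x, d_measure(x))
```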

A general formula
It is possible to derive a general formula for the global D-measure in the case that the original article has n references and received m citations, hence N = n+m+1. Note that possible relations between cited and cited, citing and citing or cited and citing articles are not taken into account in this calculation. This case is illustrated in Figure 4.
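The special values quoted for this configuration can be checked over a whole grid of (n, m). The sketch assumes the flow array $(2n+1, \ldots, 2n+1, n-m, -(2m+1), \ldots, -(2m+1))$, with m citing articles, the ego and n references, which agrees with the Table 1 values shown in this article and with the positive part n(2m+1) stated for m ≤ n. It also assumes the hypothetical D-measure normalization (area under the D-curve minus 1/2). The function `d_closed_form` is our own algebra under these assumptions, not a formula from the article; the loop confirms it against direct computation.

```python
from fractions import Fraction
from itertools import accumulate


def d_measure(x):
    """HYPOTHETICAL normalization: area under the D-curve minus 1/2."""
    xs = sorted((Fraction(v) for v in x), reverse=True)
    pos = sum(v for v in xs if v > 0)
    q = [Fraction(0)] + list(accumulate(abs(v) / pos for v in xs))
    n = len(xs)
    return sum((q[j] + q[j + 1]) / 2 for j in range(n)) / n - Fraction(1, 2)


def ego_array(n, m):
    """ASSUMED flow array: m citing articles, the ego, n references."""
    return [2 * n + 1] * m + [n - m] + [-(2 * m + 1)] * n


def d_closed_form(n, m):
    """Our own derivation of d_measure(ego_array(n, m)); not taken from the article."""
    if m <= n:
        num = 4 * m * n * n + 2 * n * n + m * n + m * m + m
        return Fraction(num, 2 * n * (2 * m + 1) * (n + m + 1))
    inner = 2 + 3 * n - Fraction(n * (n + 1) * (2 * m + 1), m * (2 * n + 1))
    return inner / (2 * (n + m + 1))


for n in range(12):
    for m in range(12):
        if n + m == 0:
            continue  # a network needs at least two nodes
        x = ego_array(n, m)
        d = d_measure(x)
        assert sum(x) == 0 and d == d_closed_form(n, m)
        if m <= n:
            assert sum(v for v in x if v > 0) == n * (2 * m + 1)  # positive part
        if m == n:
            assert d == Fraction(1, 2)
        if m == 0:
            assert d == Fraction(n, n + 1)
        if n == 0:
            assert d == Fraction(1, m + 1)
print("all checks passed")
```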
For this configuration the corresponding global flow array is
$$X = (\underbrace{2n+1, \ldots, 2n+1}_{m}, \; n-m, \; \underbrace{-(2m+1), \ldots, -(2m+1)}_{n}),$$
where the first m components belong to the citing articles, the middle component to the ego and the last n components to the references; the Q-array follows from it as described above. If now m ≤ n, then the positive part of X (the sum of its positive components) is equal to $m(2n+1) + (n-m) = n(2m+1)$. The resulting expression for the D-measure equals 0.5 when m = n. If m = 0, corresponding with an article that has not been cited yet, then the result is n/(n+1). When the article has no references (n = 0), the D-measure is 1/(m+1).

Table 1 (rows m = 4 and m = 8 of the simple example, n = 4)
m = 4, N = 9: X = (9, 9, 9, 9, 0, -9, -9, -9, -9), D = 0.5
m = 8, N = 13: X = (9, 9, 9, 9, 9, 9, 9, 9, -4, …)

A real-world example
As a real-world example we consider the article [9] as the ego in a citation network. All necessary data are provided in the Appendix. We start with the network consisting of the ego and its references (links between references are not included) and expand the network, year by year, by articles citing the ego. If these citing articles cite nodes (articles) already present in the network, these links are added too. Hence this example differs from the theoretical example in the previous section, as more nodes and links are included. Data are obtained from the Web of Science (WoS), up to and including the year 2017. This leads to a network consisting of the ego, its 17 references and 7 citing articles. For each year we calculate the corresponding D-measure. Results are shown in Table 2. Figure 5 illustrates the final configuration, from which all other configurations can be derived.
We note that the original D-value, with N = 18, is (N-1)/N = 17/18 ≈ 0.944. D-values for the whole network decrease over time: by adding new nodes on top (which have just a few links), the total network dominance decreases. Finally, the component of the EGO in the X-arrays (its global flow number, the smallest positive value in these arrays) decreases from 17 in 2009 to 8 in 2017. In theory this value may even become negative when more and more citing articles are included. We recall here that the EGO in this example is just a node used to build a network. Moreover, the D-value characterizes networks as a whole and has little relation with a specific node, even when this node was used to build the network.
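The year-by-year construction can be sketched as follows. The input format (a reference list plus one record per citing article giving its year, its identifier and the identifiers it cites) is hypothetical, and the toy records below are invented for illustration; they are not the WoS data of the Appendix. The flow and D-measure functions are hypothetical stand-ins (total path length out minus in; area under the D-curve minus 1/2), not the definitions of [3]. Under these assumptions an ego with 17 references and no citations starts at 17/18, as in the text, and the toy network's D-value decreases as citing articles are added on top.

```python
from collections import defaultdict
from fractions import Fraction
from functools import lru_cache
from itertools import accumulate


def global_flow(nodes, edges):
    """HYPOTHETICAL flow: total length of paths leaving a node minus that of paths entering it."""
    succ, pred = defaultdict(list), defaultdict(list)
    for i, j in edges:
        succ[i].append(j)
        pred[j].append(i)

    def path_totals(neighbours):
        @lru_cache(maxsize=None)
        def totals(v):
            count = length = 0
            for w in neighbours[v]:
                c, ln = totals(w)
                count += 1 + c
                length += 1 + c + ln
            return count, length
        return totals

    out_totals, in_totals = path_totals(succ), path_totals(pred)
    return {v: out_totals(v)[1] - in_totals(v)[1] for v in nodes}


def d_measure(x):
    """HYPOTHETICAL normalization: area under the D-curve minus 1/2."""
    xs = sorted((Fraction(v) for v in x), reverse=True)
    pos = sum(v for v in xs if v > 0)
    q = [Fraction(0)] + list(accumulate(abs(v) / pos for v in xs))
    n = len(xs)
    return sum((q[j] + q[j + 1]) / 2 for j in range(n)) / n - Fraction(1, 2)


def yearly_d_measures(references, citing):
    """references: ids cited by the ego; citing: (year, id, cited_ids) records."""
    nodes = {"EGO", *references}
    edges = {("EGO", r) for r in references}  # links between references are not included
    results = [("start", d_measure(list(global_flow(nodes, edges).values())))]
    for year in sorted({rec[0] for rec in citing}):
        new = [(a, cited) for y, a, cited in citing if y == year]
        nodes |= {a for a, _ in new}
        for a, cited in new:
            # keep only links to articles already present in the network
            edges |= {(a, c) for c in cited if c in nodes}
        results.append((year, d_measure(list(global_flow(nodes, edges).values()))))
    return results


toy_citing = [(2010, "c1", ["EGO", "r1"]), (2011, "c2", ["EGO", "c1"])]  # invented records
for label, d in yearly_d_measures(["r1", "r2", "r3"], toy_citing):
    print(label, d, float(d))
```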

CONCLUSION
This article deals with an indicator, called the D-measure, related to a global property of a directed network. In particular, we applied our earlier domination theory to citation networks consisting of the ego (the original article) with n references and m received citations. When n = m the network is symmetric and the D-measure is 0.5; when the article has not yet received a citation the D-measure is n/(n+1); when the article has no references the D-measure is 1/(m+1). The theory was illustrated by a real-world example.
Our work is one way of describing an evolving ego-citation network quantitatively. Practically, we try to contribute to a quantitative description of the changing global structure of an evolving citation network. Our approach starts from a graph-theoretical framework and applies it to real data, without any need for the data to conform to modeling assumptions such as scaling, self-organization or power-law behavior. Experience with such data and the resulting numerical values may in the future lead to a way of distinguishing between different types of evolution of citation networks. As it is rather mathematical, we consider our contribution to reflect an aspect of the formalization or "hardening" of the social sciences as mentioned by Herbert Simon. [10]