A Game Theoretical Analysis of Academic Writing Co-authorship Networks

The field of Academic Writing is analyzed from a network perspective combined with a game theoretical approach, using co-authorship networks and the Shapley value concept. The Shapley value of each author indicates that author's average marginal collaboration potential. Results obtained on data from 2015 to 2019 offer insights into the publication trends of the academic writing community.


INTRODUCTION
Network science and game theory offer indispensable tools for the analysis of data in all fields of research, providing valuable information about connections hidden in various types of data. Although the emergence of social networks in recent years has driven a lot of research to focus on their analysis, many other fields of science use networks to represent and analyze data. Game theory, on the other hand, offers novel perspectives by introducing various concepts that can be used as alternatives to what are considered optimal solutions for different problems. Thus, equilibrium concepts defined by game theorists can be used to explain and predict many natural or social phenomena. This paper presents an overview of the field of Academic Writing from a network perspective, using a game theoretic approach. Using data collected from the Scopus database for the years 2015 to 2019, co-authorship networks for each year are constructed and analyzed. The Shapley value, a key solution concept from cooperative game theory, is used to identify key nodes in these networks.
The motivation behind this study is two-fold. On the one hand, such a study offers a novel perspective on publications in the field of Academic Writing, which can be considered an emerging field from the point of view of publications.
On the other hand, when considering Academic Writing programs in the Disciplines, such an analysis may provide students with a needed overview of their own field of study. Co-authorship networks have become a staple of Scientometrics, and their analysis may offer a newcomer to the field valuable information not provided elsewhere.
The paper is structured as follows: a short introduction to the concepts and methods used in the analysis is presented in Section 2, followed by the presentation of the data and results. The paper ends with Conclusions.

Game theoretical insights
Game theory is an important interdisciplinary research field, with main applications in economics, biology, engineering and politics, among others. Games can be classified according to various criteria, such as the nature of collaboration among players (cooperative vs. non-cooperative games) or the information available to players (perfect vs. imperfect information games).
Non-cooperative game theory deals with independent players (agents), each of whom chooses an action from a strategy set and receives a gain given by a payoff function. In a perfect information one-shot non-cooperative game, players know the options available to every other player and decide how to play accordingly. A solution for such a situation is an equilibrium of the game. The most widely used equilibrium concept is the Nash equilibrium, [1] which captures a situation in which no player can unilaterally change strategy in order to obtain a higher payoff. Despite its popularity, the Nash equilibrium has some limitations, as it assumes individual, rational players and does not consider cooperation among them.
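The Nash equilibrium condition can be made concrete with a short sketch. The following Python snippet is only an illustration and is not part of the study: it enumerates the pure-strategy Nash equilibria of a two-player game by checking that neither player gains from a unilateral deviation. The function name `pure_nash_equilibria` and the Prisoner's Dilemma payoffs are assumptions chosen for the example.

```python
def pure_nash_equilibria(payoffs):
    """Return all pure-strategy Nash equilibria of a two-player game.

    payoffs maps (row_strategy, col_strategy) -> (row_payoff, col_payoff).
    """
    rows = sorted({r for r, _ in payoffs})
    cols = sorted({c for _, c in payoffs})
    equilibria = []
    for r in rows:
        for c in cols:
            row_payoff, col_payoff = payoffs[(r, c)]
            # The row player cannot do better by switching rows.
            row_stable = all(payoffs[(r2, c)][0] <= row_payoff for r2 in rows)
            # The column player cannot do better by switching columns.
            col_stable = all(payoffs[(r, c2)][1] <= col_payoff for c2 in cols)
            if row_stable and col_stable:
                equilibria.append((r, c))
    return equilibria


# Hypothetical Prisoner's Dilemma payoffs (illustrative values only).
prisoners_dilemma = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"): (0, 5),
    ("defect", "cooperate"): (5, 0),
    ("defect", "defect"): (1, 1),
}
equilibria = pure_nash_equilibria(prisoners_dilemma)
```

In this toy game the only equilibrium is mutual defection, although mutual cooperation would pay both players more, which illustrates the limitation mentioned above: the concept does not account for cooperation.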
In contrast, cooperative game theory deals with coalitions of players and, for example, with how a collective gain can be fairly divided among individuals. The Shapley value [2] offers a way to establish the importance of each player to the coalition and to split the payoff among players accordingly. Intuitively, the Shapley value of a player is computed as the average marginal contribution of the player to all possible coalitions the player can be part of. The marginal contribution of a player is expressed as the difference between the value, or gain, of the coalition including the player and the value of the coalition without that player.
Formally, we can define the Shapley value of a player i as its average marginal contribution:

$$\phi_i(v) = \frac{1}{N!} \sum_{\sigma \in \Pi} m^v_\sigma(i),$$

where $\Pi$ contains all possible orderings of the N players and $\sigma=(\sigma_1,\dots,\sigma_N)\in\Pi$ is a certain ordering. Here $m^v_\sigma(i)$ is the marginal contribution of player i at position k ($\sigma_k=i$), calculated as:

$$m^v_\sigma(i) = v(\{\sigma_1,\dots,\sigma_k\}) - v(\{\sigma_1,\dots,\sigma_{k-1}\}).$$

As a simple example let us consider 3 players (A,B,C). The values of the coalitions are the following: In this case we have 3!=6 possible orderings (Table 1). For all these orderings the corresponding marginal contribution is obtained. The marginal contribution of a player in a certain ordering is calculated as the difference between the value of the coalition formed by the player together with all players preceding it and the value of the same coalition without that player; for example, in the ordering (A,B,C), $m^v_\sigma(B)=v(\{A,B\})-v(\{A\})=7$. Table 1 presents the possible orderings and the marginal contributions.
The Shapley value of each player is the average of its marginal contributions. For example, the Shapley value for player A is:
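As a hedged illustration of the permutation-based definition, the following Python sketch averages marginal contributions over all orderings. The coalition values used here are hypothetical and are not those of the example in the text; the function name `shapley_values` is likewise an assumption.

```python
from fractions import Fraction
from itertools import permutations


def shapley_values(players, v):
    """Average each player's marginal contribution over all orderings.

    v maps a frozenset of players to the value of that coalition.
    """
    totals = {p: Fraction(0) for p in players}
    orderings = list(permutations(players))
    for sigma in orderings:
        coalition = frozenset()
        for p in sigma:
            with_p = coalition | {p}
            # Marginal contribution of p when joining its predecessors.
            totals[p] += v[with_p] - v[coalition]
            coalition = with_p
    return {p: totals[p] / len(orderings) for p in players}


# Hypothetical coalition values, NOT the values used in the paper's example.
v = {
    frozenset(): 0,
    frozenset("A"): 1, frozenset("B"): 2, frozenset("C"): 3,
    frozenset("AB"): 4, frozenset("AC"): 5, frozenset("BC"): 6,
    frozenset("ABC"): 9,
}
phi = shapley_values("ABC", v)
```

With these assumed values the three Shapley values add up to the value of the grand coalition, as expected from the efficiency property of the Shapley value. Enumerating all orderings grows factorially with the number of players, so this direct approach is only practical for very small games.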

Network statistics
Because large networks are hard to analyze and to compare visually with other networks, network statistics are used to characterize them and provide insights into their structure and properties. The most common statistics are described in what follows.
The degree of a node represents the number of edges adjacent to the node, i.e. the number of connections a node has. [3] The average degree is related to the density of the network.
The density of a network measures how close the network is to a complete one. In a complete network every pair of nodes is linked and the density is 1. For a co-authorship network, high values of the average degree and network density could indicate a tendency toward collaboration between authors.
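The relation between degree, average degree and density can be sketched in a few lines of Python. This is a minimal illustration on a hypothetical toy network stored as an adjacency mapping (undirected, no self-loops); the function names are assumptions and this is not the Gephi implementation.

```python
def degrees(graph):
    """Number of neighbors of every node."""
    return {node: len(neighbors) for node, neighbors in graph.items()}


def average_degree(graph):
    return sum(degrees(graph).values()) / len(graph)


def density(graph):
    """Fraction of all possible undirected links that are present."""
    n = len(graph)
    if n < 2:
        return 0.0
    edges = sum(degrees(graph).values()) / 2  # each edge is counted at both ends
    return edges / (n * (n - 1) / 2)


# Hypothetical toy network: 5 nodes, 4 edges, node "E" is isolated.
toy_network = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
    "E": set(),
}
avg_deg = average_degree(toy_network)
dens = density(toy_network)
```

For an undirected network with n nodes the two quantities are tied together: the density equals the average degree divided by n - 1.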
The diameter of a network represents the longest shortest path between two nodes, [4] i.e. the maximum distance over all pairs of nodes. The diameter of a network shows how close nodes are to each other; in a co-authorship network a low diameter could indicate a research group of authors. The average path length represents the average distance over all pairs of nodes.
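A minimal sketch of how the diameter and the average path length can be obtained with breadth-first search is given below. It assumes an unweighted, undirected network stored as an adjacency mapping and, as an explicit assumption, considers only pairs of nodes that are connected by some path, since distances between disconnected nodes are undefined. The network and the function names are hypothetical.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest-path distances (in links) from source to all reachable nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def diameter_and_average_path_length(graph):
    """Maximum and mean distance over all connected pairs of distinct nodes."""
    lengths = []
    for source in graph:
        for target, d in bfs_distances(graph, source).items():
            if target != source:
                lengths.append(d)
    if not lengths:
        return 0, 0.0
    return max(lengths), sum(lengths) / len(lengths)


# Hypothetical network: a path A-B-C-D plus an isolated node E.
path_network = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
    "E": set(),
}
diam, avg_len = diameter_and_average_path_length(path_network)
```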
Isolated nodes in a network are nodes that have no neighbors; in the case of a co-authorship network, isolated nodes represent authors of single-author papers.
A connected component [5] of an undirected network represents a maximal set of nodes in which there is a path from any node to any other node in the set. In a co-authorship network a connected component could represent a research group (authors that have research papers in common).
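Isolated nodes and connected components can be identified with a simple graph traversal. The sketch below is illustrative only; the network and the function names are hypothetical.

```python
def connected_components(graph):
    """Return the connected components of an undirected graph as sets."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        component = {start}
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbor in graph[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    stack.append(neighbor)
        seen |= component
        components.append(component)
    return components


def isolated_nodes(graph):
    """Nodes without any neighbor (authors with single-author papers only)."""
    return {node for node, neighbors in graph.items() if not neighbors}


# Hypothetical network with three components, one of them an isolated node.
network = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B"},
    "D": {"E"},
    "E": {"D"},
    "F": set(),
}
components = connected_components(network)
isolated = isolated_nodes(network)
```

Note that every isolated node forms a connected component of size one, so the number of components is always at least the number of isolated nodes.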
Modularity measures how well a network decomposes into clusters of nodes. [6] High values indicate a pronounced community structure.
The clustering coefficient, [7] applied to a single node of the network, measures how complete the neighborhood of the node is, i.e. the ratio between the number of links that exist among the neighbors of the node and the number of links that could possibly exist among them. In a network two nodes are neighbors if there is a link (direct connection) between them. For the network, the clustering coefficient represents the average over the clustering coefficients of all nodes. It is a value between 0 and 1 and indicates the tendency of nodes to cluster together. For a node, a value of 1 indicates that all of its neighbors are connected to each other, and a value of 0 indicates that no two of its neighbors are connected.
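As an illustration, the standard local clustering coefficient of a node is the fraction of pairs of its neighbors that are themselves linked. The following Python sketch computes it on a hypothetical network; assigning 0 to nodes with fewer than two neighbors is a convention assumed here, and other tools may treat such nodes differently.

```python
from itertools import combinations


def local_clustering(graph, node):
    """Fraction of pairs of neighbors of `node` that are linked to each other."""
    neighbors = graph[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumed convention for nodes with fewer than two neighbors
    linked_pairs = sum(1 for u, w in combinations(neighbors, 2) if w in graph[u])
    return linked_pairs / (k * (k - 1) / 2)


def average_clustering(graph):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(graph, node) for node in graph) / len(graph)


# Hypothetical network: a triangle A-B-C with a pendant node D attached to C.
network = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
coefficients = {node: local_clustering(network, node) for node in network}
```

In this example A and B have a coefficient of 1 because their two neighbors are linked, while C has three neighbors of which only one pair (A, B) is linked.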
Centrality measures play an important role in analyzing networks. One of the most used centrality measures is the betweenness centrality, which measures how many shortest paths pass through a certain vertex. The closeness centrality of a node is calculated as the reciprocal of the sum of its distances to all other nodes. Eigenvector centrality, in contrast to the measures mentioned above, takes into account the importance of a node's neighbors: a node is important if it is linked to other important nodes.

Co-authorship networks and Shapley value
Co-authorship networks are networks constructed from publications, in which nodes represent authors and links represent co-authored papers. [8,9] The analysis of these networks presents an overview of the academic community. Various types of information can be extracted from co-authorship networks, such as scientific collaboration practices, innovation in academia, [10] and recommendations for scientific cooperation. [11] Studies of co-authorship networks report various network indicators [9,12] through which patterns of collaboration are explored. A plethora of studies exist on various fields of science, [13,14] focusing on network properties, [9] community structures, [15] small-world structures, [12] preferential attachment mechanisms [16,17] or scale-free properties. [18]
As mentioned before, network centrality is an important network indicator, with several centrality measures (e.g. betweenness, closeness, PageRank centrality) each trying to capture a certain property of the studied networks. In [19] the authors introduce a cooperative game and a Shapley value based centrality indicator for networks, which they use for selecting the top k most influential nodes. The gain function of a coalition in this cooperative game is equal to the number of nodes in the coalition together with those that are connected to them (i.e. nodes at distance 1 from the nodes of the coalition). In [20] an algorithm with polynomial complexity (O(|V|+|E|)) is described for obtaining the Shapley values of this cooperative game, which represent the average marginal contribution of each node to every coalition of the other nodes (Algorithm 1). As an example of the calculation, let us consider the simple network presented in Figure 1.
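Algorithm 1 is not reproduced here. As a hedged sketch, and assuming it corresponds to the closed form known for this game (each node's Shapley value is the sum of 1/(1+deg(u)) over the node itself and its neighbors u), the computation can be written in a few lines of Python and checked against the permutation-based definition. The network below is hypothetical and is not the one in Figure 1; all function names are assumptions.

```python
from fractions import Fraction
from itertools import permutations


def coalition_value(graph, coalition):
    """Number of nodes in the coalition or at distance 1 from it."""
    covered = set(coalition)
    for node in coalition:
        covered |= graph[node]
    return len(covered)


def shapley_closed_form(graph):
    """Assumed closed form: sum of 1/(1+deg(u)) over the node and its neighbors."""
    return {
        node: sum(Fraction(1, 1 + len(graph[u])) for u in {node} | graph[node])
        for node in graph
    }


def shapley_brute_force(graph):
    """Average marginal contribution over all orderings (tiny graphs only)."""
    nodes = list(graph)
    totals = {node: Fraction(0) for node in nodes}
    count = 0
    for sigma in permutations(nodes):
        count += 1
        coalition = set()
        for node in sigma:
            before = coalition_value(graph, coalition)
            coalition.add(node)
            totals[node] += coalition_value(graph, coalition) - before
    return {node: totals[node] / count for node in nodes}


# Hypothetical 4-node path B - A - C - D (NOT the network of Figure 1).
network = {"A": {"B", "C"}, "B": {"A"}, "C": {"A", "D"}, "D": {"C"}}
closed = shapley_closed_form(network)
brute = shapley_brute_force(network)
```

On this toy network both computations agree, and the values sum to the number of nodes, which is the value of the coalition of all nodes. The closed form touches every node and every link a constant number of times, which is consistent with the O(|V|+|E|) complexity quoted above.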
The Shapley value of node A is calculated, based on Algorithm 1, in the following way (node A has two neighbors, nodes B and C):
For node B the Shapley value is:
To compare Shapley values across networks of different sizes, we introduce the normalized Shapley value of author i (in a network with N authors), which is the ratio between the Shapley value of the author and the sum of the Shapley values obtained for all authors:

$$\bar{\phi}_i = \frac{\phi_i}{\sum_{j=1}^{N} \phi_j},$$

where $\phi_i$ denotes the Shapley value of author i. For example, for the network presented in Figure 1 the normalized Shapley value for author A will be:

Academic Writing Co-authorship Networks
We will use Algorithm 1 to compute the Shapley value of authors who published papers on the topic "academic writing" between 2015 and 2019.
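The normalization described above, in which each Shapley value is divided by the sum over all authors, can be sketched as follows. The input values are hypothetical (they correspond to an assumed 4-node path network, not to Figure 1), and the function name is an assumption.

```python
from fractions import Fraction


def normalize(shapley):
    """Divide every Shapley value by the sum over all authors."""
    total = sum(shapley.values())
    return {author: value / total for author, value in shapley.items()}


# Hypothetical Shapley values of four authors (illustrative only).
shapley = {
    "A": Fraction(7, 6),
    "B": Fraction(5, 6),
    "C": Fraction(7, 6),
    "D": Fraction(5, 6),
}
normalized = normalize(shapley)
```

For the game considered here the value of the coalition of all authors is the number of authors N, so the Shapley values sum to N and the normalization amounts to dividing each value by the network size.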

Data
Data was collected from the Scopus database (www.scopus.com). We searched for articles that contain the 'academic writing' keyword and were published between 2015 and 2019. The Scopus search query was the following: where for x we used 5 years, from 2015 to 2019.
From these data we constructed 5 co-authorship networks, one for each year. General information, the number of nodes, the number of edges and other network measures are presented in Table 2. We used Gephi to compute these network statistics. [21] We find that the networks increase in size over the five years of the study. The first 4 indicators are increasing: the number of nodes, indicating the number of authors publishing in this field, and the number of edges and the average degree, indicating that more recent papers are more often published in collaboration. The number of connected components, however, indicates a high sparsity: most authors have published one or two papers on this topic.
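The construction of one network per year can be sketched as follows. This is an assumption-laden illustration: the record format (a year and a list of author names per paper), the author names and the function name are hypothetical, the links are unweighted, and author name disambiguation is not handled.

```python
from collections import defaultdict
from itertools import combinations


def build_coauthorship_networks(records):
    """Build one undirected co-authorship network per publication year.

    records is an iterable of (year, list_of_author_names), one per paper.
    Returns {year: {author: set_of_coauthors}}.
    """
    networks = defaultdict(dict)
    for year, authors in records:
        graph = networks[year]
        unique_authors = set(authors)
        for author in unique_authors:
            graph.setdefault(author, set())  # keeps single authors as isolated nodes
        for a, b in combinations(sorted(unique_authors), 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(networks)


# Hypothetical records, not taken from the Scopus data set.
records = [
    (2015, ["Author A", "Author B"]),
    (2015, ["Author B", "Author C", "Author D"]),
    (2015, ["Author E"]),
    (2016, ["Author A", "Author E"]),
]
networks = build_coauthorship_networks(records)
```

A paper with k authors contributes a complete subgraph on those k authors, which is why a single paper with many authors can produce nodes of high degree.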
The number of isolated nodes indicates single authors. As a percentage of the total number of papers, there is a mild decreasing trend. The rest of the indicators do not vary among years, showing a consistency of practices in this area.

Shapley value
The Shapley value for authors in a network is computed as the average marginal contribution to the value of all coalitions of nodes that can be formed with nodes in the network. The value of a coalition is computed as the size of the coalition plus the number of nodes that can be reached at distance 1 from nodes in the coalition.
In a co-authorship network such a coalition is a group of authors that may or may not be linked to each other. The value of the coalition is considered to be the number of authors in the coalition together with all the co-authors to which they are linked. Naturally, each author is counted only once when computing the value of the coalition. For researchers working together in a certain field the value of the coalition indicates the diversity of collaborations, and various fields may exhibit different coalition values. This means that the value of the coalition in this case indicates a collaboration factor: how many other authors collaborate directly with coalition members. The Shapley value thus indicates the average potential for collaboration of an author.
The marginal contribution of an author to a co-authorship coalition is computed as the difference between the total number of authors and their co-authors in the coalition and the number of authors and their co-authors in the same coalition without the considered author. Two situations may arise: (i) the author has no collaboration with other authors in the coalition and (ii) the author has written some papers with other members of the coalition. However, the marginal contribution of the author to a coalition does not depend on the links within the coalition but on the number of nodes. It does depend on whether the author shares co-authors outside the coalition with coalition members, which reduces the marginal contribution of the node to that coalition.
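The effect of shared co-authors on the marginal contribution can be illustrated with a small sketch. The network, the author labels and the function names are hypothetical; the coalition value follows the definition given above (coalition members plus their direct co-authors, each counted once).

```python
def coalition_value(graph, coalition):
    """Coalition members plus all their direct co-authors, each counted once."""
    covered = set(coalition)
    for member in coalition:
        covered |= graph[member]
    return len(covered)


def marginal_contribution(graph, coalition, author):
    """Gain in coalition value obtained when `author` joins `coalition`."""
    joined = set(coalition) | {author}
    return coalition_value(graph, joined) - coalition_value(graph, coalition)


# Hypothetical network: X wrote with Y and Z, W wrote with Z, V has no co-authors.
network = {
    "X": {"Y", "Z"},
    "Y": {"X"},
    "Z": {"X", "W"},
    "W": {"Z"},
    "V": set(),
}
alone = marginal_contribution(network, set(), "X")
with_unrelated = marginal_contribution(network, {"V"}, "X")
with_shared_coauthor = marginal_contribution(network, {"W"}, "X")
```

In this example author X contributes 3 to the empty coalition and to the coalition {V}, but only 2 to the coalition {W}, because the common co-author Z is already counted through W.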
The Shapley value reveals the authors that have the highest marginal contribution to their network, in this case to the publications of a certain year. In the field of academic writing such information reveals the most influential authors of a year. An interesting feature of this field is that papers published on this subject are often aimed at a specific discipline, so such a study also indicates which disciplines formally focus on the form in which research is presented in articles.
Table 3 presents the highest Shapley values and the node degrees of the corresponding authors for each year. As we can see in the table, there is no correlation between the degree and the Shapley value, which makes the Shapley value a better indicator of collaboration. When comparing two nodes with the same degree, a higher Shapley value indicates a denser ego network, hence a stronger collaboration potential. While we notice an increase in the highest Shapley value over the last years, the normalized values indicate that the apparent change is due to the size of the network and not to its actual structure, as they show almost constant average marginal contributions of top authors to the field.
Table 4 presents the highest betweenness centrality values in the studied co-authorship networks and the corresponding authors. We also calculated the closeness centrality and the eigenvector centrality, but in these networks they do not reveal significant information. For example, in the network for the year 2016, 277 authors have a closeness centrality of 1 (the highest value in the network). The eigenvector centrality does not carry useful information either: in the co-authorship network of the year 2016, 8 authors have the value 1, the highest value in the network. We find that in most years Tables 3 and 4 present the same top authors, but with different rankings.
The Shapley value is a more refined indicator of collaboration, because it includes in the marginal contribution all direct collaborators of an author (neighbouring nodes) and does not assume that the existence of a shortest path induces cooperation. [...] actually reflect one paper with many authors), but those that have contributed more papers with different collaborators, thus potentially contributing to the development of the field.
For the field of academic writing the information about the actual research domain of the authors is also important.
Journal of Scientometric Research, Vol 9, Issue 3, Sep-Dec 2020
[...] for each year. We find that their research fields vary, indicating a marginal interest in this field combined with a substantial contribution in the form of journal papers on this topic. This is a specific feature of this field, a field that should permeate all disciplines, providing formal, specific support for research publication to new and experienced authors alike.
Regarding the Shapley value, the main intuition behind using it in the analysis of co-authorship networks lies in the fact that publishing an article is a collaborative endeavor, and performance measures should take into account that authors "come together": the contribution of one depends also on their collaborators, on the collaborators of those collaborators, and so on. This matters not only from a scientometric point of view, but also because each successful cooperation will indirectly influence the researchers involved and their future work. In the form presented here, the Shapley value may also capture such influences.
Table 5 presents the main research areas of the authors with the highest Shapley value. We find that authors who work in Social Sciences collaborate with scientists from other fields to help define academic writing guidelines. The Count column indicates how many times a certain field appeared in the list of Top 3 authors based on Shapley values. An author may have more than one field associated with their name, based on their list of publications. Based on the Scopus research areas, most authors are from the domain of Social Sciences, but we can also find domains like Mathematics or Nursing. The diversity of domains in this list also explains why there are no repeating names in Table 3: in some instances experts from other fields have invested their effort into formalizing academic writing guidelines for their disciplines in journal articles.

CONCLUSION
The field of Academic Writing is a currently developing research area with particular features that make its publication data and trends worth studying. Co-authorship networks of the academic writing community are analyzed with the help of cooperative game theory. The Shapley value is a solution concept measuring the average marginal contribution of a player to a collective gain. In the case of co-authorship networks, it indicates the average potential for collaboration of an author.
The most influential authors in the field of academic writing, based on the Shapley value, during 2015-2019 are identified.