Which Centralities Fit the Best? Network Centralities’ Ranking Based on the F-Measure

Abstract The applicability of network centralities may depend on various factors. In practice, it is frequently challenging to understand which centralities are best applicable in a particular network. The current study addresses this challenge. It presents a method of ranking centralities that permits analysts to better understand which centralities are most appropriate for measuring positions in a particular network. The approach is based on the well-known F-measure from the domain of statistical analysis of binary classification. Specifically, the classic F1 estimator of the classification accuracy of binary classifiers was adapted for estimating the accuracy of centralities in identifying positions in networks.


Introduction
In modern social network theory, various centralities have been developed to analyze structures and measure positions in networks. In analyzing the positions of social network members, the selection of network centralities can become both simple and quite complex (Palak and Nguyen 2021). If it is known in advance which centralities are the most appropriate for a particular network, or if an analyst (i.e., expert) knows in advance which specific set of metrics should be employed, then the problem of centrality selection is solved (Golkar Amnieh and Kaedi 2015; Mirbabaie et al. 2020; Peng et al. 2017). However, in reality, the selection process is frequently a challenging task (Gómez 2019; Newman 2018). Contemporary network analysis comprises more than 200 different centralities, differing to variable degrees in both their nature and applicability (Jalili et al. 2015). Generally, centrality selection is a laborious process that requires both expertise in the field of network theory and a deep understanding of the specific features of the analyzed networks and cases (White 2004).
The problem of centrality selection in its various formulations has existed for quite some time. Freeman, Roeder, and Mulholland (1979) were among the first to conduct an experimental study of various families of centralities. Working on the MIT experiment on small-group structure and communication, as reported by Bavelas (1950) and Smith (1950), Freeman and colleagues explored a set of competing hypotheses associated with structural centralities. They were also among the first to perform a comparative analysis of centralities and to explore what kinds of centralities have a demonstrable effect on individual responses and group processes.
Later, Ibarra (1993) analyzed network centralities in terms of their applicability to measure individual power in organizations. More specifically, the applicability of different centralities in informal organizational networks was explored in terms of formal-structural sources of power in networks. Categorizing centralities as structural sources of power in networks, Ibarra performed an analysis of similarities in centralities' impacts on technical and administrative innovation in organizational networks. The research results have made a significant contribution to the literature that aims for a better understanding of the reliability and validity of centrality measures. In 2002, Otte and Rousseau conducted a comprehensive study of social network analysis as a powerful strategy for information sciences (Otte and Rousseau 2002). The authors provided a new angle of attack to the practical problem of describing the structure (cohesion) of networks and the role played by particular nodes (i.e., network members). With a particular emphasis on the active development and growth of social network analysis as a discipline, they identified and systemized various indicators of individual influence in networks of different natures. The research findings have contributed to a better understanding of how the concept of network centrality can be applied in various research areas, and in particular, in the information sciences.
Research on modeling influence in social networks was conducted by Grabisch and Rusinowska (2010). They proposed a concept of decision power in social networks aimed at better understanding the process of agents' influence in a social network. The authors investigated influence behavior and proposed indices for measuring the influence of agents on group decisions in networks. The published results greatly contributed to the understanding of the concept of centrality in networks in terms of the influence of individual network members on strategic group choices. In a substantial portion of studies affiliated with network theory, the analysis of centralities' applicability was conducted directly or indirectly in the context of solving specific applied problems and exploring individual business cases. One of the most representative examples is the 'CEO network centrality and merger performance' article published by El-Khatib, Fogel, and Jandik (2015). Based on the constructed social network of CEOs from S&P 1500 companies, the authors studied the extent and strength of CEOs' personal connections by employing different centrality measures. Focusing on how corporate decisions can be influenced by a CEO's position in the social hierarchy, the authors worked on the problem of centrality selection for the particular business case (based on the CEOs' network structure).
Working on the correlation analysis of centralities, Chen et al. (2012) employed the Susceptible-Infected-Recovered (SIR) epidemic model (Anderson, Anderson, and May 1992) to evaluate the correlation between four different centralities in terms of the widespread influence of the top-ranked nodes on the other nodes in networks. Experiments were conducted on four real-life networks. Although not focused on developing any formalized approach for centrality ranking, the authors nonetheless introduced an interesting idea for how the SIR model in combination with correlation analysis can be helpful in achieving a better understanding of similarities in centralities' ability to identify positions in networks. Iacobucci et al. (2017) examined four selected focal centralities in terms of whether some of them may be more meaningful or applicable than others. The authors tested the centralities based on small and extended stylized networks. Even though the tested centralities were shown to be frequently highly correlated in terms of identifying actors' (i.e., nodes') positions in networks, some centralities were determined to be more relevant or appropriate (in the tested networks) than others. The authors explained the variability in centralities' ability to identify the most central actors by the differences in the type and substance of relations (i.e., links) between actors. Although the authors did not propose a method for ranking centralities, they stated and experimentally confirmed the need for further research in this direction.
The growing need to analyze the positions of nodes in complex networks has given rise to an interesting line of research on centrality comparisons. To explore interdependencies between different centrality metrics, Li et al. (2015) analyzed the correlation between widely studied and recently proposed centrality metrics in real-world networks. Their study examined the Pearson correlation coefficients and the similarities in ranking of nodes. In particular, the authors introduced a new centrality measure called the degree mass and showed empirically that there is a strong linear correlation between degree and betweenness. Similarly, Ronqui and Travieso (2015) performed analysis of complex networks through correlations in centrality measurements. They found that in general, the most commonly used centralities are correlated, but that for network models (i.e., network simulations) the correlations are stronger than for real networks. Their research shows that correlation between centralities varies across networks, which leads to the need for individual selection of centrality metrics in each individual case. Later, comparing centralities via correlation analysis, Schoch, Valente, and Brandes (2017) argued that the correlation of centrality metrics does not necessarily indicate their formal and conceptual similarity. According to their research, correlations between centrality metrics are in fact determined by the structural properties of social networks. Oldham et al. (2019) analyzed correlations between different centrality metrics across a variety of real-world networks and investigated how the analyzed correlations are related to network density and global topology variations. According to their study, centrality metrics generally correlate positively with each other, the strength of these correlations varies between networks, and network modularity plays an important role in facilitating structural variability in complex networks.
Based on computational results, the authors illustrated how topological characteristics of networks directly affect the correlations between centralities. Later, Rajeh et al. (2020) analyzed hierarchies and centralities across diverse real-world networks and showed a key role of network density and transitivity in the formation of strong correlation between topological properties of networks and centrality metrics of different nature.
Analysis of influential positions in complex networks is one of the vibrant topics in applied network analysis. Recent research published in the journal Nature shows the growing importance of network centrality research. One such example is the article on identification of influential invaders (i.e., cheaters) in cooperative communities (Yang et al. 2019). Cheaters are characterized as nodes that evade the costs of collaboration by avoiding the expense of contributions in dynamically evolving cooperative communities. The key issue the study focuses on is discovering the nodes that cheaters can invade most successfully. Another example is an article devoted to identifying and ranking the influential invaders in static complex networks (Yang et al. 2020). Analyzing invading cheaters that can potentially lead to the collapse of social cooperation, the authors proposed a set of strategies for ranking cheaters based on different network centralities and statistical measures.
In the social network analysis literature, the development of methods for analyzing the applicability of network centralities is motivated by two general challenges. The first challenge concerns the fragmentation of existing analytical approaches. This is due to the common trend (in most of the studies) to conduct the centralities' selection process individually, i.e., for each particular application. The second challenge stems at least in part from the first: the necessity for developing unified approaches for estimating centralities' applicability. This implies the development of methods that can be easily adapted to work with different networks, not just with their particular varieties in specific business applications.
To address these challenges, a method for the pairwise ranking of centralities is introduced in the present research. For any number of tested network centralities, this method aims to identify those that are the most appropriate in detecting the importance of vertices in a particular network. The developed approach is not limited to the assessment of network centrality in specific cases or the correlation analysis of the interdependencies of metrics. Instead, it aims to formalize the centrality selection process based on the well-established F-measure from the domain of statistical analysis of binary classifiers. The F-measure is driven by the simple idea of measuring the overlap between the real (i.e., true) and estimated classes. According to Labatut and Cherifi (2012, 5), it is recommended "to choose the simplest measures, whose interpretation is straightforward" when solving well-formalized classification problems. The F-measure is a classic and simple method that has proven itself well in solving various classification problems, showing stable and high classification accuracy (Labatut and Cherifi 2012). In the present research, the concept of the classic F1 score for evaluating the classification accuracy of binary classifiers (Sokolova, Japkowicz, and Szpakowicz 2006) was adapted for estimating the accuracy of network centralities to measure positions of vertices in networks.

Background
The idea of binary classification from statistical analysis constitutes the core of the developed approach for ranking centralities. Specifically, the approach is based on the computational mechanism of the F-measure, also known as the F1 score or harmonic mean, which is widely used in machine learning (ML) to estimate the accuracy of classification models (Powers 2011). In the 'Background' section, the fundamental principles of the conventional F1 score are illuminated, as these principles comprise the basis of the developed method for ranking centralities.

The Binary Classification Problem (BCP)
From the practical point of view, the binary classification problem (BCP) is the task of classifying a set of objects into two predefined categories. The selection of classification methods may depend on the specifics of the problem being solved and on the nature of the objects to be classified. A variety of models, such as support vector machines, Bayesian networks, random forests, and neural networks, have already been developed for solving the BCP (Kelleher, Mac Namee, and D'arcy 2020; Kuko and Pourhomayoun 2020). Because solving the BCP is a multi-step process, it should be emphasized that the present research is focused on understanding how to estimate the final classification results rather than on how to select a particular classification model or its training and testing methods. Before describing the mechanism for evaluating the quality of classification results, the formalization of the BCP itself in terms of ML is first given below.
Consider a set of k objects,¹ called X, and a set of two classes, Y = {+1, −1}. Each object x from X belongs to exactly one class y from Y, so there is a set Z of k pairs (x, y), where x ∈ X and y ∈ Y. The set Z is split into two subsets, Z′ and Z″, called the training and testing sets, respectively.² The ratio of the number of objects n (used for training) to the number of objects m (used for testing) may vary depending on specific applications and research goals (James et al. 2013). Frequently, the training and testing sets are both assigned to be equal to the overall set Z (i.e., Z′ = Z″ = Z) for the purpose of the initial estimation of the classification results.
Suppose some ML model A is selected to solve the BCP. First, model A is trained based on the training set Z′. In ML terms, the training procedure means that the model "learns" how to classify objects.³ After classifier A has been trained on the training set, it is applied to the testing set Z″. In other words, for every object x_i from Z″, model A predicts its class y, i.e., the value +1 or −1 from the predefined set of classes Y. The class prediction for an object x_i from Z″ (by model A) is correct if the predicted y value for the given object is equal to its actual y value in Z″; otherwise, the prediction is incorrect. The classification results for all objects from the testing set Z″ can be presented in the error matrix format (Géron 2019), as shown in Table 1.
¹ Generally, every object is characterized by a set of properties (i.e., variables) that form its description. Thus, every single object is represented by a vector of variables and their values (James et al. 2013).
² There exists a variety of techniques to split objects into training and testing sets that are well described in the literature (James et al. 2013; Müller and Guido 2016).
³ The training process varies from model to model. In the current study, what is important is not how the models are trained but how the classification results are assessed (which is discussed next).
According to Table 1, there are four possible combinations of the actual and predicted classes of objects. The TP quadrant shows the number of objects that were correctly classified by the model as class "+1". The FP quadrant counts how many objects from class "−1" were incorrectly classified by the model as class "+1". The FN quadrant stores the number of objects from class "+1" that were incorrectly classified as "−1". Finally, the TN quadrant shows the number of objects from class "−1" that were correctly classified by model A.
For example, suppose there is a set of 200 objects split into two classes. One hundred objects are assigned to class "+1" and another 100 objects are placed in class "−1". Further, some model A (i.e., classifier A) is trained and tested on the overall set of objects (i.e., Z′ = Z″ = Z). The hypothetical classification results are presented in Table 2.
Based on Table 2, model A assigned 35 objects to class "+1", 33 of which were classified correctly (i.e., in the TP quadrant) and two incorrectly (i.e., in the FP quadrant). The number of objects classified as class "−1" is equal to 165, 98 and 67 of which were classified correctly (i.e., in the TN quadrant) and incorrectly (i.e., in the FN quadrant), respectively.
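As a quick sanity check, the quadrant counts of this illustrative error matrix can be reproduced from the totals stated above (a minimal sketch; the numbers come directly from the example, not from any real classifier):

```python
# Table 2 example: 200 objects, 100 per class; a hypothetical classifier
# labels 35 objects as "+1", of which 33 are correct.
TP = 33          # class +1 objects correctly labeled +1
FP = 2           # class -1 objects incorrectly labeled +1
FN = 100 - TP    # remaining class +1 objects, mislabeled as -1
TN = 100 - FP    # remaining class -1 objects, correctly labeled -1

print(TP, FP, FN, TN)   # 33 2 67 98
print(TP + FP)          # 35 objects assigned to class +1
print(FN + TN)          # 165 objects assigned to class -1
```

The derived counts (FN = 67, TN = 98) match the quadrant values reported in the text.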

Evaluation of the Classification Results
Following the Error Matrix, the classification accuracy of classifier A can be measured based on the harmonic mean, also known as the F1 score. The given measure combines two metrics, called Precision and Recall, to obtain a balanced estimation of the extent to which the classification results are useful and complete (Géron 2019).
More specifically, the Precision metric shows how much classifier A can be trusted in terms of its ability to maximize the value in the True Positive quadrant of the Error Matrix. The Precision metric represents the fraction of the number of objects correctly classified as "+1" to the total number of objects classified by the model as "+1":

Precision = TP / (TP + FP)    (1)

The Recall metric, also known as sensitivity, indicates the completeness of the classification results. It is calculated as the ratio of the number of objects correctly classified by model A as class "+1" to the total number of objects that actually belong to class "+1", i.e., those classified correctly (as "+1") and incorrectly (as "−1"):

Recall = TP / (TP + FN)    (2)

The conventional F1 score represents a harmonic mean that aims to find a balanced estimate of the classification accuracy based on the Precision and Recall metrics:

F1 = 2 · Precision · Recall / (Precision + Recall)    (3)

The values of the F1 score lie in the range [0, 1]. For the illustrative example presented in Table 2, Precision = 33/35 ≈ 0.94, Recall = 33/100 = 0.33, and F1 ≈ 0.49. The accuracy of the classification results based on model A is thus equal to 0.49 out of 1.0, where 1.0 is the maximum possible accuracy.
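The Precision, Recall, and F1 values for the illustrative example can be verified numerically (a minimal sketch using the quadrant counts TP = 33, FP = 2, FN = 67 from the example above):

```python
# Precision, Recall, and F1 for the Table 2 example.
TP, FP, FN = 33, 2, 67

precision = TP / (TP + FP)    # 33/35 ~ 0.943
recall = TP / (TP + FN)       # 33/100 = 0.33
f1 = 2 * precision * recall / (precision + recall)

print(round(precision, 2), round(recall, 2), round(f1, 2))  # 0.94 0.33 0.49
```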

Approach
The estimation of the accuracy of ML classifiers based on the F-measure serves as a platform for the method of ranking network centralities. Due to its well-formalized computational approach, the F1 score is employed as a method for estimating the accuracy of network centralities in terms of their ability to identify the importance of vertices in networks. Following the idea of binary classification, the developed approach aims to identify the pair of centralities that is the most applicable for detecting the importance of vertices in a particular network. In other words, for any number of tested centralities (Kelleher, Mac Namee, and D'arcy 2020), the approach ranks pairwise combinations of centralities and detects the most appropriate combination for a particular network.

The Centrality Error Matrix
Consider a network G and the combination of two centralities C 1 and C 2 .
In the initial step, the values of centralities C1 and C2 are calculated for all vertices in G and then normalized to the [0, 1] range based on the min-max scaling method (Géron 2019). This normalization is necessary in the initial step because centralities frequently have different ranges of values; after it, the values of centralities C1 and C2 for all vertices are presented in a unified numeric space for further comparative analysis. As a result, each vertex in network G is characterized by the combination of two normalized centrality values (based on C1 and C2). Next, each vertex in G is assigned to one of two categories (i.e., class "+1" or "−1" in binary classification terms) based on the chosen "rigidity" level α from the range [0, 1]. The network analyst (i.e., decision maker) chooses a particular α value from the [0, 1] range, which is in accordance with the range of the normalized values of the tested centralities. Following the idea of binary classification, the value of α is considered a threshold that determines the level of strictness in categorizing a vertex as the most central one (i.e., important or influential) compared to all others in a network. The first category contains all vertices with a centrality value below the given threshold, and the second category includes the remaining vertices. Thinking in ML terms, the idea of the α parameter is similar to the concept of using analyst-defined hyperparameters to control the learning process of ML classifiers (Géron 2019; Kelleher, Mac Namee, and D'arcy 2020). Hyperparameters are tunable parameters that are set before the learning process begins in order to obtain the expected model behavior. Similarly, α is an adjustable parameter that is set before the process of estimating the accuracy of network centralities based on the F-measure begins. As mentioned above, the parameter value is selected from the unified range [0, 1] of the tested centralities.
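The min-max normalization step described above can be sketched as follows (the vertex labels and raw values are hypothetical, used only for illustration):

```python
# Min-max scaling of raw centrality values to the [0, 1] range.
def min_max_scale(raw):
    """Scale a dict of per-vertex centrality values to [0, 1]."""
    lo, hi = min(raw.values()), max(raw.values())
    span = (hi - lo) or 1.0  # guard: a constant centrality maps to all zeros
    return {v: (c - lo) / span for v, c in raw.items()}

# Hypothetical raw centrality values for four vertices.
raw = {"a": 2.0, "b": 5.0, "c": 8.0, "d": 5.0}
print(min_max_scale(raw))  # {'a': 0.0, 'b': 0.5, 'c': 1.0, 'd': 0.5}
```

After this step, the values of any two centralities are directly comparable against a common threshold α.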
Based on the idea of the BCP Error Matrix (see Table 1), the α-based categorization results are represented in the format of the Centrality Error Matrix (see Table 3). As mentioned before, the first class includes all vertices that have centrality values below the value of α, and the second class takes all other vertices with a centrality value greater than or equal to α. Each quadrant in the Centrality Error Matrix takes the total number of vertices according to the following conditions:

TP = |{v : C1(v) ≥ α and C2(v) ≥ α}|    (4)
FP = |{v : C1(v) ≥ α and C2(v) < α}|    (5)
FN = |{v : C1(v) < α and C2(v) ≥ α}|    (6)
TN = |{v : C1(v) < α and C2(v) < α}|    (7)

Table 3. Centrality error matrix.
The transition from the idea of the classic Error Matrix for the ML classifiers (presented in Table 1) to the adapted version of the Centrality Error Matrix for the analysis of centralities in networks (presented in Table 3) is based on the following interpretation.
In the classic BCP approach, the Error Matrix is considered a basis for analyzing the accuracy of an ML model A in terms of its ability to correctly classify objects into two groups. For example, the FP quadrant in Table 1 contains the number of objects from class "−1" (i.e., y = −1) that were classified by model A as class "+1" (i.e., A(x) = +1).
In the adapted approach (for the analysis of centralities), the Centrality Error Matrix is considered a basis for accuracy analysis, with network vertices divided into two classes, i.e., (1) "greater than or equal to α" and (2) "lower than α," according to the mutual correspondence of centralities C1 and C2 to the threshold value α. In other words, the Centrality Error Matrix creates a basis for the α-driven accuracy analysis of metric C1 in "predicting" (i.e., estimating) the vertices' classes with respect to the classes calculated based on the C2 metric. For example, the FP quadrant in Table 3 contains the number of vertices whose C1 centrality values are greater than or equal to α (i.e., C1 ≥ α), but whose C2 centrality values are simultaneously below the given threshold (i.e., C2 < α).
It is important to emphasize that the selection of a particular α value(s) is the analyst's responsibility. It may depend on different factors, such as network analysis applications, the nature of the analyzed networks, and particular policies for defining vertices as central (i.e., the most important or influential). If α is equal to zero, then all vertices in a network are considered equally important, as no normalized centrality values below α exist. If α is equal to one, then a very limited number of vertices will be identified as α-based central vertices, as this implies the highest possible requirement for vertices to be classified as central. In the current research, the median value of α equal to 0.5 is recommended as the starting point for the analysis of centralities based on the Centrality Error Matrix. The value of 0.5 divides all network vertices into two groups and helps an analyst to calibrate the final value of α within the range [0, 1] based on the preferred level of strictness.
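Putting the normalization and the α threshold together, the quadrants of the Centrality Error Matrix can be counted with a single pass over the vertices (a sketch; the two centrality dicts are hypothetical and assumed already normalized to [0, 1]):

```python
# Count the Centrality Error Matrix quadrants for two normalized
# centrality dicts c1, c2 (keyed by vertex) and a threshold alpha.
def centrality_error_matrix(c1, c2, alpha=0.5):
    tp = fp = fn = tn = 0
    for v in c1:
        high1, high2 = c1[v] >= alpha, c2[v] >= alpha
        if high1 and high2:
            tp += 1   # both centralities call v central
        elif high1:
            fp += 1   # only C1 calls v central
        elif high2:
            fn += 1   # only C2 calls v central
        else:
            tn += 1   # neither calls v central
    return tp, fp, fn, tn

# Hypothetical normalized centralities for a 4-vertex network.
c1 = {"a": 0.9, "b": 0.7, "c": 0.2, "d": 0.1}
c2 = {"a": 0.8, "b": 0.3, "c": 0.6, "d": 0.0}
print(centrality_error_matrix(c1, c2))  # (1, 1, 1, 1)
```

Here vertex "a" lands in TP, "b" in FP, "c" in FN, and "d" in TN at the default α = 0.5.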

Evaluation of Accuracy
The F1 score is employed for evaluating the mutual accuracy of network centralities C1 and C2 in terms of their ability to identify the most central (i.e., important) vertices within a network. More specifically, the harmonic mean F1, presented in Eq. (3), is employed to find a balanced estimate of the centralities' accuracy based on the Precision and Recall metrics presented in Eqs. (1) and (2), respectively. The Precision metric estimates the level of trust in the C1 centrality in terms of its ability to maximize the number of vertices in the TP quadrant of the Centrality Error Matrix. The metric presents the fraction of the vertices that are mutually "classified" as the most central based on the C1 and C2 centralities (i.e., the number of vertices with C1 ≥ α and C2 ≥ α) to the total number of vertices classified by the C1 centrality as the most central (i.e., the total number of vertices with C1 ≥ α).
The Recall metric estimates the completeness of the α-based identification of the most central vertices. It is calculated as the ratio of the number of vertices mutually identified by the C1 and C2 centralities as the most central (i.e., the number of vertices with C1 ≥ α and C2 ≥ α) to the total number of vertices "classified" by C2 as the most central (i.e., the total number of vertices with C2 ≥ α).
As mentioned above, the F1 score aims to find a balanced estimate of the centralities' accuracy based on the Precision and Recall metrics. The F1 score is calculated based on Eq. (3). The highest accuracy is reached at an F1 value of one, and the lowest at a value of zero. Special cases of dividing by zero in the F1 calculation are presented in Appendix A. The example that follows illustrates the introduced approach for estimating the F1 accuracy based on the Centrality Error Matrix. Consider an abstract network G with 200 vertices and a combination of two centralities C1 and C2. Assume that the values of both centralities were calculated (for all vertices), normalized to the range [0, 1], and presented in Table 4. Each quadrant of the given Centrality Error Matrix contains the number of vertices in accordance with the α-driven conditions presented in Eqs. (4)-(7). At this step, the focus is not on investigating the structure of network G or computing the centrality values but on interpreting and understanding the results presented in Table 4.
Based on Table 4, the Precision and Recall scores are equal to 0.94 and 0.33, respectively. The resulting value of the F1 score is equal to 0.49. This means that the mutual accuracy of centralities C1 and C2 in identifying the most important vertices in network G based on the α threshold is equal to 0.49.
It is important to emphasize that the row-column positions (i.e., order) of centralities C1 and C2 do not affect the resulting value of F1. Consider the illustrative Centrality Error Matrix presented in Table 4 that was used to calculate F1(C1, C2). To compute F1(C2, C1), which corresponds to the reversed row-column position of the centralities, the transposed Centrality Error Matrix should be employed (see Table 5). Based on Eqs. (1)-(3), the computed value of F1(C2, C1) is equal to 0.49, which is the same result as for F1(C1, C2).
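This invariance can be checked directly: transposing the Centrality Error Matrix swaps the FP and FN quadrants, which swaps Precision and Recall, and the harmonic mean is symmetric in its two arguments. A sketch, assuming Table 4 contains quadrant counts TP = 33, FP = 2, FN = 67 (an assumption consistent with the reported 0.94/0.33/0.49 scores):

```python
# F1 from the three relevant quadrants of a (Centrality) Error Matrix.
def f1_score(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

TP, FP, FN = 33, 2, 67                   # assumed Table 4 counts
print(round(f1_score(TP, FP, FN), 2))    # 0.49 -- F1(C1, C2)
print(round(f1_score(TP, FN, FP), 2))    # 0.49 -- transposed matrix, F1(C2, C1)
```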
Finally, it should be emphasized that the introduced accuracy estimation approach is applicable to any pairwise combination of network centralities. Regardless of the number of analyzed centralities, the approach ranks all centralities and identifies the pair that gives the highest F1 accuracy based on the α value. The final decision on the list of centralities to be tested depends on the analyst's (i.e., decision maker's) preferences.

Experiments
The programming implementation of the introduced approach was done in Python 3. All experiments were conducted based on NetworkX, an open-source Python package for the study of the structure and dynamics of complex networks (Hagberg, Swart, and Chult 2008). The approach was tested on the four most widely used network centrality measures, namely, degree (DGR), betweenness (BTW), closeness (CLS), and eigenvector (EIG) centralities (Newman 2018). All their possible pairwise combinations are as follows: DGR-BTW, DGR-CLS, DGR-EIG, BTW-CLS, BTW-EIG, and CLS-EIG.
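For reference, the four tested centralities can be computed with NetworkX as follows (a sketch shown on the small karate club graph bundled with NetworkX rather than the experimental datasets):

```python
import networkx as nx

# Small illustrative graph shipped with NetworkX.
G = nx.karate_club_graph()

# The four tested centrality measures, each a dict of per-vertex values.
centralities = {
    "DGR": nx.degree_centrality(G),
    "BTW": nx.betweenness_centrality(G),
    "CLS": nx.closeness_centrality(G),
    "EIG": nx.eigenvector_centrality(G),
}

for name, values in centralities.items():
    top = max(values, key=values.get)
    print(name, "-> most central vertex:", top)
```

Each dict can then be min-max scaled and fed into the Centrality Error Matrix described above.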
In general, the list of centralities can be extended to any size depending on the analyst's preferences. In the current research, the aim was not to analyze all possible centralities in social network theory but rather to demonstrate how the F1-based approach works on the selected set of centralities.
Three real-life datasets were employed to perform a grid search analysis based on all possible pairwise combinations of the selected centralities over all α values from the range [0, 1] with an increment of 0.1. The network datasets were retrieved from the Stanford Large Network Data Collection, which was originally created for conducting experiments on real-life networks (Leskovec and Krevl 2014). The first dataset represents an online Facebook network based on "circles" (i.e., "friend lists"). It consists of 4,039 anonymized vertices interconnected by 88,234 edges that reflect a rich diversity of social interrelations between Facebook users (Leskovec and Mcauley 2012). The second dataset was generated based on email communications between the members of a large European research institution. This dataset represents a core part of the organizational network, with 1,005 vertices interconnected by 16,706 edges, built from anonymized information about emails sent between the institution's members (Yin et al. 2017). The third dataset was formed based on the analysis of scientific collaborations between authors of papers published in Arxiv GR-QC (Leskovec, Kleinberg, and Faloutsos 2007). This dataset is a co-authorship network that consists of 5,242 vertices (i.e., authors) interconnected by 14,496 edges (i.e., co-authorships). More details about the structural characteristics of all three networks are presented in Appendix B.
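The grid search described above can be sketched as follows (a toy illustration with hypothetical normalized centralities, not the experimental datasets; division-by-zero cases are scored as 0 here, whereas the paper treats them separately in Appendix A):

```python
from itertools import combinations

def f1(c1, c2, alpha):
    """F1 of two normalized centrality dicts at threshold alpha,
    using the algebraically equivalent form 2*TP / (2*TP + FP + FN)."""
    tp = sum(1 for v in c1 if c1[v] >= alpha and c2[v] >= alpha)
    fp = sum(1 for v in c1 if c1[v] >= alpha and c2[v] < alpha)
    fn = sum(1 for v in c1 if c1[v] < alpha and c2[v] >= alpha)
    if tp == 0:
        return 0.0  # simplification of the division-by-zero special cases
    return 2 * tp / (2 * tp + fp + fn)

def grid_search(centralities):
    """F1 for every centrality pair over alpha in {0.0, 0.1, ..., 1.0}."""
    scores = {}
    for (n1, c1), (n2, c2) in combinations(centralities.items(), 2):
        for step in range(11):
            alpha = step / 10
            scores[(n1, n2, alpha)] = f1(c1, c2, alpha)
    return scores

# Hypothetical normalized centralities for a 4-vertex toy network.
cents = {
    "DGR": {"a": 1.0, "b": 0.6, "c": 0.3, "d": 0.0},
    "BTW": {"a": 1.0, "b": 0.2, "c": 0.4, "d": 0.0},
    "EIG": {"a": 0.9, "b": 0.7, "c": 0.1, "d": 0.0},
}
scores = grid_search(cents)

# Best pair at the recommended starting point alpha = 0.5.
at05 = {k[:2]: v for k, v in scores.items() if k[2] == 0.5}
best_pair = max(at05, key=at05.get)
print(best_pair)  # ('DGR', 'EIG')
```

At α = 0 every vertex is classified as central by every centrality, so all pairs score F1 = 1, mirroring the behavior reported for Figures 1-3.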
For each network, the F1 scores of all six pairwise centrality combinations were calculated based on the α values from the range [0, 1]. The visualized results are presented in Figures 1-3. As the midpoint of the α range, i.e., the value of α equal to 0.5, is taken as the recommended starting point, consider the corresponding computational results for the given value in more detail below. Based on Figure 1, the F1 score of the pair of degree and betweenness centralities is equal to 0.67. This corresponds to the highest accuracy among all possible combinations of centralities in terms of their ability to identify the most important (i.e., central) members in the Facebook network. All other possible pairs of centralities show significantly lower results, below the value of 0.1. For the European research institution network, the best result was achieved by the combination of degree and eigenvector centralities. Its F1 score is equal to 0.38, which is almost twice as high as that for the pair with the second-highest result (i.e., DGR-BTW). According to Figure 2, the F1-driven "race" of the centralities at the level of α equal to 0.5 is more competitive compared with the Facebook network "race," as the differences between the scores are less significant. Nevertheless, the F1 scores for the majority of centralities in the European research institution network are much lower than 0.1, which makes the pair of degree and eigenvector centralities the best choice for the given network. Finally, in the Arxiv network, the F1 score of degree and eigenvector centralities shows the highest result. It is equal to 0.79, which makes the given pair an absolute leader compared with all other centralities, as their scores are much lower than the value of 0.1.
According to the computational results presented in Figures 1-3, the F1 score for all pairwise combinations of centralities is equal to one when the value of α is equal to zero. The zero value is the lower bound of α, which corresponds to the minimum level of strictness that can be assigned by the analyst. In contrast, assigning the value of α equal to one (i.e., the upper bound of α) implies the highest level of strictness in classifying vertices as central. As mentioned earlier, the selection of the α value is based on the analyst's preferences. Nevertheless, the value of α equal to 0.5 can be considered a starting point (i.e., default value) for the F1-based analysis of centralities, as it splits the range of all possible α values into two symmetric intervals. The value of α equal to 0.5 is useful when the analyst does not have any particular preferences about the threshold values and therefore has the opportunity to calibrate the value of α later, starting the analysis from its default value.
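The boundary behavior of α can be illustrated with a small sketch (a hypothetical classification rule is assumed here: a vertex counts as central when its min-max-normalized centrality score is at least α; the toy scores are illustrative only):

```python
def central(scores, alpha):
    """Classify vertices as central: normalized score >= alpha (assumed rule)."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {v for v, s in scores.items() if (s - lo) / span >= alpha}

# At alpha = 0 every vertex is classified as central by any centrality,
# so any two centralities produce identical central sets and F1 = 1.
degree = {"a": 3, "b": 1, "c": 2}
betweenness = {"a": 0.9, "b": 0.0, "c": 0.4}
assert central(degree, 0.0) == central(betweenness, 0.0) == {"a", "b", "c"}

# At alpha = 1 only the top-scoring vertex of each centrality remains,
# i.e., the strictest possible notion of "central".
assert central(degree, 1.0) == {"a"}
```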

Discussion and Conclusion
Regardless of how many centralities are included in the F1-based analysis, F1 scores can be obtained for any pair of centralities and for any α values from the [0, 1] range. The calculation of the F1 score is based on the Centrality Error Matrix, whose quadrants' values correspond to the numbers of vertices obtained in accordance with the α-based conditions. Each quadrant's value represents a simple counter that is calculated a posteriori, i.e., only after the centralities' values have been computed. This makes the proposed F1-based accuracy estimation method independent of any centralities' computational approaches.
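The way the matrix quadrants yield the F1 score can be sketched with standard binary-classification counting (the two sets of central vertices are assumed to have been obtained beforehand under the α-based conditions; all names are hypothetical):

```python
def centrality_error_matrix(vertices, ref_central, cmp_central):
    """Quadrant counts: one centrality's central set is treated as the
    reference, the other as the prediction (counted a posteriori)."""
    tp = len(ref_central & cmp_central)             # central under both
    fp = len(cmp_central - ref_central)             # central only under compared
    fn = len(ref_central - cmp_central)             # central only under reference
    tn = len(vertices - ref_central - cmp_central)  # central under neither
    return tp, fp, fn, tn

def f1_from_matrix(tp, fp, fn, tn):
    """Classic F1 = 2*TP / (2*TP + FP + FN); TN does not enter the score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0
```

Because only set-membership counts enter the matrix, the estimate is indeed independent of how the underlying centralities themselves are computed.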
In practice, analysts can employ any network centralities. The selection of centralities may vary depending on the specifics of the problems that analysts are trying to solve or on personal preferences (Newman 2018). The F1 approach is applicable to any centralities whose computational mechanisms have been tested, formalized, and well described in the literature (Freeman 1978; Newman 2018). It should be emphasized that the presented F1 approach does not aim to form or recommend any lists of network metrics/centralities. Its goal is instead to estimate the accuracy of centralities in identifying the importance of positions in networks. The F1 approach reduces the initial set of tested centralities, which can be of any size, to a set of two, i.e., the pair of centralities with the highest F1 score. Furthermore, the F1 approach allows for the acquisition of an overall ranked list of pairs of tested centralities, thereby helping the analyst interpret the applicability of the centralities based on the resulting F1 scores. The proposed approach is potentially applicable to a wide range of practical problems that involve network analysis. For example, with respect to organizational networks, the F1 approach shows high potential in project coordination and leadership analysis (Gronn 2002). Studying the effect of the organizational positions of employees in accordance with their network centralities helps to achieve a better understanding of how to facilitate greater collaboration among employees working on projects within formalized organizational structures (Hossain 2009). According to the previous studies conducted by Freeman, Roeder, and Mulholland (1979, 136), the analysis of structural positions "helps to fill in the picture of the relationship between communication structure and group performance".
In the given context, the F1 approach is applicable to investigating which groups of employees best fit projects in terms of their structural positions in organizational networks. When working on project coordination, managers have the opportunity to analyze F1 scores and select the centrality metrics that are most appropriate in terms of their ability to reflect the individual nature of organizational structures.
Another illustration of the potential applicability of the F1 approach is the analysis of consumer behavior in marketing networks. According to Hill, Provost, and Volinsky (2006), quantitative network analysis in combination with marketing analytical methods is extremely helpful in understanding how to increase sales based on the investigation of existing and potential links between consumers (or customers). One of the potential applications of the F1 approach is therefore the detection of the most influential consumers when developing individualized advertisement options. More specifically, the analysis of F1 scores in accordance with the individual characteristics of customers' networks is helpful in detecting the most influential customers based on their positions in marketing networks. The F1-based ranking can, for example, be employed for calibrating customer targeting strategies, as the F1-based results can assist in better illuminating and understanding the customers' network positions (Bartal, Pliskin, and Ravid 2019; Tong, Luo, and Xu 2020). The proposed F1-based approach has good potential for applications in marketing analytics, as the attractiveness of various quantitative methods for studying consumer behavior based on network data analysis is growing rapidly. Various quantitative methods are actively being developed by such large market players as Facebook, Google, and Amazon (Can and Alatas 2019).
In considering the potential applications of the F1 approach in applied network analysis, the current research highlights an interesting direction for future work. With access to large-scale datasets across different real-life social networks, it would be interesting to identify the correspondences/correlations between different network categories and the specific centralities that yield the highest F1 scores in each category. As potential factors for categorizing networks, the type of network (such as organizational, marketing, or online social networks) or structural characteristics (for example, density, connectivity, and assortativity) could be used. This means that, on the basis of an extended set of data and an extended list of networks, it would be useful to draw up a so-called F1-based "heat map" of the correspondence between various centralities and various network types or network categories. For practitioners, this would simplify network analysis, as they could employ the visualized heat maps of the centralities' correspondence (based on F1 scores) to each specific type and/or category of networks in which they are interested. In addition, another interesting direction for future work is the development of typical real-world scenarios for selecting centralities based on the "heat map" to highlight the pros and cons of the F1 approach. Doing so would help enhance understanding of the centralities' applicability in different types and categories of networks.