Interdisciplinarity as Diversity in Citation Patterns among Journals: Rao-Stirling Diversity, Relative Variety, and the Gini coefficient

Questions of definition and measurement continue to constrain a consensus on the measurement of interdisciplinarity. Rao-Stirling (RS) Diversity sometimes produces anomalous results. We argue that these unexpected outcomes can be related to the use of "dual-concept diversity," which combines "variety" and "balance" in the definitions (ex ante). We propose to modify RS Diversity into a new indicator (DIV) which operationalizes variety, balance, and disparity independently and then combines them ex post. "Balance" can be measured using the Gini coefficient. We apply DIV to the aggregated citation patterns of 11,487 journals covered by the Journal Citation Reports 2016 of the Science Citation Index and the Social Sciences Citation Index as an empirical domain and, in more detail, to the citation patterns of 85 journals assigned to the Web-of-Science category "information science & library science" in both the cited and citing directions. We compare the results of the indicators and show that DIV provides improved results in terms of distinguishing between interdisciplinary knowledge integration (citing) and knowledge diffusion (cited). The new diversity indicator and RS diversity measure different features. A routine for the measurement of the various operationalizations of diversity (in any data matrix) is made available online.


Introduction
Policymakers and researchers continue to be interested in measures of "interdisciplinarity" (Wagner et al., 2011). Recently, a great deal of attention has been paid to using references as a way to measure "interdisciplinarity" (e.g., Boyack & Klavans, 2014; Mishra & Torvik, 2016; Tahamtan & Bornmann, 2018; Wang, 2016). These analyses are notable because of the increasing consensus, following Rao (1982) and Stirling (2007), to define interdisciplinarity as diversity encompassing three features: variety, balance, and disparity. However, a problem arises when measuring the interrelationships among the three components: how can they be combined without losing either information or validity?
In this study, we revisit the definition and measurement of variety, balance, and disparity; we compare methods; and we elaborate an approach that addresses the anomalous outcomes that can occur when using Rao-Stirling (RS) Diversity. Leydesdorff et al. (2017), for example, compared twenty cities in terms of the RS diversity of their patent portfolios; counterintuitively, Rotterdam and Jerusalem scored above Shanghai and Paris. Applied to the same data, the new measure reverses the order: Shanghai is ranked first, and Rotterdam only 16th among the twenty cities. At that time, the new method of measurement was proposed, but without substantial empirical testing.
In RS diversity, two of the three components (variety and balance) are combined in the definitions (ex ante) using the Simpson Index. However, these components can be measured independently and thereafter combined (ex post). We argue that the ex ante definition of "dual-concept" diversity (e.g., the Simpson Index) is already known to be a source of unnecessary distortions (Stirling, 1998, pp. 48f.). Using the Gini coefficient, "balance" can be operationalized independently (Nijssen, Rousseau, & Van Hecke, 1998), as can "variety." We discuss these different measures applied to interdisciplinarity in terms of diversity and compare the empirical results. To this end, we use the full set of 11,487 journals contained in the Journal Citation Reports 2016 of the Science Citation Index and the Social Sciences Citation Index, and ask whether and to what extent the different indicators measure the same or different dimensions of, or perspectives on, "interdisciplinarity." A case study using the aggregated citations among the 85 journals within this larger set that are assigned to the Web-of-Science Subject Category (WC) "information science & library science" is further elaborated. Does the new diversity measure improve on the measurement of interdisciplinarity in comparison to RS diversity?
A routine for the computation is provided at https://leydesdorff.github.io/diversity_measurement/ (and http://www.leydesdorff.net/software/mode2div). The routine can be used to compute RS diversity, the new diversity measure DIV, and the respective components in any data matrix (e.g., a word/document matrix or a citation matrix) written as a Pajek file.⁴ In this study, we chose the matrix of 10,000+ journals contained in the Journal Citation Reports 2016 citing each other. This provides a large empirical domain with which we are familiar from previous studies and in which we encountered the problems of using RS diversity for the measurement of interdisciplinarity.

⁴ The Pajek format can be used for matrices of virtually unlimited size and is readily available in most network analysis and visualization programs. UCInet offers the option to rewrite Excel files in the Pajek format.
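Because the routine reads its input as a Pajek file, a minimal sketch of turning such a file into a citation matrix may help readers who want to prepare their own data. This is not the published routine: the function name `read_pajek_matrix` is hypothetical, only the `*Vertices`, `*Arcs`, and `*Edges` sections are handled, and reading an arc `i j w` as "journal i cites journal j w times" is an assumption about how the data were written.

```python
import shlex


def read_pajek_matrix(text):
    """Parse a minimal Pajek .net network into (labels, dense weight matrix).

    Hypothetical helper: handles only *Vertices, *Arcs and *Edges sections,
    assumes 1-based vertex ids, and uses weight 1.0 when none is given.
    matrix[i][j] accumulates the weight of arc i -> j.
    """
    labels, matrix, section = [], [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):        # skip blank and comment lines
            continue
        if line.startswith("*"):                    # section header
            fields = line.split()
            section = fields[0].lower()
            if section == "*vertices":
                n = int(fields[1])
                labels = [str(k + 1) for k in range(n)]
                matrix = [[0.0] * n for _ in range(n)]
            continue
        parts = shlex.split(line)                   # keeps quoted labels intact
        if section == "*vertices":
            if len(parts) > 1:
                labels[int(parts[0]) - 1] = parts[1]
        elif section in ("*arcs", "*edges"):
            i, j = int(parts[0]) - 1, int(parts[1]) - 1
            w = float(parts[2]) if len(parts) > 2 else 1.0
            matrix[i][j] += w
            if section == "*edges" and i != j:      # undirected: mirror the value
                matrix[j][i] += w
    return labels, matrix


# Hypothetical three-journal network with made-up citation counts.
sample = """*Vertices 3
1 "Scientometrics"
2 "JASIST"
3 "J Informetr"
*Arcs
1 2 10
2 1 4
3 1 7
"""
labels, cites = read_pajek_matrix(sample)
print(labels)      # ['Scientometrics', 'JASIST', 'J Informetr']
print(cites[2])    # [7.0, 0.0, 0.0]
```

Rows of the resulting matrix then give the citing distribution of each journal and columns the cited distribution, which is how the two directions are distinguished later in the paper.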

The Measurement of Interdisciplinarity in terms of Diversity
In a paper ambitiously entitled "A general framework for analysing diversity in science, technology and society," Stirling (2007) addressed the problem of measuring interdisciplinarity on the basis of his extensive review of the methodological and statistical literature (Stirling, 1998). Stirling's (2007) study has been influential in science & technology studies, to the extent that Rafols & Meyer (2010, pp. 266f.) have defined the "Stirling Index" of diversity (with only a footnote mentioning Rao's (1982) original formulation) as follows:⁵

$$\Delta = \sum_{i,j\,(i \neq j)} d_{ij}\,(p_i\,p_j) \qquad (1)$$

In this equation, $p_i$ is the proportion of elements assigned to each class i, and $d_{ij}$ denotes a disparity measure between the two classes i and j. Note that the classes can be defined at different levels of aggregation. For example, one can measure the diversity of references in articles in terms of the cited journals or in terms of the WCs attributed to the journals. The resulting values of Δ will be different. Analogously, diversity will vary with the number of digits used in the case of Medical Subject Headings (MeSH) or patent classes.

⁵ Stirling (2007, at p. 712) formulated the most general case as $\Delta = \sum_{i,j\,(i \neq j)} (d_{ij})^{\alpha}\,(p_i\,p_j)^{\beta}$. The introduction of exponents opens another parameter space. In most scientometric applications, authors assume the reference case of α = β = 1.
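Eq. 1 and its generalization in footnote 5 can be sketched directly. The sketch below sums over ordered pairs i ≠ j, as written in Eq. 1; some implementations sum only over i < j, which halves the value. The function names are hypothetical, the class counts and distances are made up for illustration, and `true_diversity` follows the 1 / (1 − Δ) transformation that the paper attributes to Zhang et al.

```python
def proportions(counts):
    """Turn raw counts (e.g., references per class) into proportions p_i."""
    total = sum(counts)
    return [c / total for c in counts]


def rao_stirling(p, d, alpha=1.0, beta=1.0):
    """Rao-Stirling diversity: sum over ordered pairs i != j of
    d_ij**alpha * (p_i * p_j)**beta; alpha = beta = 1 gives Eq. 1."""
    n = len(p)
    return sum(
        d[i][j] ** alpha * (p[i] * p[j]) ** beta
        for i in range(n)
        for j in range(n)
        if i != j
    )


def true_diversity(delta):
    """'True diversity' variant: 1 / (1 - Delta)."""
    return 1.0 / (1.0 - delta)


# Hypothetical example: references spread over three classes with
# hypothetical pairwise distances d_ij in [0, 1].
p = proportions([20, 20, 10])            # [0.4, 0.4, 0.2]
d = [[0.0, 0.8, 0.3],
     [0.8, 0.0, 0.5],
     [0.3, 0.5, 0.0]]
delta = rao_stirling(p, d)
print(round(delta, 3))                   # 0.384
print(round(true_diversity(delta), 3))   # 1.623
```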
In the scientometric literature, this measure Δ is widely called the "Rao-Stirling diversity indicator" (e.g., Cassi, Champeimont, Mescheba, & de Turckheim, 2017), as distinct from, for example, Simpson diversity (Simpson, 1949) or Shannon entropy (Shannon, 1948). The right-most factor of Eq. 1 [$\sum_{i,j} (p_i\,p_j)$] is closely related to the Hirschman-Herfindahl Index in economics and the Simpson index in biology: summed over i ≠ j, it equals one minus these indices, that is, the Gini-Simpson index.⁶ The first term of the equation [$\sum_{i,j} d_{ij}$] adds the distribution in a (e.g., geographical) space. For example, if distances in a subset are small, this space can be considered as a niche of "related variety" (Frenken, Van Oort, & Verburg, 2007). Table 1 [from Rafols & Meyer (2010, p. 267), but based on Stirling (2007, p. 709)] summarizes the distinctions among the various indicators of diversity. We agree with Stirling's crucial argument that diversity (and by implication "interdisciplinarity") is composed of three, and not more than three, components, which he labeled "variety," "balance," and "disparity." He formulated the relations among these three components as follows:

"Each is a necessary but insufficient property of diversity (Sokal & Sneath, 1970; Clarke, 1978; Stirling, 2006d). Although addressed in different vocabularies, each is applicable across a range of disciplines and aggregated in various permutations in quantitative indices (Hill, 1973). Despite the multiple disciplines and divergent contexts, there seems no other obvious candidate for a fourth important general property of diversity beyond these three (Stirling, 2006e)."

⁶ $\sum_{i,j} p_i\,p_j = 1$ when taken over all i and j. The Simpson index is equal to $\sum_i p_i^2$, and the Gini-Simpson index to $1 - \sum_i p_i^2$. See also Table 1.
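The identities in footnote 6 can be checked numerically. The sketch below, with a made-up distribution, shows that the sum of $p_i p_j$ over all pairs equals one and that the off-diagonal sum equals the Gini-Simpson index; consequently, Eq. 1 reduces to the Gini-Simpson index when every $d_{ij}$ for i ≠ j equals one. Shannon entropy is computed in bits here, which is an assumption, since the paper does not state a logarithm base.

```python
import math


def simpson(p):
    """Simpson (Hirschman-Herfindahl) concentration: sum_i p_i**2."""
    return sum(x * x for x in p)


def gini_simpson(p):
    """Gini-Simpson diversity: 1 - sum_i p_i**2."""
    return 1.0 - simpson(p)


def shannon(p):
    """Shannon entropy in bits; zero proportions contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


p = [0.5, 0.25, 0.25]          # hypothetical distribution over three classes
n = len(p)
all_pairs = sum(p[i] * p[j] for i in range(n) for j in range(n))              # footnote 6: equals 1
off_diagonal = sum(p[i] * p[j] for i in range(n) for j in range(n) if i != j)  # equals Gini-Simpson

print(simpson(p), gini_simpson(p), off_diagonal, shannon(p))
# 0.375 0.625 0.625 1.5
```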
In other words, all else being equal (ceteris paribus), an increase in any one of the three components always leads to greater diversity. This has also been called "the monotonicity requirement" (Rousseau, 2018a): diversity increases for each of the three components when the other two remain the same.

[Figure 1, based on Stirling (1998, p. 41). Source: Rafols & Meyer (2010, p. 266).]

Rafols & Meyer (2010, p. 266) provided Figure 1, which has become iconic for visualizing the distinctions among the three components. In hindsight, previous attempts to operationalize "interdisciplinarity" can be recognized as using one or two of the three components of diversity shown in Figure 1. For example, Porter & Chubin (1985) proposed using the proportion of references to sources outside the WC of the paper under study as a measure of interdisciplinary knowledge integration into the citing paper (cf. Morillo, Bordons, & Gómez, 2001; Uzzi, Mukherjee, Stringer, & Jones, 2013). The focus in these studies is limited to variety. Rafols, Porter, & Leydesdorff (2010) generalized this concept of diversity into spreads in portfolios (e.g., of references across WCs) projected on a map. The map provides distances among the nodes ($d_{ij}$) that can be used for the measurement of disparity. From the perspective of network analysis, we have explored Betweenness Centrality (BC) as an indicator of diversity and interdisciplinarity (Leydesdorff, Goldstone, & Schank, 2008). Using the aggregated journal-journal citation relations provided by the Journal Citation Reports 2015 as a comprehensive set (n > 11,000 journals), Leydesdorff, Wagner, & Bornmann (2018) tested RS Diversity and BC against each other as measures of interdisciplinarity. The results, however, were disappointing: BC was found to indicate "multidisciplinarity" more than "interdisciplinarity," and the authors cautioned (at p. 588) that "[…] Rao-Stirling 'diversity' is often used as an indicator of interdisciplinarity; but it remains only an indicator of diversity." Furthermore, "the interpretation of diversity as interdisciplinarity remains the problem," and the authors warned that "policy analysts seeking measures to assess interdisciplinarity can be advised to specify first the relevant contexts […]."
The arguments provided in this study may be helpful in this respect.

In ecology, efforts have been made to integrate the two components of "variety" and "balance" (ex ante) into a single indicator such as the Simpson Index. This has also been called "dual concept diversity" (e.g., Junge, 1994). According to Stirling (1998, p. 48), "'dual concept diversity' has become synonymous with diversity itself to many authorities in ecology." In scientometrics, Rousseau et al. (1999), for example, formulated in a similar spirit as follows (at p. 213):

"It is generally agreed that diversity combines two aspects: species richness and evenness. Disagreement arises at how these two aspects should be combined, and how to measure this combination, which is then called 'diversity.'"

Although Stirling (1998, p. 57) concluded that there are good reasons to prefer the Shannon measure over the Simpson Index if one wishes to measure the two concepts in a single operationalization as a "dual concept," he himself eventually chose to extend the Simpson Index (as a dual-concept indicator) with disparity as a third dimension. The problem of the duality of the Simpson Index was thereby inherited by the RS diversity indicator. Stirling (1998, at p. 48) was aware of this problem when he formulated the following empirical question:

"Where a system displays simultaneously greater variety and balance, there is little need for a single integrated concept to recognise that it is intrinsically more (dual concept)⁹ diverse. However, it is much more likely to be the case that no single system can be considered unequivocally to be intrinsically more diverse than others in this sense. In such cases, the crucial questions concern the relative importance assigned to variety and balance in arriving at the overall notion of diversity."
Rousseau (2018a, at p. 651) concluded in a further reflection that "the balance aspect is not hidden in the 'dual concept,' but simply is not present" in the RS measure. By providing a counter-example, the author showed that RS diversity does not meet the ceteris paribus monotonicity requirement which states that for a given variety and disparity, the diversity increases monotonically with balance. Rousseau added that this same conclusion-the absence of an indicator of balance-holds equally for the "true diversity" variant of RS diversity offered by Zhang, Rousseau, & Glänzel (2016). Following Leinster and Cobbold (2012), however, Rousseau (2018a and b) argues that balance is not an essential component of diversity.
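Rousseau's published counter-example is not reproduced here. The following is a hypothetical construction of the same kind of failure: two distributions with the same variety (three classes) and the same disparities, where the perfectly balanced one nevertheless receives a lower RS value, because the uneven one concentrates its mass on the most distant pair of classes. The distances are made up, and `div` is a simplified sketch of the DIV measure in which the Gini coefficient is computed over the classes in use (an assumption).

```python
def rao_stirling(p, d):
    """Eq. 1: sum over ordered pairs i != j of d_ij * p_i * p_j."""
    n = len(p)
    return sum(d[i][j] * p[i] * p[j] for i in range(n) for j in range(n) if i != j)


def gini(x):
    """Pairwise Gini coefficient (0 = perfectly even)."""
    n, total = len(x), sum(x)
    return sum(abs(a - b) for a in x for b in x) / (2 * n * total)


def div(p, d, n_available):
    """Simplified DIV: relative variety * (1 - Gini) * mean disparity of classes in use."""
    used = [i for i, x in enumerate(p) if x > 0]
    nc = len(used)
    mean_d = sum(d[i][j] for i in used for j in used if i != j) / (nc * (nc - 1))
    return (nc / n_available) * (1 - gini([p[i] for i in used])) * mean_d


# Hypothetical disparities: classes 0 and 1 are far apart, class 2 is close to both.
d = [[0.0, 1.0, 0.1],
     [1.0, 0.0, 0.1],
     [0.1, 0.1, 0.0]]
uneven = [0.45, 0.45, 0.10]   # same variety (3 classes), less balanced
even = [1 / 3, 1 / 3, 1 / 3]  # same variety, perfectly balanced

print(round(rao_stirling(uneven, d), 3), round(rao_stirling(even, d), 3))  # 0.423 0.267
print(round(div(uneven, d, 3), 3), round(div(even, d, 3), 3))              # 0.307 0.4
```

With variety and disparity held fixed, RS decreases as balance increases, whereas the product of independent components increases, as the monotonicity requirement demands.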
In a brief communication, Leydesdorff (2018) responded to Rousseau (2018a) that there is no need for such a drastic revision of Stirling's theoretical conceptualization in terms of "balance," "disparity," and "variety." The problem lies in the operationalization: instead of combining "balance" and "variety" ex ante, one can distinguish analytically between balance and variety, as Nijssen, Rousseau, & Van Hecke (1998) have shown. They proved mathematically that both the Gini index and the coefficient of variation (that is, the standard deviation divided by the mean of the distribution, σ/μ) are ideal indicators of balance. (Unlike the Gini coefficient, however, the coefficient of variation is not bounded between zero and one.) Furthermore, the Gini index is not a measure of variety (Rousseau, 2018a, p. 649). In principle, this conclusion enables us to distinguish operationally between "variety" and "balance" as two independent dimensions, represented by two different equations. The empirical results can then be combined ex post by multiplying the values, each bounded between zero and one.

⁹ In other words, assuming that both systems display equal disparity.
"Variety" can be operationalized independently, either as the number of classes in use ($n_c$) or as relative variety ($n_c / N$, bounded between zero and one), with N being the total number of classes available. As noted, "balance" can be operationalized using the Gini coefficient without commingling it with "variety" (Nijssen et al., 1998). Since the Gini coefficient equals zero for a perfectly even distribution (maximal balance) and approaches one for a maximally concentrated distribution, we use (1 − Gini), so that one obtains a diversity measure with three components for each unit of analysis c, as follows:

$$\mathrm{DIV}_c = \frac{n_c}{N} \cdot (1 - \mathrm{Gini}_c) \cdot \frac{\sum_{i,j\,(i \neq j)} d_{ij}}{n_c\,(n_c - 1)} \qquad (2)$$

The right-most factor in this equation, summed over the pairs of classes in use, is similar to (i) the disparity measure used in the case of RS diversity. The two other factors, however, represent (ii) relative variety as $n_c / N$ and (iii) balance measured as (1 − the Gini coefficient) of the same distribution. (Variety and disparity have to be normalized so that all terms are bounded between zero and one.) In Eq. 2, $n_c$ is the number of classes with values larger than zero and N is the number of available classes in the domain. For example, Scientometrics was cited by articles in 38 of the 86 journals belonging to the WC of "information science and library science" in 2015, leading to a relative variety in this citation distribution of 38/86 = 0.442. In cases where the number of classes is not known, one can normalize pragmatically by using the maximum number of observed classes or, in other words, the longest vector in the reference set.
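Eq. 2 can be sketched as the product of three separately computed factors. This is a minimal sketch, not the published routine: the function names are hypothetical, the Gini coefficient is computed over the $n_c$ classes in use only (an assumption that keeps balance separate from variety), and returning zero when fewer than two classes are in use is a convention chosen here because the disparity term is then undefined.

```python
def gini(values):
    """Pairwise Gini coefficient: 0 = perfectly even, near 1 = concentrated."""
    n, total = len(values), sum(values)
    return sum(abs(a - b) for a in values for b in values) / (2 * n * total)


def div(counts, d, n_available):
    """Sketch of Eq. 2 for one unit of analysis.

    counts      -- values per class; zeros mark classes not in use
    d           -- pairwise disparities d_ij in [0, 1], e.g. 1 - cosine
    n_available -- N, the number of classes available in the domain
    """
    used = [i for i, c in enumerate(counts) if c > 0]
    nc = len(used)
    if nc < 2:                      # disparity undefined for fewer than two classes
        return 0.0
    relative_variety = nc / n_available
    balance = 1.0 - gini([counts[i] for i in used])
    disparity = sum(d[i][j] for i in used for j in used if i != j) / (nc * (nc - 1))
    return relative_variety * balance * disparity


# Relative variety in the example from the text: cited in 38 of 86 LIS journals.
print(round(38 / 86, 3))            # 0.442

# Hypothetical unit citing two of three available classes equally, at maximal distance.
d = [[0.0, 1.0, 1.0],
     [1.0, 0.0, 1.0],
     [1.0, 1.0, 0.0]]
print(div([5, 5, 0], d, 3))         # relative variety 2/3, balance 1, disparity 1
```

Because each factor lies between zero and one, the product does too, and each factor can also be reported on its own.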
As noted, Leydesdorff (2018) compared the new measure to the RS diversity of the patent portfolios of 20 cities. However, in order to assess the quality of the two measures as indicators of "interdisciplinarity," we required a larger data set. In this study, we return to the type of JCR data used in the previous study, which resulted in unsatisfactory values for RS diversity. Does Eq. 2 provide us with more convincing results? In addition to the interpretability of the results, we can consider the (rank-order) correlation with BC across the distribution of 10,000+ journals as another indicator of the validity of the measurement. Are these indicators, RS diversity and our new measure (DIV), significantly different in their relations to BC? Does the exercise bring us further towards indicating interdisciplinarity?
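The rank-order correlation mentioned above is Spearman's rho. A minimal dependency-free sketch computes it as the Pearson correlation of average ranks, so that tied values (common for BC, where many journals score zero) share a rank. The journal scores below are hypothetical and not data from this study.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores for five journals (not data from this study).
bc = [0.02, 0.10, 0.00, 0.35, 0.07]
div_scores = [0.11, 0.30, 0.12, 0.42, 0.05]
print(average_ranks([5, 1, 5]))              # [2.5, 1.0, 2.5]
print(round(spearman(bc, div_scores), 3))    # 0.6
```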
Before turning to these empirical questions, let us first consider the concept of "coherence." Rafols & Meyer (2010, pp. 268ff.) conceptualized interdisciplinarity in terms of both diversity and coherence. Analogously, others use the words "novelty" and "conventionality" (Uzzi et al., 2013; Schilling & Green, 2015; Stephan et al., 2017) or "atypical" combinations; these concepts, however, account for balance only to a limited extent. Leydesdorff & Rafols (2011, at p. 852) and Rafols, Leydesdorff, O'Hare, Nightingale, & Stirling (2012, at p. 1268) have proposed to operationalize coherence as follows:

$$C = \sum_{i,j\,(i \neq j)} p_{ij}\,d_{ij}$$

This measure of coherence accounts for both the probability of co-occurrences of classes i and j ($p_{ij}$) and the distances ($d_{ij}$) between these classes. In other words, C measures the average distance among classes related in a network. Coherence C and RS diversity can also be compared as observed ($p_{ij}$) versus expected values ($p_i\,p_j$) of interdisciplinarity (Rafols, 2014).
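A minimal sketch of the coherence measure as a link-weighted average distance follows. It assumes that $p_{ij}$ is each off-diagonal cell's share of all off-diagonal co-occurrences; the cited papers may normalize differently (for example, including the diagonal), so the scale is an assumption. The link counts and distances are hypothetical.

```python
def coherence(links, d):
    """C = sum_{i != j} p_ij * d_ij, with p_ij = links_ij / (sum of off-diagonal links).

    links -- co-occurrence (or citation) counts between classes i and j
    d     -- pairwise distances d_ij in [0, 1]
    """
    n = len(links)
    total = sum(links[i][j] for i in range(n) for j in range(n) if i != j)
    return sum(links[i][j] / total * d[i][j]
               for i in range(n) for j in range(n) if i != j)


# Hypothetical links: classes 0-1 are frequently linked and close; 1-2 less so.
links = [[0, 6, 0],
         [6, 0, 2],
         [0, 2, 0]]
d = [[0.0, 0.2, 0.9],
     [0.2, 0.0, 0.6],
     [0.9, 0.6, 0.0]]
print(round(coherence(links, d), 3))   # 0.3
```

Classes 0 and 2, the most distant pair, are never linked here, so they do not contribute to C; in RS diversity, by contrast, that pair would still contribute through the product $p_0 p_2$.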
However, it is less clear whether and how coherence scores should be combined with diversity into a composite indicator of interdisciplinarity. Diversity in referencing can also be considered as "interdisciplinary" knowledge integration, whereas diversity in being cited has been considered as diffusion (Rousseau, Zhang, & Hu, forthcoming; cf. Leydesdorff & Rafols, 2011a and b). For the purpose of this study, we limit ourselves to the debate about diversity.

Data
We test the measures both in the full set of journals included in the JCRs 2016 and in the subset of 85 journals subsumed by ISI/Clarivate under the WC "information science and library science." (We used the analogous sets for 2015 in our previous study.) As active practitioners in this field (LIS), we may be able to provide a more informed interpretation of the results for this subset. Indeed, this focus led to our worry about RS diversity as an indicator of "interdisciplinarity." In our previous study (Leydesdorff, Wagner, & Bornmann, 2018, at pp. 579f.), we formulated:

"In terms of knowledge integration indicated as diversity in the citing dimension, JASIST assumes the third position and Scientometrics trails in 45th position. In the cited dimension, the diversity of Scientometrics is ranked 70 (among 86). Thus, the journal [Scientometrics] is cited in this environment much more specifically than in the larger context of all the journals included in the JCR, where it assumed the 339th and 6,246th position among 11,359 observations, respectively."
The 70th position of Scientometrics within this set of 86 LIS journals is very counterintuitive.
The new routine adds both RS diversity and the new diversity measure to a spreadsheet, as well as other relevant indices such as Gini, Simpson, Shannon, disparity, and relative and absolute variety. Table 2 provides descriptive statistics for the JCR 2016 data of the Science Citation Index and Social Sciences Citation Index combined. As noted, the aggregated citation relations among more than 10k journals provide us with a rich domain containing cited and citing distributions for each of the journals that we can input into Eqs. 1 and 2. However, diversity can be measured in any set of values. The measure is a statistic and therefore dimensionless.

Methods
The data are first organized into a citation matrix of 11,487 journals citing one another. […] The co-occurrence values can be used as numerators in a large number of (dis)similarity measures (Jones & Furnas, 1987). In this study, we use (1 − cosine) as the measure of distance in the disparity term [$\sum_{i,j} d_{ij}$]. The cosine is a convenient (non-parametric) measure which varies between 0 and 1 for non-negative data such as citation counts, disregards the zeros (Ahlgren, Jarneving, & Rousseau, 2003), and does not assume normality of the distribution. However, other (dis)similarity measures can also be used (e.g., Jaccard, Euclidean distances, etc.).
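The (1 − cosine) disparities can be computed for all pairs at once with NumPy. In this sketch, `matrix[i, j]` is assumed to hold the citations from journal i (citing) to journal j (cited), so that comparing columns contrasts cited profiles and comparing rows contrasts citing profiles. Which profiles feed the disparity term in the published routine is not stated here, so the `axis` switch is an assumption, as are the toy counts.

```python
import numpy as np


def cosine_disparity(matrix, axis=0):
    """Return d_ij = 1 - cosine between profiles of a non-negative citation matrix.

    matrix[i, j] is assumed to hold citations from journal i (citing) to j (cited).
    axis=0 compares columns (cited profiles); axis=1 compares rows (citing profiles).
    """
    m = np.asarray(matrix, dtype=float)
    vectors = m.T if axis == 0 else m
    norms = np.linalg.norm(vectors, axis=1)
    norms[norms == 0] = 1.0                      # empty profiles stay zero vectors
    unit = vectors / norms[:, None]
    cosine = np.clip(unit @ unit.T, 0.0, 1.0)    # non-negative data: cosine in [0, 1]
    return 1.0 - cosine


citations = np.array([[1, 0, 2],
                      [0, 1, 0],
                      [1, 0, 2]])                # hypothetical counts
d_citing = cosine_disparity(citations, axis=1)
d_cited = cosine_disparity(citations, axis=0)
print(np.round(d_citing, 3))
```

Only the off-diagonal entries enter Eqs. 1 and 2, so the diagonal values (zero, or one for an empty profile) do not matter.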
The classical definition of the Gini coefficient is as follows:

$$G = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} \lvert x_i - x_j \rvert}{2\,n^2\,\bar{x}} \qquad (3)$$

where x is an observed value, n is the number of values observed, and $\bar{x}$ is the mean value.
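Eq. 3 translates directly into code. The sketch below also implements the rank-based shortcut described in the following paragraph (values sorted in ascending order with ranks i = 1..n), so that the two forms can be checked against each other; balance in DIV is then one minus this value. The function names are hypothetical.

```python
def gini_pairwise(x):
    """Eq. 3: sum_i sum_j |x_i - x_j| / (2 n^2 x_bar)."""
    n = len(x)
    mean = sum(x) / n
    return sum(abs(a - b) for a in x for b in x) / (2 * n * n * mean)


def gini_ranked(x):
    """Rank-based form: values sorted ascending, rank i = 1..n; O(n log n)."""
    xs = sorted(x)
    n = len(xs)
    weighted = sum((2 * i - n - 1) * v for i, v in enumerate(xs, start=1))
    return weighted / (n * sum(xs))


values = [1, 2, 3, 4]
print(gini_pairwise(values), gini_ranked(values))   # 0.25 0.25
```

The pairwise form needs n² comparisons; the ranked form needs only a sort, which matters when distributions span more than 11,000 journals.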
If the x values are first placed in ascending order, such that each x has rank i, some of the comparisons above can be avoided:

$$G = \frac{\sum_{i=1}^{n} (2i - n - 1)\,x_i}{n \sum_{i=1}^{n} x_i}$$

The output file div_col.dbf contains (1) RS diversity, (2) "true diversity," which is equal to 1 / (1 − RS Div) as derived by Zhang et al. (2017, Eq. 6), […]

[…] fields of science as a form of local knowledge integration, but this is not the type of "interdisciplinarity" which is valued in the sciences or in the science policy domain (Wagner et al., 2011).
Table 4 shows that, using the new indicator, the more obvious candidates for "interdisciplinarity," such as PLOS ONE and Sci Rep-UK, are identified in the cited dimension.
Science and Nature, however, are not among the top 25 in the citing direction because referencing within these journals is very precise and disciplined. This accords with the intuition that articles in these two journals are cited broadly because of their status. In the cited direction, […] Simpson, Shannon, and relative variety. In the case of RS, disparity is more prominent in the result of the multiplication (Eq. 1 above) because it is multiplied by only a single other factor (the Simpson index), whereas it is multiplied by two other components in the case of DIV. Thus, the synthesis into "dual concept" diversity reduces the influence of variety on RS diversity.
As noted, Stirling (1998, p. 57) concluded that there are good reasons to prefer the Shannon measure over the Simpson Index if one wishes to measure the two concepts in a single operationalization as a "dual concept." The respective correlations with Shannon entropy illustrate our point, since DIV correlates much more highly with Shannon entropy than RS does.

Eighty-five journals in the Library and Information Sciences

Eighty-five journals were assigned to the WC labeled "information science & library science" in JCR 2016. We study the asymmetrical citation matrix among these 85 journals in both the cited and citing directions. Table 6 provides the 25 highest-ranking journals in the various dimensions of "citing"; Table 7 the corresponding values in the cited direction. Interdisciplinarity in the citing dimension indicates knowledge integration, and one expects more marginal journals to take this role, whereas larger and more leading journals can be expected to have a role in interdisciplinary knowledge diffusion (cited). This difference is reflected in the values for DIV in Table 7, but not in those for RS. Scientometrics and JASIST lead the ranking in terms of DIV in the cited direction, but are not among the top 25 journals when using RS.
Scientometrics occupies the 69th position on this list of 85 journals when ranked using RS. As noted, it was in the 70th position in our previous study using 2015 data; this finding triggered our worries about using RS for measuring interdisciplinarity.

Factor Analysis
Factor analysis allows us to test whether the indicators (RS, DIV, and BC) measure the same or different dimensions (of diversity). We used principal component analysis as the extraction method and varimax rotation in SPSS. The values of the Gini coefficient and the Simpson index are components of the diversity measures under discussion; including these variables in the factor analysis would therefore be redundant. However, we added the two-year Journal Impact Factor (JIF2) as a benchmark for the orientation of the reader.
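For readers without SPSS, the procedure can be approximated with NumPy: principal components of the correlation matrix, retention of components with eigenvalues above 1 (the Kaiser criterion), and a varimax rotation. This is a sketch under stated assumptions, not a reproduction of SPSS output: SPSS applies Kaiser row normalization before varimax by default, which this plain version omits, and the signs and order of factors are arbitrary. The data are synthetic stand-ins for the journal indicators.

```python
import numpy as np


def pca_loadings(data):
    """Principal-component loadings from the correlation matrix of `data`
    (rows = journals, columns = indicators), keeping eigenvalues > 1."""
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)            # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0                               # Kaiser criterion
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    explained = eigvals[keep].sum() / eigvals.sum()    # share of total variance
    return loadings, explained


def varimax(loadings, max_iter=100, tol=1e-6):
    """Plain varimax rotation (no Kaiser row normalization)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        if criterion and s.sum() < criterion * (1 + tol):
            break
        criterion = s.sum()
    return loadings @ rotation


# Synthetic data (hypothetical): six indicators driven by two latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
data = np.repeat(latent, [4, 2], axis=1) + 0.3 * rng.normal(size=(500, 6))
loadings, explained = pca_loadings(data)
rotated = varimax(loadings)
print(loadings.shape, round(explained, 2))
print(np.round(rotated, 2))
```

Because the rotation is orthogonal, each indicator's communality (the row sum of squared loadings) is unchanged; only the distribution of loadings across factors changes, which is what makes a threshold such as 0.5 interpretable.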
Two factors (components) in the analysis have eigenvalues higher than 1; together, the two factors explain 65.1% of the variance in the data. Table 8 shows the results of the factor analysis: the factor loadings of the different diversity measures on the two components. In the interpretation of the results, we focus on factor loadings with values greater than 0.5 (boldfaced in Table 8). DIV in the cited direction (interdisciplinary diffusion) has the highest loading on the first factor and is completely uncoupled from Factor 2. The latter factor is associated with interdisciplinary knowledge integration, with the highest factor loading for RS in the citing direction.

Range

Figure 2 shows the ranges of RS diversity and the new diversity measure across the 11,487 journals on a log-log scale. Of these 11,487 journals, 10,264 (89.3%) have RS-diversity values above 0.5. In other words, most journals are indicated as diverse. The lower values and the larger spread of the new diversity indicator are a consequence of multiplying three terms between zero and one, whereas only two terms (< 1) are multiplied in the case of RS diversity. However, the much larger range allows for more refined measurement.

Summary and conclusions
We asked whether the measurement of interdisciplinarity can be improved by using a new measure of diversity (DIV) when compared with RS. We have shown that the three components of diversity (variety, balance, and disparity) can be measured independently and then combined, yielding a more informative result. We operationalized these three components independently as follows:
1. "balance" is operationalized by the Gini coefficient (as 1 − Gini);
2. "variety" is operationalized as "relative variety," that is, the number of classes in use divided by the number of classes available for use;
3. "disparity" can be operationalized using one distance measure or another, as in the case of RS diversity; in this study, we used (1 − cosine).
The three components are all bounded between zero and one; diversity is measured by multiplying the three values for each unit of analysis. The resulting indicator is therefore necessarily bounded between zero and one.
In comparison to RS diversity, the new indicator (DIV) has the following advantages:
1. It is no longer based on "dual-concept" diversity, but on the independent operationalization of the three components of diversity: balance, variety, and disparity.
2. It is monotonic (Rousseau, personal communication, 18 June 2018): diversity increases for each of the three components when the other two remain the same.
3. "Balance" is operationalized using the Gini coefficient; including this indicator as a component provides greater specificity to the resulting indicator of diversity.
4. The empirical results of the measurement are less puzzling and counterintuitive, so that there is less reason to doubt the values of the indicator because of possible flaws in its construction.
5. The new indicator correlates significantly more strongly with betweenness centrality than RS diversity does.
Comparing the measurements of two different indicators, one can always expect the results to be different. At best, they may point in the same direction (Hicks et al., 2015). Since there is no ground truth for "interdisciplinarity," there are no obvious criteria to choose one indicator over another on the basis of empirical results unless the results show obvious (in)validity. This problem has been pointed out before, for instance by Stirling (2007). Furthermore, "interdisciplinarity" is based on the underlying concept of "disciplines," which are social constructs developed to allocate the privileges and responsibilities of expertise as well as resources (Wagner, forthcoming). The boundaries among disciplines, however, are fluid.
In our case, the initial reason to deconstruct RS diversity was the puzzles it continued to pose when measuring diversity as an indicator of interdisciplinarity. These results were often incomprehensible and sometimes counterintuitive. Through a series of communications, we came to the conclusion that RS is flawed as a measure of diversity because of the way it combines the three components of variety, balance, and disparity. In our opinion, the problem is the ex ante combination of variety and balance as a "dual-concept" indicator; there is neither a theoretical reason nor a practical need for this shortcut.¹⁴

[Figure 7: The relative priority assigned to variety and balance in dual concept diversity. Source: Stirling (1998, at p. 49).]

Stirling (1998) used Figure 7 to show the dilemma of combining the two "subordinate properties" of variety and balance into a single "dual concept": "Where variety is held to be the most important property, System C might reasonably be held to be most (dual concept) diverse. Where a greater priority is attached to the evenness in the balance between options, System A might be ranked highest. In addition, there are a multitude of possible intermediate possibilities, such as System B" (Stirling, 1998, p. 48).
Rousseau (2018a) added (i) that RS is not monotonic despite its aspiration to fulfill this requirement, and (ii) that "balance" is not even covered by RS diversity despite its crucial role in Stirling's (2007) theoretical argument. Nijssen, Rousseau, & Van Hecke (1998), however, have proven that "balance" can be indicated by the Gini coefficient. The Gini coefficient is conveniently bounded between zero and one, and relative variety can also be defined between zero and one (as $n_c / N$). Thus, the reasoning behind Stirling (2007) can be conserved, but in the case of DIV the operationalization is changed, expanded, and made more specific.

¹⁴ Stirling (2007) was probably not aware of the possibility of using the Gini coefficient as an indicator of balance. There are no references to Rousseau's work, and the Gini is classified in Table 1 (at p. 709) as a "dual-concept" measure of variety and balance.

Annex II:
Spearman rank-order correlations between the proposed measure of diversity (DIV), RS diversity, Betweenness Centrality (BC), and the Journal Impact Factor (JIF2) for 11,487 journals included in the JCR 2016.