Abstract
Each co-author (CA) of any scientist can be given a rank \((r)\) of importance according to the number \((J)\) of joint publications which the authors have together. In this paper, the Zipf–Mandelbrot–Pareto law, i.e. \( J \propto 1/(\nu +r)^{\zeta }\), is shown to reproduce the empirical relationship between \(J\) and \(r\) and to be preferable to a mere power law, \( J \propto 1/r^{\alpha } \). The CA core value, i.e. the core number of CAs, is unaffected, of course. The demonstration is made on data for two authors with a high number of joint publications, recently considered by Bougrine (Scientometrics, 98(2): 1047–1064, 2014), and for seven authors, distinguishing between their “journal” and “proceedings” publications as suggested by Miskiewicz (Physica A, 392(20), 5119–5131, 2013). The rank-size statistics are discussed and the \(\alpha \) and \(\zeta \) exponents are compared. The correlation coefficient is much improved (\(\sim \)0.99, instead of 0.92). There are marked deviations from such a co-authorship popularity law depending on sub-fields. On one hand, this suggests an interpretation of the parameter \(\nu \). On the other hand, it suggests a novel model of the (likely time dependent) structural and publishing properties of research teams. Thus, one can propose a scenario for how a research team is formed and grows. This is based on a hierarchy utility concept, justifying the empirical Zipf–Mandelbrot–Pareto law, assuming a simple form for the CA publication/cost ratio, \(c_r = c_0\, \log _2 (\nu +r)\). In conclusion, such a law and model can suggest practical applications on measures of research teams. In Appendices, the frequency-size cumulative distribution function is discussed for two sub-fields, with other technicalities.
Notes
Necessarily, \(-1 \le \nu \), since \(r \ge 1\).
The effect was so named by Ausloos (2013) when examining co-authorship sizes; it occurs when the data flatten at low rank.
That would have led to too few papers per field, and no meaningful fit could have been made thereafter.
So does the 4-parameter ZMP (4-ZMP) law; see Appendix 1.
This has been recently examined considering pairs of leading CA through a binary scientific star concept (Ausloos 2014).
For simplicity of the writing, \(r\) is taken as a continuous variable though it is manifestly a positive integer only.
Benguigui and Blumenfeld-Lieberthal (2011) are perfectly right (text adapted, but resulting from a quasi exact quotation): in order to be able to decide if Eq. (1) is (and Eqs. 2 and 3 are) verified or not, one has to fit the data to several functions and compare the results, using the same criterion. Naturally, it is not realistic to expect each [ \(J(r)\) ] would be fitted to numerous formulas; thus, we \(({\simeq }r)\) propose to use a visual inspection in order to help decide which formulas might represent the data correctly. \(\ldots \) we \(({\simeq }I)\) trust the human mind and believe that a visual inspection can indeed give essential information; particularly it helps deciding if the studied system is homogeneous or not \(\ldots \) a simple visual inspection \(\ldots \) shows that the system (\(\ldots \)) is not homogeneous. It can be divided into \(\ldots \) subsystems. This (\(\ldots \)) emphasizes the need for a visual inspection of the rank-size relation of the real data on log-log scales. This gives the possibility to see (in the simple meaning of the word, see with the eye) if the points may be fitted with some mathematical function (not necessarily a straight line).
References
Amati, G., & van Rijsbergen, C. J. (2002). Term frequency normalization via Pareto distributions. In F. Crestani, M. Girolami, & C. J. van Rijsbergen (Eds.), Advances in Information Retrieval (pp. 183–192). LNCS. Heidelberg: Springer.
Ausloos, M. (2013). A scientometrics law about co-authors and their ranking: the co-author core. Scientometrics, 95(3), 895–909.
Ausloos, M. (2014). Binary scientific star coauthors core size. Scientometrics, 99(2), 331–351.
Benguigui, L., & Blumenfeld-Lieberthal, E. (2011). The end of a paradigm: is Zipf’s law universal? Journal of Geographical Systems, 13(2), 87–100.
Bougrine, H. (2014). Subfield effects on the core of coauthors. Scientometrics, 98(2), 1047–1064.
Fairthorne, R. A. (1969). Empirical hyperbolic distributions (Bradford–Zipf–Mandelbrot) for bibliometric description and prediction. Journal of Documentation, 25(4), 319–343.
Glaeser, E. L. (2008). Cities, agglomeration and spatial equilibrium. New York: Oxford University Press.
Haitun, S. D. (1982). Stationary scientometric distributions: Part 1. Different approximations. Scientometrics, 4(1), 5–25.
Hsu, J. W., & Huang, D. W. (2009). Distribution for the number of co-authors. Physical Review E, 80(5), 057101.
Izsák, J. (2006). Some practical aspects of fitting and testing the Zipf–Mandelbrot model. Scientometrics, 67(1), 107–120.
Jarque, C. M., & Bera, A. K. (1980). Efficient tests for normality, homoscedasticity and serial independence of regression residuals. Economics Letters, 6(3), 255–259.
Jefferson, M. (1939). The law of the primate city. Geographical Review, 29(2), 226–232.
Laherrère, J., & Sornette, D. (1998). Stretched exponential distributions in nature and economy: fat tails with characteristic scales. European Physical Journal B, 2(4), 525–539.
Madden, C. H. (1958). Some temporal aspects of the growth of cities in the United States. Economic Development and Cultural Change, 6(2), 143–170.
Mandelbrot, B. (1960). The Pareto–Levy law and the distribution of income. International Economic Review, 1(2), 79–106.
Manin, D. Yu. (2009). Mandelbrot’s model for Zipf’s law: can Mandelbrot’s model explain Zipf’s law for language? Journal of Quantitative Linguistics, 16(3), 274–285.
Miskiewicz, J. (2013). Effects of publications in proceedings on the measure of the core size of coauthors. Physica A, 392(20), 5119–5131.
Pareto, V. (1896). Cours d’économie politique. Geneva: Droz.
Popescu, I. I., Altmann, G., & Köhler, R. (2010). Zipf’s law—another view. Quality and Quantity, 44(4), 713–731.
Rosen, K. T., & Resnick, M. (1980). The size distribution of cities: an examination of the Pareto law and primacy. Journal of Urban Economics, 8(2), 165–186.
Tsallis, C. (1988). Possible generalization of Boltzmann-Gibbs statistics. Journal of Statistical Physics, 52(1–2), 479–487.
Tsallis, C., & Albuquerque, M. P. (2000). Are citations of scientific papers a case of nonextensivity? European Physical Journal B, 13(4), 777–780.
Voloshynovska, I. A. (2011). Characteristic features of rank-probability word distribution in scientific and belletristic literature. Journal of Quantitative Linguistics, 18(3), 274–289.
West, B. J., & Deering, B. (1995). The lure of modern science: fractal thinking. Singapore: World Scientific.
Zipf, G. K. (1949). Human behavior and the principle of least effort: an introduction to human ecology. Cambridge: Addison Wesley.
Acknowledgments
Thanks to J. Miskiewicz and H. Bougrine for private communications on their respective work, comments prior to manuscript submission and making available the relevant publication list data mentioned in the text. I warmly thank all colleagues who have kindly provided relevant data. This paper is part of scientific activities in COST Action TD1210.
Appendices
Appendix 1: ZMP fits with 3 or 4 free parameters
Using the ZMP function with 3 free parameters, Eq. (3), for data fitting is much more troublesome than fitting with the Zipf hyperbolic law (Fairthorne 1969; Haitun 1982; Izsák 2006). Thus, a variant of the ZMP law, i.e. the 4-parameter relation Eq. (3), is sometimes proposed, since it allows for one more scaling parameter. It is often observed that the 4-ZMP has some advantage with respect to the 3-ZMP, from the point of view of the stability of the solutions of the nonlinear system of equations for the fit parameters. This is interpreted as due to the fact that the numerical values of the other parameters (\(\mu ,\; \eta ,\; \lambda \), and the more so \(c\)) fall into more compact ranges. For example, compare the amplitudes \(c\) and \(b\) for \(s_2\) and \(s_4\), respectively, in Tables 2 and 4 for the 4-ZMP and 3-ZMP fits.
However, nothing drastic has been found in the present cases, as seen from Tables 3–5. Moreover, the meaning of \(\nu \) in the 3-ZMP case seems more easily interpretable than that of the \(\eta \) and \(\lambda \) values in the 4-ZMP.
It should be emphasized that the \(\hbox {R}^2\) values are identical, up to the third decimal, for the 3-ZMP and 4-ZMP fits (see Table 4), except for \(s_6\) and subsequently \(s_{63}\), for which they are nevertheless found close to each other. The difference is likely due to a behavior pointing to a strong exponential tail cut-off, in which case the empirical laws can hardly be expected to hold. Thus, it is observed that \(\mu \equiv \zeta \) in all cases, which is the relevant conclusion.
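For readers who wish to experiment with the comparison between the power law and the 3-ZMP law, a minimal sketch is given here. It is not the fitting procedure used for the tables: it assumes a log-log linearization, \(\ln J = \ln b - \zeta \ln (\nu + r)\), solved by ordinary least squares for each trial value of \(\nu \) on a grid, so that the reported \(\hbox {R}^2\) is computed on logarithms. The function names (`linfit`, `fit_power_law`, `fit_zmp3`), the grid `nu_grid` and the synthetic data are hypothetical and serve for illustration only.

```python
import math

def linfit(x, y):
    """Ordinary least squares for y = a + s*x; returns (a, s, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    s = sxy / sxx
    a = my - s * mx
    ss_res = sum((yi - (a + s * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, s, 1.0 - ss_res / ss_tot

def fit_power_law(ranks, counts):
    """Fit J = A / r**alpha on log-log scales."""
    a, s, r2 = linfit([math.log(r) for r in ranks],
                      [math.log(j) for j in counts])
    return {"amplitude": math.exp(a), "alpha": -s, "r2": r2}

def fit_zmp3(ranks, counts, nu_grid):
    """Fit J = b / (nu + r)**zeta by scanning nu (assumed nu > -1)."""
    log_j = [math.log(j) for j in counts]
    best = None
    for nu in nu_grid:
        a, s, r2 = linfit([math.log(nu + r) for r in ranks], log_j)
        if best is None or r2 > best["r2"]:
            best = {"b": math.exp(a), "nu": nu, "zeta": -s, "r2": r2}
    return best

# Synthetic (not empirical) data: an exact 3-ZMP curve, flattened at low rank.
ranks = list(range(1, 41))
counts = [200.0 / (3.0 + r) ** 1.5 for r in ranks]
nu_grid = [0.1 * k for k in range(-9, 101)]  # nu from -0.9 to 10.0

power_law = fit_power_law(ranks, counts)
zmp3 = fit_zmp3(ranks, counts, nu_grid)
print("power law:", power_law)
print("3-ZMP    :", zmp3)
```

Scanning \(\nu \) reduces the nonlinear problem to a sequence of linear ones, which avoids the sensitivity to initial conditions, at the cost of weighting the low counts more heavily than a direct least squares fit of \(J(r)\) would.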
Appendix 2: on merging sub-fields
In order to investigate the effect of the reduced size of data when considering sub-fields, Bougrine (2014) merged 2 sub-fields into a single one, both in the case of MRA and HES. For comparison, and completeness, ZMP and power law fits have been made on \(a_4\) and \(a_5\) merged into \(a_{54}\) on one hand, and on \(s_3\) and \(s_6\) merged into \(s_{63}\) on the other hand. The parameters resulting from the fits are given in Table 2. The fits are displayed in Fig. 8. In such cases, with not many data points, the co-author core is low, and the effect of many CAs at rank \(r\ge 4\) or 6, respectively, is rather important. Thus, the instability of the fits with respect to initial conditions is due to the presence of a strong exponential cut-off superposed on the power law tail.
These features indicate the sensitivity of the results to the sub-field definition, on one hand, and to the co-author distribution, on the other hand.
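The merging operation can be sketched as follows, under the assumption that each publication is assigned to exactly one sub-field, so that the joint publication counts of a CA simply add. The helper names (`merge_subfields`, `rank_coauthors`) and the co-author labels and counts are hypothetical; they are not the data of Bougrine (2014).

```python
from collections import Counter

def merge_subfields(counts_a, counts_b):
    """Add the joint-publication counts J of each CA over two sub-fields."""
    return Counter(counts_a) + Counter(counts_b)

def rank_coauthors(joint_counts):
    """Return [(r, CA, J)] by decreasing J; ties are broken by CA label."""
    ordered = sorted(joint_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(r, ca, j) for r, (ca, j) in enumerate(ordered, start=1)]

# Hypothetical counts for two sub-fields of the same leading author.
sub_a = {"CA1": 12, "CA2": 5, "CA3": 1}
sub_b = {"CA2": 9, "CA3": 1, "CA4": 3}

merged = merge_subfields(sub_a, sub_b)
ranking = rank_coauthors(merged)
for r, ca, j in ranking:
    print(r, ca, j)
```

In this toy case, the CA ranked second in one sub-field becomes the first one after merging, which illustrates why the fit parameters can change markedly with the sub-field definition.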
Appendix 3: on cumulative distribution functions (CDF)
In Informetrics, one prefers to fit empirical data to some size-frequency functional form using a maximum likelihood fit, rather than making a least squares fit for the rank-frequency distribution. Indeed, one can also ask, as did Pareto (1896), how many times one can find an “event” greater than some size \(y\), i.e. study the size-frequency relationship. Pareto found that the cumulative distribution function (CDF) of such events follows an inverse power of \(y\), or in other words, \(P\;[Y>y] \sim y^{-\kappa }\). Thus, the (number or) frequency \(f\) of such events of size \(y\) (also) follows an inverse power of \(y\); if \(y\) is treated as a continuous variable, differentiation gives \(f(y) \sim y^{-(\kappa +1)}\).
Thus, for illustration, ZMP and power law fits have been made on two of MRA's major sub-fields, i.e. \(a_2\) and \(a_7\). A log-log scale display of the number of joint publications (NJP) with co-authors ranked by decreasing importance and the corresponding CDF are shown in Figs. 11 and 12. Both the power law and ZMP law fits are shown for the whole \(r\) range. Note that the NJP data and fits are those seen in Fig. 7, with numerical values in Table 1.
The “queen effect” is well seen in the NJP data and fits in Fig. 11, but not so much in the CDF. The “king effect” is well seen in the NJP data and fits in Fig. 12, but the CDF shows a pronounced cut-off at high \(r\). Therefore, it would seem that the CDF is less pertinent for observing minute effects. This is understandable since the CDF results from an integration scheme. However, again understandably, the CDF fits are much more stable.
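A minimal sketch of the size-frequency approach follows. It assumes a continuous Pareto variable above a known lower bound `y_min`, for which the maximum likelihood estimate of the exponent is \(\hat{\kappa } = n / \sum _i \ln (y_i / y_{min})\). Publication counts are integers, so a discrete estimator would be needed for real data. The function names (`empirical_ccdf`, `pareto_mle`) and the synthetic sample are hypothetical.

```python
import bisect
import math
import random

def empirical_ccdf(sizes):
    """Return [(y, fraction of events with size > y)] for each distinct y."""
    ys = sorted(sizes)
    n = len(ys)
    return [(y, (n - bisect.bisect_right(ys, y)) / n) for y in sorted(set(ys))]

def pareto_mle(sizes, y_min):
    """Maximum likelihood kappa for P[Y > y] = (y / y_min)**(-kappa)."""
    tail = [y for y in sizes if y >= y_min]
    return len(tail) / sum(math.log(y / y_min) for y in tail)

# Synthetic (not empirical) sample drawn by inverse-transform sampling.
rng = random.Random(0)
kappa_true, y_min = 1.5, 1.0
sample = [y_min * (1.0 - rng.random()) ** (-1.0 / kappa_true)
          for _ in range(20000)]

kappa_hat = pareto_mle(sample, y_min)
ccdf = empirical_ccdf(sample)
print("estimated kappa:", round(kappa_hat, 3))
```

Plotting `ccdf` on log-log scales should give points scattered around a line of slope \(-\kappa \), up to the statistical noise in the far tail.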
Cite this article
Ausloos, M. Zipf–Mandelbrot–Pareto model for co-authorship popularity. Scientometrics 101, 1565–1586 (2014). https://doi.org/10.1007/s11192-014-1302-y
DOI: https://doi.org/10.1007/s11192-014-1302-y