Zipf–Mandelbrot–Pareto model for co-authorship popularity

Scientometrics

Abstract

Each co-author (CA) of any scientist can be given a rank \((r)\) of importance according to the number \((J)\) of joint publications which the authors have together. In this paper, the Zipf–Mandelbrot–Pareto law, i.e. \( J \propto 1/(\nu +r)^{\zeta }\), is shown to reproduce the empirical relationship between \(J\) and \(r\) and to be preferable to a mere power law, \( J \propto 1/r^{\alpha } \). The CA core value, i.e. the core number of CAs, is unaffected, of course. The demonstration is made on data for two authors with a high number of joint publications, recently considered by Bougrine (Scientometrics, 98(2), 1047–1064, 2014), and for seven authors, distinguishing between their “journal” and “proceedings” publications, as suggested by Miskiewicz (Physica A, 392(20), 5119–5131, 2013). The rank-size statistics are discussed and the \(\alpha \) and \(\zeta \) exponents are compared. The correlation coefficient is much improved (\(\sim \)0.99, instead of 0.92). There are marked deviations from such a co-authorship popularity law, depending on sub-fields. On one hand, this suggests an interpretation of the parameter \(\nu \). On the other hand, it suggests a novel model of the (likely time-dependent) structural and publishing properties of research teams. Thus, one can propose a scenario for how a research team is formed and grows. This is based on a hierarchy utility concept, justifying the empirical Zipf–Mandelbrot–Pareto law, assuming a simple form for the CA publication/cost ratio, \(c_r = c_0\, \log_2 (\nu +r)\). In conclusion, such a law and model can suggest practical applications to measures of research teams. In the Appendices, the frequency-size cumulative distribution function is discussed for two sub-fields, together with other technicalities.
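The comparison between the two laws can be illustrated with a minimal sketch. It is not the fitting procedure of the paper: it assumes an ordinary least squares fit on log-log scales, a simple grid search over \(\nu \), and an \(\hbox {R}^2\) computed in log space. The data are synthetic (generated from a ZMP law with arbitrary parameters), and all function names are hypothetical.

```python
import math

def r_squared(ys, fits):
    """Coefficient of determination between observed and fitted values."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fits))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

def fit_loglog_line(xs, ys):
    """Least squares fit of ln(y) against ln(x); returns (intercept, slope)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    slope = sxy / sxx
    return my - slope * mx, slope

def fit_power_law(ranks, counts):
    """Fit J = a / r**alpha; returns (a, alpha, R2 in log space)."""
    intercept, slope = fit_loglog_line(ranks, counts)
    log_fit = [intercept + slope * math.log(r) for r in ranks]
    r2 = r_squared([math.log(c) for c in counts], log_fit)
    return math.exp(intercept), -slope, r2

def fit_zmp(ranks, counts, nu_grid):
    """Fit J = b / (nu + r)**zeta by scanning nu (assumed grid search).

    For each trial nu the problem is linear in log-log coordinates;
    the nu giving the highest R2 is kept. Returns (b, nu, zeta, R2).
    """
    best = None
    for nu in nu_grid:
        shifted = [nu + r for r in ranks]  # requires nu > -1 since r >= 1
        intercept, slope = fit_loglog_line(shifted, counts)
        log_fit = [intercept + slope * math.log(s) for s in shifted]
        r2 = r_squared([math.log(c) for c in counts], log_fit)
        if best is None or r2 > best[3]:
            best = (math.exp(intercept), nu, -slope, r2)
    return best

# Synthetic, noise-free illustration: J = 120 / (3 + r)**1.5 for r = 1..30.
ranks = list(range(1, 31))
counts = [120.0 / (3.0 + r) ** 1.5 for r in ranks]
nu_grid = [i / 10 for i in range(-9, 101)]  # trial values from -0.9 to 10.0

a, alpha, r2_power = fit_power_law(ranks, counts)
b, nu, zeta, r2_zmp = fit_zmp(ranks, counts, nu_grid)
print(f"power law: alpha={alpha:.3f}, R2={r2_power:.4f}")
print(f"ZMP:       nu={nu:.1f}, zeta={zeta:.3f}, R2={r2_zmp:.4f}")
```

On such flattened low-rank data, the pure power law underestimates the exponent, whereas the shifted form recovers the generating parameters. With real publication counts, a nonlinear least squares routine would probably be a better choice than this grid search.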

Notes

  1. Necessarily, \(-1 \le \nu \), since \(r \ge 1\).

  2. The “king effect” occurs when the data is upsurging at low rank; it has been so called by Laherrère and Sornette (1998) when examining city sizes, and seems to have been emphasized first by Jefferson (1939), also when studying city sizes.

  3. The “queen effect” occurs when the data is flattening at low rank; it has been so called by Ausloos (2013) when examining co-authorship sizes.

  4. That would have led to too few papers per field, so that no meaningful fit could have been made thereafter.

  5. So does the 4-parameter ZMP (4-ZMP) law, see Appendix 1.

  6. This has been recently examined considering pairs of leading CAs through a binary scientific star concept (Ausloos 2014).

  7. For simplicity of the writing, \(r\) is taken as a continuous variable though it is manifestly a positive integer only.

  8. Benguigui and Blumenfeld-Lieberthal (2011) are perfectly right: (text adapted, but resulting from a quasi exact quotation) in order to be able to decide if Eq. (1) is (and Eqs. 2 and 3 are) verified or not, one has to fit the data to several functions and compare the results, using the same criterion. Naturally, it is not realistic to expect each [ \(J(r)\) ] would be fitted to numerous formulas; thus, we \(({\simeq }r)\) propose to use a visual inspection in order to help decide which formulas might represent the data correctly. \(\ldots \) we \(({\simeq }I)\) trust the human mind and believe that a visual inspection can indeed give essential information; particularly it helps deciding if the studied system is homogeneous or not \(\ldots \) a simple visual inspection \(\ldots \) shows that the system (\(\ldots \)) is not homogeneous. It can be divided into \(\ldots \) subsystems. This (\(\ldots \)) emphasizes the need for a visual inspection of the rank-size relation of the real data on log-log scales. This gives the possibility to see (in the simple meaning of the word, see with the eye) if the points may be fitted with some mathematical function (not necessarily a straight line).

References

  • Amati, G., & van Rijsbergen, C. J. (2002). Term frequency normalization via Pareto distributions. In F. Crestani, M. Girolami, & C. J. van Rijsbergen (Eds.), Advances in Information Retrieval (pp. 183–192). LNCS. Heidelberg: Springer.

  • Ausloos, M. (2013). A scientometrics law about co-authors and their ranking: the co-author core. Scientometrics, 95(3), 895–909.

  • Ausloos, M. (2014). Binary scientific star coauthors core size. Scientometrics, 99(2), 331–351.

  • Benguigui, L., & Blumenfeld-Lieberthal, E. (2011). The end of a paradigm: is Zipf’s law universal? Journal of Geographical Systems, 13(2), 87–100.

  • Bougrine, H. (2014). Subfield effects on the core of coauthors. Scientometrics, 98(2), 1047–1064.

  • Fairthorne, R. A. (1969). Empirical hyperbolic distributions (Bradford–Zipf–Mandelbrot) for bibliometric description and prediction. Journal of Documentation, 25(4), 319–343.

  • Glaeser, E. L. (2008). Cities, agglomeration and spatial equilibrium. New York: Oxford University Press.

  • Haitun, S. D. (1982). Stationary scientometric distributions: Part 1. Different approximations. Scientometrics, 4(1), 5–25.

  • Hsu, J. W., & Huang, D. W. (2009). Distribution for the number of co-authors. Physical Review E, 80(5), 057101.

  • Izsák, J. (2006). Some practical aspects of fitting and testing the Zipf–Mandelbrot model. Scientometrics, 67(1), 107–120.

  • Jarque, C. M., & Bera, A. K. (1980). Efficient tests for normality, homoscedasticity and serial independence of regression residuals. Economics Letters, 6(3), 255–259.

  • Jefferson, M. (1939). The law of the primate city. Geographical Review, 29(2), 226–232.

  • Laherrère, J., & Sornette, D. (1998). Stretched exponential distributions in nature and economy: fat tails with characteristic scales. European Physical Journal B, 2(4), 525–539.

  • Madden, C. H. (1958). Some temporal aspects of the growth of cities in the United States. Economic Development and Cultural Change, 6(2), 143–170.

  • Mandelbrot, B. (1960). The Pareto–Levy law and the distribution of income. International Economic Review, 1(2), 79–106.

  • Manin, D. Yu. (2009). Mandelbrot’s model for Zipf’s law: can Mandelbrot’s model explain Zipf’s law for language? Journal of Quantitative Linguistics, 16(3), 274–285.

  • Miskiewicz, J. (2013). Effects of publications in proceedings on the measure of the core size of coauthors. Physica A, 392(20), 5119–5131.

  • Pareto, V. (1896). Cours d’économie politique. Geneva: Droz.

  • Popescu, I. I., Altmann, G., & Köhler, R. (2010). Zipf’s law—another view. Quality and Quantity, 44(4), 713–731.

  • Rosen, K. T., & Resnick, M. (1980). The size distribution of cities: an examination of the Pareto law and primacy. Journal of Urban Economics, 8(2), 165–186.

  • Tsallis, C. (1988). Possible generalization of Boltzmann-Gibbs statistics. Journal of Statistical Physics, 52(1–2), 479–487.

  • Tsallis, C., & Albuquerque, M. P. (2000). Are citations of scientific papers a case of nonextensivity? European Physical Journal B, 13(4), 777–780.

  • Voloshynovska, I. A. (2011). Characteristic features of rank-probability word distribution in scientific and belletristic literature. Journal of Quantitative Linguistics, 18(3), 274–289.

  • West, B. J., & Deering, B. (1995). The lure of modern science: fractal thinking. Singapore: World Scientific.

  • Zipf, G. K. (1949). Human behavior and the principle of least effort: an introduction to human ecology. Cambridge: Addison Wesley.

Acknowledgments

Thanks to J. Miskiewicz and H. Bougrine for private communications on their respective work, for comments prior to manuscript submission, and for making available the relevant publication list data mentioned in the text. I warmly thank all colleagues who have kindly provided relevant data. This paper is part of scientific activities in COST Action TD1210.

Author information

Corresponding author

Correspondence to Marcel Ausloos.

Appendices

Appendix 1: ZMP fits with 3 or 4 free parameters

Using the ZMP function with 3 free parameters, Eq. (3), for data fitting is much more troublesome than fitting with the Zipf hyperbolic law (Fairthorne 1969; Haitun 1982; Izsák 2006). Thus, a variant of the ZMP law, i.e. the 4-parameter relation Eq. (3), is sometimes proposed, since it allows for one more scaling parameter. It is often observed that the 4-ZMP has some advantage with respect to the 3-ZMP, from the point of view of the stability of the solutions of the nonlinear system of equations for the fit parameters. This is interpreted as due to the fact that the numerical values of the other parameters (\(\mu ,\; \eta ,\; \lambda \), and the more so \(c\)) fall into more compact ranges. For example, compare the amplitudes \(c\) and \(b\) for \(s_2\) and \(s_4\), respectively, in Tables 2 and 4 for the 4-ZMP and 3-ZMP fits.

However, nothing drastic has been found in the present cases, as seen from Tables 3–5. Moreover, the meaning of \(\nu \) in the 3-ZMP case seems more easily interpretable than that of the \(\eta \) and \(\lambda \) values in the 4-ZMP.

It should be emphasized that the \(\hbox {R}^2\) values are identical, up to the third decimal, for the 3- and 4-parameter ZMP law fits (see Table 4), except for \(s_6\) and subsequently \(s_{63}\), for which they are nevertheless found close to each other. This is likely due to a behavior pointing to a strong exponential tail cut-off, in which case the empirical laws can hardly be expected to hold. Thus, it is observed that \(\mu \equiv \zeta \) in all cases, which is the relevant conclusion.

Appendix 2: on merging sub-fields

In order to investigate the effect of the reduced size of the data when considering sub-fields, Bougrine (2014) merged two sub-fields into a single one, both in the case of MRA and of HES. For comparison and completeness, ZMP and power law fits have been made on \(a_4\) and \(a_5\) merged into \(a_{54}\), on one hand, and on \(s_3\) and \(s_6\) merged into \(s_{63}\), on the other hand. The parameters resulting from the fits are given in Table 2. The fits are displayed in Fig. 8. In such cases, with not many data points, the co-author core is small, and the effect of many CAs at rank \(r\ge 4\) or 6, respectively, is rather important. Thus, the instability of the fits with respect to initial conditions is due to the presence of a strong exponential cut-off superposed on the power law tail.

These features indicate the sensitivity of the results to the sub-field definition, on one hand, and to the co-author distribution, on the other hand.

Table 5 Summary of the fit parameter values for the \(a_2\) and \(a_7\) frequency-size cumulative distribution function (CDF) data, with notations explained in the text, corresponding to Figs. 6 and 7, respectively; the parameters correspond to the various formulae discussed in the text, Eqs. (1)–(3); the regression fit coefficient \(\hbox {R}^2\) is given for the different cases

Fig. 11

Log–log scale display of the number of joint publications (NJP) on magnetic materials, for MRA, with co-authors ranked by decreasing importance, and of the corresponding frequency-size cumulative distribution function (CDF); best fits over the whole data range are shown for the power law and the ZMP law

Fig. 12

Log–log scale display of the number of joint publications (NJP) on superconductivity, for MRA, with co-authors ranked by decreasing importance, and of the corresponding frequency-size cumulative distribution function (CDF); best fits over the whole data range are shown for the power law and the ZMP law

Appendix 3: on cumulative distribution functions (CDF)

In informetrics, one prefers to fit empirical data to some size-frequency functional form using a maximum likelihood fit, rather than making a least squares fit of the rank-frequency distribution. Indeed, one can also ask, as did Pareto (1896), how many times one can find an “event” greater than some size \(y\), i.e. study the size-frequency relationship. Pareto found that the cumulative distribution function (CDF) of such events follows an inverse power of \(y\), or in other words, \(P\;[Y>y] \sim y^{-\kappa }\). Thus, the (number or) frequency \(f\) of such events of size \(y\) (also) follows an inverse power of \(y\).
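As an illustration of the maximum likelihood approach mentioned above, the following minimal sketch builds an empirical complementary CDF and estimates \(\kappa \) with the Hill estimator, i.e. the maximum likelihood estimate for a continuous Pareto tail. This is an assumption made for illustration only: publication counts are integers, the lower cut-off `y_min` has to be chosen by the user, the sample is synthetic, and the function names are hypothetical.

```python
import math

def empirical_ccdf(sizes):
    """Return (y, fraction of observations with Y >= y) for each distinct y."""
    n = len(sizes)
    ordered = sorted(sizes)
    points = []
    for i, y in enumerate(ordered):
        # Keep only the first occurrence of each distinct value.
        if i == 0 or y != ordered[i - 1]:
            points.append((y, (n - i) / n))
    return points

def hill_exponent(sizes, y_min):
    """Hill (maximum likelihood) estimate of kappa in P[Y > y] ~ y**(-kappa).

    Only observations with y >= y_min enter the estimate.
    """
    tail = [y for y in sizes if y >= y_min]
    log_sum = sum(math.log(y / y_min) for y in tail)
    if log_sum == 0.0:
        raise ValueError("need at least one observation above y_min")
    return len(tail) / log_sum

# Synthetic sample: deterministic Pareto quantiles with kappa = 2, y_min = 1.
kappa_true, y_min, n = 2.0, 1.0, 2000
sample = [y_min * (1.0 - (i + 0.5) / n) ** (-1.0 / kappa_true) for i in range(n)]

kappa_hat = hill_exponent(sample, y_min)
print(f"estimated kappa = {kappa_hat:.3f}")
print(empirical_ccdf([1, 1, 2, 5]))
```

Unlike a least squares fit on log-log axes, this estimate does not depend on binning or on the visual weight of the few largest events, which is one reason why the size-frequency route is often preferred.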

Thus, for illustration, ZMP and power law fits have been made on two of MRA's major sub-fields, i.e. \(a_2\) and \(a_7\). A log-log scale display of the number of joint publications (NJP) with co-authors ranked by decreasing importance and the corresponding CDF are shown in Figs. 11 and 12. Both the power law and the ZMP law fits are shown for the whole \(r\) range. Note that the NJP data and fits are those seen in Fig. 7, with numerical values in Table 1.

The “queen effect” is well seen on the NJP data and fits, in Fig. 11, but not so much on the CDF. The “king effect” is well seen on the NJP data and fits, in Fig. 12, but the CDF shows a pronounced cut-off at high \(r\). Therefore, it would seem that the CDF is less suited to observing minute effects. This is understandable, since the CDF results from an integration scheme. However, again understandably, the CDF fits are much more stable.
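The link between the rank-size and the CDF descriptions can be made explicit through a short derivation. It is only a sketch under stated assumptions: \(N\) denotes the number of co-authors, ties in \(J\) are ignored, and \(a\) and \(b\) are taken as the amplitudes of the power law and of the 3-parameter ZMP law, respectively.

```latex
% Assumption: N co-authors ranked by decreasing J, ties ignored,
% so that exactly r co-authors have at least J(r) joint publications.
\begin{align}
  N \, P[Y \ge J(r)] &\simeq r , \\
  J = a \, r^{-\alpha} \;&\Rightarrow\;
    P[Y \ge y] \simeq \frac{1}{N} \left( \frac{a}{y} \right)^{1/\alpha} ,
    \qquad \kappa = 1/\alpha , \\
  J = \frac{b}{(\nu + r)^{\zeta}} \;&\Rightarrow\;
    P[Y \ge y] \simeq \frac{1}{N}
    \left[ \left( \frac{b}{y} \right)^{1/\zeta} - \nu \right] .
\end{align}
```

In this reading, \(\nu \) enters the CDF only as an additive constant, which matters where \((b/y)^{1/\zeta }\) is comparable to \(|\nu |\), i.e. for the few largest \(y\) values (lowest ranks). This is consistent with the king and queen effects being less visible on the CDF than on the rank-size plot.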

Cite this article

Ausloos, M. Zipf–Mandelbrot–Pareto model for co-authorship popularity. Scientometrics 101, 1565–1586 (2014). https://doi.org/10.1007/s11192-014-1302-y
