Zipf–Mandelbrot–Pareto model for co-authorship popularity

Ausloos, Marcel

doi:10.1007/s11192-014-1302-y

Published: 06 May 2014

Scientometrics, Volume 101, pages 1565–1586 (2014)

Marcel Ausloos^1,2

Abstract

Each co-author (CA) of any scientist can be given a rank \((r)\) of importance according to the number \((J)\) of joint publications which the authors have together. In this paper, the Zipf–Mandelbrot–Pareto law, i.e. \( J \propto 1/(\nu +r)^{\zeta }\), is shown to reproduce the empirical relationship between \(J\) and \(r\) and shown to be preferable to a mere power law, \( J \propto 1/r^{\alpha } \). The CA core value, i.e. the core number of CAs, is unaffected, of course. The demonstration is made on data for two authors, with a high number of joint publications, recently considered by Bougrine (Scientometrics, 98(2): 1047–1064, 2014) and for seven authors, distinguishing between their “journal” and “proceedings” publications as suggested by Miskiewicz (Physica A, 392(20), 5119–5131, 2013). The rank-size statistics is discussed and the \(\alpha \) and \(\zeta \) exponents are compared. The correlation coefficient is much improved (\(\sim \)0.99, instead of 0.92). There are marked deviations of such a co-authorship popularity law depending on sub-fields. On one hand, this suggests an interpretation of the parameter \(\nu \). On the other hand, it suggests a novel model on the (likely time dependent) structural and publishing properties of research teams. Thus, one can propose a scenario for how a research team is formed and grows. This is based on a hierarchy utility concept, justifying the empirical Zipf–Mandelbrot–Pareto law, assuming a simple form for the CA publication/cost ratio, \(c_r = c_0\, \log_2 (\nu +r)\). In conclusion, such a law and model can suggest practical applications on measures of research teams. In Appendices, the frequency-size cumulative distribution function is discussed for two sub-fields, with other technicalities.
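The comparison between the two laws can be illustrated with a minimal sketch. It is not the fitting procedure used in the paper: it assumes a simple log-log least squares regression, with \(\nu \) scanned over a grid, and it uses synthetic values generated from an exact ZMP law rather than any data from the article. The function names `linfit`, `fit_power_law` and `fit_zmp` are hypothetical.

```python
import math

def linfit(x, y):
    """Ordinary least squares y = a + b*x; returns (a, b, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    b = sxy / sxx
    a = my - b * mx
    return a, b, (sxy * sxy) / (sxx * syy)

def fit_power_law(J):
    """Fit J ~ 1/r**alpha on log-log axes; returns (alpha, R^2)."""
    x = [math.log(r) for r in range(1, len(J) + 1)]
    _, slope, r2 = linfit(x, [math.log(j) for j in J])
    return -slope, r2

def fit_zmp(J, nu_grid):
    """Fit J ~ 1/(nu + r)**zeta: scan nu, regress log J on log(nu + r).

    Returns the (nu, zeta, R^2) triple with the highest R^2 on the grid.
    """
    log_j = [math.log(j) for j in J]
    best = None
    for nu in nu_grid:
        x = [math.log(nu + r) for r in range(1, len(J) + 1)]
        _, slope, r2 = linfit(x, log_j)
        if best is None or r2 > best[2]:
            best = (nu, -slope, r2)
    return best

# Synthetic illustration only (NOT data from the paper):
# an exact ZMP law with nu = 3 and zeta = 1.5 over 40 ranks.
J = [500.0 / (3.0 + r) ** 1.5 for r in range(1, 41)]

# The grid stays above -1 so that nu + r > 0 for every rank r >= 1.
nu_grid = [k / 10 for k in range(-9, 101)]

alpha, r2_pl = fit_power_law(J)
nu, zeta, r2_zmp = fit_zmp(J, nu_grid)
print(f"power law: alpha = {alpha:.2f}, R2 = {r2_pl:.3f}")
print(f"ZMP:       nu = {nu:.1f}, zeta = {zeta:.2f}, R2 = {r2_zmp:.3f}")
```

On such curved log-log data the pure power law is expected to give a visibly lower \(\hbox {R}^2\) than the ZMP form; with real publication counts a nonlinear least squares fit on \(J\) itself may be preferable to this log-log shortcut.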

Notes

Necessarily, \(\nu > -1\), since \(r \ge 1\).
The king effect occurs when the data surges upward at low rank; it was so named by Laherrère and Sornette (1998) when examining city sizes, but seems to have been emphasized first by Jefferson (1939), also when studying city sizes.
The queen effect occurs when the data flattens at low rank; it was so named by Ausloos (2013) when examining co-authorship sizes.
That would have led to too few papers per field, and it would have been nonsense to do some meaningful fit thereafter.
So does the 4-parameter ZMP (4-ZMP) law, see Appendix 1.
This has been recently examined considering pairs of leading CA through a binary scientific star concept (Ausloos 2014).
For simplicity of notation, \(r\) is taken as a continuous variable, though it is manifestly a positive integer only.
Benguigui and Blumenfeld-Lieberthal (2011) are perfectly right: (text adapted, but resulting from a quasi-exact quotation) in order to be able to decide if Eq. (1) is (and Eqs. 2 and 3 are) verified or not, one has to fit the data to several functions and compare the results, using the same criterion. Naturally, it is not realistic to expect each [ \(J(r)\) ] would be fitted to numerous formulas; thus, we \(({\simeq }r)\) propose to use a visual inspection in order to help decide which formulas might represent the data correctly. \(\ldots \) we \(({\simeq }I)\) trust the human mind and believe that a visual inspection can indeed give essential information; particularly it helps deciding if the studied system is homogeneous or not \(\ldots \) a simple visual inspection \(\ldots \) shows that the system (\(\ldots \)) is not homogeneous. It can be divided into \(\ldots \) subsystems. This (\(\ldots \)) emphasizes the need for a visual inspection of the rank-size relation of the real data on log-log scales. This gives the possibility to see (in the simple meaning of the word, see with the eye) if the points may be fitted with some mathematical function (not necessarily a straight line).

References

Amati, G., & van Rijsbergen, C. J. (2002). Term frequency normalization via Pareto distributions. In F. Crestani, M. Girolami, & C. J. van Rijsbergen (Eds.), Advances in Information Retrieval (pp. 183–192). LNCS. Heidelberg: Springer.
Ausloos, M. (2013). A scientometrics law about co-authors and their ranking: the co-author core. Scientometrics, 95(3), 895–909.
Ausloos, M. (2014). Binary scientific star coauthors core size. Scientometrics, 99(2), 331–351.
Benguigui, L., & Blumenfeld-Lieberthal, E. (2011). The end of a paradigm: is Zipf’s law universal? Journal of Geographical Systems, 13(2), 87–100.
Bougrine, H. (2014). Subfield effects on the core of coauthors. Scientometrics, 98(2), 1047–1064.
Fairthorne, R. A. (1969). Empirical hyperbolic distributions (Bradford–Zipf–Mandelbrot) for bibliometric description and prediction. Journal of Documentation, 25(4), 319–343.
Glaeser, E. L. (2008). Cities, agglomeration and spatial equilibrium. New York: Oxford University Press.
Haitun, S. D. (1982). Stationary scientometric distributions. Part 1. Different approximations. Scientometrics, 4(1), 5–25.
Hsu, J. W., & Huang, D. W. (2009). Distribution for the number of co-authors. Physical Review E, 80(5), 057101.
Izsák, J. (2006). Some practical aspects of fitting and testing the Zipf–Mandelbrot model. Scientometrics, 67(1), 107–120.
Jarque, C. M., & Bera, A. K. (1980). Efficient tests for normality, homoscedasticity and serial independence of regression residuals. Economics Letters, 6(3), 255–259.
Jefferson, M. (1939). The law of the primate city. Geographical Review, 29(2), 226–232.
Laherrère, J., & Sornette, D. (1998). Stretched exponential distributions in nature and economy: fat tails with characteristic scales. European Physical Journal B, 2(4), 525–539.
Madden, C. H. (1958). Some temporal aspects of the growth of cities in the United States. Economic Development and Cultural Change, 6(2), 143–170.
Mandelbrot, B. (1960). The Pareto–Levy law and the distribution of income. International Economic Review, 1(2), 79–106.
Manin, D. Yu. (2009). Mandelbrot’s model for Zipf’s law: can Mandelbrot’s model explain Zipf’s law for language? Journal of Quantitative Linguistics, 16(3), 274–285.
Miskiewicz, J. (2013). Effects of publications in proceedings on the measure of the core size of coauthors. Physica A, 392(20), 5119–5131.
Pareto, V. (1896). Cours d’économie politique. Geneva: Droz.
Popescu, I. I., Altmann, G., & Köhler, R. (2010). Zipf’s law—another view. Quality and Quantity, 44(4), 713–731.
Rosen, K. T., & Resnick, M. (1980). The size distribution of cities: an examination of the Pareto law and primacy. Journal of Urban Economics, 8(2), 165–186.
Tsallis, C. (1988). Possible generalization of Boltzmann-Gibbs statistics. Journal of Statistical Physics, 52(1–2), 479–487.
Tsallis, C., & Albuquerque, M. P. (2000). Are citations of scientific papers a case of nonextensivity? European Physical Journal B, 13(4), 777–780.
Voloshynovska, I. A. (2011). Characteristic features of rank-probability word distribution in scientific and belletristic literature. Journal of Quantitative Linguistics, 18(3), 274–289.
West, B. J., & Deering, B. (1995). The lure of modern science: fractal thinking. Singapore: World Scientific.
Zipf, G. K. (1949). Human behavior and the principle of least effort: an introduction to human ecology. Cambridge: Addison Wesley.

Acknowledgments

Thanks to J. Miskiewicz and H. Bougrine for private communications on their respective work, comments prior to manuscript submission and making available the relevant publication list data mentioned in the text. I warmly thank all colleagues who have kindly provided relevant data. This paper is part of scientific activities in COST Action TD1210.

Author information

Authors and Affiliations

eHumanities group, Royal Netherlands Academy of Arts and Sciences, Joan Muyskenweg 25, 1096 CJ, Amsterdam, Netherlands
Marcel Ausloos
Rés. Beauvallon, rue de la Belle Jardinière, 483/0021, 4031, Liège Angleur, Euroland
Marcel Ausloos

Corresponding author

Correspondence to Marcel Ausloos.

Appendices

Appendix 1: ZMP fits with 3 or 4 free parameters

Fitting data with the ZMP function with 3 free parameters, Eq. (3), is much more troublesome than fitting with the Zipf hyperbolic law (Fairthorne 1969; Haitun 1982; Izsák 2006). Thus, a variant of the ZMP law, i.e. the 4-parameter relation Eq. (3), is sometimes proposed, since it allows for one more scaling parameter. It is often observed that the 4-ZMP has some advantage over the 3-ZMP from the point of view of the stability of the solutions of the nonlinear system of equations for the fit parameters. This is interpreted as due to the fact that the numerical values of the parameters (\(\mu ,\; \eta ,\; \lambda \), and even more so \(c\)) fall into more compact ranges. For example, compare the amplitudes \(c\) and \(b\) for \(s_2\) and \(s_4\), in Table 2 and Table 4, for the 4-ZMP and 3-ZMP fits respectively.

However, nothing drastic has been found in the present cases, as seen from Tables 3–5. Moreover, the meaning of \(\nu \) in the 3-ZMP case seems more easily interpretable than that of the \(\eta \) and \(\lambda \) values in the 4-ZMP case.

It should be emphasized that the \(\hbox {R}^2\) values are identical, up to the third decimal, for the 3-ZMP and 4-ZMP fits (see Table 4), except for \(s_6\) and consequently \(s_{63}\), for which the values are nevertheless close to each other. The difference is likely due to a behavior pointing to a strong exponential tail cut-off, in which case the empirical laws can hardly be expected to hold. Thus, it is observed that \(\mu \equiv \zeta \) in all cases, which is the relevant conclusion.
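One way to see why the two fits can share the same exponent and nearly the same \(\hbox {R}^2\) is sketched below. The functional form of the 4-ZMP law is not reproduced in this appendix, so the sketch assumes that it reads \(J = c/(\eta + \lambda r)^{\mu }\) with \(\lambda > 0\), and that the 3-ZMP law reads \(J = b/(\nu + r)^{\zeta }\); this assumed form should be checked against the main text.

```latex
\begin{equation}
J(r) = \frac{c}{(\eta + \lambda r)^{\mu}}
     = \frac{c\,\lambda^{-\mu}}{\left(\eta/\lambda + r\right)^{\mu}}
\quad\Longrightarrow\quad
b = c\,\lambda^{-\mu}, \qquad
\nu = \eta/\lambda, \qquad
\zeta = \mu .
\end{equation}
```

Under that assumption both forms describe the same family of curves, so converged fits would be expected to return the same exponent; the extra parameter would only rescale the amplitude and the shift.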

Appendix 2: on merging sub-fields

In order to investigate the effect of the reduced data size when considering sub-fields, Bougrine (2014) merged 2 sub-fields into a single one, for both MRA and HES. For comparison and completeness, ZMP and power law fits have been made on \(a_4\) and \(a_5\) merged into \(a_{54}\) on one hand, and on \(s_3\) and \(s_6\) merged into \(s_{63}\) on the other hand. The parameters resulting from the fits are given in Table 2. The fits are displayed in Fig. 8. In such cases, with not many data points, the co-author core is low, and the effect of many CAs at rank \(r\ge 4\) or 6 respectively is rather important. Thus, the instability of the fits with respect to initial conditions is due to the presence of a strong exponential cut-off superposed on the power law tail.

These features indicate the sensitivity of the fits to the sub-field definition, on one hand, and to the co-author distribution, on the other hand.

Table 5 Summary of fit parameter values for the \(a_2\) and \(a_7\) frequency-size cumulative distribution function (CDF) data, corresponding to Figs. 6 and 7 respectively, with notations explained in the text; the parameters correspond to the various formulae discussed in the text, Eqs. (1)–(3); the regression fit coefficient \(\hbox {R}^2\) is given for the different cases

Appendix 3: on cumulative distribution functions (CDF)

In Informetrics, one prefers to fit empirical data to some size-frequency functional form using a maximum likelihood fit, rather than making a least squares fit for the rank-frequency distribution. Indeed, one can also ask, as did Pareto (1896), how many times one can find an “event” greater than some size \(y\), i.e. study the size-frequency relationship. Pareto found out that the cumulative distribution function (CDF) of such events follows an inverse power of \(y\), or in other words, \(P\;[Y>y] \sim y^{-\kappa }\). Thus, the (number or) frequency \(f\) of such events of size \(y\), (also) follows an inverse power of \(y\).
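The step from the CDF to the frequency, and its link to the rank-size exponent, can be sketched as follows. The sketch assumes a pure power law tail and a continuous size variable; the constant \(C\) and the total number of events \(N\) are notation introduced here for illustration only.

```latex
\begin{align}
P[Y>y] &= C\, y^{-\kappa}
\;\Longrightarrow\;
f(y) = -\frac{d}{dy} P[Y>y] = \kappa\, C\, y^{-(\kappa+1)}, \\
r(y) &\simeq N\, P[Y>y] = N C\, y^{-\kappa}
\;\Longrightarrow\;
y(r) \simeq \left(\frac{N C}{r}\right)^{1/\kappa} \propto r^{-1/\kappa}.
\end{align}
```

In this idealized case the exponent of a rank-size power law \(J \propto 1/r^{\alpha }\) would be \(\alpha = 1/\kappa \), while the frequency would decay with exponent \(\kappa + 1\).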

Thus, for illustration, ZMP and power law fits have been made on two of MRA's major sub-fields, i.e. \(a_2\) and \(a_7\). A log-log scale display of the number of joint publications (NJP) with co-authors ranked by decreasing importance and the corresponding CDF are shown in Figs. 11 and 12. Both the power law and ZMP law fits are shown over the whole \(r\) range. Note that the NJP data and fits are those seen in Fig. 7, with numerical values in Table 1.

The “queen effect” is well seen on the NJP data and fits in Fig. 11, but not so much on the CDF. The “king effect” is well seen on the NJP data and fits in Fig. 12, but the CDF shows a pronounced cut-off at high \(r\). Therefore, it would seem that the CDF is less suited to observing fine effects. This is understandable, since the CDF results from an integration scheme. However, again understandably, the CDF fits are much more stable.
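The cumulative nature of the CDF can be made concrete with a minimal sketch. It computes the empirical \(P\;[Y>y]\) from a list of NJP values; the function name `empirical_ccdf` and the sample values are hypothetical and are not taken from the paper's data.

```python
from collections import Counter

def empirical_ccdf(sizes):
    """Return sorted (y, P[Y > y]) pairs for a list of event sizes."""
    n = len(sizes)
    counts = Counter(sizes)
    remaining = n
    ccdf = []
    for y in sorted(counts):
        remaining -= counts[y]          # events strictly larger than y
        ccdf.append((y, remaining / n))
    return ccdf

# Hypothetical NJP values for the co-authors of one author (illustration only).
njp = [12, 7, 7, 4, 3, 3, 2, 1, 1, 1]
ccdf = empirical_ccdf(njp)
for y, p in ccdf:
    print(f"P[Y > {y}] = {p:.1f}")
```

Because each value of the CDF sums over all larger sizes, a deviation carried by a single low-rank co-author is diluted, which is consistent with the CDF being more stable to fit but less sensitive to king or queen effects.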

About this article

Cite this article

Ausloos, M. Zipf–Mandelbrot–Pareto model for co-authorship popularity. Scientometrics 101, 1565–1586 (2014). https://doi.org/10.1007/s11192-014-1302-y

Received: 03 November 2013
Published: 06 May 2014
Issue Date: December 2014
DOI: https://doi.org/10.1007/s11192-014-1302-y
