The productivity of top researchers: a semi-nonparametric approach

Scientometrics

Abstract

Research productivity distributions exhibit heavy tails because it is common for a few researchers to accumulate the majority of the top publications and their corresponding citations. Measurements of this productivity are very sensitive to the field being analyzed and the distribution used. In particular, distributions such as the lognormal distribution seem to systematically underestimate the productivity of the top researchers. In this article, we propose the use of a (log)semi-nonparametric distribution (log-SNP) that nests the lognormal and captures the heavy tail of the productivity distribution through the introduction of new parameters linked to high-order moments. The application uses scientific production data on 140,971 researchers who have produced 253,634 publications in 18 fields of knowledge (O’Boyle and Aguinis in Pers Psychol 65(1):79–119, 2012) and publications in the field of finance of 330 academic institutions (Borokhovich et al. in J Finance 50(5):1691–1717, 1995), and shows that the log-SNP distribution outperforms the lognormal and provides more accurate measures for the high quantiles of the productivity distribution.

Notes

  1. Different weight functions \(w(x)\) can be used; for details, see Abramowitz and Stegun (1972, pp. 774–775). We will consider \(P_{0}(x) = 1\).

  2. For more details about the Edgeworth and Gram–Charlier series, see Kendall and Stuart (1977, pp. 167–172).

  3. Note that, for a given truncation order, the resulting distribution is purely parametric; the truncation order, however, can be chosen flexibly to approximate a given distribution more accurately. Without loss of generality, we will assume that \(d_{0} = 1\).

  4. Log-SNP’s moments can be directly derived as \(E\left[ {z^{t} } \right] = e^{{\mu t + \frac{1}{2}t^{2} \sigma^{2} }} \left[ {1 + \sum\nolimits_{s = 1}^{n} {d_{s} \left( {\sigma t} \right)^{s} } } \right]\) (see Ñíguez et al. 2013).

  5. It should be noted that the different sizes of the journals in the JCR categories represent a shortcoming of the selection procedure. Nevertheless, it is not clear that any other arbitrary selection method would yield better results and, in any case, this issue does not affect the advantages of the methodology proposed in this paper.

  6. For details about the data treatment, see O’Boyle and Aguinis (2012), p. 86.

  7. We took the JCR of the year 2007 to be consistent with O’Boyle and Aguinis (2012), as that was the year used by the authors to select the five main journals within each field of knowledge.

  8. The R code implementing the maximum likelihood estimation algorithm is available upon request.

  9. Note that we did not include the \(d_{s}\) parameters for odd s, having tested that they were not significantly different from zero. This result reinforces the fact that the parameter σ captures all relevant features of the skewness. This does not contradict the fact that the \(d_{s}\) parameters for even s are highly significant, which means that productivity distributions have very thick tails and thus require different parameters to provide accurate measures of the “probability of being a very top researcher” in every field.

  10. The quantiles of the log-SNP distribution are obtained from the cdf displayed in Eq. (15) and the Inverse Transform Method (ITM).
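
The quantile computation in note 10 can be sketched in Python. This is a minimal illustration, not the authors' R implementation. It assumes that a log-SNP variable is z = exp(μ + σx) with x following the SNP distribution (a reading consistent with the moment formula in note 4), and it uses the SNP cdf derived in Appendix 2 in place of Eq. (15), which is not reproduced here. The function names, the bisection bracket, and the parameter values in the usage note are hypothetical. Bisection also presumes that the density is non-negative, so that the cdf is monotone.

```python
import math


def hermite(n, x):
    """Probabilists' Hermite polynomial H_n(x) via H_{k+1} = x*H_k - k*H_{k-1}."""
    h_prev, h = 1.0, x
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, x * h - k * h_prev
    return h


def snp_cdf(a, d):
    """SNP cdf from Appendix 2: Phi(a) - phi(a) * sum_s d_s * H_{s-1}(a).

    d is a list [d_1, d_2, ..., d_n]; an empty list gives the standard normal cdf.
    """
    big_phi = 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))
    small_phi = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)
    correction = sum(d_s * hermite(s - 1, a) for s, d_s in enumerate(d, start=1))
    return big_phi - small_phi * correction


def log_snp_quantile(p, mu, sigma, d, lo=-12.0, hi=12.0, tol=1e-10):
    """Inverse transform: solve snp_cdf(x) = p by bisection, return exp(mu + sigma*x).

    Assumes the cdf is increasing on [lo, hi] (hypothetical bracket).
    """
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if snp_cdf(mid, d) < p:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return math.exp(mu + sigma * 0.5 * (lo + hi))
```

With an empty `d` the sketch returns lognormal quantiles. Calling `log_snp_quantile(0.99, 0.0, 1.0, [0.0, 0.0, 0.0, 0.05])` with a positive fourth-order parameter gives a larger 0.99 quantile than `log_snp_quantile(0.99, 0.0, 1.0, [])`, which is the heavy-tail effect described in the abstract.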

References

  • Abramo, G., & D’Angelo, C. A. (2014). Assessing national strengths and weaknesses in research fields. Journal of Informetrics, 8(3), 766–775.

  • Abramo, G., D’Angelo, C. A., & Pugini, F. (2008). The measurement of Italian universities’ research productivity by a non parametric-bibliometric methodology. Scientometrics, 76(2), 225–244.

  • Abramowitz, M., & Stegun, I. A. (1972). Handbook of mathematical functions with formulas, graphs, and mathematical tables. New York: Dover Publications.

  • Aguinis, H., O’Boyle, E., Gonzalez-Mulé, E., & Joo, H. (2015). Cumulative advantage: Conductors and insulators of heavy-tailed productivity distributions and productivity stars. Personnel Psychology. doi:10.1111/peps.12095.

  • Albarrán, P., Crespo, J. A., Ortuño, I., & Ruiz-Castillo, J. (2011). The skewness of science in 219 sub-fields and a number of aggregates. Scientometrics, 88(2), 385–397.

  • Bertocchi, G., Gambardella, A., Jappelli, T., Nappi, C. A., & Peracchi, F. (2015). Bibliometric evaluation vs. informed peer review: Evidence from Italy. Research Policy, 44(2), 451–466.

  • Birkmaier, D., & Wohlrabe, K. (2014). The Matthew effect in economics reconsidered. Journal of Informetrics, 8(4), 880–889.

  • Blinnikov, S., & Moessner, R. (1998). Expansions for nearly Gaussian distributions. Astronomy and Astrophysics, Supplement Series, 130(1), 193–205.

  • Bornmann, L. (2011). Scientific peer review. Annual Review of Information Science and Technology, 45(1), 199–245.

  • Borokhovich, K. A., Bricker, R. J., Brunarski, K. R., & Simkins, B. J. (1995). Finance research productivity and influence. The Journal of Finance, 50(5), 1691–1717.

  • Broadus, R. N. (1987). Toward a definition of ‘bibliometrics’. Scientometrics, 12(5–6), 373–379.

  • Campanario, J. M. (2015). Providing impact: The distribution of JCR journals according to references they contribute to the 2-year and 5-year journal impact factors. Journal of Informetrics, 9(2), 398–407.

  • Chen, X. (2007). Large sample sieve estimation of semi-nonparametric models. In J. Heckman & E. Leamer (Eds.), Handbook of econometrics, Ch. 76, Part B (Vol. 6, pp. 5549–5632). Amsterdam: Elsevier.

  • Chung, K. H., & Cox, R. A. (1990). Patterns of productivity in the finance literature: A study of the bibliometric distributions. The Journal of Finance, 45(1), 301–309.

  • Coupé, T. (2003). Revealed performances. Worldwide rankings of economists and economics departments. Journal of the European Economic Association, 1(6), 1309–1345.

  • Cramér, H. (1925). On some classes of series used in mathematical statistics. In Sixth Scandinavian congress of mathematicians (pp. 399–425). Copenhagen.

  • Crespo, J. A., Ortuño-Ortín, I., & Ruiz-Castillo, J. (2012). The citation merit of scientific publications. PLoS ONE, 7(11), e49156.

  • Da Silva, R., Kalil, F., De Oliveira, J. M., & Martinez, A. S. (2012). Universality in bibliometrics. Physica A: Statistical Mechanics and its Applications, 391(5), 2119–2128.

  • Day, T. E. (2015). The big consequences of small biases: A simulation of peer review. Research Policy, 44(6), 1266–1270.

  • Del Brio, E. B., & Perote, J. (2012). Gram–Charlier densities: Maximum likelihood versus the method of moments. Insurance: Mathematics and Economics, 51(3), 531–537.

  • Duch, J., Zeng, X. T., Sales-Pardo, M., Radicchi, F., Otis, S., Woodruff, T. K., et al. (2012). The possible role of resource requirements and academic career-choice risk on gender differences in publication rate and impact. PLoS ONE, 7(12), e51332.

  • Dundar, H., & Lewis, D. (1998). Determinants of research productivity in higher education. Research in Higher Education, 39(6), 607–631.

  • Egghe, L. (2005). Power laws in the information production process: Lotkaian informetrics. Kidlington: Elsevier Academic Press.

  • Ellison, G. (2013). How does the market use citation data? The Hirsch index in economics. American Economic Journal: Applied Economics, 5(3), 63–90.

  • Eom, Y. H., & Fortunato, S. (2011). Characterizing and modeling citation dynamics. PLoS ONE, 6(9), e24926.

  • Finardi, U. (2013). Correlation between journal impact factor and citation performance: An experimental study. Journal of Informetrics, 7(2), 357–370.

  • Frandsen, T. F. (2005). Geographical concentration. The case of economics journals. Scientometrics, 63(1), 69–85.

  • Gallant, A. R., & Nychka, D. W. (1987). Seminonparametric maximum likelihood estimation. Econometrica, 55(2), 363–390.

  • Garfield, E. (1980). Bradford’s law and related statistical patterns. Essays of an Information Scientist, 4(19), 476–483.

  • Genest, C. (1997). Statistics on statistics: Measuring research productivity by journal publications between 1985 and 1995. The Canadian Journal of Statistics, 25(4), 427–443.

  • Guerrero-Bote, V. P., Zapico-Alonso, F., Espinosa-Calvo, M. E., Gomez-Crisostomo, R., & Moya-Anegon, F. (2007). Import–export of knowledge between scientific subject categories: The iceberg hypothesis. Scientometrics, 71(3), 423–441.

  • Harzing, A. (2008). Publish or Perish: A citation analysis software program. http://www.harzing.com/resources.htm.

  • Harzing, A. W. (2014). A longitudinal study of Google Scholar coverage between 2012 and 2013. Scientometrics, 98(1), 565–575.

  • Harzing, A. W., & Alakangas, S. (2016). Google Scholar, Scopus and the Web of Science: A longitudinal and cross-disciplinary comparison. Scientometrics, 106(2), 787–804.

  • Harzing, A. W., & Van der Wal, R. (2008). Google Scholar as a new source for citation analysis? Ethics in Science and Environmental Politics, 8(1), 61–73.

  • Heberger, A. E., Christie, C. A., & Alkin, M. C. (2010). A bibliometric analysis of the academic influences of and on evaluation theorists’ published works. American Journal of Evaluation, 31(1), 24–44.

  • Hodgson, G. M., & Rothman, H. (1999). The editors and authors of economics journals: A case of institutional oligopoly? The Economic Journal, 109(453), 165–186.

  • Kaur, J., Ferrara, E., Menczer, F., Flammini, A., & Radicchi, F. (2015). Quality versus quantity in scientific impact. Journal of Informetrics, 9(4), 800–808.

  • Kaur, J., Radicchi, F., & Menczer, F. (2013). Universality of scholarly impact metrics. Journal of Informetrics, 7(4), 924–932.

  • Kendall, M., & Stuart, A. (1977). The advanced theory of statistics, vol. I (4th ed.). London: C. Griffin.

  • Kocher, M. G., Luptacik, M., & Sutter, M. (2006). Measuring productivity of research in economics: A cross-country study using DEA. Socio-Economic Planning Sciences, 40(4), 314–332.

  • Kretschmer, H., & Kretschmer, T. (2007). Lotka’s distribution and distribution of co-author pairs’ frequencies. Journal of Informetrics, 1(4), 308–337.

  • Kumar, S., Sharma, P., & Garg, K. C. (1998). Lotka’s law and institutional productivity. Information Processing and Management, 34(6), 775–783.

  • Lancho-Barrantes, B. S., Guerrero-Bote, V. P., & Moya-Anegón, F. (2010). The iceberg hypothesis revisited. Scientometrics, 85(2), 443–461.

  • Lotka, A. J. (1926). The frequency distribution of scientific productivity. Journal of the Washington Academy of Sciences, 16(12), 317–323.

  • Martínez-Mekler, G., Martínez, R. A., del Río, M. B., Mansilla, R., Miramontes, P., & Cocho, G. (2009). Universality of rank-ordering distributions in the arts and sciences. PLoS ONE, 4(3), e4791.

  • Mauleón, I., & Perote, J. (2000). Testing densities with financial data: an empirical comparison of the Edgeworth–Sargan density to the Student’s t. European Journal of Finance, 6(2), 225–239.

  • Mingers, J., & Leydesdorff, L. (2015). A review of theory and practice in scientometrics. European Journal of Operational Research, 246(1), 1–19.

  • Momeni, F., & Mayr, P. (2016). Evaluating co-authorship networks in author name disambiguation for common names. arXiv:1606.03857.

  • Newman, M. E. J. (2005). Power laws, Pareto distributions and Zipf’s law. Contemporary Physics, 46(5), 323–351.

  • Nicholls, P. T. (1986). Empirical validation of Lotka’s law. Information Processing and Management, 22(5), 417–419.

  • Nicholls, P. T. (1989). Bibliometric modelling processes and the empirical validity of Lotka’s law. Journal of the American Society for Information Science, 40(6), 379–385.

  • Nicolaisen, J., & Hjørland, B. (2007). Practical potentials of Bradford’s law: A critical examination of the received view. Journal of Documentation, 63(3), 359–377.

  • Ñíguez, T.-M., Paya, I., Peel, D., & Perote, J. (2012). On the stability of the constant relative risk aversion (CRRA) utility under high degrees of uncertainty. Economics Letters, 115(2), 244–248.

  • Ñíguez, T.-M., Paya, I., Peel, D., & Perote, J. (2013). Higher-order moments in the theory of diversification and portfolio composition. Economics Working Paper Series 2013/003. Lancaster University.

  • O’Boyle, E., & Aguinis, H. (2012). The best and the rest: Revisiting the norm of normality of individual performance. Personnel Psychology, 65(1), 79–119.

  • Perc, M. (2010). Zipf’s law and log-normal distributions in measures of scientific output across fields and institutions: 40 years of Slovenia’s research as an example. Journal of Informetrics, 4(2), 358–364.

  • Phillips, P. C. B. (1977). A general theorem in the theory of asymptotic expansions as approximations to the finite sample distributions of econometric estimators. Econometrica, 45(6), 1517–1534.

  • Price, D. S. (1976). A general theory of bibliometric and other cumulative advantage processes. Journal of the American Society for Information Science, 27(5), 292–306.

  • Radicchi, F., Fortunato, S., & Castellano, C. (2008). Universality of citation distribution: Towards an objective measure of scientific impact. Proceedings of the National Academy of Sciences of the United States of America, 105(45), 17268–17272.

  • Redner, S. (1998). How popular is your paper? An empirical study of the citation distribution. The European Physical Journal B-Condensed Matter and Complex Systems, 4(2), 131–134.

  • Rousseau, R. (1994). Bradford curves. Information Processing and Management, 30(2), 267–277.

  • Ruiz-Castillo, J., & Costas, R. (2014). The skewness of scientific productivity. Journal of Informetrics, 8(4), 917–934.

  • Sabharwal, M. (2013). Comparing research productivity across disciplines and career stages. Journal of Comparative Policy Analysis: Research and Practice, 15(2), 141–163.

  • Sargan, D. (1975). Gram–Charlier approximations applied to t ratios of k-class estimators. Econometrica, 43(2), 327–346.

  • Seggie, S. H., & Griffith, D. A. (2009). What does it take to get promoted in marketing academia? Understanding exceptional publication productivity in the leading marketing journals. Journal of Marketing, 73(1), 122–132.

  • Van den Besselaar, P., & Sandström, U. (2016). What is the required level of data cleaning? A research evaluation case. Journal of Scientometric Research, 5(1), 7–12.

  • Wallace, D. L. (1958). Asymptotic approximations to distributions. Annals of Mathematical Statistics, 29(3), 635–654.

  • Williamson, I. O., & Cable, D. M. (2003). Predicting early career research productivity: The case of management faculty. Journal of Organizational Behavior, 24(1), 25–44.

  • Yang, K., & Meho, L. I. (2006). Citation analysis: A comparison of Google Scholar, Scopus, and Web of Science. Proceedings of the American Society for Information Science and Technology, 43(1), 1–15.

Acknowledgments

We thank Herman Aguinis and Ernest O’Boyle for allowing us to use their database on academic productivity compiled in O’Boyle and Aguinis (2012). We also thank two anonymous referees for their constructive and valuable suggestions. Financial support from the Spanish Ministry of Economics and Competitiveness, through the project ECO2013-44483-P, FAPA-Uniandes, through the project PR.3.2016.2807, and Universidad EAFIT is also gratefully acknowledged.

Author information

Corresponding author

Correspondence to Lina M. Cortés.

Appendices

Appendix 1

This appendix lists the first eight \(d_{s}\) parameters in terms of the central moments of the SNP distribution. For more information, see Del Brio and Perote (2012).

$$d_{1} = \mu_{1}$$
(18)
$$d_{2} = \frac{1}{2}\left( {\mu_{2} - 1} \right)$$
(19)
$$d_{3} = \frac{1}{6}\left( {\mu_{3} - 3\mu_{1} } \right)$$
(20)
$$d_{4} = \frac{1}{24}\left( {\mu_{4} - 6\mu_{2} + 3} \right)$$
(21)
$$d_{5} = \frac{1}{120}\left( {\mu_{5} - 10\mu_{3} + 15\mu_{1} } \right)$$
(22)
$$d_{6} = \frac{1}{720}\left( {\mu_{6} - 15\mu_{4} + 45\mu_{2} - 15} \right)$$
(23)
$$d_{7} = \frac{1}{5040}\left( {\mu_{7} - 21\mu_{5} + 105\mu_{3} - 105\mu_{1} } \right)$$
(24)
$$d_{8} = \frac{1}{40320}\left( {\mu_{8} - 28\mu_{6} + 210\mu_{4} - 420\mu_{2} + 105} \right)$$
(25)
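
The pattern in Eqs. (18)–(25) can be reproduced programmatically. The sketch below assumes that \(\mu_{k}\) denotes \(E[x^{k}]\) for the standardized variable and that \(H_{s}\) is the probabilists' Hermite polynomial, so that \(d_{s} = E[H_{s}(x)]/s!\); both readings match the coefficients listed above. The function names are hypothetical.

```python
from math import factorial


def hermite_coeffs(s):
    """Return coefficients c[k] of x**k in the probabilists' Hermite polynomial H_s."""
    prev, cur = [1], [0, 1]  # H_0 = 1, H_1 = x
    if s == 0:
        return prev
    for n in range(1, s):
        nxt = [0] + cur  # multiply H_n by x
        for k, c in enumerate(prev):
            nxt[k] -= n * c  # subtract n * H_{n-1}
        prev, cur = cur, nxt
    return cur


def d_from_moments(mu, s):
    """Compute d_s = E[H_s(x)] / s! from moments mu[k] = E[x**k], with mu[0] = 1."""
    expectation = sum(c * mu[k] for k, c in enumerate(hermite_coeffs(s)))
    return expectation / factorial(s)


# Moments E[x**k], k = 0..8, of the standard normal: every d_s should vanish.
normal_moments = [1, 0, 1, 0, 3, 0, 15, 0, 105]
d_normal = [d_from_moments(normal_moments, s) for s in range(1, 9)]
```

For the standard normal moments every entry of `d_normal` is zero, which recovers the Gaussian as the nested case. Raising \(\mu_{4}\) from 3 to 4.2 while leaving \(\mu_{2} = 1\) gives, by Eq. (21), \(d_{4} = (4.2 - 6 + 3)/24 = 0.05\).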

Appendix 2

This appendix derives the cdf of the SNP distribution.

$$\begin{aligned} G_{x} \left( a \right) = & \int\limits_{ - \infty }^{a} {g\left( {x;\varvec{d}} \right)dx = \int\limits_{ - \infty }^{a} {\phi \left( x \right)dx} + \sum\limits_{s = 1}^{n} {d_{s} \int\limits_{ - \infty }^{a} {H_{s} \left( x \right)\phi \left( x \right)dx} } } \\ = & \int\limits_{ - \infty }^{a} {\phi \left( x \right)dx - \left. {\sum\limits_{s = 1}^{n} {d_{s} H_{s - 1} \left( x \right)\phi \left( x \right)} } \right|}_{ - \infty }^{a} \\ = & \int\limits_{ - \infty }^{a} {\phi \left( x \right)dx - \phi \left( a \right)\sum\limits_{s = 1}^{n} {d_{s} H_{s - 1} \left( a \right)} } \\ \end{aligned}$$

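The second equality uses the antiderivative of \(H_{s}\left( x \right)\phi \left( x \right)\) given below, and the boundary term at the lower limit vanishes because \(\mathop {\lim }\limits_{x \to - \infty } H_{s - 1} \left( x \right)\phi \left( x \right) = 0 \quad \forall s \ge 1\):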

$$\begin{aligned} \int {H_{s} \left( x \right)\phi \left( x \right)dx} = & \int {\left( { - 1} \right)^{s} \frac{{d^{s} \phi \left( x \right)}}{{dx^{s} }}dx} = \left( { - 1} \right)^{s} \frac{{d^{s - 1} \phi \left( x \right)}}{{dx^{s - 1} }} \\ = &\, \left( { - 1} \right)^{s} \left( { - 1} \right)^{s - 1} H_{s - 1} \left( x \right)\phi \left( x \right) = - H_{s - 1} \left( x \right)\phi \left( x \right) \\ \end{aligned}$$
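
As a worked instance of the cdf derived in this appendix, take \(n = 4\) with \(d_{1} = d_{2} = d_{3} = 0\), so that only \(d_{4}\) is nonzero (an illustrative choice, not a fitted model). Writing \(\Phi \left( a \right) = \int_{ - \infty }^{a} {\phi \left( x \right)dx}\) and using \(H_{3} \left( a \right) = a^{3} - 3a\), the cdf reduces to

$$G_{x} \left( a \right) = \Phi \left( a \right) - d_{4} \left( {a^{3} - 3a} \right)\phi \left( a \right)$$

For \(d_{4} > 0\) and \(a > \sqrt 3\) the correction term is negative, so \(G_{x} \left( a \right) < \Phi \left( a \right)\): more probability mass lies above \(a\) than under the Gaussian.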

About this article

Cite this article

Cortés, L.M., Mora-Valencia, A. & Perote, J. The productivity of top researchers: a semi-nonparametric approach. Scientometrics 109, 891–915 (2016). https://doi.org/10.1007/s11192-016-2072-5

  • DOI: https://doi.org/10.1007/s11192-016-2072-5
