The productivity of top researchers: a semi-nonparametric approach

Cortés, Lina M.; Mora-Valencia, Andrés; Perote, Javier

doi:10.1007/s11192-016-2072-5

Published: 23 July 2016

Scientometrics, Volume 109, pages 891–915 (2016)

Lina M. Cortés¹,
Andrés Mora-Valencia² &
Javier Perote³

Abstract

Research productivity distributions exhibit heavy tails because it is common for a few researchers to accumulate the majority of the top publications and their corresponding citations. Measurements of this productivity are very sensitive to the field being analyzed and the distribution used. In particular, distributions such as the lognormal distribution seem to systematically underestimate the productivity of the top researchers. In this article, we propose the use of a (log)semi-nonparametric distribution (log-SNP) that nests the lognormal and captures the heavy tail of the productivity distribution through the introduction of new parameters linked to high-order moments. The application uses scientific production data on 140,971 researchers who have produced 253,634 publications in 18 fields of knowledge (O’Boyle and Aguinis in Pers Psychol 65(1):79–119, 2012) and publications in the field of finance of 330 academic institutions (Borokhovich et al. in J Finance 50(5):1691–1717, 1995), and shows that the log-SNP distribution outperforms the lognormal and provides more accurate measures for the high quantiles of the productivity distribution.

Notes

Different weight functions w(x) can be used; for details, see Abramowitz and Stegun (1972, pp. 774–775). We will consider P₀(x) = 1.
For more details about the Edgeworth and Gram–Charlier series, see Kendall and Stuart (1977, pp. 167–172).
It must be noted that, given a truncation order, the resulting distribution is purely parametric, but the truncation order is flexible to achieve a more accurate approximation to a given distribution. Without loss of generality, we will assume that d₀ = 1.
Log-SNP’s moments can be directly derived as $E\left[ {z^{t} } \right] = e^{{\mu t + \frac{1}{2}t^{2} \sigma^{2} }} \left[ {1 + \sum\nolimits_{s = 1}^{n} {d_{s} \left( {\sigma t} \right)^{s} } } \right]$ (see Ñíguez et al. 2013).
It should be noted that the differing sizes of journals across the JCR categories are a shortcoming of the selection procedure. Nevertheless, it is not clear that any other arbitrary selection method would yield better results and, in any case, this issue does not affect the advantages of the methodology proposed in this paper.
For details about the data treatment, see O’Boyle and Aguinis (2012), p. 86.
We took the JCR of the year 2007 to be consistent with O’Boyle and Aguinis (2012), as that was the year used by the authors to select the five main journals within each field of knowledge.
The R code implementing the maximum likelihood estimation algorithm is available upon request.
Note that we did not include the d_s parameters for odd s, after having tested that they were not significantly different from zero. This result reinforces the fact that the parameter σ captures all relevant features of the skewness. It must be highlighted that this does not contradict the fact that the d_s parameters for even s are highly significant, which means that productivity distributions have very thick tails and thus require different parameters to provide accurate measures of the “probability of being a very top researcher” in every field.
The quantiles of the log-SNP distribution are obtained from the cdf displayed in Eq. (15) and the Inverse Transform Method (ITM).
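
The moment formula quoted in the notes above can be evaluated directly. The following is a minimal sketch, not the authors' R implementation; the function name `log_snp_moment` and the convention that `d` holds (d_1, ..., d_n) are assumptions made for illustration.

```python
import math


def log_snp_moment(t, mu, sigma, d):
    """t-th raw moment of a log-SNP variable.

    Implements E[z^t] = exp(mu*t + 0.5*t^2*sigma^2) * (1 + sum_s d_s*(sigma*t)^s),
    where d is the sequence (d_1, ..., d_n). With d empty, this reduces to the
    lognormal moment.
    """
    lognormal_part = math.exp(mu * t + 0.5 * t**2 * sigma**2)
    correction = 1.0 + sum(
        d_s * (sigma * t) ** s for s, d_s in enumerate(d, start=1)
    )
    return lognormal_part * correction


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to show the call.
    mean_lognormal = log_snp_moment(1, mu=0.0, sigma=1.0, d=[])
    mean_log_snp = log_snp_moment(1, mu=0.0, sigma=1.0, d=[0.0, 0.0, 0.0, 0.1])
    print(mean_lognormal, mean_log_snp)
```

With every d_s set to zero the correction factor is 1, which illustrates how the log-SNP nests the lognormal.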

References

Abramo, G., & D’Angelo, C. A. (2014). Assessing national strengths and weaknesses in research fields. Journal of Informetrics, 8(3), 766–775.
Abramo, G., D’Angelo, C. A., & Pugini, F. (2008). The measurement of Italian universities’ research productivity by a non parametric-bibliometric methodology. Scientometrics, 76(2), 225–244.
Abramowitz, M., & Stegun, I. A. (1972). Handbook of mathematical functions with formulas, graphs, and mathematical tables. New York: Dover Publications.
Aguinis, H., O’Boyle, E., Gonzalez-Mulé, E., & Joo, H. (2015). Cumulative advantage: Conductors and insulators of heavy-tailed productivity distributions and productivity stars. Personnel Psychology. doi:10.1111/peps.12095.
Albarrán, P., Crespo, J. A., Ortuño, I., & Ruiz-Castillo, J. (2011). The skewness of science in 219 sub-fields and a number of aggregates. Scientometrics, 88(2), 385–397.
Bertocchi, G., Gambardella, A., Jappelli, T., Nappi, C. A., & Peracchi, F. (2015). Bibliometric evaluation vs. informed peer review: Evidence from Italy. Research Policy, 44(2), 451–466.
Birkmaier, D., & Wohlrabe, K. (2014). The Matthew effect in economics reconsidered. Journal of Informetrics, 8(4), 880–889.
Blinnikov, S., & Moessner, R. (1998). Expansions for nearly Gaussian distributions. Astronomy and Astrophysics, Supplement Series, 130(1), 193–205.
Bornmann, L. (2011). Scientific peer review. Annual Review of Information Science and Technology, 45(1), 199–245.
Borokhovich, K. A., Bricker, R. J., Brunarski, K. R., & Simkins, B. J. (1995). Finance research productivity and influence. The Journal of Finance, 50(5), 1691–1717.
Broadus, R. N. (1987). Toward a definition of ‘bibliometrics’. Scientometrics, 12(5–6), 373–379.
Campanario, J. M. (2015). Providing impact: The distribution of JCR journals according to references they contribute to the 2-year and 5-year journal impact factors. Journal of Informetrics, 9(2), 398–407.
Chen, X. (2007). Large sample sieve estimation of semi-nonparametric models. In J. Heckman & E. Leamer (Eds.), Handbook of econometrics, Ch. 76, Part B (Vol. 6, pp. 5549–5632). Amsterdam: Elsevier.
Chung, K. H., & Cox, R. A. (1990). Patterns of productivity in the finance literature: A study of the bibliometric distributions. The Journal of Finance, 45(1), 301–309.
Coupé, T. (2003). Revealed performances. Worldwide rankings of economists and economics departments. Journal of the European Economic Association, 1(6), 1309–1345.
Cramér, H. (1925). On some classes of series used in mathematical statistics. In Sixth Scandinavian Congress of Mathematicians (pp. 399–425). Copenhagen.
Crespo, J. A., Ortuño-Ortín, I., & Ruiz-Castillo, J. (2012). The citation merit of scientific publications. PLoS ONE, 7(11), e49156.
Da Silva, R., Kalil, F., De Oliveira, J. M., & Martinez, A. S. (2012). Universality in bibliometrics. Physica A: Statistical Mechanics and its Applications, 391(5), 2119–2128.
Day, T. E. (2015). The big consequences of small biases: A simulation of peer review. Research Policy, 44(6), 1266–1270.
Del Brio, E. B., & Perote, J. (2012). Gram–Charlier densities: Maximum likelihood versus the method of moments. Insurance: Mathematics and Economics, 51(3), 531–537.
Duch, J., Zeng, X. T., Sales-Pardo, M., Radicchi, F., Otis, S., Woodruff, T. K., et al. (2012). The possible role of resource requirements and academic career-choice risk on gender differences in publication rate and impact. PLoS ONE, 7(12), e51332.
Dundar, H., & Lewis, D. (1998). Determinants of research productivity in higher education. Research in Higher Education, 39(6), 607–631.
Egghe, L. (2005). Power laws in the information production process: Lotkaian informetrics. Kidlington: Elsevier Academic Press.
Ellison, G. (2013). How does the market use citation data? The Hirsch index in economics. American Economic Journal: Applied Economics, 5(3), 63–90.
Eom, Y. H., & Fortunato, S. (2011). Characterizing and modeling citation dynamics. PLoS ONE, 6(9), e24926.
Finardi, U. (2013). Correlation between journal impact factor and citation performance: An experimental study. Journal of Informetrics, 7(2), 357–370.
Frandsen, T. F. (2005). Geographical concentration. The case of economics journals. Scientometrics, 63(1), 69–85.
Gallant, A. R., & Nychka, D. W. (1987). Seminonparametric maximum likelihood estimation. Econometrica, 55(2), 363–390.
Garfield, E. (1980). Bradford’s Law and related statistical pattern. Essays of an Information Scientist, 4(19), 476–483.
Genest, C. (1997). Statistics on statistics: Measuring research productivity by journal publications between 1985 and 1995. The Canadian Journal of Statistics, 25(4), 427–443.
Guerrero-Bote, V. P., Zapico-Alonso, F., Espinosa-Calvo, M. E., Gomez-Crisostomo, R., & Moya-Anegon, F. (2007). Import–export of knowledge between scientific subject categories: The iceberg hypothesis. Scientometrics, 71(3), 423–441.
Harzing, A. (2008). Publish or Perish: A citation analysis software program. http://www.harzing.com/resources.htm.
Harzing, A. W. (2014). A longitudinal study of Google Scholar coverage between 2012 and 2013. Scientometrics, 98(1), 565–575.
Harzing, A. W., & Alakangas, S. (2016). Google Scholar, Scopus and the Web of Science: A longitudinal and cross-disciplinary comparison. Scientometrics, 106(2), 787–804.
Harzing, A. W., & Van der Wal, R. (2008). Google Scholar as a new source for citation analysis? Ethics in Science and Environmental Politics, 8(1), 61–73.
Heberger, A. E., Christie, C. A., & Alkin, M. C. (2010). A bibliometric analysis of the academic influences of and on evaluation theorists’ published works. American Journal of Evaluation, 31(1), 24–44.
Hodgson, G. M., & Rothman, H. (1999). The editors and authors of economics journals: A case of institutional oligopoly? The Economic Journal, 109(453), 165–186.
Kaur, J., Ferrara, E., Menczer, F., Flammini, A., & Radicchi, F. (2015). Quality versus quantity in scientific impact. Journal of Informetrics, 9(4), 800–808.
Kaur, J., Radicchi, F., & Menczer, F. (2013). Universality of scholarly impact metrics. Journal of Informetrics, 7(4), 924–932.
Kendall, M., & Stuart, A. (1977). The advanced theory of statistics, vol. I (4th ed.). London: C. Griffin.
Kocher, M. G., Luptacik, M., & Sutter, M. (2006). Measuring productivity of research in economics: A cross-country study using DEA. Socio-Economic Planning Sciences, 40(4), 314–332.
Kretschmer, H., & Kretschmer, T. (2007). Lotka’s distribution and distribution of co-author pairs’ frequencies. Journal of Informetrics, 1(4), 308–337.
Kumar, S., Sharma, P., & Garg, K. C. (1998). Lotka’s law and institutional productivity. Information Processing and Management, 34(6), 775–783.
Lancho-Barrantes, B. S., Guerrero-Bote, V. P., & Moya-Anegón, F. (2010). The iceberg hypothesis revisited. Scientometrics, 85(2), 443–461.
Lotka, A. J. (1926). The frequency distribution of scientific productivity. Journal of the Washington Academy of Science, 16(12), 317–323.
Martínez-Mekler, G., Martínez, R. A., del Río, M. B., Mansilla, R., Miramontes, P., & Cocho, G. (2009). Universality of rank-ordering distributions in the arts and sciences. PLoS ONE, 4(3), e4791.
Mauleón, I., & Perote, J. (2000). Testing densities with financial data: an empirical comparison of the Edgeworth–Sargan density to the Student’s t. European Journal of Finance, 6(2), 225–239.
Mingers, J., & Leydesdorff, L. (2015). A review of theory and practice in scientometrics. European Journal of Operational Research, 246(1), 1–19.
Momeni, F., & Mayr, P. (2016). Evaluating co-authorship networks in author name disambiguation for common names. arXiv:1606.03857.
Newman, M. E. J. (2005). Power laws, Pareto distributions and Zipf’s law. Contemporary Physics, 46(5), 323–351.
Nicholls, P. T. (1986). Empirical validation of Lotka’s law. Information Processing and Management, 22(5), 417–419.
Nicholls, P. T. (1989). Bibliometric modelling processes and the empirical validity of Lotka’s law. Journal of the American Society for Information Science, 40(6), 379–385.
Nicolaisen, J., & Hjørland, B. (2007). Practical potentials of Bradford’s law: A critical examination of the received view. Journal of Documentation, 63(3), 359–377.
Ñíguez, T.-M., Paya, I., Peel, D., & Perote, J. (2012). On the stability of the constant relative risk aversion (CRRA) utility under high degrees of uncertainty. Economics Letters, 115(2), 244–248.
Ñíguez, T.-M., Paya, I., Peel, D., & Perote, J. (2013). Higher-order moments in the theory of diversification and portfolio composition. Economics Working Paper Series 2013/003. Lancaster University.
O’Boyle, E., & Aguinis, H. (2012). The best and the rest: Revisiting the norm of normality of individual performance. Personnel Psychology, 65(1), 79–119.
Perc, M. (2010). Zipf’s law and log-normal distributions in measures of scientific output across fields and institutions: 40 years of Slovenia’s research as an example. Journal of Informetrics, 4(2), 358–364.
Phillips, P. C. B. (1977). A general theorem in the theory of asymptotic expansions as approximations to the finite sample distributions of econometric estimators. Econometrica, 45(6), 1517–1534.
Price, D. S. (1976). A general theory of bibliometric and other cumulative advantage processes. Journal of the American Society for Information Science, 27(5), 292–306.
Radicchi, F., Fortunato, S., & Castellano, C. (2008). Universality of citation distributions: Towards an objective measure of scientific impact. Proceedings of the National Academy of Sciences of the United States of America, 105(45), 17268–17272.
Redner, S. (1998). How popular is your paper? An empirical study of the citation distribution. The European Physical Journal B-Condensed Matter and Complex Systems, 4(2), 131–134.
Rousseau, R. (1994). Bradford curves. Information Processing and Management, 30(2), 267–277.
Ruiz-Castillo, J., & Costas, R. (2014). The skewness of scientific productivity. Journal of Informetrics, 8(4), 917–934.
Sabharwal, M. (2013). Comparing research productivity across disciplines and career stages. Journal of Comparative Policy Analysis: Research and Practice, 15(2), 141–163.
Sargan, D. (1975). Gram–Charlier approximations applied to t ratios of k-class estimators. Econometrica, 43(2), 327–346.
Seggie, S. H., & Griffith, D. A. (2009). What does it take to get promoted in marketing academia? Understanding exceptional publication productivity in the leading marketing journals. Journal of Marketing, 73(1), 122–132.
Van den Besselaar, P., & Sandström, U. (2016). What is the required level of data cleaning? A research evaluation case. Journal of Scientometric Research, 5(1), 7–12.
Wallace, D. L. (1958). Asymptotic approximations to distributions. Annals of Mathematical Statistics, 29(3), 635–654.
Williamson, I. O., & Cable, D. M. (2003). Predicting early career research productivity: The case of management faculty. Journal of Organizational Behavior, 24(1), 25–44.
Yang, K., & Meho, L. I. (2006). Citation analysis: A comparison of Google Scholar, Scopus, and Web of Science. Proceedings of the American Society for Information Science and Technology, 43(1), 1–15.

Acknowledgments

We thank Herman Aguinis and Ernest O’Boyle for allowing us to use their database on academic productivity compiled in O’Boyle and Aguinis (2012). We also thank two anonymous referees for their constructive and valuable suggestions. Financial support from the Spanish Ministry of Economics and Competitiveness, through the project ECO2013-44483-P, from FAPA-Uniandes, through the project PR.3.2016.2807, and from Universidad EAFIT is also gratefully acknowledged.

Author information

Authors and Affiliations

Department of Finance, School of Economics and Finance, Universidad EAFIT, Carrera 49 No 7 Sur-50, Medellin, Colombia
Lina M. Cortés
School of Management, Universidad de los Andes, Calle 21 No. 1-20, Bogota, Colombia
Andrés Mora-Valencia
Department of Economics and IME, University of Salamanca, Campus Miguel de Unamuno, 37007, Salamanca, Spain
Javier Perote

Corresponding author

Correspondence to Lina M. Cortés.

Appendices

Appendix 1

This appendix lists the first eight d_s parameters in terms of the central moments of the SNP distribution. For more information, see Del Brio and Perote (2012).

$$d_{1} = \mu_{1} \tag{18}$$

$$d_{2} = \frac{1}{2}\left( \mu_{2} - 1 \right) \tag{19}$$

$$d_{3} = \frac{1}{6}\left( \mu_{3} - 3\mu_{1} \right) \tag{20}$$

$$d_{4} = \frac{1}{24}\left( \mu_{4} - 6\mu_{2} + 3 \right) \tag{21}$$

$$d_{5} = \frac{1}{120}\left( \mu_{5} - 10\mu_{3} + 15\mu_{1} \right) \tag{22}$$

$$d_{6} = \frac{1}{720}\left( \mu_{6} - 15\mu_{4} + 45\mu_{2} - 15 \right) \tag{23}$$

$$d_{7} = \frac{1}{5040}\left( \mu_{7} - 21\mu_{5} + 105\mu_{3} - 105\mu_{1} \right) \tag{24}$$

$$d_{8} = \frac{1}{40320}\left( \mu_{8} - 28\mu_{6} + 210\mu_{4} - 420\mu_{2} + 105 \right) \tag{25}$$
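
Each expression in Eqs. (18)–(25) has the form d_s = E[H_s(x)]/s!, with H_s read as the probabilists' Hermite polynomial; this reading is inferred from the coefficients shown rather than stated in this appendix. Under that assumption, the sketch below rebuilds the d_s values from a list of moments. The names `hermite_coeffs` and `d_from_moments` are hypothetical helpers, and this is not the authors' code.

```python
import math


def hermite_coeffs(s):
    """Power-basis coefficients (lowest degree first) of the Hermite polynomial H_s."""
    prev, cur = [1.0], [0.0, 1.0]  # H_0(x) = 1, H_1(x) = x
    if s == 0:
        return prev
    for k in range(1, s):
        # Recurrence: H_{k+1}(x) = x * H_k(x) - k * H_{k-1}(x)
        nxt = [0.0] + cur
        for i, c in enumerate(prev):
            nxt[i] -= k * c
        prev, cur = cur, nxt
    return cur


def d_from_moments(moments):
    """Map moments [1, mu_1, ..., mu_n] to [d_1, ..., d_n] via d_s = E[H_s(x)] / s!."""
    n = len(moments) - 1
    d = []
    for s in range(1, n + 1):
        coeffs = hermite_coeffs(s)
        expected_h = sum(c * moments[k] for k, c in enumerate(coeffs))
        d.append(expected_h / math.factorial(s))
    return d


if __name__ == "__main__":
    # Moments of a standard normal up to order 8: every d_s should vanish.
    normal_moments = [1, 0, 1, 0, 3, 0, 15, 0, 105]
    print(d_from_moments(normal_moments))
```

Feeding in standard normal moments is a quick consistency check: all eight parameters come out as zero, matching the case in which the SNP collapses to the normal.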

Appendix 2

This appendix derives the cdf of the SNP distribution.

$$\begin{aligned} G_{x}\left( a \right) = & \int_{-\infty}^{a} g\left( x;\boldsymbol{d} \right)dx = \int_{-\infty}^{a} \phi\left( x \right)dx + \sum_{s = 1}^{n} d_{s} \int_{-\infty}^{a} H_{s}\left( x \right)\phi\left( x \right)dx \\ = & \int_{-\infty}^{a} \phi\left( x \right)dx - \left. \sum_{s = 1}^{n} d_{s} H_{s - 1}\left( x \right)\phi\left( x \right) \right|_{-\infty}^{a} \\ = & \int_{-\infty}^{a} \phi\left( x \right)dx - \phi\left( a \right)\sum_{s = 1}^{n} d_{s} H_{s - 1}\left( a \right) \\ \end{aligned}$$

The last equality holds because $\lim_{x \to \pm\infty} H_{s}\left( x \right)\phi\left( x \right) = 0 \quad \forall s \ge 0,$ and the second equality uses the antiderivative

$$\begin{aligned} \int H_{s}\left( x \right)\phi\left( x \right)dx = & \int \left( -1 \right)^{s} \frac{d^{s}\phi\left( x \right)}{dx^{s}}dx = \left( -1 \right)^{s} \frac{d^{s - 1}\phi\left( x \right)}{dx^{s - 1}} \\ = &\, \left( -1 \right)^{s}\left( -1 \right)^{s - 1} H_{s - 1}\left( x \right)\phi\left( x \right) = -H_{s - 1}\left( x \right)\phi\left( x \right) \\ \end{aligned}$$

□
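
The closed-form cdf derived above lends itself to numerical inversion. The following is a minimal sketch rather than the authors' implementation. It assumes that a log-SNP variable can be written as z = exp(μ + σx) with x following the SNP distribution, and that the chosen d_s keep the density nonnegative so that the cdf is monotone. Bisection stands in for whatever root-finder is used in practice, and all function names are hypothetical.

```python
import math


def hermite_values(a, n):
    """Return [H_0(a), ..., H_n(a)] using H_{k+1}(a) = a*H_k(a) - k*H_{k-1}(a)."""
    vals = [1.0, a]
    for k in range(1, n):
        vals.append(a * vals[k] - k * vals[k - 1])
    return vals[: n + 1]


def snp_cdf(a, d):
    """G(a) = Phi(a) - phi(a) * sum_{s=1}^{n} d_s * H_{s-1}(a), with d = (d_1, ..., d_n)."""
    phi = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)
    big_phi = 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))
    h = hermite_values(a, len(d))
    return big_phi - phi * sum(d_s * h[s - 1] for s, d_s in enumerate(d, start=1))


def snp_quantile(p, d, lo=-12.0, hi=12.0, tol=1e-10):
    """Invert the cdf by bisection (assumes the cdf is monotone on [lo, hi])."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if snp_cdf(mid, d) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def log_snp_quantile(p, mu, sigma, d):
    """Quantile of z = exp(mu + sigma * x), x ~ SNP (assumed parametrization)."""
    return math.exp(mu + sigma * snp_quantile(p, d))


if __name__ == "__main__":
    # Hypothetical parameters: a positive d_4 thickens the tails.
    print(snp_quantile(0.99, []), snp_quantile(0.99, [0.0, 0.0, 0.0, 0.05]))
```

In this illustration, a positive d_4 pushes the 99% quantile above its normal counterpart, in line with the claim that the lognormal understates the high quantiles.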

About this article

Cite this article

Cortés, L.M., Mora-Valencia, A. & Perote, J. The productivity of top researchers: a semi-nonparametric approach. Scientometrics 109, 891–915 (2016). https://doi.org/10.1007/s11192-016-2072-5

Received: 24 December 2015
Published: 23 July 2016
Issue Date: November 2016
DOI: https://doi.org/10.1007/s11192-016-2072-5
