Lexical analysis of scientific publications for nano-level scientometrics

Scientometrics

Abstract

In earlier studies (e.g. Glänzel and Thijs in Scientometrics, 2017) we used components of text analysis in combination with link-based techniques to cluster document spaces and to detect emerging research topics on a large scale. Taking up the objectives of evaluative scientometrics, we now attempt to link the textual analysis of small sets of individual scientific papers to evaluative bibliometrics. The objective is, however, quite similar: we focus on detecting similarities and on monitoring structural changes, but this time on a small scale. We proceed from earlier approaches from quantitative linguistics applied to bibliometrics (Telcs et al. in Math Soc Sci 10(2):169–178, 1985). In the present pilot study we selected 18 papers by András Schubert, published in three different periods with 6 papers each: 1983–1985, 1993–1998 and 2010–2013. The objective is twofold. First, we try to detect linguistic regularities in the scientometric text by applying a Waring model to the analysis of Schubert’s vocabulary on the basis of all words and of nouns. Second, we aim to identify changes in the vocabulary used over a period of three decades. The main findings are discussed along with future research tasks that arise from these results in the context of analysing the dynamics and emergence of research topics at the micro and nano level.
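To make the first goal more concrete, the following is a minimal sketch of the two ingredients such an analysis needs: a word-frequency spectrum and a Waring probability function. It is not the authors' procedure. The function names `frequency_spectrum` and `waring_pmf` are hypothetical, and the parametrization P(X = k) = α Γ(N+α) Γ(N+k) / (Γ(N) Γ(N+α+k+1)) for k = 0, 1, 2, ... is one common form of the Waring distribution, assumed here for illustration; the preview does not state which form or fitting method the study uses.

```python
import math
from collections import Counter


def frequency_spectrum(tokens):
    """Map each occurrence count f to the number of distinct words seen exactly f times."""
    word_counts = Counter(tokens)                 # word -> number of occurrences
    return dict(Counter(word_counts.values()))    # occurrences -> number of words


def waring_pmf(k, alpha, n):
    """P(X = k), k = 0, 1, 2, ..., for a Waring distribution (assumed parametrization).

    Computed on the log scale with lgamma to avoid overflow for large k.
    Requires alpha > 0 and n > 0.
    """
    log_p = (
        math.log(alpha)
        + math.lgamma(n + alpha)
        + math.lgamma(n + k)
        - math.lgamma(n)
        - math.lgamma(n + alpha + k + 1)
    )
    return math.exp(log_p)


if __name__ == "__main__":
    tokens = "the cat and the dog and the bird".split()
    print(frequency_spectrum(tokens))
    print(waring_pmf(0, alpha=2.0, n=1.0))
```

Because every word in a vocabulary occurs at least once, comparing an observed spectrum with this function would require a shifted or truncated variant; that choice is left open here.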

Notes

  1. Several authors (e.g., Wildgaard, Cronin) have used the term nano-bibliometrics for the evaluation of individual authors. Since we still consider this micro level, we use nano for a smaller scale.

  2. Schubert interpreted the role of actors as that of atoms, the directly interconnected complex structures as molecules.

References

  • Braun, T., Schubert, A., & Schubert, G. (2016). On the molecular structure of the co-author network of Alexandru T. Balaban. Revue Roumaine de Chimie 61 (4–5), 231–238.

  • Gelbukh, A., & Sidorov, G. (2001). Zipf and Heaps Laws’ coefficients depend on language. In A. Gelbukh (Ed.), Computational linguistics and intelligent text processing (pp. 332–335). LNCS 2004. Berlin: Springer.

  • Glänzel, W., & Thijs, B. (2017). Using hybrid methods and ‘core documents’ for the representation of clusters and topics. The astronomy dataset. Scientometrics. doi:10.1007/s11192-017-2301-6.

  • Glänzel, W., Thijs, B., & Debackere, K. (2014). The application of citation-based performance classes to the disciplinary and multidisciplinary assessment in national comparison and institutional research assessment. Scientometrics, 101(2), 939–952.

  • Kelih, E., & Grzybek, P. (2004). Häufigkeiten von Satzlängen. Zum Faktor der Intervallgröße als Einflussvariable (am Beispiel slowenischer Texte). Glottometrics, 8, 23–41.

  • Kelih, E., Grzybek, P., Antic, G., & Stadlober, E. (2006). Quantitative text typology: The impact of sentence length. In M. Spiliopoulou, R. Kruse, C. Borgelt, A. Nürnberger, & W. Gaul (Eds.), From data and information analysis to knowledge engineering. Proceedings of the 29th annual conference of the Gesellschaft für Klassifikation e.V., University of Magdeburg (pp. 382–389). Berlin: Springer.

  • Kim, S. N., Medelyan, O., Kan, M.-Y., & Baldwin, T. (2013). Automatic keyphrase extraction from scientific articles. Language Resources & Evaluation, 47(3), 723–742.

  • Klein, D., & Manning, C. D. (2003). Accurate unlexicalized parsing. Proceedings of the 41st annual meeting of the association for computational linguistics (pp. 423–430).

  • Kornai, A. (2002). How many words are there? Glottometrics, 4, 61–86.

  • Mullins, N., Snizek, W., & Oehler, K. (1988). The structural analysis of a scientific paper. In A. F. J. van Raan (Ed.), Handbook of quantitative studies of science and technology (pp. 81–105). Amsterdam: Elsevier.

  • Piantadosi, S. T. (2014). Zipf’s word frequency law in natural language: A critical review and future directions. Psychonomic Bulletin & Review, 21(5), 1112–1130.

  • Porter, M. F. (1980). An algorithm for suffix stripping. Program, 14(3), 130–137.

  • Schubert, A. (2013). Atoms and molecules of scientometrics. Authors, publications, references, citations and the bonds among them. Toulouse, 7 November, 2013.

  • Shah, P. K., Perez-Iratxeta, C., Bork, P., & Andrade, M. A. (2003). Methodology article. Information extraction from full text scientific articles: Where are the keywords? BMC Bioinformatics, 4(20), 1471–2105.

  • Sichel, H. S. (1974). On a distribution representing sentence-length in written prose. Journal of the Royal Statistical Society. Series A, 137(1), 25–34.

  • Telcs, A., Glänzel, W., & Schubert, A. (1985). Characterization and statistical test using truncated expectations for a class of skew distributions. Mathematical Social Sciences, 10(2), 169–178.


Author information

Corresponding author

Correspondence to Wolfgang Glänzel.

Appendix


The 18 papers by András Schubert analysed in our study.

  (1) Articles published in 1983–1985

    [1] Schubert, A., Zsindely, S., Telcs, A., Braun, T., Quantitative-analysis of a visible tip of the peer-review iceberg – Book Reviews in chemistry. Scientometrics, 6 (6), 1984, 433–443. DOI: 10.1007/BF02025830

    [2] Glänzel, W., Schubert, A., Price distribution. An exact formulation of Price’s Square Root Law. Scientometrics, 7 (3-6), 1985, 211–219. DOI: 10.1007/BF02017147

    [3] Schubert, A., Zsindely, S., Braun, T., Scientometric analysis of attendance at international scientific meetings. Scientometrics, 5 (3), 1983, 177–187. DOI: 10.1007/BF02095627

    [4] Schubert, A., Glänzel, W., Statistical reliability of comparisons based on the citation impact of scientific publications. Scientometrics, 5 (1), 1983, 59–74. DOI: 10.1007/BF02097178

    [5] Schubert, A., Zsindely, S., Braun, T., Scientometric indicators for evaluating medical-research output of mid-size countries. Scientometrics, 7 (3-6), 1985, 155–163. DOI: 10.1007/BF02017143

    [6] Schubert, A., Glänzel, W., A dynamic look at a class of skew distributions - a model with scientometric applications. Scientometrics, 6 (3), 1984, 149–167. DOI: 10.1007/BF02016759

  (2) Articles published in 1993–1998

    [7] Schubert, A., The profile of the Chemical Engineering Journal and Biochemical Engineering Journal as reflected in its publications, references and citations, 1983–1996. Chemical Engineering Journal, 69 (3), 1998, 151–156. DOI: 10.1016/S1385-8947(98)00074-6

    [8] Braun, T., Schubert, A., Zsindely, S., Nanoscience and nanotechnology on the balance. Scientometrics, 38 (2), 1997, 321–325. DOI: 10.1007/BF02457417

    [9] Schubert, A., Maczelka, H., Cognitive changes in scientometrics during the 1980s, as reflected by the reference patterns of its core journal. Social Studies of Science, 23 (3), 1993, 571–581. DOI: 10.1177/0306312793023003007

    [10] Schubert, A., Little Scientometrics, Big Scientometrics - and beyond. Scientometrics, 30 (2-3), 1994, 411–413. DOI: 10.1007/BF02018114

    [11] Schubert, A., Braun, T., Cross-field normalization of scientometric indicators. Scientometrics, 36 (3), 1996, 311–324. DOI: 10.1007/BF02129597

    [12] Schubert, A., Braun, T., Reference-standards for citation based assessments. Scientometrics, 26 (1), 1993, 21–35. DOI: 10.1007/BF02016790

  (3) Articles published in 2010–2013

    [13] Schubert, A., Jazz discometrics – A network approach. Journal of Informetrics, 6 (4), 2012, 48–484. DOI: 10.1016/j.joi.2012.04.004

    [14] Schubert, A., A Hirsch-type index of co-author partnership ability. Scientometrics, 91 (1), 2012, 303–308. DOI: 10.1007/s11192-011-0559-7

    [15] Schubert, A., X-centage: a Hirsch-inspired indicator for distributions of percentage-valued variables and its use for measuring heterodisciplinarity. Scientometrics, 102 (1), 2015, 307–332. DOI: 10.1007/s11192-014-1281-z

    [16] Schubert, A., A reference-based Hirschian similarity measure for journals. Scientometrics, 84 (1), 2010, 133–147. DOI: 10.1007/s11192-009-0072-4

    [17] Schubert, A., Soos, S., Mapping of science journals based on h-similarity. Scientometrics, 83 (2), 2010, 589–600. DOI: 10.1007/s11192-010-0167-y

    [18] Schubert, A., Measuring the similarity between the reference and citation distributions of journals. Scientometrics, 96 (1), 2013, 305–313. DOI: 10.1007/s11192-012-0889-0

Cite this article

Glänzel, W., Heeffer, S. & Thijs, B. Lexical analysis of scientific publications for nano-level scientometrics. Scientometrics 111, 1897–1906 (2017). https://doi.org/10.1007/s11192-017-2336-8

  • DOI: https://doi.org/10.1007/s11192-017-2336-8
