Contextualization of topics: browsing through the universe of bibliographic information

Scientometrics

Abstract

This paper describes how semantic indexing can help to generate a contextual overview of topics and to visually compare clusters of articles. The method was originally developed for an innovative information exploration tool called Ariadne, which operates on bibliographic databases with tens of millions of records (Koopman et al. in Proceedings of the 33rd Annual ACM Conference Extended Abstracts on Human Factors in Computing Systems. doi:10.1145/2702613.2732781, 2015b). In this paper, the method behind Ariadne is further developed and applied to the research question of the special issue “Same data, different results”: how to better understand topic (re-)construction by different bibliometric approaches. For the Astro dataset of 111,616 articles in astronomy and astrophysics, a new instantiation of the interactive exploration tool, LittleAriadne, has been created. This paper contributes to the overall challenge of delineating and defining topics in two ways. First, we produce two clustering solutions based on vector representations of articles in a lexical space. These vectors are built on semantic indexing of the entities associated with those articles. Second, we discuss how LittleAriadne can be used to browse through the network of topical terms, authors, journals, citations and various cluster solutions of the Astro dataset. More specifically, we treat the assignment of an article to the different clustering solutions as an additional element of its bibliographic record. By applying the principle of semantic indexing to this extended list of entities in the bibliographic record, LittleAriadne in turn provides a visualization of the context of a specific clustering solution. It also conveys the similarity of article clusters produced by different algorithms, and thus complements other possible means of comparison.
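The idea of representing an article through the entities of its bibliographic record, with cluster assignments treated as just another entity, can be illustrated with a small sketch. This is not the authors' implementation: the random sign vectors below merely stand in for a real semantic index, and all entity names, cluster labels, function names and the dimension 256 are hypothetical choices made for illustration.

```python
import numpy as np


def entity_vectors(entities, dim=256, seed=0):
    """Give each entity a random +/-1 vector (assumed stand-in for a semantic index)."""
    rng = np.random.default_rng(seed)
    return {e: rng.choice([-1.0, 1.0], size=dim) for e in sorted(set(entities))}


def article_vector(record, vectors):
    """Sum the vectors of all entities in a record and scale to unit length."""
    total = np.sum([vectors[e] for e in record], axis=0)
    norm = np.linalg.norm(total)
    return total / norm if norm > 0 else total


def cosine(u, v):
    """Cosine similarity; the inputs are already unit length, so a dot product suffices."""
    return float(np.dot(u, v))


# Hypothetical records: subjects, journals, authors, plus a cluster
# assignment added as an extra entity of the bibliographic record.
records = {
    "a": ["subject:gamma ray", "journal:apj", "author:young", "cluster:c 1"],
    "b": ["subject:gamma ray", "journal:apj", "author:smith", "cluster:c 1"],
    "c": ["subject:hubble diagram", "journal:mnras", "author:jones", "cluster:c 2"],
}

vectors = entity_vectors(e for record in records.values() for e in record)
articles = {name: article_vector(record, vectors) for name, record in records.items()}

sim_ab = cosine(articles["a"], articles["b"])  # three of four entities shared
sim_ac = cosine(articles["a"], articles["c"])  # no entities shared
```

In this toy setting, articles that share most of their entities end up with clearly more similar vectors than articles that share none, which is the property a clustering step or a visual comparison of clusters would build on.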

Notes

  1. For details of the data collection and cleaning process leading to the commonly used Astro dataset, see Velden et al. (2017).

  2. http://astrothesaurus.org/.

  3. http://www.dataharmony.com/services-view/mai/.

  4. More efficient random projections are available. This version is more conservative and also computationally easier.

  5. Available at http://thoth.pica.nl/astro/relate?input=gamma+ray.

  6. Available at http://thoth.pica.nl/astro/relate?input=[cluster:ok%2021].

  7. More details about cluster labelling can be found in Koopman and Wang (2017).

  8. Available at http://thoth.pica.nl/astro/relate?input=%5Bsubject%3Ahubble+diagram%5D&type=2.

  9. Available at http://thoth.pica.nl/relate?input=young.

  10. Available at http://thoth.pica.nl/astro/relate?input=young.

  11. This scan option applies to any other type of entity. For example, to see all subjects that start with “quantum”, use “subject:quantum” as the search term and then scan.

  12. Available at http://thoth.pica.nl/astro/relate?input=%5Bcluster%3Ac%5D%5Bcluster%3Aok%5D&type=S&show=500.

  13. Available at http://thoth.pica.nl/astro/relate?input=%5Bcluster%3Au%5D%5Bcluster%3Asr%5D&type=S&show=500.

  14. Available at http://thoth.pica.nl/astro/relate?input=%5Bcluster%3Ac%5D%5Bcluster%3Au%5D%5Bcluster%3Aok%5D%5Bcluster%3Aol%5D%5Bcluster%3Aeb%5D%5Bcluster%3Aen%5D%5Bcluster%3Asr%5D%5Bcluster%3Ahd%5D&type=S&show=500.

  15. Available at http://thoth.pica.nl/astro/relate?input=%5Bcluster%3Ac%5D%5Bcluster%3Au%5D%5Bcluster%3Aok%5D%5Bcluster%3Aol%5D%5Bcluster%3Aeb%5D%5Bcluster%3Aen%5D&type=S&show=500.
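The query links in the notes above follow a visible pattern: bracketed `kind:value` entities are concatenated into an `input` parameter and percent-encoded. The sketch below reproduces that pattern with Python's standard library. The helper name `relate_url` is hypothetical, the meaning of the `type` and `show` parameters is not documented here and is simply copied from the links, and the service itself may no longer be reachable.

```python
from urllib.parse import urlencode

# Base address as it appears in the notes; availability is not guaranteed.
BASE = "http://thoth.pica.nl/astro/relate"


def relate_url(entities, **params):
    """Build a query link from (kind, value) pairs, e.g. ("cluster", "ok")."""
    # Each entity is written as [kind:value]; several entities are concatenated.
    query = {"input": "".join(f"[{kind}:{value}]" for kind, value in entities)}
    # Extra parameters (such as type or show) are appended in the order given.
    query.update(params)
    # urlencode percent-encodes brackets and colons and turns spaces into "+".
    return f"{BASE}?{urlencode(query)}"


# Mirrors the form of the link in note 8.
hubble = relate_url([("subject", "hubble diagram")], type=2)

# Mirrors the form of the link in note 12.
two_solutions = relate_url([("cluster", "c"), ("cluster", "ok")], type="S", show=500)
```

Note 6 writes its brackets unencoded and uses `%20` for the space, so this helper yields an equivalent but not character-identical link for that case.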

References

  • Achlioptas, D. (2003). Database-friendly random projections: Johnson-Lindenstrauss with binary coins. Journal of Computer and System Sciences, 66(4), 671–687. doi:10.1016/S0022-0000(03)00025-4. http://www.sciencedirect.com/science/article/pii/S0022000003000254.

  • Bingham, E., & Mannila, H. (2001). Random projection in dimensionality reduction: Applications to image and text data. In: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’01, (pp. 245–250). ACM, New York. doi:10.1145/502512.502546. http://doi.acm.org/10.1145/502512.502546

  • Blondel, V. D., Guillaume, J. L., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment. P10008(12pp)

  • Börner, K. (2011). Plug-and-play macroscopes. Communications of the ACM, 54(3), 60–69.

  • Boyack, K., & Klavans, R. (2010). Weaving the fabric of science. In K. Börner & E. F. Hardy (Eds.), 6th Iteration (2009): Science Maps for Scholars, Places and Spaces: Mapping Science. http://scimaps.org/.

  • Boyack, K. W. (2017a). Investigating the effect of global data on topic detection. In J. Gläser, A. Scharnhorst, & W. Glänzel (Eds.), Same data—Different results? Towards a comparative approach to the identification of thematic structures in science. Special Issue of Scientometrics.

  • Boyack, K. W. (2017b). Thesaurus-based methods for mapping contents of publication sets. In J. Gläser, A. Scharnhorst, & W. Glänzel (Eds.), Same data—Different results? Towards a comparative approach to the identification of thematic structures in science. Special Issue of Scientometrics.

  • de Solla Price, D. J. (1965). Networks of scientific papers. Science, 149(3683), 510–515. doi:10.1126/science.149.3683.510. http://www.sciencemag.org/content/149/3683/510.short.

  • Galison, P. (1997). Image and logic: A material culture of microphysics. Chicago: University of Chicago Press.

  • Glänzel, W., & Schubert, A. (2004). Analysing scientific networks through co-authorship. In H. F. Moed, W. Glänzel, & U. Schmoch (Eds.), Handbook of quantitative science and technology research (pp. 257–276). Berlin: Springer. doi:10.1007/1-4020-2755-9_12.

  • Glänzel, W., & Thijs, B. (2017). Using hybrid methods and ‘core documents’ for the representation of clusters and topics. The astronomy dataset. In J. Gläser, A. Scharnhorst, & W. Glänzel (Eds.), Same data—Different results? Towards a comparative approach to the identification of thematic structures in science. Special Issue of Scientometrics.

  • Gläser, J., Glänzel, W., & Scharnhorst, A. (2017). Introduction to the special issue “Same data, different results?”. In J. Gläser, A. Scharnhorst, & W. Glänzel (Eds.), Same data—Different results? Towards a comparative approach to the identification of thematic structures in science. Special Issue of Scientometrics.

  • Havemann, F., Gläser, J., & Heinz, M. (2017). Memetic search for overlapping topics. In J. Gläser, A. Scharnhorst, & W. Glänzel (Eds.), Same data—Different results? Towards a comparative approach to the identification of thematic structures in science. Special Issue of Scientometrics.

  • Havemann, F., & Scharnhorst, A. (2012). Bibliometric networks. CoRR arXiv:1212.5211.

  • Janssens, F., Zhang, L., Moor, B. D., & Glänzel, W. (2009). Hybrid clustering for validation and improvement of subject-classification schemes. Information Processing and Management, 45(6), 683–702. doi:10.1016/j.ipm.2009.06.003. http://www.sciencedirect.com/science/article/pii/S0306457309000673.

  • Johnson, W., & Lindenstrauss, J. (1984). Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics, 26, 189–206.

  • Koopman, R., & Wang, S. (2017). Mutual information based labelling and comparing clusters. In J. Gläser, A. Scharnhorst, & W. Glänzel (Eds.), Same data—Different results? Towards a comparative approach to the identification of thematic structures in science. Special Issue of Scientometrics.

  • Koopman, R., Wang, S., & Scharnhorst, A. (2015a). Contextualization of topics—Browsing through terms, authors, journals and cluster allocations. In A. A. Salah, Y. Tonta, A. A. A. Salah, C. R. Sugimoto, & U. Al (Eds.), Proceedings of ISSI 2015 Istanbul: 15th International Society of Scientometrics and Informetrics Conference, Istanbul, Turkey, 29 June to 3 July, 2015.

  • Koopman, R., Wang, S., Scharnhorst, A., & Englebienne, G. (2015b). Ariadne’s thread: Interactive navigation in a world of networked information. In B. Begole, J. Kim, K. Inkpen, & W. Woo (Eds.), Proceedings of the 33rd Annual ACM Conference Extended Abstracts on Human Factors in Computing Systems, Seoul, CHI 2015 Extended Abstracts, Republic of Korea, April 18–23, 2015, (pp. 1833–1838). ACM. doi:10.1145/2702613.2732781. http://doi.acm.org/10.1145/2702613.2732781.

  • Kouw, M., Heuvel, C. V. D., & Scharnhorst, A. (2013). Exploring uncertainty in knowledge representations: Classifications, simulations, and models of the world. In P. Wouters, A. Beaulieu, A. Scharnhorst, & S. Wyatt (Eds.), Virtual knowledge. Experimenting in the humanities and the social sciences (pp. 89–126). Cambridge: MIT Press.

  • Leydesdorff, L., & Welbers, K. (2011). The semantic mapping of words and co-words in contexts. Journal of Informetrics, 5(3), 469–475. doi:10.1016/j.joi.2011.01.008.

  • Lu, K., & Wolfram, D. (2012). Measuring author research relatedness: A comparison of word-based, topic-based, and author cocitation approaches. Journal of the American Society for Information Science and Technology, 63(10), 1973–1986. doi:10.1002/asi.22628.

  • Mahalanobis, P. C. (1936). On the generalised distance in statistics. Proceedings National Institute of Science, India, 2(1), 49–55.

  • Mali, F., Kronegger, L., Doreian, P., & Ferligoj, A. (2012). Dynamic scientific co-authorship networks. In A. Scharnhorst, K. Börner & P. van den Besselaar (Eds.), Models of Science Dynamics, Understanding Complex Systems (pp. 195–232). Springer, Berlin. doi:10.1007/978-3-642-23068-4_6.

  • Mayr, P., & Scharnhorst, A. (2015). Scientometrics and information retrieval: weak-links revitalized. Scientometrics, 102(3), 2193–2199. doi:10.1007/s11192-014-1484-3.

  • Mutschke, P., & Mayr, P. (2014). Science models for search: A study on combining scholarly information retrieval and scientometrics. Scientometrics 1–23. doi:10.1007/s11192-014-1485-2.

  • Papadimitriou, C. H., Raghavan, P., Tamaki, H., & Vempala, S. (2000). Latent semantic indexing: A probabilistic analysis. Journal of Computer and System Sciences, 61(2), 217–235. doi:10.1006/jcss.2000.1711. http://www.sciencedirect.com/science/article/pii/S0022000000917112.

  • Petersen, A. (2006). Simulating nature: A philosophical study of computer-simulation uncertainties and their role in climate science and policy advice. Apeldoorn: Het Spinhuis.

  • Radicchi, F., Fortunato, S., & Vespignani, A. (2012). Citation networks. In A. Scharnhorst, K. Börner, & P. Besselaar (Eds.), Models of Science Dynamics, Understanding Complex Systems, vol. 69, chap. 7, (pp. 233–257). Springer, Berlin. doi:10.1007/978-3-642-23068-4_7.

  • Salton, G., & McGill, M. J. (1986). Introduction to modern information retrieval. New York: McGraw-Hill Inc.

  • Van Eck, N. J., & Waltman, L. (2017). Citation-based clustering of publications. In J. Gläser, A. Scharnhorst, & W. Glänzel (Eds.), Same data—Different results? Towards a comparative approach to the identification of thematic structures in science. Special Issue of Scientometrics.

  • Van Heur, B., Leydesdorff, L., & Wyatt, S. (2013). Turning to ontology in STS? Turning to STS through “ontology”. Social Studies of Science, 43(3), 341–362. doi:10.1177/030631271245814.

  • Velden, T., Boyack, K., van Eck, N., Glänzel, W., Gläser, J., & Havemann, F., et al. (2017). Comparison of topic extraction approaches and their results. In J. Gläser, A. Scharnhorst, & W. Glänzel (Eds.), Same data—Different results? Towards a comparative approach to the identification of thematic structures in science. Special Issue of Scientometrics.

  • Velden, T., Yan, S., & Lagoze, C. (2017). Mapping the cognitive structure of astrophysics by infomap. In J. Gläser, A. Scharnhorst, & W. Glänzel (Eds.), Same data—Different results? Towards a comparative approach to the identification of thematic structures in science. Special Issue of Scientometrics.

  • Vinh, N. X., Epps, J., & Bailey, J. (2010). Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance. Journal of Machine Learning Research, 11, 2837–2854.

  • Wang, S., & Koopman, R. (2017). Clustering articles based on semantic similarity. In J. Gläser, A. Scharnhorst, & W. Glänzel (Eds.), Same data—Different results? (pp. 234–556). Towards a comparative approach to the identification of thematic structures in science. Special Issue of Scientometrics.

  • Zitt, M., & Bassecoulard, E. (2006). Delineating complex scientific fields by an hybrid lexical-citation method: An application to nanosciences. Information Processing and Management, 42(6), 1513–1531. doi:10.1016/j.ipm.2006.03.016. http://www.sciencedirect.com/science/article/pii/S0306457306000379. Special Issue on Informetrics.

  • Zitt, M., Lelu, A., & Bassecoulard, E. (2011). Hybrid citation-word representations in science mapping: Portolan charts of research fields? Journal of the American Society for Information Science and Technology, 62, 19–39.

Acknowledgements

Part of this work has been funded by the COST Action TD1210 Knowescape, and the FP7 Project ImpactEV. We would like to thank the internal reviewers Frank Havemann, Bart Thijs as well as the anonymous external referees for their valuable comments and suggestions. We would also like to thank Jochen Gläser, William Harvey and Jean Godby for comments on the text.

Author information

Corresponding author

Correspondence to Rob Koopman.

About this article

Cite this article

Koopman, R., Wang, S. & Scharnhorst, A. Contextualization of topics: browsing through the universe of bibliographic information. Scientometrics 111, 1119–1139 (2017). https://doi.org/10.1007/s11192-017-2303-4

  • DOI: https://doi.org/10.1007/s11192-017-2303-4
