Discovering cross-topic collaborations among researchers by exploiting weighted association rules

Cagliero, Luca; Garza, Paolo; Kavoosifar, Mohammad Reza; Baralis, Elena

doi:10.1007/s11192-018-2737-3


Published: 11 April 2018

Scientometrics, Volume 116, pages 1273–1301 (2018)

Luca Cagliero (ORCID: orcid.org/0000-0002-7185-5247), Paolo Garza, Mohammad Reza Kavoosifar & Elena Baralis


Abstract

Identifying the most relevant scientific publications on a given topic is a well-known research problem. The Author-Topic Model (ATM) is a generative model that represents the relationships between research topics and publication authors. It can be used to identify the most influential authors on a particular topic. However, since most research works are co-authored by multiple researchers, the information provided by ATM can be complemented by the study of the most fruitful collaborations among multiple authors. This paper addresses the discovery of research collaborations among multiple authors on single or multiple topics. Specifically, it exploits an exploratory data mining technique, i.e., weighted association rule mining, to analyze publication data and to discover correlations between ATM topics and combinations of authors. The mined rules characterize groups of researchers with fairly high scientific productivity by indicating (1) the research topics covered by their most cited publications and the relevance of their scientific production separately for each topic, (2) the nature of the collaboration (topic-specific or cross-topic), (3) the names of the external authors who have (occasionally) collaborated with the group either on a specific topic or on multiple topics, and (4) the underlying correlations between the addressed topics. The applicability of the proposed approach was validated on real data acquired from the Online Mendelian Inheritance in Man (OMIM) catalog of genetic disorders and from the PubMed digital library. The results confirm the effectiveness of the proposed strategy.
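To make the rule-mining idea more concrete, here is a minimal Python sketch under several stated assumptions: each publication is treated as a transaction whose items are its authors plus its ATM topic labels (the `topic:` prefix is a hypothetical convention), each transaction carries a single weight such as a citation count, weighted support is the share of the total weight carried by the transactions that contain an itemset, and weighted confidence is the ratio between the weighted support of a rule and that of its antecedent. The function names, the toy data, the brute-force search, and the rule used to label a group as cross-topic are all illustrative assumptions, not the authors' algorithm or weighting scheme.

```python
from collections import defaultdict
from itertools import combinations

TOPIC_PREFIX = "topic:"  # hypothetical convention separating topic items from author items


def weighted_support(itemset, transactions):
    """Fraction of the total transaction weight carried by transactions containing itemset."""
    total = sum(w for _, w in transactions)
    covered = sum(w for items, w in transactions if itemset <= items)
    return covered / total if total else 0.0


def mine_author_topic_rules(transactions, min_wsup, min_wconf, max_authors=2):
    """Brute-force search for weighted rules of the form {authors} -> topic."""
    all_items = set().union(*(items for items, _ in transactions))
    authors = sorted(i for i in all_items if not i.startswith(TOPIC_PREFIX))
    topics = sorted(i for i in all_items if i.startswith(TOPIC_PREFIX))
    rules = []
    for k in range(1, max_authors + 1):
        for group in combinations(authors, k):
            antecedent = frozenset(group)
            ante_sup = weighted_support(antecedent, transactions)
            # With non-negative weights a rule's support never exceeds its
            # antecedent's, so weak groups are skipped (this also avoids /0).
            if ante_sup == 0 or ante_sup < min_wsup:
                continue
            for topic in topics:
                rule_sup = weighted_support(antecedent | {topic}, transactions)
                conf = rule_sup / ante_sup
                if rule_sup >= min_wsup and conf >= min_wconf:
                    rules.append((antecedent, topic, rule_sup, conf))
    return rules


def collaboration_nature(rules):
    """Label each author group 'cross-topic' if its rules cover two or more topics."""
    topics_by_group = defaultdict(set)
    for group, topic, _, _ in rules:
        topics_by_group[group].add(topic)
    return {group: "cross-topic" if len(ts) > 1 else "topic-specific"
            for group, ts in topics_by_group.items()}


# Toy publications as (items, weight); authors, topics and weights are invented.
publications = [
    (frozenset({"Author A", "Author B", "topic:genetics"}), 8.0),
    (frozenset({"Author A", "Author B", "topic:oncology"}), 6.0),
    (frozenset({"Author A", "Author C", "topic:genetics"}), 2.0),
    (frozenset({"Author B", "topic:oncology"}), 4.0),
]

rules = mine_author_topic_rules(publications, min_wsup=0.2, min_wconf=0.4)
for group, topic, wsup, wconf in rules:
    print(sorted(group), "->", topic, f"wsup={wsup:.2f} wconf={wconf:.2f}")
print(collaboration_nature(rules))
```

In this toy run, the pair of Author A and Author B is labelled cross-topic because it passes both thresholds for two topics, whereas Author A alone passes them only for genetics. Exhaustive enumeration is viable only for tiny inputs; a realistic miner would rely on a frequent-pattern algorithm such as Apriori or FP-growth to prune the search space.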


Similar content being viewed by others

Literature reviews as independent studies: guidelines for academic practice (Article, Open access, 14 October 2022)

How to design bibliometric research: an overview and a framework proposal (Article, Open access, 06 March 2024)

Artificial intelligence to automate the systematic review of scientific literature (Article, Open access, 11 May 2023)


Author information

Authors and Affiliations

Dipartimento di Automatica e Informatica, Politecnico di Torino, Torino, Italy
Luca Cagliero, Paolo Garza, Mohammad Reza Kavoosifar & Elena Baralis


Corresponding author

Correspondence to Luca Cagliero.


About this article

Cite this article

Cagliero, L., Garza, P., Kavoosifar, M.R. et al. Discovering cross-topic collaborations among researchers by exploiting weighted association rules. Scientometrics 116, 1273–1301 (2018). https://doi.org/10.1007/s11192-018-2737-3


Received: 02 October 2017
Published: 11 April 2018
Issue Date: August 2018
DOI: https://doi.org/10.1007/s11192-018-2737-3
