Discovering cross-topic collaborations among researchers by exploiting weighted association rules

Scientometrics

Abstract

Identifying the most relevant scientific publications on a given topic is a well-known research problem. The Author-Topic Model (ATM) is a generative model that represents the relationships between research topics and publication authors, and it allows us to identify the most influential authors on a particular topic. However, since most research works are co-authored by several researchers, the information provided by ATM can be complemented by studying the most fruitful collaborations among multiple authors. This paper addresses the discovery of research collaborations among multiple authors on single or multiple topics. Specifically, it exploits an exploratory data mining technique, weighted association rule mining, to analyze publication data and to discover correlations between ATM topics and combinations of authors. The mined rules characterize groups of researchers with fairly high scientific productivity by indicating (1) the research topics covered by their most cited publications and the relevance of their scientific production for each topic separately, (2) the nature of the collaboration (topic-specific or cross-topic), (3) the names of the external authors who have (occasionally) collaborated with the group, either on a specific topic or on multiple topics, and (4) the underlying correlations between the addressed topics. The applicability of the proposed approach was validated on real data acquired from the Online Mendelian Inheritance in Man (OMIM) catalog of genetic disorders and from the PubMed digital library. The results confirm the effectiveness of the proposed strategy.
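To make the idea of a weighted association rule between a group of authors and a topic more concrete, the following is a minimal Python sketch. It is not the authors' implementation: the toy publications, the item naming scheme (`author:`/`topic:` prefixes), the use of a single per-publication weight (for example, a citation count), and the helper names `weighted_support` and `rule_confidence` are all assumptions made for illustration. The abstract does not state how weights are assigned, so this shows only one common formulation of weighted support and confidence.

```python
# Hypothetical toy data: each publication is a set of items (authors and
# topics) plus a weight. Treating the weight as a citation count is an
# assumption, not something the abstract specifies.
publications = [
    ({"author:A", "author:B", "topic:T1"}, 10.0),
    ({"author:A", "author:B", "topic:T1", "topic:T2"}, 30.0),
    ({"author:A", "author:C", "topic:T2"}, 5.0),
    ({"author:B", "topic:T1"}, 5.0),
]


def weighted_support(itemset, transactions):
    """Share of the total weight carried by transactions containing itemset."""
    total = sum(weight for _, weight in transactions)
    if total == 0:
        return 0.0
    covered = sum(weight for items, weight in transactions if itemset <= items)
    return covered / total


def rule_confidence(antecedent, consequent, transactions):
    """Weighted confidence of the rule antecedent -> consequent."""
    base = weighted_support(antecedent, transactions)
    if base == 0:
        return 0.0
    return weighted_support(antecedent | consequent, transactions) / base


group = frozenset({"author:A", "author:B"})

# Topic-specific rules: {A, B} -> T1 and {A, B} -> T2.
conf_t1 = rule_confidence(group, {"topic:T1"}, publications)
conf_t2 = rule_confidence(group, {"topic:T2"}, publications)

# Cross-topic rule: {A, B} -> {T1, T2}.
conf_cross = rule_confidence(group, {"topic:T1", "topic:T2"}, publications)

print(f"support(A,B)      = {weighted_support(group, publications):.2f}")
print(f"conf(A,B -> T1)   = {conf_t1:.2f}")
print(f"conf(A,B -> T2)   = {conf_t2:.2f}")
print(f"conf(A,B -> T1,T2) = {conf_cross:.2f}")
```

In this toy setting, a rule whose consequent contains a single topic would describe a topic-specific collaboration, while a consequent with several topics would describe a cross-topic one. A real miner would enumerate candidate itemsets under minimum-support and minimum-confidence thresholds rather than evaluate hand-picked rules.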

Author information

Corresponding author

Correspondence to Luca Cagliero.

About this article

Cite this article

Cagliero, L., Garza, P., Kavoosifar, M.R. et al. Discovering cross-topic collaborations among researchers by exploiting weighted association rules. Scientometrics 116, 1273–1301 (2018). https://doi.org/10.1007/s11192-018-2737-3

  • DOI: https://doi.org/10.1007/s11192-018-2737-3
