Automatic Hierarchical Categorization of Research Expertise Using Minimum Information

de Siqueira, Gustavo Oliveira; Canuto, Sérgio; Gonçalves, Marcos André; Laender, Alberto H. F.

doi:10.1007/978-3-319-67008-9_9

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10450)

Included in the following conference series:

International Conference on Theory and Practice of Digital Libraries

Abstract

Throughout the history of science, different knowledge areas have collaborated to overcome major research challenges. Associating a researcher with such areas makes a series of tasks feasible, such as the organization of digital repositories, expertise recommendation, and the formation of research groups for complex problems. In this paper, we propose a simple yet effective automatic classification model capable of categorizing research expertise according to a hierarchical knowledge area classification scheme. Our proposal relies on discriminative evidence provided by the titles of academic works, which are the minimum information capable of relating a researcher to his or her knowledge area. We also evaluate the use of learning-to-rank as an effective means to rank experts with minimum information. Our experiments show that, using supervised machine learning methods trained with manually labeled information, it is possible to produce effective classification and ranking models.
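
As an illustration only, the idea of categorizing expertise from titles alone under a hierarchical scheme can be sketched as a top-down classifier: one model chooses the top-level area, then a second model, trained only on titles from that area, chooses the sub-area. The abstract does not specify the learning algorithm, so the multinomial Naive Bayes below is a stand-in chosen because it needs only the standard library. The class names, the two-level hierarchy, the area labels, and the toy titles are all hypothetical and do not come from the paper.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over title tokens with add-one smoothing."""

    def fit(self, titles, labels):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.label_counts = Counter(labels)      # label -> number of titles
        self.vocab = set()
        for title, label in zip(titles, labels):
            tokens = title.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, title):
        tokens = title.lower().split()
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.label_counts):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log prior plus smoothed log likelihood of each token
            score = math.log(self.label_counts[label] / total)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


class TopDownClassifier:
    """One classifier at the root plus one classifier per top-level area."""

    def fit(self, titles, paths):
        # Each path is a (top_level_area, sub_area) pair.
        self.root = NaiveBayes().fit(titles, [p[0] for p in paths])
        by_parent = defaultdict(lambda: ([], []))
        for title, (parent, child) in zip(titles, paths):
            by_parent[parent][0].append(title)
            by_parent[parent][1].append(child)
        self.children = {
            parent: NaiveBayes().fit(ts, cs)
            for parent, (ts, cs) in by_parent.items()
        }
        return self

    def predict(self, title):
        parent = self.root.predict(title)
        return parent, self.children[parent].predict(title)


# Hypothetical toy training data: titles labeled with a two-level area path.
TRAIN = [
    ("learning to rank for web search", ("Exact Sciences", "Computer Science")),
    ("text classification with support vector machines", ("Exact Sciences", "Computer Science")),
    ("quantum entanglement in photon pairs", ("Exact Sciences", "Physics")),
    ("dark matter halo simulations", ("Exact Sciences", "Physics")),
    ("gene expression in tumor cells", ("Life Sciences", "Genetics")),
    ("dna sequencing of bacterial genomes", ("Life Sciences", "Genetics")),
    ("bird migration patterns in wetlands", ("Life Sciences", "Ecology")),
    ("forest ecosystem biodiversity loss", ("Life Sciences", "Ecology")),
]

model = TopDownClassifier().fit([t for t, _ in TRAIN], [p for _, p in TRAIN])
print(model.predict("ranking experts with text classification"))
print(model.predict("genomes of tumor cells"))
```

A researcher's expertise could then be estimated by aggregating the predicted paths over all of his or her titles, for example by majority vote. That aggregation step is also an assumption here rather than something stated in the abstract.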

Notes

1. http://dblp.uni-trier.de
2. http://dl.acm.org
3. http://www.ndltd.org
4. http://lattes.cnpq.br
5. In this paper we use the terms categorization and classification interchangeably.
6. http://bit.ly/1JM2j1k
7. http://scikit-learn.org
8. https://www.csie.ntu.edu.tw/~cjli/liblinear

Acknowledgements

This work was partially funded by projects InWeb (grant MCT/CNPq 573871/2008-6) and MASWeb (grant FAPEMIG/PRONEX APQ-01400-14), and by the authors’ individual grants from CAPES, CNPq and FAPEMIG.

Author information

Authors and Affiliations

Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil
Gustavo Oliveira de Siqueira, Sérgio Canuto, Marcos André Gonçalves & Alberto H. F. Laender

Corresponding author

Correspondence to Alberto H. F. Laender.

Editor information

Editors and Affiliations

Faculteit der Geesteswetenschappen, Universiteit van Amsterdam, Amsterdam, The Netherlands
Jaap Kamps
Library & Information Center, University of Patras, Patras, Greece
Giannis Tsakonas
Aristotle University of Thessaloniki, Thessaloniki, Greece
Yannis Manolopoulos
Civil Engineering, University of Thrace, Kimmeria, Greece
Lazaros Iliadis
Informatics, Ionian University, Kerkyra, Greece
Ioannis Karydis

About this paper

Cite this paper

de Siqueira, G.O., Canuto, S., Gonçalves, M.A., Laender, A.H.F. (2017). Automatic Hierarchical Categorization of Research Expertise Using Minimum Information. In: Kamps, J., Tsakonas, G., Manolopoulos, Y., Iliadis, L., Karydis, I. (eds) Research and Advanced Technology for Digital Libraries. TPDL 2017. Lecture Notes in Computer Science, vol 10450. Springer, Cham. https://doi.org/10.1007/978-3-319-67008-9_9

DOI: https://doi.org/10.1007/978-3-319-67008-9_9
Published: 02 September 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67007-2
Online ISBN: 978-3-319-67008-9
eBook Packages: Computer Science, Computer Science (R0)
