Automatic Hierarchical Categorization of Research Expertise Using Minimum Information

  • Conference paper
Research and Advanced Technology for Digital Libraries (TPDL 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10450)

Abstract

Throughout the history of science, different knowledge areas have collaborated to overcome major research challenges. Associating a researcher with such areas enables a series of tasks, such as the organization of digital repositories, expertise recommendation, and the formation of research groups for complex problems. In this paper, we propose a simple yet effective automatic classification model capable of categorizing research expertise according to a hierarchical knowledge area classification scheme. Our proposal relies on discriminative evidence provided by the titles of academic works, which are the minimum information capable of relating researchers to their knowledge areas. We also evaluate the use of learning-to-rank as an effective means to rank experts with minimum information. Our experiments show that, using supervised machine learning methods trained with manually labeled information, it is possible to produce effective classification and ranking models.
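As an illustration of the general idea only, the following minimal sketch classifies a title top-down through a two-level hierarchy: one classifier picks the top-level area, then a classifier trained only on that area's titles picks the sub-area. It is not the model proposed in the paper. The choice of multinomial Naive Bayes, the toy titles in `EXAMPLES`, the area names, and the helpers `tokenize`, `train_hierarchy` and `classify` are all assumptions made for this example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(title):
    """Lowercase a title and split it into alphabetic words."""
    return re.findall(r"[a-z]+", title.lower())


class NaiveBayes:
    """Multinomial Naive Bayes over title words, with add-one smoothing."""

    def fit(self, titles, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for title, label in zip(titles, labels):
            tokens = tokenize(title)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, title):
        tokens = tokenize(title)
        total = sum(self.class_counts.values())

        def log_score(label):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.class_counts[label] / total)
            for token in tokens:
                score += math.log((counts[token] + 1) / denom)
            return score

        return max(self.class_counts, key=log_score)


def train_hierarchy(examples):
    """Train one top-level classifier plus one classifier per top-level area."""
    top = NaiveBayes().fit([t for t, _, _ in examples],
                           [a for _, a, _ in examples])
    children = {}
    for area in {a for _, a, _ in examples}:
        subset = [(t, s) for t, a, s in examples if a == area]
        children[area] = NaiveBayes().fit([t for t, _ in subset],
                                          [s for _, s in subset])
    return top, children


def classify(title, top, children):
    """Predict the area first, then the sub-area within that area."""
    area = top.predict(title)
    return area, children[area].predict(title)


# Hypothetical toy training data: (title, area, sub-area).
EXAMPLES = [
    ("learning to rank for expert search", "Computer Science", "Information Retrieval"),
    ("query expansion for web search engines", "Computer Science", "Information Retrieval"),
    ("index structures for relational databases", "Computer Science", "Databases"),
    ("transaction processing in distributed databases", "Computer Science", "Databases"),
    ("gene expression in tumor cells", "Biology", "Genetics"),
    ("dna sequencing of bacterial genomes", "Biology", "Genetics"),
    ("forest ecosystems and species diversity", "Biology", "Ecology"),
    ("population dynamics of river fish species", "Biology", "Ecology"),
]

top, children = train_hierarchy(EXAMPLES)
print(classify("learning to rank experts for web search", top, children))
# prints ('Computer Science', 'Information Retrieval')
```

With realistic data, a researcher's expertise could then be estimated by aggregating the predicted categories over all of that researcher's titles. That aggregation step is likewise an assumption of this sketch.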

Notes

  1. http://dblp.uni-trier.de.

  2. http://dl.acm.org.

  3. http://www.ndltd.org.

  4. http://lattes.cnpq.br.

  5. In this paper we use the terms categorization and classification interchangeably.

  6. http://bit.ly/1JM2j1k.

  7. http://scikit-learn.org.

  8. https://www.csie.ntu.edu.tw/~cjli/liblinear.

Acknowledgements

This work was partially funded by projects InWeb (grant MCT/CNPq 573871/2008-6) and MASWeb (grant FAPEMIG/PRONEX APQ-01400-14), and by the authors’ individual grants from CAPES, CNPq and FAPEMIG.

Author information

Corresponding author

Correspondence to Alberto H. F. Laender.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

de Siqueira, G.O., Canuto, S., Gonçalves, M.A., Laender, A.H.F. (2017). Automatic Hierarchical Categorization of Research Expertise Using Minimum Information. In: Kamps, J., Tsakonas, G., Manolopoulos, Y., Iliadis, L., Karydis, I. (eds) Research and Advanced Technology for Digital Libraries. TPDL 2017. Lecture Notes in Computer Science, vol 10450. Springer, Cham. https://doi.org/10.1007/978-3-319-67008-9_9

  • DOI: https://doi.org/10.1007/978-3-319-67008-9_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67007-2

  • Online ISBN: 978-3-319-67008-9

  • eBook Packages: Computer Science, Computer Science (R0)