Learning Domain Labels Using Conceptual Fingerprints: An In-Use Case Study in the Neurology Domain

Afzal, Zubair; Tsatsaronis, George; Doornenbal, Marius; Coupet, Pascal; Gregory, Michelle

doi:10.1007/978-3-319-49004-5_47

Conference paper
First Online: 04 November 2016

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10024)

Abstract

Modelling a science domain in order to categorize research work thematically and to enable better browsing and search can be a daunting task, especially if no specialized taxonomy or ontology exists for the domain. Elsevier, the largest academic publisher, often faces this challenge, both in supporting its journal submission system and in supplying ScienceDirect and Scopus, two flagship platforms of the company, with sufficient metadata, such as conceptual labels that characterize research works and can improve the user experience of browsing and searching the literature. In this paper we describe an Elsevier in-use case study of learning appropriate domain labels from a collection of 6,357 full-text articles in the neurology domain, exploring different document representations and clustering mechanisms. Besides the baseline approaches for document representation (e.g., bag-of-words) and their variations (e.g., n-grams), we employ a novel in-house methodology that produces conceptual fingerprints of the research articles, starting from a general domain taxonomy such as the Medical Subject Headings (MeSH). We present a thorough empirical evaluation, using a variety of clustering mechanisms and several validity indices to evaluate the resulting clusters. Our results summarize the best practices in modelling this specific domain, and we report the advantages and disadvantages of the clustering mechanisms and document representations examined, with the aim of learning appropriate conceptual labels for this domain.
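To make the baseline part of that pipeline concrete, here is a minimal sketch of a bag-of-words representation extended with word n-grams, scored with the silhouette coefficient, one widely used cluster validity index. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which validity indices or clustering algorithms were used, the in-house conceptual fingerprints are not reproduced, and the four toy documents, the two candidate labelings, and all function names are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(text, n_max=2):
    """Bag-of-words extended with word n-grams up to length n_max."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def silhouette(vectors, labels):
    """Mean silhouette coefficient of a labeling under cosine distance."""
    clusters = set(labels)
    scores = []
    for i, v in enumerate(vectors):
        # Mean distance from document i to the other members of each cluster.
        mean_dist = {}
        for c in clusters:
            dists = [cosine_distance(v, w)
                     for j, w in enumerate(vectors)
                     if labels[j] == c and j != i]
            if dists:
                mean_dist[c] = sum(dists) / len(dists)
        a = mean_dist.get(labels[i])  # cohesion: own cluster
        others = [d for c, d in mean_dist.items() if c != labels[i]]
        if a is None or not others:
            scores.append(0.0)  # convention for singletons / one cluster
            continue
        b = min(others)  # separation: nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy corpus; the study itself used 6,357 full-text articles.
docs = [
    "epilepsy seizure treatment",
    "seizure control in epilepsy",
    "stroke rehabilitation outcome",
    "outcome after ischemic stroke",
]
vectors = [ngram_counts(d) for d in docs]

topical = [0, 0, 1, 1]  # groups documents by topic
mixed = [0, 1, 0, 1]    # splits each topic across clusters

print("topical labeling:", silhouette(vectors, topical))
print("mixed labeling:  ", silhouette(vectors, mixed))
```

A higher silhouette value indicates clusters that are more cohesive and better separated, so the topical labeling should score above the mixed one. Swapping `ngram_counts` for another representation (for example, concept identifiers from a taxonomy) leaves the scoring code unchanged, which is what allows representations to be compared on the same index.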

Author information

Authors and Affiliations

Content and Innovation Group, Operations Division, Elsevier B.V., Radarweg 29, 1043 NX, Amsterdam, The Netherlands
Zubair Afzal, George Tsatsaronis, Marius Doornenbal, Pascal Coupet & Michelle Gregory

Corresponding author

Correspondence to George Tsatsaronis.

Editor information

Editors and Affiliations

Linköping University, Linköping, Sweden
Eva Blomqvist
University of Bologna, Bologna, Italy
Paolo Ciancarini
University of Bologna, Bologna, Italy
Francesco Poggi
University of Bologna, Bologna, Italy
Fabio Vitali

About this paper

Cite this paper

Afzal, Z., Tsatsaronis, G., Doornenbal, M., Coupet, P., Gregory, M. (2016). Learning Domain Labels Using Conceptual Fingerprints: An In-Use Case Study in the Neurology Domain. In: Blomqvist, E., Ciancarini, P., Poggi, F., Vitali, F. (eds) Knowledge Engineering and Knowledge Management. EKAW 2016. Lecture Notes in Computer Science, vol 10024. Springer, Cham. https://doi.org/10.1007/978-3-319-49004-5_47

DOI: https://doi.org/10.1007/978-3-319-49004-5_47
Published: 04 November 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49003-8
Online ISBN: 978-3-319-49004-5
eBook Packages: Computer Science, Computer Science (R0)
