ABSTRACT
High-quality taxonomies play a critical role in domains such as e-commerce, web search, and ontology engineering. While there has been extensive work on expanding taxonomies from externally mined data, less attention has been paid to enriching taxonomies by exploiting the concepts and structure already present within them. In this work, we show the usefulness of this kind of enrichment and explore its viability with a new taxonomy completion system, ICON (Implicit CONcept Insertion). ICON identifies implicit concepts based on the existing concept structure, generates names for them, and inserts them at appropriate positions within the taxonomy. ICON integrates techniques from entity retrieval, text summarization, and subsumption prediction; this modular architecture offers high flexibility while achieving state-of-the-art performance. We have evaluated ICON on two e-commerce taxonomies, and the results show that it offers significant advantages over strong baselines, including recent taxonomy completion models and the large language model ChatGPT.
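To make the three stages concrete, the following is a minimal sketch of the pipeline shape only, not ICON's implementation. It assumes a toy taxonomy stored as a child-to-parent map and swaps in simple token heuristics for each learned component: sibling pairing stands in for entity retrieval, shared-word extraction stands in for text summarization, and token-set inclusion stands in for subsumption prediction. All function names and the example data are hypothetical.

```python
from itertools import combinations


def tokens(name):
    """Split a concept name into a set of words."""
    return set(name.split())


def subsumes(general, specific):
    # Toy stand-in for a learned subsumption predictor (assumption):
    # a concept subsumes another if all of its words appear in the other.
    return tokens(general) <= tokens(specific)


def generate_name(a, b):
    # Toy stand-in for a summarization model (assumption):
    # keep the words shared by both names, in the order they appear in `a`.
    shared = tokens(a) & tokens(b)
    return " ".join(w for w in a.split() if w in shared)


def complete(parent_of):
    """Return a copy of the child -> parent map with implicit concepts inserted."""
    result = dict(parent_of)
    children = {}
    for child, parent in parent_of.items():
        children.setdefault(parent, []).append(child)

    for parent, kids in children.items():
        # Toy stand-in for entity retrieval (assumption): consider sibling pairs.
        for a, b in combinations(sorted(kids), 2):
            name = generate_name(a, b)
            # Skip empty names, the parent itself, and concepts already present.
            if not name or name == parent or name in result:
                continue
            # Only insert under the parent if the parent subsumes the new concept.
            if not subsumes(parent, name):
                continue
            result[name] = parent
            # Re-attach every sibling that the new concept subsumes.
            for kid in kids:
                if subsumes(name, kid):
                    result[kid] = name
    return result


# Hypothetical example taxonomy.
PARENT = {
    "red cotton shirt": "shirt",
    "blue cotton shirt": "shirt",
    "silk shirt": "shirt",
    "shirt": "clothing",
}

if __name__ == "__main__":
    for child, parent in sorted(complete(PARENT).items()):
        print(f"{child} -> {parent}")
```

On this toy input, the sketch inserts an intermediate concept "cotton shirt" between "shirt" and the two cotton shirt leaves, while "silk shirt" stays directly under "shirt". Because each stage is a separate function, any of them could be replaced by a stronger model without touching the others, which is the kind of modularity the abstract describes.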