ABSTRACT
We present part of Huawei's efforts in building a Product Knowledge Graph (PKG). Our goal is to identify which product attributes (i.e., properties) are relevant and important to shopping decisions for each product category (i.e., class). This is particularly challenging when the attributes and their values are mined from online product catalogues, i.e., HTML pages. These web pages contain semi-structured data that do not follow a common format and use diverse vocabulary to designate the same features. We propose a system for key attribute identification (KATIE) based on fine-tuning pre-trained models (e.g., DistilBERT) to predict the applicability and importance of an attribute to a category. We also propose an attribute synonym identification module that discovers synonymous attributes by considering not only the similarity of their labels but also the similarity of their value sets. We have evaluated our approach on Huawei's category taxonomy and a set of attributes mined internally from web pages. KATIE achieves promising performance compared to the most recent baselines.
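The idea of combining label similarity with value-set similarity can be illustrated with a minimal sketch. This is not KATIE's actual module, whose details the abstract does not give: the character-level label matcher, the Jaccard overlap, the equal weighting, the function names, and the toy attributes are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def label_similarity(a: str, b: str) -> float:
    """Character-level similarity of two attribute labels, in [0, 1]."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def value_set_similarity(a: set[str], b: set[str]) -> float:
    """Jaccard overlap of two attribute value sets, in [0, 1]."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def synonym_score(label_a, values_a, label_b, values_b, label_weight=0.5):
    """Weighted mix of both signals; label_weight is a hypothetical parameter."""
    return (label_weight * label_similarity(label_a, label_b)
            + (1 - label_weight) * value_set_similarity(values_a, values_b))


# Toy attributes (label, observed values), invented for this example.
colour = ("Colour", {"black", "white", "blue"})
color = ("Color", {"black", "white", "red"})
weight = ("Weight", {"180 g", "200 g"})

same = synonym_score(*colour, *color)        # similar labels, overlapping values
different = synonym_score(*colour, *weight)  # unrelated labels and values
print(f"Colour/Color: {same:.2f}, Colour/Weight: {different:.2f}")
```

Using both signals lets attributes with different labels but near-identical values score as candidates, while attributes with similar labels but disjoint values are penalised. A learned embedding similarity could replace the string matcher without changing the overall shape.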
Index Terms
- KATIE: A System for Key Attributes Identification in Product Knowledge Graph Construction