DOI: 10.1145/3539618.3591846
short-paper
Open Access

KATIE: A System for Key Attributes Identification in Product Knowledge Graph Construction

Published: 18 July 2023

ABSTRACT

We present part of Huawei's efforts in building a Product Knowledge Graph (PKG). Our goal is to identify which product attributes (i.e., properties) are relevant and important to shopping decisions for each product category (i.e., class). This is particularly challenging when the attributes and their values are mined from online product catalogues, i.e., HTML pages. These web pages contain semi-structured data that do not follow a common format and use diverse vocabulary to designate the same features. We propose KATIE, a system for key attribute identification based on fine-tuning pre-trained models (e.g., DistilBERT) to predict the applicability and importance of an attribute to a category. We also propose an attribute synonym identification module that discovers synonymous attributes by considering not only the similarity of their labels but also the similarity of their value sets. We evaluated our approach on Huawei's category taxonomy and a set of attributes mined internally from web pages. KATIE shows promising performance compared to the most recent baselines.
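
To make the synonym identification idea concrete, the following is a minimal sketch of scoring two attributes by combining label similarity with value-set similarity. The abstract does not state which similarity measures or weighting KATIE uses, so everything here is an assumption made for illustration: the character-level `SequenceMatcher` ratio for labels, Jaccard overlap for value sets, the equal weighting, and the function names.

```python
from difflib import SequenceMatcher
from typing import Set


def label_similarity(a: str, b: str) -> float:
    """Character-level similarity of two attribute labels, in [0, 1].

    Assumption: a simple string ratio stands in for whatever label
    similarity the actual system uses.
    """
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def value_set_similarity(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two attribute value sets, in [0, 1]."""
    if not a and not b:
        return 0.0  # no evidence either way when both sets are empty
    return len(a & b) / len(a | b)


def synonym_score(
    label_a: str,
    values_a: Set[str],
    label_b: str,
    values_b: Set[str],
    label_weight: float = 0.5,  # hypothetical weighting, not from the paper
) -> float:
    """Weighted combination of label and value-set similarity."""
    return (
        label_weight * label_similarity(label_a, label_b)
        + (1.0 - label_weight) * value_set_similarity(values_a, values_b)
    )


# Two spellings of the same attribute with overlapping values score high;
# unrelated attributes with disjoint values score low.
close = synonym_score(
    "Colour", {"black", "white", "red"},
    "Color", {"black", "white", "blue"},
)
far = synonym_score(
    "Weight", {"1 kg", "2 kg"},
    "Color", {"black", "white"},
)
```

Attribute pairs whose score exceeds a chosen threshold could then be treated as synonym candidates. The value-set term is what lets two differently labelled attributes be matched when their values largely coincide.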

