KATIE: A System for Key Attributes Identification in Product Knowledge Graph Construction

Authors:
Btissam Er-Rahmadi, Huawei Technologies R&D UK, Edinburgh, United Kingdom (ORCID 0000-0003-0526-661X)
Arturo Oncevay, The University of Edinburgh, Edinburgh, United Kingdom (ORCID 0000-0001-7675-6208)
Yuanyi Ji, Huawei Technologies R&D UK, Edinburgh, United Kingdom (ORCID 0009-0004-9700-9438)
Jeff Z. Pan, Huawei Technologies R&D UK & The University of Edinburgh, Edinburgh, United Kingdom (ORCID 0000-0002-9779-2088)

SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, July 2023, pages 3320–3324. https://doi.org/10.1145/3539618.3591846

Published: 18 July 2023


ABSTRACT

We present part of Huawei's efforts in building a Product Knowledge Graph (PKG). Our goal is to identify which product attributes (i.e., properties) are relevant to a product category (i.e., class) and how important they are for shopping decisions. This is particularly challenging when the attributes and their values are mined from online product catalogues, i.e., HTML pages. These web pages contain semi-structured data that do not follow a consistent format and use diverse vocabulary to designate the same features. We propose KATIE, a system for key attribute identification based on fine-tuning pre-trained language models (e.g., DistilBERT) to predict the applicability and importance of an attribute to a category. We also propose an attribute synonym identification module that discovers synonymous attributes by considering not only the similarity of their labels but also the similarity of their value sets. We evaluated our approach on Huawei's category taxonomy and a set of attributes mined internally from web pages. KATIE shows promising performance compared with the most recent baselines.
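The idea behind the synonym module, scoring a pair of attributes by both how similar their labels are and how much their observed value sets overlap, can be illustrated with a minimal sketch. This is not KATIE's implementation: the abstract does not specify the similarity measures, so the sketch assumes a character-level label similarity (Python's `difflib.SequenceMatcher`), Jaccard overlap of normalized value sets, and a hypothetical weight `alpha` and `threshold`. All function names and example attributes are illustrative.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings compare equal."""
    return " ".join(text.lower().split())


def label_similarity(a: str, b: str) -> float:
    """Character-level similarity of two attribute labels in [0, 1] (assumed measure)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def value_set_similarity(values_a: set[str], values_b: set[str]) -> float:
    """Jaccard overlap of the two attributes' normalized value sets (assumed measure)."""
    norm_a = {normalize(v) for v in values_a}
    norm_b = {normalize(v) for v in values_b}
    if not norm_a and not norm_b:
        return 0.0  # no evidence from values when both sets are empty
    return len(norm_a & norm_b) / len(norm_a | norm_b)


def synonym_score(label_a: str, values_a: set[str],
                  label_b: str, values_b: set[str],
                  alpha: float = 0.5) -> float:
    """Weighted mix of label and value-set similarity; alpha is a hypothetical weight."""
    return (alpha * label_similarity(label_a, label_b)
            + (1 - alpha) * value_set_similarity(values_a, values_b))


def are_synonyms(label_a: str, values_a: set[str],
                 label_b: str, values_b: set[str],
                 alpha: float = 0.5, threshold: float = 0.5) -> bool:
    """Flag a pair as synonymous when the combined score reaches a hypothetical threshold."""
    return synonym_score(label_a, values_a, label_b, values_b, alpha) >= threshold


if __name__ == "__main__":
    screen = {"6.1 inch", "6.7 inch", "5.4 inch"}
    display = {"6.1 inch", "6.7 Inch"}
    print(round(synonym_score("Screen Size", screen, "Display Size", display), 3))
    print(are_synonyms("Colour", {"Black", "White"}, "Weight", {"150 g", "200 g"}))
```

With these example values, "Screen Size" and "Display Size" have only moderately similar labels (about 0.52), but two of their three distinct values coincide, so the combined score is about 0.594 and the pair clears the 0.5 threshold. "Colour" and "Weight" share neither characters nor values and score 0. A production system would more plausibly use embedding-based label similarity and a tuned threshold.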

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
  2. Machine learning
    1. Learning paradigms
      1. Supervised learning
          1. Supervised learning by classification
          2. Supervised learning by regression
    2. Machine learning approaches
      1. Neural networks
2. Information systems
  1. World Wide Web
    1. Web applications
      1. Electronic commerce
          1. E-commerce infrastructure

Published in
SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2023
3567 pages
ISBN: 9781450394086
DOI: 10.1145/3539618
General Chairs: Hsin-Hsi Chen (National Taiwan University), Wei-Jou (Edward) Duh (National Taiwan University), Hen-Hsen Huang (Academia Sinica)
Program Chairs: Makoto P. Kato (Spotify), Josiane Mothe (Université de Toulouse), Barbara Poblete (University of Chile and Amazon Visiting Academic)
Copyright © 2023 Owner/Author
This work is licensed under a Creative Commons Attribution 4.0 International License.
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
entity resolution
fine-tuning
pre-trained language model
product knowledge graph
relation discovery
Qualifiers
- short-paper
Acceptance Rates
Overall acceptance rate: 792 of 3,983 submissions, 20%