Abstract
This paper proposes a semi-supervised relation extraction methodology for discovering hypernymy (is-a) relations. We developed a pattern-learning model built around a "most reliable pattern" heuristic. Using only a text corpus and a set of seed instances as input, each iteration of the algorithm produces trusted hypernym–hyponym pairs: sentences containing seed pairs are extracted and masked, candidate patterns are discovered and ranked, a pattern-matching step generates new pairs, and a scoring function filters out unreliable ones. The surviving pairs are added to the initial seed set, and this bootstrapping loop repeats to generate a new trusted pair set. As the approach is semi-supervised, our experiments use two freely available public Wikipedia text corpora to extract hypernyms. As baselines for comparison with the proposed pattern-learning approach, we use Hearst patterns, an extended version of Hearst patterns with additional patterns, and a dependency-based approach. To evaluate the proposed algorithm, the extracted hypernym–hyponym relations are tested against five standard publicly available datasets: BLESS, WBLESS, WEEDS, EVAL, and LEDS. Across both Wikipedia corpora and all five evaluation datasets, the pattern-learning approach outperforms the three baseline algorithms. The absence of heavy skew in the results across the two corpora also indicates that the implemented algorithms are independent of the corpus used and can be applied to any large corpus.
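To make the pattern-matching step concrete, the following is a minimal, hypothetical sketch of classic Hearst-pattern extraction over raw sentences. It is not the paper's implementation: the regexes, the `extract_pairs` function, and the single-word noun assumption are simplifications introduced here for illustration only; the paper's algorithm additionally learns and ranks its own patterns rather than using a fixed list.

```python
import re

# Illustrative regexes for a few classic Hearst constructions (assumed,
# simplified to single-word nouns). Each pattern records whether the
# hypernym is the first or the second capture group.
HEARST_PATTERNS = [
    (re.compile(r"(\w+) such as (\w+)"), "hyper_first"),   # "animals such as dogs"
    (re.compile(r"such (\w+) as (\w+)"), "hyper_first"),   # "such animals as dogs"
    (re.compile(r"(\w+) and other (\w+)"), "hypo_first"),  # "dogs and other animals"
    (re.compile(r"(\w+) or other (\w+)"), "hypo_first"),
    (re.compile(r"(\w+) including (\w+)"), "hyper_first"), # "animals including dogs"
]

def extract_pairs(sentence):
    """Return (hyponym, hypernym) pairs matched in a sentence."""
    pairs = []
    for pattern, order in HEARST_PATTERNS:
        for match in pattern.finditer(sentence.lower()):
            a, b = match.group(1), match.group(2)
            # Normalize so each pair is always (hyponym, hypernym).
            pairs.append((b, a) if order == "hyper_first" else (a, b))
    return pairs

print(extract_pairs("Animals such as dogs are loyal."))  # [('dogs', 'animals')]
```

In the bootstrapping scheme described above, pairs extracted this way would be scored, the high-scoring ones added back into the seed set, and the seeds' new sentence contexts mined for further candidate patterns on the next iteration.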
Data availability
The data that support the findings of this study are available within the article as references.
Funding
No funding was received for this project.
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Pinto, O., Gole, S., Srushti, H.P. et al. Relation Extraction: Hypernymy Discovery Using a Novel Pattern Learning Algorithm. SN COMPUT. SCI. 4, 730 (2023). https://doi.org/10.1007/s42979-023-02161-w