DOI: 10.1145/3289600.3291020
research-article

Lightweight Lexical and Semantic Evidence for Detecting Classes Among Wikipedia Articles

Published: 30 January 2019

ABSTRACT

A supervised method relies on simple, lightweight features in order to distinguish Wikipedia articles that are classes (Shield volcano) from other articles (Kilauea). The features are lexical or semantic in nature. Experimental results in multiple languages over multiple evaluation sets demonstrate the superiority of the proposed method over previous work.

Published in

WSDM '19: Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining
January 2019
874 pages
ISBN: 9781450359405
DOI: 10.1145/3289600

Copyright © 2019 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates

WSDM '19 paper acceptance rate: 84 of 511 submissions, 16%
Overall acceptance rate: 498 of 2,863 submissions, 17%
