ABSTRACT
A supervised method relies on simple, lightweight features to distinguish Wikipedia articles that are classes (Shield volcano) from other articles (Kilauea). The features are lexical or semantic in nature. Experimental results in multiple languages, over multiple evaluation sets, show that the proposed method outperforms previous work.