Abstract
Natural language search relies strongly on perceiving the semantics of a query sentence. Semantics is captured by the relationships among the query words, represented as a network (graph). Such a network of words can be fed into larger ontologies, like DBpedia or Google Knowledge Graph, where it appears as a subgraph, hence the name subnetwork (subnet). Thus, the subnet is a canonical form for interfacing a natural language query to a graph database and is an integral step in graph-based searching. In this article, we present a novel standalone NLP technique that leverages the cognitive psychology notion of semantic strata to extract semantic subnetworks from natural language queries. The cognitive model describes some of the fundamental structures, called semantic strata, that human cognition employs to construct semantic information in the brain. We propose a computational model based on conditional random fields to capture the cognitive abstraction provided by semantic strata, facilitating cognitive canonicalization of the query. Our experiments, conducted on approximately 5000 queries, suggest that cognitive canonicals based on semantic strata can significantly improve parsing and role labeling performance beyond purely lexical approaches, such as part-of-speech-based techniques. We also find that cognitively canonicalized subnets are more semantically coherent than syntax trees when explored in graph ontologies like DBpedia, and that they improve the ranking of retrieved documents.
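To make the idea of a subnet concrete, the following minimal sketch represents a query as a set of labeled edges and checks whether it appears as a subgraph of a larger ontology. This is an illustration only, not the extraction method proposed in the article: the triples, the relation names, and the helper functions `nodes` and `is_subnet` are all hypothetical, and the toy ontology is not taken from DBpedia.

```python
from typing import Iterable, Set, Tuple

# An edge is a (subject, relation, object) triple.
Edge = Tuple[str, str, str]

# Toy ontology: hypothetical triples chosen for illustration only.
ONTOLOGY: Set[Edge] = {
    ("Paris", "capital_of", "France"),
    ("France", "located_in", "Europe"),
    ("Seine", "flows_through", "Paris"),
}


def nodes(edges: Iterable[Edge]) -> Set[str]:
    """Return every node that appears as a subject or object of an edge."""
    return {n for s, _, o in edges for n in (s, o)}


def is_subnet(query_edges: Iterable[Edge], ontology: Set[Edge]) -> bool:
    """True when every query edge is also an edge of the ontology,
    i.e. the query network is a subgraph of the ontology graph."""
    return set(query_edges) <= ontology


# Hypothetical subnet for a query such as "capital of France".
query_subnet: Set[Edge] = {("Paris", "capital_of", "France")}
```

Under these assumptions, `is_subnet(query_subnet, ONTOLOGY)` holds because the single query edge is present in the ontology, whereas a network containing an edge that the ontology lacks would be rejected. A real system would match against an ontology through its query interface rather than an in-memory set.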
Index Terms
- Cognitive canonicalization of natural language queries using semantic strata