ABSTRACT
We compute and evaluate relevance scores for knowledge-base triples from type-like relations. Such a score measures the degree to which an entity "belongs" to a type. For example, Quentin Tarantino has various professions, including Film Director, Screenwriter, and Actor. The first two would get a high score in our setting, because those are his main professions. The third would get a low score, because he mostly had cameo appearances in his own movies. Such scores are essential for ranking the results of entity queries, e.g., "American actors" or "Quentin Tarantino professions". These scores differ from scores for "correctness" or "accuracy" (all three professions above are correct and accurate). We propose a variety of algorithms to compute these scores. For our evaluation, we designed a new benchmark, which includes a ground truth based on about 14K human judgments obtained via crowdsourcing. Inter-judge agreement is slightly over 90%. Existing approaches from the literature give results far from the optimum. Our best algorithms achieve an agreement of about 80% with the ground truth.
Index Terms
- Relevance Scores for Triples from Type-Like Relations