Wikiometrics: a Wikipedia based ranking system

World Wide Web

Abstract

We present a new concept, Wikiometrics: the derivation of metrics and indicators from Wikipedia. Wikipedia provides an accurate representation of the real world due to its size, structure, editing policy and popularity. We demonstrate an innovative “mining” methodology, in which different elements of Wikipedia (content, structure, editorial actions and reader reviews) are used to rank items in a manner that is by no means inferior to rankings produced by experts or by other methods. We test the proposed method by applying it to two real-world ranking problems: top world universities and academic journals. The resulting rankings were compared to leading and widely accepted benchmarks and were found to be highly correlated with them, with the added advantage that the underlying data is publicly available.
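The comparison against benchmark rankings described above can be illustrated with a rank correlation. The following is a minimal sketch, assuming Spearman's rank correlation as the measure; the abstract does not state which statistic was used, and the function names and the five-item example data are hypothetical, not taken from the article.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(sxx * syy)


# Hypothetical positions of five items in two rankings (1 = best).
wiki_positions = [1, 2, 3, 4, 5]
benchmark_positions = [1, 2, 4, 3, 5]
rho = spearman(wiki_positions, benchmark_positions)
```

A value of `rho` close to 1 indicates that the two rankings order the items almost identically, while a value close to -1 indicates reversed orderings.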

Author information

Corresponding author

Correspondence to Gilad Katz.

About this article

Cite this article

Katz, G., Rokach, L. Wikiometrics: a Wikipedia based ranking system. World Wide Web 20, 1153–1177 (2017). https://doi.org/10.1007/s11280-016-0427-8

  • DOI: https://doi.org/10.1007/s11280-016-0427-8
