Search markets and search results: The case of Bing

https://doi.org/10.1016/j.lisr.2013.04.006

Highlights

  • Webometric studies should combine results from three or more search markets.

  • Search market choice altered a small majority of the Bing top 10 results.

  • Search market choice altered less than a third of the complete sets of results.

  • Variation over time exceeded variation between search markets for the complete results sets.

  • Searches returned almost no ubiquitous authoritative results (individual URLs).

Abstract

Bing and Google customize their results to target people with different geographic locations and languages but, despite the importance of search engines for web users and webometric research, the extent and nature of these differences are unknown. This study compares the results of seventeen random queries submitted automatically to Bing for thirteen different English geographic search markets at monthly intervals. Search market choice alters a small majority of the top 10 results but less than a third of the complete sets of results. Variation in the top 10 results over a month was about the same as variation between search markets but variation over time was greater for the complete results sets. Most worryingly for users, there were almost no ubiquitous authoritative results: only one URL was always returned in the top 10 for all search markets and points in time, and Wikipedia was almost completely absent from the most common top 10 results. Most importantly for webometrics, results from at least three different search markets should be combined to give more reliable and comprehensive results, even for queries that return fewer than the maximum number of URLs.

Introduction

According to Alexa.com (2012), web search is central to 12 of the world's 25 most visited websites: Google (rank 1), Yahoo! (4; search and portal, with results driven by Bing under a partnership with Microsoft), Baidu (5), Google India (14), Yahoo! Japan (15), Google Germany (17), Yandex (18), MSN (19; portal and Bing search), Google Hong Kong (21), Google Japan (22), Bing (23) and Google UK (25). Although the broad details of how search engines work are known, the details of their operations, and particularly the ranking of results and spam filtering, are unknown and seem to be closely guarded commercial secrets. It is known that the same query will generate different results over time and between search engines, but the same query can also generate different results at the same time for different users based upon their geographic location or search preferences, and the nature and extent of this variation are unclear.

Some search engines, including Bing, segment users into search markets when calculating the results of their query. These search markets are based upon geographic location and language. For example, one Bing search market is English-India; people in India searching Bing with English as their default setting will get different results than people searching in English in the US (the English-USA market). The search market is chosen by Bing, with users having (as of November 2012) the option to override it by clicking the Preferences icon on the Bing.com home page and then the Change your country/region link. Bing's list of 40 options includes 10 that specify a language, of which three include English: Arab countries, Canada, and United States (all areas with other popular languages spoken). In contrast, Google appears to have more fine-grained location-based results (perhaps, in part, for its map-based search results) because its results pages (as of November 2012) include a Change location link with a free-text field that recognizes individual towns. Google also apparently has 188 different national or regional variants branded by domain name, such as Google.co.uk (Google.com, 2012). For both search engines, the regional results include international results, and the user is able to separately request that only results from a certain country or region be returned. There seems, however, to have been no research into the impact of search markets on search engine results.

Section snippets

Problem statement

Differences between search results are of particular interest in the field of webometrics, which often involves counting web search matches for large sets of queries. Many studies need lists of URLs matching a search that are as complete as possible or, alternatively, hit-count estimates (figures reported near the top of a search results page estimating the total number of results) as proxies for these (Ortega and Aguillo, 2009, Park, 2010, Spörrle and Tumasjan, 2011, Thelwall et al., 2010,

Search engines and search results

Although the performance and algorithms used by the major commercial search engines are not public, some general information is known about how search engines work from publications (Brin & Page, 1998) and patents (Page, 2001) produced by their architects. In addition, some information science research has investigated the output of search engines, typically focusing on variations in results over time.

Research questions

The research questions concern the extent of variation of the top 10 results and all results returned for a query. The questions concerning the top 10 are most relevant to users who typically may not visit any more results; and the questions concerning all URLs are most relevant to webometric studies, although in both cases the results may vary for different types of query.

  • In terms of the overlaps between the results sets for the same query, do the top 10 and all search results vary more over

Methods

The overall research design was to conduct a series of identical searches in different search markets, at a series of different points in time, and to compare the results for the extent of overlap between them, using the Bing API 2.0 as the data source. The Bing API allows programmers to access the Bing search engine on a limited basis. The choice of the API was made partly because it is used in webometric research and partly to ensure reliable results. An alternative way to collect the data
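The design described above — the same query issued once per search market — can be sketched in Python. The Bing API 2.0 has since been retired, so the endpoint, parameter names (`AppId`, `Query`, `Sources`, `Market`, `Web.Count`), and the key value are illustrative assumptions about its general request shape, not a working specification; the sketch only constructs the per-market request URLs.

```python
from urllib.parse import urlencode

# A subset of the English search markets compared in the study
# (market identifiers are illustrative).
MARKETS = ["en-US", "en-GB", "en-IN", "en-CA", "en-AU"]

def build_requests(query, app_id, per_page=50):
    """Build one request URL per search market for the same query.

    The endpoint and parameter names follow the general shape of the
    retired Bing API 2.0; treat them as assumptions, not a spec.
    """
    base = "http://api.bing.net/json.aspx"
    urls = {}
    for market in MARKETS:
        params = {
            "AppId": app_id,        # developer key (hypothetical placeholder)
            "Query": query,         # identical query string in every market
            "Sources": "Web",       # web results only
            "Market": market,       # the search-market setting under study
            "Web.Count": per_page,  # results per request
        }
        urls[market] = base + "?" + urlencode(params)
    return urls

urls = build_requests("informetrics", app_id="XXXX")
```

Repeating the same loop at monthly intervals and storing the returned URL lists per (market, month) pair yields the data needed for the overlap comparisons.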

Results

The first question asked whether search results varied more over time or more between markets. The top 10 results between different markets at the same point in time (Table 1, diagonal values) overlapped approximately the same amount as between the same market at different points in time (Table 1, off-diagonal values), at least for gaps of one or two months. Although the overall difference was significant at p < 0.001 using an independent samples t-test, the overall Jaccard similarity difference
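The Jaccard similarity used for these overlap comparisons is the size of the intersection of two URL sets divided by the size of their union. A minimal sketch, with invented URL lists standing in for two markets' top 10 results:

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two URL collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty result sets are treated as identical
    return len(a & b) / len(a | b)

# Hypothetical top-10 lists: the two markets share 6 URLs,
# with 14 distinct URLs between them.
us = [f"http://example.com/{i}" for i in range(10)]
uk = ([f"http://example.com/{i}" for i in range(6)]
      + [f"http://example.org/{i}" for i in range(4)])

similarity = jaccard(us, uk)  # 6 shared / 14 distinct, roughly 0.43
```

Averaging this value over all query/market pairs at one point in time (between-market overlap) and over all query/month pairs within one market (over-time overlap) gives the two quantities being compared.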

Discussion

There is a surprising amount of variation among the URLs returned for the different English search markets, as returned by the Bing API. This finding should not be taken as indicative of all queries because of potential variations due to different types of search, such as academic, educational, cultural, or commercial queries. Indeed, it seems likely that academic queries would exhibit less variation, and cultural and commercial queries would exhibit more variation. Similarly, the results may

Conclusions

The top 10 results for the queries tested here showed substantial variations with the average overlap between any pair of search markets being less than 50%. There was more overlap between the full sets of results, with a majority of URLs being the same, on average, between pairs of different search markets. The extent of overlap between different search markets' results was about the same as the overlap for the same market a month later for the top 10 results, but less for the complete set of

Acknowledgment

This paper is supported by ACUMEN (Academic Careers Understood through Measurement and Norms), grant agreement number 266632, under the Seventh Framework Program of the European Union. The funding source had no role in the study design; the collection, analysis, and interpretation of data; the writing of the report; or the decision to submit the article for publication.


References (40)

  • Bar-Ilan, J., et al. (2009). A method for measuring the evolution of a topic on the web: The case of “informetrics”. Journal of the American Society for Information Science and Technology.
  • Diaz, F. Integration of news content into web results.
  • Google. Google basics: Learn how Google discovers, crawls, and serves web pages.
  • Google.com
  • Ka, C., Cuncannan, H., Wong, K., Steele, M., Gordon, M., Prakash, S., et al. (2010). U.S. Patent No. 7,693,901.
  • Koehler, W. (2004). A longitudinal study of web pages continued: A report after six years. Information Research.
  • Kousha, K., et al. (2007). Google scholar citations and Google Web/URL citations: A multi-discipline exploratory analysis. Journal of the American Society for Information Science and Technology.
  • Lawrence, S., et al. (1999). Accessibility of information on the web. Nature.
  • Lewandowski, D. (2008). A three-year study on the freshness of web search engine databases. Journal of Information Science.

    Mike Thelwall is professor of information science and leader of the Statistical Cybermetrics Research Group at the University of Wolverhampton, UK, and a research associate at the Oxford Internet Institute. He has developed tools for gathering and analyzing web data, including hyperlink analysis, sentiment analysis, and content analysis for Twitter, YouTube, blogs and the general web. His publications include 210 refereed journal articles, seven book chapters, and two books, including Introduction to Webometrics. He is an associate editor of the Journal of the American Society for Information Science and Technology and sits on three other editorial boards.

    David Wilkinson is a member of the Statistical Cybermetrics Research Group in the School of Technology at the University of Wolverhampton, UK, as well as head of the maths subject. He conducts link analysis and pure maths research and has published 10 refereed journal articles in journals such as the Journal of the American Society for Information Science and Technology, Journal of Information Science, and Information Processing & Management.
