Same search, different results: algorithm bias in various discovery tools in library search

GIDIF-RBM (Italian Association of Health Librarians), in collaboration with TDNet Group, ran a trial to test the functionality of different discovery tools (DTs). DTs are a product of the long digital revolution that has upended the library world as it was previously known. The aim of the trial was to test two search queries, "cystic fibrosis" and "osteoarthritis" AND "chondrocyte" AND "cell therapy", using PubMed, Google Scholar, EBSCO EDS, Ex Libris Summon, and TDNet Discover. The working group examined the first twenty-five results for each query to determine the quality of the results in each tool, using PubMed as a benchmark. The analysis covered journal quality as ranked by SCImago SJR, the number of citations for each paper, the years of publication, and how many of the first 25 results were open access. The findings indicate that DTs are powerful tools when managed consistently and holistically under team supervision. To make the best use of them, students and teachers must have information literacy skills, such as the ability to identify, evaluate, organize, use, and communicate information.


Introduction
GIDIF-RBM (Italian Association of Health Librarians), in collaboration with TDNet Group, ran a trial to test the functionality of different discovery tools (DTs). The results of this collaboration were presented at the annual meeting in Milan, Italy (Bibliostar) and in Trondheim at the TDNet Group booth. Discovery tools are a product of the long digital revolution which, starting in the 1980s, upended the library world as it was known up to that moment. It was 1993, a few years after the advent of the World Wide Web. Discovery tools, found in the online portals of libraries, especially academic ones, give access through a single search to all the bibliographic resources in the OPAC, whether books, articles, or entire periodicals, in both paper and electronic format. The aim of the trial was to test two search queries, "cystic fibrosis" and "osteoarthritis" AND "chondrocyte" AND "cell therapy", using PubMed, Google Scholar, EBSCO EDS, Ex Libris Summon, and TDNet Discover.

Same search, different results
The working group examined the first twenty-five results for each query on all the listed platforms to determine the quality of the results, using PubMed as a benchmark. The analysis covered the journals themselves, journal quality as ranked by SCImago SJR, the number of citations for each paper, the years of publication, and how many of the first 25 results were open access. In addition, the overlap of articles appearing on more than one list of twenty-five was analysed. Two searches were run so that results for a specific topic could be compared across tools, each limited to the first 25 results. The first search was: [sub] "Cystic Fibrosis" AND [sub] "Therapy" OR "Therapeutic Use" [key] "Systematic Reviews", limited to publications from 2018 onwards. The second was: [sub] "Chondrocyte AND cell therapy AND osteoarthritis". In Google Scholar, free text was used because subject fields were unavailable. In PubMed, the corresponding MeSH terms [MeSH] were used. All results from all platforms were imported into TDNet so that they could be analysed with the same metrics.
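The overlap analysis described above can be sketched in a few lines of Python. This is only an illustration, not the procedure the working group actually used: the function names, the toy identifiers, and the assumption that each article carries a comparable identifier (such as a DOI) across platforms are all hypothetical.

```python
from collections import Counter

def overlap_counts(result_lists):
    """Map each article identifier to the number of tools listing it.

    result_lists: dict of tool name -> list of article identifiers
    (hypothetical input format, e.g. DOIs of the first 25 results).
    """
    counts = Counter()
    for ids in result_lists.values():
        counts.update(set(ids))  # set(): a repeat within one list counts once
    return counts

def shared_articles(result_lists, min_tools=2):
    """Return the articles that appear in at least `min_tools` lists."""
    counts = overlap_counts(result_lists)
    return {article: n for article, n in counts.items() if n >= min_tools}

# Toy data, invented for illustration only
toy_lists = {
    "PubMed": ["doi:a", "doi:b", "doi:c"],
    "Google Scholar": ["doi:b", "doi:d"],
    "EDS": ["doi:c", "doi:e", "doi:b"],
}
shared = shared_articles(toy_lists)
print(shared)
```

In practice the hard part is matching records across platforms, since the same article may be exported with different identifiers or title variants; the sketch assumes that matching has already been done.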

The key metrics used to evaluate the results were: Dates, Journals, Citations, Journal Ranking, OA, and Overlap (https://drive.google.com/file/d/1HmyRxmUQi-pqc5gZN4_BdN3wGeZS9pJP/view?usp=sharing).
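As a rough illustration of how the per-tool metrics could be tabulated once the records share a common format, the sketch below summarises one list of results. The `Record` fields and the `summarise` function are assumptions made for this example; they do not reflect TDNet's export format or the working group's actual spreadsheet.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional

@dataclass
class Record:
    # Hypothetical record layout, not an actual export schema
    title: str
    year: int
    citations: int
    sjr_quartile: Optional[str]  # e.g. "Q1"; None when the journal has no SJR rank
    open_access: bool

def summarise(records):
    """Summarise one tool's result list (must be non-empty)."""
    return {
        "n": len(records),
        "mean_year": mean(r.year for r in records),
        "mean_citations": mean(r.citations for r in records),
        "open_access": sum(r.open_access for r in records),
        "unranked_journals": sum(r.sjr_quartile is None for r in records),
    }

# Toy data, invented for illustration only
toy_records = [
    Record("Paper A", 2018, 10, "Q1", True),
    Record("Paper B", 2020, 20, None, False),
    Record("Paper C", 2022, 30, "Q2", True),
]
summary = summarise(toy_records)
print(summary)
```

Running the same function over each tool's first 25 results gives one row per tool, which makes differences in currency, citation counts, and open access share easy to compare side by side.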

The results from the cystic fibrosis search
PubMed results were filtered for "Systematic Review", and the first 25 items were considered to be of high calibre. On closer examination, however, PubMed would benefit from additional tools for assessing the value of search results quickly. Cochrane published half of the articles, and 14 of the top 25 articles were open access. The other discovery tools (Summon, EDS, TDNet, and Google Scholar) use various methods to rank search results, with Summon and TDNet providing a more balanced set. Google Scholar relies on an algorithm that prioritises referenced material, which can be dated and limited to a narrow set of journals. EDS prioritises currency over referenced content, while PubMed prioritises sources, with Cochrane dominating the results. Searching for subjects and keywords in abstracts yielded relevant results, which were further refined using a custom filter for systematic reviews managed by a librarian. It is important, however, to understand what determines an optimal set of results. No single search strategy fits all possible criteria. One researcher will prioritise sources, another currency, and another how well an article has been referenced; this comes on top of the relevance bias of each platform. Many articles are published without Subject metadata, and this study turned up a sizeable number of peer-reviewed articles for which that field was not provided at publication. A search strategy that relies on a subject field therefore potentially limits the amount of valuable content retrieved. It is worth noting that 47 papers out of 125 were found on two or more platforms, with one paper appearing in four and one in three. Overall, each discovery tool has its strengths and weaknesses, and multiple tools are needed to ensure a comprehensive search.

The results from the chondrocyte search
For the chondrocyte search, the top 25 articles retrieved from PubMed spanned 20 years, with an average publication date of 2010. These articles were heavily referenced, although a significant number of them had a poor SJR classification or none at all. Google Scholar focused on referenced content, and many of its articles were outdated and came from journals with suboptimal SJR rankings. EDS prioritised currency over referenced material, which brought in several journals with low SJR rankings. Summon balanced currency and referencing well, with journals featuring strong SJR rankings. TDNet had the strongest date currency and a good SJR ranking, although the articles in its first 25 were not as well referenced as those from other tools. Interestingly, this search strategy yielded fewer overlaps in the top 25 results, with only about ten articles appearing in two tools and no title appearing in three. This is surprising, given that the total number of valid results across all tools was around 200.

Discussion and conclusion
The comprehensive list of results is determined by the indexes and content that each platform uses. What determines the differences in the first 25 are the deduplication tools and the relevance algorithms. Each platform, as the study has flagged, uses a different algorithm. Each weighs how often the material is referenced, its currency, the quality of its sources, and how well its metadata matches the search terms. This study has been valuable in steering TDNet platform development towards solutions that support researchers' workflow, collecting scholarly material from various sources and enabling selection and evaluation.
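A toy model can make the effect of these weightings concrete. The sketch below is not any vendor's relevance algorithm, which the study did not have access to; it is a deliberately simplified weighted sum over invented, normalised scores, showing how the same three articles come out in a different order when citations or currency are favoured.

```python
def rank(articles, weights):
    """Order article ids by a weighted sum of their (hypothetical) scores."""
    def score(article):
        return sum(weights[key] * article[key] for key in weights)
    return [a["id"] for a in sorted(articles, key=score, reverse=True)]

# Invented scores on a 0..1 scale, for illustration only
articles = [
    {"id": "old-highly-cited", "citations": 1.0, "currency": 0.1, "source": 0.5},
    {"id": "recent-uncited", "citations": 0.1, "currency": 1.0, "source": 0.4},
    {"id": "top-journal", "citations": 0.4, "currency": 0.5, "source": 1.0},
]

# Two hypothetical weighting profiles
citation_heavy = {"citations": 0.8, "currency": 0.1, "source": 0.1}
currency_heavy = {"citations": 0.1, "currency": 0.8, "source": 0.1}

by_citations = rank(articles, citation_heavy)
by_currency = rank(articles, currency_heavy)
print(by_citations)
print(by_currency)
```

With the citation-heavy profile the older, heavily cited article comes first; with the currency-heavy profile the most recent one does. Real relevance ranking also depends on text matching and deduplication, which this sketch leaves out.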
The findings indicate that DTs are powerful tools when managed consistently and holistically under team supervision. To make the best use of them, students and teachers must possess information literacy skills, such as the ability to identify, locate, evaluate, organize, use, and communicate information. The role of the librarian as cultural mediator is clearly necessary, above all in evaluating the answers that these systems provide (ranking), but not only there. Through the use of ad hoc ontologies, it will be possible to understand the relationships between pieces of information and to match that understanding to the specific requests of the user, linking the information on web pages to abstract concepts organised hierarchically (an ontology).