On the impact of domain expertise on query formulation, relevance assessment and retrieval performance in clinical settings

https://doi.org/10.1016/j.ipm.2016.11.004

Highlights

  • A crowd-based evaluation of the impact of domain expertise on query formulation, relevance assessment and retrieval performance.

  • Queries issued by experts are significantly longer and more technical than queries issued by novices.

  • There is a low level of relevance assessment agreement among both experts and novices, but the reasons for assessment difficulty differ significantly between the two groups.

  • Traditional information retrieval models, which mainly consider the presence or absence of query terms within documents, are particularly unsuccessful for experts, who instead leverage their knowledge and past experience to assess a multi-dimensional notion of relevance.

Abstract

The large volumes of medical information available on the web may provide answers for a wide range of users attempting to solve health-related problems. While experts generally use reliable resources for diagnosis search and professional development, novices use various (social) web resources to obtain information that helps them manage their health or the health of people they care for. Related search topics are diverse, addressing clinical diagnosis, advice seeking, information sharing, connecting with experts, etc. This paper focuses on the extent to which expertise impacts clinical query formulation, document relevance assessment and retrieval performance, with a view to tailoring retrieval models and systems to experts vs. non-experts. The results show that medical domain expertise 1) plays an important role in the lexical representation of information needs; 2) significantly influences the perception of relevance, even among users with similar levels of expertise; and 3) reinforces the idea that a single ground truth does not exist, leading to variability of system rankings with respect to the user’s level of expertise. The findings of this study present opportunities for the design of personalized health-related IR systems and also provide insights into the evaluation of such systems.

Introduction

Several studies (Fox, Duggan, 2013, Fox, 2011) have clearly shown that people, both experts (e.g., physicians and nurses) and novices (e.g., patients and their families), have a strong desire for medical information. Regardless of the domain expertise of users seeking information, medical and health search has been acknowledged as a complex search task leading to search failures or biases (Ely, Jerome, Osheroff, Chamblis, Mark, Ebbell, Marcy, Rosenbaum, 2005, Ely, Jerome, Osheroff, Saverio, Maveglia, Marcy, Rosenbaum, 2007, Roberts, Simpson, Demner-Fushman, Voorhees, Hersh, 2015, White, Horvitz, 2015). Even though the effectiveness of specialized search engines within the medical domain does not appear to be significantly higher than that of general web search engines (Bin & Lun, 2001), several previous studies have revealed that significant differences in search intent may be linked to the information resources being used (Choudhury, Morris, White, 2014, Natarajan, Stein, Jain, Elhadad, 2010, Zhang, Fu, 2011):

  • General web resources: This category includes resources indexed by general web search tools and social platforms not specifically devoted to or certified for health concerns, thereby leading to general web searching (as opposed to vertical search). Searching the web for health-related information has been acknowledged as a frequent activity of a wide variety of users (Fox, Duggan, 2013, Spink, Yang, Jansen, Nykanen, Lorence, Ozmutlu, 2004, Zhang, Fu, 2011). The web is used to address a wide range of search topics, such as those concerning (Choudhury, Morris, White, 2014, Eysenbach, Powell, Englesakis, 2004, Spink, Yang, Jansen, Nykanen, Lorence, Ozmutlu, 2004, White, Horvitz, 2009, Zhang, Fu, 2011): 1) general health, drug and dosing, and disease management (searching for rare diseases or updates on common diseases); 2) (differential) diagnosis or referral guidelines; 3) professional development; 4) personal and opinion-oriented goals (personalized healthy-lifestyle information such as diet, nutrition, and sexual health information); 5) advice (e.g., advice after being dissatisfied with professional care); 6) information sharing (e.g., with doctors and/or other patients); 7) finding people with similar conditions on social platforms; and 8) connecting with experts.

  • Clinical information resources: This category of resources is used within a domain-specific or vertical search and includes 1) electronic health records (EHRs), which are used by medical professionals, and 2) medical scientific reviews or content from certified health and medical sites, which are used by both experts (e.g., clinicians) and non-experts (novices) for different purposes. Expert clinical information searches are generally performed by clinicians under the Evidence-Based Medicine (EBM) approach (Sackett, 1997) as the basis for clinical decisions that better suit the patient under consideration. In contrast, non-expert clinical information searches are performed to help patients and their representatives better understand their own health conditions or the conditions of people they care for. Searching for clinical information is also a common pursuit: one previous study showed, for example, that 1.8 billion clinical searches were conducted on PubMed in 2011 (NLM, 2012); another showed that one-third of PubMed users are not medical experts (Lacroix & Mehnert, 2002).

    Early studies (Ely, Osheroff, Gorman, Ebell, Chambliss, Pifer, Stavri, 2000, Pratt, Wasserman, 2000) proposed a general classification for the search topics hidden behind clinical queries, which are clearly less diversified than health-related searches performed on general web resources. In Pratt and Wasserman (2000), the authors classified clinical queries addressed to MEDLINE into 10 topic categories, including prevention, risk factors, diagnosis, symptoms, treatments and side effects.

In this paper, clinical information search is specifically investigated; its performance remains questionable and subject to numerous issues (Cohen, Stavri, Hersh, 2004, Francke, Smit, de Veer, 2008, Natarajan, Stein, Jain, Elhadad, 2010, Suominen, Salanter, Velupillai, Chapman, Savova, Elhadad, Pradhan, South, Mowery, Jones, Leveling, Kelly, Goeuriot, Martinez, Zuccon, 2013, White, Horvitz, 2015). These issues mainly arise from 1) the complexity of expressing precise, context-specific clinical queries that better facilitate the identification of relevant evidence and 2) the lack of the higher-level expertise needed to perform evidence appraisal. Thus, we argue that an ideal clinical search engine should exploit information nuggets from both the query and the domain expertise level of the user to accurately identify clinically relevant information. Achieving this requires a deep understanding of the key differences between expert-based and non-expert-based clinical information searches. To the best of our knowledge, how expert clinical queries differ from non-expert queries is not well established in the literature; furthermore, the differences in the relevance assessments provided by experts and non-experts, and their impact on system ranking stability, have not been thoroughly investigated. With this in mind, we investigate the differences, commonalities, and relationships between expert-based and novice-based clinical searches. We focus on: 1) query formulation in terms of length, domain-specificity and difficulty, attributes acknowledged as important factors that can contribute to search success or failure (Ely, Jerome, Osheroff, Chamblis, Mark, Ebbell, Marcy, Rosenbaum, 2005, Ely, Jerome, Osheroff, Saverio, Maveglia, Marcy, Rosenbaum, 2007, Tamine, Chouquet, Palmer, 2015); 2) relevance assessment in terms of difficulty and its underlying reasons, agreement between assessors, and the time spent assessing relevance; and 3) the relationship between the user’s expertise level and retrieval effectiveness with respect to their relevance assessments. We conducted our study by assigning search tasks to experts and novices via two distinct crowdsourcing platforms, allowing us to recruit the two categories of clinical information seekers (experts/novices). To design reliable simulated clinical search tasks, we used the medical cases provided within major medical IR evaluation tracks, namely the TREC Filtering track (Robertson & Hull, 2000) and the CLEF eHealth track (Suominen et al., 2013), with related search contexts.
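
To make the query-formulation attributes above concrete, the following is a minimal sketch of how query length and domain-specificity could be quantified for issued queries; the vocabulary and example queries are hypothetical placeholders introduced for illustration, not artifacts of this study.

```python
# Sketch: quantifying query length and domain-specificity.
# "mesh_terms" stands in for a medical vocabulary (e.g., one derived
# from MeSH); the queries are invented examples, not study data.

def query_attributes(query, vocabulary):
    """Return the length and domain-specificity of a query, where
    specificity is the fraction of terms found in the vocabulary."""
    terms = query.lower().split()
    in_vocab = sum(1 for t in terms if t in vocabulary)
    return {
        "length": len(terms),
        "specificity": in_vocab / max(len(terms), 1),
    }

if __name__ == "__main__":
    mesh_terms = {"dyspnea", "tachycardia", "etiology"}  # illustrative subset
    expert_query = "acute dyspnea tachycardia etiology"
    novice_query = "why am i short of breath"
    print(query_attributes(expert_query, mesh_terms))
    # {'length': 4, 'specificity': 0.75}
    print(query_attributes(novice_query, mesh_terms))
    # {'length': 6, 'specificity': 0.0}
```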

The remainder of this paper is structured as follows. In Section 2, we describe research related to the effects of domain expertise on search and relevance assessment, as well as work on crowdsourced user studies. To put this work in context, findings are reported for both cross-domain expertise and medical domain expertise specifically. Section 3 presents the research questions and then describes the studies we performed to identify the commonalities and differences between expert-based and novice-based search within the medical domain, covering query formulation, relevance assessment and retrieval performance. In Section 5, we report the findings of our studies based on quantitative and qualitative analyses. Section 6 discusses the results and highlights the study’s implications. Section 7 concludes this article.

Section snippets

On the influence of domain expertise on information search: query formulation, search behavior and search difficulty

Based on the intensive research that has been performed in information science, researchers agree that information seeking and retrieval are cognitive activities constrained by several contextual factors that reduce the complexity of the retrieval process (Ingwersen & Belkin, 2004). One of the major factors identified is knowledge, which can be divided into search knowledge and domain knowledge. While search knowledge concerns the knowledge of search processes, domain

Research questions

As outlined in the literature review, only a few studies have examined the differences between expert-based and non-expert-based information searches in the medical domain (Palotti, Hanbury, Henning, 2014, White, Dumais, Teevan, 2009, White, Dumais, Teevan, 2008). Furthermore, previous research has not focused on understanding the differences within clinical information searches specifically. We identified the following gap in previous research:

  • There is a lack of studies that thoroughly identify

Results

The central goal of this study was to investigate the differences and commonalities between the query formulations, relevance assessments and retrieval performance of expert and non-expert searches. The statistical analyses were performed using SAS (http://www.sas.com/) software, version 9.3. Document indexing and retrieval were performed using the Terrier framework (http://www.terrier.org), version 4.0. In this section, the results are grouped by research question and the main findings that arise from
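
To illustrate the two analyses central to this section, here is a minimal sketch (in Python, rather than the SAS setup used in the study) of how relevance agreement between assessors and the stability of system rankings can be quantified; the labels and effectiveness scores below are invented placeholders, not results from the paper.

```python
# Sketch: inter-assessor agreement and system-ranking stability.
# All labels and scores are illustrative placeholders; the study's
# actual statistics were computed with SAS.
from scipy.stats import kendalltau
from sklearn.metrics import cohen_kappa_score

# Binary relevance labels assigned to the same documents by two assessors.
expert_labels = [1, 0, 1, 1, 0, 1, 0, 0]
novice_labels = [1, 1, 1, 0, 0, 1, 1, 0]
kappa = cohen_kappa_score(expert_labels, novice_labels)
print(f"Cohen's kappa: {kappa:.3f}")  # low values signal weak agreement

# Effectiveness scores (e.g., MAP) of five systems evaluated once against
# expert judgments and once against novice judgments; Kendall's tau
# measures how stable the induced system ranking is across the two groups.
map_expert = [0.31, 0.27, 0.42, 0.25, 0.36]
map_novice = [0.29, 0.33, 0.35, 0.30, 0.28]
tau, p_value = kendalltau(map_expert, map_novice)
print(f"Kendall's tau: {tau:.3f} (p = {p_value:.3f})")
```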

Discussion and design implications

The study investigated the differences and commonalities between expert-oriented and novice-oriented clinical searches using library resources. The results show that queries issued by experts are longer than those issued by non-experts, as previously shown in web search (White, Dumais, Teevan, 2009, White, Dumais, Teevan, 2008). Moreover, consistent with previous findings in medical web search (White et al., 2008), the results show that experts searching medical repositories are more

Conclusion

Medical information search is a common pursuit in the daily lives of an increasing number of users, whether experts or not. Even though medical search services have grown in popularity, there is a lack of studies investigating the differences between medical searches performed by experts and novices using clinical resources. We employed two crowdsourcing platforms to gain access to experts and novices. In this study, it was found that expert-based searches are significantly different from

Acknowledgment

We acknowledge the support of Paul Sabatier University under the grant project ERC.

References (73)

  • D.L. Sackett

    Evidence-based medicine

    Seminars in Perinatology

    (1997)
  • A.R. Taylor et al.

    Relationships between categories of relevance criteria and stage in task completion

    Information Processing and Management

    (2007)
  • E.M. Voorhees

    Variations in relevance judgments and the measurement of retrieval effectiveness

    Information Processing and Management

    (2000)
  • O. Alonso et al.

    Crowdsourcing for relevance evaluation

    SIGIR Forum

    (2008)
  • P. Bailey et al.

    Relevance assessment: are judges exchangeable and does it matter?

    Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’08

    (2008)
  • S.K. Bhavnani

    Important cognitive components of domain-specific search knowledge

    Proceedings of the Tenth Text REtrieval Conference, TREC ’01

    (2001)
  • B. Carterette et al.

    The effect of assessor error on IR system evaluation

    Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’10

    (2010)
  • M. Choudhury et al.

    Seeking and sharing health information online: comparing search engines and social media

    Proceedings of the ACM CHI Conference

    (2014)
  • C.L. Clarke et al.

    Novelty and diversity in information retrieval evaluation

    Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’08

    (2008)
  • S. Cronen-Townsend et al.

    Predicting query performance

    Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’02

    (2002)
  • C.S. Davis

    Statistical methods for the analysis of repeated measurements

    (2002)
  • T. Demeester et al.

    Predicting relevance based on assessor disagreement: analysis and practical applications for search evaluation

    Information Retrieval Journal

    (2015)
  • D. Dinh et al.

    Combining global and local semantic contexts for improving biomedical information retrieval

    European Conference on Information Retrieval (ECIR)

    (2011)
  • J. Ely et al.

    Analysis of questions asked by family doctors regarding patient care

    British Medical Journal

    (1999)
  • J. Ely et al.

    Obstacles to answering doctors’ questions about patient care with evidence

    British Medical Journal

    (2002)
  • J. Ely et al.

    A taxonomy of generic clinical questions: classification study

    British Medical Journal

    (2000)
  • G. Eysenbach et al.

    Health related virtual communities and electronic support groups: systematic review of the effects of online peer to peer interactions

    British Medical Journal

    (2004)
  • S. Fox et al.

    Health online 2013

    Technical Report

    (2013)
  • S. Fox

    80% of internet users look for health information online

    Technical Report

    (2011)
  • A. Francke et al.

    Factors influencing the implementation of clinical guidelines for health care professionals: a systematic meta-review

    BMC Medical Informatics and Decision Making

    (2008)
  • M. Gupta et al.

    Survey on social tagging techniques

    SIGKDD Explorations Newsletter

    (2010)
  • H.A. Hembrooke et al.

    The effects of expertise and feedback on search term selection and subsequent learning: research articles

    Journal of the American Society for Information Science and Technology

    (2005)
  • W. Hersh et al.

    OHSUMED: an interactive retrieval evaluation and new large test collection for research

    Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’94

    (1994)
  • W. Hersh et al.

    TREC 2004 Genomics Track overview

    Proceedings of the Text REtrieval Conference (TREC)

    (2004)
  • W. Hersh et al.

    TREC 2005 Genomics Track overview

    Proceedings of the Text REtrieval Conference (TREC)

    (2005)
  • W. Hersh et al.

    Factors associated with success in searching MEDLINE and applying evidence to answer clinical questions

    Journal of the American Medical Informatics Association

    (2002)