Biomedical Document Retrieval for Clinical Decision Support System

The availability of a huge amount of biomedical literature has opened up new possibilities for applying Information Retrieval and NLP to mine documents from it. In this work, we focus on biomedical document retrieval from the literature for clinical decision support systems. We compare statistical and NLP based approaches to query reformulation for biomedical document retrieval. We also model biomedical document retrieval as a learning to rank problem. We report initial results for the statistical and NLP based query reformulation approaches and the learning to rank approach, along with future directions of research.


Introduction and Motivation
Medical and healthcare related searches are a major focus of internet search nowadays. Recent statistics show that 61% of adults look online for health information (Jones, 2009). This demands proper search and retrieval systems for health related biomedical queries. Biomedical Information Retrieval (BIR) requires special attention due to the characteristics of biomedical terminology. Major challenges in the biomedical domain lie in handling complex, ambiguous, and inconsistent medical terms and their ad-hoc abbreviations. Many medical terms are very complex: the average length of biomedical entities is much higher than that of general entities, which makes the entity identification task difficult in the biomedical domain. Entity identification and normalization help to better solve the problems of retrieval and ranking of documents for medical search systems, biomedical text summarization, biomedical text data visualization, etc.
Since we focus here on a biomedical document retrieval and ranking system, the biomedical literature is our primary source. Biomedical literature is an important object of study in medical science, and thousands of articles are added to it each year. This large set of biomedical text articles can be used as a collection for a Clinical Decision Support System, where related biomedical articles are extracted and suggested to medical practitioners to help them give their patients the best care. For this purpose, the dataset from the Clinical Decision Support (CDS) track is used, which contains millions of full text biomedical articles from PubMed Central (PMC)1. The statistics of the CDS 2014, 2015 and 2016 datasets are given in Table 1. The CDS2 track focuses on the retrieval of biomedical articles related to patients' medical case reports. These medical case reports, which are used as queries, are case narratives of a patient's medical condition: medical history, symptoms, tests performed, treatments, etc. For a given query/case report, the main problem is to find the relevant documents in the available collection and rank them.

Background
'Information Retrieval: A Health and Biomedical Perspective' (Hersh, 2008) surveys retrieval in this domain. Query expansion (Maron and Kuhns, 1960; Carpineto and Romano, 2012), which has a long history in information retrieval, can be useful for dealing with such problems. For instance, medical queries were expanded with related terms from RxNorm, a drug dictionary, to improve the representation of a query for relevance estimation (Demner-Fushman et al., 2011). The emergence of medical domain-specific knowledge resources like UMLS can help a retrieval system gain more understanding of biomedical documents and queries. The Unified Medical Language System (UMLS) (Bodenreider, 2004) is a metathesaurus for the medical domain. It is maintained by the National Library of Medicine (NLM) and is the most comprehensive such resource, unifying over 100 dictionaries, terminologies, and ontologies. Various approaches to information retrieval with the UMLS Metathesaurus have been reported, some with a decline in results (Hersh et al., 2000) and some with gains (Aronson and Rindflesch, 1997). The next section of this paper covers statistical as well as NLP based approaches.

Query Reformulation for Biomedical Document Retrieval
Here, we present statistical and NLP based query reformulation approaches for biomedical document retrieval. The statistical approaches include feedback based query expansion and feedback document discovery based query expansion. An NLP based approach, namely UMLS concept based query reformulation, is also discussed.

Automatic Query Expansion With Pseudo Relevance Feedback & Relevance Feedback
Query Expansion (QE) is the process of reformulating a query to improve the retrieval performance and efficiency of IR systems. QE has proven effective for document retrieval (Carpineto and Romano, 2012). It helps to overcome vocabulary mismatch by expanding the user query with additional relevant terms and by re-weighting all terms. Query expansion that uses the top retrieved relevant documents is known as Relevance Feedback; it requires human judgment to identify the relevant documents among the top retrieved ones. Pseudo Relevance Feedback, in contrast, simply assumes the top retrieved documents to be relevant and uses them as feedback documents, requiring no human input at all. Query expansion based approaches for the biomedical domain give better results than retrieval without query expansion (Sankhavara et al., 2014). Tables 2 and 3 show the results of standard retrieval (without expansion), Pseudo-Relevance Feedback (PRF) based query expansion and Relevance Feedback (RF) based query expansion with the BM25 and In_expC2 retrieval models (Amati et al., 2003) on the CDS 2014, 2015 and 2016 datasets. BM25 is a ranking function based on the probabilistic retrieval framework, while In_expC2 is also probabilistic but based on Divergence From Randomness (DFR). Both models are available in the Terrier IR Platform3 (Ounis et al., 2005). The higher the value of an evaluation measure, the better the retrieval result. The results improve with PRF and RF based query expansion, which give statistically significant gains (p < 0.05) over no expansion. RF gives 50-60% more improvement than PRF over no expansion. We argue that biomedical retrieval should be done keeping a human in the loop: a small human intervention can increase retrieval accuracy by up to 60%.
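As a concrete illustration, the PRF step described above can be sketched as a simple term-scoring procedure. This is a minimal toy sketch, not the BM25/In_expC2 pipeline used in the experiments; the toy documents and the TF-IDF-style weighting scheme are illustrative assumptions.

```python
import math
from collections import Counter

def prf_expand(query_terms, ranked_docs, k=2, n_terms=3):
    """Pseudo-relevance feedback: treat the top-k retrieved documents as
    relevant and add their highest-scoring terms to the query."""
    feedback = ranked_docs[:k]            # assumed-relevant documents
    n_docs = len(ranked_docs)
    # document frequency over the retrieved pool
    df = Counter(t for doc in ranked_docs for t in set(doc))
    scores = Counter()
    for doc in feedback:
        for term, freq in Counter(doc).items():
            if term in query_terms:
                continue                  # keep only new expansion terms
            scores[term] += freq * math.log(n_docs / df[term] + 1)
    expansion = [t for t, _ in scores.most_common(n_terms)]
    return list(query_terms) + expansion

# toy ranked list for the query "heart disease"
docs = [
    ["mitral", "valve", "disease", "heart", "murmur"],
    ["heart", "failure", "mitral", "regurgitation"],
    ["knee", "arthritis", "pain"],
]
print(prf_expand(["heart", "disease"], docs, k=2, n_terms=2))
```

Relevance Feedback differs only in which documents enter `feedback`: a human judge selects them instead of blindly taking the top k.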

Feedback Document Discovery for Query Reformulation
Feedback Document Discovery based query expansion, as described in (Sankhavara and Majumder, 2017), learns to identify relevant documents for query expansion among the top retrieved documents. The main aim is to use a small amount of human judgement and learn pseudo judgements for the other documents in order to reformulate the queries. One approach is based on classification. If human judgements are available for some of the feedback documents, they serve as training data for classification: the documents are represented as bags of words, the TF-IDF scores of the words serve as features, and the human relevance scores provide the classes. Relevance is then predicted for the other top retrieved feedback documents. The second approach is based on classification+clustering. It first applies classification as in the first approach and then applies clustering to the class predicted relevant by the classifier, thus filtering out more non-relevant documents from the relevant ones. Since the convergence of K-means clustering depends on the initial choice of cluster centroids, the initial centroids are chosen as the average of the relevant document vectors and the average of the non-relevant document vectors from the training data.
Here we have used that approach with different features. The TF-IDF features are weighted based on the type of word. The CliNER tool (Boag et al., 2015), trained on the i2b2 2010 dataset (Uzuner et al., 2011), has been used to identify medical entities of type 'problem', 'test' and 'treatment' in the documents.
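The classification+clustering idea can be sketched as follows. This is a simplified stand-in, assuming a nearest-centroid classifier in place of the actual TF-IDF classifier and a hand-rolled two-cluster K-means seeded with the judged centroids; all vectors are toy examples.

```python
def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def dist(a, b):
    """Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def discover_feedback(train_vecs, train_labels, candidates, iters=5):
    """Predict relevance for unjudged feedback documents, then refine with
    K-means (k=2) seeded from the judged relevant/non-relevant centroids."""
    rel = centroid([v for v, y in zip(train_vecs, train_labels) if y == 1])
    non = centroid([v for v, y in zip(train_vecs, train_labels) if y == 0])
    # K-means with the judged centroids as initial seeds
    cents = [non, rel]
    labels = []
    for _ in range(iters):
        # assignment step doubles as nearest-centroid classification
        labels = [1 if dist(v, cents[1]) < dist(v, cents[0]) else 0
                  for v in candidates]
        for c in (0, 1):
            members = [v for v, y in zip(candidates, labels) if y == c]
            if members:
                cents[c] = centroid(members)
    return labels

train_vecs = [[1.0, 1.0], [1.2, 0.9], [5.0, 5.0], [5.1, 4.8]]
train_labels = [1, 1, 0, 0]   # 1 = judged relevant, 0 = judged non-relevant
candidates = [[1.1, 1.0], [5.0, 5.1], [0.9, 1.2]]
print(discover_feedback(train_vecs, train_labels, candidates))  # → [1, 0, 1]
```

The documents labeled 1 would then be used as the feedback set for query expansion.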

UMLS Concepts Based Query Reformulation
Medical domain-specific knowledge can be incorporated into the query reformulation process of a biomedical IR system. Several knowledge based approaches have been proposed in the literature (Aronson and Rindflesch, 1997; Demner-Fushman et al., 2011; Hersh, 2008). In biomedical text retrieval, medical concepts and entities are more informative than other common terms. Moreover, medical ontologies, thesauri and biomedical entity identifiers are available to identify medically related concepts.
Here we have used the UMLS resource, with which the following three query reformulation experiments are done. First, the UMLS concepts are identified from the query text and used together with the queries. Second, along with the UMLS concepts, MeSH (Medical Subject Headings) terms are also identified and used in the queries; MeSH is a hierarchically organized vocabulary within UMLS. Third, medical entities are identified manually and used with the queries. An example query with all these reformulations is presented in Appendix A.
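A rough sketch of concept based reformulation is shown below, with a tiny hand-made lexicon standing in for the UMLS Metathesaurus. Real systems would use MetaMap or a full UMLS installation; all lexicon entries here are illustrative.

```python
# Toy concept lexicon standing in for UMLS lookup: surface phrase -> concept.
CONCEPT_LEXICON = {
    "shortness of breath": "Dyspnea",
    "heart attack": "Myocardial Infarction",
    "high blood pressure": "Hypertension",
}

def reformulate(query):
    """Append the normalized concept name for every lexicon phrase found
    in the query text (longest phrases matched first)."""
    found = []
    low = query.lower()
    for phrase in sorted(CONCEPT_LEXICON, key=len, reverse=True):
        if phrase in low:
            found.append(CONCEPT_LEXICON[phrase])
    return query + " " + " ".join(found) if found else query

q = "58-year-old male with shortness of breath and high blood pressure"
print(reformulate(q))
```

The reformulated query keeps the original narrative and adds the identified concepts, so that concept terms and the original free text both contribute to retrieval.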
Table 5 shows the results for these reformulated queries on CDS 2014. PRF and RF based query expansion is also carried out on each form of the queries. The results show a clear improvement when using UMLS concepts in queries compared to the original queries.

Learning To Rank
Learning to rank (LTR) (Liu et al., 2009) is an application of machine learning to the construction of ranking models for information retrieval systems, where the retrieval problem is modeled as a ranking problem. The LTR framework requires training data consisting of queries, their matching documents, and the relevance degree of each match. The training data is used by a learning algorithm to produce a ranking model that computes the relevance of documents for new queries.
The LTR framework is applied to the CDS 2014 dataset, where the features for query-document pairs are computed similarly to those used in the OHSUMED LETOR dataset (Qin et al., 2010). These features are mainly based on TF, IDF and their normalized versions. Since the whole document pool is too large, document pooling has been done and the top K documents (by BM25) for each query are used for feature extraction. SVMRank has been used as the machine learning framework.
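The pairwise objective behind SVMRank can be illustrated with a minimal stand-in: a pairwise perceptron over toy TF/IDF-style features. This is a simplified sketch of the pairwise idea, not the SVMRank implementation used in the experiments, and the feature values are invented for illustration.

```python
def rank_perceptron(pairs, dim, epochs=20, lr=0.1):
    """Pairwise learning to rank: for each (better, worse) feature-vector
    pair, nudge w so the better document scores above the worse one."""
    w = [0.0] * dim
    for _ in range(epochs):
        for better, worse in pairs:
            margin = sum(wi * (b - c) for wi, b, c in zip(w, better, worse))
            if margin <= 0:               # pair misordered: update weights
                for i in range(dim):
                    w[i] += lr * (better[i] - worse[i])
    return w

def rank(docs, w):
    """Sort candidate feature vectors by descending learned score."""
    return sorted(docs, key=lambda d: -sum(wi * x for wi, x in zip(w, d)))

# toy features per candidate document: [tf-based score, idf-based score];
# each training pair says the first document is more relevant than the second
train_pairs = [([3.0, 2.0], [1.0, 1.0]), ([2.5, 1.5], [0.5, 2.0])]
w = rank_perceptron(train_pairs, dim=2)
print(rank([[1.0, 1.0], [3.0, 2.0], [0.5, 2.0]], w))
```

SVMRank optimizes essentially the same preference-pair constraints, but with a max-margin objective rather than perceptron updates.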
Table 6 shows the results of LTR when the features are computed on the Title+Abstract part of the documents and on Title+Abstract+Content (i.e. the full documents). All these LTR experiments require human judgements for training. To avoid the need for manual judgement, pseudo judgements were considered: out of the k training documents, the top k/2 documents are considered relevant and the other k/2 non-relevant.
As shown in Table 7, the results of LTR trained using pseudo qrels are better than those obtained with the actual human-judged qrels, but the difference is not statistically significant. The results of LTR are comparable to retrieval using BM25.

Future Research Directions
Biomedical text processing and information retrieval, being a relatively new field of research, opens up many research directions. In this article, we have presented a preliminary study of statistical and NLP based biomedical document retrieval techniques for clinical decision support systems. It covered a query reformulation based information retrieval framework with pseudo relevance feedback, relevance feedback, feedback document discovery and UMLS concept based reformulation for the biomedical domain. The standard IR frameworks PRF and RF work well enough for a Clinical Decision Support System. Feedback document discovery based query reformulation, a statistical approach, can be improved in the future for significant gains. Another statistical model, Learning to Rank, also has scope for further improvement. The initial framework for the NLP based approach, UMLS concept based retrieval, also shows improvement in the results. Therefore, we plan to combine the statistical and NLP based approaches into a new, better model for biomedical document retrieval for decision support systems. We also plan to perform NLP based feature weighting at the entity level in the feedback document discovery approach.

Figure 1: Query-wise difference graph of infNDCG for feedback document discovery and relevance feedback

Table 3: Results of Query Expansion with PRF and RF

Terrier is developed at the School of Computing Science, University of Glasgow. We have used the Terrier platform for the experiments. The summary part of each query is used for retrieval, with the top 10 and top 50 retrieved documents used for feedback in expansion. MAP and infNDCG are used as evaluation metrics (Manning et al., 2008).

Table 4: Results of Feedback Document Discovery

Table 5: Results of UMLS based query reformulation. One more observation: for no-expansion and PRF, the manually identified entities fail to improve MAP compared to the UMLS concepts, but certainly give better results in terms of infNDCG.
With these variations of features, the experiments are carried out on original queries, queries with UMLS concepts, and queries with manually identified medical concepts.