Head Concepts Selection for Verbose Medical Queries Expansion

Semantic concepts and relations encoded in domain-specific ontologies and other medical semantic resources play a crucial role in deciphering terms in medical queries and documents. The exploitation of these resources for tackling the semantic gap issue has been widely studied in the literature. However, several challenges hinder their widespread use in real-world applications. Among these challenges is the insufficient knowledge encoded in each individual medical ontology, a problem that is magnified when users express their information needs using long-winded natural language queries. In this context, many of the users' query terms are either not recognized by the ontologies in use or lead to the retrieval of false positives, which degrades the quality of current medical information search approaches. In this article, we explore the combination of multiple extrinsic semantic resources in the development of a full-fledged medical information search framework to: i) highlight and expand head medical concepts in verbose medical queries (i.e. concepts among query terms that significantly contribute to the informativeness and intent of a given query), ii) build semantically enhanced inverted index documents, and iii) support a heuristic weighting technique in the query-document matching process. To demonstrate the effectiveness of the proposed approach, we conducted several experiments on the CLEF e-Health 2014 dataset. The findings indicate that the proposed method, which combines several extrinsic semantic resources, is more effective than related approaches in terms of precision.


I. INTRODUCTION
Contrary to generic web search queries, which tend to be short [1], medical queries are long-winded, with a reported average length of five terms in the query log of an Electronic Health Record search engine (footnote 1: http://project-emerse.org/). Furthermore, processing them with statistical techniques alone appears insufficient, since they encompass several domain-specific medical concepts [2], [3] whose interpretation requires extrinsic knowledge [4]. This poses a crucial challenge for Medical Information Retrieval (MIR) systems, which aim to match medical documents with their corresponding queries in the same domain [5]-[7], and motivates the use of language resources to further expand these queries. Recently, MIR systems have shifted to exploiting medical semantic resources and ontologies in an attempt to capture domain knowledge by formally and explicitly defining medical concepts, instances, and the semantic and taxonomic relations that link related concepts. Several examples of these resources can be found on the BioPortal website (footnote 2). However, despite their constant growth, current medical semantic resources still provide insufficient domain coverage in both breadth and depth (i.e. they formally encode domain conceptualizations at different granularity levels) [8]-[10]. This limitation is mainly attributed to the fact that they are developed by experts who adopt different standards and use various languages to describe them [11]. Indeed, the incompleteness of the captured semantic information can substantially affect the quality of systems that rely on them [12]. MIR systems also face another important challenge that has a major impact on their effectiveness.
This challenge stems from the diversity of users, their information needs and their background knowledge in the medical domain [6], [13], [14]. How each of these challenges is addressed shapes the way medical query processing and expansion techniques are developed, and directly affects the quality of the results retrieved by MIR systems [15]-[18]. Starting from this position, we propose a semantics-based MIR system that aims to improve the quality of the returned results by incorporating multiple medical semantic resources and query expansion techniques. In particular, we use the UMLS Metathesaurus, a large-scale biomedical thesaurus that provides explicit specifications of biomedical knowledge: concepts classified by semantic type, together with the hypernymy-hyponymy relation and other non-hierarchical relationships among them. We use two resources to exploit the UMLS Metathesaurus in our proposed system:
1. The MetaMap tool, which maps biomedical text to the UMLS Metathesaurus. It locates all UMLS concepts associated with terms in biomedical text using knowledge-intensive methods based on symbolic, natural language processing and computational linguistic techniques.
2. The MRDEF relational table, which contains UMLS concept definitions from multiple medical semantic resources. In our approach, we use the 'MSH' source, which is derived from the Medical Subject Headings (MeSH) thesaurus and contains 29,244 different concepts.
We furthermore consider the UMLS SPECIALIST lexicon (a.k.a. the UMLS lexicon) provided by the National Library of Medicine (NLM). This lexicon is one of the richest available sources of medical lexical information and has been employed for analyzing medical text. In our work, the UMLS lexicon is used to carry out the following tasks:
1. Extract medical acronyms and abbreviations from user queries.
2. Expand the extracted acronyms and abbreviations into their full representations and use them to reformulate the original queries.
3. Expand medical terms in the original queries by finding their related medical synonyms in the exploited medical semantic resource.
When a user submits a medical query, the proposed system analyzes it to identify head concepts as well as other supportive query terms, and enriches them with semantically relevant terms derived from the collective integration of results from the used resources. The expanded queries are then matched with their corresponding medical documents. For each medical document, a semantics-based inverted index is automatically constructed using the same medical resources that we employ for enriching user queries. As a result, matching is performed at the semantic level, and medical documents are ranked according to their semantic closeness to the queries. The main contributions of our proposal are summarized as follows:
1. Employing multiple medical semantic resources for:
a. Identifying and enriching head concepts of medical queries with semantically and taxonomically related terms. Unlike conventional methods that enrich queries using individual semantic resources, we employ multiple resources that collectively suggest enrichment candidates.
b. Constructing semantically enriched inverted files that encode the latent semantic information within the content of medical documents. Rather than relying on representative keywords only, our method constructs indexes comprising additional semantic dimensions that are used for retrieval.
2. Identifying and re-weighting medical query terms based on the employed semantic resources. Medical terms are assigned higher weights than other supportive terms. We demonstrate the importance of this step and its impact on the quality of the proposed system in the experimental evaluation section.
The rest of this article is organized as follows. Section II reviews related literature. Section III describes the overall organization of the proposed framework and details the query and document processing steps, as well as the matching and weighting techniques. Section IV presents the empirical setup and the results of the conducted experiments. Section V draws conclusions and outlines future extensions of our current work.
The associate editor coordinating the review of this manuscript and approving it for publication was Vlad Diaconita.

II. RELATED WORK
The use of Natural Language Processing (NLP) techniques and medical semantic resources for processing medical queries has been at the heart of MIR systems for years [4], [6], [7], [13]-[30]. For instance, Zhu and Carterette proposed a medical record search system for identifying the cohorts required in clinical studies [28]. To do so, the authors employed a query-adaptive weighting method that dynamically aggregates and scores evidence across multiple medical reports. They proposed using a number of features, such as query length, the number of concepts in the query, and broad/narrow query concepts, to assign weights to the medical concepts in the supplied queries. Medical concepts are detected using MetaMap [31], a medical NLP tool developed by the National Library of Medicine (NLM) to map biomedical text to concepts in the Unified Medical Language System (UMLS) Metathesaurus. Using cross-validation, the authors showed that their weighting technique outperformed a fixed-weighting method across several evaluation metrics. However, according to the authors, the improvement was not statistically significant, and the method could be further improved by incorporating other useful features or more advanced prediction models. In light of this, we also point out that using the UMLS Metathesaurus alone for mapping medical concepts is not sufficient, due to the limited domain coverage of this knowledge base. In a similar line of research, Martinez et al. proposed to automatically expand medical queries based on the concepts and relations included in the UMLS [29]. Their query expansion method relied on the Personalized PageRank algorithm, which runs over a graph representation of the UMLS structure.
The algorithm initializes the probability distribution over the UMLS graph with the terms highlighted in the query in order to identify related terms, which can then be used to expand the query and improve the retrieval of relevant medical documents. To demonstrate the effectiveness of their approach, the authors conducted experiments on the TREC Medical Records track, showing improvements over baseline methods on both the 2011 and 2012 datasets. However, we argue that despite this improvement, relying solely on the UMLS knowledge base is not sufficient, mainly because of the domain knowledge incompleteness problem [8], [14], which the authors themselves acknowledge.
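The Personalized PageRank idea used by Martinez et al. can be illustrated with a generic power-iteration sketch. This is not their implementation: the graph below is a hypothetical toy graph rather than the UMLS, and the damping factor (0.85) and iteration count are assumed defaults. The restart distribution is concentrated on the query concepts, so nodes close to them accumulate the most probability mass and become natural expansion candidates.

```python
from typing import Dict, Iterable, List


def personalized_pagerank(
    graph: Dict[str, List[str]],
    seeds: Iterable[str],
    damping: float = 0.85,
    iterations: int = 50,
) -> Dict[str, float]:
    """Power-iteration Personalized PageRank.

    graph maps every node to its list of out-neighbours (each neighbour must
    also be a key). The restart distribution is uniform over the seed nodes,
    so probability mass concentrates around the query concepts.
    """
    nodes = list(graph)
    seed_set = set(seeds)
    teleport = {n: (1.0 / len(seed_set) if n in seed_set else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            neighbours = graph[n]
            if neighbours:
                share = damping * rank[n] / len(neighbours)
                for m in neighbours:
                    new_rank[m] += share
            else:
                # Dangling node: send its mass back through the restart vector.
                for m in nodes:
                    new_rank[m] += damping * rank[n] * teleport[m]
        rank = new_rank
    return rank


# Hypothetical toy concept graph (not taken from the UMLS).
TOY_GRAPH = {
    "mrsa": ["staphylococcus aureus", "methicillin"],
    "staphylococcus aureus": ["mrsa", "bacterial infection"],
    "methicillin": ["mrsa"],
    "bacterial infection": ["staphylococcus aureus"],
    "prostate": ["prostate cancer"],
    "prostate cancer": ["prostate"],
}

if __name__ == "__main__":
    scores = personalized_pagerank(TOY_GRAPH, seeds=["mrsa"])
    candidates = sorted((n for n in scores if n != "mrsa"), key=scores.get, reverse=True)
    print(candidates[:2])  # ['staphylococcus aureus', 'methicillin']
```

Concepts in a disconnected component (here the prostate nodes) receive no mass at all, which mirrors why expansion quality depends on how well the underlying graph covers the query concepts.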
In a recent work [7], the authors proposed an automatic medical query expansion method that starts by identifying key terms (i.e. the most effective candidate expansion terms among the query terms) to be used in the matching and retrieval process. To identify key terms, the authors re-used the method proposed in their previous work [4]. Using this method, they located all contexts in the original document collections that matched the contexts of the key terms in verbose queries. Although the method proved efficient at the matching task, its effectiveness was hindered by the following facts. First, as stated by the authors, query terms can be single terms or phrases. The authors referred to both types as key terms without considering their semantic dimensions, which could affect the overall quality of the method. For instance, they did not consider the semantic relations that may exist between key terms, and they ignored synonyms as well as other terms lexically related to each key term. Second, locating all contexts in the document collections may lead to retrieving many false positives, as the method relied on keyword overlap between the extracted contexts.
Stanton et al. explored scenarios in which users express their information needs using many words to describe a certain symptom [13]. To do this, they proposed a supervised machine learning approach to link terms in the given queries to their corresponding medical concepts. In their work, they first obtained the formal definitions of diseases from medical semantic resources in an attempt to reformulate queries by incorporating the derived medical concepts and their definitions. Although the approach improved the mapping of symptoms to the relevant disease(s), the authors ignored other query term types (those that do not denote symptoms or diseases), such as laboratory tests and medical devices. In a similar line of research, Shen et al. proposed a bag-of-concepts model that identifies medical concepts in user queries by exploiting medical knowledge resources [19]. To retrieve medical documents, they used the selected concepts and their mapped entities in the used resources. However, the approach was hindered by two obstacles. First, all non-medical query terms were ignored in the retrieval process. Second, due to limited domain coverage, many concepts highlighted in user queries were not recognized by the medical knowledge resources used. In a similar work, Choi and Choi proposed a concept-based query expansion model using selective query concepts [20]. The discharge summary reports (i.e. the resources from which the queries were built) of the CLEF eHealth 2014 dataset and the UMLS were used to extract and expand medical concepts; concepts that did not appear in the discharge summary reports were ignored. The system yielded only minimal improvement in the quality of the results, for two reasons. First, the authors did not consider compound terms and stopwords that may occur in the queries and medical documents.
Second, they restricted the expansion scope to the query-related discharge summary reports provided in the dataset, even though these reports were provided as example results only. On the other hand, Goeuriot et al. focused on using local resources for query reformulation rather than external medical semantic resources and NLP techniques [30]. The authors used a Pseudo Relevance Feedback (PRF) model for query reformulation, in which terms occurring in the top-k documents retrieved by the system in its initial run were selected as expansion candidates. In addition, they incorporated the medical concepts that appeared in the discharge summary of each query. The main limitation of this approach was its reliance on resources containing a limited number of medical concepts; as a result, many medical concepts could not be mapped to their corresponding terms in the given queries. To overcome the shortcomings associated with limited local resources, Zuccon et al. proposed using external data sources [22]. The authors analyzed the results retrieved by two commercial web search engines (Google and Bing) for a set of queries formulated by laypeople to describe medical symptoms.
The authors found that only three of the top ten results retrieved by both search engines were marked as relevant, and concluded that existing commercial search engines perform poorly in specific domains requiring expert knowledge, such as the medical domain. Table 1 summarizes the representative research works discussed herein.
VOLUME 8, 2020
To address the limitations discussed above, we propose combining multiple medical semantic resources and query expansion techniques in a single MIR framework.
Our aim is to bridge the semantic gap between medical queries and their corresponding medical documents. Inspired by the strengths of previous approaches, we exploit trusted and well-recognized extrinsic medical resources, i.e. the UMLS lexicon and the UMLS Metathesaurus. Rather than using a limited resource, such as the previously introduced discharge summary reports, or a generic source, such as web search engines, we use a combination of medical knowledge bases (e.g. MeSH, SNOMED, RxNorm) for semantic concept highlighting and expansion. In addition, we propose a heuristic approach for re-weighting medical query terms based on their mappings to relevant medical concepts in the used resources, and we experimentally demonstrate its impact. Figure 1 depicts a block diagram of the main components of our proposed system and their interactions. In the remainder of this section, we detail: a) the linguistic pre-processing step, b) the semantic processing of queries and their UMLS-based expansion, c) the semantic processing of documents and the construction of a semantics-based inverted index, and d) the similarity computation between queries and indexed documents, together with the heuristic weighting strategy.

A. QUERY PROCESSING & UMLS-BASED EXPANSION
Medical queries are verbose natural language queries that are inherently ambiguous and contain many terms that are hard to resolve. The fundamental goal is therefore to decipher their intent by highlighting and expanding their key components while removing less important entities that would harm retrieval performance. As depicted in Figure 1, a query is pre-processed through a sequence of NLP steps, including n-gram tokenization and stopword removal, in order to filter out irrelevant terms [32]. Instead of relying only on a manually constructed stopword list, as proposed in [1], we use a term weighting scheme based on the inverse document frequency [33]. In addition to the pre-defined stopword list, this scheme automatically builds a list of additional stopwords whose weights fall below a threshold value v, where term weights are obtained using Equation 1:
$$ idf_t = \log \frac{N}{df_t} \tag{1} $$
where:
• $idf_t$ is the inverse document frequency of a term t;
• N is the total number of documents in the dataset;
• $df_t$ is the document frequency of t, i.e. the number of documents containing t, which is an inverse measure of the informativeness of t.
For the sake of generalization, the threshold value v is determined automatically as follows. For each term t that belongs to a document d, we obtain $idf\_list = \{idf_{t_1}, idf_{t_2}, idf_{t_3}, \ldots, idf_{t_n}\}$ and find the maximum difference among its elements:
$$ v = \max_{i,j} \left( idf_{t_i} - idf_{t_j} \right) $$
All terms with weights below v are then automatically added to the stopword list. After this filtering step, a query q is represented as:
$$ q = \{t_1, t_2, \ldots, t_n\} \tag{2} $$
where each $t_i$ belongs to one of the following categories [14]:
• Medical Terms, i.e. terms that can be mapped to medical concepts in the exploited medical semantic resources (e.g. the medical term aortic).
• Acronyms, e.g. ARV, which may stand for 'Adelaide River Virus', 'Average Rectified Value' or other medical terms.
• Supportive Terms, defined as any other query terms that cannot be classified as acronyms, abbreviations, or medical terms (e.g. replacement, status).
To perform the category mapping step, we define a sliding window of length n = 3 for finding uni-, bi- and trigram tokens among query terms. This task is performed with the assistance of the exploited medical semantic resources, rather than conventional methods based on statistical information such as term frequency and inverse document frequency [34], residual inverse document frequency and weighted information gain, or Google n-gram term and query frequency [22]. We submit all n-gram tokens to the UMLS lexicon [35] in order to classify them into the four categories. In addition, we use the MetaMap tool to detect synonymous medical terms, based on the UMLS Metathesaurus, for any terms that fall under the first three categories; the Metathesaurus includes data from MeSH, SNOMED, RxNorm, and other collections [36].
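The IDF-based extension of the stopword list described at the start of this subsection can be sketched as follows. This is a minimal sketch under assumptions: documents are already tokenized, the logarithm is base 10 (the text does not fix the base), and idf_list is read as the idf values of the whole collection vocabulary. Under that reading, the maximum pairwise difference v is simply the largest idf minus the smallest one. Function names are illustrative.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Set


def idf_scores(docs: List[List[str]]) -> Dict[str, float]:
    """Equation 1: idf_t = log(N / df_t), with df_t counted once per document."""
    n_docs = len(docs)
    df: Counter = Counter()
    for tokens in docs:
        df.update(set(tokens))
    # Base-10 logarithm is an assumption; the text does not state the base.
    return {term: math.log10(n_docs / count) for term, count in df.items()}


def idf_threshold(idf: Dict[str, float]) -> float:
    """v = max over i, j of (idf_ti - idf_tj), i.e. the largest minus the smallest idf."""
    values = list(idf.values())
    return max(values) - min(values)


def auto_stopwords(docs: List[List[str]], predefined: Iterable[str] = ()) -> Set[str]:
    """Pre-defined stopwords plus every term whose idf falls strictly below v."""
    idf = idf_scores(docs)
    v = idf_threshold(idf)
    return set(predefined) | {term for term, weight in idf.items() if weight < v}
```

For the toy collection [["patient", "with", "aortic"], ["patient", "with", "mrsa"], ["patient", "with", "stent"], ["patient", "status"]], `auto_stopwords` returns {"patient", "with"}: "patient" occurs in every document (idf 0) and "with" in three of four, while the rarer terms reach exactly v and are kept.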
The query processing scenario based on multiple semantic resources is formalized in Algorithm 1. To demonstrate these steps, we consider two example medical queries, q1 and q2, obtained from two different datasets (CLEF e-Health 2014 and TREC).
After stopword filtering, query terms are stemmed using the Porter stemmer [37]. We then use n-gram tokenization to produce lists of unigrams (Ug), bigrams (Bg) and trigrams (Tg). The output for the example queries is shown in Table 2.
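The sliding-window n-gram extraction (n up to 3) and the subsequent category mapping can be sketched as below. The lexicon sets are hypothetical stand-ins for UMLS lexicon and MetaMap lookups, tokens are shown unstemmed for readability, and the order in which categories are checked is an assumption, since the text does not say how a token matching several categories is resolved.

```python
from typing import Dict, List

# Hypothetical toy lexicons standing in for UMLS lexicon / MetaMap lookups.
ACRONYMS = {"mrsa", "arv"}
ABBREVIATIONS = {"hx"}
MEDICAL_TERMS = {"aortic", "aortic valve", "aortic valve replacement"}


def ngrams(tokens: List[str], max_n: int = 3) -> Dict[int, List[str]]:
    """Slide windows of length 1..max_n over the tokens (Ug, Bg, Tg when max_n = 3)."""
    return {
        n: [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        for n in range(1, max_n + 1)
    }


def categorize(token: str) -> str:
    """Assign one of the four query-term categories (check order is an assumption)."""
    if token in ACRONYMS:
        return "acronym"
    if token in ABBREVIATIONS:
        return "abbreviation"
    if token in MEDICAL_TERMS:
        return "medical term"
    return "supportive"


def categorize_query(tokens: List[str]) -> Dict[str, str]:
    """Categorize every uni-, bi- and trigram extracted from a pre-processed query."""
    return {gram: categorize(gram) for grams in ngrams(tokens).values() for gram in grams}
```

For the tokens ["aortic", "valve", "replacement", "status"], the trigram "aortic valve replacement" is labeled a medical term while "status" stays supportive; in the actual system these lookups would be answered by the UMLS resources rather than hand-filled sets.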
The UMLS lexicon is used to extract and expand medical acronyms and abbreviations in the user's query through its ACRONYM table. We also use the UMLS lexicon to find synonyms of all medical query terms through its LEXSYNONYM table. The query is then enhanced by incorporating the full representations of all extracted acronyms and abbreviations, as well as the extracted synonyms. The resulting lists are presented in Table 3. In addition, we employ the MetaMap tool, which maps tokens to the UMLS Metathesaurus. It locates all UMLS concepts associated with terms in biomedical text using the knowledge-intensive method based on symbolic, NLP and computational linguistic techniques detailed in [31]. The results of this step are the lists of medical terms Mt1 and Mt2 described below:
• For q1
Unlike conventional approaches that use the bag-of-words model to index medical documents, we construct an inverted index that stores medical terms together with their semantically relevant terms obtained from the exploited medical semantic resources. As a language resource pre-processing step, we first apply the Jsoup parser (footnote 4: https://jsoup.org/) to clean and extract the textual content of the medical documents, which are provided as raw HTML web pages. We then lowercase the text and remove stopwords as described in the previous section. Next, the Porter stemmer is used to stem each remaining term, and all stemmed terms are added to the inverted index. If a term represents a medical acronym or abbreviation, all of its full representations are added to the index document. Similarly, all compounds, i.e. bi- and trigram terms, and their acronyms are also added. The automatic construction of the inverted index documents is summarized in Algorithm 2.
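Query expansion with acronym full forms and synonyms, and the semantics-enriched inverted index, can be sketched together because both draw on the same lookups. This is not Algorithm 2 itself but an illustration under stated assumptions: the dictionaries are hypothetical, hand-filled stand-ins for the UMLS lexicon ACRONYM and LEXSYNONYM tables and for the compound terms recognized by the resources; documents are assumed to be already cleaned, lowercased and tokenized; and stemming is omitted for readability.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical stand-ins for UMLS lexicon tables; real entries would come from UMLS files.
ACRONYM_TABLE: Dict[str, List[str]] = {
    "mrsa": ["methicillin resistant staphylococcus aureus"],
}
SYNONYM_TABLE: Dict[str, List[str]] = {
    "localized prostate cancer": ["malignant neoplasm of prostate"],
}
# Hypothetical recognized compounds (bi-/trigrams) mapped to their acronyms, if any.
KNOWN_COMPOUNDS: Dict[str, List[str]] = {
    "aortic valve": [],
    "aortic valve replacement": ["avr"],
}


def expand_query(terms: List[str]) -> List[str]:
    """Append acronym full forms and synonyms after the original query terms."""
    expanded = list(terms)
    for term in terms:
        for extra in ACRONYM_TABLE.get(term, []) + SYNONYM_TABLE.get(term, []):
            if extra not in expanded:
                expanded.append(extra)
    return expanded


def index_terms(tokens: List[str]) -> Set[str]:
    """Index entries for one document: unigrams, acronym full forms, compounds and their acronyms."""
    terms = set(tokens)
    for tok in tokens:
        terms.update(ACRONYM_TABLE.get(tok, []))
    for n in (2, 3):
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            if gram in KNOWN_COMPOUNDS:
                terms.add(gram)
                terms.update(KNOWN_COMPOUNDS[gram])
    return terms


def build_index(docs: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Inverted index mapping each (semantic) term to the ids of documents containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, tokens in docs.items():
        for term in index_terms(tokens):
            index[term].add(doc_id)
    return dict(index)
```

With this design a query mentioning "mrsa" and a document that only spells out "methicillin resistant staphylococcus aureus" (or the reverse) can share index entries, which is the kind of semantic-level match the index is meant to enable.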

C. SIMILARITY COMPUTATION & HEURISTIC BASED WEIGHTING
To determine the similarity between the expanded queries and the indexed medical documents, we use a standard vector space model [38] to demonstrate the benefits of our semantics-based processing techniques. It employs the tf-idf weighting scheme to assign a weight to each term t in a document d. In our approach, we use $\text{Normalized-}tf_{t,d}$, in which term occurrences are normalized to prevent a bias towards longer documents (which may have a higher term count regardless of the actual importance of that term in the document), to measure the importance of term t within a particular document d:
$$ \text{Normalized-}tf_{t,d} = \frac{tf_{t,d}}{|d|} \tag{3} $$
where $tf_{t,d}$ is the number of occurrences of term t in d, and |d| is the length of document d.
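The length normalization (the raw count of t in d divided by |d|) and the resulting tf-idf weights can be sketched as follows, assuming |d| is the number of tokens remaining after pre-processing and that idf values have already been computed with Equation 1. Names are illustrative.

```python
from collections import Counter
from typing import Dict, List


def normalized_tf(doc: List[str]) -> Dict[str, float]:
    """tf_{t,d} / |d|, with |d| taken as the number of tokens in d."""
    counts = Counter(doc)
    length = len(doc)
    return {term: count / length for term, count in counts.items()}


def tfidf_vector(doc: List[str], idf: Dict[str, float]) -> Dict[str, float]:
    """Weight each document term by normalized tf times idf (terms without an idf get 0)."""
    return {term: tf * idf.get(term, 0.0) for term, tf in normalized_tf(doc).items()}
```

For the document ["mrsa", "infection", "mrsa", "wound"], "mrsa" gets a normalized tf of 0.5 and the other two terms 0.25 each, regardless of how long the surrounding collection documents are.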
We furthermore argue, heuristically, that medical terms, their synonyms, abbreviations and acronyms are more informative, essential for retrieval, and contribute more to the meaning of the query. We therefore assign these terms higher weights than other terms: expansion terms representing the full forms of medical acronyms and abbreviations receive lower weights, and supportive terms receive the lowest weights among all query terms. We use a heuristic formula, Formula (4), to compute the occurrences of query terms, $tf_{t,q}$, giving higher weights to medical terms, acronyms, abbreviations and medical synonyms than to supportive terms and other semantically related concepts added to the query, where |q| is the length of the original query and |rq| is the length of the reformulated query. In Formula (4), medical synonyms in the expanded query receive the same weight as the original query terms they are synonymous with. For example query q2, the term 'localized prostate cancer' and its synonym 'malignant neoplasm of prostate' have the same weight. In contrast, we reduce the weight of all other terms that are semantically related to the original query terms without being synonymous. For example query q1, the term 'mrsa' is given a higher weight than its full representation 'methicillin resistant staphylococcus aureus'. Algorithm 3 assigns weights to query terms based on their category; its results for queries q1 and q2 are compiled in Table 4. After computing the tf-idf values for query and document terms, the cosine similarity model is used to measure the semantic similarity between document $\vec{d}$ and reformulated query $\vec{rq}$:
$$ \cos(\vec{d}, \vec{rq}) = \frac{\vec{d} \cdot \vec{rq}}{\lVert \vec{d} \rVert \, \lVert \vec{rq} \rVert} \tag{5} $$
The scoring function returns a list of relevant medical documents ranked in descending order of similarity, starting from the most relevant document (i.e. the result with the highest similarity score).
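Since Formula (4) itself is not reproduced in this text, the following sketch only illustrates the ranking pipeline with hypothetical category weights that respect the stated ordering: medical terms, acronyms, abbreviations and their synonyms ("core") first, full-form and other related expansions next, supportive terms last. These weights are multiplied by idf and documents are ranked by cosine similarity in descending order. The category labels and numeric weights are illustrative assumptions, not the authors' values.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical placeholder weights. They only encode the ordering stated in the text
# (core > expansion > supportive); they are NOT Formula (4).
CATEGORY_WEIGHT = {"core": 1.0, "expansion": 0.5, "supportive": 0.25}


def query_vector(term_categories: Dict[str, str], idf: Dict[str, float]) -> Dict[str, float]:
    """Map each reformulated-query term (tagged 'core', 'expansion' or 'supportive') to a weight."""
    return {t: CATEGORY_WEIGHT[c] * idf.get(t, 0.0) for t, c in term_categories.items()}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 when either vector is all zeros."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query_vec: Dict[str, float], doc_vecs: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Score every document and sort in descending order of cosine similarity."""
    scores = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

With equal idf values, a document matching only "mrsa" outranks one matching only its full form, which in turn outranks one matching only a supportive term, mirroring the q1 example above.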

IV. EXPERIMENTAL SETUP AND EVALUATION
A. EXPERIMENTAL SETUP
To carry out experiments and evaluate the quality of our proposed head concepts selection approach, we used the CLEF e-Health 2014 medical dataset, which comprises verbose queries together with their relevance judgments. As in [1], we chose this dataset rather than the TREC queries because CLEF queries are less artificial than TREC queries and more informative than queries obtained from a web search query log, where users often provide short queries with few keywords to express their information needs. The components of the dataset are:
• Medical documents acquired automatically from various medical websites, including pages certified by the Health On the Net (footnote 5) and other well-known medical databases [30]. The dataset comprises around one million semi-structured medical documents in HTML format, distributed over eight .zip files; each .zip file contains multiple .dat files covering different medical topics. Each .dat file contains multiple documents in the following format, as depicted in Figure 2:
  - #UID: unique identifier of the document
  - #DATE: date the document was obtained
  - #URL: URL of the source of the document
  - #CONTENT: raw HTML content of the web page
• Verbose medical queries, divided into a set of five training queries and a set of fifty test queries, created by experts (i.e. registered nurses and clinical documentation researchers) involved in the CLEF e-Health consortium. Queries are created based on the main disorders diagnosed in a set of selected patients' discharge summaries. As depicted in Figure 3, queries have a standard format that includes the following elements:
  - id: unique identifier of the query
  - discharge_summary: the resource from which the query was built
  - title: a short version of the user query
  - desc: the verbose form of the query in the title field
  - profile: a brief description of the patient who submitted the query
  - narr: expected content of relevant documents
• Relevance judgments collected from professional assessors using Relevation (footnote 6), a system designed to record relevance judgments for information retrieval evaluation [39]. It provides a web interface through which judges can upload their documents, queries and relevance assessments. Relevance is graded on a four-point scale from 0 to 3. A grade of 0 means that a document is irrelevant to a given query, while 1 refers to a document that is on topic but deemed unreliable. Grades 2 and 3 denote documents that are relevant to the given query, with 3 assigned to the most relevant documents. These grades are mapped onto a binary scale: grades 0 and 1 correspond to the binary grade 0 (irrelevant), and grades 2 and 3 correspond to the binary grade 1 (relevant).
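The mapping from graded to binary relevance is simple enough to state in code. The qrels line layout assumed below (TREC-style "<query_id> <iteration> <doc_id> <grade>") and the document identifiers in the usage are assumptions for illustration, not a description of the distributed files.

```python
from typing import Dict, Iterable, Set


def binarize(grade: int) -> int:
    """Grades 0 and 1 -> 0 (irrelevant); grades 2 and 3 -> 1 (relevant)."""
    if grade not in (0, 1, 2, 3):
        raise ValueError(f"unexpected relevance grade: {grade}")
    return 1 if grade >= 2 else 0


def relevant_docs(qrels_lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Collect binary-relevant documents per query.

    Assumes TREC-style lines: '<query_id> <iteration> <doc_id> <grade>'.
    """
    relevant: Dict[str, Set[str]] = {}
    for line in qrels_lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        query_id, _, doc_id, grade = parts
        if binarize(int(grade)):
            relevant.setdefault(query_id, set()).add(doc_id)
    return relevant
```

For example, the lines "qtest2014.1 0 doc_a 3" and "qtest2014.1 0 doc_b 1" yield {"qtest2014.1": {"doc_a"}}, since an on-topic but unreliable document (grade 1) counts as irrelevant.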
To experimentally validate our proposal and evaluate the quality of the produced results, we performed several runs, as described below:
1. Incremental runs, ranging from a baseline run that uses only a standard inverted index to a final run that incorporates all the techniques covered in this article, including the use of extrinsic semantic resources and heuristic term re-weighting. The baseline run establishes a reference point before any modifications are made, so that we can demonstrate the quantitative improvements brought about by enriching the search framework with the processing modules described in Section III.
2. Several comparative runs, allowing us to compare the results produced by our system with those of three state-of-the-art systems on the same dataset.
For the incremental runs, we executed five different runs. First, the baseline RUN 1 used only the primitive inverted index and basic NLP techniques for both query and document processing. Second, in RUN 2, we re-indexed the document collection by incorporating compound medical terms and their acronyms and abbreviations, using both the UMLS lexicon and the UMLS Metathesaurus. For query processing, we identified both acronyms and abbreviations and included unigram tokens only; the full representations of the acronyms and abbreviations were then added to the expanded query. Next, we performed three additional runs built on RUN 2. In RUN 3, we incorporated compound terms in addition to their acronyms and abbreviations. In RUN 4, we used the UMLS Metathesaurus via the MetaMap tool for categorical classification (cf. Section III.A): we extracted medical terms and other supportive query terms, and assigned weights based on their category as discussed in Section III.C. Finally, in RUN 5, we used the UMLS lexicon to expand query terms by adding their synonyms recognized by the exploited semantic resources. These synonyms were assigned higher weights than other semantically relevant terms, based on Formula (4). Table 5 briefly summarizes each run, along with the parameters used and their role in each run. As in the evaluation of comparable MIR systems, we used the Precision@10 (P@10) metric to assess the performance of our medical search framework. This metric is among the most commonly used in web-scale information retrieval. P@10 corresponds to the proportion of relevant results among the first page of results (the top 10 documents) retrieved by the system. Formally, precision at cutoff k is defined as follows, with k = 10:
$$ P@k = \frac{|\{\text{relevant documents}\} \cap \{\text{top-}k\text{ retrieved documents}\}|}{k} $$
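A minimal P@10 computation can be sketched as follows, assuming a ranked list of document identifiers per query and a set of binary-relevant documents per query (grades 2 and 3). Averaging over queries to obtain one score per run is our assumption about how a single figure per run is reported; names are illustrative.

```python
from typing import Dict, List, Set


def precision_at_k(ranked: List[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k retrieved documents that are relevant.

    The denominator stays k even if fewer than k documents were retrieved,
    a common convention (as in trec_eval's P_10).
    """
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / k


def mean_precision_at_k(runs: Dict[str, List[str]], qrels: Dict[str, Set[str]], k: int = 10) -> float:
    """Average P@k over all queries in a run (queries without judgments score 0)."""
    scores = [precision_at_k(docs, qrels.get(qid, set()), k) for qid, docs in runs.items()]
    return sum(scores) / len(scores) if scores else 0.0
```

If two of the first ten documents returned for a query are relevant, its P@10 is 0.2, even when an eleventh relevant document is retrieved further down the list.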

B. EXPERIMENTAL RESULTS
As depicted in Figure 4, for RUN 2, among the 55 test and training queries, one query (qtest2014.27) yielded lower-quality results than in the baseline run, 3 queries (qtest2014.3, qtest2014.433 and qtest2014.47) yielded better results, and the remaining 51 queries produced results equal to those of the baseline run. The lower quality of the qtest2014.27 results is mainly due to the fact that this query contains an acronym with multiple full representations, which leads to retrieving documents that are irrelevant to the query context. Figure 5 compares the results of RUN 3 with those of the baseline run. Here, 2 queries (qtest2014.24 and qtest2014.27) obtain lower precision than in the baseline run, 4 queries (qtest2014.3, qtest2014.15, qtest2014.433 and qtest2014.47) yield higher precision, and the remaining 49 queries show equal precision. The low precision of the qtest2014.24 and qtest2014.27 queries stems from the fact that they contain bi- and trigrams that can be related to many contexts other than the query context. They therefore returned a significant number of irrelevant documents, thereby decreasing the precision of the system.
The most significant improvement was achieved in RUN 4. As Figure 6 shows, 11 of the 55 queries produced more precise results than their counterparts in the baseline run. This is mainly due to the proposed heuristical re-weighting technique discussed in Section III.C, in which query terms were classified and assigned different weights based on the exploited medical resources (i.e. the UMLS Metathesaurus via the MetaMap tool and the MRDEF table). This run shows that it is important to distinguish between query terms and to assign higher weights to those belonging to the Acronym, Abbreviation, or Medical term categories. The results demonstrate that identifying terms in these categories and enriching them with additional semantically-relevant medical terms produces more precise retrieval results, which reduces the semantic gap between user queries and their corresponding medical documents in the dataset.
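The category-based re-weighting behind RUN 4 can be illustrated as follows. This is a sketch only: the category labels mirror those named above, but the numeric weights in the hypothetical `CATEGORY_WEIGHTS` table are placeholders rather than the values derived in Section III.C, and the MetaMap/MRDEF categorization step is assumed to have already produced (term, category) pairs.

```python
from typing import Dict, List, Tuple

# Hypothetical placeholder weights: the real values come from the Section III.C
# heuristic and are not reproduced here. Only the ordering matters for the sketch:
# acronyms, abbreviations and medical terms outweigh supportive terms.
CATEGORY_WEIGHTS: Dict[str, float] = {
    "acronym": 2.0,
    "abbreviation": 2.0,
    "medical_term": 2.0,
    "supportive": 1.0,
}


def weight_query_terms(categorized_terms: List[Tuple[str, str]]) -> Dict[str, float]:
    """Map each query term to a weight according to its category.

    categorized_terms holds (term, category) pairs, e.g. the output of a
    MetaMap/MRDEF-based categorization step (stubbed out here). Unknown
    categories fall back to the 'supportive' weight.
    """
    weights: Dict[str, float] = {}
    for term, category in categorized_terms:
        w = CATEGORY_WEIGHTS.get(category, CATEGORY_WEIGHTS["supportive"])
        # If a term appears under several categories, keep its highest weight.
        weights[term] = max(weights.get(term, 0.0), w)
    return weights
```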
In the last run (RUN 5), we expanded the queries by adding synonyms of the identified medical terms. Figure 7 compares the P@10 results of RUN 5 with those of the baseline run. As this figure shows, further improvements in precision were achieved, which is consistent with the findings in [3].
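To make the RUN 5 expansion concrete, the following sketch turns weighted query terms into an Indri-style `#weight` query, adding synonyms with a smaller discount than other related terms. Everything here is an assumption for illustration: the `SYNONYMS` table is a hypothetical stand-in for a UMLS SPECIALIST lexicon lookup, the factors 0.8 and 0.5 are not the paper's Formula (4), and we do not claim that our system issues Indri queries. `#weight` and `#1` are simply the Indri query-language operators for a weighted combination and an exact phrase.

```python
from typing import Dict, List

# Hypothetical stand-in for a UMLS SPECIALIST lexicon synonym lookup.
SYNONYMS: Dict[str, List[str]] = {
    "myocardial infarction": ["heart attack"],
}

# Hypothetical discount factors (not Formula (4)): synonyms keep more of the
# original term's weight than other semantically related terms do.
SYNONYM_FACTOR = 0.8
RELATED_FACTOR = 0.5


def indri_term(term: str) -> str:
    """Wrap multi-word terms in Indri's #1() exact-phrase operator."""
    return f"#1({term})" if " " in term else term


def expand_query(weights: Dict[str, float], related: Dict[str, List[str]]) -> str:
    """Build a #weight query from weighted terms, their synonyms and related terms."""
    parts: List[str] = []
    for term, w in weights.items():
        parts.append(f"{w:.3f} {indri_term(term)}")
        for syn in SYNONYMS.get(term, []):
            parts.append(f"{w * SYNONYM_FACTOR:.3f} {indri_term(syn)}")
        for rel in related.get(term, []):
            parts.append(f"{w * RELATED_FACTOR:.3f} {indri_term(rel)}")
    return "#weight( " + " ".join(parts) + " )"


print(expand_query({"myocardial infarction": 2.0, "treatment": 1.0},
                   {"myocardial infarction": ["troponin"]}))
```

Keeping synonym and related-term factors separate reflects the distinction drawn in RUN 5: a synonym is close to a drop-in replacement for the original term, whereas a merely related term is weaker evidence of relevance.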
In Figure 8, we provide a comprehensive comparison of all system runs against the baseline run. These results represent the overall system effectiveness across all training and test queries provided in the CLEF e-Health 2014 dataset.
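A run-level comparison of this kind reduces to averaging per-query P@10 and expressing each run relative to the baseline. The sketch below uses hypothetical function names and is not taken from our implementation. As a worked example with figures quoted later in this section, GRIUM's 0.756 against a baseline of 0.718 is a relative gain of (0.756 - 0.718) / 0.718 ≈ 5.3%.

```python
from statistics import mean
from typing import Dict


def mean_precision(per_query: Dict[str, float]) -> float:
    """Average P@10 over all queries of a run (raises on an empty run)."""
    return mean(list(per_query.values()))


def relative_gain(system: float, baseline: float) -> float:
    """Relative improvement of a system's mean P@10 over the baseline."""
    if baseline == 0:
        raise ValueError("baseline precision must be non-zero")
    return (system - baseline) / baseline


print(f"{relative_gain(0.756, 0.718):.1%}")  # GRIUM vs. its baseline
```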
Next, we compare the results produced by our system to those produced by three CLEF participant teams who used the CLEF e-Health 2014 dataset to evaluate their proposed systems. These systems are:
• GRIUM [19] in their best run (EN-Run.5).
• SNUMEDINFO [20] in their best run (EN-Run.2).
• KISTI [40] in their best run (EN-Run.2).
As shown in Table 6, the precision of the results produced by our proposed system is higher than that of the three compared systems (GRIUM, SNUMEDINFO and KISTI). This improvement is mainly due to the exploitation of n-grams, medical acronyms and abbreviations (using the employed medical semantic resources) in the indexing process. The three compared systems use the basic indexing and retrieval algorithms provided in the Indri and Lucene frameworks. The authors of GRIUM proposed a retrieval model based on a bag-of-concepts rather than the traditional bag-of-words. They used the MetaMap tool to extract the medical concepts present in the user query and considered them in their query-document matching process. The main drawback of GRIUM is that it ignores all query terms that MetaMap does not identify as medical concepts, which led to ignoring some important medical concepts such as medical acronyms and abbreviations. This explains their marginal improvement in precision (0.756) over the baseline run (0.718). In SNUMEDINFO, the authors used a simple inverted index that did not include compound terms, together with the UMLS Metathesaurus for query expansion. The main drawbacks of SNUMEDINFO are: i) ignoring compound terms that may occur in both the document collection and user queries, ii) ignoring the semantic nature (e.g. synonymy, meronymy) of the concepts extracted from the UMLS Metathesaurus, and iii) giving all extracted concepts the same weight as the original query terms. In addition, the authors did not tackle problems associated with acronyms and abbreviations in the documents and queries. Finally, the KISTI system demonstrated a slight improvement in precision compared to the baseline technique. This is explained by their use of the discharge summary provided with each query in the dataset as their expansion resource. However, discharge summaries cannot be considered a trusted medical semantic resource for expanding medical queries, because these reports were provided as example results only.

It is important to point out that although our system outperformed the three systems in terms of precision, it still suffers from low computational efficiency on an average configuration (a PC with a Core i7 CPU at 2.5 GHz and 8 GB of RAM), as it processes several large-scale semantic resources. We plan to address this issue in the next prototype version by incorporating a classification module in which medical documents and their corresponding queries will be classified under their relevant medical topics. Instead of matching each query with every document in the dataset, we then aim to match queries only with the medical documents that fall under the same medical topic(s).
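The planned topic-based filtering can be sketched as a partition of the collection by topic, followed by scoring only the documents that share a topic with the query. This is a hypothetical illustration of the future module, not an existing component: the topic labels are assumed to come from some classifier, and `score` stands in for whatever query-document matching function is used.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set


def build_topic_partitions(doc_topics: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Invert a doc -> topics mapping into topic -> docs partitions."""
    partitions: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, topics in doc_topics.items():
        for topic in topics:
            partitions[topic].add(doc_id)
    return dict(partitions)


def candidate_documents(query_topics: Iterable[str], partitions: Dict[str, Set[str]]) -> Set[str]:
    """Documents sharing at least one topic with the query."""
    candidates: Set[str] = set()
    for topic in query_topics:
        candidates |= partitions.get(topic, set())
    return candidates


def rank_within_topics(
    query: str,
    query_topics: Iterable[str],
    partitions: Dict[str, Set[str]],
    score: Callable[[str, str], float],
    k: int = 10,
) -> List[str]:
    """Score only topic-matched candidates; ties are broken by document ID."""
    docs = candidate_documents(query_topics, partitions)
    return sorted(docs, key=lambda d: (-score(query, d), d))[:k]
```

The saving comes from calling `score` only on the candidate set rather than on the whole collection. A real system would also need a fallback for queries to which the classifier assigns no topic, since such queries would otherwise return no documents at all.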

V. CONCLUSIONS AND FUTURE WORK
In this paper, we discussed the crucial role of medical semantic resources in addressing a key challenge for medical information retrieval systems, namely highlighting key medical concepts in queries and expanding them in order to decipher the query intent. Our aim in this context was to improve the query processing, document indexing and query-document matching steps. We extended conventional methods, which process queries using individual medical resources, by employing a combination of extrinsic semantic resources, i.e. the MetaMap tool, the MRDEF relational table and the UMLS SPECIALIST lexicon. Additionally, we constructed semantically-enriched inverted files that capture key concepts, their acronyms and abbreviations, as well as other semantically-relevant medical terms. As such, instead of relying only on representative key terms, our method constructs indexes that capture additional semantic dimensions, which are utilized for matching, ranking and retrieval purposes. We also proposed to categorize query terms and to assign higher weights to medical terms, their acronyms and abbreviations than to other supportive terms. To validate our proposal, we evaluated the effectiveness of the proposed methods by developing a full-fledged prototype system comprising both query processing and document indexing techniques. We carried out several incremental runs, as well as comparative runs against three state-of-the-art medical retrieval systems, in order to quantify the improvement achieved by the proposed system. Future developments include the incorporation of a query and document classification module in order to alleviate the current computational load. In addition, we identified an important factor requiring further exploration: the semantic relations between the identified key medical concepts in user queries.
We believe that incorporating these relations will lead to a better deciphering and understanding of the query intent and, accordingly, to the retrieval of more relevant results. This will require changing the current structure of the constructed inverted indexes into semantic networks that consist not only of head concepts but also of the semantic and taxonomic relations that link them. In this context, we will turn our query-document matching process into a semantic-network-based matching algorithm, wherein the more similar the query and document networks are, the higher the document is ranked in the retrieval results.
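One simple way the envisaged network matching could be instantiated is to represent each network as a set of (concept, relation, concept) triples and blend concept overlap with relation overlap. This is purely a hypothetical sketch of the future work: the Jaccard-based similarity and the mixing weight `alpha` are our own illustrative choices, not a method proposed or evaluated in this paper.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head concept, relation, tail concept)


def jaccard(a: Set, b: Set) -> float:
    """Jaccard overlap; two empty sets are treated as sharing nothing."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def network_similarity(q: Set[Triple], d: Set[Triple], alpha: float = 0.5) -> float:
    """Blend concept-level overlap (nodes) with relation-level overlap (triples)."""
    q_nodes = {c for h, _, t in q for c in (h, t)}
    d_nodes = {c for h, _, t in d for c in (h, t)}
    return alpha * jaccard(q_nodes, d_nodes) + (1 - alpha) * jaccard(q, d)


def rank_by_network(q: Set[Triple], docs: Dict[str, Set[Triple]]) -> List[str]:
    """Rank documents by decreasing network similarity, ties broken by ID."""
    return sorted(docs, key=lambda doc_id: (-network_similarity(q, docs[doc_id]), doc_id))
```

Compared with concept overlap alone, the triple term rewards documents that connect the query's concepts in the same way (e.g. a drug that *treats* a condition rather than one that merely co-occurs with it).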
ISRAA NOOR received the B.S. degree (Hons.) in computer information systems from An-Najah National University, Nablus, Palestine, in 2012, and the M.Sc. degree in computer science from Arab American University, Jenin, Palestine, in 2017. She is currently the Head of the Programming and Electronic Design Unit, An-Najah National University Hospital, Nablus, Palestine.
KHALED S. RABAYAH is currently an Associate Professor in computer science and information systems. He is also the Founder and the Manager of an Information System research Centre, Arab American University, Jenin, Palestine. He is an active researcher in three areas related to information systems and computing: information systems; wireless networks and the Internet of Things (IoT); and data analysis and mining. His research interests in information systems include knowledge management, e-commerce, e-learning, technology adoption modeling and diffusion, especially in the context of developing countries, and cross-cultural issues in the use of IT. He has authored or coauthored more than 15 articles in this domain. His research interest in wireless networks and the Internet of Things started in 2014. He is also involved in a project that focuses on enhancing the quality of service (QoS) of the Internet of Things (IoT) based on lightweight security protocols. To date, he has authored or coauthored two articles in this domain. His research interest in data mining and data analysis started in 2011. His research in data mining and analysis focuses on the use of the IBM SPSS and AMOS statistical software packages, which he uses professionally to develop predictive models for business processes and to discover hidden patterns in unstructured data and big data.
MOHAMMED BELKHATIR received the M.Phil. and Ph.D. degrees in computer science from the University of Grenoble, France, with research grants supported by the French Ministry of Research. He is currently an Associate Professor with the University of Lyon, France.
SAADAT M. ALHASHMI received the Ph.D. degree from Sheffield Hallam University, Sheffield, U.K. Over the years, he has supervised a number of Ph.D. students and published extensively in various high-impact journals and conferences. He joined the University of Sharjah, Sharjah, United Arab Emirates, in 2015 as an Associate Professor of MIS. His current research interests are business analytics, big data, and the impact of technology on businesses.
VOLUME 8, 2020