Semantic Deep Learning: Prior Knowledge and a Type of Four-Term Embedding Analogy to Acquire Treatments for Well-Known Diseases

Background How to treat a disease remains the most common type of clinical question. Obtaining evidence-based answers from biomedical literature is difficult. Analogical reasoning with embeddings from deep learning (embedding analogies) may extract such biomedical facts, although the state of the art focuses on pair-based proportional (pairwise) analogies such as man:woman::king:queen (“queen = −man +king +woman”). Objective This study aimed to systematically extract disease treatment statements with a Semantic Deep Learning (SemDeep) approach underpinned by prior knowledge and another type of 4-term analogy (other than pairwise). Methods As preliminaries, we investigated Continuous Bag-of-Words (CBOW) embedding analogies in a common-English corpus with five lines of text and observed a type of 4-term analogy (not pairwise) applying the 3CosAdd formula and relating the semantic fields person and death: “dagger = −Romeo +die +died” (search query: −Romeo +die +died). Our SemDeep approach worked with pre-existing items of knowledge (what is known) to make inferences sanctioned by a 4-term analogy (search query −x +z1 +z2) from CBOW and Skip-gram embeddings created with a PubMed systematic reviews subset (PMSB dataset). Stage 1: Knowledge acquisition. Obtaining a set of terms, candidate y, from embeddings using vector arithmetic. Some n-gram pairs obtained with the cosine and validated with evidence (prior knowledge) are the input for the 3CosAdd, seeking a type of 4-term analogy relating the semantic fields disease and treatment. Stage 2: Knowledge organization. Identification of candidates sanctioned by the analogy belonging to the semantic field treatment and mapping these candidates to Unified Medical Language System (UMLS) Metathesaurus concepts with MetaMap. A concept pair is a brief disease treatment statement (biomedical fact). Stage 3: Knowledge validation. An evidence-based evaluation followed by human validation of biomedical facts potentially useful for clinicians.
Results We obtained 5352 n-gram pairs from 446 search queries by applying the 3CosAdd. The microaveraging performance of MetaMap for candidate y belonging to the semantic field treatment was F-measure=80.00% (precision=77.00%, recall=83.25%). We developed an empirical heuristic with some predictive power for clinical winners, that is, search queries bringing candidate y with evidence of a therapeutic intent for target disease x. The search queries -asthma +inhaled_corticosteroids +inhaled_corticosteroid and -epilepsy +valproate +antiepileptic_drug were clinical winners, finding eight evidence-based beneficial treatments. Conclusions Extracting treatments with therapeutic intent by analogical reasoning from embeddings (423K n-grams from the PMSB dataset) is an ambitious goal. Our SemDeep approach is knowledge-based, underpinned by embedding analogies that exploit prior knowledge. Biomedical facts from embedding analogies (4-term type, not pairwise) are potentially useful for clinicians. The heuristic offers a practical way to discover beneficial treatments for well-known diseases. Learning from deep learning models does not require a massive amount of data. Embedding analogies are not limited to pairwise analogies; hence, analogical reasoning with embeddings is underexploited.


Analogies for Shakespeare's Romeo in a small common-English corpus
1.1 The baseline corpus and pre-processing
We take as a baseline corpus the one introduced by A. Thomo in a tutorial for a topic model [1]. The pre-processing performed over the corpus (e.g. removing stop words) can affect the candidate words yielding the highest value for both the cosine and the 3CosAdd formula [2,3].
Textbox 1 contains the corpus as input for the word2vec software [4], and it can be downloaded as a text file from [5]. The corpus from [1] consists of 5 documents; each document is a line of text. As can be observed in Textbox 1:
• The text is not lower-cased and stop words (e.g. subject pronouns like you) are not removed.
• Most of the punctuation marks have not been removed. A white space has been added before the punctuation marks as well as at the beginning and end of each sentence (line of text).
• Quotation marks (e.g. "") have been removed, although the contractions (e.g. 's) have not.
Textbox 1 - The common-English small corpus after the pre-processing

Creating "good" embeddings: answering Q1
We interpret as "good" embeddings those that perform well in semantic similarity and relatedness tasks. A "good" vector semantic model should be able to find a candidate word y that is "semantically similar" to the target word x = Romeo. We use the cosine (1) as the similarity measure [2].
$$\cos(x, y) = \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert} \qquad (1)$$
Textbox 2 contains the command line invoking the word2vec executable, with the values for the variables (hyperparameters) of the vector semantics model. As can be seen from Textbox 2, four pre-processing hyperparameters need to be considered:
• Vector dimension is 5, i.e. "-size 5".

Textbox 2 - The command line to create the CBOW model
The two association metric hyperparameters are left at the default values:
• Negative sampling ("-negative") is not used.
• Learning rate ("-alpha") is left at the default value of 0.025.
The word2vec model is CBOW (Continuous Bag-of-Words) [3] ("-cbow 1"). The vectors created are saved in the binary file "Romeo_emb.bin". We use the target word x = Romeo to obtain the candidate word y with the highest cosine value:
Word    Cosine
you     0.644935
The word you is a personal pronoun, and the word Romeo is a proper noun. The words you and Romeo can be interpreted as near-synonyms as they can be used "interchangeably in some contexts" [6]. The word you can be a subject pronoun as well as an object pronoun. Examples: "You (subject) are Romeo. I saw you (object) yesterday. The person I love is you (object), Romeo". Hence, the model created with CBOW is interpreted as "good" embeddings, i.e. it performs well in the semantic similarity and relatedness tasks.
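The nearest-neighbour lookup with the cosine (1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `cosine` and `most_similar` and the toy 5-dimensional vectors are hypothetical and are not the vectors stored in "Romeo_emb.bin".

```python
import math


def cosine(x, y):
    """Cosine similarity as in formula (1): dot product over the product of norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def most_similar(target, embeddings):
    """Return (word, cosine) for the word closest to `target`, excluding `target` itself."""
    x = embeddings[target]
    scores = {w: cosine(x, v) for w, v in embeddings.items() if w != target}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical 5-dimensional vectors, invented for illustration only.
toy = {
    "Romeo": [1.0, 0.0, 0.0, 0.0, 0.0],
    "you": [0.8, 0.6, 0.0, 0.0, 0.0],
    "dagger": [0.0, 1.0, 0.0, 0.0, 0.0],
}

best_word, best_score = most_similar("Romeo", toy)
print(best_word, round(best_score, 6))
```

With real embeddings, the dictionary would hold one vector per vocabulary word loaded from the model file.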

Romeo and Juliet
Juliet : Oh happy dagger !
Romeo died by dagger .
Live free or die , that ' s the New-Hampshire ' s motto
Did you know New-Hampshire is in New-England ?

Finding analogies using Romeo as a target: answering Q2
We utilise the 3CosAdd formula as in [2], although interpreting the formula as "find the word y, which is similar to the word z1 and the word z2, while different from the word x", where x provides the semantic context. The 3CosAdd formula is rewritten below as (2):
$$\arg\max_{y} \big( \cos(y, z_1) - \cos(y, x) + \cos(y, z_2) \big) \qquad (2)$$
We apply the 3CosAdd formula, where Romeo = x provides the semantic context. The first 3 documents (lines of text) in the corpus seem to be taken from Romeo and Juliet, a famous play by Shakespeare.
The words die = z1 and died = z2 appear in the corpus and are representative of the inflectional morphology infinitive:past. Hence, the query we pose to the CBOW model is "find the word y, which is similar to die and died, while different from Romeo". The candidate word y with the highest value for the 3CosAdd formula:
Word      3CosAdd
dagger    0.858598
It is easy to envision the creation of a semantic field death containing the words {die; died; dagger}, i.e. a set of words that "belong together under the same conceptual heading" [7]. The noun "death" and the verb "die" are in the same lexical field, but are different parts of speech. The word pairs die:dagger or died:dagger are closely related (a dagger is an instrument that causes death).
In conclusion, a vector semantics model created with CBOW can find analogies in a very small corpus.
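The search query −x +z1 +z2 scored with formula (2) can be sketched as follows. The sketch assumes that the three query words are excluded from the candidates (a common convention, taken here as an assumption), and the function name `three_cos_add` and the toy 3-dimensional vectors are hypothetical; they do not reproduce the value 0.858598 reported above.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def three_cos_add(x, z1, z2, embeddings):
    """Rank candidates y by cos(y, z1) - cos(y, x) + cos(y, z2), best first.

    Assumption: the query words x, z1 and z2 are not eligible as candidates.
    """
    query = {x, z1, z2}
    scores = {}
    for y, vec in embeddings.items():
        if y in query:
            continue
        scores[y] = (cosine(vec, embeddings[z1])
                     - cosine(vec, embeddings[x])
                     + cosine(vec, embeddings[z2]))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical vectors, invented for illustration only.
toy = {
    "Romeo": [1.0, 0.0, 0.0],
    "die": [0.0, 1.0, 0.0],
    "died": [0.0, 0.8, 0.6],
    "dagger": [0.0, 1.0, 1.0],
    "you": [1.0, 0.2, 0.0],
}

ranking = three_cos_add("Romeo", "die", "died", toy)
print(ranking[0][0])  # top candidate for the query -Romeo +die +died
```

In this toy space the word close to both die and died, and far from Romeo, is ranked first, which is the behaviour the rewritten formula asks for.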

Prior knowledge
Stage 1 utilises prior knowledge (open-access reusable datasets from [8]), consisting of n-gram pairs obtained by applying the cosine to embeddings, then mapped to Unified Medical Language System (UMLS) Metathesaurus [9] concept pairs, and finally validated with evidence from biomedical literature using BMJ Best Practice [10] as the main information source.
BMJ Best Practice is outside PubMed/MEDLINE [11], and is acknowledged for its editorial quality and for its evidence-based methodology [12]. In the UK, BMJ Best Practice is provided (free access) to all NHS healthcare professionals in England, Scotland, and Wales [13]. BMJ Best Practice provides advice on symptom evaluation, tests to order and treatment approach, structured around the patient consultation. The advice is organised into condition and assessment topics and is synthesised manually from clinical practice guidelines, systematic reviews and clinical trials. It uses expert contributors and editors, and requires a subscription. Table 2 shows the 36 unique UMLS CUI pairs (disease X, treatment Z) taken from additional file 3 in [8]. The treatment CUIs Z are mapped to n-grams z belonging to the semantic field treatment (Tx for short). The n-grams z reused by the current study (i.e. z1 and z2) are from additional file 1 in [8]. Every reused n-gram z is mapped to a UMLS concept Z with the UMLS Semantic Type "T061|Therapeutic or Preventive Procedure" or "T121|Pharmacologic Substance".

Methods: the three sequential subtasks under NER
Stage 2 accomplishes a task known as Named Entity Recognition (NER) [15], comprising three sequential subtasks (detailed below), and involving three domain experts: two biomedical terminologists (raters A and B) and a medical consultant with many years of experience doing clinical coding (regarded as an empirical gold standard, or rater GS for short).
Firstly, disambiguation of n-grams y that are difficult to interpret because they are truncated strings of characters or contain short forms (e.g. abbreviations or acronyms). Using the disease x as context, more information for candidate treatment y is obtained from manual searches: exact-match string searches in the PMSB dataset; and/or Web searches of the sense inventory Allie [16] looking for short-long form pairs. For example, for the unigram "HES" (i.e. a short form), a Web search with Allie brings as the top-1 result the long form "hydroxyethyl starch".
Secondly, manual binary classification of candidate n-gram y as to whether or not it belongs to the semantic field Tx (i.e. y Tx). Section 2.3.1 (page 9) reproduces the three textual definitions from [17] encompassing Tx along with examples. We report the interrater agreement between raters following [18]: calculating Cohen's Kappa measure [19] for each pair of raters, and Krippendorff's alpha [20] (a generalised version) for more than two raters.
Thirdly, entity normalisation (a.k.a. grounding) [21] with MetaMap [22], where three domain experts follow the NER guidelines for MetaMap's output from [23] and together judge the automatic mapping of n-grams y Tx to UMLS Metathesaurus concepts Y Tx. For each n-gram, MetaMap provides: a) a single CUI; b) multiple CUIs (either a single list of CUIs or multiple lists of CUIs); or c) no CUI. MetaMap produces two types of errors: missing (False Negative or FN) and spurious (False Positive or FP). MetaMap performance is assessed with the conventional evaluation measures [24]: Precision = TP/(TP + FP), Recall = TP/(TP + FN), and F measure = (2 × Precision × Recall)/(Precision + Recall). As in [25], we calculate a weaker precision and recall where an exact match and a partial match are equally counted as a TP. The calculations ignore the number of CUIs provided by MetaMap for an n-gram. Precision, recall, and F measure are calculated per target disease x. We also report macro-averaging (by averaging each evaluation measure) [24] and micro-averaging (by making a single contingency table for all data) [24].
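The difference between the two averaging schemes can be made concrete with a short Python sketch. The per-disease counts below are hypothetical (they are not taken from Table 5), and the function names are illustrative only.

```python
def prf(tp, fp, fn):
    """Precision, recall and F measure from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


def macro_average(counts):
    """Average each evaluation measure over the per-disease results."""
    per_disease = [prf(*c) for c in counts.values()]
    n = len(per_disease)
    return tuple(sum(m[i] for m in per_disease) / n for i in range(3))


def micro_average(counts):
    """Pool all counts into a single contingency table, then compute the measures once."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return prf(tp, fp, fn)


# Hypothetical (TP, FP, FN) counts per target disease, for illustration only.
counts = {"disease_a": (8, 2, 2), "disease_b": (1, 1, 3)}

print("macro:", macro_average(counts))
print("micro:", micro_average(counts))
```

Macro-averaging gives each disease the same weight, whereas micro-averaging gives each candidate the same weight, so diseases with many candidates dominate the micro-averaged figures.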

Disambiguation of n-grams
Column C in Multimedia Appendix 2 (worksheet Stage2) summarises the manual searches performed for the 467 n-gram pairs with a candidate y difficult to interpret. Multimedia Appendix 2 (worksheet "Disambiguation n-grams") has the expanded strings and further information for truncated strings (71 candidate n-grams y).
Multimedia Appendix 2 (worksheet "Disambiguation SF to LF") contains the long form (LF) for each short form (SF) of 253 abbreviations or acronyms appearing within the candidate n-grams y. The performance of Allie [16] in obtaining an LF for an SF is calculated taking as TP a top-1 result that is a correct long form. The F measure is 87.31%, with precision = 84.85% and recall = 89.91%.

Binary classification (Tx or non Tx)
Among the 1935 unique (x,y) n-gram pairs, there are 954 n-gram pairs (x, y Tx) with candidate y belonging to Tx. Table 3 shows the number of unique candidates y and y Tx per model and disease target x. Table 4 contains the results of the binary classification (Tx as 1 and non Tx as 0) of candidates y per disease target x and the interrater agreement with Cohen's Kappa [19] and Krippendorff's alpha [20]. The three domain experts perform the binary classification independently for six of the ten diseases, i.e. for 627 candidates y. For the remaining four diseases, i.e. for 1308 candidates y, the three raters work collaboratively in the manual binary classification of n-grams. Considering all candidates y for six of the ten diseases, i.e. a total of 627 candidates y, Krippendorff's alpha [20] is 0.86 for the three raters. The Cohen's Kappa [19] for each pair of raters is: IAA(GS,A) = 0.86; IAA(GS,B) = 0.91; IAA(A,B) = 0.83. The three Cohen's Kappa values are interpreted as "almost perfect agreement" [26].
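For readers unfamiliar with the measure, Cohen's Kappa for one pair of raters on a binary classification (Tx as 1, non Tx as 0) can be sketched as below. The two label lists are hypothetical and are not the raters' actual decisions; the function name `cohen_kappa` is illustrative.

```python
def cohen_kappa(rater_1, rater_2):
    """Cohen's Kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    if len(rater_1) != len(rater_2) or not rater_1:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_1)
    observed = sum(a == b for a, b in zip(rater_1, rater_2)) / n
    labels = set(rater_1) | set(rater_2)
    # Chance agreement: product of the raters' marginal proportions, summed over labels.
    expected = sum((rater_1.count(l) / n) * (rater_2.count(l) / n) for l in labels)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical decisions for ten candidates y (1 = Tx, 0 = non Tx).
rater_gs = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0]
rater_a = [1, 1, 0, 0, 0, 1, 0, 1, 0, 1]

print(round(cohen_kappa(rater_gs, rater_a), 3))
```

Krippendorff's alpha, used for the three raters together, generalises the same idea to more than two raters and is not sketched here.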

Normalisation with MetaMap
The 954 n-gram pairs (x, y Tx) with candidate y belonging to Tx were mapped to 569 unique (X,Y Tx) UMLS CUI pairs. Table 5 shows the MetaMap performance per disease target x in mapping candidates y Tx to UMLS Metathesaurus concepts Y Tx. The overall measure of performance (macro-averaging) [24] of MetaMap, obtained by averaging each evaluation measure from Table 5 over all ten diseases, is: F measure = 78.50% with precision = 77.69% and recall = 82.46%. The candidates y for the target disease x = heart_failure are particularly challenging, with the lowest interrater agreement for the binary classification (Table 4) and the lowest F measure for MetaMap (Table 5).
Considering all candidates y Tx for the ten diseases, the micro-averaging performance (making a single contingency table for all data) [24] of MetaMap is: F measure = 80.00% with precision = 77.00% and recall = 83.25%. If the candidates y Tx that are truncated strings or contain short forms are not considered, the MetaMap micro-averaging performance increases by about 10 percentage points to F measure = 90.88%, with precision = 90.73% and recall = 91.03%. This study considers the UMLS Semantic Types (133 broad categories) [27] to classify UMLS Metathesaurus concepts Y Tx. The rationale for investigating the UMLS Semantic Types of the UMLS Metathesaurus concepts Y Tx is that the set of Semantic Types assigned to them may allow a machine-processable definition for Tx instead of the three textual human-readable definitions from [17].
We investigate a further categorisation of concepts Y Tx based on the UMLS Semantic Types assigned. Table 6 shows the six main categories produced, each consisting of one Semantic Type, i.e. "T058|Health Care Activity" and "T070|Natural Phenomenon or Process", or more than one Semantic Type, e.g. "T061 or T121" or "Device". The subcategory "Substance (CHEM)" takes most of the UMLS Semantic Types from the UMLS Semantic Group "Chemicals & Drugs" [27].
[Table 6 excerpt, UMLS Semantic Types: T116, T123, T122, T118, T103, T120, T104, T200, T111, T196, T126, T131, T125, T129, T130, T197, T119, T124, T114, T109, T115, T192, T110, T127]
From Table 7 we can observe that the two main categories with the highest number of UMLS CUIs are "Substance" with 264 candidate concepts Y Tx and "T061 or T121" with 255 candidate concepts Y Tx. However, if we take into account Table 8 and the n-grams y Tx mapped to UMLS CUIs Y Tx, we can observe that the highest number of candidate n-grams y Tx is for the main category "T061 or T121", with 530 of the total 954.
From our investigation of the UMLS Semantic Types assigned to the candidate concepts Y Tx, we conclude that the highest number of candidate n-grams y Tx falls in the main category "T061 or T121", i.e. the main category that has the two UMLS Semantic Types considered for the n-grams z used as prior knowledge.

Evaluation guidelines given to observers for Stage 3
Task: Express agreement or disagreement for the categories (evidence-based labels) assigned to biomedical concepts related to medical conditions

Biomedical/clinical definitions
We consider the following definitions from [17]:
• Treatment - Action taken by a health professional, in the context of contact with a treatment recipient, to alter the functioning of an individual with a disability or at risk of a disability. Treatment is defined broadly to include provision of information, devices, and referrals, specific active experiences, and passive interventions.
• Treatment ingredients - Observable (and, therefore, in principle, measurable) actions, chemicals, devices, or forms of energy that are selected or delivered by the clinician.
• Mechanism of action - Process by which the treatment's essential ingredients induce change in the target of treatment.
Under "mechanism of action" of a treatment, we consider the active ingredients as well as the body's cells or substances and their interplay. Let's consider a detailed example:
• T helper cells (a.k.a. T4 cells because they have the CD4 surface molecule) stimulate the production of antibodies against pathogens, and thus, the loss or improper function of CD4 T cell responses is detrimental for immunity to a wide range of pathogens [28].
• "The understanding of how sepsis affects CD4 T cells through their numerical loss and recovery, as well as function, is important in the development of future treatments designed to restore CD4 T cells to their presepsis state." [28].
• In this study, CD4 T cells is interpreted as a "treatment ingredient" as CD4 T cells play a role in the "mechanism of action" of treatments designed to restore CD4 T cells.

Other definitions: correlation
In this study we use the term "correlation" taking into account the definition for "is correlated with" [29]: "x is correlated with y when x holds a statistical relationship with y". Let's exemplify its use: "The correlation between genotype (from the Greek genos, meaning race, offspring) and phenotype (from the Greek phaino-, from phainein, meaning to show) is defined as an above-chance probability of a distinct mutation being associated with a particular physical feature or abnormality. The genotype and phenotype share a statistical relationship." [30] The above-mentioned correlation will be represented as: Correlation (genotype → phenotype). Note that more than one correlation can be represented, such as Correlation (A → B → C), where the arrow (i.e. →) indicates the direction of reading.
Plausible reasons for disagreement
Among others, we acknowledge the following reasons to express disagreement:
• The evidence (quoted text) provided is somewhat unclear.
Note: A simplified version (excluding UMLS CUIs) of the table displayed is included in the main manuscript as Table 1, exemplifying the seven evidence-based categories introduced.

Further details of the evidence-based validation of UMLS Metathesaurus concept pairs
Only 68 of the 569 UMLS CUI pairs are in the MRREL table [31] of UMLS 2019AA; see column J within Multimedia Appendix 2 (worksheet Stage3). Table 9 shows the CUI pairs found in the MRREL table per disease target x. Table 11 shows the 408 concept pairs (X,Y Tx) investigated thoroughly per target disease x according to the seven evidence-based categories proposed in this study. Multimedia Appendix 2 (worksheet Stage3) contains the evidence (quoted text in column H) and their references (column I). The highest number of concept pairs (X,Y Tx) with evidence provided is for the category "Tx with therapeutic effect" with 190 concept pairs, followed by the evidence-based category "Correlation" with 94 concept pairs.
If we look at Table 11 per target disease x, the number of concept pairs (X,Y Tx) for the evidence-based category "Tx with therapeutic effect" is higher than for any other evidence-based category, except for target disease x = arthritis, where the category "Tx with therapeutic effect" has the same number of concept pairs as the evidence-based category "Tx with uncertain therapeutic effect". There are 19 concept pairs (X,Y Tx) of the 408 concept pairs investigated thoroughly with more than one evidence-based category assigned, like the concept pair (X=C0014544|Epilepsy, Y Tx=C0080356|Valproate). Figure 2 in the main manuscript takes as source the data in Table 12. Although the 3CosAdd formula (a four-term type of embedding analogy, not pairwise) brings concept pairs (X,Y Tx) for all seven evidence-based categories utilised in this study (first column of Table 12), it is clear from Table 12 that the evidence-based category with the highest number of candidate concepts Y Tx is "Tx with therapeutic effect" with 190. For 117 of these 190, the evidence is taken from BMJ Best Practice, which is acknowledged for its editorial quality and for its evidence-based methodology [12].
The evidence-based category "general medical term" does not have evidence (zero information sources in Table 12) as it includes broad concepts Y Tx of little value for clinicians. From Table 12, it is noticeable that the evidence-based category with the highest number of evidence-based information sources is "correlation" with 108 of the total 238 evidence-based information sources utilised in this study.
Tables 13 and 14 provide further details about the number of evidence-based information sources utilised in this study. The same evidence-based information source can provide evidence (quotes) for more than one concept pair (X,Y Tx). For example, ten topics from BMJ Best Practice (i.e. ten documents, each one with a unique URI) provide evidence for 117 UMLS CUI pairs (X,Y Tx) under "Tx with therapeutic effect". [26,32,33]. To resolve the paradox, we calculate "p_pos and p_neg as two separate indexes of proportionate agreement in the observers' positive and negative decisions" [33]. Applying the formulas in [33], the positive agreement p_pos is 0.937 and the negative agreement p_neg is 0.040, and the discrepancy is immediately evident (disparate p_pos and p_neg values).
Each observer was further asked to provide an alternative evidence-based category when in disagreement (columns L and N in Multimedia Appendix 2 worksheet Stage3). We calculate Cohen's Kappa [19] between observers as a binary classification with 1 when (X,Y Tx) has Y Tx belonging to the evidence-based category "Tx with therapeutic effect", either agreed or assigned by the observer, and 0 otherwise. The Cohen's Kappa is 0.921, interpreted as "almost perfect agreement" [26].
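The two indexes of proportionate agreement can be illustrated with a small Python sketch. It assumes the usual formulation for a 2×2 table of two observers, where a counts joint positive decisions, d joint negative decisions, and b and c the two kinds of disagreement, and it assumes that the paradox in question is the well-known case of high observed agreement with a low Kappa. The counts are hypothetical and are not the observers' data from this study.

```python
def specific_agreement(a, b, c, d):
    """Positive and negative agreement for a 2x2 table of two observers.

    a = both positive, d = both negative, b and c = the two discordant cells.
    """
    p_pos = 2 * a / (2 * a + b + c)
    p_neg = 2 * d / (2 * d + b + c)
    return p_pos, p_neg


def kappa_from_table(a, b, c, d):
    """Cohen's Kappa computed from the same 2x2 table."""
    n = a + b + c + d
    observed = (a + d) / n
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical table: the observers agree on almost every positive decision,
# but negative decisions are rare.
a, b, c, d = 90, 5, 4, 1

p_pos, p_neg = specific_agreement(a, b, c, d)
print("observed agreement:", (a + d) / (a + b + c + d))
print("kappa:", round(kappa_from_table(a, b, c, d), 3))
print("p_pos:", round(p_pos, 3), "p_neg:", round(p_neg, 3))
```

In such a table the overall agreement is high while Kappa is low, and reporting p_pos and p_neg separately shows that the disagreement is concentrated in the rare negative decisions.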

Illustrating a best clinical winner that is also a NER winner
Table 15 shows a best clinical winner with Max(n4) = 8, which is also a NER winner with Max(n3) = 8, for CBOW and the search query -asthma +inhaled_corticosteroids +inhaled_corticosteroid with disease target x = asthma.
The penultimate column (right hand-side) in Table 15 shows the category for concept Y Tx , considering the six main categories introduced based on 38 UMLS Semantic Types (see Table 6 for details).

Source data for Figure 3
Table 16 shows the performance of the heuristic for 304 search queries considering: a) the values of n4 (the last three yellow columns in Multimedia Appendix 2 worksheet Q4); b) different thresholds for n4; and c) precision and recall as metrics.

Cross-comparison with reusable datasets for evaluating relatedness
We investigate the UMLS CUI pairs among the reusable datasets for evaluating relatedness from [34,35] and the MRREL table [31] for UMLS 2019AA. We look for CUI pairs (X,Y) or (Y,X) where:
• UMLS Metathesaurus concept X has the UMLS Semantic Type "T047|Disease or Syndrome".
The reusable datasets for evaluating relatedness from [34,35] and modified versions of them can be downloaded from [36] as CSV files. Table 17 shows the total number of CUI pairs and the number of CUI pairs with the UMLS Semantic Types mentioned above. Relatedness is interpreted as an association existing between the CUI pairs. The 29 CUI pairs within the MiniMayoSRS.csv file (which appears in the first column of Table 17) are the concept pairs from MayoSRS.csv with high interrater agreement for semantic relatedness.
The seven evidence-based categories introduced in this study provide more fine-grained semantics for the relatedness (association) among the 408 concept pairs (X,Y Tx) investigated thoroughly, and enrich them with evidence (quotes and references) from the biomedical literature. Tables 18 and 19 show the 408 concept pairs (X,Y Tx) according to the evidence-based category assigned. For two external observers (O1 and O2) and applying the formulas in [33], the positive agreement p_pos is 0.937 (quite close to the highest achievable value of 1.00) for the 408 concept pairs (X,Y Tx).
A comparison between the last columns of Tables 17 and 18 indicates that the number of CUI pairs for the evidence-based category "Tx with therapeutic effect" is 89, above the highest number (65) of CUI pairs from [34,35]. Finally, we investigated how many of the 408 concept pairs (X,Y Tx) investigated thoroughly, and enriched with evidence (quotes and references) from the biomedical literature, appear in the MRREL table [31] for UMLS 2019AA. Table 19 shows that only 45 of the 190 UMLS CUI pairs (X,Y Tx) with the evidence-based category "Tx with therapeutic effect" appear within the MRREL table of UMLS 2019AA, which incorporates MED-RT (Medication Reference Terminology) [37]. This fact seems to corroborate the difficulty for clinicians of obtaining comprehensive information on the clinical worth of alternative choices for treating a given condition [38].
• The domain knowledge and clinical expertise of the biomedical/clinical expert based on clinical practice
• The domain knowledge and clinical expertise of the biomedical/clinical expert based on evidence-based literature (e.g. a more recent paper)

Table 1 -
UMLS CUIs and UMLS Semantic Types for target n-grams x, mapped to SNOMED CT identifiers

Table 1 contains the target n-gram x belonging to the semantic field disease. Each target n-gram x represents a medical diagnosis mappable to a "type-of" SNOMED CT [14] concept called disorder. Table 1 contains the UMLS CUI as well as the SNOMED CT identifier for each n-gram x. The UMLS Semantic Type for the UMLS Metathesaurus concept X mapped to target disease n-gram x is "T047|Disease or Syndrome".

Table 2 -
Prior knowledge: the 36 unique (disease X, treatment Z) UMLS CUI pairs. The last column shows the UMLS Semantic Types for the UMLS Metathesaurus concept Z

Table 3 -
Binary classification (Tx as 1 and non Tx as 0) of candidate y and y Tx per model and disease target x

Table 4 -
Binary classification (Tx as 1 and non Tx as 0) of unique candidates y per disease target x. The last columns show the interrater agreement (IAA) considering columns D, E, and F within Multimedia Appendix 2 (worksheet Stage2).

Table 5 -
MetaMap performance per disease target x when mapping candidates y Tx to UMLS Metathesaurus concepts Y Tx using the values in column H within Multimedia Appendix 2 (worksheet Stage2)

Table 6 shows how these 6 main categories (first column) can be further decomposed into 11 subcategories (second column) considering 38 UMLS Semantic Types (last column) of the total 133 UMLS Semantic Types [27]. The 11 subcategories from Table 6 appear in column I of Multimedia Appendix 2 (worksheet Stage2).

Table 6 -
Six main categories introduced (last six columns of Tables 7 and 8), their further decomposition into subcategories, and their mapping to UMLS Semantic Types

Table 7 -
Categorisation of UMLS CUIs for candidate concepts Y Tx (third column) into the 6 main categories introduced (last six columns) based on 38 UMLS Semantic Types. The second column shows the number of candidate n-grams y Tx mapped to those UMLS CUIs Y Tx (third column).

Table 8 -
Categorisation of candidate n-grams y Tx considering their mapping to UMLS CUIs for candidate concepts Y Tx and the UMLS Semantic Types assigned to those UMLS CUIs

Table 10 shows the total number of search queries per disease target x. Only 304 search queries (144 for CBOW and 160 for Skip-gram) of the total 446 (223 for CBOW and 223 for Skip-gram) have all the candidates y Tx mapped to concepts Y Tx with at least one evidence-based category assigned.
Table 10 displays the total number of candidate concepts Y Tx per disease target x (penultimate column), and the number of those with evidence (last column). By cross-comparing the last two columns of Table 10, it can be observed that there are some UMLS concept pairs (X,Y Tx) for anaemia and hypertension not investigated thoroughly. Indeed, only 37 CUI pairs for x = anaemia and 56 CUI pairs for x = hypertension are investigated thoroughly.

Table 10 -
Number of search queries per disease target x, and number of concept pairs (X, Y Tx ) with evidence

Table 11 -
Number of UMLS Metathesaurus concept pairs (X, Y Tx ) per evidence-based category

Table 12 -
The 408 UMLS CUI pairs investigated thoroughly and their evidence-based information sources per evidence-based category

Table 13 -
Number of evidence-based information sources (Number of URIs) per disease target x

Table 14 -
Number of CUI pairs (X,Y Tx) with BMJ Best Practice as evidence source per disease target x. Each topic number has a unique URI, e.g. the URI http://bestpractice.bmj.com/topics/en-gb/61 corresponds to topic: 61

Table 15 -
Illustrating a best clinical winner that is also a NER winner. The search query -asthma +inhaled_corticosteroids +inhaled_corticosteroid for CBOW

Table 16 -
Precision and Recall for the empirical heuristic developed with the 304 search queries from Multimedia

Table 17 -
Number of UMLS CUI pairs among the reusable datasets for evaluating relatedness from [36]

Table 18 -
Number of UMLS CUI pairs (X,Y Tx) according to the seven evidence-based categories introduced. The last column shows the number of CUI pairs (X,Y Tx) where Y Tx has the UMLS Semantic Type T061 or T121

Table 19 -
Number of UMLS CUI pairs (X,Y Tx) according to the seven evidence-based categories introduced. The last column shows the number of CUI pairs (X,Y Tx) or (Y Tx,X) in the MRREL table [31] of UMLS 2019AA