Resolution of redundant semantic type assignments for organic chemicals in the UMLS
Introduction
The Unified Medical Language System (UMLS) [1], [2] has been created through the integration of a collection of about 150 source vocabularies from the biomedical domain. These sources are varied in their scope and purpose, and their integration provides a vehicle for expanding their utility beyond their original applications [3]. The integrated terms and relationships are housed in the Metathesaurus (META) [4], [5], where they have been mapped into concepts and links between them.
The Semantic Network (SN) supports the integration by providing a collection of 133 broad categories, called semantic types (STs), that enable high-level grouping of the META's concepts without regard to their sources [6], [7], [8], [9]. In particular, each concept is assigned one or more of these STs in order to elaborate its overarching semantics. This arrangement has helped enhance applications in areas such as knowledge retrieval [10], inter-terminology mapping [11], [12], and natural language processing [13], [14], among others.
In this paper, we deal with a specific kind of error, called redundant assignment [15], that can occur in the assignment of STs. This error occurs when a given concept has been assigned multiple STs and one of them is more general than another in the context of the SN's tree-structured hierarchy. For example, the assignment of Organic Chemical to a concept also assigned Lipid (a child of Organic Chemical) is redundant. A natural way to resolve this error is to remove the assignment of the more general ST since its assignment is implied by the assignment of its descendant, the more specific ST [16].
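The definition above can be illustrated with a minimal sketch. It assumes a toy child-to-parent map standing in for the SN tree; only the Lipid/Organic Chemical relationship is taken from the text, and the other entries, as well as the function names, are hypothetical and chosen for illustration.

```python
# Toy fragment of the SN hierarchy (child -> parent). Only "Lipid is a child
# of Organic Chemical" comes from the text; the other entries are illustrative
# assumptions, not a copy of the actual Semantic Network.
PARENT = {
    "Lipid": "Organic Chemical",
    "Carbohydrate": "Organic Chemical",
    "Organic Chemical": "Chemical Viewed Structurally",
}


def ancestors(st):
    """Return the set of proper ancestors of a semantic type in the tree."""
    result = set()
    while st in PARENT:
        st = PARENT[st]
        result.add(st)
    return result


def redundant_types(assigned):
    """Return the assigned STs that are ancestors of another assigned ST."""
    assigned = set(assigned)
    return {a for a in assigned for b in assigned if a in ancestors(b)}


def resolve_by_removing_general(assigned):
    """Standard resolution: drop every ST implied by a more specific one."""
    return set(assigned) - redundant_types(assigned)


concept_types = {"Organic Chemical", "Lipid"}
print(redundant_types(concept_types))              # {'Organic Chemical'}
print(resolve_by_removing_general(concept_types))  # {'Lipid'}
```

This sketch covers only the standard resolution, in which the more general ST is removed; it does not implement the chemistry-based methodology proposed in this paper.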
This resolution of a redundant ST assignment is suitable when the semantics of the multiple ST assignment is that of a conjunction, that is, when the concept fits multiple categories, being both “a this and a that.” However, when the two STs assigned to a concept both come from the subtree of the SN rooted at Organic Chemical, the semantics of a multiple ST assignment is different. Such an assignment is typically found for a concept that represents a composite chemical, which is obtained by combining other chemicals. Composite chemical concepts with ST assignments from the subtree rooted at Organic Chemical are common in the UMLS.
The composite chemical represented by the concept could be a conjugate created by a chemical reaction of multiple chemicals, or it could be a complex formed from a mixture of chemicals. In each case, the composite chemical concept is collectively assigned all the STs assigned to its individual component chemicals. Hence, the logic that a more general ST assignment is redundant when a more specific ST assignment is also given has no basis in the case of a composite chemical concept, which is simply enumerating the types of the components. However, such redundant assignments are forbidden by the NLM in all cases, with no exception for these organic chemical composites.
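The following sketch shows how enumerating component types can produce an assignment that looks redundant. It is an illustration under stated assumptions: the two components and their ST sets are hypothetical placeholders, and the one-edge hierarchy contains only the Lipid/Organic Chemical relationship mentioned in the text.

```python
# One-edge stand-in for the SN tree (child -> parent), taken from the text.
PARENT = {"Lipid": "Organic Chemical"}


def is_ancestor(general, specific):
    """True if `general` lies on the IS-A path above `specific`."""
    while specific in PARENT:
        specific = PARENT[specific]
        if specific == general:
            return True
    return False


def composite_types(component_types):
    """Union of the STs of all components of a composite chemical."""
    union = set()
    for types in component_types:
        union |= set(types)
    return union


def redundant_pairs(assigned):
    """All (general, specific) pairs among the assigned STs."""
    return sorted((g, s) for g in assigned for s in assigned if is_ancestor(g, s))


# Hypothetical composite: one component typed only as Organic Chemical,
# the other typed as Lipid.
components = [{"Organic Chemical"}, {"Lipid"}]
assigned = composite_types(components)
print(redundant_pairs(assigned))  # [('Organic Chemical', 'Lipid')]
```

Here the general ST describes a different component than the specific ST does, so it is not implied by it, even though the pair is formally flagged as redundant.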
A rule is needed for resolving a redundant ST assignment from the Organic Chemical subtree so that the result best reflects the essence of the composite chemical, just as the standard resolution keeps the more specific ST because it accurately captures the essence of a concept that does not denote a composite chemical. A review of the choices made by the NLM in resolving redundant ST assignments to organic chemicals in earlier releases reveals no clear rule: sometimes the more general ST was removed, and sometimes the more specific one.
In this paper, we present a systematic methodology for properly resolving a redundant ST assignment in line with principles of chemistry. Our approach is based on a chemical analysis at the molecular level, in which the relative sizes of the respective constituents are the driving factors. In this way, the ST assignment better captures the nature of the composite chemical. The methodology is applied to a sample of organic chemicals for which a redundant assignment appeared in earlier releases of the UMLS and was resolved in later releases, which allows for simple comparisons.
Our methodology is suggested for use by editors when they are categorizing new composite chemical concepts that are being added to the UMLS. New usage notes are provided to guide the editors in this endeavor. Furthermore, the methodology should be used for revisiting organic chemical concepts that were identified to have redundant ST assignments in earlier releases of the UMLS. In a study of a sample of 254 such concepts, it was found that for 32% of them the current ST assignment does not accurately capture the essence of the concept.
Background
The SN efficiently expresses type information by utilizing inheritance along the IS-A path between types. Inheritance makes the explicit specification of certain information at lower-level descendant STs unnecessary when that same information already appears in higher-level ancestor STs [16].
Let C be a concept assigned both STs B and A such that B is a descendant of A. Then the assignment of A to C is called redundant [15] because it can be inferred from the assignment of B to C and the fact
Chemistry based analysis
The standard means of resolving a redundant assignment [16] may be inappropriate when dealing with composite chemical concepts, and we present a systematic methodology for proper resolution in line with principles of chemistry and chemical analysis. Before getting to our methodology, let us note that the combination of multiple “organic chemical” STs is meant to convey the types of the constituent chemicals in the case of a composite chemical. For example, an assignment of Organic Chemical and
Results
Table 3 shows the number of concepts with redundant ST assignments involving Organic Chemical for three UMLS versions. For example, in 2006AB, there were 1,626 such concepts. No such redundancies were encountered in the versions more recent than 2007AA. Some concepts have been counted more than once in Table 3 (in consecutive versions). The total number of distinct concepts is 1,668.
We selected a sample of 254 from these concepts for review. The sample contained all 127 concepts from 2007AA. Of
Discussion
This paper handles an anomaly in the semantics of the assignment of multiple STs to the concepts of the META. The typical semantics is that of a conjunction, meaning a concept shares the semantics of both STs, e.g., in being both a Disease or Syndrome and an Anatomical Abnormality. However, when both STs are coming from the subtree of the SN rooted at Organic Chemical (Fig. 1), the semantics is of a chemical obtained by a reaction or mixture of two chemicals, each of which has been assigned a
Conclusion
The review and analysis of concepts that previously had redundant ST assignments in the UMLS has demonstrated that organic chemical concepts present a unique challenge in categorization. When an organic conjugate or complex chemical is being assigned a semantic type, the type for each of its components is determined. Except for a few rare cases (described by the last three usage notes of Table 6), we recommend that a combination of the STs of the components of an organic chemical that form a
Acknowledgment
This work was partially supported by the NLM under Grant R-01-LM008445-01A2.
References (21)
- The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Res (2004)
- et al. The Unified Medical Language System: an informatics research collaboration. J Am Med Inform Assoc (1998)
- et al. The Unified Medical Language System: toward a collaborative approach for solving terminologic problems. J Am Med Inform Assoc (1998)
- et al. The UMLS Metathesaurus: representing different views of biomedical concepts. Bull Med Libr Assoc (1993)
- et al. Using META-1, the first version of the UMLS Metathesaurus
- UMLS Semantic Network
- McCray AT. Representing biomedical knowledge in the UMLS Semantic Network. High-Performance Medical Libraries: Advances...
- An upper-level ontology for the biomedical domain. Comp Funct Genomics (2003)
- et al. The scope and structure of the first version of the UMLS Semantic Network
- et al. Automated acquisition of disease drug knowledge from biomedical and clinical documents: an initial study. J Am Med Inform Assoc (2008)