Resolution of redundant semantic type assignments for organic chemicals in the UMLS

https://doi.org/10.1016/j.artmed.2011.05.003

Abstract

Objective

The Unified Medical Language System (UMLS) integrates terms from different sources into concepts and supplements these with the assignment of one or more high-level semantic types (STs) from its Semantic Network (SN). For a composite organic chemical concept, multiple assignments of organic chemical STs often serve to enumerate the types of the composite's underlying chemical constituents. This practice sometimes leads to the introduction of a forbidden redundant ST assignment, where both an ST and one of its descendants are assigned to the same concept. A methodology for resolving redundant ST assignments for organic chemicals, better capturing the essence of such composite chemicals than the typical omission of the more general ST, is presented.

Materials and methods

The typical SN resolution of a redundant ST assignment is to retain only the more specific ST assignment and omit the more general one. However, with organic chemicals, that is not always the correct strategy. A methodology for properly dealing with the redundancy based on the relative sizes of the chemical components is presented. It is more accurate to use the ST of the larger chemical component for capturing the category of the concept, even if that means using the more general ST.

Results

A sample of 254 chemical concepts having redundant ST assignments in older UMLS releases was audited to analyze the accuracy of current ST assignments. For 81 (32%) of them, our chemical analysis-based approach yielded a different recommendation from the UMLS (2009AA). New UMLS usage notes capturing rules of this methodology are proffered.

Conclusions

Redundant ST assignments have typically arisen for organic composite chemical concepts. A methodology for dealing with this kind of erroneous configuration, capturing the proper category for a composite chemical, is presented and demonstrated.

Introduction

The Unified Medical Language System (UMLS) [1], [2] has been created through the integration of a collection of about 150 source vocabularies from the biomedical domain. These sources are varied in their scope and purpose, and their integration provides a vehicle for expanding their utility beyond their original applications [3]. The integrated terms and relationships are housed in the Metathesaurus (META) [4], [5], where they have been mapped into concepts and links between them.

The Semantic Network (SN) supports the integration by providing a collection of 133 broad categories, called semantic types (STs), that enable high-level grouping of the META's concepts without regard to their sources [6], [7], [8], [9]. In particular, each concept is assigned one or more of these STs in order to elaborate its overarching semantics. This arrangement has helped enhance applications in areas such as knowledge retrieval [10], inter-terminology mapping [11], [12], and natural language processing [13], [14], among others.

In this paper, we deal with a specific kind of error, called redundant assignment [15], that can occur in the assignment of STs. This error occurs when a given concept has been assigned multiple STs and one of them is more general than another in the context of the SN's tree-structured hierarchy. For example, the assignment of Organic Chemical to a concept also assigned Lipid (a child of Organic Chemical) is redundant. A natural way to resolve this error is to remove the assignment of the more general ST since its assignment is implied by the assignment of its descendant, the more specific ST [16].
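
The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the UMLS tooling: the parent map is a toy fragment in which only the Lipid and Organic Chemical relationship comes from the text, the other entries are illustrative placeholders, and the function names are hypothetical. The resolution shown is the conventional one of dropping the more general ST.

```python
# Toy fragment of the SN IS-A tree: child ST -> parent ST.
# Only "Lipid is a child of Organic Chemical" comes from the text;
# the remaining entries are illustrative placeholders.
PARENT = {
    "Lipid": "Organic Chemical",
    "Carbohydrate": "Organic Chemical",
    "Organic Chemical": "Chemical",
}


def ancestors(st):
    """Return the set of proper ancestors of an ST in the tree."""
    found = set()
    while st in PARENT:
        st = PARENT[st]
        found.add(st)
    return found


def redundant_pairs(assigned):
    """Return sorted (general, specific) pairs where both STs are assigned."""
    return sorted(
        (general, specific)
        for specific in assigned
        for general in ancestors(specific)
        if general in assigned
    )


def drop_general(assigned):
    """Conventional resolution: remove every ST implied by an assigned descendant."""
    implied = {general for general, _ in redundant_pairs(assigned)}
    return set(assigned) - implied


concept_sts = {"Organic Chemical", "Lipid"}
print(redundant_pairs(concept_sts))  # [('Organic Chemical', 'Lipid')]
print(drop_general(concept_sts))     # {'Lipid'}
```

Two sibling STs, such as Lipid and Carbohydrate in this toy map, produce no redundant pair, since neither is an ancestor of the other.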

This resolution of a redundant ST assignment is suitable when the semantics of the multiple ST assignment is that of a conjunction, that is, when the concept fits multiple categories, being both “a this and a that.” However, when the two STs assigned to a concept are from the subtree of the SN rooted at Organic Chemical, the semantics of a multiple ST assignment is different. Such an assignment is typically found for a concept that represents a composite chemical, which is obtained by combining other chemicals. Such composite chemical concepts are common in the UMLS with ST assignments from the subtree rooted at Organic Chemical.

The composite chemical represented by the concept could be a conjugate created by a chemical reaction of multiple chemicals, or it could be a complex formed from a mixture of chemicals. In each case, the composite chemical concept is collectively assigned all the STs assigned to its individual component chemicals. Hence, the logic that a more general ST assignment is redundant when a more specific ST assignment is also given has no basis in the case of a composite chemical concept, which is simply enumerating the types of the components. However, such redundant assignments are forbidden by the NLM in all cases, with no exception for these organic chemical composites.

A rule is needed for handling a redundant ST assignment from the Organic Chemical subtree in a way that best reflects the essence of a composite chemical, just as retaining the more specific ST accurately captures the essence of a concept that does not denote a composite chemical. A review of the ST assignment choices made by the NLM in resolving redundant ST assignments to organic chemicals in earlier releases reveals no clear rule. Sometimes the more general ST was removed, and sometimes the more specific one.

In this paper, we present a systematic methodology for properly resolving a redundant ST assignment in line with principles of chemistry. Our approach is based on a chemical analysis at the molecular level. The relative sizes of the respective constituents are the driving factors. In this way, the ST assignment better captures the nature of the composite chemical. The methodology is applied to a sample of organic chemicals for which a redundant assignment appeared in earlier releases of the UMLS and was resolved in later releases of the UMLS—allowing for simple comparisons.
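
As a rough sketch of the size-driven idea, the snippet below picks the ST of the largest component of a composite. The `Component` class, the `resolve_by_size` function, and the numeric `size` field are hypothetical: the text says only that the relative sizes of the constituents drive the choice and does not specify a size measure, so the values here are made up for illustration. Ties and the rare exceptional cases are not handled.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    semantic_type: str
    size: float  # hypothetical size measure; the text does not specify one


def resolve_by_size(components):
    """Return the ST of the largest component of a composite chemical."""
    if not components:
        raise ValueError("a composite needs at least one component")
    largest = max(components, key=lambda c: c.size)
    return largest.semantic_type


# Hypothetical composite: a large generic organic moiety joined to a small lipid.
composite = [
    Component("large organic moiety", "Organic Chemical", 900.0),
    Component("small lipid moiety", "Lipid", 250.0),
]
print(resolve_by_size(composite))  # Organic Chemical
```

In this example the more general ST is retained because it belongs to the larger component, which is the opposite of what the conventional rule of dropping the general ST would produce.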

Our methodology is suggested for use by editors when they are categorizing new composite chemical concepts that are being added to the UMLS. New usage notes are provided to guide the editors in this endeavor. Furthermore, the methodology should be used for revisiting organic chemical concepts that were identified as having redundant ST assignments in earlier releases of the UMLS. In a study of a sample of 254 such concepts, it was found that for 32% of them the current ST assignment does not accurately capture the essence of the concept.

Section snippets

Background

The SN efficiently expresses type information by utilizing inheritance along the IS-A path between types. Inheritance makes the explicit specification of certain information at lower-level descendant STs unnecessary when that same information already appears in higher-level ancestor STs [16].

Let C be a concept assigned both STs B and A such that B is a descendant of A. Then the assignment of A to C is called redundant [15] because it can be inferred from the assignment of B to C and the fact

Chemistry-based analysis

The standard means of resolving a redundant assignment [16] may be inappropriate when dealing with composite chemical concepts, and we present a systematic methodology for proper resolution in line with principles of chemistry and chemical analysis. Before getting to our methodology, let us note that the combination of multiple “organic chemical” STs is meant to convey the types of the constituent chemicals in the case of a composite chemical. For example, an assignment of Organic Chemical and

Results

Table 3 shows the number of concepts with redundant ST assignments involving Organic Chemical for three UMLS versions. For example, in 2006AB, there were 1,626 such concepts. No such redundancies were encountered in versions more recent than 2007AA. Some concepts are counted more than once in Table 3 because they appear in consecutive versions. The total number of distinct concepts is 1,668.

We selected a sample of 254 from these concepts for review. The sample contained all 127 concepts from 2007AA. Of

Discussion

This paper handles an anomaly in the semantics of the assignment of multiple STs to the concepts of the META. The typical semantics is that of a conjunction, meaning a concept shares the semantics of both STs, e.g., in being both a Disease or Syndrome and an Anatomical Abnormality. However, when both STs are coming from the subtree of the SN rooted at Organic Chemical (Fig. 1), the semantics is of a chemical obtained by a reaction or mixture of two chemicals, each of which has been assigned a

Conclusion

The review and analysis of concepts that previously had redundant ST assignments in the UMLS has demonstrated that organic chemical concepts present a unique challenge in categorization. When an organic conjugate or complex chemical is being assigned a semantic type, the type for each of its components is determined. Except for a few rare cases (described by the last three usage notes of Table 6), we recommend that a combination of the STs of the components of an organic chemical that form a

Acknowledgment

This work was partially supported by the NLM under Grant R-01-LM008445-01A2.

References (21)

  • O. Bodenreider. The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Res (2004)
  • B.L. Humphreys et al. The Unified Medical Language System: an informatics research collaboration. J Am Med Inform Assoc (1998)
  • K.E. Campbell et al. The Unified Medical Language System: toward a collaborative approach for solving terminologic problems. J Am Med Inform Assoc (1998)
  • P.L. Schuyler et al. The UMLS Metathesaurus: representing different views of biomedical concepts. Bull Med Libr Assoc (1993)
  • M.S. Tuttle et al. Using META-1, the first version of the UMLS Metathesaurus
  • A.T. McCray. UMLS Semantic Network
  • A.T. McCray. Representing biomedical knowledge in the UMLS Semantic Network. High-Performance Medical Libraries: Advances...
  • A.T. McCray. An upper-level ontology for the biomedical domain. Comp Funct Genomics (2003)
  • A.T. McCray et al. The scope and structure of the first version of the UMLS Semantic Network
  • E.S. Chen et al. Automated acquisition of disease drug knowledge from biomedical and clinical documents: an initial study. J Am Med Inform Assoc (2008)
There are more references available in the full text version of this article.
