Representing Clinical Diagnostic Criteria in Quality Data Model Using Natural Language Processing

Constructing standard and computable clinical diagnostic criteria is an important and challenging research area in the clinical informatics community. In this study, we present our framework and methods for representing clinical diagnostic criteria in the Quality Data Model (QDM) using natural language processing (NLP) technologies. We used a clinical NLP tool, cTAKES, to preprocess textual diagnostic criteria. We created mappings between the cTAKES type system and QDM elements at both the datatype and data levels. We evaluated the performance of our NLP-based approach by annotating 218 individual diagnostic criteria in the Symptom and Laboratory Test categories. We conclude that our NLP-based approach is a feasible solution for developing the representation and computerization of diagnostic criteria.


Introduction
The term diagnostic criteria designates the specific combination of signs, symptoms, and test results that the clinician uses to attempt to determine the correct diagnosis (https://en.wikipedia.org/wiki/Medical_diagnosis#Diagnostic_criteria). Diagnostic criteria are among the most valuable sources of knowledge for supporting clinical decision-making and improving patient care (Yager and Mcintyre, 2014). They are a critical evidence resource for clinical decision support systems; however, they are usually described without a uniform standard, scattered over different media such as medical textbooks, the literature and clinical practice guidelines, and mostly written as free text. Several methods based on natural language processing (NLP) technology have been reported and used for structuring free-text clinical guidelines, clinical notes and electronic health records (EHRs), for example (Rea et al., 2012) and (Ohno-Machado et al., 2013). However, there is little research on using NLP-based approaches to support the formalization of free-text diagnostic criteria. To achieve computable diagnostic criteria, we consider two research areas essential: a computable model for representing diagnostic criteria, and the use of clinical NLP applications to support the modeling.
Current efforts to develop internationally recommended standard models in clinical domains have laid the foundation for modeling and representing computable diagnostic criteria. The National Quality Forum (NQF) Quality Data Model (QDM) (Quality Data Model, 2015) is an information model that describes clinical concepts in a standardized format. It allows quality measure developers, clinical researchers and performers to describe clearly and unambiguously the data required to calculate a performance measure. QDM is designed to allow EHRs (Li et al., 2012) and other clinical electronic systems to share a common understanding and interpretation of clinical data. In a previous study, Jiang (2015) evaluated the feasibility of applying QDM through a data-driven approach and demonstrated that QDM is feasible for building a standards-based information model for representing computable diagnostic criteria.
In clinical NLP, many tools are currently applied to process unstructured clinical free text and to support terminology annotation, such as the Health Information Text Extraction tool (HITex) 2 , MetaMap (Aronson and Lang, 2010), OpenNLP 3 and the Clinical Text Analysis and Knowledge Extraction System (cTAKES) (Savova et al., 2010). Some studies compared the performance of the frequently used NLP tools, and the results showed that cTAKES scored best in both performance and usability.
cTAKES is an open-source Apache project: an NLP system for extracting information from the clinical free text of electronic medical records. It is built on the Unstructured Information Management Architecture (UIMA), an open-source framework designed by IBM, and on a series of comprehensive NLP methods (Bruce, 2012). In this study, we use cTAKES as the NLP tool to support the formalization of diagnostic criteria.
The objective of our study is to describe our efforts in developing a semi-automatic approach using NLP to facilitate the representation of clinical diagnostic criteria in QDM.

cTAKES:
The components of cTAKES are specifically trained for the clinical domain, and create rich linguistic and semantic annotations that can be utilized by clinical decision support systems and clinical research 4 . cTAKES discovers clinical named entities and clinical events using a dictionary lookup algorithm and a subset of the Unified Medical Language System (UMLS) 5 , mainly including the following mentions: disease/disorders, sign/symptoms, medications, anatomical sites and procedures.
In addition, cTAKES extracts named-entity attributes and assigns values to them, such as UMLS concept unique identifiers (CUIs) and SNOMED CT codes, polarity, uncertainty, conditional, etc. In this study, we used cTAKES version 3.2.1.

NQF QDM:
The QDM consists of criteria for data elements, relationships for relating data element criteria to each other, and functions for filtering criteria to the subset of data elements that are of interest 6 . The basic components of the QDM include: category (e.g., Symptom), datatype (e.g., Symptom, Active), attribute (e.g., information about severity, start datetime, stop datetime, and ordinality), and value set comprising concept codes from one or more code systems. In this study, we used QDM version 4.1 (Quality Data Model, 2015).

Figure 1 shows the framework we designed for the NLP-supported QDM modeling of diagnostic criteria. The framework comprises three modules. The first module is an NLP annotation module, in which we use cTAKES as the NLP tool to support the structured representation of diagnostic criteria. The second module is a data model transformation between the cTAKES type system and QDM elements. The transformation is supported by both manual mapping strategies and machine learning algorithms. The third module is a unified web interface for human review. As the output, all collected data elements, value sets and logic expressions of the diagnostic criteria are formalized in a QDM-based standard representation.

NLP annotation
We first used cTAKES to perform NLP annotation on the textual diagnostic criteria. cTAKES is a modular system of pipelined components that combines rule-based and machine learning techniques (Savova, Masanz, et al., 2010). As an operable interface, UIMA provides the tooling for selecting which descriptors are used together and for determining their order; see (cTAKES 3.2 Component Use Guide, 2015) for details. Dictionaries such as UMLS, SNOMED CT and RxNorm are integrated into the cTAKES clinical pipeline.

Data Model Transformation
We implemented the model mapping and data transformation on two levels: the datatype-level mapping and the data-level mapping.

(1) Datatype-level Mapping
We created the datatype-level mappings between the cTAKES UIMA Common Analysis System (CAS) type system and QDM datatypes, together with the corresponding attributes and features of these two heterogeneous schemas (Figure 2). We established the mappings by analyzing the textual definitions of the datatypes in both models. The datatype-level mappings focus on 7 selected QDM datatypes and 8 cTAKES types that frequently appear in diagnostic criteria.

(2) Data-level Mapping
cTAKES processes text and stores the results in the UIMA CAS structure, whereas the Health Quality Measures Format (HQMF) is the standard format used to represent QDM-based eMeasure data. All cTAKES instance data are output as CAS XML and converted into HQMF XML using the data-level mapping rules. Figure 3 illustrates the data-level mapping rules between CAS and HQMF elements that we created for the QDM datatype Laboratory Test, Performed.
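The idea of a data-level mapping rule can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the feature names on the CAS side, the element names on the target side, and the `cas_to_target_xml` helper are all hypothetical placeholders and do not reproduce the real CAS or HQMF schemas, nor the rules shown in Figure 3.

```python
import xml.etree.ElementTree as ET

# Hypothetical data-level rules: source feature name -> target element name.
# Both sides are simplified placeholders, NOT the real CAS or HQMF schemas.
DATA_LEVEL_RULES = {
    "code": "code",
    "codeSystem": "codeSystem",
    "operator": "operator",
    "value": "value",
    "unit": "unit",
}


def cas_to_target_xml(cas_annotation, datatype):
    """Copy every mapped feature of one annotation into a new XML element."""
    root = ET.Element("criterion", {"datatype": datatype})
    for cas_feature, target_name in DATA_LEVEL_RULES.items():
        if cas_feature in cas_annotation:
            child = ET.SubElement(root, target_name)
            child.text = str(cas_annotation[cas_feature])
    return root


# Illustrative values for a platelet-count criterion (no code was assigned here).
annotation = {"operator": "<", "value": "100000", "unit": "cells/mm3"}
element = cas_to_target_xml(annotation, "Laboratory Test, Performed")
xml_text = ET.tostring(element, encoding="unicode")
```

Keeping the rules in a plain dictionary separates the mapping knowledge from the conversion code, so a reviewer could extend the rule table without touching the function.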

Evaluation
For evaluation, we manually annotated a collection of individual criteria with QDM datatypes and attributes. We used the manual annotations as the gold standard and evaluated the performance of the NLP-based annotations against them. Two authors (HN, GJ) reviewed the annotations, and disagreements were resolved through discussion until consensus was reached. Three standard measures were used to describe the performance of the NLP module: precision, recall and F-measure.
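The three measures can be computed as in the following minimal sketch. It assumes that the F-measure is the balanced F1 score and that annotations are compared as sets of (criterion id, label) pairs; the function name and the toy data are illustrative and are not taken from the study.

```python
def precision_recall_f1(predicted, gold):
    """Score predicted (criterion_id, label) pairs against gold-standard pairs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        f_measure = 0.0
    else:
        # Balanced F-measure: harmonic mean of precision and recall.
        f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Toy data, invented for illustration only.
gold = {
    (1, "Laboratory Test, Performed"),
    (2, "Symptom, Active"),
    (3, "Symptom, Active"),
    (4, "Laboratory Test, Performed"),
}
predicted = {
    (1, "Laboratory Test, Performed"),
    (2, "Symptom, Active"),
    (3, "Laboratory Test, Performed"),
}
precision, recall, f_measure = precision_recall_f1(predicted, gold)
```

In this toy case two of the three predictions are correct and two of the four gold annotations are found, so precision is 2/3, recall is 1/2 and the F-measure is 4/7.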

Results
For the experiment and evaluation, we first collected 218 individual criteria in the Symptom and Laboratory Test categories. The individual criteria were extracted manually from the text of 44 diagnostic criteria covering 13 different clinical topics (an example of textual diagnostic criteria is shown in Appendix A). All the diagnostic criteria were collected from a number of sources, including medical textbooks, journal papers, documents issued by professional organizations (such as the World Health Organization, WHO) and the Internet. Table 1 shows the number of individual criteria for each of the 13 clinical topics. We processed the test criteria with a cTAKES (version 3.2.1) NLP analysis engine known as the AggregatePlaintextUMLSProcessor. Using the datatype-level mapping rules we created and the cTAKES annotation results for two EventMentions (LabMention and SignSymptomMention), our algorithm automatically allocated a QDM datatype to each individual diagnostic criterion. The allocation results reflect the performance of the automatic mapping classification for the QDM datatypes (Laboratory Test, Performed and Symptom, Active). For example, Figure 4 and Figure 5 show the text of two individual diagnostic criteria with cTAKES annotations in LabMention and SignSymptomMention.
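The allocation step described above can be sketched as a lookup from cTAKES mention types to QDM datatypes. Only the two pairings named in this paragraph are included; the function name, the first-match tie-breaking rule and the handling of unmapped types are assumptions made for illustration, not the algorithm used in the study.

```python
# Datatype-level mapping for the two categories evaluated in the study.
CTAKES_TO_QDM = {
    "LabMention": "Laboratory Test, Performed",
    "SignSymptomMention": "Symptom, Active",
}


def allocate_qdm_datatype(mention_types):
    """Return the QDM datatype of the first mapped mention type, else None.

    Taking the first match is an assumed tie-breaking rule.
    """
    for mention_type in mention_types:
        datatype = CTAKES_TO_QDM.get(mention_type)
        if datatype is not None:
            return datatype
    return None


# Mention types found in one criterion, listed in order of appearance.
lab_result = allocate_qdm_datatype(["LabMention"])
symptom_result = allocate_qdm_datatype(["AnatomicalSiteMention", "SignSymptomMention"])
unmapped_result = allocate_qdm_datatype([])
```

Returning `None` for criteria without a mapped mention leaves those cases for the human review step rather than forcing a datatype onto them.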
Example 1: Thrombocytopenia (platelets <100,000 cells/mm3)
After automatic mapping based on our mapping rules between the two systems, free-text diagnostic criteria are transformed into a QDM-based HQMF XML structure. An example of the resulting QDM data is given in Appendix B. Table 2 shows the evaluation results in terms of whether the mapping rules correctly allocated a QDM datatype to an individual criterion.
To evaluate the performance of the data-level mappings, we tested the mapping results for 78 individual diagnostic criteria that had been manually annotated with the QDM datatype Laboratory Test, Performed. The test focused on the mapping performance of four attribute elements: code/code system, laboratory test value, unit of measurement, and comparison operator. Table 3 shows the evaluation results of the element mappings for the QDM datatype Laboratory Test, Performed.
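To make the attribute elements concrete, the sketch below pulls the comparison operator, value and unit out of a criterion string with a regular expression. It is a hypothetical rule-based illustration, not how cTAKES or the mapping rules work; the `LabTestElements` class and the pattern are assumptions, and the code/code system element is left out because it comes from dictionary lookup rather than pattern matching.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class LabTestElements:
    operator: Optional[str]
    value: Optional[float]
    unit: Optional[str]


# Operator, then a number (thousands separators allowed), then an optional unit.
PATTERN = re.compile(
    r"(?P<operator><=|>=|<|>|=)\s*"
    r"(?P<value>\d[\d,]*(?:\.\d+)?)\s*"
    r"(?P<unit>[A-Za-z%][\w/%]*)?"
)


def extract_lab_elements(text):
    """Return the first operator/value/unit triple found in the text."""
    match = PATTERN.search(text)
    if match is None:
        return LabTestElements(None, None, None)
    value = float(match.group("value").replace(",", ""))
    return LabTestElements(match.group("operator"), value, match.group("unit"))


result = extract_lab_elements("Thrombocytopenia (platelets <100,000 cells/mm3)")
```

For the thrombocytopenia criterion this yields the operator `<`, the value 100000.0 and the unit `cells/mm3`. A pattern this simple only finds the first comparison in a criterion and would need extending for ranges or multiple thresholds.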

Discussion
To bridge the semantic gap between the cTAKES type system and QDM, we performed a critical element analysis and created element mappings at both the datatype and data levels. Because cTAKES UIMA CAS and QDM are both comprehensive models with independent structures, more semantic analysis is needed to extend our current mapping rules, e.g., an analysis of the mapping between the QDM temporal representation and the cTAKES temporal types. Furthermore, some elements could not be mapped directly between the two models in certain contexts.
Previous studies investigated eligibility criteria in clinical trial protocols, developed approaches (known as EliXR) for eligibility criteria extraction and semantic representation, and used hierarchical clustering for the dynamic categorization of such criteria (Weng et al., 2011; Luo et al., 2011). In the future, we will develop machine learning-based methods that leverage the EliXR approach to enable the analysis of large amounts of clinical diagnostic criteria data.
The study demonstrated the overall performance of cTAKES when used to generate the QDM-based representation of diagnostic criteria. The evaluation results in Table 2 indicate that criteria in the Laboratory Test category could be classified automatically into the QDM datatype effectively, whereas the performance for classifying criteria in the Symptom category was sub-optimal. The main reason is that cTAKES uses the SignSymptomMention type, which does not distinguish between a sign and a symptom. The evaluation results in Table 3 indicate that the code/code system and value mappings achieved satisfactory performance, whereas the unit annotation was good in precision but sub-optimal in recall. In addition, operator recognition was insufficient. For example, in the criterion 'Sézary cells with a diameter > 14 µm representing > 20% of the circulating lymphocytes', '%' is annotated as a Symbol but '>' is not recognized by cTAKES, which lowers precision. Overall, the mapping rules were able to generate validated QDM datatypes and related elements, covering most of the typical model elements used in diagnostic criteria. NLP-based technologies could provide a semi-automatic way to support the preliminary classification and enable a pattern-based QDM representation.

Conclusion
In this study, we demonstrated that a clinical NLP tool (e.g., cTAKES) could support the QDM modeling of free-text diagnostic criteria in a semi-automatic way. We are actively developing machine learning algorithms to improve the performance of our NLP-based approach for representing clinical diagnostic criteria in QDM.