Identifying Patients With Hypoglycemia Using Natural Language Processing: Systematic Literature Review

Background Accurately identifying patients with hypoglycemia is key to preventing adverse events and mortality. Natural language processing (NLP), a form of artificial intelligence, uses computational algorithms to extract information from text data. NLP is a scalable, efficient, and quick method to extract hypoglycemia-related information when using electronic health record data sources from a large population.
Objective The objective of this systematic review was to synthesize the literature on the application of NLP to extract hypoglycemia from electronic health record clinical notes.
Methods Literature searches were conducted electronically in PubMed, Web of Science Core Collection, CINAHL (EBSCO), PsycINFO (Ovid), IEEE Xplore, Google Scholar, and ACL Anthology. Keywords included hypoglycemia, low blood glucose, NLP, and machine learning. Inclusion criteria included studies that applied NLP to identify hypoglycemia, reported the outcomes related to hypoglycemia, and were published in English as full papers.
Results This review (n=8 studies) revealed heterogeneity of the reported results related to hypoglycemia. Of the 8 included studies, 4 (50%) reported that the prevalence rate of any level of hypoglycemia was 3.4% to 46.2%. The use of NLP to analyze clinical notes improved the capture of hypoglycemic events that were undocumented or missed when using International Classification of Diseases, Ninth Revision (ICD-9), and International Classification of Diseases, Tenth Revision (ICD-10), codes and laboratory testing. The combination of NLP and ICD-9 or ICD-10 codes significantly increased the identification of hypoglycemic events compared with individual methods; for example, the prevalence rates of hypoglycemia were 12.4% for International Classification of Diseases codes, 25.1% for an NLP algorithm, and 32.2% for combined algorithms. All the reviewed studies applied rule-based NLP algorithms to identify hypoglycemia.
Conclusions The findings provided evidence that the application of NLP to analyze clinical notes improved the capture of hypoglycemic events, particularly when combined with the ICD-9 or ICD-10 codes and laboratory testing.


Introduction
Background
Approximately 34 million (13%) US adults have diabetes [1]. Worldwide, 387 million persons have diabetes, a number that is expected to rise to 592 million by 2035 [2]. In 2017, direct and indirect costs attributed to diabetes in the United States were estimated to be US $327 billion [3]. Optimal glycemic control (glycated hemoglobin [HbA1c] <7%) can be achieved with comprehensive antidiabetic treatment; however, such treatment increases the risk of hypoglycemia. In patients with type 2 diabetes (T2D), after experiencing hypoglycemia, the 3-year incidence of cardiovascular events was 35.1%, and mortality was 28.3% to 31.9% [4,5].
The incidence of hypoglycemia has been reported to vary widely for patients with diabetes. An earlier systematic review and meta-analysis of 46 studies found that 45% of the patients with T2D had mild or moderate hypoglycemia and 6% had severe hypoglycemia; the prevalence was even higher among those treated with insulin, with 50% having mild or moderate hypoglycemia events and 21% having severe events [6]. A subsequent review study showed that the rates of severe hypoglycemia in T2D were between 0.7 and 12 per 100 person-years in randomized controlled trials and between 0.2 (without treatment with insulin or sulfonylureas) and 2 (with treatment with insulin or sulfonylureas) per 100 person-years [7]. The most recent systematic review and meta-analysis of 72 studies indicated that the incidence rate of hypoglycemia was 14.5 to 42,890 episodes per 1000 person-years in type 1 diabetes (T1D) and 0.072 to 16,360 episodes per 1000 person-years in T2D [8].
The reported rates of hypoglycemia vary largely because of the marked heterogeneity in the way that hypoglycemia is defined, measured, and reported. Accurately identifying patients with hypoglycemia is key to preventing adverse events and mortality. There are several methods to identify hypoglycemia events and severity in large populations, including patient questionnaires; International Classification of Diseases, Ninth Revision (ICD-9), or International Classification of Diseases, Tenth Revision (ICD-10), codes; and electronic health records (EHRs). Studies have found that using questionnaires [9] or International Classification of Diseases (ICD) codes [10] is often insensitive and nonspecific in detecting hypoglycemia events and leads to their underestimation.
EHRs have been widely adopted by health care systems, resulting in large amounts of data, including unstructured text in clinical notes [11,12]. The amount of unstructured text is vast and continues to grow at a breakneck pace. Clinical notes enable health care providers to not only identify patients at risk of hypoglycemia but also to obtain details on hypoglycemia; for example, symptomatic or asymptomatic hypoglycemia [13]. Once the patients at risk of hypoglycemia are identified, their treatment can be personalized, which helps to prevent future hypoglycemia and the resulting serious adverse effects. Traditional methods such as manual chart review can extract information related to hypoglycemia from EHR clinical notes [14]; however, such methods are time-consuming, labor intensive, and not scalable, which makes them impractical for use in large populations [15].
By contrast, novel data science approaches, including natural language processing (NLP), have been applied to overcome the aforementioned difficulties [16]. NLP, a form of artificial intelligence, uses computational algorithms to process human language content for a variety of purposes [17]. The application of NLP algorithms is a scalable, efficient, and quick method to extract unstructured data from a large population [18,19]. Applications of NLP in the health domain can be categorized into 2 groups: rule-based methods and machine learning methods [20]. Rule-based NLP techniques are based on a predefined clinical vocabulary, which identifies a set of core concepts for target extraction (eg, hypoglycemia), and may also use pattern matching (such as regular expressions) and filters [21,22]. Rule-based systems are time-consuming to set up, but they are easy to understand and modify and often require less data than machine learning approaches [21,23,24]. Machine learning systems leverage the same feature sets as those used in rule-based systems but do the work to discover the rules needed for a solution; however, this comes at a price: the resulting systems often function as a black box, which is difficult for humans to understand and trust [20]. In addition, machine learning systems typically require very large sample sizes for development [23]. Deep learning approaches (neural networks) are a form of machine learning used in recent years [25,26], which can achieve performances comparable with, or better than, those of domain experts in identifying clinical information [16]. However, deep learning-based models require large amounts of training data to achieve high accuracy, which hinders their adoption in scenarios with limited amounts of training data [27]. As a result, state-of-the-art deep learning methods of NLP (eg, transformer models and transfer learning) were developed to address these issues, and they have been shown to be highly effective in the NLP domain [27,28].
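As an illustration of the rule-based approach described above (a predefined vocabulary, pattern matching, and filters), the following minimal Python sketch matches a small set of hypoglycemia terms and discards mentions that are preceded by a negation cue. The term list, the negation cues, the 40-character look-back window, and the function names are illustrative assumptions; they are not taken from any of the reviewed studies, which relied on curated clinical vocabularies.

```python
import re

# Hypothetical vocabulary and negation cues, for illustration only.
HYPO_TERMS = [
    "hypoglycemia", "hypoglycemic", "hypoglycaemia",
    "low blood sugar", "low blood glucose",
]
NEGATION_CUES = ["no", "denies", "denied", "without", "negative for"]

TERM_RE = re.compile("|".join(re.escape(t) for t in HYPO_TERMS), re.IGNORECASE)
NEG_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(c) for c in NEGATION_CUES) + r")\b",
    re.IGNORECASE,
)


def split_sentences(note):
    """Naive sentence splitter: break after ., !, or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", note) if s.strip()]


def find_affirmed_mentions(note, window=40):
    """Return sentences containing a hypoglycemia term that is not
    preceded (within `window` characters, same sentence) by a negation cue."""
    hits = []
    for sentence in split_sentences(note):
        for match in TERM_RE.finditer(sentence):
            preceding = sentence[max(0, match.start() - window):match.start()]
            if not NEG_RE.search(preceding):
                hits.append(sentence)
                break  # one affirmed mention is enough for this sentence
    return hits
```

The sketch shows why such systems are easy to inspect and modify: every decision traces back to an entry in a word list. It also shows their cost, because each spelling variant or negation phrasing must be added by hand.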

Objectives
Currently, little is known about what types of NLP algorithms were applied to identify hypoglycemia and how differences in hypoglycemia incidence identified from unstructured data using NLP compare with hypoglycemia incidence identified from structured data (eg, ICD codes) across studies. It was reported in 1 study that a higher number of hypoglycemia events could be identified in clinical notes by using NLP than by using ICD codes (65% vs 20%, respectively) [29]. Thus, in this systematic review, we aimed to synthesize the literature on the application of NLP to extract hypoglycemia from EHR clinical notes and compare the differences between hypoglycemia incidence identified from unstructured data using NLP and hypoglycemia incidence identified from structured data (eg, ICD codes) across studies.

ACL Anthology
• hypoglycemia OR blood glucose OR blood sugar OR hypoglycemic

Google Scholar
• natural language processing AND hypoglycemia AND electronic health records

IEEE Xplore
• (All Metadata:blood sugar OR All Metadata:blood glucose OR All Metadata:hypoglycemia OR All Metadata:hypoglycemic) AND (All Metadata:natural language processing OR All Metadata:NLP OR All Metadata:"machine learning" OR All Metadata:"artificial intelligence" OR All Metadata:"text mining" OR All Metadata:"text analysis" OR All Metadata:"text analyses" OR All Metadata:"text analytics" OR All Metadata:"text processing")
• Filters applied: journals

Inclusion and Exclusion Criteria
The inclusion criteria were as follows: studies that (1) were restricted to participants aged ≥18 years; (2) reported a sample with a diagnosis of diabetes; (3) applied NLP to identify hypoglycemia; (4) reported the number or percentage of participants who had experienced at least one hypoglycemic episode, the incidence of hypoglycemic episodes experienced, or data to allow the calculation of one of these measures; (5) used EHR data; (6) were published as full papers in peer-reviewed journals; and (7) were published in English. No restrictions were applied regarding the definition or measurement of hypoglycemia. No restrictions were applied to the country of origin of the studies. Studies were excluded if (1) they did not report outcomes related to hypoglycemia, (2) they were pharmacological trials or the intervention focused on treatment or care, (3) the participants were all pregnant or children, or (4) they were published only as conference papers or proceedings.

Data Extraction
We first developed and tested a data extraction form, with adaptations made accordingly. The titles, abstracts, and full-text articles were screened by 2 independent reviewers (MCRM, LS, Emily M Pan, or Yi Lan Zhang). When conflicts were identified, agreement was reached after discussion with the third reviewer (YZ). The results related to the identification of eligible studies were summarized according to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines (Figure 1). The searches yielded 2070 citations, and after removing duplicates, 1705 (82.37%) titles and abstracts were screened for eligibility. After full-text retrieval of 334 potentially relevant papers, 326 (97.6%) were subsequently excluded, leaving 8 (2.4%) papers that applied NLP to identify hypoglycemia and reported the rates of hypoglycemia that were eligible for inclusion in the analyses. The reference sections of the relevant articles were searched manually, but no further relevant articles were found. Studies were summarized based on the following categories: authors and country, sample size and characteristics, medical conditions, antihyperglycemic medication, study design, data source, definition of hypoglycemia, method used to identify hypoglycemia, NLP algorithm (eg, rule-based or machine learning), NLP algorithm validation, and outcomes (Tables 1 and 2). For Google Scholar, the first 100 results ranked by relevance are suggested for identifying additional articles, and for ACL Anthology, all the citations found were added to the irrelevant set (excluded based on title and abstract) [30].
IM glucagon administration; NLP: mention of hypoglycemia; severe hypoglycemia: ICD-9 and ICD-10 codes for hypoglycemia that is severe by default or ICD-9 and ICD-10 codes for hypoglycemia and hypoglycemia is reason for care on discharge or admission or hypoglycemia index date on same day as emergency department visit or inpatient diagnosis on admission (all related to hypoglycemic coma); plasma glucose level measures <54 mg/dL; IM glucagon administration; NLP: mention of hypoglycemia with either a descriptor of hypoglycemia severity, including severity terms (eg, severe) and attributes (eg, emergency), or emergency department visit or inpatient admission on same day as medical record was written

NLP Algorithms Applied to Identify Hypoglycemia
All included studies applied rule-based algorithms (Table 3). The study by Misra-Hebert et al [35] described the NLP steps in detail: splitting clinical notes into sentences and phrases, filtering sentences and phrases to those containing references to a hypoglycemia-related Unified Medical Language System [38] concept, identifying temporal phrases (identifying when the event occurred), and classifying polarity (assertion or negation) into no, nonsevere, or severe event, all using rule-based algorithms. Li et al [34] identified hypoglycemia using a formally defined pattern (regular expression) [39], such as a blood sugar word followed within 5 words by what could be a low blood sugar value represented by a number ranging from 10 to 69. Uzoigwe et al [36] identified keywords or concepts of interest related to both symptom-based and nonsymptom-based hypoglycemic events. The remaining studies (5/8, 63%) [29,31-33,37] applied the same NLP algorithms to identify (1) terms or concepts (eg, hypoglycemia), including alternative or incorrect spellings and abbreviations; (2) descriptive attributes of the hypoglycemia mention (eg, seriousness, duration, and frequency); (3) sentiment of the mention (eg, denial, affirmation, and discussion); and (4) other contextual information (eg, note section headers and neighboring text).
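The kind of pattern attributed to Li et al [34] above can be sketched with a regular expression. This is a hedged reconstruction from the one-sentence description only, not the authors' actual expression: the list of "blood sugar words" is hypothetical, "within 5 words" is assumed to mean that at most 4 words separate the term from the value, and the value is assumed to be a standalone integer from 10 to 69.

```python
import re

# Hypothetical "blood sugar words"; the source does not enumerate them.
SUGAR_WORD = r"(?:blood\s+sugar|blood\s+glucose|glucose|bg|fsbs)"

# A standalone two-digit integer from 10 to 69: not preceded by a digit or
# decimal point, and not followed by a digit or a decimal fraction.
LOW_VALUE = r"(?<![\d.])([1-6]\d)(?!\d)(?!\.\d)"

# Sugar word, then up to 4 intervening words, then the candidate value.
PATTERN = re.compile(
    r"\b" + SUGAR_WORD + r"\b(?:\W+\w+){0,4}?\W+" + LOW_VALUE,
    re.IGNORECASE,
)


def find_low_glucose(text):
    """Return the first candidate low glucose value as an int, or None."""
    match = PATTERN.search(text)
    return int(match.group(1)) if match else None
```

A pattern this simple will also match unrelated numbers that happen to fall in the window (for example, a clock time), which is one reason such rules are validated against manual chart review.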
Manual review of clinical notes was used as the gold standard to validate the NLP algorithms in 63% (5/8) of the studies. Of the 8 studies, 2 (25%) did not report validation of the algorithm, whereas in the 6 (75%) reporting studies, the precision (positive predictive value) for the hypoglycemia algorithm was 0.77 to 0.93 [29,31-33,35,37]. Of these 6 studies, 5 (83%) reported that the recall (sensitivity) was 0.67 [29,31-33,37].

Table 3. NLP algorithms applied to identify hypoglycemia, by study.

Studies [29,31-33,37] (rule-based)
• Identify terms consistent with hypoglycemia (including alternative or incorrect spellings and abbreviations)
• Identify descriptive attributes of the hypoglycemia mention (eg, seriousness, duration, and frequency)
• Identify sentiment of the mention (eg, denial and affirmation, including "has," "diagnosed," and "present")
• Identify contextual information (eg, note section headers and neighboring text). Sections such as "history of present illness," "assessment," "hospital course," "reason," "review of symptoms," and "chief complaint" generally reflected occurrence of hypoglycemia

Li et al, 2019 [34] (rule-based)
• A formally defined pattern (regular expression), which identified clinical reports mentioning a "blood sugar word" followed within 5 words by what could be a low blood sugar value represented by a number ranging from 10 to 69

Misra-Hebert et al, 2020 [35] (rule-based)
• Split clinical notes into sentences and phrases
• Filter sentences and phrases to those containing a hypoglycemia-related Unified Medical Language System concept
• Identify temporal phrases (when the event occurred)
• Classify polarity (assertion or negation) into no, nonsevere, and severe event

Uzoigwe et al, 2020 [36] (rule-based)
• Identify keywords or concepts of interest: symptom-based and nonsymptom-based hypoglycemic events
• Symptom-based terms: neuroglycopenic and adrenergic symptomology associated with hypoglycemia
• Adrenergic symptomology: elevated or irregular heart rate, sweating, tremor, trembling, tingling, or shaking, and vision impairment
• Neuroglycopenic symptomology: cognitive issues, irritable or anxious, mood or behavior change + NOT substance abuse or alcohol, slurred speech + NOT stroke + NOT substance abuse or alcohol
• Nonsymptom-based definition: relevant medical ontology such as "low glucose"
• A blood glucose laboratory value ≤70 mg/dL documented
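For reference, the validation metrics reported in this section (precision and recall against manual chart review) can be computed as in the following minimal sketch. The function name and the example labels are hypothetical and are not data from any reviewed study; each list position stands for one clinical note, with a truthy value meaning "hypoglycemia present."

```python
def precision_recall(predicted, gold):
    """Compare algorithm output with a gold standard (eg, manual chart review).

    predicted, gold: equal-length sequences of truthy/falsy labels.
    Returns (precision, recall); a metric is NaN when its denominator is 0.
    """
    pairs = list(zip(predicted, gold))
    tp = sum(1 for p, g in pairs if p and g)        # flagged and truly present
    fp = sum(1 for p, g in pairs if p and not g)    # flagged but absent
    fn = sum(1 for p, g in pairs if not p and g)    # missed by the algorithm
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, recall


# Hypothetical labels for 6 notes: NLP output vs manual review.
nlp_flags = [1, 1, 1, 0, 0, 1]
chart_review = [1, 1, 0, 1, 0, 1]
precision, recall = precision_recall(nlp_flags, chart_review)
```

In this toy example there are 3 true positives, 1 false positive, and 1 false negative, so both precision and recall are 0.75.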

Prevalence or Incidence of Hypoglycemia
The prevalence or the incidence of hypoglycemia varied widely across studies. All studies used a combination of NLP and other approaches (eg, ICD codes) to identify hypoglycemia. Overall, the prevalence rate of any level of hypoglycemia was 3.4% to 46.2%, as reported by 50% (4/8) of the studies [31,33,34,36], and the incidence rate was 6.28% to 65.7%, as reported by 38% (3/8) of the studies [29,31,32]. The prevalence rate of nonsevere hypoglycemia was 0.1% to 3.4% [29,31,35] and that of severe hypoglycemia was 5.1% to 18.7% [29,31,33,37]. Of the 8 studies, 4 (50%) compared the prevalence or incidence of hypoglycemia identified by NLP and ICD codes. In the study by Nunes et al (2016) [31], the prevalence rates of any hypoglycemia within the study period were 12.4%, 25.1%, and 32.2% for ICD-9 codes, the NLP algorithm, and the combined algorithm, respectively. Similarly, Misra-Hebert et al [35] found that NLP identified more nonsevere hypoglycemia events than ICD codes (14,763 vs 10,205 events) during the study period from 2005 to 2017; among 204,517 patients with no ICD codes for nonsevere hypoglycemia, evidence of nonsevere hypoglycemia was found in 7035 (3.44%) using NLP. Li et al [34] also showed that hypoglycemia was identified in 21% of the participants, with 9.67% identified only by NLP algorithms. In addition, Uzoigwe et al [36] found that the prevalence rates of hypoglycemia were 11.4% and <0.1% using NLP algorithms and ICD codes, respectively, in T2D; the prevalence rates were 20.4% and 0.1%, respectively, in T1D.
Using the combination of NLP and other approaches (eg, ICD codes) identified the highest prevalence or incidence of hypoglycemia compared with either method alone. Nunes et al [31] found that the prevalence rates of hypoglycemia were 12.4% for ICD codes, 25.1% for NLP algorithm, and 32.2% for combined algorithms; the incidence rates per 100 person-years were 2.3%, 4.8%, and 6.3% using ICD codes, NLP, and combined algorithms, respectively. Similarly, Misra-Hebert et al [35] identified that the incidence proportions of patients in the period from 2005 to 2017 were 0.4% and 1.3% for nonsevere hypoglycemia when using only ICD codes, whereas when NLP was added, the incidence proportions increased to 0.8% and 2.6%.
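To see why a combined algorithm exceeds either method alone, the prevalence figures reported by Nunes et al [31] can be decomposed by inclusion-exclusion. The sketch below assumes that "combined" means a patient was flagged by either method (a set union); the source does not state this explicitly, so the derived overlap figures are illustrative arithmetic rather than reported results. The function name is hypothetical.

```python
def decompose_prevalence(icd, nlp, combined):
    """Split a combined (union) prevalence into overlap and method-only parts.

    Assumes combined = P(ICD or NLP), so that by inclusion-exclusion
    P(ICD and NLP) = P(ICD) + P(NLP) - P(ICD or NLP).
    All values are percentages of the same patient population.
    """
    return {
        "both": round(icd + nlp - combined, 1),
        "icd_only": round(combined - nlp, 1),
        "nlp_only": round(combined - icd, 1),
    }


# Prevalence rates (%) reported by Nunes et al: ICD codes, NLP, combined.
parts = decompose_prevalence(icd=12.4, nlp=25.1, combined=32.2)
```

Under the union assumption, about 5.3% of patients would have been flagged by both methods, 7.1% by ICD codes only, and 19.8% by NLP only, which is consistent with the observation that each method captures events the other misses.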

Principal Findings
This systematic review aimed to synthesize the literature on the application of NLP to extract hypoglycemia from EHR clinical notes. Of the 8 studies, 4 (50%) reported that the prevalence rate of any level of hypoglycemia was 3.4% to 46.2%. Overall, the use of NLP to analyze clinical notes improved the capture of hypoglycemic events that may have been undocumented or missed using laboratory testing or ICD-9 and ICD-10 codes.
The combination of NLP and other approaches significantly increased the identification of hypoglycemic events compared with individual methods. All reviewed studies applied rule-based NLP methods to identify hypoglycemia.
Previous reviews of the prevalence and incidence of hypoglycemia using NLP are limited. Our study found that the prevalence rate of any level of hypoglycemia was 3.4% to 46.2%, whereas a previous review study reported that the prevalence rate of any level of hypoglycemia ranged from 1% to 19% for studies using EHR as a data source [8]. In addition, 1 (13%) of the 8 studies in our review reported that the estimated prevalence rate of hypoglycemia using combined symptom-based and nonsymptom-based definitions was 20.4% (T1D) and 11.4% (T2D) [36], which is higher than that in previous analyses that did not apply NLP for data extraction [40,41].
All included studies (n=8) applied rule-based NLP to identify hypoglycemia. The main aim of our paper focused on the application of NLP algorithms to identify hypoglycemia and not on the method for developing algorithms. Published articles have reported developing machine learning or deep learning algorithms to identify hypoglycemia, but they did not report the incidence of hypoglycemia; therefore, we did not include such papers in our review. For example, Chen et al [42] incorporated 3 machine learning algorithms to detect hypoglycemia, including logistic regression, linear support vector machines, and random forest. The results showed that single cross-validation logistic regression with cost-sensitive learning achieved the best performance, with a sensitivity of 0.693 and a specificity of 0.974. In addition, Jin et al [43] developed and evaluated deep learning-based NLP systems to automatically detect hypoglycemia events from EHR narratives; they found that the convolutional neural network model yielded a promising performance, with a precision of 0.96 and a recall of 0.86 in a 10-fold cross-validation setting. Furthermore, none of our reviewed studies applied the currently dominant methods in NLP research (eg, transformer models and transfer learning) to identify hypoglycemia from EHR data. Our review indicated that the applications of NLP to identify hypoglycemia mainly use rule-based systems. Although machine learning- and deep learning-based algorithms have been developed, they have not been applied in clinical research.
A limitation of this review is the heterogeneity of the reported results. This heterogeneity prevents the estimation of the pooled incidence and prevalence of hypoglycemia in diabetes using NLP algorithms. In addition, excluding conference proceedings reduced the number of papers included. Conference proceedings are given little weight in clinical decision-making and are therefore usually not included in review papers in the medical literature. Nevertheless, findings from the excluded conference proceedings may bear on the clinical decision to use NLP as a clinical algorithm, which can help patients or physicians to better identify high-risk hypoglycemia. To the best of our knowledge, this is the first systematic review to synthesize the prevalence and incidence of hypoglycemia using NLP in individuals with diabetes. All reviewed studies applied the combination of NLP with ICD codes and laboratory testing and identified a higher incidence of hypoglycemia when using EHR data sources. This has significant clinical implications for the prevention and management of hypoglycemia; with the widespread use of EHRs, leveraging clinical notes significantly improves the identification of individuals with hypoglycemia. The preferred strategy is to use structured data (ICD codes) first, followed by NLP to synthesize the unstructured data and pinpoint those at highest risk for hypoglycemia.

Conclusions
In conclusion, our findings provided evidence that the application of NLP to analyze clinical notes improved the capture of hypoglycemic events, particularly when combined with ICD-9 and ICD-10 codes and laboratory testing. Identifying such patients with diabetes is important and necessary for characterizing treatment and unmet needs, thus preventing the adverse events and mortality associated with hypoglycemia. The current application of NLP in the identification of hypoglycemia still relies on traditional rule-based methods; although machine learning- and deep learning-based algorithms have been developed, they have not been applied in clinical research. Future research should compare rule-based systems, machine learning approaches, and deep learning-based NLP methods (eg, transformer models and transfer learning) to improve NLP efficiency.