Patient Dietary Supplements Use: Do Results from Natural Language Processing of Clinical Notes Agree with Survey Data?

Dietary supplements are widely used, some prescribed but many taken without a physician’s guidance. Supplements can interact with both over-the-counter and prescription medications in ways that are unknown to patients. Structured medical records do not adequately document supplement use; however, unstructured clinical notes often contain additional information on supplements. We studied a group of 377 patients from three healthcare facilities and developed a natural language processing (NLP) tool to detect supplement use. Using surveys of these patients, we investigated the correlation between self-reported supplement use and NLP extractions from the clinical notes. Our model achieved an F1 score of 0.914 for detecting all supplements. Agreement between individual supplement detection and survey responses varied, ranging from an F1 of 0.83 for calcium to an F1 of 0.39 for folic acid. Our study demonstrated good NLP performance while also finding that self-reported supplement use is not always consistent with the documented use in clinical records.


Introduction
Dietary supplements are widely used by the general population in the United States. According to the National Health Interview Survey (NHIS), 18% of American adults had used herbal supplements in 2012 [1]. In the 2017-2018 National Health and Nutrition Examination Survey (NHANES), 57.6% of adults reported using dietary supplements in the past 30 days. Women used more dietary supplements than men did across all age categories, and their use increased with age. In particular, adults aged 60 and older reported taking the most supplements, with almost a quarter (24.9%) saying they take four or more [2].
However, since dietary supplements do not require approval by the U.S. Food and Drug Administration (FDA) before marketing, their adverse effects and potential interactions with prescription medications are not well known. Certain supplements can be dangerous on their own if not taken correctly, and they can also cause adverse events when combined with other substances [3]. In addition, adverse reactions to the products are often not reported [4]. For example, Liu et al. studied the surgical evaluations of 376 patients and found that 75% were using complementary therapies, many of which included supplements, but only 17% had discussed this use with their physicians [5]. The lack of disclosure has significant implications for patient safety, particularly in situations involving acute care. Patients may unknowingly put their health at risk by taking supplements without proper medical guidance or without considering their potential interactions with other medications. Another study, conducted by Wood et al., reported that the use of complementary therapies has become increasingly popular, with a usage rate of 64%. Among these therapies, megadose vitamins and nutritional supplements were found to be the most popular. However, the study also found that many patients did not disclose their use of complementary therapies to their healthcare provider [6]. A systematic review of patients with cardiovascular disease found that the use of complementary and alternative medicine was common, with prevalence ranging between 19% and 64%; these patients were also taking prescription medications. There appeared to be little awareness that there may be interactions with their prescription medications [7]. Many of the known interactions are with prescription cardiac medications, introducing a risk of serious complications [4]. Loya et al. report that 31.5% of their study participants were at risk of experiencing interactions between over-the-counter medications and supplements they were taking [8]. In a Veterans Affairs study, Lee et al. found that 61% of their sample of 200 cancer patients used supplements, and 12% of those were at risk of interactions with prescribed medications [9]. Additionally, a study by Geller et al. found that tens of thousands of emergency room visits every year in the United States are due to adverse events related to dietary supplements [10]. Finally, a study conducted at a geriatrics clinic (n = 124) examined the use of dietary supplements and potential interactions with prescription drugs in the elderly population. They identified 23% of participants, and of the supplement users, 54% were at risk for interactions [11].
Prevention of adverse events due to these interactions requires knowledge of the medications and supplements used by patients. This is normally obtained from patient medication lists; however, supplements are not routinely present in medication lists unless prescribed by the healthcare provider. Our group was the first to leverage electronic health record (EHR) data in the study of dietary supplements [12][13][14]. We developed a herb-drug interaction alert system that automatically identifies potentially harmful combinations of supplements and drugs from EHR data [12]. In a related study, we found that supplements can be detected more completely through natural language processing (NLP) analysis of patients' unstructured and semi-structured clinical notes [14]. We were able to identify a significantly larger cohort of individuals using the herbal remedy Ginkgo biloba by analyzing clinical notes rather than relying solely on structured pharmacy data [14]. Specifically, out of a large sample, we were able to identify over 28,000 patients taking Ginkgo biloba based on clinical notes alone, compared to only 9 patients who could be identified through structured pharmacy data. Other researchers have also studied dietary supplements in the context of EHR. Zhang et al. reported that supplement terminology is not fully standardized between medical terminologies and databases. They identified gaps between supplement and standard medication terminologies, making it difficult to identify them in EHR systems. They further found supplements that were not mentioned in the medication list [15].
More recently, Arnaud et al. employed NLP techniques to predict the medical specialties of patients at an early stage of hospital admission, integrating structured data with unstructured textual notes from a dataset of about 260 K emergency department records. Their findings show that NLP can accurately predict medical specialties, which has implications for optimizing resource allocation, enhancing patient outcomes, and cutting costs [16]. Additionally, Elbattah et al. reviewed key studies published over the past six years on recent developments and applications of text analytics in healthcare. The main findings highlight the potential of NLP and other text analytics techniques to improve various aspects of healthcare. The study emphasized the need for continued research and development in this field to enhance the effectiveness and efficiency of text analytics in healthcare [17]. Finally, Fan et al. demonstrated the use of rule-based and machine learning models for the detection of supplement use from clinical notes [18], although the study was performed at a single healthcare facility, which may limit generalizability. Additional work on this topic has also been reported [19,20].
In this study, we investigated the use of NLP to detect the use of multiple supplements in free-text clinical notes and evaluated the NLP results against survey data from three different healthcare facilities. Specifically, we applied the trained NLP model to the notes of patients who had completed a survey regarding their supplement use and evaluated the correlation between patients' self-reported supplement use and the supplement use detected by NLP in the clinical notes. The aim of this study is to examine the potential of NLP techniques to detect dietary supplement use from both semi-structured and unstructured clinical notes. Our goal is to demonstrate the ability to capture the use of dietary supplements from free-text clinical notes and to assess the correlation between documented and self-reported use of dietary supplements.

Surveys
Paper-based surveys were completed by 377 Veterans from three different Veterans Affairs Medical Centers (VAMCs) (130 from the Washington DC VAMC, 115 from the VA West Haven HCS, and 132 from the VA Salt Lake City HCS) to measure their use of dietary supplements. Informed consent was obtained from each participant after clarification of the study objectives and activities. The characteristics of the participants are shown in Table 1, and a sample of the reported supplements is shown in Table 2. From these surveys, a supplement set was created consisting of the unique supplements used by the participants.

Annotation
This study used medical records from the U.S. Veterans Health Administration (VHA) system under IRB supervision through the VA Informatics and Computing Infrastructure (VINCI). VINCI is a collaboration between the VA Office of Information and Technology (OI&T) and the Office of Research and Development (OR&D) to provide researchers with an environment for secure access to VA healthcare data stored in the Clinical Data Warehouse (CDW) [21].
CDW is a national repository for the VA's electronic health records. It was first "developed in 2006 to accommodate the massive amounts of data being generated from more than 20 years of use and to streamline the process of knowledge discovery and application." VA pharmacy data are documented in CDW, pharmacy benefits management (PBM), and VHA managerial cost accounting (MCA). Dietary supplements, however, are rarely documented in the structured pharmacy tables [14]. Fortunately, CDW includes unstructured data in the form of Text Integration Utilities (TIU) notes and houses software applications for NLP.
A list of keywords was constructed from the supplement set derived from the surveys and augmented with known variants and misspellings. Keywords with the same meaning were grouped into categories. These keywords were used to retrieve a set of clinical notes containing the keywords from patients other than the survey participants. The documents were tokenized. Relevant punctuation, tabs, and new line markup were not removed prior to annotation review, because these elements carry meaning in semi-structured text. The documents were further split into snippets consisting of the matched keyword and its context, +/−20 words around the keyword. We have found in multiple previous studies that the inclusion of 20 words before and after each keyword results in a more complete context representation than single sentences, partly due to the non-grammatical structures in many clinical notes, which make automatic sentence splitting unreliable. One thousand of these snippets were manually reviewed and annotated to indicate whether each keyword occurrence indicated the current use of supplements (yes or no). In snippets containing multiple keywords, all keywords were annotated.
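The snippet extraction described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline used in the study: the keyword-to-category map, the function name `extract_snippets`, and the whitespace tokenization are all assumptions made for the example. In particular, splitting on whitespace discards the tab and new line layout that the study retained; punctuation is kept in the snippet and only stripped for keyword matching.

```python
import re

# Hypothetical keyword-to-category map for illustration only; the study's
# actual keyword list is not reproduced here.
KEYWORD_CATEGORY = {
    "calcium": "calcium",
    "folic acid": "folic acid",
    "folate": "folic acid",
    "fish oil": "fish oil",
}

WINDOW = 20  # words kept on each side of the keyword, as described in the text


def extract_snippets(note, keyword_category=KEYWORD_CATEGORY, window=WINDOW):
    """Return one (category, snippet) pair per keyword occurrence in a note."""
    tokens = note.split()  # simplification: whitespace tokenization
    # Lowercase and strip leading/trailing punctuation for matching only.
    lowered = [re.sub(r"^\W+|\W+$", "", t).lower() for t in tokens]
    snippets = []
    for keyword, category in keyword_category.items():
        parts = keyword.split()
        n = len(parts)
        for i in range(len(lowered) - n + 1):
            if lowered[i:i + n] == parts:
                start = max(0, i - window)
                end = min(len(tokens), i + n + window)
                snippets.append((category, " ".join(tokens[start:end])))
    return snippets
```

Because each keyword occurrence yields its own pair, a snippet containing several keywords produces several observations, which matches the annotation rule that all keywords in a snippet were annotated.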

Machine Learning and Evaluation
A support vector machine (SVM) was trained using features derived from the annotated snippets, using the WEKA workbench [22]. Each snippet keyword was considered a separate observation, resulting in 1913 observations (many of the 1000 snippets contained multiple keywords). A set of unique bigrams was constructed from all snippets for use as features. An additional feature was used to indicate the keyword in the snippet, and the outcome was the supplement annotation (yes or no). The final feature set consisted of 14,217 bigram features and one keyword feature. A 10-fold cross validation was used to measure performance, and then all observations were used to create a final model.
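As an illustration of the feature construction and evaluation split described above, the following sketch builds a bigram vocabulary, encodes each observation as bigram indicators plus one keyword feature, and generates 10-fold index splits. It is not the WEKA configuration used in the study. The function names, the binary (presence/absence) encoding, the integer keyword index, and the interleaved fold assignment are assumptions; the SVM training step itself is omitted and could be supplied by any SVM implementation.

```python
def bigrams(snippet):
    """Return the list of adjacent word pairs in a snippet."""
    tokens = snippet.lower().split()
    return [tokens[i] + " " + tokens[i + 1] for i in range(len(tokens) - 1)]


def build_vocabulary(snippets):
    """Map each unique bigram seen in the snippets to a feature index."""
    vocab = {}
    for snippet in snippets:
        for bg in bigrams(snippet):
            vocab.setdefault(bg, len(vocab))
    return vocab


def featurize(snippet, keyword, vocab, keywords):
    """Binary bigram indicators followed by one keyword-index feature.

    Assumption: presence/absence encoding and an integer keyword index;
    the encoding used in the study is not specified in the text.
    """
    vec = [0] * (len(vocab) + 1)
    for bg in bigrams(snippet):
        if bg in vocab:
            vec[vocab[bg]] = 1
    vec[-1] = keywords.index(keyword)
    return vec


def k_fold_indices(n, k=10):
    """Yield (train, test) index lists for k roughly equal folds."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test
```

With the 1913 observations reported here, `k_fold_indices(1913)` would produce ten test folds of 191 or 192 observations each, and every observation would appear in exactly one test fold.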

Survey Evaluation
In order to evaluate the ability of the model to identify supplement use among the survey participants, the model was applied to snippets from clinical notes belonging to the participants. Only clinical notes from the time period between one year prior and one month after the survey date were used in order to maintain context with the surveys. Snippets were retrieved and extracted in the same way as in the training of the SVM, using the supplement keywords, resulting in 28,897 snippets. The SVM model was then applied to those snippets to identify if they indicated active supplement use. If the model predicted active supplement use, then the snippet was classified as positive for the use of the supplement indicated by the keyword. Snippets were then aggregated to the patient level. A patient was considered to be using the supplement if they had at least one snippet positive for the supplement. The methods are illustrated in Figure 1.
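The note selection window and the patient-level aggregation rule can be expressed compactly. The sketch below is illustrative only: the function names and the tuple layout are hypothetical, and approximating "one year" as 365 days and "one month" as 30 days is an assumption, since the exact date arithmetic is not stated.

```python
from datetime import date, timedelta


def in_window(note_date, survey_date, days_before=365, days_after=30):
    """True if the note falls between one year before and one month after
    the survey date (assumed here to be 365 and 30 days)."""
    earliest = survey_date - timedelta(days=days_before)
    latest = survey_date + timedelta(days=days_after)
    return earliest <= note_date <= latest


def aggregate_to_patient(predictions):
    """Collapse snippet-level predictions to patient-level supplement use.

    predictions: iterable of (patient_id, category, is_positive) tuples.
    A patient is treated as using a supplement if at least one of their
    snippets for that category was predicted positive.
    """
    usage = {}
    for patient_id, category, is_positive in predictions:
        key = (patient_id, category)
        usage[key] = usage.get(key, False) or bool(is_positive)
    return usage
```

Note that the "at least one positive snippet" rule favors recall at the patient level: a single false-positive snippet is enough to mark a patient as a user of that supplement.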

Results
Eighty-three keywords representing 44 categories of dietary supplements were identified (Table 3). The SVM trained on the observations derived from the snippets performed with precision = 0.914, recall = 0.914, and F-measure = 0.914 (10-fold cross validation, details in Table 4).
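For reference, precision, recall, and the F-measure (F1) can be computed from paired labels as in the sketch below. The function name is hypothetical, and the sketch scores the positive class only; the averaging scheme behind the figures reported above is not stated in this section, so this should not be read as a reproduction of those numbers. The same calculation applies when the reference labels are survey answers rather than annotations.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for the positive class.

    y_true: reference labels (annotations or survey answers), truthy = use.
    y_pred: predicted labels, truthy = use.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1
```

F1 is the harmonic mean of precision and recall, so it is high only when both are high.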

Discussion
Significance: The use of dietary supplements is widespread, with many individuals taking them without a physician's guidance. However, dietary supplements can potentially interact with both over-the-counter and prescription medications, and many of these interactions are unknown to patients. The documentation of dietary supplement use in structured medical records is often missing, but unstructured clinical notes contain additional information about supplement use. To take advantage of the clinical notes, we developed an NLP system to detect dietary supplement use in clinical notes and used survey data from a group of 377 patients from three healthcare facilities for evaluation.
In particular, we applied the trained model to notes belonging to patients that had completed a survey evaluating their supplement use and evaluated the correlations between patients' self-reported supplement use and that detected with NLP techniques in the clinical notes. We trained an SVM model on bigrams from snippets of text containing supplement keywords, resulting in an overall high performance (F1 = 0.914). The NLP results' agreement with patient self-reported surveys, however, was variable, with good agreement in many cases (e.g., calcium, F1 = 0.83) and low agreement in others (e.g., folic acid, F1 = 0.39).
Implication: Health record systems generally contain medication lists; however, dietary supplements are not routinely tracked via this mechanism. Some dietary supplements have been shown, and others are strongly suspected, of having interactions with other medications. The good news is that we have demonstrated that NLP of semi-structured and unstructured clinical notes can reliably detect the use of many dietary supplements. The not-so-good news is that the agreement between the self-reported supplement use and the NLP results is not consistently high.
Our study has several strengths, including the ability to capture dietary supplement use from free-text clinical notes, which could enable future clinical studies on drug interactions and outcomes research. Moreover, we show that patients from multiple healthcare facilities often self-reported supplement use that contradicted what was recorded in the clinical record, indicating the importance of improving the documentation of supplement use in medical records. Finally, the use of a nationwide electronic health record system allowed for the generalizability of our findings across different healthcare facilities.
Limitations: Despite the promising results of this study, a limitation we encountered was the variability in performance between different supplements when compared to patient self-report in surveys. In error-checking our process, we manually reviewed cases where the SVM prediction did not match the survey answers. This uncovered multiple cases where the patient reported that they were not taking a specific supplement; however, the supplement was recorded in their medication list. There was an apparent pattern indicating that patients did not regard something as a supplement if it was prescribed by a physician. This may be the underlying cause of the lower agreement of our model in cases such as folic acid and melatonin, both of which are commonly prescribed by a physician. Future studies could address the mental model patients have of what they consider to be a supplement. In addition, roughly half of the supplements we studied did not have the number of observations needed for reliable results to be measured when compared to surveys. A larger survey set, or a more directed survey, would be useful to address this. Another limitation is that we did not perform feature selection or parameter optimization before training the SVM, nor did we compare the SVM against other machine learning algorithms. Although SVMs implicitly weight features during training, better performance may be obtained from explicit feature selection and parameter tuning.
Another challenge we face in the study of dietary supplements is the lack of dosage, duration, and an exact start date for supplement use. While we were able to identify numerous mentions of the use of dietary supplements, the documentation of dosage, duration, and exact start date is inconsistent, if available at all. Unfortunately, this is a documentation problem that NLP cannot solve. With increasing awareness of the importance of holistic care for patients, we expect the documentation of dietary supplements to improve.
To further pursue this important topic, future work may include a larger study with more data and survey participants. Additionally, future work will explore other machine learning algorithms, including language models based on bidirectional encoder representations from transformers (BERT), to improve NLP performance.

Conclusions
In conclusion, we have demonstrated the ability to capture the use of dietary supplements using NLP techniques and found that the NLP results do not always agree with self-reported use in survey data. Our findings underscore the importance of supplement documentation and highlight the need for improved practices to ensure that clinicians have access to accurate and complete information about their patients' dietary supplement use. Furthermore, the widespread use of dietary supplements and the potential interaction with other medications highlight the importance of providing holistic care for patients, taking into consideration their lives outside of clinical encounters.

Informed Consent Statement:
Informed consent was obtained from all subjects according to the approved protocol.

Data Availability Statement:
The datasets generated and/or analyzed during the current study are not publicly available to protect the privacy of research participants, but aggregated datasets are available from the corresponding author upon reasonable request.

Conflicts of Interest:
The authors declare no conflict of interest.