Tardive dyskinesia among patients using antipsychotic medications in customary clinical care in the United States

Background Tardive dyskinesia (TD) is a movement disorder resulting from treatment with typical and atypical antipsychotics. An estimated 16–50% of patients treated with antipsychotics have TD, but this number may be underestimated. The objectives of this study were to build an algorithm for use in electronic health records (EHRs) for the detection and characterization of TD patients, and to estimate the prevalence of TD in a population of patients exposed to antipsychotic medications. Methods This retrospective observational study included patients identified in the Optum EHR Database who received a new or refill prescription for an antipsychotic medication between January 2011 and December 2015 (follow-up through June 2016). TD mentions were identified in the natural language–processed clinical notes, and an algorithm was built to classify the likelihood that each mention represented documentation of a TD diagnosis as probable, possible, unlikely, or negative. The final TD population comprised the subgroup identified by this algorithm as having ≥1 probable TD mention (highly likely TD). Results A total of 164,417 patients were identified for the antipsychotic population, of whom 1,314 comprised the final TD population. Conservatively, the estimated average annual prevalence of TD among patients receiving antipsychotics was 0.8% of the antipsychotic user population. With a more inclusive algorithm counting both probable and possible TD, the average annual prevalence may be as high as 1.9% of antipsychotic users per year. Most TD patients were prescribed atypical antipsychotics (1049/1314, 79.8%). Schizophrenia (601/1314, 45.7%) and paranoid and schizophrenia-like disorders (277/1314, 21.1%) were more prevalent in the TD population than in the entire antipsychotic drug cohort (13,308/164,417, 8.1%, and 19,359/164,417, 11.8%, respectively).
Conclusions Despite a lower TD prevalence than previously estimated and the predominant use of atypical antipsychotics, identified TD patients appear to have a substantial comorbidity burden that requires special treatment and management consideration.


Introduction
… inpatient and outpatient clinical notes of healthcare providers, as well as visit summaries, follow-up and referral letters, reports from imaging services, pathology investigations, surgical procedures, and other sources, are available. Optum used a generalized natural language processing (NLP) system to extract and organize concepts from free text into semi-structured data fields, along with pertinent sentiments (affirmations and negations) and other modifiers (severity, duration, and cause). Optum NLP-based algorithmic analyses of EHR data have previously provided clinical insights into hypoglycemia and binge-eating disorders [23][24][25][26][27]. In this study, descriptive characteristics of the underlying population of antipsychotic users and of the population of patients with TD were assessed using data from structured EHR fields and data parsed by NLP from unstructured clinical notes, including demographics, healthcare utilization, antipsychotic prescription types, underlying psychiatric comorbidities, and extrapyramidal symptoms. The average annual prevalence of TD among patients receiving antipsychotic medication was also estimated.

Ethics
The New England Institutional Review Board approved this project as an exempt retrospective study and determined that informed consent was not required.

Study participants
Patients were included if they received a prescription (new or refill) for an antipsychotic between January 1, 2011 and December 31, 2015. A list of antipsychotics used in the study is provided in S1 Table. Patients entered the antipsychotic drug cohort on the date of the first antipsychotic prescription in the accrual period at which all inclusion criteria were met (anchor date). To be included in this analysis, a patient's EHR had to contain clinical notes, year of birth, and sex. Patients also had to be at least 18 years old on the anchor date and to have, excluding the anchor date itself, at least one outpatient clinic visit with an evaluation and management code during the 12-month period before cohort entry (baseline period); at least one outpatient visit or hospitalization 12-24 months before the anchor date; and at least one outpatient prescription during the baseline period.
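The eligibility rules above can be sketched in Python as follows. This is a minimal illustration, not the study's code: the `Patient` record and its field names are hypothetical, 12 months is approximated as 365 days, and age is approximated from year of birth, the only birth information the criteria require.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Optional

ACCRUAL_START = date(2011, 1, 1)
ACCRUAL_END = date(2015, 12, 31)
YEAR = timedelta(days=365)  # assumption: "12 months" approximated as 365 days


@dataclass
class Patient:
    """Hypothetical, simplified EHR extract for one patient."""
    birth_year: Optional[int]
    sex: Optional[str]
    has_clinical_notes: bool
    antipsychotic_rx: List[date] = field(default_factory=list)      # antipsychotic orders
    em_outpatient_visits: List[date] = field(default_factory=list)  # outpatient visits with an E&M code
    other_encounters: List[date] = field(default_factory=list)      # other outpatient visits or hospitalizations
    outpatient_rx: List[date] = field(default_factory=list)         # any outpatient prescription


def any_in(dates: List[date], start: date, end: date) -> bool:
    """True if any date falls in the half-open window [start, end)."""
    return any(start <= d < end for d in dates)


def is_eligible(p: Patient, anchor: date) -> bool:
    """Apply the inclusion criteria at a candidate anchor date."""
    if not p.has_clinical_notes or p.birth_year is None or p.sex is None:
        return False
    if anchor.year - p.birth_year < 18:  # approximate age from year of birth
        return False
    baseline_start = anchor - YEAR
    # Baseline window [anchor - 12 months, anchor) excludes the anchor date itself.
    if not any_in(p.em_outpatient_visits, baseline_start, anchor):
        return False
    if not any_in(p.outpatient_rx, baseline_start, anchor):
        return False
    # Any outpatient visit or hospitalization 12-24 months before the anchor date.
    all_encounters = p.em_outpatient_visits + p.other_encounters
    return any_in(all_encounters, anchor - 2 * YEAR, baseline_start)


def find_anchor_date(p: Patient) -> Optional[date]:
    """First antipsychotic prescription in the accrual period at which all criteria are met."""
    for rx in sorted(p.antipsychotic_rx):
        if ACCRUAL_START <= rx <= ACCRUAL_END and is_eligible(p, rx):
            return rx
    return None
```

For example, a patient with an E&M visit and an outpatient prescription in the 12 months before a June 2012 antipsychotic order, plus an encounter in March 2011, would enter the cohort on that June 2012 date under these assumptions.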

Antipsychotic drug cohort
Both prevalent and new (treatment-naïve) users of antipsychotics were eligible to enter the study cohort. A prevalent antipsychotic drug user was defined as a patient with a prescription for an antipsychotic and evidence of previous prescription(s) for any antipsychotic drug (same or different) in the previous 12 months. A new antipsychotic drug user was defined as a patient with a prescription for an antipsychotic drug on the anchor date and no evidence of a prescription for any antipsychotic drug in the previous 12 months.
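A minimal Python sketch of the new-versus-prevalent rule, assuming the patient's antipsychotic prescription dates are available as a list and approximating 12 months as 365 days; `user_type` is a hypothetical helper, not code from the study.

```python
from datetime import date, timedelta
from typing import List

LOOKBACK = timedelta(days=365)  # assumption: "previous 12 months" approximated as 365 days


def user_type(anchor: date, antipsychotic_rx: List[date]) -> str:
    """Return 'prevalent' or 'new' for the antipsychotic prescription on `anchor`.

    Any antipsychotic prescription (same or different drug) in the 12 months
    before the anchor date makes the patient a prevalent user; otherwise the
    patient is a new (treatment-naive) user.
    """
    has_prior = any(anchor - LOOKBACK <= d < anchor for d in antipsychotic_rx)
    return "prevalent" if has_prior else "new"
```

A prescription older than the lookback window does not count, so a patient last treated several years earlier is classified as a new user.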

TD population
Because there is no diagnostic code for TD, a subgroup of patients likely to have a diagnosis of TD was identified using an algorithm based on mentions of TD in free-text clinical notes. The final algorithm focused on the single term "tardive dyskinesia" as abstracted in the NLP clinical notes, documented as a sign, disease, or symptom in any section of the note. Attributes (sentiments or other modifiers) that provided context to each TD mention were reviewed and used to classify the mention. TD mentions with sentiments indicating an affirmation (have, has, exhibit) were categorized as probable, and those with clear negations (express not, deny, free) as negative; less clear mentions fell into two categories, possible (seem, develop) or unlikely (concern, consider, describe). Supplementary modifiers were also reviewed: modifiers that support a TD diagnosis (chronic, severe, longstanding, medication-induced, and controlled) or reference an affected body part (face, jaw, or lower extremity) could shift a TD mention initially classified as possible to probable. In addition, contextual information about medication orders was abstracted from the clinical notes, and a mention listing TD as a reason for treatment was categorized as probable.
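The mention-level rules above can be sketched as a small rule-based classifier. This is an illustration under stated assumptions, not the study's implementation: the `Mention` structure is hypothetical, the term sets contain only the example words quoted in the text (with "express not" read as an explicit "not"), and the precedence applied when several sentiment terms co-occur (affirmation, then negation, then possible, then unlikely) is assumed, since the text does not specify it.

```python
from dataclasses import dataclass, field
from typing import Set

# Example terms quoted in the text; the study's full lexicons are not given.
AFFIRM = {"have", "has", "exhibit"}
NEGATE = {"not", "deny", "free"}  # "express not" read here as an explicit "not"
POSSIBLE = {"seem", "develop"}
UNLIKELY = {"concern", "consider", "describe"}
SUPPORTING = {"chronic", "severe", "longstanding", "medication-induced", "controlled"}
BODY_PARTS = {"face", "jaw", "lower extremity"}


@dataclass
class Mention:
    """One NLP-extracted 'tardive dyskinesia' mention (hypothetical structure)."""
    sentiments: Set[str] = field(default_factory=set)
    modifiers: Set[str] = field(default_factory=set)
    is_medication_reason: bool = False  # TD recorded as the reason for a medication order


def classify_mention(m: Mention) -> str:
    """Map one mention to probable / possible / unlikely / negative."""
    # Assumed precedence when several sentiment terms co-occur: affirmation first.
    if m.is_medication_reason or m.sentiments & AFFIRM:
        return "probable"
    if m.sentiments & NEGATE:
        return "negative"
    if m.sentiments & POSSIBLE:
        # Supporting modifiers or an affected body part upgrade a possible mention.
        if m.modifiers & (SUPPORTING | BODY_PARTS):
            return "probable"
        return "possible"
    if m.sentiments & UNLIKELY:
        return "unlikely"
    return "unclassified"  # assumption: handling of unmatched mentions is not described
```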
Patients were then classified hierarchically according to their most likely TD mention: patients with at least one probable TD mention were classified as 'highly likely TD'; otherwise, patients with at least one possible TD mention were classified as 'possibly likely TD'; patients whose most likely mention was unlikely were classified as 'ambiguous unlikely TD'; and patients with only negative TD mentions were classified as 'unlikely TD'. Using structured fields, we excluded patients with a diagnosis of Parkinson's disease (International Classification of Diseases [ICD]-9: 332.0x, ICD-10: G20.xx) or secondary parkinsonism (ICD-9: 332.1, ICD-10: G21.xx). The final algorithm restricted the TD population to patients classified as highly likely TD. For each case, the date of the earliest probable TD mention during the study period was considered the anchor date for that case. Prior TD cases were patients whose first probable TD mention fell in the baseline period, and new TD cases were patients whose first probable TD mention fell during follow-up.
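The patient-level hierarchy, the parkinsonism exclusion, and the prior/new case split can be sketched as follows. Assumptions, all labeled in the code: ICD codes are stored as dotted strings matched by prefix, the 12-month baseline is approximated as 365 days, the study period is taken as the baseline plus follow-up, and a first probable mention on the cohort anchor date itself counts as follow-up. `classify_patient` and `td_case_type` are hypothetical helpers that take the per-mention classes as input.

```python
from datetime import date, timedelta
from typing import List, Optional

# Ordered from most to least likely; a patient takes the label of the first class present.
HIERARCHY = [
    ("probable", "highly likely TD"),
    ("possible", "possibly likely TD"),
    ("unlikely", "ambiguous unlikely TD"),
    ("negative", "unlikely TD"),
]
# Parkinson's disease / secondary parkinsonism codes from the text.
# Assumption: codes are stored as strings with a decimal point (e.g., "332.0").
PARKINSONISM_PREFIXES = ("332.0", "332.1", "G20", "G21")
BASELINE = timedelta(days=365)  # assumption: 12-month baseline approximated as 365 days


def classify_patient(mention_classes: List[str], icd_codes: List[str]) -> Optional[str]:
    """Patient-level label from per-mention classes; None if excluded or no TD mention."""
    if any(code.startswith(PARKINSONISM_PREFIXES) for code in icd_codes):
        return None
    for mention_class, label in HIERARCHY:
        if mention_class in mention_classes:
            return label
    return None


def td_case_type(probable_dates: List[date], cohort_anchor: date) -> Optional[str]:
    """'prior' if the first probable mention is in the baseline period, 'new' if in follow-up."""
    # Assumption: the study period starts at the beginning of the baseline window.
    in_study = [d for d in probable_dates if d >= cohort_anchor - BASELINE]
    if not in_study:
        return None
    first = min(in_study)
    # Assumption: a first mention on the cohort anchor date counts as follow-up.
    return "prior" if first < cohort_anchor else "new"
```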

Statistical analysis
The annual prevalence of TD was assessed among the underlying population of antipsychotic users available in each year of follow-up (2011-2016). To be included in the underlying population (denominator) for a specific year, a patient had to have met study eligibility criteria before or during that year and to have an observed outpatient medical encounter in the Optum EHR database during that year or the year before. Requiring an outpatient visit within 1 year of the prevalence year provides some confidence that the patient continued to receive care documented in the Optum EHR database. Both prior and new TD cases were counted in the numerator: in a given year, a patient was counted as a case if they were classified as a TD case during or before that year and were included in that year's denominator. Prevalence was calculated in two ways: 1) using a more restrictive definition that included highly likely TD cases only, and 2) using a more inclusive definition that included both highly likely and possibly likely TD cases. For each definition, a summary estimate, the average annual prevalence, is reported.
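The denominator and numerator rules above can be expressed compactly. The sketch below assumes a hypothetical per-patient summary (`PatientYears`) holding the year eligibility was first met, the years with an observed outpatient encounter, and the year the patient first qualified as a TD case; running it once with case years from the restrictive definition and once with case years from the inclusive definition would give the two prevalence series. The text does not state how the annual values are combined, so `average_annual_prevalence` assumes an unweighted mean over the chosen years.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Iterable, List, Optional, Set


@dataclass
class PatientYears:
    """Hypothetical per-patient summary used for prevalence estimation."""
    eligible_year: int                                       # year eligibility criteria were first met
    outpatient_years: Set[int] = field(default_factory=set)  # years with an observed outpatient encounter
    td_year: Optional[int] = None                            # year first classified as a TD case, if ever


def annual_prevalence(patients: List[PatientYears], year: int) -> float:
    """TD cases per 1000 antipsychotic users in the denominator for `year`."""
    denom = [
        p for p in patients
        if p.eligible_year <= year
        and (year in p.outpatient_years or year - 1 in p.outpatient_years)
    ]
    if not denom:
        return float("nan")
    # Prior and new cases both count if the case date is during or before `year`.
    cases = sum(1 for p in denom if p.td_year is not None and p.td_year <= year)
    return 1000 * cases / len(denom)


def average_annual_prevalence(patients: List[PatientYears], years: Iterable[int]) -> float:
    # Assumption: unweighted mean of the annual estimates.
    return mean(annual_prevalence(patients, y) for y in years)
```

Note that a patient without an outpatient encounter in the prevalence year or the year before drops out of both the numerator and the denominator for that year.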
Baseline covariates of antipsychotic medication users and of the subpopulation of patients with highly likely TD were derived from structured fields in the EHR data during the 12-month baseline period. Characteristics included demographic and lifestyle characteristics, antipsychotic exposure, presence of a movement disorder, underlying comorbidities of interest, and mental health conditions. Prescription orders for antipsychotic medications were identified using National Drug Codes. Antipsychotic medications received on the anchor date were classified as either typical or atypical. Movement disorders, comorbidities, and mental health conditions were identified using ICD-9-CM and ICD-10-CM diagnostic codes, as well as procedure codes and NLP data when appropriate. Descriptive analyses were performed to characterize the population taking antipsychotic medications and the subpopulation of TD patients.

Study population
A total of 164,417 patients were identified as having prescriptions for antipsychotics and ≥1 medical visit or hospitalization 12-24 months before the anchor date; in this population, 6,294 (3.8%) patients had a mention of TD in their NLP clinical notes, and 1,314 (0.8%) comprised the final TD population (Fig 1).

Estimated TD prevalence
Based on highly likely TD cases, the annual prevalence of TD in patients receiving antipsychotics ranged from 7.6 to 9.7 per 1000 patients during the study period. The average annual prevalence estimate of probable TD was 7.8 per 1000 patients during the interim years (0.8% of the antipsychotic user population per year). Counting both highly likely and possibly likely TD cases, the annual prevalence of probable or possible TD among patients receiving antipsychotics ranged from 18.0 to 20.5 per 1000 patients during the study period. The average annual prevalence estimate of probable or possible TD was 18.8 per 1000 patients during the interim years (1.9% of the antipsychotic user population per year).

Study population characteristics
As in the overall study population, most of the TD population were 50 years of age or older (952/1314; 72.4%), female (800/1314; 60.9%), and white (881/1314; 67.1%) (Table 2). Although patients could have codes for more than one mental health condition, approximately half of the TD population had evidence of a neurotic/anxiety disorder (693/1314; 52.7%) or a mood disorder (661/1314; 50.3%) (Table 2). Few patients in either the antipsychotic drug cohort or the TD population had evidence of alcohol use, abuse, or dependence, but most had a history of smoking, and about one-quarter of the antipsychotic drug cohort and one-third of the TD population had evidence of drug dependence (Tables 1 and 2).
More than two-thirds of the TD population had a structured code for obesity (929/1314; 70.7%). Fewer patients had evidence of a diabetes diagnosis (390/1314; 29.7%) or dyslipidemia (589/1314; 44.8%). Obesity, diabetes, and dyslipidemia were each more prevalent in the TD population than in the entire antipsychotic drug cohort (Fig 2).

Movement disorders
The TD population had a greater baseline occurrence of movement disorders than the antipsychotic population (Tables 1 and 2; Fig 3).

Healthcare utilization
A majority of the TD population had 3 or more outpatient visits during the baseline period (1197/1314; 91.1%) compared with 86.2% (141,801/164,417) in the antipsychotic population (Tables 1 and 2). About one-half of the TD population had at least one visit to the emergency

Discussion
In this retrospective, descriptive analysis, we used natural language processing to extract information from clinical notes and built an algorithm to identify probable cases of TD. This information was used to evaluate the clinical characteristics of prevalent and new antipsychotic users, including those deemed highly likely to have a TD diagnosis. Consistent with previous studies, we found that in clinical practice most patients in the TD population received atypical rather than typical antipsychotics [14,21,22]. Despite the use of these newer antipsychotics, the burden of TD within this population persisted. Depending on the stringency of our TD definition, we estimated a TD prevalence of 0.8%-1.9% among people taking antipsychotics, much lower than the previously published prevalence of 16-50% [3,11,12]. However, previous prevalence estimates of TD may have been affected by study populations that are not representative of the US population, or by studies that were conducted before atypical antipsychotics were widely used. For example, one previous study estimated a TD prevalence of 31.5% among people taking antipsychotics, but it was conducted at a single health center with a study population of 619 people [12]. In contrast, the prevalence estimated in the current study is based on patient data from a large, geographically diverse set of medical practices in the US, in which over 160,000 antipsychotic users were identified. This lower estimate may reflect a true reduction in TD with the rise in use of atypical antipsychotics, under-documentation of TD cases in clinical notes, or misclassification or incomplete capture of TD cases by the current algorithm. Patients in the TD population had a numerically higher psychiatric comorbidity burden than all patients treated with antipsychotics.
Because the algorithm favors specificity over inclusiveness when identifying a TD diagnosis in the EHR database, the comparison between the highly likely TD group and the antipsychotic drug (APD) user group may have omitted a segment of the TD patient population. Furthermore, the large number of health records analyzed and the large absolute differences observed in these burden measures obviate the need for formal statistical comparison of these groups. These qualitative comparisons therefore support the case for a real-world burden of TD in the clinical setting and demonstrate a novel means of extracting burden information from existing clinical data resources.
The TD population had a relatively higher prevalence of schizophrenia than the antipsychotic population (absolute difference, 37.6 percentage points). Other psychiatric comorbidities also appeared at higher rates in the TD population than in the antipsychotic population, including paranoid and schizophrenia-like disorders (absolute difference, 9.3 percentage points), drug dependence (8.2), neurotic/anxiety disorder (7.8), personality disorder (4.8), mood disorder (4.3), and alcohol dependence (2.4). Our data also suggest that the TD population had a greater non-psychiatric comorbidity burden than the general antipsychotic population, including diabetes (absolute difference, 8.5 percentage points), dyslipidemia (7.1), obesity (6.1), alcoholism (3.9), fractures (2.3), falls (1.0), and dysphagia (0.8). Notably, the TD population had higher baseline occurrences of movement disorders, such as extrapyramidal movement disorder (absolute difference, 22.8 percentage points), drug-induced subacute dyskinesia (16.6), and orofacial dyskinesia (8.4). In addition to this increased health burden, the TD population experienced a numerically larger healthcare burden than the general antipsychotic population, with more outpatient, inpatient, and emergency department visits. Future studies using an algorithm optimized for sensitivity will enable rigorous statistical evaluation of these trends. The reasons for the increased rates of comorbidities and healthcare utilization should also be explored in future studies.
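The absolute differences above are simple differences between two proportions, expressed in percentage points. As an illustrative check, the sketch below recomputes the two differences that can be derived from counts reported in the abstract (schizophrenia, and paranoid and schizophrenia-like disorders); `pct` and `abs_diff_pp` are hypothetical helper names.

```python
from typing import Tuple


def pct(numerator: int, denominator: int) -> float:
    """Proportion expressed as a percentage."""
    return 100 * numerator / denominator


def abs_diff_pp(td: Tuple[int, int], cohort: Tuple[int, int]) -> float:
    """Absolute difference, in percentage points, between two (numerator, denominator) pairs."""
    return pct(*td) - pct(*cohort)


# Counts from the abstract: TD population (n = 1,314) vs antipsychotic cohort (n = 164,417).
schizophrenia = abs_diff_pp((601, 1314), (13_308, 164_417))
paranoid = abs_diff_pp((277, 1314), (19_359, 164_417))
print(f"{schizophrenia:.1f} {paranoid:.1f}")  # prints "37.6 9.3"
```

Both values match the differences reported in the text, which supports reading them as percentage-point rather than relative differences.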
This study was based on an analysis of EHR data. While EHR data are valuable for examination of clinical health care outcomes and treatment patterns, all EHR databases have certain inherent limitations because the data are collected for the purpose of clinical patient management, not research. The Optum EHR is not a closed system. Some patients may only receive a fraction of their care through a healthcare provider captured in the database, and medical encounters outside of the networks contributing data to the EHR will not be observed. In this study, the antipsychotic user cohort was restricted to patients who had clinical notes as part of their EHR; however, clinical notes may be incomplete if the patient sought care outside of the healthcare provider networks contributing to the Optum EHR database. To mitigate the potential for missing data, eligibility requirements were implemented to restrict the study population to patients with evidence of routine care within the contributing EHR systems.
For cohort entry, we required that the patient have a recorded encounter in the EHR database 12-24 months before the anchor date and at least one encounter with an outpatient evaluation and management procedure code during the 12-month baseline period before the anchor date. Patients meeting these criteria are likely to have reasonably complete capture of medical encounters. In addition, we restricted the denominators for prevalence estimates to periods when there was evidence that patients were receiving continuous care. This study relies on information extracted from free-text clinical notes using a generalized NLP approach to identify and classify cases of TD; therefore, misclassification of TD diagnoses is possible, and clinical notes may include mentions of TD that do not indicate its presence. Opting for specificity over inclusiveness, the final algorithm included only patients with strongly affirmed TD mentions (highly likely TD), including mentions initially classified as possible that had supporting or affirming modifiers. The algorithm could subsequently be benchmarked against a set of patient records individually categorized as TD or non-TD and adjudicated by physician review, and refined if needed.
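The benchmarking step proposed above could be scored with standard confusion-matrix metrics. A minimal sketch, assuming each reviewed patient has a binary algorithm label and a binary physician-adjudicated label; `benchmark` is a hypothetical helper, not part of the study. Because the algorithm favors specificity, one would expect a high positive predictive value (PPV) and possibly lower sensitivity; an adjudicated sample would quantify both.

```python
from typing import Dict, List


def benchmark(predicted: List[bool], adjudicated: List[bool]) -> Dict[str, float]:
    """Compare algorithm labels (True = TD) with physician-adjudicated reference labels."""
    if len(predicted) != len(adjudicated):
        raise ValueError("predicted and adjudicated must have the same length")
    pairs = list(zip(predicted, adjudicated))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if a and not p)
    tn = sum(1 for p, a in pairs if not p and not a)

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return {
        "ppv": ratio(tp, tp + fp),          # share of algorithm-positive patients confirmed as TD
        "sensitivity": ratio(tp, tp + fn),  # share of adjudicated TD cases the algorithm found
        "specificity": ratio(tn, tn + fp),  # share of adjudicated non-TD patients correctly excluded
    }
```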
Despite these limitations, this study is one of the largest epidemiological studies of TD to date. Using an algorithm to extract information from EHR clinical notes, we found a lower estimated TD prevalence than previously published estimates. Although further work is needed to confirm these findings, understanding the characteristics of the TD patient population, specifically its substantial comorbidity and healthcare burden, can inform healthcare providers responsible for the treatment and management of TD patients.

Supporting information
S1 Table. Antipsychotic medications identified for use in the study. Prescription orders for antipsychotic medications were identified using National Drug Codes in structured EHR data fields and classified by type (typical versus atypical). (DOCX)