Coding of Childhood Psychiatric and Neurodevelopmental Disorders in Electronic Health Records of a Large Integrated Health Care System: Validation Study

Background: Mental, emotional, and behavioral disorders are chronic pediatric conditions, and their prevalence has been on the rise over recent decades. Affected children have long-term health sequelae and a decline in health-related quality of life. Due to the lack of a validated database for pharmacoepidemiological research on selected mental, emotional, and behavioral disorders, there is uncertainty in their reported prevalence in the literature. Objectives: We aimed to evaluate the accuracy of coding related to pediatric mental, emotional, and behavioral disorders in a large integrated health care system’s electronic health records (EHRs) and compare the coding quality before and after the implementation of the International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) coding as well as before and after the COVID-19 pandemic. Methods: Medical records of 1200 member children aged 2-17 years with at least 1 clinical visit before the COVID-19 pandemic (January 1, 2012, to December 31, 2014, the ICD-9-CM coding period; and January 1, 2017, to December 31, 2019, the ICD-10-CM coding period) and after the COVID-19 pandemic (January 1, 2021, to December 31, 2022) were selected with stratified random sampling from EHRs for chart review. Two trained research associates reviewed the EHRs for all potential cases of autism spectrum disorder (ASD), attention-deficit hyperactivity disorder (ADHD), major depressive disorder (MDD), anxiety disorder (AD), and disruptive behavior disorders (DBD) in children during the study period. Children were considered cases only if there was a mention of any one of the conditions (yes for diagnosis) in the electronic chart during the corresponding time period.
The validity of diagnosis codes was evaluated by directly comparing them with the gold standard of chart abstraction using sensitivity, specificity, positive predictive value, negative predictive value, and the summary statistics of the F-score and Youden J statistic. The κ statistic for interrater reliability between the 2 abstractors was calculated.


Introduction
Children and adolescents are particularly vulnerable to chronic mental and behavioral conditions because their brains continue to develop. Childhood mental and behavioral disorders, including autism spectrum disorder (ASD), attention-deficit hyperactivity disorder (ADHD), disruptive behavior disorders (DBD), anxiety disorder (AD), and major depressive disorder (MDD), are common neurological disorders and have been on the rise in recent decades [1][2][3][4]. Affected children and adolescents are subjected to long-term negative health and social consequences [5,6], leading to significant health care costs and public health burden [7,8]. Therefore, accurately estimating their incidence and prevalence is important to guide policy-making, resource allocation, and implementation of different intervention programs.
Trends in mental and behavioral disorders are difficult to examine using routinely collected data and are often difficult to compare across studies because of differences in case ascertainment methods. Therefore, reported incidence and prevalence rates vary widely across studies [9], and the accuracy of case ascertainment has been challenged by researchers [10,11]. This problem is further complicated by questions about the accuracy of coding after the mandatory introduction of the International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) coding system to classify diagnoses and procedures in the United States, which occurred on October 1, 2015 [12]. The ICD-10-CM provides increased specificity and detail for many health conditions [12], including childhood psychiatric and neurodevelopmental conditions. Many health care providers adopted electronic health records (EHRs) after the Health Information Technology for Economic and Clinical Health Act (HITECH Act) was enacted in 2009 [13,14].
The Kaiser Permanente Southern California (KPSC) integrated EHR data provide researchers with important health information to perform pharmacoepidemiological studies of children's behavioral, medical, and psychiatric conditions, including examining the public health impact of childhood psychiatric and neurodevelopmental conditions; however, there is uncertainty over the accuracy of the clinical diagnosis codes, which require validation before their use. Therefore, the objective of this study was to perform an EHR review on the validity of diagnosis codes during 3 time periods (ICD-9-CM and ICD-10-CM coding and before and after the COVID-19 pandemic) for ascertaining psychiatric and neurodevelopmental conditions in a large socioeconomically diverse pediatric population aged 2-17 years [15]. The validation for the pre- and post-COVID-19 pandemic periods is important to assess how much the diagnosis coding has been affected by the increasing use of virtual visits (telephone and video-assisted encounters).

Study Setting
This study was conducted using data on member children extracted from the KPSC EHR. The KPSC health care system provides services to over 4.8 million members in 15 hospitals and 234 medical offices throughout southern California. Mental health services are provided to member patients by qualified providers at in- and outpatient psychiatric care facilities in KPSC settings. Although most members receive their care at KPSC hospitals and <10% use contracting hospitals, all diagnostic, procedural, and pharmacy records have been captured and maintained in the KPSC EHR since its full implementation in 2008. The sociodemographic characteristics of KPSC members closely reflect the California population [15].

Ethical Considerations
This study was conducted with approval by the KPSC Institutional Review Board (IRB# 13114). Informed consent was waived, as the study was low risk and strictly involved the use of internal EHR data, accessible only to authorized personnel when needed.

Study Design and Sample Selection
Data for this validation study were obtained retrospectively from children who were members of the KPSC health care system during three distinct time periods: (1) January 1, 2012, through December 31, 2014; (2) January 1, 2017, through December 31, 2019; and (3) January 1, 2021, through December 31, 2022. We carefully selected the 3 time periods to investigate the medical coding accuracy of selected psychiatric and neurodevelopmental conditions encompassing both the ICD-9-CM and ICD-10-CM periods as well as the pre- and post-COVID-19 pandemic eras. To be included in the validation study cohort for each time period, children must have been enrolled in the KPSC health care system for at least 1 year during the corresponding time period at specific age ranges varying by condition (5-17 years of age for ADHD, 2-17 years of age for ASD, and 3-17 years of age for the other 3 conditions: AD, MDD, and DBD). Children may present with signs and symptoms of the specified conditions very early in life; however, for this study, we used reported and reliable lower age groupings for ASD and ADHD diagnoses [16,17] and widely published age groupings for anxiety and depressive disorders [18,19]. Furthermore, at least 1 clinical visit, including virtual visits, in each corresponding time period was required.
In the KPSC system, diagnosis codes for clinical visits (eg, hospitalization, outpatient office visits, and emergency department visits) from all KPSC facilities are extracted from EHRs and entered into a structured database by professional medical coders from the clinical data management team. In this validation study, coding-based outcomes of interest were ascertained using these clinical diagnosis codes within the 3 specified time periods (Multimedia Appendix 1 presents the ICD-9-CM and ICD-10-CM codes).
For each of the investigated conditions, we randomly sampled 40 cases according to the following strata: (1) those without a diagnosis (No-Dx) and (2) those with a diagnosis (Dx). Thus, records for a total of 1200 individual children were selected. The accuracy for each of the 2 strata, with or without documented diagnostic records, was expected to be around 85%. A sample size of 120 per diagnostic condition would provide less than a 5% one-sided margin for a 90% CI of the accuracy for each stratum.
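The inclusion criteria above (condition-specific age range, at least 1 year of enrollment within the period, and at least 1 clinical visit within the period) can be sketched as a simple eligibility check. This is an illustration only, not the study's extraction logic: the `Child` record, the period labels, and the use of a single precomputed age are hypothetical simplifications, and the source does not say at which date age was assessed.

```python
from dataclasses import dataclass, field
from datetime import date

# Minimum eligible age by condition, as stated in the text; the upper bound is 17.
MIN_AGE = {"ADHD": 5, "ASD": 2, "AD": 3, "MDD": 3, "DBD": 3}
MAX_AGE = 17

# The 3 study periods from the text; the dictionary keys are hypothetical labels.
PERIODS = {
    "2012-2014": (date(2012, 1, 1), date(2014, 12, 31)),
    "2017-2019": (date(2017, 1, 1), date(2019, 12, 31)),
    "2021-2022": (date(2021, 1, 1), date(2022, 12, 31)),
}


@dataclass
class Child:
    """Hypothetical per-child record; not the KPSC data model."""
    age_years: int            # assumed to be computed upstream for the period
    enrollment_start: date
    enrollment_end: date
    visit_dates: list = field(default_factory=list)


def enrolled_days_in_period(child: Child, period: str) -> int:
    """Number of enrolled days that fall inside the period (inclusive)."""
    start, end = PERIODS[period]
    overlap = min(child.enrollment_end, end) - max(child.enrollment_start, start)
    return max(0, overlap.days + 1)


def is_eligible(child: Child, condition: str, period: str) -> bool:
    """Apply the age, enrollment, and visit criteria for one condition and period."""
    start, end = PERIODS[period]
    age_ok = MIN_AGE[condition] <= child.age_years <= MAX_AGE
    enrolled_ok = enrolled_days_in_period(child, period) >= 365
    visit_ok = any(start <= v <= end for v in child.visit_dates)
    return age_ok and enrolled_ok and visit_ok


example = Child(4, date(2013, 6, 1), date(2016, 1, 1), [date(2014, 3, 15)])
print(is_eligible(example, "ASD", "2012-2014"))   # meets all three criteria
print(is_eligible(example, "ADHD", "2012-2014"))  # below the ADHD minimum age
```

Keeping the age thresholds in one lookup table makes the condition-specific rule explicit and easy to audit against the text.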
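The sample-size reasoning can be checked with a short calculation. The sketch below rests on assumptions that the text does not spell out: that 40 charts were drawn per condition, stratum, and time period (which reproduces the stated total of 1200 and gives 120 charts per stratum within a condition), and that the margin comes from a normal approximation to the binomial. The function name `margin` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

CONDITIONS = 5        # ADHD, ASD, MDD, AD, DBD
STRATA = 2            # Dx and No-Dx
PERIODS = 3           # the 3 study periods
CHARTS_PER_CELL = 40  # assumed: per condition, stratum, and period

total_charts = CONDITIONS * STRATA * PERIODS * CHARTS_PER_CELL
charts_per_stratum = PERIODS * CHARTS_PER_CELL  # within one condition


def margin(p: float, n: int, confidence: float, one_sided: bool) -> float:
    """Normal-approximation margin of error for a proportion p with n charts."""
    tail = (1 - confidence) if one_sided else (1 - confidence) / 2
    z = NormalDist().inv_cdf(1 - tail)
    return z * sqrt(p * (1 - p) / n)


one_sided_margin = margin(0.85, charts_per_stratum, 0.90, one_sided=True)
two_sided_margin = margin(0.85, charts_per_stratum, 0.90, one_sided=False)

print(total_charts, charts_per_stratum)
print(f"one-sided 90% margin: {one_sided_margin:.3f}")
print(f"two-sided 90% half-width: {two_sided_margin:.3f}")
```

Under these assumptions, the one-sided margin is about 4.2%, consistent with the "less than 5%" stated in the text, whereas the half-width of a two-sided 90% interval would be about 5.4%.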

Diagnosing ASD, ADHD, DBD, and MDD
The KPSC system has an integrated framework for in- and outpatient as well as emergency department encounter services. During the child's visit to any of these facilities, the practitioner has access to the child's diagnoses, but the diagnosis of ASD, ADHD, and DBD is often made in an outpatient setting. The following criteria were used to diagnose and code ASD, ADHD, and DBD within the KPSC setting: (1) a Child Behavior Checklist must be filled out by parents and teachers to describe the child's behavioral and emotional problems and (2) a clinical interview must be performed by a qualified mental health professional. In a preliminary study conducted for this project, 96% of children with ASD, ADHD, and DBD were found to have had their conditions diagnosed by KPSC child and adolescent psychiatrists, developmental and behavioral pediatricians, child psychologists, and neurologists consistent with the diagnostic criteria from the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-V). Diagnosis of the remaining 4% was confirmed upon membership, as these cases had been previously diagnosed outside the KPSC system [20,21].
In the KPSC system, the diagnosis of depressive disorder was based on the US Preventive Services Task Force recommendation on screening for depression in children and adolescents [22]. The KPSC system uses the Patient Health Questionnaire (PHQ-9) and the PHQ-9 modified for Adolescents (PHQ-A). If a patient's score on the PHQ-9 or PHQ-A does not seem to accurately reflect observed clinical symptoms, DSM-V criteria are recommended for diagnosis.
To ascertain the neurodevelopmental conditions investigated in this study, we relied on systemwide clinical diagnoses made by experts in the field as mentioned above.

Chart Abstraction Process
Trained research associates (abstractors) reviewed EHRs for documentation of a diagnosis (yes/no) of each condition under investigation for the selected sample of children during the study period. Children were considered to have the disorder in the presence of documented evidence of that condition noted in the chart during the corresponding time period of investigation. To ensure data quality and consistency of chart reviews between the 2 abstractors, a total of 180 cases, stratified by the 2 strata (Dx and No-Dx), were randomly selected for reabstraction (90 per abstractor and 36 per condition). There were 4 possible responses for each chart-reviewed case, as follows: diagnosis "Yes," diagnosis "No," "Unable to ascertain due to blocked notes," and "Unable to ascertain due to insufficient notes." The abstractors based their responses on the information they were able to ascertain from the clinical notes. For example, due to the sensitive nature of psychiatric diagnoses, some progress notes may have been blocked; therefore, if the abstractors could not ascertain the diagnosis due to blocked notes, it was coded as such. Similarly, if the abstractors could not ascertain a diagnosis due to a lack of documentation or notes in the KPSC system with underused care (eg, outside claims data), the record was coded as "Unable to ascertain due to insufficient notes."
The results of this assessment informed our full chart review process of the 1200 medical records as mentioned above. In other words, we excluded those claims data records from the random selection of the validation sample. Furthermore, during chart review, records in which the abstractors encountered blocked notes and were therefore unable to make a determination were flagged. These flagged records (n=38, 3%) were replaced with another randomly selected record for chart review from the same stratum. Potential cases that were still unclear in the clinical use records were adjudicated by the study investigator with expertise in the field (DG). The child psychiatric and neurodevelopmental condition cases abstracted through this process served as the gold standard.
We obtained the characteristics of all children residing in the state of California during the same time periods using publicly available data posted on the Centers for Disease Control and Prevention WONDER website [3]. Both the KPSC EHR and the Centers for Disease Control and Prevention (WONDER) sources provided information on child characteristics, including age and race/ethnicity. Data on median household income were estimated based on census tracts for KPSC patients.

Statistical Analysis
We described the characteristics of our study population between those with and without the conditions of interest. Furthermore, we investigated how representative the KPSC pediatric population is compared to the state of California pediatric population during the entire study period using frequency distributions. For the purpose of comparison with the California pediatric population, the age of the children for the KPSC population was evaluated based on the date of randomly selected clinical visits for each child during the entire study period.
The agreement between the 2 abstractors was evaluated with the interrater reliability assessment (κ statistic) by using the initial 180 chart reviews. We compared findings from the manual chart review of the 1200 cases, set as the gold standard, with corresponding diagnosis records for the ICD-9-CM and ICD-10-CM coding periods as well as before and after the COVID-19 pandemic through sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). These performance measurements were reported as weighted percentages with corresponding 95% CIs using normalized sampling weights (W), as defined below:

$$W_{ij} = \frac{\#\,\text{in stratum } j}{120 \times \#\,\text{total study population } i}, \quad i = 1 \text{ to } 5, \; j = 1, 2$$

where i is for each of the 5 conditions and j is for the corresponding two strata (Dx and No-Dx) within each condition. To evaluate the overall performance, we also reported the summary statistics of F-score and Youden J statistic, which are composite measurements of sensitivity and PPV (F-score) or sensitivity and specificity (J statistic).
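To make the weighting concrete, the following sketch applies the normalized sampling weights to chart review counts from the two strata of one condition and derives the weighted measures. It is an illustration under stated assumptions, not the authors' SAS code: the `Stratum` class, the function name, and all population and chart counts are hypothetical, the F-score is taken to be the usual harmonic mean of sensitivity and PPV, and CIs are omitted.

```python
from dataclasses import dataclass

SAMPLE_PER_STRATUM = 120  # charts reviewed per stratum, as in the weight formula


@dataclass
class Stratum:
    """Hypothetical container for one stratum (Dx or No-Dx) of one condition."""
    population: int      # children in this stratum in the full study population
    chart_positive: int  # sampled charts where abstraction confirmed the condition
    chart_negative: int  # sampled charts where abstraction did not confirm it

    def weight(self, total_population: int) -> float:
        # W = (# in stratum) / (120 x # total study population)
        return self.population / (SAMPLE_PER_STRATUM * total_population)


def weighted_metrics(dx: Stratum, no_dx: Stratum) -> dict:
    """Weighted accuracy measures; assumes every denominator is nonzero."""
    total = dx.population + no_dx.population
    w_dx, w_no = dx.weight(total), no_dx.weight(total)
    tp = w_dx * dx.chart_positive     # coded Dx, chart confirms
    fp = w_dx * dx.chart_negative     # coded Dx, chart does not confirm
    fn = w_no * no_dx.chart_positive  # coded No-Dx, chart finds the condition
    tn = w_no * no_dx.chart_negative  # coded No-Dx, chart agrees
    sens, spec = tp / (tp + fn), tn / (tn + fp)
    ppv, npv = tp / (tp + fp), tn / (tn + fn)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": ppv,
        "npv": npv,
        "f_score": 2 * ppv * sens / (ppv + sens),
        "youden_j": sens + spec - 1,
    }


# Hypothetical inputs: 50,000 coded and 950,000 uncoded children,
# with 1 discordant chart in each stratum of 120.
metrics = weighted_metrics(Stratum(50_000, 119, 1), Stratum(950_000, 1, 119))
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because the No-Dx stratum is usually much larger than the Dx stratum, a single missed case in the No-Dx sample carries a large weight and lowers sensitivity far more than its raw count suggests, while PPV and NPV depend only on the counts within their own stratum.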
All analyses were conducted using SAS statistical software (version 9.4; SAS Institute, Inc).
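Although the analyses were run in SAS, the interrater agreement statistic is easy to illustrate in a few lines of Python. The sketch below assumes Cohen's κ for 2 raters, which the text does not name explicitly, and uses made-up ratings; the function name is hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b) -> float:
    """Cohen's kappa for two raters who rated the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must rate the same nonempty set of items")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    categories = set(freq_a) | set(freq_b)
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1 - expected)


# Made-up ratings for 10 charts: the raters disagree on exactly 1 chart.
abstractor_1 = ["yes"] * 5 + ["no"] * 5
abstractor_2 = ["yes"] * 4 + ["no"] * 6
print(round(cohen_kappa(abstractor_1, abstractor_2), 2))  # 0.8
```

Here observed agreement is 0.9 and chance agreement is 0.5, so κ = (0.9 - 0.5) / (1 - 0.5) = 0.8; κ is conventionally a value between -1 and 1, so a figure reported as 95% corresponds to κ=0.95.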

Results
An overview of the patient characteristics for our sample, study population, and the state of California pediatric population is shown in Table 1. Overall, data on 2,283,931 children from all KPSC hospitals and medical offices were obtained during the 3 time periods. Compared with the state of California pediatric population, KPSC had a similar distribution of children's sex but slightly higher percentages of children aged 2-5 years (25.6% for KPSC vs 24.7% for the state of California) and 6-11 years (37.8% for KPSC vs 37.4% for the state of California). In addition, KPSC had slightly lower percentages of children identified as non-Hispanic White (24.5% vs 28.5%) or Asian/Pacific Islander (10.0% vs 13.1%). However, this could be partially attributed to the larger percentage of unknown, missing, or multiple race/ethnicity in the KPSC group (6.0% vs 0.5%, respectively). Overall, the KPSC pediatric population in this study was a good representation of the entire state of California's pediatric population.
Our sample of 1200 children across the 5 conditions was broadly representative of the KPSC study population with respect to age, sex, and race/ethnicity. However, children in the older age groups were slightly oversampled. The overall κ between our 2 abstractors was 95%.
Table 2 shows the distribution of the two strata (Dx and No-Dx) and their sample sizes in our entire study population and by the 3 time periods. Based on these comparisons against the chart review results, the performance measurements of our EHRs are shown in Table 3. The weighted sensitivity, specificity, PPV, and NPV for each of the 5 conditions were as follows:
• ADHD: sensitivity 100%, specificity 99.9%, PPV 99.2%, and NPV 100%
• ASD: sensitivity 100%, specificity 100%, PPV 99.2%, and NPV 100%
• MDD: sensitivity 100%, specificity 100%, PPV 99.2%, and NPV 100%
• AD: sensitivity 87.7%, specificity 100%, PPV 100%, and NPV 99.2%
• DBD: sensitivity 100%, specificity 100%, PPV 100%, and NPV 100%
The corresponding F-score and Youden J statistic were 99.6% and 99.9% for ADHD, 99.6% and 100% for ASD, 99.6% and 100% for MDD, 93.4% and 87.7% for AD, and 100% and 100% for DBD, respectively. Results were similar across the ICD-9-CM and ICD-10-CM coding time periods as well as before and after the COVID-19 pandemic.
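As a consistency check, the composite statistics can be recomputed from the reported sensitivity (Se), specificity (Sp), and PPV. The formulas below are the standard definitions and are an assumption here, since the text names the components of each statistic but does not print the formulas:

$$F = \frac{2 \times \mathrm{PPV} \times \mathrm{Se}}{\mathrm{PPV} + \mathrm{Se}}, \qquad J = \mathrm{Se} + \mathrm{Sp} - 1$$

For AD (Se = 0.877, Sp = 1.000, PPV = 1.000):

$$F = \frac{2 \times 1.000 \times 0.877}{1.000 + 0.877} = \frac{1.754}{1.877} \approx 0.934, \qquad J = 0.877 + 1.000 - 1 = 0.877$$

For ADHD (Se = 1.000, Sp = 0.999, PPV = 0.992):

$$F = \frac{2 \times 0.992 \times 1.000}{0.992 + 1.000} = \frac{1.984}{1.992} \approx 0.996, \qquad J = 1.000 + 0.999 - 1 = 0.999$$

Both results agree with the reported values of 93.4% and 87.7% for AD and 99.6% and 99.9% for ADHD.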

Principal Findings
This validation study was performed to determine the accuracy of clinical diagnosis codes in ascertaining childhood psychiatric and neurodevelopmental cases using data abstracted from the EHR of a large integrated health care system serving a demographically diverse patient population. To our knowledge, the accuracy of data on the studied conditions using clinical diagnostic codes has not been validated using EHR data. Furthermore, the extent to which the transition from the ICD-9-CM to the ICD-10-CM coding system as well as the pre- and post-COVID-19 pandemic eras have affected the ascertainment of the studied behavioral and developmental conditions is unclear. Our study showed that within a large integrated health system, there is strong agreement between the diagnosis codes (ICD-9-CM or ICD-10-CM) and the patients' conditions, including ASD, ADHD, DBD, AD, and MDD, both before and after the COVID-19 pandemic.
In recent years, EHRs have become important data sources for various epidemiological study designs investigating potential associations between exposures and outcomes, and they have become standard among researchers and health care providers as part of the American Recovery and Reinvestment Act of 2009 (specifically the HITECH Act) [14]. Although the EHR has become an important information management and care delivery tool that supports quality of care by providing access to comprehensive treatment-related data, its validity and completeness for conducting pharmacoepidemiological studies have been challenged by many because of how information is coded [23]. In the KPSC health care system, the coding of medical diagnoses and procedures recorded in patients' health records, following established coding rules, is performed by trained medical coders from the clinical data management team. Individual diagnostic conditions in the EHR need to be evaluated critically for accuracy and consistency. Furthermore, whether the accuracy of case ascertainment has been affected by the introduction of the ICD-10-CM coding system, and what effect the COVID-19 pandemic has had on the quality of data capture, need to be investigated. Therefore, we performed this validation study to evaluate (1) the accuracy of pediatric mental, emotional, and behavioral disorders identified in EHRs and (2) the coding quality before and after the implementation of the ICD-10-CM coding as well as before and after the COVID-19 pandemic.
The findings of this study suggest that the validity of EHR data for the identification of childhood mental, behavioral, and emotional conditions is quite strong and the transition from the ICD-9-CM coding system to the ICD-10-CM coding system as well as coding during the COVID-19 pandemic had minimal impact on the overall accuracy of case ascertainment.
The main strength of this study is the large chart abstraction conducted to assess the validity and reliability of childhood mental, behavioral, and emotional disorders case ascertainment using EHR data extracted from a large integrated health care system. The EHR system database provides an opportunity for neurodevelopmental outcome investigation with a high degree of validity of case ascertainment for epidemiological studies, in addition to current developments by others using natural language processing algorithms [24][25][26][27]. This validation study, conducted in a demographically diverse southern California population, is likely generalizable to health care settings with similar EHR database systems. Although the overall agreement between our abstractors was almost perfect (κ=95%), a potential limitation of this study was the use of medical record abstractors who were not blinded to the source of the data. However, a previous study that evaluated the agreement between masked and unmasked medical record abstraction reported no impact of bias in case ascertainment [28]. A further potential limitation is that some records had blocked notes and needed to be replaced because the conditions of interest could not be fully ascertained. However, on close examination, we found that only 38 (3%) of the 1200 cases had blocked notes, which we believe will have minimal impact on the overall analysis. In addition, we did not account for the oversampling of the older age strata in the summarized performance metrics. Considering that children in the older age group may receive more accurate diagnosis coding, the actual concordance of the electronic coding might be slightly lower than what we observed in this study.

Conclusions
Our findings suggest that childhood mental, behavioral, and emotional disorders are reliably coded in the EHRs and can be used for pharmacoepidemiological studies. Furthermore, the completeness of data remained similar during the pre- and postpandemic eras and after the implementation of the ICD-10-CM coding in the EHR system.

Table 1.
Characteristics of the study cohort and the state of California pediatric population aged 2-17 years (2012-2014, 2017-2019, and 2021-2022).
b Data are from the natality information of the Centers for Disease Control and Prevention website [3].
c Starting age is different for each condition.
d Median household income and insurance type information are not available for the California state data.

Table 2.
Frequencies of study population and chart review results by diagnosis codes. Condition-specific row percentages may not sum to 100% due to rounding.
a DBD: disruptive behavior disorders. b Dx: with confirmed diagnosis codes in electronic data. c No-Dx: without confirmed diagnosis codes in electronic data. d ASD: autism spectrum disorder. e MDD: major depressive disorder. f ADHD: attention-deficit hyperactivity disorder.

Table 3.
Weighted performance measurements for childhood psychiatric and neurodevelopmental disorders by data sources before and after the implementation of the International Classification of Diseases, Ninth/Tenth Revisions, Clinical Modification codes in the Kaiser Permanente Southern California (KPSC) system in 2015 (sample size=1200) and before and after the COVID-19 pandemic.
f ADHD: attention-deficit hyperactivity disorder.