Validity of diagnoses, procedures, and laboratory data in Japanese administrative data

Background: Validation of recorded data is a prerequisite for studies that utilize administrative databases. The present study evaluated the validity of diagnoses and procedure records in the Japanese Diagnosis Procedure Combination (DPC) data, along with laboratory test results in the newly-introduced Standardized Structured Medical Record Information Exchange (SS-MIX) data.
Methods: Between November 2015 and February 2016, we conducted chart reviews of 315 patients hospitalized between April 2014 and March 2015 in four middle-sized acute-care hospitals in Shizuoka, Kochi, Fukuoka, and Saga Prefectures and used the results as reference standards. The sensitivity and specificity of the DPC data in identifying 16 diseases and 10 common procedures were calculated. The accuracy of the SS-MIX data for 13 laboratory test results was also examined.
Results: The specificity of diagnoses in the DPC data exceeded 96%, while the sensitivity was below 50% for seven diseases and variable across diseases. When limited to primary diagnoses, the sensitivity and specificity were 78.9% and 93.2%, respectively. The sensitivity of procedure records exceeded 90% for six procedures, and the specificity exceeded 90% for nine procedures. Agreement between the SS-MIX data and the chart reviews was above 95% for all 13 items.
Conclusion: The validity of diagnoses and procedure records in the DPC data and of laboratory results in the SS-MIX data was high in general, supporting their use in future studies.


Introduction
Administrative databases are widely used in medical research studies. 1-4 Their large sample sizes, population representativeness, and clinical information enable large-scale studies to be conducted inexpensively. 5,6 However, the use of administrative databases for research rests on the assumption that the databases convey reasonably accurate data on health status and service utilization. Because the use of inaccurate data could produce biased results, 6,7 validation of the information recorded in administrative databases is essential.
In previous validation studies, comorbidities were recorded with high specificity, but their sensitivities were low and variable across different diseases. 8-13 Studies have also shown that, despite accurate recording of major procedures, such as surgeries and invasive examinations, minor procedures not related to reimbursements were often under-reported. 14-16 However, literature on validation studies is sparse compared with the widespread application of databases, and administrative database research has often used non-validated diagnostic or procedural codes. 17 The Japanese Diagnosis Procedure Combination (DPC) database has been widely used in clinical epidemiology studies. 18-21 The DPC data are unique in that distinctions are made between main diagnosis, comorbidities, and complications, and unlimited numbers of procedures and medications can be recorded. 22 In addition, the National Hospital Organization (NHO) introduced the Standardized Structured Medical Record Information Exchange (SS-MIX) standardized storage 23 to its hospitals, enabling daily laboratory data to be recorded. However, there have been no validation studies for either the DPC or SS-MIX data.
The aim of the present study was to evaluate the validity of diagnoses, procedures, and laboratory results recorded as DPC and SS-MIX data. We conducted a multicenter validation study in NHO hospitals using chart review results as reference standards.

Data source
In Japan, the DPC-based lump-sum payment system was introduced in acute-care hospitals nationwide from 2003. 24 The DPC data used for payments include patient demographics and selected clinical information, admission and discharge statuses, diagnoses, surgeries and procedures performed, medications, and special reimbursements for specific conditions. Diagnoses are recorded using International Classification of Diseases, Tenth Revision (ICD-10) codes by the attending physicians. Suspected diagnoses are allowed to be recorded, but are designated as such. There are six categories of diagnoses, each with a limited number of recordable diseases. One diagnosis each is coded for "main diagnosis", "admission-precipitating diagnosis", "most resource-consuming diagnosis", and "second most resource-consuming diagnosis". A maximum of four diagnoses each can be coded for "comorbidities present at time of admission" and "conditions arising after admission". All procedures performed during hospitalization are recorded according to the Japanese fee schedule for reimbursement.
The NHO was established in 2004 to take over the management of the national hospitals. As of October 2014, there were 143 hospitals nationwide run by the NHO, including both general acute-care hospitals and specialized long-term-care hospitals. Fifty-four hospitals from 35 prefectures had implemented the DPC-based payment system, and the mean number of acute-care beds in these 54 hospitals was 410 (range, 135-730). All NHO hospitals provide administrative claims data to the Medical Information Analysis (MIA) databank, which is managed by the Clinical Research Center at NHO Headquarters. In NHO hospitals with implementation of the DPC-based payment system, the DPC data are also stored in the MIA databank. In addition, the NHO preliminarily introduced the SS-MIX standardized storage 23 to its hospitals in 2013. The SS-MIX storage enables medical chart information from different vendors, including daily laboratory data, to be recorded in a standardized manner. In the SS-MIX storage, laboratory data are specified using JLAC-10 codes. The flow of data is shown in Fig. 1.
We conducted the present study on patients admitted to four acute-care NHO hospitals with implementation of both the DPC-based payment system and the SS-MIX storage. The four hospitals were a 380-bed hospital in Shizuoka Prefecture, a 280-bed hospital in Kochi Prefecture, a 380-bed hospital in Fukuoka Prefecture, and a 420-bed hospital in Saga Prefecture. Laboratory data collected from the SS-MIX storage at each hospital and DPC data extracted from the MIA databank were compared with the chart review results.

Study population and variables
Among the patients aged ≥18 years on admission who were eligible for the DPC-based payment system and who were admitted and discharged between April 1, 2014 and March 31, 2015, we randomly selected 100 patients from each hospital, aiming to conduct 400 chart reviews in total.
The items examined in the study are presented in Table 1. The 17 diagnoses were the diseases used for calculating the Charlson comorbidity index (CCI), 25-27 which is widely used for risk adjustment in administrative database research. We also evaluated whether 10 specific procedures were performed on the day of admission. These procedures were selected from those used to calculate another index of severity, as identified by a previous study. 28 We excluded blood examinations and drug use, as well as procedures that were rarely conducted (performed in <5% of the 400 patients, as determined by searching the MIA databank before the chart reviews). The authors did not see the frequencies of the remaining 10 procedures until the chart reviews were complete. In addition, we examined the data of 13 laboratory tests performed on the day of admission. When multiple tests of the same item were conducted on the admission day, we selected the earliest one.

Chart reviews
We conducted chart reviews in the four hospitals from November 2015 through February 2016. Two authors (HY1 and MM) independently conducted chart reviews of the cases and identified whether patients had each of the 17 Charlson diseases, either as a primary diagnosis or as a comorbidity present on admission. When chronic complications of diabetes were not documented, diabetes was classified as "diabetes without chronic complications". The reviewers also identified whether each of the 10 procedures was performed on the day of admission. When discrepancies arose, the two reviewers re-reviewed the charts and resolved the discrepancies through discussion. The inter-reviewer agreements for the 17 diseases before discussion were evaluated using kappa coefficients and categorized as near-perfect (0.81-1.00), substantial (0.61-0.80), moderate (0.41-0.60), fair (0.21-0.40), and poor (0.00-0.20). Laboratory data were obtained by one author (MK) in one hospital and by an assistant in the other three hospitals. The chief investigator (HY1) was consulted when questions arose about which laboratory data to use. The review process took 10-15 min per patient on average.
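The inter-reviewer agreement statistic can be made concrete with a short sketch. The following illustrative Python code (function names are ours, not the study's) computes Cohen's kappa from two reviewers' labels and applies the agreement bands described above:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' labels on the same cases."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if the two raters labeled independently
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

def categorize_kappa(kappa):
    """Agreement bands used in the study (Landis-Koch style)."""
    for lower, label in [(0.81, "near-perfect"), (0.61, "substantial"),
                         (0.41, "moderate"), (0.21, "fair")]:
        if kappa >= lower:
            return label
    return "poor"
```

For example, if two reviewers agree on 3 of 4 cases with these marginal frequencies, `cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0])` yields 0.5, which `categorize_kappa` labels "moderate".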

Data extraction from databases
Patient demographics, diagnoses, and outcomes were extracted from the MIA databank. The 17 Charlson diseases were identified using the coding algorithms of Quan et al. 27 A diagnosis was considered primary when it appeared as "main diagnosis" or "admission-precipitating diagnosis" and as a comorbidity when it appeared in "comorbidities present on admission". Suspected diagnoses were excluded. In addition to the diagnoses, we searched for metastatic and recurrent malignancies using the TNM classification and primary/recurrent malignancy information recorded as clinical information. Records of procedures performed on the day of admission were extracted from the MIA databank using the codes presented in Table 1.

Statistical analysis
The frequencies of the 17 Charlson diseases (as either a primary disease or a comorbidity) were identified by the chart reviews and DPC data. The sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of the DPC data were calculated, accepting the chart review results as the reference standard. In addition to the original diagnoses, we combined the following diagnoses and assessed the resulting validity: diabetes (with or without chronic complications) and liver disease (mild, moderate, or severe). We also tested the validity of identifying metastatic tumors using clinical information in the DPC data (M1 in the TNM classification or recurrent cancer). The CCI values were calculated from the 17 diagnoses using the newly validated weight assignment by Quan et al, 29 and the changes in the CCIs obtained using the different data sources were assessed.
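The four validity indices follow directly from the 2x2 cross-classification of the DPC records against the chart review reference. A minimal sketch, with hypothetical counts in the usage example (not the study's actual data):

```python
def validity_indices(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, and NPV from a 2x2 table in which
    the chart review is the reference standard:
        tp: positive in both DPC data and chart review
        fp: positive in DPC data only
        fn: positive in chart review only
        tn: negative in both
    Assumes every denominator is nonzero.
    """
    return {
        "sensitivity": tp / (tp + fn),  # chart-positive cases captured by the DPC data
        "specificity": tn / (tn + fp),  # chart-negative cases coded as negative
        "ppv": tp / (tp + fp),          # DPC-positive records that are true positives
        "npv": tn / (tn + fn),          # DPC-negative records that are true negatives
    }

# Hypothetical counts for one disease in a 315-patient sample
print(validity_indices(tp=40, fp=5, fn=10, tn=260))
```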
The overall false-positive and false-negative cases for the 17 Charlson diseases were defined as those with one or more false-positive diagnoses and one or more false-negative diagnoses, respectively. To evaluate the effect of the limit on the number of recordable comorbidities in the DPC data, we categorized patients by the number of comorbidities recorded in the DPC data (0-3 vs. 4) and compared the overall false-positive and false-negative rates between the two groups.
Furthermore, we evaluated the validity of the primary diagnosis in the DPC data for identifying the Charlson diseases. Here, sensitivity was defined as the proportion of patients with one of the 17 Charlson diseases as the primary diagnosis by chart review who had the same diagnosis recorded in the DPC data. Specificity was defined as the proportion of patients without any of the 17 diseases as the primary diagnosis who also had none of these diseases recorded as the primary diagnosis in the DPC data.
The sensitivity, specificity, PPV, and NPV of the procedure records in the DPC data were calculated in a similar manner to the diagnoses, placing the chart review results as the reference standard. Laboratory data in the SS-MIX data were considered accurate if the value recorded in the SS-MIX data equaled that in the chart review, or if the SS-MIX data correctly identified that no testing of the item was conducted on the admission day.
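The accuracy rule for the SS-MIX laboratory data can be expressed compactly. In this sketch, `None` is an encoding we choose for illustration, standing for "no test performed on the admission day"; a record counts as accurate when the values match or both sources agree that no test was done:

```python
def ssmix_agreement_rate(ssmix_values, chart_values):
    """Proportion of patients for whom the SS-MIX record is accurate:
    either the recorded value equals the chart value, or both sources
    agree that the test was not performed (both None)."""
    matches = sum(s == c for s, c in zip(ssmix_values, chart_values))
    return matches / len(chart_values)
```

For example, `ssmix_agreement_rate([7.2, None, 3.1, None], [7.2, None, 3.0, 5.1])` returns 0.5: the matching value and the shared `None` count as accurate, while the discrepant value and the missed test do not.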
Comparisons of categorical variables were conducted using the chi-square test, and a two-sided P value of <0.05 was considered significant. Statistical analyses were performed using IBM SPSS for Windows, version 21.0 (IBM Corp., Armonk, NY, USA).
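For the 2x2 comparisons above, the Pearson chi-square statistic has a closed form. The study used SPSS and does not state whether a continuity correction was applied; the stdlib-only sketch below omits it, and exploits the fact that a chi-square variate with one degree of freedom is the square of a standard normal variate to get the P value from `math.erfc`:

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the 2x2 table
    [[a, b], [c, d]]; returns the statistic and the two-sided P value (df = 1).
    Assumes all row and column totals are nonzero."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(stat / 2))  # survival function of chi-square, df = 1
    return stat, p
```

A balanced table such as `chi_square_2x2(10, 10, 10, 10)` gives a statistic of 0 and P = 1, while a strongly unbalanced one such as `chi_square_2x2(20, 5, 5, 20)` gives a statistic of 18.0 with P < 0.001.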

Standard protocol approvals, registrations, and patient consent
The study protocol was approved by the Central Ethics Review Board of the NHO, which deemed that written informed consent from participants was unnecessary. An announcement about the study and the possibility for participants to opt out of the study was made through the Internet.

Overview and patient characteristics
Due to time constraints, we were able to conduct 80 chart reviews in each of three hospitals and 75 chart reviews in the fourth hospital (315 patients analyzed in total). Cultures for methicillin-resistant Staphylococcus aureus were performed in most patients in one hospital, and vital signs could not be obtained from the charts of another hospital. Therefore, we excluded the former hospital from the analyses of bacterial culture and the latter hospital from the analyses of heart rate/respiration monitoring and pulse oximetry. No patients had human immunodeficiency virus infection.
The mean age of the 315 patients was 66.8 (standard deviation [SD], 16.7) years, 183 (58.1%) were male, and 15 patients (4.8%) died during hospitalization. The inter-reviewer agreements for diagnoses before discussion by the reviewers are presented in Table 2. Agreement was near-perfect for six diagnoses, substantial for seven, moderate for two, and fair for one. Table 2 also presents the frequencies of 16 diseases in each hospital. Malignancy was the most commonly identified disease in all hospitals, and there was variation in disease frequencies across the four hospitals.

Validity of diagnosis
The frequencies of the Charlson diseases and the validity indices for the DPC data are presented in Table 3. With the exception of peptic ulcer disease, the frequencies were lower in the DPC data than in the chart reviews. Sensitivity ranged from 0% (hemiplegia or paraplegia) to 83.5% (malignancy) and was lower than 50% for seven of the 17 diseases, while specificity was above 96% for all diseases. PPV ranged from 23.1% (peptic ulcer disease) to 100% (dementia and moderate or severe liver disease, both with specificity of 100%) and exceeded 80% for nine diseases. NPV was above 90% for all diseases.
When the two severity levels of liver disease and of diabetes were each combined into a single diagnosis, we observed modest increases in sensitivity and specificity compared with the original diagnoses limited to the less severe conditions. Identification of metastatic tumors using clinical information in the DPC data had higher sensitivity, with slightly lower specificity, than diagnosis-based identification. The mean CCIs calculated from the chart review results and the DPC data were 2.2 (SD, 2.9) and 1.5 (SD, 2.4), respectively. Of the 315 patients, the CCIs calculated from the two data sources were equal in 198 cases (62.9%), lower in the DPC data in 89 cases (28.3%), and higher in the DPC data in 28 cases (8.9%).
Of the 315 patients, the number of diagnoses recorded as comorbidities present on admission was zero in 50 cases (15.9%), one in 53 cases (16.8%), two in 63 cases (20.0%), three in 43 cases (13.7%), and four in 106 cases (33.7%). Overall, there were 53 patients (16.8%) with false-positive diagnoses for at least one of the 17 diagnoses, and 128 (40.6%) with one or more false-negative diagnoses. Compared with the patients with 0-3 diagnoses recorded as comorbidities, the patients with four diagnoses recorded as comorbidities had a higher overall false-positive rate (27.4% vs. 11.5%, p < 0.001) and a higher overall false-negative rate (51.9% vs. 34.9%, p = 0.004).
According to the chart reviews, there were 123 patients (39.0%) with one of the 17 Charlson diseases as the primary diagnosis and 192 patients (61.0%) without such a diagnosis. The sensitivity and specificity of the primary diagnosis in the DPC data were 78.9% and 93.2%, respectively.

Validity of procedures
The frequencies of conducted procedures and validity indices for the DPC data are presented in Table 4. Sensitivity was low for four procedures: heart rate/respiration monitoring (66.7%), pulse oximetry (21.1%), peripheral intravenous infusion (72.7%), and urinary catheter insertion (65.5%). For the other six procedures, sensitivity exceeded 90%. The DPC data were also highly specific, with the lowest of the 10 procedures having a specificity of 88.5% (pulse oximetry).

Validity of laboratory data
The agreements for laboratory data between the chart review results and the SS-MIX data are presented in Table 5. Based on the chart review results, the frequency of the conducted tests ranged from 18% (brain natriuretic peptide) to 63% (white blood cell count, hemoglobin, and creatinine). The SS-MIX data were accurate in over 95% of cases for all 13 tests examined.

Discussion
In the present study, we evaluated the validity of diagnoses, procedures, and laboratory data recorded as administrative data, using chart review results as the reference standard. As with any administrative data, there were some limitations to the recorded data. However, our results suggest that the DPC and SS-MIX data can serve as relatively accurate substitutes for clinical data in future studies.
The specificity of the DPC diagnosis records in identifying the Charlson diseases was high, but the sensitivity was low and varied across conditions. Within the diseases examined, there appeared to be considerable under-reporting, while coding of conditions that did not exist (i.e., up-coding) appeared uncommon. Compared with previous validation studies, the sensitivity was similar or slightly lower in the DPC data. 8-13 Two reasons could account for these findings. First, attending physicians, who are not professional coders, are obliged to record the diagnoses and may not be aware of the specific codes used to express conditions. For example, "hemiplegia or paraplegia" may not be recognized as a diagnosis, and codes within "diabetes" (with or without chronic complications) may be disregarded. Second, only two primary diagnoses and up to four comorbidities can be recorded in the DPC data, in contrast to up to 16 diagnoses in a database in Canada. 8-12 In the present study, patients with four recorded comorbidities had a higher overall false-negative rate than patients with three or fewer recorded comorbidities. Even when comorbidities are recognized, some may not be recorded once all four "slots" for comorbidities are occupied. Meanwhile, sensitivity (78.9%) and specificity (93.2%) were both high when limited to the primary diagnosis.
A strength of the present study is the random selection of participants and the inclusion of those without the specific diseases, which enabled calculation of sensitivity and specificity. From our results, PPV and NPV can be estimated by assigning disease prevalences. Under random selection, the PPV was acceptably high for most diseases, supporting diagnosis-based patient identification in future studies. By limiting a study population to patients with higher disease prevalences (e.g., elderly patients), the PPV would be higher. However, it should also be noted that patients identified using database diagnoses may form an undersized or unrepresentative sample of the disease population. When we calculated the CCIs, the values derived from the DPC data tended to be lower than those derived from the chart review results. The CCIs differed for 37.1% of the patients, including 28.3% with lower values in the DPC data, which is consistent with previous studies. 9,10 An international comparison of databases also showed variation in the ability of the CCI to predict mortality, 29 which may partly arise from the characteristics of the databases. Although the CCI is a commonly used index of comorbidity, its interpretation requires caution because the values may be biased compared with clinical studies or studies using different databases.
Cancer classifications were among the clinical information stored in the DPC data, and we tested their ability to correctly identify clinical conditions. A simple method using M1 status or recurrent tumor was accurate in identifying metastatic tumors (sensitivity: 73.2%; specificity: 96.0%). Such clinical information may serve as a useful substitute for diagnoses, especially when the number of recordable diagnoses is limited. Other clinical information in the DPC data includes consciousness level, New York Heart Association classification, and Hugh-Jones classification, which may also be used to identify clinical conditions. 30

We also evaluated the validity of procedure records in the DPC data. Although the 10 procedures in our study were minor, the DPC data identified most of them with high accuracy. This is in contrast to a previous study, wherein the sensitivities for X-ray scans, computerized tomography scans, insertion of an indwelling urinary catheter, and venous catheterization were 0%, 0.5%, 0%, and 39.6%, respectively. 14 Only 10 procedures can be coded in the database used in that study, whereas there is no such limit in the DPC data. This difference could explain the different rates of under-reporting. Heart rate/respiration monitoring and pulse oximetry can only be reimbursed when patients have severe conditions under continuous monitoring, but such criteria were difficult to define in the chart reviews, so we included all patients in whom these two procedures were performed. This could have resulted in the low sensitivities.
The laboratory data recorded in the SS-MIX storage had excellent agreement with the chart review results. Rather than summarizing patient information for use in payment or database construction, the SS-MIX storage is designed to standardize and store the electronic medical records themselves. 23 Although standardization of laboratory results using JLAC-10 codes in the newly-introduced SS-MIX system was considered a possible challenge, the data were collected accurately. Our previous study utilized the SS-MIX storage in the hospitals where it was preliminarily introduced, 31 but a large-scale study remains to be conducted. The SS-MIX data would be a useful source for future clinical epidemiology studies.
Several limitations of the present study should be noted. First, we used chart review results as the reference standard to assess the validity of the DPC and SS-MIX data. These reviews are dependent on the quality of the charts, and unrecorded information could not be captured. An ideal reference standard should identify a condition that is truly present in a patient. However, such information is difficult to obtain, and previous studies have also regarded chart reviews as the best available references. Also, the kappa coefficients for inter-reviewer agreement were low for some diagnoses, which poses a challenge when considering the chart review results as a reference. Second, we conducted the study in four hospitals within the NHO. Although we confirmed variation in the disease prevalences across the hospitals, it remains unclear whether the results can be extrapolated to other institutions. Importantly, these hospitals were early adopters of the SS-MIX storage, and it is possible that their data collection and recording are more accurate compared with other hospitals. Furthermore, there could be some features of the NHO hospitals that make their recorded data more reliable compared with other hospitals in Japan. One such example is lectures to health information administrators that aim to improve disease coding. However, other hospitals could also be conducting similar initiatives. Third, we assessed limited numbers of diagnoses, procedures, and laboratory data, and the validity of items that were not examined cannot be determined. Last, the number of participants was small, and the sensitivity estimates may be statistically unstable for some diseases with low prevalence.
The present study adds to the sparse literature of studies validating administrative data and can serve as an important basis for future studies using DPC and SS-MIX data. The results support the usefulness of diagnoses and procedure records within the DPC data, provided that the investigators using these data acknowledge their limitations and make appropriate interpretations. We also confirmed the validity of laboratory data in the newly-introduced SS-MIX storage, and the SS-MIX storage would add a considerable amount of information to future database-based studies.

Conflicts of interest
None declared.