Overview of the development process
The project followed a process based on a modified Delphi technique (the RAND Corporation/University of California, Los Angeles (RAND/UCLA) appropriateness method)[19], which has been widely used to develop healthcare QIs[20]. The method integrates an evidence review, a face-to-face multidisciplinary panel meeting, and repeated anonymous ratings for consensus building. The development and ratification of the QIs are depicted in Fig. 1.
Multidisciplinary panel
The multidisciplinary panel comprised seven healthcare professionals (five paediatricians, one obstetrician, and one public health specialist), who selected the conditions for QI development and took part in consensus development. The sampling strategy was non-random and aimed at recruiting informative participants, with recommendations by DS and approval by TN and HK. We selected members who met at least one of the following criteria: serving as a board or committee member of a relevant medical academic society, being engaged in guideline development for the selected conditions, or having outstanding research achievements in related fields.
Selection of conditions for QI development
Four categories (patient safety, general paediatrics, advanced paediatrics, and advanced obstetrics), and within them nine conditions in paediatric and perinatal care, were identified based on published research, the burden of disease, frequency of presentation, and national priority areas. These included high-prevalence conditions such as paediatric bronchial asthma, neonatal respiratory care, and caesarean section. The categories and conditions were identified through discussions at the panel meeting. Two of the nine conditions comprise two sub-conditions each: 1) “Rare Diseases”, consisting of acute lymphoblastic leukaemia and congenital diaphragmatic hernia, and 2) “Acute Abdomen”, consisting of intussusception and appendicitis.
Initial literature search (systematic search for evidence)
Because de novo development of evidence-based QIs is costly and time-consuming, methods that build on existing clinical practice guidelines (CPGs) have gained interest as viable alternatives[21]. Thus, for the selected paediatric and perinatal conditions, we retrieved existing CPGs and QIs available in English or Japanese, searching a single medical literature database (PubMed). In 2020, we searched the literature published between April 2010 and March 2020. We also searched for CPGs on the selected conditions in the following QI databases: the Agency for Healthcare Research and Quality (United States), the National Quality Forum (United States), the National Institute for Health and Care Excellence (NICE, United Kingdom), the Canadian Medical Association, the Australian National Health and Medical Research Council, Minds (Japan Council for Quality Health Care), and the National Hospital Organization (Japan). Websites related to the selected conditions, such as that of the European Paediatric Association, were also reviewed for CPGs. The website search was limited to English- and Japanese-language sources. In addition, we searched manually to identify further literature that might be relevant to this study.
Indicator development
Recommendations were extracted from the selected CPGs. Each recommendation was screened for eligibility on three criteria: 1) strength of recommendation (relatively strong within each condition), 2) validity and adequacy in actual clinical practice in Japanese settings, and 3) feasibility of defining indicators using administrative databases. Recommendations that did not meet all three criteria were excluded, as were statements without any recommended action. The remaining recommendations were then converted into a standardised indicator format using the modified American College of Cardiology/American Heart Association methodology[22]. Existing QIs were also converted into the standardised indicator format. We designed the QIs around the format of the Japanese administrative database.
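As a minimal sketch, the conversion of a recommendation into a standardised indicator format can be pictured as populating a fixed record per candidate. The field names and the example values below are illustrative assumptions, not the exact format of the modified ACC/AHA methodology[22]:

```python
from dataclasses import dataclass

@dataclass
class QICandidate:
    """Hypothetical record for one proposed indicator in a standardised
    format; all field names and values here are illustrative only."""
    condition: str    # e.g. "paediatric bronchial asthma"
    statement: str    # the recommended care process
    numerator: str    # events in which the indicated process was delivered
    denominator: str  # eligible population, including exclusions
    source: str       # originating CPG recommendation or existing QI

# Invented example of one converted recommendation
example = QICandidate(
    condition="paediatric bronchial asthma",
    statement="Administer systemic corticosteroids for moderate-to-severe exacerbations",
    numerator="Admissions in which systemic corticosteroids were administered",
    denominator="Admissions for asthma exacerbation, excluding contraindications",
    source="CPG recommendation (hypothetical)",
)
```

Fixing a single record shape in this way is what allows CPG-derived recommendations and pre-existing QIs to be rated side by side in the consensus rounds.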
Subcommittees for each condition
For each condition, an expert coordinator was appointed to review the proposed indicators. Each subcommittee consisted of five experts recruited to review the proposed indicators: three participated in the first-step ratings and two in the second-step ratings. Subcommittee experts assessed only the indicators for their own condition, whereas members of the multidisciplinary panel assessed all proposed indicators.
Expert consensus process
The proposed indicators were reviewed and ratified by the subcommittee experts for each condition and the multidisciplinary panel through a two-step, two-round process of independent ratings. Three subcommittee experts reviewed the proposed indicators in the first-step rating, while nine members (two subcommittee experts for each condition and the seven panel members) reviewed them in the second-step rating. In each step and round, members rated the appropriateness of each QI candidate on a 9-point scale, where 1 and 9 represented “least suitable” and “most suitable,” respectively. QIs were adopted according to the following criteria: the median individual rating in each round/step was > 7, and the number of members who gave a rating of < 3 was one or fewer in the first-step rating and two or fewer in the second-step rating. In addition, members were given the opportunity to provide comments or suggest additional candidates. In round 1, members individually evaluated the indicators using a set of documents that described each QI across nine domains: evidence-based, interpretable, actionable, denominator, numerator, validity, reliability, feasibility, and overall assessment[22]. In round 2, they convened for a web-based or face-to-face meeting to discuss, revise, and again individually evaluate the proposed indicators, with the first-round results shared anonymously. If additional candidates were proposed after the meeting, they were discussed via email using the same questionnaire as described above.
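The adoption rule above (median rating > 7, with at most one rating < 3 in the first step and at most two in the second) can be expressed compactly. The function name and structure below are our own sketch of that decision rule, not code used in the study:

```python
from statistics import median

def qi_adopted(ratings, step):
    """Apply the adoption criteria to one QI candidate.

    ratings: 9-point appropriateness scores (1 = least suitable,
             9 = most suitable) from the independent raters.
    step:    1 for the first-step (three subcommittee experts) rating,
             2 for the second-step (nine members) rating.
    """
    low_ratings = sum(1 for r in ratings if r < 3)
    max_low = 1 if step == 1 else 2   # tolerated low ratings per step
    return median(ratings) > 7 and low_ratings <= max_low

print(qi_adopted([8, 9, 8], step=1))  # median 8, no low ratings -> True
print(qi_adopted([8, 9, 2], step=1))  # one low rating tolerated  -> True
```

Note that both conditions must hold in the same step: a high median alone does not carry a candidate past raters who judged it clearly unsuitable.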
Pilot practice test for feasibility and adaptability
We used data from a Japanese administrative database, the Diagnosis Procedure Combination (DPC) per-diem payment system (details of the DPC system have been described elsewhere)[23]. In brief, the DPC is a case-mix patient classification system linked to payments at acute- and mixed-care hospitals in Japan. The database also includes anonymised clinical and administrative claims data. Clinical data comprised baseline patient information, diagnoses (based on ICD-10), and detailed medical information, including all major and minor procedures, medications, and device use.
We collected DPC data from the National Center for Child Health and Development, the only national children’s hospital in Japan, between April 2018 and March 2019 (fiscal year 2018). Given the small number of patients, we additionally collected data on the QIs for rare diseases between April 2019 and March 2021. For each indicator, the percentage score (QI) was calculated as follows: (number of times the indicator was met / number of participants, excluding those with an obvious reason for not implementing the process defined by the indicator) × 100. The median of the indicator scores was also computed as the overall quality score of the program. To ensure feasibility, we further checked the data in cases where a percentage score was lower than 5%. Data processing was performed using Microsoft SQL Server (Microsoft Corporation, Redmond, WA, USA).
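The scoring described above reduces to a simple calculation per indicator, with a median taken across indicators for the overall score. The indicator names and counts below are invented for illustration; only the arithmetic follows the formula in the text:

```python
from statistics import median

def qi_score(times_met, participants, excluded=0):
    """Percentage score for one indicator: times the indicator was met,
    divided by participants (after excluding those with an obvious
    reason for not implementing the process), multiplied by 100."""
    return 100.0 * times_met / (participants - excluded)

# Hypothetical counts for one fiscal year (invented numbers)
scores = {
    "indicator A": qi_score(45, 60, excluded=10),  # 45/50 -> 90.0
    "indicator B": qi_score(19, 20),               # 19/20 -> 95.0
}
overall = median(scores.values())                    # overall quality score
to_check = [n for n, s in scores.items() if s < 5]   # flagged for data checks
```

The `to_check` list mirrors the feasibility check described above: any indicator scoring below 5% is flagged for manual review of the underlying data rather than being taken at face value.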