Creating the BELgian COngenital heart disease database combining administrative and clinical data (BELCODAC): Rationale, design and methodology

BACKGROUND
Congenital heart disease (CHD) entails a broad spectrum of malformations with various degrees of severity and prognosis. Consequently, new and specific healthcare needs are emerging, requiring responsive healthcare provision. Research on this matter is predominantly performed on population-based databases, to inform clinicians, researchers and policy-makers on health outcomes and economic burden of CHD. Most databases contain data either from administrative sources or from clinical systems. We describe the methodological design of the BELgian COngenital Heart Disease Database combining Administrative and Clinical data (BELCODAC), to investigate patients with CHD.


METHODS
Data on clinical characteristics from three university hospitals in Belgium (Leuven, Ghent and Brussels) were merged with mortality and socio-economic data from the official Belgian statistical office (StatBel), and with healthcare use data from the InterMutualistic Agency, an overarching national organization that collects data from the seven sickness funds for all Belgian citizens. Over 60 variables with multiple entries over time are included in the database.


RESULTS
BELCODAC contains data on 18,510 patients, of whom 8926 (48%) have a mild, 7490 (40%) a moderately complex and 2094 (11%) a complex anatomical heart defect. The most prevalent diagnosis is Ventricular Septal Defect in 3879 patients (21%), followed by Atrial Septal Defect in 2565 patients (14%).


CONCLUSIONS
BELCODAC comprises longitudinal data on patients with CHD in Belgium. This will help build evidence-based provision of care to the changing CHD population.


Introduction
The epidemiology of congenital heart disease (CHD) has changed over the past decades [1][2][3]. The global birth prevalence is estimated to be 9.4 per 1000 live births [4]. Although a slight increase in the birth prevalence was observed worldwide, this prevalence remained relatively stable in Western countries [4][5][6]. Improved treatment options have resulted in better survival and consequently an increase in the prevalence of adult CHD, initially limited to mild/moderately complex lesions but more recently also including patients with a complex defect [7]. Consequently, a new population of older patients (60 years and beyond) with mild and moderately complex lesions is emerging, with specific healthcare needs [8,9].
These changes affect the healthcare system and challenge the design of responsive care models. Research focused on current epidemiological characteristics and the impact of healthcare provision and organization of care on clinical outcomes and costs at population level may help create coordinated healthcare provision for people with CHD that is cost-effective and accessible, while yielding optimal clinical outcomes. Such research initiatives require large multicenter databases containing clinical data and information on healthcare use. Population-based CHD-related databases are currently available in several countries [10][11][12][13][14][15][16][17][18][19]. Most databases contain data from either clinical systems or administrative sources. However, both have their shortcomings. Whereas data from clinical systems mostly lack outpatient information, data from administrative sources tend to be less reliable in terms of case finding. A solution would be to combine data from clinical and administrative sources.
We created a CHD-related database which encompasses demographic, clinical, economic and healthcare utilization data from Belgium, which could drive future healthcare services and outcome research in CHD. In this paper we describe the design and methodology of the BELgian COngenital heart disease Database combining Administrative and Clinical data (BELCODAC).

Setting
Belgium is a European country with a surface area of 30,528 km² and a population of 11.4 million, yielding a population density of 372 per square kilometer. It consists of three regions: Flanders, Wallonia and the Brussels Capital region, with 57.6%, 31.9% and 10.5% of the country's population, respectively [20]. The Belgian population is predominantly white Caucasian with a life expectancy at birth of 81.5 years for the whole population in 2018 (83.7 years for women and 79.2 years for men) [21]. The main causes of death are malignancies in men (29.9%) and diseases of the circulatory system in women (29.6%) [22]. The average fertility rate was 1.61 children per woman in 2018. There are 3.3 physicians and 5.7 hospital beds per 1000 inhabitants [23,24]. The healthcare system is characterized by high accessibility and compulsory healthcare insurance according to the Bismarck model [25]. In this model, employers and employees fund the healthcare insurance through compulsory payroll deduction, providing access to sickness funds regardless of pre-existing conditions. Sickness funds are public institutions and service costs are governmentally controlled. Costs are covered by the health insurance organizations (i.e., sickness funds) and partly by patients (i.e., out-of-pocket). Four centers in Belgium perform CHD surgery, of which two are located in Flanders (i.e., University Hospitals Leuven and Ghent University Hospital), and two in the Brussels Capital region (i.e., Cliniques Universitaires Saint-Luc and Queen Fabiola Children's University Hospital) serving Brussels and Wallonia.

Study population
Patients were eligible for inclusion in BELCODAC when diagnosed with a CHD, defined as "a gross structural abnormality of the heart and/or intrathoracic great vessels that is actually or potentially of functional significance" [26], and had at least one visit at one of the participating hospitals throughout their lifetime. As the merging of the different data sources required the Identification Number of the National Register (INNR) as a unique patient identifier, only patients for whom this number was available in the hospital information system could be included. Patients with very mild defects including Patent Foramen Ovale (PFO) and open ductus of Botalli requiring no intervention, acknowledged spontaneously closed Atrial Septal Defect (ASD) or Ventricular Septal Defect (VSD) and isolated mild peripheral pulmonary stenosis, were excluded.

Data sources
BELCODAC combines data collected from different sources: the participating hospitals; the InterMutualistic Agency (IMA); and StatBel (see supplementary Fig. 1). Patients were identified through an analysis of medical patient records from three different hospitals in Belgium: University Hospitals Leuven, Ghent University Hospital and Cliniques Universitaires Saint-Luc.

University Hospitals Leuven
University Hospitals Leuven established a digital clinical database in 1992. Data for all outpatient visits at the pediatric cardiology center or adult CHD program were prospectively collected. Data from 1970 to 1992 were retrospectively obtained from patient records.

Ghent University Hospital
Ghent University Hospital has prospectively collected data in its clinical collection system since 2002 for pediatric patients; the system was adopted by the adult CHD clinic in 2004. In 2013, the adult CHD outpatient clinic started its own system to collect data for patients in follow-up. Historical data from adult CHD clinic outpatients were retrospectively collected up to 2008. Most variables were coded in the hospital information system.

Cliniques Universitaires Saint-Luc
Cliniques Universitaires Saint-Luc implemented a digital system for patient identification and clinical data collection in 1997. All pediatric cardiology outpatient visits are entered into this system.
For all three hospitals, uncertain elements were cross-checked with the patients' medical files. Missing information was obtained through a manual search of these medical files. A codebook was used to ensure uniform coding throughout the different hospitals providing clinical variables.

InterMutualistic Agency
We obtained data on healthcare use and pharmacological data from the seven Belgian health insurance organizations. Because health insurance is compulsory, almost every citizen holds a membership at one of these seven sickness funds, which manage healthcare payments and reimbursements. Around 1% of citizens in Belgium do not hold a membership of a sickness fund (e.g. homeless people). This proportion is slightly higher in the Brussels Capital Region (1.9%) and lowest in Flanders (0.5%) [27]. The InterMutualistic Agency (IMA) is a national overarching organization that collects data from these seven organizations for all Belgian citizens. IMA started data collection in its current form in 2002 and performs an update every six months. Because data during the database start-up (2002-2005) were incomplete, only data from 2006 onwards were included in the final database. Furthermore, because reimbursement can be requested from the sickness fund up to 2 years after provision of the healthcare service and data were requested at the end of 2017, we could only get reliable information until 2015. As this system is used for reimbursement purposes, the data on healthcare utilization are highly accurate. BELCODAC ultimately contains data on healthcare use for the period from 2006 to 2015.
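The reasoning behind the 2006 to 2015 observation window can be written out as a short sketch. The years and the 2-year reimbursement lag come from the text above; the function names and the example records are hypothetical and serve only as an illustration.

```python
from datetime import date

FIRST_COMPLETE_YEAR = 2006    # start-up years 2002-2005 were incomplete (from the text)
REQUEST_YEAR = 2017           # data were requested at the end of 2017 (from the text)
REIMBURSEMENT_LAG_YEARS = 2   # claims can be filed up to 2 years after the service


def last_reliable_year(request_year: int, lag_years: int) -> int:
    """Latest service year for which all claims could have been filed
    by the end of the request year."""
    return request_year - lag_years


def in_observation_window(service_date: date) -> bool:
    """True when a service date falls inside the usable IMA window."""
    last = last_reliable_year(REQUEST_YEAR, REIMBURSEMENT_LAG_YEARS)
    return FIRST_COMPLETE_YEAR <= service_date.year <= last


# Hypothetical records: (coded patient identifier, date of service)
records = [
    ("RN1", date(2004, 5, 1)),   # start-up period, dropped
    ("RN1", date(2010, 3, 9)),   # inside the window, kept
    ("RN2", date(2016, 1, 2)),   # claims may still be outstanding, dropped
]
kept = [r for r in records if in_observation_window(r[1])]
```

Filtering on the year of service rather than the date of reimbursement is an assumption of this sketch.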

StatBel
Date and cause of death were obtained from StatBel, the Belgian statistical office. StatBel obtains physician-reported death certificates for every deceased person in Belgium as of 1991. Mortality data were linked to the INNR of Belgian citizens. StatBel additionally provided socio-economic and demographic data, which were obtained from the municipalities and the censuses of 2001 and 2011 [20].

Variables
Over 60 variables were included in the BELCODAC database (see Table 1). Most variables contain multiple entries over time. Several variables, such as date of birth, were validated across data sources. Included variables were selected through consensus between physicians specialized in CHD care in adult as well as pediatric populations; nurses specialized in CHD care; academics experienced in CHD research; and health economists experienced in merging administrative and clinical databases.
BELCODAC comprises data in six main categories: (i) socio-demographic data; (ii) medical data; (iii) mortality data; (iv) pharmacological data; (v) healthcare use data; and (vi) financial data (see Table 1). Socio-demographic data comprise date of birth, sex, place of residence, educational level, marital status, profession, periods of emigration, household type and size and social status. Medical data were obtained through the hospitals. This category contains information on diagnosis, coded via an adapted version of the CONCOR (CONgenital CORvitia) classification [28], and on the level of anatomical complexity based on the Bethesda classification [29]. Other relevant existing comorbidities can be derived from IMA data through the occurrence of certain healthcare provisions or combinations of healthcare use. Mortality data contain information on the date and cause of death. Pharmacological data cover prescribed medication use and prescribed therapies with product code and dosage, the date on which the therapy was prescribed and who prescribed it. Healthcare use data contain variables on the type, date and frequency of the patients' contact with the healthcare system. Financial data comprise variables relevant in determining the cost of the healthcare use for the patient and the government. Detailed information on the format of these variables is presented in supplementary Table 1.
Additional variables can be generated from the presence of certain data or from combinations of data. For example, information on the presence of infective endocarditis can be generated by combining data on duration and type of prescribed antibiotics with hospitalization occurrence.
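As an illustration of how such a derived variable might be constructed, the sketch below flags patients who received a long intravenous antibiotic course that started during, or shortly before, a hospital admission. The field names, the 14-day minimum duration and the 7-day window are hypothetical assumptions chosen for the example; they are not the definitions used in BELCODAC.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Illustrative thresholds only; not taken from the BELCODAC codebook.
MIN_ANTIBIOTIC_DAYS = 14   # assumed minimum length of the antibiotic course
MAX_GAP_DAYS = 7           # assumed window before admission in which therapy may start


@dataclass
class Prescription:
    patient_rn: str      # coded patient identifier
    start: date
    days: int            # duration of the course in days
    intravenous: bool    # stands in for "type of prescribed antibiotics"


@dataclass
class Admission:
    patient_rn: str
    admitted: date
    discharged: date


def possible_endocarditis(prescriptions, admissions):
    """Return the coded identifiers of patients with a long intravenous
    antibiotic course that starts around a hospital admission."""
    flagged = set()
    for rx in prescriptions:
        if not rx.intravenous or rx.days < MIN_ANTIBIOTIC_DAYS:
            continue
        for adm in admissions:
            if adm.patient_rn != rx.patient_rn:
                continue
            window_start = adm.admitted - timedelta(days=MAX_GAP_DAYS)
            if window_start <= rx.start <= adm.discharged:
                flagged.add(rx.patient_rn)
                break
    return flagged
```

A flag of this kind identifies candidate cases only and would need validation against the clinical record before use in an analysis.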

Merging of data from different sources
To merge the information from the five sources, patients' INNRs were sent to a Trusted Third Party (TTP) from eHealth, a governmental agency. This TTP provided a Random Number (RN) for each patient, enabling the hospitals to send the clinical data on RN to a second TTP. After a second coding, performed by the second TTP from the Crossroads Bank for Social Security (CBSS), the data were merged with data from IMA and StatBel. Ultimately, a quadruple coding system (see supplementary Fig. 1) ensured a coded database where no single party held all of the respective keys enabling identification of individual patients. The entire process was supervised by the Data Protection Authority of Belgium. Data are stored on a secured server, accessible through a Virtual Private Network (VPN) connection with a Secure Sockets Layer (SSL) certificate.
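The principle of sequential coding by independent parties can be sketched in a few lines. This is a simplified two-step illustration, not the actual eHealth or CBSS procedure: the class, the function names and the use of random tokens from Python's `secrets` module are assumptions made for the example.

```python
import secrets


class TrustedThirdParty:
    """Keeps a private lookup table from an incoming identifier to a random code.
    Only this party can reverse its own coding step."""

    def __init__(self):
        self._table = {}

    def recode(self, identifier):
        # The same identifier always receives the same random code,
        # so records from different sources remain linkable.
        if identifier not in self._table:
            self._table[identifier] = secrets.token_hex(16)
        return self._table[identifier]


def pseudonymize(records, ttp_first, ttp_second):
    """Re-key {identifier: payload} through both parties in sequence."""
    return {
        ttp_second.recode(ttp_first.recode(identifier)): payload
        for identifier, payload in records.items()
    }


def merge(clinical, administrative):
    """Join two coded data sets on the final code they share."""
    shared = clinical.keys() & administrative.keys()
    return {code: {**clinical[code], **administrative[code]} for code in shared}
```

Because each party sees only its own input and output codes, re-identifying a patient from the merged data would require the lookup tables of both parties.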

Ethics and privacy commission
Approval of the ethics committees of University Hospitals Leuven, Ghent University Hospital and Cliniques Universitaires Saint-Luc was obtained, and the ethics committee of Leuven acted as central ethics committee (file numbers S59859 and S59858). The Data Protection Authority reviewed the data collection process rigorously. At this governmental level, approval was obtained from three separate committees: the Statistical Supervisory Committee; the Sectoral Committee of Social Security and of Health; and the Sectoral Committee of the National Register. A small cell risk analysis was performed on the final set of requested variables at the request of the Sectoral Committee of Social Security and of Health. Having gone through these steps, this project contains the necessary elements to link data of administrative institutions and medical facilities in accordance with General Data Protection Regulation (GDPR) guidelines.

Baseline data
We included 18,510 eligible CHD patients through examination of the hospital specific clinical databases, contributing 149,048 patient years, 15,808,314 healthcare use records with 3,305,989 unique contact dates and 48,201 healthcare facility admissions. There were 174,617 individual visits with a cardiologist. Between 2006 and 2015, we registered 721 childbirths in 521 women. Furthermore, 719 deaths occurred within the observation period.
Of these 18,510 included patients, 13,748 (74%) were registered at University Hospitals Leuven, 4274 (23%) at Ghent University Hospital and 488 (3%) at Cliniques Universitaires Saint-Luc. Of these patients, 277 had at least one visit in two of the participating hospitals. Such patients were assigned to the hospital in which most visits were recorded. Fig. 1 shows the distribution of the BELCODAC population across birth cohorts, stratified per center. Most patients were born between 2000 and 2009 (24%). Our cohort contains 8926 patients (48%) with a mild defect, 7490 (40%) with a moderately complex defect and 2094 (11%) with a complex defect.
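The assignment rule for patients seen in more than one participating hospital can be expressed as a minimal sketch. The input format is hypothetical, and the handling of ties (the alphabetically first hospital) is an assumption of this example, as the rule for equal visit counts is not specified here.

```python
from collections import Counter


def assign_center(visits):
    """Map each patient to the hospital with the most recorded visits.

    `visits` is an iterable of (patient_code, hospital) pairs.
    Ties are broken alphabetically, which is an assumption of this sketch.
    """
    counts = {}
    for patient, hospital in visits:
        counts.setdefault(patient, Counter())[hospital] += 1
    # sorted() fixes the order of candidates, and max() returns the first
    # hospital that reaches the highest visit count.
    return {
        patient: max(sorted(per_hospital), key=per_hospital.__getitem__)
        for patient, per_hospital in counts.items()
    }


# Hypothetical example: patient "p1" was seen twice in Leuven and once in Ghent.
example = [("p1", "Leuven"), ("p1", "Ghent"), ("p1", "Leuven"), ("p2", "Ghent")]
assigned = assign_center(example)
```

Assigning each patient to a single center in this way ensures that patients are counted only once in the per-center totals.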
The most prevalent diagnosis in this cohort was VSD in 3879 patients (21%), followed by ASD type 2 in 2565 patients (14%). Within the group of moderately complex defects, the most prevalent diagnoses were pulmonary valve abnormality (n = 1739, 9%) and coarctation of the aorta (n = 1233, 7%). Within the group of complex defects, Transposition of the Great Arteries (TGA) was most prevalent (n = 751, 4%). Distribution for the different primary diagnoses in the database can be found in Table 3 according to the CONCOR classification [28].
We registered 22,524 cardiac procedures, obtained from the clinical databases. Thirty-one percent of patients had at least one surgical procedure, 15% had one or more catheter interventions and 12% underwent both. The most common procedure was VSD closure, occurring 1517 times, followed by ASD closure (n = 1340), trans-catheter ASD closure (n = 1178), coarctectomy (n = 1083) and trans-catheter balloon dilation of the pulmonary valve (n = 1058). There were 602 pacemaker implantations.

Discussion
We designed and developed BELCODAC, encompassing clinical data and information on healthcare utilization and outcomes in patients with CHD. This database aims to inform clinicians, researchers and policymakers regarding the clinical outcomes and economic burden of healthcare provision for patients with CHD at all life stages. Additionally, BELCODAC exemplifies the inclusion of variables from different sources obtained longitudinally.
Within the field of CHD research, several registries and administrative databases already exist [30][31][32]. Many of the existing databases were originally created for reimbursement purposes yet provide a great amount of data on patients across geographic regions and centers, enabling epidemiological and outcome research. A recent review identified 12 administrative databases used for research on adults with CHD [33]. Although all were used for CHD research, the only ones designed specifically for CHD were the Québec CHD database [12] and the Danish Public Registries [11]. Aside from these databases, several registries currently exist for CHD [31]. These registries prospectively collect data on a group of patients. Examples of such registries are the Dutch CONCOR registry [28], the German national register for CHD, the Guangdong Registry of Congenital Heart Disease in China [34] and SWEDCON in Sweden [35]. Registries typically miss details as they are designed for monitoring performance. Additionally, they may be subject to selection bias when they are based on voluntary enrolment or informed consent [33].
CHD databases in countries with a Bismarck healthcare model, which do not suffer from the bias of socio-economic status, are sparse. The only existing registries that use this particular model are based on voluntary enrollment [28,36].

Comparison with other studies
Compared to a study examining the overall prevalence of heart defects in the population of Quebec, Canada [37], we found lower rates of ASD prevalence (21% in Quebec vs. 15% in our cohort) and higher rates of coarctation of the aorta (2% in Quebec vs. 7% in our cohort). Data from Germany [38] were more in line with ours. Differences could be attributed to the average age of the cohort as well as the definition of certain defects. Overall, we see very similar results, indicating a sample representative for the population with no major differences between our cohort and the overall CHD population outside of Belgium.

Strengths of BELCODAC
BELCODAC contains a large population of patients with CHD suitable for epidemiological research. The longitudinal nature of BELCODAC data provides detailed information on healthcare use, clinical complications and mortality data throughout all life stages and settings. Most existing databases lack out-of-hospital data and data on mortality, providing only partial information on healthcare utilization and outcomes [33]. The use of data from national health institutes in BELCODAC assures that data are not limited to in-hospital contacts and events or to a regional level. Additionally, using data from the hospitals provides more detailed and accurate data concerning the medical variables such as diagnosis, intervention type and complications. For several variables, data are cross-validated between clinical and administrative data, resulting in more reliable data.
Since patients are included in BELCODAC without preselection, this database is not subject to the same level of ascertainment bias as registries that include only a subgroup of patients. BELCODAC might therefore provide useful insights for those patients that are often missed, such as mild defects or patients who are lost to follow-up. Furthermore, it enables us to add socio-economic and demographic data, providing a wide array of variables to determine possible confounders.
Most of the existing databases use the International Classification of Diseases and Related Health Problems (ICD) ninth or tenth revision codes to identify patients [33]. This coding may be susceptible to misclassifications when coded by administrative personnel or physicians who lack understanding of the ICD-terminology and may lack sufficient detail to document complex physiology [33,39]. Furthermore, since the ICD-coding lacks specific CHD subtypes, such as a Norwood procedure, ICD coding is not the ideal classification for CHD. We were able to opt for the CONCOR-hierarchy to classify the heart defects, specifically designed for CHD research [28].
The coding list for cardiac interventions was developed by the researchers collaborating on BELCODAC. Involving professionals in data collection is important to create a valuable database [40]. Whereas other databases often have in-hospital administrative sources responsible for the coding of clinical data, ours occurred through collaborations and discussions between healthcare professionals, resulting in clinical data representative for the patients' condition as depicted by the physicians in the hospital.
Although this database is retrospectively designed, it is possible to gather additional information in the future on this cohort enabling future analyses when deemed necessary.

Limitations of BELCODAC
Firstly, BELCODAC is not suitable for in-depth analysis of the patients' physical status, due to shortcomings regarding the depth of the data. For instance, physical examinations or parameters of echocardiography performed by physicians at outpatient visits are not included. In addition, information on non-reimbursed healthcare use is lacking (e.g. dietician visits and some forms of psychotherapy). Over-the-counter (OTC) medication is not included in this database. However, OTC medication constitutes a small proportion of medication use in Belgium, mainly comprising simple pain medication, nasal decongestants, vitamins, digestive medication and skincare. All types of cardiovascular medication are prescription-based.
Secondly, Flemish patients are overrepresented in BELCODAC. We included complex CHD patients from only one French-speaking hospital. No differences between Flemish and Walloon patients in terms of complexity, treatment or clinical outcomes are expected, since patients benefit from the same healthcare model and have the same genetic background.
A third limitation is the possibility of selection bias towards more complex defects, seeing as all three included hospitals are university hospitals. Although every patient with at least one visit in one of the participating hospitals was included, patients with mild CHD without the need for referral to an academic hospital are not included in BELCODAC.
Fourthly, although StatBel reliably captures mortality, physician-reported cause of death may be less reliable [41]. Inaccuracies between the actual cause of death and data from national registries occur due to incomplete death certificates, diagnostic issues, situations where the cause of death is ambiguous, or cases where death had multiple causes. These uncertainties could be present and need to be considered when performing research on cause of death using BELCODAC.

Governance
Combining data from clinical and administrative sources as shown in BELCODAC can provide meaningful insights for researchers aiming to improve healthcare quality and cost. Therefore, the methods used to develop this merged database are pivotal for future development of datasets. A first step in this process was the setup of a collaboration between universities, healthcare providers from the hospitals and governmental agencies. A second step was the formation of agreements regarding data collection, data storage, ownership, access and analysis. These agreements were made in close collaboration with all involved partners, under supervision of the legal offices of the universities and the collaborating hospitals and with the national Data Protection Authority. Third, the format and extent of the data needed to be agreed upon. For this we developed arguments for the necessity of each variable included in the database. This was scrutinized in a small cell risk analysis and evaluated by the Data Protection Authority. The fourth step was the data collection: each partner collected the required variables and recoded these to fit the pre-established and agreed-upon codebook. The data were then encrypted via a triple coding system as shown in Supplementary Fig. 1. Lastly, we created a publication committee, in which every participating university and hospital is represented. This committee supervises data management and data storage, and authorizes analysis and publication of the data. These steps ultimately resulted in the BELCODAC database, where we successfully combined different data sources to form a rich database for patients with CHD.

Conclusions
BELCODAC is designed to contain longitudinal data on patients with CHD in Belgium. Linking administrative and clinical data in patients with CHD created a valuable foundation for research on clinical outcomes, healthcare use, organization of care and healthcare costs. The information obtained through this type of health services research will foster an evidence base for the development of healthcare provision adapted to the changing needs of the CHD population.