Divergences between healthcare-associated infection administrative data and active surveillance data in Canada

Background: Although Canada has both a national active surveillance system and administrative data for the passive surveillance of healthcare-associated infections (HAI), both have identified strengths and weaknesses in their data collection and reporting. Active and passive surveillance work independently, so their results sometimes diverge. To understand the divergences between administrative health data and active surveillance data, a scoping review was performed. Method: Medline, Embase and Cumulative Index to Nursing and Allied Health Literature along with grey literature were searched for studies in English and French that evaluated the use of administrative data, alone or in comparison with traditional surveillance, in Canada between 1995 and November 2, 2020. After extracting relevant information from selected articles, a descriptive summary of findings was provided with suggestions for the improvement of surveillance systems to optimize the overall data quality. Results: Sixteen articles met the inclusion criteria, including twelve observational studies and four systematic reviews. Studies showed that using a single source of administrative data was not accurate for HAI surveillance when compared with traditional active surveillance; however, combining different sources of data or combining administrative with active surveillance data improved accuracy. Electronic surveillance systems can also enhance surveillance by improving the ability to detect potential HAIs. Conclusion: Although active surveillance of HAIs produced the most accurate results and remains the gold standard, the integration between active and passive surveillance data can be optimized. Administrative data can be used to enhance traditional active surveillance. Future studies are needed to evaluate the feasibility and benefits of potential solutions presented for the use of administrative data for HAI surveillance and reporting in Canada.


Introduction
Each year, many Canadians acquire an infection during their hospital stay that increases morbidity and mortality, and that bears a financial cost to the healthcare system (1). These healthcare-associated infections (HAI) are preventable, measurable, and are the most frequently reported adverse event in healthcare worldwide. Every year, it is estimated that 220,000 Canadian patients develop a HAI (2). Many HAIs are now caused by antimicrobial-resistant organisms (AROs), which makes them difficult to treat. The Public Health Agency of Canada (PHAC) estimates that approximately 2% of patients admitted to large, academic Canadian hospitals will acquire an infection with an ARO during their hospital stay (3). Surveillance, including monitoring and reporting of HAI, is a critical component of infection prevention and control and needs to be strengthened at the national level. Although coronavirus disease 2019 (COVID-19) did not originate as a HAI, the current pandemic has revealed how critical it is to have reliable and consistent data in order to formulate an effective response to infection. When asked to provide projections regarding the course of COVID-19, Prime Minister Trudeau said that "….the inconsistency in the data from across Canada is part of the delay in offering a nationwide picture" (4).
In Canada, PHAC collects national data on multiple HAIs through the Canadian Nosocomial Infection Surveillance Program (CNISP), a program established in 1994 as a partnership between PHAC, the Association of Medical Microbiology and Infectious Disease Canada and sentinel hospitals from across Canada (5). The objectives of CNISP are to provide national and regional benchmarks, identify trends on selected HAIs and AROs, and provide key information to help inform the development of federal, provincial and territorial infection prevention and control programs and policies (5). At present, the CNISP network comprises 87 acute-care sentinel hospitals from ten provinces and one territory. The network's goal is to have all Canadian acute care hospitals adopt the CNISP HAI surveillance definitions and contribute data to the national surveillance system (2). Despite the desire to expand the surveillance program, CNISP is limited 1) by funding capacity, 2) by lack of human resources available to participate in national surveillance (2) and 3) because most hospitals already report to their provincial government and are unwilling to enter data twice. As a result, CNISP HAI rates may not provide a complete picture, and some segments of the Canadian hospital population, such as smaller community hospitals, are underrepresented (6).
National statistics reported by PHAC relating to HAIs only include data from hospitals that participate in CNISP, as they all follow standardized case definitions, methods and case reporting. Currently, HAI rates reported by provinces and territories or posted by individual hospitals cannot be combined because case definitions, methods of data collection and calculation of rates vary from hospital to hospital and between provinces and territories (2). Active surveillance is done by Infection Prevention and Control (IPC) practitioners, and each province, territory, administrative region or hospital can determine its own surveillance protocols based on local epidemiology and resources, making it difficult to evaluate improvement efforts and compare HAI rates in Canadian hospitals (7).
On the other hand, Canada has a wealth of administrative health data, including insurance registries, inpatient hospital care, vital statistics, prescription medications and electronic health record systems (8). Exploring the potential of integrating these diverse administrative health data sets could provide a more robust picture of HAIs across Canada.
The hospital discharge abstract database (DAD), housed at the Canadian Institute for Health Information (CIHI), collects demographic and clinical information from patient discharge summaries from all acute care facilities in Canada, except in Québec (Québec has its own discharge abstract database, Maintenance et exploitation des données pour l'étude de la clientèle hospitalière (MED-ÉCHO), that reports to CIHI's Hospital Morbidity Database) (9). Information is entered in the database by professional coders from all hospitals and is used by CIHI to produce data and analytic reports. The CIHI's Data and Information Quality Program is recognized internationally for its high standard (10). However, discharge summaries are not standardized across the country and reflect only what is entered into the summary by the attending physician. The CIHI could, however, be a potential partner to support data collection and reporting of HAIs for acute care hospitals. We conducted a scoping review to identify existing gaps between administrative data and active surveillance data for healthcare-associated infection surveillance and to propose possible integration strategies to optimize data.

Research question
The main research question was "What are the discrepancies between HAI administrative data and active surveillance data in Canada?" The research sub-questions were: Are administrative data valid for HAI surveillance? For each type of HAI, what are the discrepancies between administrative data and hospital surveillance data? We performed this scoping review following the PRISMA extension for scoping reviews (11).

Relevant literature
We performed a search developed in collaboration with a medical research librarian. The inclusion criteria consisted of articles evaluating passive surveillance of specific or various HAIs in Canada. We included articles (qualitative, quantitative and mixed-method studies) published between 1995 and November 2, 2020 in Canada. The search strategy contained terms relative to location (Canada), surveillance, data source and HAI. In addition, we performed a second search with the same terms (except for the location) and only including systematic reviews.
A pilot selection process was carried out to identify databases with relevant studies and three electronic databases were searched: MEDLINE, EMBASE and Cumulative Index to Nursing and Allied Health Literature (CINAHL), in English and French with no date restriction. The search strategies were created on MEDLINE then adapted for the other databases (Supplemental Data S1). After deduplication, two reviewers independently screened citations by title and abstract. Selected articles were evaluated for eligibility at the full-text level. The first reviewer also performed a hand search of the grey literature and reviewed the reference lists of all eligible and published studies to identify any articles that were not initially captured through electronic search. Conflicts were resolved through discussion until consensus was reached.

Data extraction and quality assessment
An electronic data form was developed on Distiller SR (Evidence Partners, Ottawa, Canada) for this scoping review. The following data were extracted from each article: general information; study details; types of HAI and surveillance; source of data; outcomes and results.
Both reviewers assessed the quality and risk of bias of each study using ROBINS-I for non-randomized studies (12) and the AMSTAR-2 tool for systematic reviews (13). Overall, studies were ranked as low, moderate or high risk of bias. Any disagreement or inconsistency between the reviewers was resolved through discussion. The complete data collection and quality assessment items are shown in Supplemental Data S2.

Data analysis
A qualitative descriptive approach was used to synthesize the data collected. Principal study characteristics, summaries of performance statistics and quality assessment scores were summarized in tables. We presented a summary of findings for each study, grouped into categories according to the type of administrative data used and the scope of the study. We focused on how the administrative data were used for HAI surveillance, the divergence in results from traditional surveillance and whether the authors recommended administrative data to enhance surveillance. A synthesis of systematic reviews was also presented, with reviews categorized as assessing either the validity of administrative data or the validity of electronic surveillance systems.

Study characteristics
Of the 12 observational studies included, six focused on surgical site infection (SSI), three on Clostridioides difficile infection (CDI), two on methicillin-resistant Staphylococcus aureus (MRSA) and one on bloodstream infection (BSI). Studies were performed from 2009 to 2020 and eight were from Alberta. Seven studies compared administrative data with hospital surveillance data and seven studies used data linkage. All studies used DAD as the source of administrative data (alone or combined with other sources). The main characteristics of all included studies are summarized in Table 1. Four systematic reviews were also included, three on the use of electronic surveillance systems (ESS) and one on the use of administrative data for HAI surveillance. All reviews included at least one article from Canada. The study characteristics are summarized in Table 2.

Within-study risk of bias
Observational studies were assessed for risk of bias using the ROBINS-I tool (Table 1). Most of these studies used similar methodology but lacked information on missing data (Supplemental Table S3). However, they were all assessed as low risk of bias.
Systematic reviews were assessed using the AMSTAR-2 tool (Table 2, Supplemental Table S4). One article was considered at moderate risk of bias as it did not report its protocol or describe included studies in adequate detail. Three articles were considered at high risk of bias as some did not report their protocol or assess the risk of bias, quality or heterogeneity of included studies.

Summary of findings
Studies using one administrative database compared with active surveillance
Validation studies showed that DAD used alone for capturing HAI cases is not valid in comparison with traditional active hospital surveillance by IPC. For example, Rennert-May et al. (17) assessed the validity of using ICD-10 codes in the administrative database (DAD) to identify complex SSIs within three months of hip or knee arthroplasty. The study found that the ICD codes in DAD were highly specific (99.5%) but had a sensitivity of 85.3% and a positive predictive value of only 63.6%. They concluded that DAD was not able to accurately determine whether someone had an SSI according to the surveillance definition ( [...] symptomatic from asymptomatic cases. In fact, DAD had only a moderate sensitivity of 85% and a positive predictive value of 80% (Table 3).
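To illustrate how these validation metrics relate to one another, the minimal Python sketch below computes sensitivity, specificity and positive predictive value from a two-by-two table that compares administrative coding with active surveillance as the reference. The counts, the class name and the function names are hypothetical and are not taken from any included study; they were chosen only to show that, when infections are rare, a specificity above 99% can coexist with a much lower positive predictive value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Two-by-two table: administrative code vs. active surveillance (reference)."""
    tp: int  # coded as HAI, confirmed by surveillance
    fp: int  # coded as HAI, not confirmed by surveillance
    fn: int  # not coded, but confirmed by surveillance
    tn: int  # not coded, not confirmed


def sensitivity(c: ConfusionCounts) -> float:
    # Share of surveillance-confirmed cases that the codes captured.
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    # Share of non-cases that the codes correctly left unflagged.
    return c.tn / (c.tn + c.fp)


def positive_predictive_value(c: ConfusionCounts) -> float:
    # Share of coded cases that were true cases.
    return c.tp / (c.tp + c.fp)


# Hypothetical counts for a rare outcome (100 true cases among 10,000 procedures).
counts = ConfusionCounts(tp=80, fp=50, fn=20, tn=9850)

print(f"sensitivity = {sensitivity(counts):.3f}")
print(f"specificity = {specificity(counts):.3f}")
print(f"PPV         = {positive_predictive_value(counts):.3f}")
```

With these hypothetical counts, the sketch gives a sensitivity of 0.80, a specificity of about 0.995 and a positive predictive value of about 0.62, because even a small false-positive fraction applied to a large number of non-cases yields many false positives relative to true cases.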

On the other hand, Daneman et al. evaluated whether mandatory public reporting by hospitals was associated with a reduction in hospital CDI rates in Ontario (23). Aside from the main analysis, they performed a cross-validation of CDI rates from administrative data against rates reported by individual institutions via the mandatory public reporting system. They used Pearson correlation coefficients weighted for hospital bed-days and found excellent concordance across the institutions (23).
The same coefficient was used in the study by Ramirez Mendoza et al. (18) that compared DAD with surveillance data for hospital-acquired MRSA in Alberta and Ontario. The results showed a strong correlation between DAD and IPC surveillance data. The study concluded that there was good evidence of comparability between these datasets; however, rates and denominators diverged widely between administrative data and active surveillance data (Table 3). Some authors did not agree with the study conclusion or methodology, notably with the choice of Pearson correlation using hospital-level data and the difference in rates or denominators between administrative and surveillance data (30). [...] Table 3). The authors recommended that the administrative data not be used as a quality indicator for interhospital comparison.
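The concern about relying on hospital-level correlation can be made concrete with a small sketch. Assuming a standard weighted Pearson coefficient with bed-days as weights (the exact weighting used in the cited studies is not described here, so this is an assumption), the hypothetical example below shows two sets of hospital rates that are perfectly correlated even though one is consistently twice the other. All values and names are illustrative only.

```python
import math
from typing import Sequence


def weighted_pearson(x: Sequence[float], y: Sequence[float], w: Sequence[float]) -> float:
    """Pearson correlation of x and y, with each pair weighted by w."""
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y)) / total
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x)) / total
    var_y = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y)) / total
    return cov / math.sqrt(var_x * var_y)


# Hypothetical rates per 10,000 bed-days for three hospitals.
admin_rates = [1.0, 2.0, 4.0]          # from administrative data
surveillance_rates = [2.0, 4.0, 8.0]   # from active surveillance (twice as high)
bed_days = [100.0, 200.0, 300.0]       # weights

r = weighted_pearson(admin_rates, surveillance_rates, bed_days)
print(f"weighted Pearson r = {r:.3f}")
```

Here the coefficient is essentially 1 although every administrative rate is half the corresponding surveillance rate, which is why a strong correlation alone does not establish agreement between the two data sources.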

Studies combining multiple administrative databases
In contrast, Crocker et al. (14) compared infection rates calculated using a combination of DAD and the National Ambulatory Care Reporting System (NACRS) to identify spinal procedures and SSIs. They showed that these rates were comparable with postoperative SSI rates published using traditional surveillance (Table 3). However, the validity of the results was not verified in this study.

Studies combining administrative database with laboratory database
Studies showed that laboratory records could be used to enhance administrative data. For example, Almond et al. (25) assessed the validity of a laboratory-based surveillance method to identify hospital-acquired CDI (HA-CDI). Laboratory data alone can result in overestimation of CDI rates, with positive laboratory results not meeting the case definitions for HA-CDI (e.g. asymptomatic colonization, recurrent CDI). However, this study assessed the alternative of linking positive CDI laboratory records to DAD. The study demonstrated a very high sensitivity but a specificity of 65.7% and a positive predictive value of 74.3% (Table 3). These results indicated that 26% of cases classified as HAI were not true HAI cases, resulting in a higher rate observed with this method. In addition, the authors completed a receiver operating characteristic (ROC) analysis to see whether a time from admission (collection date−admission date) of ≥4 days was the appropriate cut-off for classifying hospital-acquired cases in the laboratory dataset. The ROC analysis indicated that more cases were classified correctly five days after admission. Thus, a simple change in laboratory detection, using a longer time from admission to classify cases as healthcare-associated, may increase specificity at a small cost to sensitivity.
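The trade-off described above can be sketched in a few lines of Python. The example below classifies laboratory records as hospital-acquired when the interval between admission and specimen collection reaches a chosen cut-off, then compares the result with a reference classification. The records, field names and function name are hypothetical and were constructed for illustration; they do not reproduce the data or the exact algorithm of the cited study.

```python
from datetime import date, timedelta
from typing import NamedTuple, Sequence, Tuple


class LabRecord(NamedTuple):
    admission: date
    collection: date
    true_hospital_acquired: bool  # reference classification (e.g. IPC review)


def sensitivity_specificity(records: Sequence[LabRecord], min_days: int) -> Tuple[float, float]:
    """Flag a record as hospital-acquired when collection - admission >= min_days."""
    tp = fp = fn = tn = 0
    for rec in records:
        flagged = (rec.collection - rec.admission).days >= min_days
        if flagged and rec.true_hospital_acquired:
            tp += 1
        elif flagged:
            fp += 1
        elif rec.true_hospital_acquired:
            fn += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical records: (days from admission to collection, reference status).
admitted = date(2020, 1, 1)
cases = [(1, False), (2, False), (3, False), (4, False), (4, False), (4, True),
         (5, False), (5, True), (6, True), (7, True), (9, True), (10, True)]
records = [LabRecord(admitted, admitted + timedelta(days=d), truth) for d, truth in cases]

for cutoff in (4, 5):
    sens, spec = sensitivity_specificity(records, cutoff)
    print(f">= {cutoff} days: sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

With these made-up records, raising the cut-off from 4 to 5 days lifts specificity from 0.50 to about 0.83 while sensitivity falls from 1.00 to about 0.83, which mirrors the direction of the trade-off reported in the text.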

Systematic review and administrative data
Only one review (26) assessed the accuracy of administrative data for surveillance of HAI. The other reviews examined articles on ESS using electronic medical records for HAI surveillance compared with traditional surveillance, but included many articles that used a combination of administrative data and ESS (27)(28)(29). Administrative data were found to have very heterogeneous sensitivity and positive predictive value, generally low to modest, with particularly poor accuracy for the identification of device-associated HAI (e.g. CLABSI, CAUTI) (Table 4) (26,28).
In general, the highly variable accuracy of administrative data was mainly due to the number of different diagnostic codes used between studies (26). Van Mourik et al. assessed the accuracy of administrative data. One-third of included studies had important methodological limitations, and those with a higher risk of bias were associated with a more optimistic picture than those employing robust methodologies (26). On the other hand, Leal et al. found good sensitivity and excellent specificity for administrative data (Table 5) (29). However, populations and methodologies were very heterogeneous, and the quality of the studies included in the review was not assessed. All four reviews found that combining administrative data sources with other sources for surveillance, in particular with microbiology data, improved accuracy. Studies also found that microbiology data had good sensitivity (28,29); however, Freeman et al. concluded that ESS using microbiology data alone tended to overestimate HAI (27). Streefkerk et al. (28) also found that microbiology data combined with antibiotic prescription and laboratory (biochemistry, hematology, etc.) data were more accurate than microbiology alone (Table 5). Finally, most studies concluded that administrative data were advantageous for tracking HAIs that require post-discharge surveillance (e.g. SSI).

Systematic review and electronic surveillance system
Results showed that electronic surveillance using algorithms for HAI detection from electronic medical records had not yet reached a mature stage but presented good opportunities and potential. Most reviews concluded that ESS should be developed and used in hospitals, recognizing that these methods can reduce the burden associated with traditional manual surveillance (27)(28)(29). In fact, sensitivity was generally high and specificity variable for most ESS compared with traditional active surveillance (Tables 4 and 5). Freeman et al. found that many computer algorithms for electronic surveillance outperformed the manual chart review method (27). A majority of studies in this review emphasized the linkage of electronic databases with "in-house" surveillance systems rather than commercial software (27

Discussion
Canada has a great wealth of administrative health data collected at the provincial/territorial level from diverse parts of the healthcare system. However, these data are not used to their full potential and their increased use could enhance HAI surveillance efforts and decrease the workload associated with traditional active surveillance. This scoping review explored the use and validity of administrative data used alone or combined with other data sources for HAI surveillance in Canada. Overall, studies showed that using one source of administrative data alone for surveillance of HAI is not sufficiently accurate in comparison with traditional active surveillance. However, combining different sources of data improved accuracy. Moreover, combining administrative data with active surveillance was shown to be an effective strategy to enhance active surveillance and decrease work burden for IPC teams.

Advantages and disadvantages of administrative data
Administrative data are not collected for surveillance purposes. However, they have many characteristics that make them attractive for the enhancement of HAI surveillance. They are inexpensive, available from nearly all healthcare facilities, collected in a consistent manner, subjected to quality checks and do not add an administrative burden to clinicians or patients (31 [...] CIHI has a comprehensive data quality program and any known quality issues are addressed by the data provider or documented in data limitations documentation available to all users (36). However, there are still many barriers to be overcome before accurate administrative data for HAI surveillance could be produced. Studies show that the lack of accuracy is an important limitation in using administrative data as a quality indicator for hospital comparison. Contributing factors include the variability of medical practice, documentation and discharge coding amongst facilities, the interpretation of medical coders, and the fact that data collection relies on the primary care provider, whose capacity to detect and report a HAI determines what is recorded (possible misclassification errors, human errors) (15,19,37,38). Essentially, information is limited by what is reported in the medical chart and depends mainly on adequate clinician documentation.
For example, reporting to the DAD database requires the physician to adequately complete the discharge summary, including HAIs if known. HAIs are usually not detected in real time and may be assessed differently by a clinician and the infection prevention and control team, the latter following standardized definitions. The health records department's professional coding specialist then translates charts and discharge summaries into standard codes. A study conducted in 2015-2016 in Alberta interviewed coders on physician-related barriers to producing high-quality administrative data (39). These barriers included incomplete and nonspecific documentation by physicians, physicians and coders using different terminology (e.g. physician diagnosis not in the ICD-10 list), lack of communication between coders and physicians (mainly in urban settings) and the fact that coders are limited in their ability to add, modify or interpret physician documentation. Finally, coders are not allowed to use supporting documentation that could increase the specificity of diagnostic codes (e.g. laboratory reports) (39). In fact, an important limitation for CIHI is that, in general, physician documentation takes priority over all other documentation, even if laboratory reports or other documentation indicate a different diagnosis. Yet there are multiple studies demonstrating that laboratory data could be used to enhance administrative data (13,21,29,37). Hence, allowing coders to use laboratory data could be a feasible solution to improve coding accuracy.

Integration of administrative data in infection prevention and control surveillance

Integration of administrative data in electronic surveillance systems
Another potential approach to make surveillance less labor-intensive is to use electronic surveillance systems. In the current review, seven observational studies used data linkage of electronic databases and three systematic reviews assessed electronic surveillance systems. Leal et al. developed a complete ESS to identify and classify BSI with a high degree of agreement with manual chart review (21). Results from the systematic review by Freeman et al. suggested that ESS implementation is feasible in many settings and should be developed by hospitals (27). An ESS can also be developed to detect more than one HAI. Moreover, the systematic review by Streefkerk et al. on ESS presented the 10 best studies selected based on overall quality and performance score, and the majority used a two-step procedure using administrative, electronic medical record or microbiology data followed by a confirmatory assessment by the IPC professional (28). In this case, ESS could be designed to favor sensitivity over specificity, knowing that manual review will exclude false positives (31 [...]
Some provinces are good models for surveillance using electronic data. For example, most studies included in this scoping review were from provinces that have electronic systems (e.g. Alberta, Ontario). Alberta is a good example for HAI surveillance as all acute-care sites conduct traditional surveillance using a single surveillance protocol and a centralized online data entry system (41). This system allows administrative information to be shared between all its facilities. Québec also has a centralized electronic system created for the Surveillance Provinciale des Infections Nosocomiales program using uniform definitions to detect HAI (42); however, no study from Québec met our inclusion criteria. One study by Gilca et al. is worth considering: this study included 83 acute-care hospitals participating in CDI surveillance in the province of Québec (43). The authors compared administrative and surveillance data and found excellent agreement between rates obtained from MED-ÉCHO (hospital discharge database) and CDI incidence according to provincial surveillance. However, the origin of acquisition for CDI cases was not indicated in the administrative database. Thus, it was not possible to separate nosocomial from community-acquired cases using administrative data alone.
A study conducted in three states in the United States and in the province of Ontario, Canada, assessed the information technology challenges and strategies of developing and implementing a multihospital electronic system to prevent MRSA (44). The study included 11 hospitals, all with an understaffed information technology group, running seven different systems, each with a unique information technology structure and data system. The authors found innovative strategies to enable automated collection, sharing, analysis and reporting of data in a compatible format for all hospitals. The study was published in 2013, and the authors are currently applying the same strategies to develop ESS for other HAIs. This study is a good example of the feasibility of implementing ESS across different hospital systems.

Strengths and limitations
We used standardized and robust methods to identify, review and assess quality of the published literature with all steps performed by two independent reviewers. Two different search strategies were used to ensure that all Canadian studies were included as well as systematic reviews that included at least one study in Canada. Our review included a small number of studies; however, we are confident that our search strategies combined with hand-search captured all relevant available articles. This is the first review to report on divergences between administrative data and surveillance data for HAI surveillance in Canada.
This review has several limitations. We included only studies that were published in French or English; however, as French and English are the two official languages in Canada, we do not expect to have missed important studies. The observational studies identified represent only three Canadian provinces, with two-thirds of the studies from Alberta. Alberta has a provincewide integrated healthcare system that is easily queried, which is not the case with the systems in the remaining provinces. Although our review included articles published in English or French, our search was conducted using only English terms. We searched only three databases and may have missed relevant articles indexed in other databases. This study was conducted on Canadian data only and may not be generalizable to other countries.

Conclusion
This scoping review identified numerous divergences between administrative data and active surveillance data for HAI surveillance in Canadian hospitals. However, it also identified possible solutions, depending on the HAI under surveillance, and demonstrated that administrative data can be used to enhance HAI surveillance. Electronic surveillance systems have the potential to save time and human resources, and combining multiple administrative datasets may also improve data accuracy. IPC teams that used administrative data or electronic surveillance systems were able to reduce their workload in active surveillance. Although active surveillance of HAIs produced the most accurate results and remains the gold standard, further studies on HAI surveillance in Canada should focus on the feasibility of data sharing between provinces through electronic systems, the feasibility for medical coders to have access to documentation other than physician documentation, and the feasibility of using administrative data to help reduce the burden of active surveillance.

Supplemental material
These documents can be accessed on the Supplemental data file.
Supplemental Data S1, Supplemental Data S2, Supplemental Table S3, Supplemental Table S4