Clusters of healthcare-associated SARS-CoV-2 infections in Norwegian hospitals detected by a fully automatic register-based surveillance system

Background: Notification of outbreaks in Norwegian healthcare institutions to the Norwegian Institute of Public Health is mandatory by law, but under-reporting is suspected, either because clusters are not identified or because of human or system-based factors. This study aimed to establish and describe a fully automatic, register-based surveillance system to identify clusters of healthcare-associated infections (HAIs) of SARS-CoV-2 in hospitals and compare these with outbreaks notified through the mandated outbreak system Vesuv.
Methods: We used linked data from the emergency preparedness register Beredt C19, based on the Norwegian Patient Registry and the Norwegian Surveillance System for Communicable Diseases. We tested two different algorithms for HAI clusters, described their size and compared them with outbreaks notified through Vesuv.
Results: A total of 5033 patients were registered with an indeterminate, probable, or definite HAI. Depending on the algorithm, our system detected 44 or 36 of the 56 officially notified outbreaks. Both algorithms detected more clusters than were officially reported (301 and 206, respectively).
Conclusions: It was possible to use existing data sources to establish a fully automatic surveillance system identifying clusters of SARS-CoV-2. Automatic surveillance can improve preparedness through earlier identification of clusters of HAIs, and by lowering the workloads of infection control specialists in hospitals.


Introduction
Healthcare-associated infections (HAIs) are infections acquired by patients during contact with the healthcare service. HAIs encompass a wide range of infections including respiratory tract infections and surgical site infections. Outbreaks of HAIs also occur in healthcare settings. HAIs with SARS-CoV-2 pose a threat to both patients and staff within the healthcare service, which is why Norwegian hospitals introduced measures to slow the spread of SARS-CoV-2 within their wards and preserve the functions of hospitals [1]. Research suggests that 35–55% of HAIs could be prevented through interventions [2]. Measures included risk assessment and screening of patients as well as adherence to standard and transmission-based precautions [3]. SARS-CoV-2 can be brought into healthcare institutions through patients with known or unknown infections, staff or visitors [4].
All outbreaks in healthcare institutions are notifiable to the Norwegian Institute of Public Health (NIPH) through the official outbreak notification system Vesuv (hereafter: the notification system) [5]. Manual surveillance and notification systems such as this are resource intensive, both in terms of identifying and reporting HAIs and outbreaks [6]. Adherence to this mandatory notification is unknown, but there is probably a degree of under-reporting.
During the pandemic, the NIPH established an emergency preparedness register, Beredt C19, to collect and produce timely information on the pandemic [7]. The institute has previously shown that by implementing different register-based algorithms, it is possible to detect HAIs of SARS-CoV-2 in hospitals and outbreaks of SARS-CoV-2 in long-term care facilities [8,9].
As the next step in the development of future surveillance systems for both cases and outbreaks for all types of HAIs, we developed and described a register-based system to identify clusters of healthcare-associated COVID-19 in Norwegian hospitals. We then compared our findings from the automated surveillance system with the outbreaks registered in the official notification system.

Data sources
Beredt C19 contains several different national registers, from which epidemiologists at the NIPH can access and extract pseudonymized data. Each team can only access data relevant to their analyses. In this study, we used two central health registers: the Norwegian Patient Registry (NPR) and the Norwegian Surveillance System for Communicable Diseases (MSIS). NPR consists of data from different hospitals in different regions using different electronic patient journal systems; however, the data in NPR are homogeneously structured [10]. We extracted all hospital stays in Norway from NPR, with information on age, sex, admission date, discharge date, national identification number, type of stay (overnight stay, day treatment and outpatient treatment), code for admission (chapter of ICD-10 code), health trust, hospital and ward. We extracted all positive PCR tests in Norway from MSIS with the national identification number, age and test date. Data are sent to Beredt C19 daily, allowing near real-time surveillance. From the notification system, we collected all SARS-CoV-2 outbreaks in hospitals (defined as more infectious cases than expected within an area in a given period of time, or two or more cases with a presumed common source), along with dates of onset for the first and last cases, total number of patients infected, health trust, hospital and ward.

Study population
We included all patients registered in NPR with overnight stays in Norwegian hospitals during the period from week 10 of 2020 to week 37 of 2022. Everyone registered in MSIS with a positive SARS-CoV-2 test from the day after admission to seven days after discharge was considered as potentially infected in the hospital. For those registered with more than one stay but with one or fewer calendar days between discharge and the next admission, we combined the stays. For those with stays in more than one ward in the week before the positive test, we counted the ward they were in four days before the positive test as the probable place of infection [11].
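The two preprocessing rules above (combining near-contiguous stays and attributing the infection to the ward occupied four days before the positive test) can be sketched as follows. This is a minimal Python illustration, not the authors' R implementation: the function names, the tuple layouts and the reading of "one or fewer calendar days" as a date difference of at most one day are all assumptions made for the example.

```python
from datetime import date, timedelta

def merge_stays(stays, max_gap_days=1):
    """Combine one patient's stays when the gap between a discharge and
    the next admission is at most `max_gap_days` calendar days.

    `stays` is a list of (admission_date, discharge_date) tuples
    (hypothetical layout; the NPR extract has more fields).
    """
    merged = []
    for admission, discharge in sorted(stays):
        if merged and (admission - merged[-1][1]).days <= max_gap_days:
            prev_admission, prev_discharge = merged[-1]
            # Extend the previous stay instead of starting a new one.
            merged[-1] = (prev_admission, max(prev_discharge, discharge))
        else:
            merged.append((admission, discharge))
    return merged

def ward_of_probable_infection(ward_stays, test_date, lookback_days=4):
    """Return the ward occupied `lookback_days` before the positive test.

    `ward_stays` is a list of (ward, start_date, end_date) tuples.
    On a transfer day two wards may match; returning the first match
    is an assumption of this sketch. Returns None if no ward matches.
    """
    target = test_date - timedelta(days=lookback_days)
    for ward, start, end in ward_stays:
        if start <= target <= end:
            return ward
    return None
```

For example, stays of 1–5 January and 6–10 January would be merged into a single stay of 1–10 January, while a stay starting on 20 January would remain separate.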
We compared results from the register-based surveillance system with results from the manual notification system, which contains outbreak notifications that include both patients and employees at general hospitals, psychiatric institutions, and drug-dependency units. During the study period, 329 outbreaks were notified by the hospitals. For the comparison, we excluded all outbreaks that either did not distinguish between patients and employees or did not report any patients (N = 194) and those where the number of patients was less than 2 (N = 58). Furthermore, we excluded outbreaks notified from drug-dependency or psychiatric institutions, or a rehabilitation hospital (N = 21). We also did not include outbreaks and clusters with a start date after week 35 of 2022, because outbreaks and clusters that started in weeks 36 and 37 could be impossible to detect due to missing follow-up time.
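As a quick consistency check on the exclusion counts above, the following Python sketch subtracts the three reported exclusion groups from the 329 notified outbreaks. It assumes the groups are mutually exclusive and applied sequentially, which the text implies but does not state; the dictionary keys are shorthand labels chosen for this example.

```python
# Reported counts from the text; labels are shorthand for this sketch.
notified = 329
exclusions = {
    "no patient/employee distinction or no patients reported": 194,
    "fewer than two patients": 58,
    "drug-dependency, psychiatric or rehabilitation institution": 21,
}

# Assumes the exclusion groups do not overlap.
remaining = notified - sum(exclusions.values())
print(remaining)
```

Under these assumptions, the three counted exclusions leave 56 outbreaks for comparison.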

Ethical considerations
Beredt C19 was established under Section 2-4 of the Health Preparedness Act [12]. The NIPH has conducted a data protection impact assessment of the register. The work included in this study was conducted as part of the mandated work of the NIPH to collect data on levels of infection in risk groups and is intended as a way of guiding infection prevention measures. Further details can be found on NIPH's website.

Definition of HAIs of SARS-CoV-2 in this system
The algorithms to define indeterminate, probable and definite HAIs of SARS-CoV-2 and an algorithm to detect clusters of infections in long-term care facilities have previously been described in detail in other publications [8,9].
The admission date is defined as day 0 (Figure 1). (1) Infections were classified as community-acquired when the patient tested positive on day zero or one. (2) If the positive test was eight days or more after admission, and no later than one day after discharge, it was classified as a definite HAI. (3) Probable HAI was defined as those who tested positive: (i) five to seven days after admission, if they were still in the hospital; (ii) six to seven days after admission and one day after discharge; or (iii) seven or more days after admission and two days after discharge. (4) Indeterminate HAI was defined as those who tested positive: (i) two to four days after admission, if they were still in the hospital; (ii) three to five days after admission and one day after discharge; (iii) four to six days after admission and two days after discharge; or (iv) three to seven days after discharge.
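The classification rules above can be expressed as a small decision function. The following Python sketch is an illustration only, not the authors' published R code: the function name is hypothetical, a test on or before the discharge date is assumed to count as "still in the hospital", and combinations that the rules do not cover return `None` rather than a guessed category.

```python
from datetime import date
from typing import Optional

def classify_hai(admission: date, discharge: date, test: date) -> Optional[str]:
    """Classify a positive test relative to one (merged) hospital stay."""
    t = (test - admission).days   # days after admission (admission = day 0)
    d = (test - discharge).days   # days after discharge (<= 0: still admitted)

    if t < 0 or d > 7:
        return None               # outside the window considered in the study
    if t <= 1:
        return "community-acquired"

    in_hospital = d <= 0          # assumption: discharge day counts as in hospital

    if t >= 8 and d <= 1:
        return "definite"
    if ((in_hospital and 5 <= t <= 7)
            or (d == 1 and 6 <= t <= 7)
            or (d == 2 and t >= 7)):
        return "probable"
    if ((in_hospital and 2 <= t <= 4)
            or (d == 1 and 3 <= t <= 5)
            or (d == 2 and 4 <= t <= 6)
            or 3 <= d <= 7):
        return "indeterminate"
    return None                   # combination not covered by the stated rules
```

For a patient admitted on 1 January and discharged on 20 January, a positive test on 10 January (day 9) would be classified as definite, on 6 January (day 5) as probable, and on 3 January (day 2) as indeterminate.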

Definition of SARS-CoV-2 clusters
We used probable and definite HAIs to define clusters. Cases in the same ward that occurred within 14 days of the previous case were counted as belonging to the same cluster; because of their proximity in place and time, such a cluster can be viewed as a potential outbreak. Fourteen days was chosen to detect all outbreaks, but it is possible to calibrate the system using other time intervals. Thereafter, we added indeterminate HAIs from the same ward starting one week before the first case through to one week after the last case.
Each cluster therefore had a minimum size, consisting of the probable and definite HAIs in the ward, and a maximum size, which also included the indeterminate HAIs in the same ward. We explored two different algorithms for the minimum size required for a group of cases to be considered a cluster, illustrated in Table I.
In both algorithms there must be at least one probable or definite HAI. At large hospitals, we explored the data at the ward level, while at smaller hospitals we explored the data at the hospital level.
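The clustering step described in this section can be sketched as follows. This Python example is a simplified illustration under stated assumptions, not the authors' R scripts: the `Cluster` class, the `find_clusters` function and the input tuple layout are hypothetical, and the minimum-size thresholds of the two algorithms in Table I are deliberately not reproduced, so filtering on `min_size` and `max_size` is left to the caller.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date, timedelta

@dataclass
class Cluster:
    ward: str
    core_dates: list                      # probable and definite HAIs
    indeterminate_dates: list = field(default_factory=list)

    @property
    def min_size(self) -> int:
        return len(self.core_dates)

    @property
    def max_size(self) -> int:
        return len(self.core_dates) + len(self.indeterminate_dates)

def find_clusters(cases, max_gap_days=14, margin_days=7):
    """`cases` is an iterable of (ward, test_date, category) tuples."""
    core, indeterminate = defaultdict(list), defaultdict(list)
    for ward, test_date, category in cases:
        if category in ("probable", "definite"):
            core[ward].append(test_date)
        elif category == "indeterminate":
            indeterminate[ward].append(test_date)

    clusters = []
    for ward, dates in core.items():
        dates.sort()
        current = Cluster(ward, [dates[0]])
        for test_date in dates[1:]:
            # Same cluster if within `max_gap_days` of the previous case.
            if (test_date - current.core_dates[-1]).days <= max_gap_days:
                current.core_dates.append(test_date)
            else:
                clusters.append(current)
                current = Cluster(ward, [test_date])
        clusters.append(current)

    # Add indeterminate HAIs from one week before the first core case
    # through one week after the last core case in the same ward.
    margin = timedelta(days=margin_days)
    for cluster in clusters:
        start = cluster.core_dates[0] - margin
        end = cluster.core_dates[-1] + margin
        cluster.indeterminate_dates = sorted(
            x for x in indeterminate[cluster.ward] if start <= x <= end
        )
    return clusters
```

Both `max_gap_days` and `margin_days` are exposed as parameters because the text notes that the system can be calibrated with other time intervals. Wards with only indeterminate HAIs produce no cluster, consistent with the requirement of at least one probable or definite HAI.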

Statistical analyses
We described age, sex, number of stays, and admission ICD-10 code for the hospital population in a table with appropriate statistical measures of central tendency and dispersion. We compared the date and size of the outbreaks registered in the notification system and the clusters detected with the algorithm in the register-based surveillance system. We also qualitatively described the outbreaks registered in the notification system that we did not find in the register-based system.
All analyses were performed in R version 4.0.2 [13]. The R scripts with the algorithms used are available from the Norwegian Institute of Public Health [14].

Results
We found 5033 cases of indeterminate, probable, or definite HAIs in our data; 2311 (45.9%) were female, and the median age was 70 years (interquartile range: 46–80 years). There were 1797 probable and definite HAI cases; 710 (39.5%) were female, and the median age was 73 years (interquartile range: 59–81 years).
There were 329 outbreaks notified from Norwegian hospitals to the notification system. After exclusion, there were 56 outbreaks with at least two patients infected at general hospitals notified to the notification system (Figure 2). With the clustering algorithm, we found 684 clusters during the study period with at least one probable or definite HAI. With algorithms 1 and 2, that number was reduced to 301 and 206 clusters, respectively (Table II).
Twelve outbreaks notified through the notification system were not detected by either of our algorithms. In four of those, there were only two infected patients. Four outbreaks included three infected patients, while in the last four, there were four or more patients. In eight of those 12 outbreaks, we found some HAIs, but not enough for them to be classified as a cluster using our algorithms. In two more reported outbreaks, we found clusters of indeterminate HAIs, while in the last two, we did not find any HAIs using either algorithm. Four of the clusters we identified corresponded to two outbreaks each in the notification system. Two of the clusters that had a corresponding outbreak in the notification system lasted four months each. Before 2022, the clustering algorithm and the notification system identified approximately the same number of outbreaks. In 2022, however, we saw far more clusters using the algorithm than were notified to the notification system (Figure 2). Between 11th February 2022 and 23rd May 2022, no outbreaks with at least two infected patients from general hospitals were registered in the notification system.

Discussion
We identified 44 out of the 56 officially notified outbreaks of SARS-CoV-2 in Norwegian hospitals during the period week 10 of 2020 to week 35 of 2022 using a fully automated register-based surveillance system. In more than half of the identified clusters, the number of infected patients matched. We also identified more clusters in 2022 than were notified to the notification system. There were 22 outbreaks notified to the notification system during the Omicron epidemic, whereas our system identified 258 and 180 clusters with the two algorithms, respectively. This difference could be explained by under-reporting due to a lack of resources to notify a large number of outbreaks in the healthcare service, but could also be due to our surveillance system being too sensitive during the beginning of the Omicron epidemic, when the community incidence was highest.
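The share of notified outbreaks detected by each algorithm follows directly from the reported counts. The short Python sketch below computes it; the helper name is hypothetical, and because the notification system is itself an incomplete reference, the result is better read as agreement with notified outbreaks than as a true sensitivity.

```python
def detected_share(detected: int, notified: int) -> float:
    """Percentage of notified outbreaks matched by a detected cluster."""
    return round(100 * detected / notified, 1)

NOTIFIED = 56  # notified outbreaks remaining after exclusions

for detected in (44, 36):  # counts reported for the two algorithms
    print(f"{detected}/{NOTIFIED} = {detected_share(detected, NOTIFIED)}%")
```

With the reported counts, this gives roughly 79% and 64% of notified outbreaks detected by the two algorithms.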
We have presented a fully automated surveillance system that can identify ongoing and past outbreaks in hospitals in near real-time. A systematic review of semi- and fully automated surveillance systems of HAIs found that the development of such systems has yet to reach a mature stage, and concluded that researchers should focus on validating existing systems [15]. Unfortunately, our system cannot be fully validated using the notification system, as the notification system appeared to be missing a substantial number of outbreaks in 2022. Verberk et al. also pointed out the need for standardization, and van Mourik et al. found that one of the biggest problems in national surveillance is the heterogeneity of the data structure for different institutions and regions [16,17]. This is not a problem in Norway due to the publicly owned central health registers, which make the data structure homogeneous. We have made the source code and algorithm publicly available in order to allow the algorithms to be tested in other settings [14]. It is also possible to apply for access to these data, which can enable external validation. In the Ministry of Health and Care Services' action plan for better infection control in healthcare, one of the measures described is to develop an electronic surveillance system for different types of HAIs [18]. The method used in this study can be used for infections other than SARS-CoV-2, as has also been shown by others [19–22]. The advantage of a fully automated surveillance system is that it can automate previously manual tasks for infection control specialists and give near real-time data both locally and nationally. This may enable infection control personnel to promptly implement infection prevention and control measures. However, the implementation of any such system requires a balance between sensitivity and specificity.
We have presented results for two different definitions of outbreaks, where both include indeterminate HAIs only after a probable or definite HAI has been identified. It is possible to increase sensitivity further by including all indeterminate HAIs, but this would reduce specificity, possibly leading to unnecessary attention from infection prevention and control specialists. Our system uses data collected routinely for other purposes and does not require extra work by the hospitals. The system makes it possible to detect outbreaks early so that relevant measures can be taken promptly to limit the size and severity of the outbreak. One of the disadvantages of this system is that it is hard to verify whether the results are actual outbreaks or sporadic cases among patients and newly discharged patients discovered through screening. Although our system has a high sensitivity, the specificity can be calibrated by adjusting the definitions, both of HAI and of clusters. Infections in staff were not included in this study as we were not able to link staff to specific wards using the data available to us. The inclusion of staff could help us better understand outbreaks spreading between different wards, as staff could represent important links in the chains of transmission.
Using two existing, publicly owned registers and an automated algorithm, this study shows that it is possible to detect clusters of HAIs in Norwegian hospitals. If the underlying data are structured as was done here, it should be possible to apply our publicly available algorithm to routinely collected health data in other settings. The move away from manual notification of outbreaks in healthcare settings may increase patient safety and preparedness, provided the sensitivity of the surveillance system is high enough, as HAIs and outbreaks in hospitals can be identified earlier.
Finally, the automated algorithm could relieve workloads on infection control specialists in the hospital setting, thus freeing up resources for other important tasks within infection prevention and control.