Background rates of 41 adverse events of special interest for COVID-19 vaccines in 10 European healthcare databases - an ACCESS cohort study

Background In May 2020, the ACCESS (The vACCine covid-19 monitoring readinESS) project was launched to prepare real-world monitoring of COVID-19 vaccines. Within this project, this study aimed to generate background incidence rates of 41 adverse events of special interest (AESI) to contextualize potential safety signals detected following administration of COVID-19 vaccines.
Methods A dynamic cohort study was conducted using a distributed data network of 10 healthcare databases from 7 European countries (Italy, Spain, Denmark, The Netherlands, Germany, France and United Kingdom) over the period 2017 to 2020. A common protocol (EUPAS37273), common data model, and common analytics programs were applied for syntactic, semantic and analytical harmonization. Incidence rates (IR) for each AESI and each database were calculated by age and sex by dividing the number of incident cases by the total person-time at risk. Age-standardized rates were pooled using random effects models according to the provenance of the events.
Findings A total of 63,456,074 individuals were included in the study, contributing 211.7 million person-years. A clear age pattern was observed for most AESIs; rates also varied by provenance of disease diagnosis (primary care, specialist care). Thrombosis with thrombocytopenia rates were extremely low, ranging from 0.06/100,000 person-years for cerebral venous sinus thrombosis (CVST) with thrombocytopenia (TP) to 4.53/100,000 person-years for mixed venous and arterial thrombosis with TP.
Interpretation Given the nature of the AESIs and the setting (general practitioners or hospital-based databases or both), background rates from databases that show the highest level of completeness (primary care and specialist care) should be preferred; others can be used for sensitivity analyses. The study was designed to ensure representativeness to the European population and generalizability of the background incidence rates.
Funding The project has received support from the European Medicines Agency under the Framework service contract nr EMA/2018/28/PE.

Background
On 11 January 2020, the release of the genetic sequence of SARS-CoV-2 triggered the rapid development of COVID-19 vaccines on a global level [1]. More than two hundred vaccine candidates were in the development pipeline. One year later, 26 vaccines were in use across the world [2], and as of January 10th, 2022, 9.46 billion COVID-19 vaccine doses had been administered worldwide, and about half of the world population had been vaccinated [3]. Due to the rapid development of new COVID-19 vaccines, questions arose about the benefits and risks of the vaccines at individual and population levels. Several safety signals emerged soon after the COVID-19 vaccines were launched. Researchers reported case series of unusual thrombotic events after immunization with ChAdOx1nCov-19 (Oxford/AstraZeneca) [4-6] and Ad26.COV2.S (Janssen/Johnson & Johnson) [7] vaccines, which led to several regulatory measures, mainly in Europe and in the United States [8,9]. These thrombotic events were shown to occur, in most instances, together with thrombocytopenia. This new phenomenon, named thrombosis with thrombocytopenia syndrome (TTS), was further characterized when the Brighton Collaboration Working Group began developing a case definition [10]. Furthermore, the spectrum of adverse events has expanded to conditions such as myocarditis and pericarditis [11], with case series initially reported after vaccination with Comirnaty (Pfizer) in Israel [12]. Very rare events of capillary leak syndrome were also reported after vaccination with adenoviral vector vaccines [13] and, more recently, Guillain Barré Syndrome (GBS) has been detected as a potential safety concern following administration of the Ad26.COV2.S vaccine [14].
The experience with COVID-19 vaccines highlights once more the importance and the need for robust surveillance systems and collaboration to carefully monitor adverse effects even after regulatory approvals for timely adoption of public health measures. The same conclusion, made after the 2009 H1N1 pandemic, had led to the Innovative Medicines Initiative funded project that designed and tested a system in Europe, which was implemented by the Vaccine Monitoring Collaboration for Europe (VAC4EU) in January 2020 [15]. In May 2020, ACCESS (The vACCine covid-19 monitoring readinESS), a project funded by the European Medicines Agency (EMA) leveraging expertise in the European Pharmacoepidemiology & Pharmacovigilance research network and the VAC4EU, was launched to prepare real-world monitoring of COVID-19 vaccines [16]. This ACCESS study aimed to generate background incidence rates of adverse events of special interest (AESI) that would allow contextualization of potential safety signals detected following administration of COVID-19 vaccines.

Study design and setting
A multi-database dynamic cohort study was conducted in 10 healthcare databases from 7 European countries: Italy, Spain, Denmark, Netherlands, Germany, France and United Kingdom (UK). The study protocol (EUPAS37273) is publicly available on the European Network of Centers for Pharmacoepidemiology and Pharmacovigilance (ENCePP) register [17]. The study was conducted over the period 2017 to 2020, except for two databases in which the study ran over the years 2010-2013 for Danish registries (DCE-AU) and 2014-2017 for German Pharmacoepidemiological Research Database (GePaRD). The 10 population-based healthcare databases included data from ARS, PEDIANET (Italy), FISABIO, BIFAP and SIDIAP (Spain), PHARMO (Netherlands), CPRD (UK), GePaRD (Germany), SNDS (France) and Danish Registries. The databases differed in terms of population size, provenance of the diagnosis (e.g., emergency room, in and/or outpatient, specialist or general practitioners (GP)) and diagnostic coding systems (International Classification of Diseases (ICD), Ninth Revision, Clinical Modification (ICD-9-CM), and ICD, Tenth Revision, Clinical Modification (ICD-10-CM), ICD-10 German Modification (ICD10-GM), CIM10 (Classification Internationale des Maladies), Read, SNOMED CT US Edition and Spanish Edition (SCTSPA)). Table 1 provides a summary of the main characteristics of the data sources. For three of them (BIFAP, SIDIAP and PHARMO), subpopulations were defined which included individuals with both primary care and hospital medical records (BIFAP_PC_HOSP, SIDIAP_PC_HOSP and PHARMO versus BIFAP_PC, SIDIAP_PC and PHARMO_PC_HOSP). The creation of subpopulations was necessary when diagnosis records from hospital discharge data and primary care had different source populations and/or lag times.

Study population
The source population comprised all individuals observed in one of the participating databases for at least one day during the study period and who had at least one year of data availability before study entry, except for individuals with data available since birth. Individuals were included in the study according to predefined inclusion and exclusion criteria. Reasons for exclusion were: invalid or missing birth date or missing sex record, exit before study entry (01 January 2017; 01 January 2010 for DCE-AU; 01 January 2014 for GePaRD), and less than one year of lookback period prior to study entry. Individuals at increased risk of severe COVID-19 disease were identified according to the presence of at least one of the following underlying conditions in the lookback period or during the study follow-up: cardiovascular disease, cancer, chronic lung disease, HIV, chronic kidney disease, type 2 diabetes, severe obesity (BMI 30), sickle cell disease or use of immunosuppressants.

Adverse events of special interest (AESI)
As part of the harmonization of COVID-19 vaccine safety monitoring during clinical development phase, the Coalition for Epidemic Preparedness Innovations (CEPI) has created a preliminary list of AESIs for COVID-19 vaccine safety monitoring together with the Brighton Collaboration [18]. This list of AESIs has been defined based on events that are related or potentially related to marketed vaccines, events related to vaccine platforms or adjuvants, and events that may be associated with COVID-19. This preliminary list has been further extended and was reviewed and accepted by the European Medicines Agency advisory group monitoring committee. The final list included a total of 41 AESIs, see Box.

Data management workflow and data analysis
This study was conducted in a distributed manner using a common protocol, the ConcePTION common data model (CDM) [19] for syntactic harmonization, and a common analytics program for semantic harmonization and data transformation/analysis [20]. Each data access provider (DAP) applied the Extract-Transform-Load process, which led to a syntactic harmonization. The syntactic foundation transforms the structure of the data sets held by each DAP to a common format. To create the study variables, semantic harmonization was needed to reconcile differences across different terminologies. A shared semantic foundation was built for each AESI by using a standardized event definition form. For each AESI and underlying condition, medical code lists were created using the ADVANCE code mapper tool [21] and integrated coding systems: ICD-9-CM, ICD-10-CM/GM, CIM10, READv2, SNOMED CT US Edition and SCTSPA. DAPs were asked to review and update the proposed medical codes based on local coding habits and prior experience. Narrow and broad algorithms were established for most AESIs allowing, respectively, for a specific and a sensitive clinical case definition. Event definition forms including medical code lists were made publicly available through the VAC4EU Zenodo community (https://www.zenodo.org/communities/vac4eu/?page=1&size=20). Scripts that performed the semantic harmonization and transformed the data in the CDM into incidence rates were written in R (version 3.1.0) and distributed to the DAPs for local deployment. Aggregated data were uploaded by each DAP on the Digital Research Environment (https://www.andrea-cloud.eu/azure-dre), a secured Microsoft Azure cloud-based research environment, for final analysis and pooling. Demographic characteristics including age and person-time of follow-up were computed in each data source.
Incidence rates (IR) and 95% exact confidence intervals (95% CI) for each AESI and for each database were calculated for the study period, by year, age and sex, by dividing the number of incident cases by the total person-time at risk. Age-standardized IRs (according to the European population [22]) for the period 2017-2019 (or 2010-2013 for DCE-AU or 2014-2017 for GePaRD) were pooled using the DerSimonian and Laird meta-analytic approach for random effects models according to the provenance of the events (Table 1). Incidence rates were expressed per 100,000 person-years (PY). Percentage changes between the years 2017-2019 and 2020 were also computed to assess the change in health care utilization during the COVID-19 pandemic. Statistical analyses were performed in SAS v9.4 and STATA v17.
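As an illustration of the calculations described above, the following minimal Python sketch computes a crude incidence rate per 100,000 PY with an exact Poisson (Garwood) confidence interval and a directly age-standardized rate. It is not the study's R or SAS code: the function names, the bisection-based interval and the stratum counts and weights in the usage example are assumptions made for illustration only.

```python
import math

def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu); terms are computed in log space."""
    if k < 0:
        return 0.0
    if mu <= 0:
        return 1.0
    log_mu = math.log(mu)
    total = sum(math.exp(i * log_mu - mu - math.lgamma(i + 1)) for i in range(k + 1))
    return min(1.0, total)

def _root(f, lo, hi, iters=200):
    """Bisection root of f on [lo, hi]; f(lo) and f(hi) must differ in sign."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_poisson_ci(cases, alpha=0.05):
    """Exact (Garwood) confidence limits for a Poisson count."""
    half = alpha / 2
    # Lower limit solves P(X >= cases | mu) = alpha/2 (zero when no cases).
    lower = 0.0 if cases == 0 else _root(
        lambda mu: 1 - poisson_cdf(cases - 1, mu) - half, 0.0, float(cases))
    # Upper limit solves P(X <= cases | mu) = alpha/2; the bracket end is
    # chosen generously above any plausible limit.
    bracket_end = cases + 10 * math.sqrt(cases + 1) + 20
    upper = _root(lambda mu: poisson_cdf(cases, mu) - half, float(cases), bracket_end)
    return lower, upper

def incidence_rate(cases, person_years, per=100_000):
    """Crude rate and exact CI, expressed per `per` person-years."""
    lower, upper = exact_poisson_ci(cases)
    scale = per / person_years
    return cases * scale, lower * scale, upper * scale

def age_standardized_rate(strata, per=100_000):
    """Direct standardization; strata is a list of (cases, person_years, weight)."""
    total_weight = sum(w for _, _, w in strata)
    return sum(w * cases / py for cases, py, w in strata) / total_weight * per

# Hypothetical example: 10 incident cases over 2 million person-years.
rate, ci_low, ci_high = incidence_rate(10, 2_000_000)
# Hypothetical age bands with illustrative (not official) standard weights.
standardized = age_standardized_rate([(10, 100_000, 1), (30, 100_000, 3)])
```

The weights passed to `age_standardized_rate` would in practice come from the chosen standard population; the values above are placeholders.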

Incidence rates of AESIs
IRs per database, per year, age and sex are detailed in the final study report available on the Zenodo website [23] and on the VAC4EU dashboard [24]. Incidence rates that are presented in this paper used the narrow clinical definitions and are for time periods that exclude the year 2020. Table 3 presents age-standardized pooled incidence rates for all AESIs according to the provenance of events over the study period. Plots in Fig. 1 depict the incidence rates per age and according to the provenance of events. Age- and sex-stratified incidence rates for all AESIs according to the ACCESS recommendations are presented in Supplementary materials (Table 1).
For the autoimmune diseases, pooled IRs for ADEM, GBS and narcolepsy were the lowest rates. Provenance of diagnoses substantially affected the observed rates, with diagnoses of narcolepsy, GBS and diabetes most frequently reported in settings including in-outpatient and/or GP records. A clear age and sex pattern was shown for GBS and TP. IRs for GBS and TP were slightly elevated in males. Cardiovascular disorders were more frequently reported in the hospital setting. Microangiopathy and stress cardiomyopathy showed the lowest rates compared to other cardiovascular disorders. A peak of myocarditis and myocarditis/pericarditis was observed in the 20-29 age category and higher rates of SOCV were observed in the younger population (0-19). IRs for myocarditis and coronary artery disease were higher in males, while the IR for stress cardiomyopathy was higher in females. For circulatory disorders, IRs were higher for diagnoses from hospital records than for diagnoses in GP records. Rates ranged from 229.67/100,000 PY for venous thromboembolism (VTE) to 0.85/100,000 PY for CVST in databases including GP and hospital medical records. TTS rates were extremely low, ranging from 0.06/100,000 PY for CVST with TP to 4.53/100,000 PY for mixed venous and arterial thrombosis with TP. Circulatory disorders were shown to increase with age, except for CVST, for which an age pattern was not detected. IRs for disseminated intravascular coagulation, arterial thrombosis and microangiopathy were higher in males compared to females. Hepato-gastrointestinal disorders were more frequently reported in hospital or GP settings, with rates increasing with age and a slight decrease for acute liver injury in the elderly (80+). Similarly, nerve and central nervous system disorders were more frequently reported in the hospital setting. Rates for generalized convulsion peaked in the younger population (20-29) and in the elderly (80+). No clear age pattern was observed for meningoencephalitis, and rates for transverse myelitis dropped in the 80+ age group. The IR for transverse myelitis was slightly elevated in females. Anaphylaxis and anosmia-ageusia diagnoses were more frequently reported in settings including GP records. Rates for anaphylaxis and multisystem inflammatory disorders peaked in the younger ages (0-19). Rates for death and sudden death showed a clear age pattern across study settings but could not be detected in settings including exclusively inpatient medical records. ARDS
Table 2 Demographic characteristics.

Incidence rates in 2020 and in population with underlying conditions
For the year 2020, all AESIs, except anaphylaxis and ARDS, were less frequently reported in settings with emergency room visits. Anosmia-ageusia, sudden death, ARDS and thrombosis (CVST and VTE) were more frequently reported in settings with both GP and hospital medical records (Figure 3, Supplementary materials). IRs in the population with underlying conditions were significantly higher for all AESIs compared to the general population (data not shown).

Discussion
Based on data from 63 million European individuals, this cohort study generated age- and sex-specific background incidence rates with high precision for a pre-specified list of 41 AESIs, necessary for monitoring the safety of COVID-19 vaccines. We generated background incidence rates using a distributed data network with common protocol, common data model and common analytics using 10 diverse healthcare databases across 7 European countries. These rates have been reported periodically from January 2021 onwards and were used throughout 2021 by the European Medicines Agency and vaccine manufacturers for observed/expected analyses (personal communication). Gubernot et al. (2021) [25] recently conducted a literature review of incidence rates of 22 AESIs, as did the Brighton Collaboration (9 events); our overall rates are consistent with the literature-derived data, although we could not compare age and sex strata. Li et al. (2021) [26] recently published a study on AESI incidence rates from the OHDSI network, which covered general practice or claims data from eight countries (USA, UK, Australia, France, Germany, Spain, Netherlands and Japan) and reported on 15 AESIs. In general, our results differed substantially for several of the 12 common AESIs, which may be explained by the fact that incidence rates from the 8 countries were pooled even though the provenance of the events that went into the numerator differed substantially: 5 of the OHDSI data sources only captured GP-recorded diagnosis data, whereas US and Japanese data captured claims. The strength of our approach was to pool results only across similar provenance of the event and to present the rates by provenance, independently.
We considered the different provenances in the analysis as this is crucial for the correct interpretation of real-world evidence derived from heterogeneous data sources: several AESIs, such as cardiovascular and thrombotic events, are only diagnosed in secondary care and are underestimated in primary care medical records. We would recommend that rates are presented by provenance and that this diversity is preserved in the pooling for the observed/expected analyses.
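To make the idea of pooling within, rather than across, provenance categories concrete, the sketch below applies the DerSimonian and Laird random effects estimator separately to each provenance group. It is an illustration only: pooling on the log-rate scale with variance 1/cases is an assumption (the scale used in the study is not stated here), and the provenance labels and counts are hypothetical.

```python
import math

def dersimonian_laird(estimates, variances):
    """DerSimonian-Laird random effects pooling.

    Returns (pooled estimate, standard error, tau squared)."""
    weights = [1 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, estimates)) / sum_w
    # Cochran's Q and the method-of-moments between-source variance.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    re_weights = [1 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, estimates)) / sum(re_weights)
    return pooled, math.sqrt(1 / sum(re_weights)), tau2

def pool_by_provenance(records, per=100_000):
    """records maps a provenance label to a list of (cases, person_years).

    Assumption: rates are pooled on the log scale with variance 1/cases."""
    pooled_rates = {}
    for provenance, sources in records.items():
        log_rates = [math.log(cases / py) for cases, py in sources]
        variances = [1 / cases for cases, _ in sources]
        pooled, se, _ = dersimonian_laird(log_rates, variances)
        pooled_rates[provenance] = (
            math.exp(pooled) * per,
            math.exp(pooled - 1.96 * se) * per,
            math.exp(pooled + 1.96 * se) * per,
        )
    return pooled_rates

# Hypothetical data: two provenance groups, each pooled on its own.
example = {
    "GP only": [(100, 1_000_000), (200, 2_000_000)],
    "GP and hospital": [(300, 1_000_000), (450, 1_200_000), (500, 2_000_000)],
}
rates = pool_by_provenance(example)
```

Keeping one pooled estimate per provenance label, instead of a single estimate over all sources, is what preserves the diversity referred to above.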
To give examples, we briefly describe and compare the rates for selected AESIs with published data, focusing on AESIs that have been identified as safety risks following administration of COVID-19 vaccines. More detailed contextualization of the rates for each of the different AESIs against published references is available in our final study report [23]. Our VTE rates (pulmonary embolism and deep vein thrombosis) were of similar magnitude to the literature data retrieved by Gubernot [28], with a notable increase in incidence with age. For TTS, we operationalized the Brighton Collaboration case definition after a public webinar by VAC4EU & Brighton Collaboration (https://youtu.be/-Sp5GKfzB2I), establishing four subcategories of thromboembolic events, i.e., venous thrombosis (VTE), arterial thrombosis (AMI and stroke), CVST and the combination of all (mixed venous and arterial). Each of the thromboembolic conditions was stratified by the co-occurrence of a thrombocytopenia diagnosis within 10 days around the thromboembolic diagnosis. Our observations suggested that CVST is extremely rare, as are any of the combinations with thrombocytopenia, with rates estimated at < 1 to 5/100,000 PY. These observations are consistent with the recent study from Burn (2021) [29]. Our clinical definition for thrombocytopenia included both immune thrombocytopenia and secondary thrombocytopenia and showed higher rates compared to other published references such as Li et al. [26], which restricted the concept definition to immune diseases (448/100,000 PY versus 56/100,000 PY in males of older ages). Our incidence rates for the composite endpoint myocarditis/pericarditis were slightly lower compared to data from Li et al. (2021) [26], since we excluded chronic conditions and causes such as rheumatism. Our rates of myocarditis were much lower than our composite of myocarditis/pericarditis, showing that the composite endpoint was mainly driven by pericarditis medical conditions.
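The stratification of thromboembolic events by co-occurring thrombocytopenia, as described above for TTS, can be sketched as follows. This is a simplified illustration and not the study's algorithm: treating "within 10 days around" as a symmetric, inclusive window is an assumption, and the function names and dates are hypothetical.

```python
from datetime import date

def with_thrombocytopenia(thrombosis_date, tp_dates, window_days=10):
    """True if any thrombocytopenia diagnosis falls within +/- window_days
    of the thromboembolic diagnosis (symmetric, inclusive: an assumption)."""
    return any(abs((d - thrombosis_date).days) <= window_days for d in tp_dates)

def stratify_events(thrombosis_events, tp_dates, window_days=10):
    """Split (subcategory, date) events of one person into with/without TP.

    Subcategory labels such as 'VTE' or 'CVST' are passed through unchanged."""
    strata = {"with_TP": [], "without_TP": []}
    for subcategory, event_date in thrombosis_events:
        key = "with_TP" if with_thrombocytopenia(event_date, tp_dates, window_days) else "without_TP"
        strata[key].append((subcategory, event_date))
    return strata

# Hypothetical person: one CVST close to a TP diagnosis, one VTE far from it.
events = [("CVST", date(2019, 3, 10)), ("VTE", date(2019, 9, 1))]
tp = [date(2019, 3, 1)]
result = stratify_events(events, tp)
```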
Our rates for myocarditis differed by age and sex and were comparable with Gubernot et al. (2021) [25] which reported rates ranging from 1 to 10 cases/100,000 PY. The impact of the COVID-19 pandemic on healthcare seeking and recording was clearly highlighted in the year 2020 with a sharp increase in rates in medical events directly related to COVID-19 such as ARDS, sudden death and anosmia-ageusia.

How to use the background rates
In the context of readiness for real-world monitoring of COVID-19 vaccines, the background incidence rates, which had been released periodically and openly, have proven useful for observed-to-expected (O/E) analyses by EMA and vaccine manufacturers. In vaccine pharmacoepidemiology, signal detection methods are preliminary assessments allowing identification of potential safety concerns, but background rates are required to interpret them [30,31]. Health authorities usually request O/E analysis to refine detected safety signals before implementing any further assessments [32]. The O/E analysis relies on exposure data and published background incidence rates. Since mass vaccination campaigns usually roll out in a channeled manner, it is of crucial importance to have rates stratified by age, sex, and underlying comorbidity, which usually is poorly documented in the literature. In this study, age- and sex-stratified and comorbidity-specific rates were generated from 10 existing large electronic healthcare databases in 7 European countries, with semantically harmonized data. Because each data source has its own characteristics with regards to provenance of the events (GPs only, in or outpatient settings, emergency room visit or specialist referrals), we provided pooled estimates according to the provenance of the events. Background rates can be generated prior to vaccination roll-out when electronic health data are available, but the users should be aware of the nature of the event, the setting in which it is diagnosed, and evaluate whether the data source appropriately captures the data. Data sources that contain data from the setting where the disease is typically diagnosed should be preferred. Given the nature of the AESIs included in this study, we recommend using background rates from data sources that show the highest level of completeness of identification of these events in terms of the type of diagnoses (such as GP and hospital-based data sources), which includes all data sources with such a subpopulation. For some events, such as CVST, data sources including emergency and outpatient visits may be preferred, while for anosmia-ageusia or chilblain-like lesions, data sources including the GP setting would be recommended (see Table 1 in Supplementary materials).
Table 3 Pooled incidence rates and 95% confidence intervals for all AESIs (narrow definition) over the study period* according to the provenance of the events databases.
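A minimal sketch of how stratified background rates feed an O/E analysis is given below. It assumes, as a simplification, that the person-time at risk is the number of vaccinated persons multiplied by a fixed risk window; the function names, the risk window and every number in the usage example are hypothetical and are not results of this study.

```python
def expected_cases(rate_per_100k_py, vaccinated_persons, risk_window_days):
    """Expected number of events under the background rate.

    Person-time is approximated as persons x risk window (an assumption)."""
    person_years = vaccinated_persons * risk_window_days / 365.25
    return rate_per_100k_py / 100_000 * person_years

def observed_to_expected(observed, strata):
    """strata: list of (rate_per_100k_py, vaccinated_persons, risk_window_days),
    one entry per age/sex stratum. Returns (O/E ratio, expected count)."""
    expected = sum(expected_cases(*stratum) for stratum in strata)
    return observed / expected, expected

# Hypothetical example: two age strata, 21-day risk window, 12 observed events.
strata = [
    (0.9, 2_000_000, 21),  # younger stratum: background rate, persons, window
    (1.4, 3_000_000, 21),  # older stratum
]
ratio, expected = observed_to_expected(12, strata)
```

Summing expected counts over strata, rather than applying one overall rate, reflects the point made above that campaigns roll out in a channeled manner and that stratum-specific rates are therefore needed.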

Strengths and Limitations
ACCESS was a project funded by the EMA to prepare European infrastructure to monitor COVID-19 vaccines. The project ran from May 2020 to July 2021 and delivered background rates of AESIs, template protocols for implementation of observational studies, and feasibility assessments in each country to participate in studies and analyses performed by EMA for its Scientific Committees and ECDC. All deliverables have been made publicly available immediately to the scientific community through the EU PAS register (https://www.encepp.eu/encepp/studiesDatabase.jsp), the VAC4EU website (https://vac4eu.org) and Zenodo (https://www.zenodo.org). European data sources are quite heterogeneous because of different coding systems, health care practices, provenance of diagnosis and systems. To standardize the analytical process, we applied a two-step approach: first a syntactic harmonization, putting all data in the same structure, and secondly a semantic harmonization, which was conducted transparently and centrally through an R script. Semantic harmonization is complex and never fully complete. It comprises harmonization of different coding systems with different granularity levels and of coding practices in different settings. The harmonization process across terminologies was organized through the use of the Unified Medical Language System using the Codemapper [21], followed by extensive review of codes by the DAPs. This harnesses the expertise of the local data sources. It is acknowledged that while a rigorous harmonization process has been applied, residual heterogeneity may persist within and between data sources, which would impact pooled results. Therefore, it is our recommendation to consider, in addition to pooled incidence rates, data source-specific incidence rates for further use of the generated data.
The question of heterogeneity paves the way for the development of metrics to measure heterogeneity in data sources and the development of guidance to define acceptable thresholds when conducting distributed data network studies. Our study stressed the importance of an appropriate study setting for future safety research studies. Owing to the nature of this study and its resource constraints, case validation could not be conducted; we attempted to assess and reveal the impact of potential misclassification by using narrow and broad clinical definitions for most of the AESIs, and by stratifying by the provenance of diagnosis. In some instances, governance approval can be a lengthy process, especially in a pandemic situation. For this reason, the Danish DAP decided to prioritize the use of a set of data for which ethics approval had previously been obtained. For the other DAPs, we obtained governance approvals from scientific and ethics committees within a few weeks after submission of the protocol. Access to data was also facilitated by pre-agreement with DAPs. Ultimately, we could generate background incidence rates for a newly identified syndrome like TTS in a few days, showing the strength of the network in rapid response to specific research questions.

Conclusion
The ACCESS project started at an early stage of the COVID-19 pandemic as a component of the EMA readiness strategy for the time when vaccines would be authorized. ACCESS was successful in delivering these data on time, as the first set of background rates was made available to EMA in December 2020, providing support to the safety monitoring of vaccines as soon as they were available in the EU. A large population of 63 million European individuals was included in the study, without restrictions beyond study period, to ensure representativeness to the European population and generalizability of the background incidence rates.

Data availability
The authors are unable or have chosen not to specify which data has been used.

Declaration of Competing Interest
The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: [Corinne Willame is conducting a PhD at the University Medical Center Utrecht, as part of which she coordinated this project from January 2021. She works full time for Janssen Pharmaceutica and allocated working-time to finalize this project; this should be considered an in-kind contribution. Janssen Pharmaceutica was not a member of the consortium or project. Rosa Gini, Claudia Bartolini and Olga Paoletti are employed by ARS Toscana, a public research center that conducts or participates in pharmacoepidemiology studies compliant with the ENCePP Code of Conduct. The budget of ARS is partially sustained by such studies. Nicolas Thurin works for Bordeaux PharmacoEpi, an independent research platform of the Bordeaux University and its subsidiary the ADERA SAS, which performs financially supported studies for public and private partners, in line with the ENCePP Code of Conduct. Lei Wang, Vera Ehrenstein and Johnny Kahlert are salaried employees of their organization, which receives institutional research funding from pharmaceutical companies and regulatory agencies, administered by Aarhus University. Miriam Sturkenboom, Daniel Weibel, Carlos E. Durán and Roel Elbers are salaried employees of University Medical Center Utrecht, which receives institutional research funding from pharmaceutical companies and regulatory agencies, administered by University Medical Center Utrecht. All these studies follow the ENCePP code of conduct. Miriam Sturkenboom is a consultant to the Task Force for Global Health for the Safety Platform for Emergency vACcines (SPEAC) project. Caitlin Dodd currently works for Panalgo.
Felipe Villalobos, Meritxell Pallejà-Millán and María Aragón are salaried employees at Fundació Institut Universitari per a la recerca a l'Atenció Primària de Salut Jordi Gol i Gurina (IDIAPJGol), which receives institutional research funding from public and private partners, pharmaceutical companies and regulatory agencies, administered by IDIAPJGol. Mar Martín-Pérez, Patricia García-Poza, Airam de Burgos, Maria Martínez-González and Verónica Bryant are employees at the Spanish Agency for Medicines and Medical Devices, which fully finances the BIFAP database, a non-profit database for performing independent pharmacoepidemiologic research in the public research Spanish setting. Consuelo Huerta was an employee at the Spanish Agency for Medicines during the time in which this study was performed. She is currently assistant professor of the Public Health Department of the Faculty of Medicine in the Complutense University of Madrid. Ulrike Haug and Tania Schink are working at an independent, non-profit research institute, the Leibniz Institute for Prevention Research and Epidemiology - BIPS. Unrelated to this study, BIPS occasionally conducts studies financed by the pharmaceutical industry. Almost exclusively, these are post-authorization safety studies (PASS) requested by health authorities. The design and conduct of these studies as well as the interpretation and publication are not influenced by the pharmaceutical industry and performed in line with the ENCePP Code of Conduct. All other authors have no conflict of interest.].