Contextualising adverse events of special interest to characterise the baseline incidence rates in 24 million patients with COVID-19 across 26 databases: a multinational retrospective cohort study

Summary
Background Adverse events of special interest (AESIs) were pre-specified to be monitored for the COVID-19 vaccines. Some AESIs are associated not only with the vaccines but also with COVID-19 itself. Our aim was to characterise the incidence rates of AESIs following SARS-CoV-2 infection in patients and compare these to historical rates in the general population.
Methods A multinational cohort study with data from primary care, electronic health records, and insurance claims mapped to a common data model. This study's evidence was collected between Jan 1, 2017 and the conclusion of each database (which ranged from Jul 2020 to May 2022). The 16 pre-specified prevalent AESIs were: acute myocardial infarction, anaphylaxis, appendicitis, Bell's palsy, deep vein thrombosis, disseminated intravascular coagulation, encephalomyelitis, Guillain-Barré syndrome, haemorrhagic stroke, non-haemorrhagic stroke, immune thrombocytopenia, myocarditis/pericarditis, narcolepsy, pulmonary embolism, transverse myelitis, and thrombosis with thrombocytopenia. Age-sex standardised incidence rate ratios (SIR) were estimated to compare post-COVID-19 to pre-pandemic rates in each of the databases.
Findings Substantial heterogeneity by age was seen for AESI rates, with some clearly increasing with age but others following the opposite trend. Similarly, differences were also observed across databases for the same health outcome and age-sex strata. All studied AESIs appeared consistently more common in the post-COVID-19 cohorts than in the historical cohorts, with related meta-analytic SIRs ranging from 1.32 (1.05 to 1.66) for narcolepsy to 11.70 (10.10 to 13.70) for pulmonary embolism.
Interpretation Our findings suggest all AESIs are more common after COVID-19 than in the general population. Thromboembolic events were particularly common, and over 10-fold more so. More research is needed to contextualise post-COVID-19 complications in the longer term.
Funding None.


Research in context
Evidence before this study
During the rollout of the COVID-19 vaccines, regulatory authorities paid special attention to a pre-specified list of adverse events of special interest (AESIs). Some of the pre-specified AESIs have since been linked to the vaccines. Some AESIs are potentially related not only to COVID-19 vaccination but also to the infection itself. To understand the benefit-risk balance of COVID-19 vaccines, we need to know the rates of these events expected after COVID-19 infection. A literature search was conducted in PubMed up to March 31, 2022 to determine what evidence had previously been published on the occurrence of AESIs after COVID-19 infection. The search terms included "COVID-19" and each of the 16 AESIs evaluated (the original 15 AESIs plus thrombosis with thrombocytopenia). Sixty-three publications were identified and evaluated. However, the investigations were conducted on a single database or a small number of databases, and the majority of the papers concerned AESIs after immunisation, with no mention of post-COVID occurrences.

Added value of this study
We demonstrate that all 16 studied AESIs are more common after COVID-19 than expected in the general population. Thromboembolic events were particularly common, and over 10-fold more so. These findings highlight the need for more research to contextualise post-COVID-19 complications in the longer term.

Introduction
Between the first report of coronavirus disease 2019 (COVID-19) and July 2022, over 554 million cases and 6.3 million deaths were reported worldwide. 1 COVID-19 vaccines have been demonstrated to reduce COVID-19 hospitalisations and deaths, both in randomised controlled trials and in real-world observational studies. 2,3 As of 30th May 2022, more than 5 billion people (approximately 67% of the global population) have received at least one dose of a COVID-19 vaccine. 4 At least 39 vaccines have been authorised or approved for use by at least one country, of which 11 have been granted emergency use by the World Health Organization (WHO). Although vaccine safety has been rigorously monitored in clinical trials, rare adverse events can go undetected during vaccine development due to the limited number of trial participants, a relatively short follow-up duration, and insufficient generalisability to the broader population.
Throughout the roll-out of the COVID-19 vaccines, adverse events of special interest (AESIs) have been prespecified to be monitored by medicines regulatory authorities. To provide context for these safety investigations, Li et al., 5 in a multinational network cohort study, reported heterogeneity in the background incidence rates of 15 AESIs across age and sex strata as well as across various administrative claims and electronic health records (EHR) databases. Against this backdrop, some adverse events have gone on to be associated with the vaccines; two within the originally pre-specified 15 AESIs (myocarditis and pericarditis 6 and Guillain-Barré syndrome (GBS) 7 ), and one newly-identified AESI (thrombosis with thrombocytopenia syndrome 8 ).
It is, however, important to note that some AESIs are potentially associated not only with the COVID-19 vaccines but also with COVID-19 itself. To understand the benefit-risk balance of COVID-19 vaccines, it is important to contextualise vaccine-related rates against the rates expected following COVID-19. To address this problem, the Observational Health Data Sciences and Informatics (OHDSI) 9 community carried out a network study using observational data from 26 databases across 11 countries with the aim of estimating the incidence rates of 16 vaccine-related AESIs among people who had COVID-19, as compared to the background population during the pre-pandemic phase.

Data sources
Rates were obtained from 26 databases, which included 8 administrative claims databases, 12 EHRs, 1 EHR with a registry, and 5 general practitioner (GP) databases. These databases represented 11 countries: Belgium, Estonia, France, Germany, Japan, the Netherlands, Serbia, Spain, Turkey, the United Kingdom (UK), and the United States of America (US). All of these databases represent subsets of the total population from which they originate.  13 ; IQVIA Disease Analyzer France (IQVIA_FRANCE_DA) 14 ; IQVIA Disease Analyzer Germany (IQVIA_GERMAN_DA) 14 ; and The Information System for Research on Primary Care (SIDIAP). 15 Table 1 provides a high-level overview of the databases. All datasets were mapped to the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) V5.3.1 or higher 9 which is maintained by OHDSI. See Appendix 1 for database details.
The use of APHM was approved by the health data access commission (CADS). The use of CPRD_AURUM was approved by the Research Data Governance Process, protocol 20_000211. The use of IPCI database was approved by its governance board (2020-04). For UK_BIOBANK, ethical approval was provided from the UKB Access Review Board, reference 58,356 "Defining and redefining human disease at scale: an atlas of the human phenome." COVID-19 research using UK Biobank data has been outlined here: https://www.ukbiobank.ac.uk/learn-more-about-uk-biobank/covid-19-hub.
Participant data for this project are available directly from the UKB following a protocol review and contractual agreements; more information can be found on the UKB website. Participants who had withdrawn consent were excluded from the study.

Study participants
We determined the incidence of AESIs within two target populations: a 'Pre-Pandemic Background Population' from 2017 to 2019 and 'Patients with COVID-19' from 2020 to 2022. This work focuses on the 'Patients with COVID-19', using the 'Pre-Pandemic Background Population' to understand the occurrence of those conditions in the absence of either COVID-19 or the vaccines. We defined the 'Pre-Pandemic Background Population' cohort as patients who were observed in a database on the index dates 1 January 2017, 1 January 2018, or 1 January 2019 (i.e., a patient who was active in the database on any of those dates was included in this cohort). The entry event was the index date on which the patient was observed, and patients were only included if they had at least 365 days of observable time prior to the index date. We used the time prior to the pandemic, 2017 to 2019, to identify patients who we are confident had not been exposed to, had, or were experiencing COVID-19. Once the COVID-19 pandemic started, it would be hard to be confident that a patient did not have, or had not had, the disease.
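The entry rule for the 'Pre-Pandemic Background Population' can be illustrated with a minimal sketch. The function name `qualifying_index_dates` and the representation of a person's observation period as a single start and end date are assumptions made for illustration; the study's actual cohort definitions are specified in its protocol. The sketch returns every index date on which a person qualifies, and does not address how persons qualifying on more than one date were handled.

```python
from datetime import date, timedelta

# Index dates named in the text for the 'Pre-Pandemic Background Population'.
INDEX_DATES = (date(2017, 1, 1), date(2018, 1, 1), date(2019, 1, 1))
# Minimum observable time required before the index date.
PRIOR_OBSERVATION = timedelta(days=365)


def qualifying_index_dates(obs_start: date, obs_end: date) -> list[date]:
    """Return the index dates on which a person qualifies for the cohort.

    Hypothetical helper: a person qualifies on an index date if they are
    under observation on that date and have at least 365 days of
    observation before it.
    """
    return [
        index
        for index in INDEX_DATES
        if obs_start <= index - PRIOR_OBSERVATION and index <= obs_end
    ]
```

For example, a person observed from 1 June 2016 to 30 June 2019 lacks 365 days of prior observation on 1 January 2017 but qualifies on the two later index dates.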
The 'Patients with COVID-19' cohort was defined as persons with a positive laboratory test for SARS-CoV-2 (any SARS-CoV-2 laboratory testing method) or an observed diagnosis code for COVID-19 (without a negative test on or within 3 days after index). The index date, or entry event, was set to the first of these and did not have to occur on a specific month or day. Additional inclusion rules included keeping patients that had at least 365 days of observable time prior to the index date and having no diagnosis of COVID-19 or SARS-CoV-2 prior to index. The calendar time for this cohort included all time available within each database after 1 December 2019, see Table 1 for specifics (e.g., CPRD_AURUM last date of capture is March 2021). An additional 'Patients with COVID-19' variation cohort was created for sensitivity analysis; it was a similar cohort except with the additional inclusion rule that cohort entry had to be prior to the COVID-19 vaccines becoming available (i.e., index before 1 January 2021); this cohort was called 'Patients with COVID-19 prior to 2021'. Patients were not excluded from the COVID-19 cohorts if they were also in the 'Pre-Pandemic Background Population'. The codes for this COVID-19 definition are listed in the protocol appendix 16 and have been used in prior studies. [17][18][19][20] All databases contributed to both study cohorts ('Pre-Pandemic Background Population' and 'Patients with COVID-19'), except for FIIBAP and IU, which only contributed to the 'Patients with COVID-19' cohort as these databases only contained COVID-19 patients and not a general population.
This study uses coded data that already exist in an electronic database. Confidentiality of patient records was always maintained. All study reports contain only aggregate data and do not identify individual patients or physicians. At no point during the study was identifying information about the subjects used.

Adverse events of special interest (AESIs)
We studied 16 AESIs. This list includes the original 15 prespecified by regulatory authorities and the addition of thrombosis with thrombocytopenia syndrome (TTS), an adverse event discovered when COVID-19 vaccinations started. The 16 outcomes are: GBS, facial nerve (Bell's) palsy, anaphylaxis, encephalomyelitis, narcolepsy, appendicitis, non-haemorrhagic stroke, haemorrhagic stroke, acute myocardial infarction (AMI), myocarditis and pericarditis, deep vein thrombosis (DVT), pulmonary embolism, disseminated intravascular coagulation (DIC), immune thrombocytopenia (ITP), transverse myelitis, and the co-occurrence of thrombosis with thrombocytopenia (TWT) as a proxy for TTS. All AESI definitions were used in previously published studies 5 and were based on the FDA's Center for Biologics Evaluation and Research protocol. 21 The AESIs were reviewed within each data partner to ensure the outcome could be reliably captured; if not, that AESI was excluded for that particular data partner (Appendix 2 provides further detail). The definitions are fully specified in the protocol. 16

Statistical analysis
We defined time at risk as 1-90 days after the study participant's index date. Patients contributed time at risk from the index date until the earliest of 90 days after the index date, the date they left the database, or the start date of the AESI event. Death was not explicitly used as a censoring criterion, as the capture of death varies between databases, and some databases do not capture death at all. We did censor patients at the end of observation, whether that was due to death or another reason for loss to follow-up.
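As a minimal sketch of this time-at-risk rule, the hypothetical function `time_at_risk` below computes the days contributed by one patient for one outcome. It assumes that the window starts on day 1 after index (following the "1-90 days" wording), that days are counted inclusively, and that a single observation end date stands in for leaving the database; none of these details is confirmed beyond what the text states.

```python
from datetime import date, timedelta
from typing import Optional


def time_at_risk(index: date, obs_end: date,
                 event: Optional[date] = None) -> tuple[int, bool]:
    """Return (days at risk, whether the event fell inside the window).

    Hypothetical helper for one patient and one outcome.
    """
    start = index + timedelta(days=1)               # day 1 after index (assumption)
    end = min(index + timedelta(days=90), obs_end)  # day 90 or end of observation
    had_event = event is not None and start <= event <= end
    if had_event:
        end = event                                 # follow-up stops at the event
    days = max((end - start).days + 1, 0)           # inclusive day count
    return days, had_event
```

A patient followed for the full window contributes 90 days; a patient with an event on day 10 contributes 10 days and one event.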
We also excluded individuals for a specific outcome if a prior event had occurred during the clean window for that outcome. A clean window is the minimum time that must separate two occurrences of an event for the later one to be considered a new event; within the outcome-specific clean window, an outcome is not considered incident. The clean window was 365 days for all outcomes except anaphylaxis (30 days) and facial nerve palsy and encephalomyelitis (183 days). 21 The clean window was set differently for anaphylaxis, facial nerve palsy, and encephalomyelitis to better define incident cases of these AESIs.
Incidence rates were calculated as the total number of events divided by the person-time at risk. The incidence rates were stratified by age and sex subgroups for each database. The age subgroups were (in years): 0-5, 6-17, 18-34, 35-54, 55-64, 65-74, 75-84, and ≥85. The age-specific rates for the AESIs were pooled across the databases using a random effects meta-analysis, with the DerSimonian-Laird method to estimate between-database variation. The 95% prediction intervals were calculated using the R package "meta". Prediction intervals reflect the expected uncertainty if an estimated rate from another study were included in the meta-analysis.
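The clean-window exclusion can be sketched as follows. The window lengths are those stated above; the function name `is_eligible`, the outcome labels used as dictionary keys, and the choice to measure the window backwards from the index date (inclusive) are assumptions made for illustration.

```python
from datetime import date, timedelta

# Clean windows in days, as stated in the text; all other outcomes use 365.
CLEAN_WINDOW_DAYS = {
    "anaphylaxis": 30,
    "facial nerve palsy": 183,
    "encephalomyelitis": 183,
}
DEFAULT_CLEAN_WINDOW_DAYS = 365


def is_eligible(outcome: str, index: date, prior_events: list[date]) -> bool:
    """True if no prior event of this outcome lies in the clean window.

    Hypothetical helper: the window is assumed to run from
    (index - window) up to and including the index date.
    """
    days = CLEAN_WINDOW_DAYS.get(outcome, DEFAULT_CLEAN_WINDOW_DAYS)
    window_start = index - timedelta(days=days)
    return not any(window_start <= d <= index for d in prior_events)
```

Under these assumptions, an anaphylaxis event 45 days before index does not exclude the patient for anaphylaxis, whereas an AMI event 45 days before index excludes the patient for AMI.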
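To make the rate calculation and pooling concrete, the following is a minimal sketch of an incidence rate per 100,000 person-years and a DerSimonian-Laird random effects pooling. It is not the study's code, which used the R package "meta". The sketch assumes that rates are pooled on the log scale with variance approximated by 1/events, requires at least one event per database, and omits the prediction intervals; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def incidence_rate_per_100k(events: int, person_years: float) -> float:
    """Events divided by person-time at risk, per 100,000 person-years."""
    return events / person_years * 100_000


def dersimonian_laird(events: list[int], person_years: list[float]):
    """Pool log incidence rates across databases (assumes events > 0).

    Returns (pooled rate, CI low, CI high) per 100,000 PY and tau^2.
    """
    y = [math.log(e / py) for e, py in zip(events, person_years)]
    v = [1.0 / e for e in events]            # approx. variance of a log rate
    w = [1.0 / vi for vi in v]               # fixed-effect weights
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    k = len(y)
    # Method-of-moments estimate of between-database variance, floored at 0.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (vi + tau2) for vi in v]  # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    z = NormalDist().inv_cdf(0.975)
    scale = 100_000
    return (math.exp(mu) * scale,
            math.exp(mu - z * se) * scale,
            math.exp(mu + z * se) * scale,
            tau2)
```

When all databases share the same underlying rate, the between-database variance estimate is zero and the pooled rate equals the common rate; widely differing rates yield a positive tau^2 and a wider interval.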
Indirect standardisation 22 was used to account for differences between the age subgroups and sex distribution in the COVID-19 disease cohort and the 'Pre-Pandemic Background Population'. Within each database, the pre-pandemic cohort was used as the demographic reference, such that an expected incidence rate could be computed using COVID-19-specific incidence rates weighted to reflect the 'Pre-Pandemic Background Population' demographics. COVID-19 cohort observed and expected rates were compared using standardised incidence ratios (SIR) with corresponding exact 95% confidence intervals.
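As an illustration only, the sketch below computes an SIR using one common textbook formulation of indirect standardisation, in which expected events are obtained by applying reference stratum-specific rates to the person-time of the COVID-19 cohort. The study's exact weighting is specified in its protocol and may differ from this formulation. The exact 95% interval is obtained here by bisection on the Poisson distribution, and all function names and the stratum keys are hypothetical.

```python
import math


def poisson_cdf(k: int, mu: float) -> float:
    """P(X <= k) for X ~ Poisson(mu), with terms computed in log space."""
    if k < 0:
        return 0.0
    if mu <= 0:
        return 1.0
    total = sum(math.exp(i * math.log(mu) - mu - math.lgamma(i + 1))
                for i in range(k + 1))
    return min(1.0, total)


def exact_poisson_ci(observed: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact interval for a Poisson mean, found by bisection."""
    def solve(f, lo: float, hi: float) -> float:
        # f is decreasing in mu, positive at lo and negative at hi.
        for _ in range(200):
            mid = (lo + hi) / 2
            if f(mid) > 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    bound = observed + 10 * math.sqrt(observed + 1) + 10  # generous upper bracket
    lower = 0.0 if observed == 0 else solve(
        lambda mu: poisson_cdf(observed - 1, mu) - (1 - alpha / 2), 0.0, bound)
    upper = solve(lambda mu: poisson_cdf(observed, mu) - alpha / 2, 0.0, bound)
    return lower, upper


def standardised_incidence_ratio(observed: int,
                                 cohort_person_years: dict,
                                 reference_rates: dict):
    """Return (SIR, CI low, CI high) for one outcome in one database.

    cohort_person_years and reference_rates are keyed by the same
    age-sex strata; reference rates are events per person-year.
    """
    expected = sum(reference_rates[s] * py
                   for s, py in cohort_person_years.items())
    low, high = exact_poisson_ci(observed)
    return observed / expected, low / expected, high / expected
```

For example, with 10 observed events and 5 expected events, the sketch gives an SIR of 2.0 with an exact interval obtained by dividing the Poisson limits for 10 events by the expected count.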
Negative control outcomes were also used to evaluate potential bias in incidence ratio estimates. Twenty negative control outcomes were selected as diseases with no a priori evidence of a causal relationship with COVID-19, based on literature review and clinician adjudication (Appendix 3). Incidence rates and SIRs for these negative control outcomes were estimated in the same manner as for the 16 AESIs.

Role of funding source
There was no funding source for this study. All authors approved the final version, had access to the data, and accept responsibility to submit the final version for publication.

Results
We included 23,840,986 'Patients with COVID-19' from 26 databases representing a diverse set of care settings from North America, Europe, and Asia (Table 1). The percentage of females across the databases ranged from 47.4% to 56.1%; all databases had slightly more females than males, except IMASIS and JMDC. The average age of persons in each database ranged from 24 to 72 years, reflecting the differing patient populations covered across the databases (e.g., IBM_MDCD has a large child and childbearing-aged female population, while IBM_MDCR represents retired individuals). The 'Pre-Pandemic Background Population' represented more than 492,730,503 person records. Characterisation results produced by CohortDiagnostics 24 can be found on an interactive web app (https://data.ohdsi.org/Covid19SubjectsAesiIncidenceRate/).
Fig. 1 plots the incidence rates for the 'Patients with COVID-19' population, stratified by age group (x-axis), sex (half the x-axis represents females and the other half males), and database (color-coded by database type: claims, EMR, and GP centric data), trellised by the 16 AESIs. Similar age trends across databases were observed. Some AESIs had a clear increase in incidence rate with age: AMI, non-haemorrhagic stroke, DVT, pulmonary embolism, haemorrhagic stroke, Bell's palsy, TWT, ITP, DIC, and GBS. In contrast, some AESIs had a clear decrease in incidence rate with age: appendicitis and anaphylaxis. Finally, some age trends were less clear: myocarditis and pericarditis, narcolepsy, encephalomyelitis, and transverse myelitis. Fig. 1 also shows substantial database heterogeneity within individual AESIs. For example, for DIC in the 90 days after COVID-19 diagnosis, males aged 35-54 in JMDC experienced the event at a rate of 2417 per 100,000 person-years, compared with 5 per 100,000 person-years in CPRD. Additionally, the incidence rates for 'Patients with COVID-19' were compared to the 'Pre-Pandemic Background Population' in Appendix 4. For most outcomes, across all database/age group/sex combinations, the incidence rates are higher for the 'Patients with COVID-19' compared to the 'Pre-Pandemic Background Population'.
To help give an overview of these results, Fig. 2 shows the pooled estimated age- and sex-stratified incidence rates per 100,000 person-years (PY) for AESI events in the 90 days after index for 'Patients with COVID-19'. The incidence of several AESIs differed greatly between age and sex strata. For example, pulmonary embolism was uncommon in patients younger than 35 and became common in older individuals. The pattern was similar for AMI, non-haemorrhagic stroke, and DVT (with the exception that AMI and non-haemorrhagic stroke became common at the older age of 55). Haemorrhagic stroke and TWT were primarily uncommon, becoming common later in life (75 years and 65 years respectively). The other AESIs were primarily uncommon, with patients aged between 18 and 74 experiencing the event rarely. For comparison, we additionally produced this figure for the 'Pre-Pandemic Background Population', which can be found in Appendix 5.
Fig. 3 reports the meta-analytic estimates of SIRs comparing the 'Patients with COVID-19' to the 'Pre-Pandemic Background Population'. Twelve of 16 AESIs had ratios above 2 and 7 of 16 AESIs had ratios above 5. Pulmonary embolism had the highest SIR (11.7 [95% confidence interval 10.1-13.7]), suggesting that the observed incidence of pulmonary embolism in the 90 days after COVID-19 diagnosis was over 11 times higher than expected in a 90-day period for the background population in the pre-pandemic period. The database-specific results are reported in Appendix 6. The negative control outcome estimates are provided in Appendix 7 and reveal consistent positive bias in estimated SIR values. SIRs ranged from 0.9 to 5.0 with a majority of generated SIRs above 1.7. The meta-analysis was repeated for 'Patients with COVID-19 prior to 2021' to see whether inclusion of COVID-19 vaccinated patients may have affected the results; however, the results were similar (Appendix 8).

Discussion
To our knowledge, this is the largest study to date on the descriptive epidemiology of AESIs among the COVID-19 population. In this study we estimated age- and gender-specific incidence rates of 16 AESIs among COVID-19 patients using 26 observational health data sources from around the world. To contextualise our findings, we reported age- and gender-standardised SIRs, comparing the incidence rate (IR) among the COVID-19 cohort to that of the 'Pre-Pandemic Background Population' of the databases. Our findings suggest a consistent trend of an increased risk for multiple AESIs among the COVID-19 cohort when compared to the database 'Pre-Pandemic Background Population', though the magnitude of the risk should be interpreted with caution.
We found considerable heterogeneity in the IR among the COVID-19 cohort, reflected by the wide prediction intervals of the pooled age- and sex-specific IR, suggesting that caution is needed when these estimates are used. We also observed considerable variability with age and some with sex groupings, emphasising the need for age- and sex-stratification when assessing risks and benefits of COVID-19 vaccines.
The observed magnitude of heterogeneity across sources within age and sex subgroups suggests that residual differences are present. The remaining heterogeneity may be related to differences in healthcare systems, settings, data capture processes, or true differences in subpopulations. A limitation of this work is that we have not further stratified by comorbidities related to either COVID-19 [...] or obesity) or socio-economic status. These differences may also be due to systematic error, selection bias, or differential outcome measurement error between databases. 25 In a similar analysis across 13 databases, Li et al. 5 reported that AESI background rates varied by database and region. Consistent with our findings, the authors also found considerable differences in the incidence rates by age and sex, suggesting caution is needed when incidence rates are compared across populations. 5
Among COVID-19 patients, the majority of COVID-19 AESIs were found to be "uncommon" with few "rare" among all age and gender groups investigated. Thrombotic events such as AMI, strokes, DVT, and pulmonary embolism were more frequent compared to other AESIs and were "common" in older COVID-19 patients. These findings are consistent with prior studies suggesting that cardiovascular and thrombotic complications in particular are relatively common post COVID-19, 26,27 especially among older patients. In most databases, the risk of these thrombotic events was higher among COVID-19 patients when compared to the 'Pre-Pandemic Background Population' of the databases, with a pooled SIR above 3. In particular, the SIR for pulmonary embolism had an elevated trend away from the null in all databases but one. It is well established that COVID-19 is associated with a hypercoagulable state, and older patients with additional risk factors have a worse outcome. Consistent with our results, clinical evidence suggests that COVID-19 is particularly related to pulmonary embolism, 28 especially among those with pneumonia. However, until now, the precise incidence of thrombosis in patients with COVID-19 has not been determined, mainly because multiple studies have reported conflicting estimates. 29
The Centers for Disease Control and Prevention (CDC) lists myocarditis and pericarditis, thrombosis with thrombocytopenia syndrome (TTS), presented in this work as TWT, and GBS as three serious types of adverse events following COVID-19 vaccination, with evidence suggesting a link, although rare, to certain types of COVID-19 vaccines. 30 These three adverse events are also listed on the label of at least one COVID-19 vaccine authorised for use in the US and Europe. Our findings suggest that, while still rare among COVID-19 patients, the risk of these three events may be higher in COVID-19 patients when compared to the 'Pre-Pandemic Background Population'. Since the estimated incidence ratios are unadjusted and are not presented for causal inference, future investigation may be warranted to fully contextualise the risk-benefit balance of the COVID-19 vaccines as it relates to myocarditis and pericarditis, TWT, and GBS.
Multiple studies have concluded that COVID-19 disease may lead to neurological complications leading to GBS, and some studies also highlighted differences in the presentation of the disease, with greater severity of symptoms in GBS associated with COVID-19. 31-33 Previous studies have also found that myocarditis and pericarditis may be part of the wide spectrum of cardiovascular sequelae of COVID-19 disease. 32,34,35 A CDC network retrospective cohort study 34 using electronic medical record data across multiple databases in the US reported that among 814,524 COVID-19 patients the incidence of myocarditis or pericarditis varied by age and gender and ranged between 17.6 and 114.0 per 100,000 for males and between 10.8 and 61.7 per 100,000 for females in a 21-day risk window. The same study reported that the incidences of myocarditis or pericarditis after SARS-CoV-2 infection were higher than after mRNA COVID-19 vaccination for both males and females in all age groups. To our knowledge, our study is the first to report on TWT incidence among 'Patients with COVID-19'. However, prior work has highlighted the potential misclassification error related to identifying TTS (also known as vaccine-induced thrombotic thrombocytopenia [VITT]) in observational data. 36 The incidence rates of TWT reported in this study were higher among men of older ages, which is inconsistent with known trends of TTS/VITT. Thus, these rates should be interpreted with caution.
Our findings suggest that COVID-19 disease itself must be considered when assessing the relationship between COVID-19 vaccines and the AESI. To be specific, our findings suggest that COVID-19 disease may be associated with at least some of the AESIs and may consequently exert a confounding or intermediating effect in the observed association between the vaccines and the AESI. When conducting observational studies investigating the association between the vaccines and the AESI, it is important to control for COVID-19.
A particular strength of this study is that it includes a large number of databases from around the world, covering a sizable study population with diversity in geographical location, underlying populations, pandemic status, health systems, and data types. Our analysis enabled a comprehensive and standardised assessment of incidence rates of AESIs among 'Patients with COVID-19' across multiple settings. This was possible due to the use of the OMOP CDM, which enabled use of the same study design and analytical code in all databases and to gather results from participating data partners rapidly and without transferring patient level data. All outcome definitions, clinical codes, and phenotype algorithms have been made open source and are available online for review and to maximise reproducibility and reuse.
A limitation of this analysis is that we did not differentiate between the multiple variants of SARS-CoV-2, and we did not consider recurrent COVID-19. This means we cannot categorise or compare the differences in AESIs associated with the different variants or in patients who had COVID-19 multiple times. The incidence of the AESIs may change depending on the variant or number of infections, so this work cannot describe the incidence of an AESI for a given variant, be representative of future variants, or describe what happens for patients who had multiple infections of COVID-19.
An additional limitation of the analysis of the SIRs is that they do not fully account for confounding and were shown to exhibit bias with negative controls, and therefore should not be interpreted as causal effect estimates. Also, the use of a historical population as a comparator is known to be associated with an elevated type 1 error, which may bias the results away from the null 37 (e.g., individuals in the 'Patients with COVID-19' cohort may tend to receive more clinical attention or examination than individuals in the 'Pre-Pandemic Background Population'). Results from negative control outcomes in this work show a consistent positive bias (away from the null). These findings indicate that most associations observed in our study are potentially larger than the magnitude of the bias observed using negative controls.
Another limitation of this study is that all outcomes and the COVID-19 definition itself are subject to measurement error. While most of the outcome definitions have been used in prior studies and were reviewed by clinicians and data experts, they were mainly based on the presence of specific diagnostic codes and were not validated further. There was significant heterogeneity in COVID-19 test and disease ascertainment across databases. COVID-19 severity is an important factor that was outside the scope of this analysis but may have affected the risk of AESIs and probably varied by database. When defining the 'Pre-Pandemic Background Population', we used data from 2017 to 2019 of all people in each database with more than 365 days of observation indexed on 1 January. These design decisions, in particular the index date (anchoring effect), have been shown to influence rate estimates; however, the effect of season has been shown to be moderate. 25
Finally, there are some limitations related to participating databases. Information on hospital admissions was not available in all primary care databases used (Table 1), resulting in inpatient events not being captured. EHR databases are subject to incomplete capture of medical events that occur and are recorded outside the participating health system. The bias of incomplete information was partially mitigated by including only those patients who had at least one year of continuous observation, but defining continuous observation can be challenging across disparate databases. Administrative claims databases offered a potentially complete data capture but lacked some important data elements such as laboratory test results. We mitigated some of these limitations by providing within-database incidence rate comparisons. Additionally, the repeated influxes of vast numbers of critically ill COVID-19 cases suddenly overwhelmed hospitals, and we cannot exclude changes in the organisation and thus the quality of coding in code-based administrative databases. Lastly, all our databases represented subsets of the population from which they originate, which poses a risk of selection bias.
Our study assessed the descriptive epidemiology of the occurrence of AESIs in the 90 days after COVID-19 disease. The results showed large variations in the rates of AESIs in 'Patients with COVID-19' across age groups and sex, showing the need for stratification. Considerable database heterogeneity was found across the AESIs, suggesting individual study estimates should be interpreted with caution. Comparing the 'Patients with COVID-19' to the 'Pre-Pandemic Background Population' showed a fairly consistent elevated rate of experiencing an AESI within 90 days after index. We see this elevated risk consistently both in the database-stratified results and in the meta-analysis. The results of this work are of public health importance as they help put into perspective the risk of the AESIs post vaccination versus post SARS-CoV-2 infection.

Data sharing statement
The results of the analysis can be found at https://data.ohdsi.org/Covid19SubjectsAesiIncidenceRate/ and an export of the incidence rates can be found in Appendix 4.
Outside the licensed data previously described, this study was performed as a federated network study, meaning the data remained with the data partner. Individual organisations would need to be contacted in order to gain access to those data assets.