Six-month sequelae of post-vaccination SARS-CoV-2 infection: A retrospective cohort study of 10,024 breakthrough infections

Vaccination has proven effective against infection with SARS-CoV-2, as well as death and hospitalisation following COVID-19 illness. However, little is known about the effect of vaccination on other acute and post-acute outcomes of COVID-19. Data were obtained from the TriNetX electronic health records network (over 81 million patients mostly in the USA). Using a retrospective cohort study and time-to-event analysis, we compared the incidences of COVID-19 outcomes between individuals who received a COVID-19 vaccine (approved for use in the USA) at least 2 weeks before SARS-CoV-2 infection and propensity score-matched individuals unvaccinated for COVID-19 but who had received an influenza vaccine. Outcomes were ICD-10 codes representing documented COVID-19 sequelae in the 6 months after a confirmed SARS-CoV-2 infection (recorded between January 1 and August 31, 2021, i.e. before the emergence of the Omicron variant). Associations with the number of vaccine doses (1 vs. 2) and age (<60 vs. ≥ 60 years-old) were assessed. Among 10,024 vaccinated individuals with SARS-CoV-2 infection, 9479 were matched to unvaccinated controls. Receiving at least one COVID-19 vaccine dose was associated with a significantly lower risk of respiratory failure, ICU admission, intubation/ventilation, hypoxaemia, oxygen requirement, hypercoagulopathy/venous thromboembolism, seizures, psychotic disorder, and hair loss (each as composite endpoints with death to account for competing risks; HR 0.70–0.83, Bonferroni-corrected p < 0.05), but not other outcomes, including long-COVID features, renal disease, mood, anxiety, and sleep disorders. Receiving 2 vaccine doses was associated with lower risks for most outcomes. Associations between prior vaccination and outcomes of SARS-CoV-2 infection were marked in those <60 years-old, whereas no robust associations were observed in those ≥60 years-old. In summary, COVID-19 vaccination is associated with lower risk of several, but not all, COVID-19 sequelae in those with breakthrough SARS-CoV-2 infection. The findings may inform service planning, contribute to forecasting public health impacts of vaccination programmes, and highlight the need to identify additional interventions for COVID-19 sequelae.


Introduction
The observation that individuals can be infected with SARS-CoV-2 after being vaccinated against COVID-19 (so-called breakthrough infections) has caused concerns (Nixon and Ndhlovu, 2021). These concerns are mitigated by abundant evidence that the risk of severe COVID-19 illness (as proxied by hospitalisation, admission to intensive care unit, and mortality) is lessened by vaccination (Agrawal et al., 2021;Antonelli et al., 2022;Bahl et al., 2021;Butt et al., 2021;Cabezas et al., 2021;Glatman-Freedman et al., 2021;Haas et al., 2021;Hyams et al., 2021;Mateo-Urdiales et al., 2021;Roest et al., 2021). One case-control study investigated the association between self-reported SARS-CoV-2 infection in 908 pairs of vaccinated and unvaccinated individuals and self-reported symptoms beyond 28 days (Antonelli et al., 2022). It found that compared to unvaccinated individuals, those with breakthrough infection were at a lower risk of symptoms beyond 28 days.
However, how COVID-19 vaccination affects the broad spectrum of sequelae of SARS-CoV-2 infection remains elusive. In particular, it is unknown if vaccinated individuals are at the same risk as unvaccinated individuals of venous thromboembolisms, ischaemic strokes, neuropsychiatric complications, long-COVID presentations, and other postacute sequelae following infection with SARS-CoV-2. In addition, unvaccinated people might have health behaviours related to vaccination hesitancy (Latkin et al., 2021), and previous studies have not tried to eliminate this potential source of bias.
This cohort study based on electronic health records compares the 6months outcomes of SARS-CoV-2 infection among individuals who were (vs. those who were not) vaccinated against COVID-19. A range of outcomes (both acute and post-acute) with documented associations with COVID-19 across multiple body systems were investigated.

Data and study design
The study used TriNetX Analytics, a federated network of linked EHRs recording anonymized data from 59 healthcare organizations (HCOs), primarily in the USA, totalling 81 million patients. Available data include demographics, diagnoses (using ICD-10 codes), procedures (Current Procedural Terminology [CPT] codes), and measurements (e.g. blood pressure). The HCOs consist in a mixture of primary care centres, hospitals, and specialist units. They provide data from uninsured as well as insured individuals. Data de-identification is attested to and receives a formal determination by a qualified expert as defined in Section §164.514(b)(1) of the HIPAA Privacy Rule. This formal determination supersedes TriNetX's waiver from the Western Institutional Review Board (IRB). Using the TriNetX user interface, cohorts are created based on inclusion and exclusion criteria, matched for confounding variables, and compared for outcomes of interest over specified time periods. For further details about TriNetX, see Appendix (methods A1).

Cohorts
Both the primary and control cohorts were defined as all patients who had, between January 1, 2021 and August 31, 2021, a confirmed SARS-CoV-2 infection, which we defined as either a confirmed diagnosis of COVID-19 (ICD-10 code U07.1) or a first positive PCR test for SARS-CoV-2. In the primary cohort, patients were included only if their confirmed SARS-CoV-2 infection occurred at least 14 days after a recorded administration of a COVID-19 vaccine approved for use in the USA (i.e. BNT162b2 'Pfizer/BioNTech', mRNA-1273 'Moderna', or Ad26.COV2.S 'Janssen'). In the control cohort, patients were included only if no vaccine against COVID-19 was recorded before their SARS-CoV-2 infection and if they had received a vaccine against influenza at any time. In the USA, the Centers for Disease Control and Prevention (CDC) recommend yearly influenza vaccination to everyone over the age of 6 months. This inclusion criterion thus excludes patients with obvious vaccine hesitancy (so-called 'anti-vaxxers') as this is correlated with other health-related behaviours that might confound associations with COVID-19 outcomes (Latkin et al., 2021). However, people with specific hesitancy towards COVID-19 vaccine but not other vaccines would still be included in this cohort. More details are provided in the Appendix (methods A2).

Covariates
A set of established and suspected risk factors for COVID-19 and for more severe COVID-19 illness was used (de Lusignan et al., 2020;Taquet et al., 2021b;Williamson et al., 2020): age, sex, race, ethnicity, obesity, hypertension, diabetes, chronic kidney disease, asthma, chronic lower respiratory diseases, nicotine dependence, substance misuse, psychotic, mood, and anxiety disorders, ischaemic heart disease and other forms of heart disease, socioeconomic deprivation, cancer (and haematological cancer in particular), chronic liver disease, stroke, dementia, organ transplant, rheumatoid arthritis, lupus, psoriasis, and disorders involving an immune mechanism. To capture these risk factors in patients' health records, 55 variables were used. More details including ICD-10 codes are provided in the Appendix (methods A3). Cohorts were matched for all these variables, as described below. In addition, cohorts were stratified by the date of the SARS-CoV-2 infection in 2-monthly periods (January 1 to February 28, 2021, March 1 to April 30, 2021, May 1 to June 30, 2021, and July 1 to August 31, 2021) and matching was achieved independently within each period (guaranteeing that as many patients in the matched cohorts had their SARS-CoV-2 infection in every 2-month period).

Outcomes
We investigated the 6-month incidence of all acute and post-acute outcomes which have been shown to be significantly associated with COVID-19 in four large-scale studies based on electronic health records (Al-Aly et al., 2021;Daugherty et al., 2021;Taquet et al., 2021bTaquet et al., , 2021a Each outcome was defined as the set of corresponding ICD-10, CPT, and VA Formulary codes as specified in the original study which showed their association with COVID-19. To account for death as a competing risk and thus address survivorship bias, each outcome was analysed as part of a composite outcome with death as the other component (Manja et al., 2017). While necessary to address survivorship bias, this approach is unlikely to provide insight into the risk of outcomes which are much rarer than death itself. The analysis was performed on October 12, 2021. More details, and ICD-10/CPT/VA Formulary codes, are provided in the Appendix (methods A4).

Secondary analyses
We assessed whether the associations between prior vaccinations and outcomes of SARS-CoV-2 infection were moderated by the number of vaccine doses received and by age at the time of infection. This was achieved by restricting the primary cohort to (i) those who had received only one vaccine dose at least 14 days before infection, (ii) those who had received two vaccine doses 14 days before infection, (iii) those <60 years-old, and (iv) those ≥60 years-old. For the latter two subgroup analyses, the control cohorts were also restricted to those <60 and ≥60 years-old respectively. For completeness, we also stratified the analysis by time since first vaccine dose by restricting the primary cohort to (v) those who had COVID-19 at least 4 months after their first vaccine dose, and (vi) those who had COVID-19 between 2 weeks and 4 months after their first vaccine dose. However, we note that the value of this analysis is limited because longer times between the first vaccine dose and COVID-19 diagnosis makes it more likely that a second dose was given in the interim; this is an unavoidable confounder of this analysis (see appendix methods A5).

Statistical analyses
Propensity score matching (carried out within the TriNetX network) was used to create cohorts with matched baseline characteristics (Austin, 2011). Propensity score 1:1 matching used a greedy nearest neighbour approach with a caliper distance of 0.1 pooled standard deviations of the logit of the propensity score. Any characteristic with a standardized mean difference (SMD) between cohorts lower than 0.1 is considered well matched (Haukoos and Lewis, 2015). The Kaplan-Meier estimator was used to estimate the incidence of each outcome. Hazard ratios (HR) with 95% confidence intervals were calculated using the Cox model and the null hypothesis of no difference between cohorts was tested using log-rank tests. The proportional hazard assumption was tested using the generalized Schoenfeld approach. When the assumption was violated, a time-varying HR was assessed using natural cubic splines fitted to the log-cumulative hazard (Royston and Parmar, 2002). The contribution of the individual outcomes of interest within the composite endpoint (with death as the other component) was reported as the number of events of interest over the total number of events (e.g. the number of respiratory failures over the number of respiratory failures or deaths).
We statistically tested whether the associations between prior COVID-19 vaccine and outcomes of SARS-CoV-2 infection was moderated by the number of vaccine doses. This was achieved for each outcome by testing whether the ratio between the HR in those who had received 1 dose and the HR in those who had received 2 doses at the time of infection was statistically significantly different from 1. Similarly, moderation by age was achieved by testing whether the ratio between the HR in those ≥60 and the HR in those <60 years-old was statistically significantly different from 1.
Further details are provided in the Appendix (methods A6). Statistical analyses were conducted in R version 3.6.3 except for the log-rank tests which were performed within TriNetX. Statistical significance was set at two-sided p-values < 0.05. Bonferroni correction for multiple comparisons was applied to correct for the simultaneous assessments of 45 outcomes in the primary analysis. A REporting of studies Conducted using Observational Routinely-collected health Data (RECORD) statement was completed (see Appendix).

Results
A total of 10,024 individuals with SARS-CoV-2 infections recorded at least 2 weeks after a first dose of vaccine against COVID-19 were identified (mean [SD] age at infection: 57.0 [17.9] years-old, 59.4% female). Among them, 65.1% were vaccinated with BNT162b2 'Pfizer/Bio-NTech', 9.0% with mRNA-1273 'Moderna', 1.6% with Ad26.COV2.S 'Janssen', and 24.4% with unspecified subtype. 9479 of these individuals were matched to 9479 individuals with SARS-CoV-2 infections who did not have a COVID-19 vaccine before SARS-CoV-2 infection. The main demographic features and comorbidities of both cohorts are summarised in Table 1 (additional baseline characteristics presented in Table A1 in the Appendix). Adequate propensity-score matching (standardised mean difference < 0.1) was achieved for all comparisons and baseline characteristics and all subgroups (Tables A2-A7 in the Appendix).
Besides the outcomes shown to be significantly associated with vaccination in the primary analysis, those who had received two vaccine doses at the time of SARS-CoV-2 infection were also at a significantly lower risk of myalgia, myocarditis, cerebral haemorrhage, interstitial lung disease, urticaria, anosmia, and myoneural junction/muscle disease (each as composite endpoints with death), and at a significantly lower risk of death ( Fig. 3 and Appendix Fig. A6 and Table A10). In those who had received only one vaccine dose, HRs followed a similar pattern, though they were generally closer to 1 (Fig. 3 and Appendix Fig. A7 and Table A11). For each of the outcome, the HR in those who had received 2 doses was not statistically significantly different from the HR in those who had received 1 vaccine dose (Appendix Table A12). Results for the analysis stratified by time since the first dose of the vaccine are presented in the Appendix (Table A13-A14) but these results should be interpreted cautiously given that longer times between the first vaccine dose and COVID-19 diagnosis makes it more likely that a second dose was given in the interim.
We found a substantial effect of age on the results. Many HRs in younger individuals (<60 years-old) were in general lower (i.e. favouring vaccination even more) for outcomes significantly associated with vaccination ( Fig. 3 and Appendix Fig. A8 and Table A15). These lower HRs were observed against the backdrop of lower absolute risks for most outcomes among younger individuals (Appendix Table A15). In contrast, in individuals ≥ 60 years-old, although most trends were similar to those in the younger group, none of the HRs were statistically significantly different from 1 after Bonferroni correction ( Fig. 3 and Appendix Fig. A9 and Tables A16).
For all but one outcomes associated with vaccination status in the primary analysis, the HR in the older group was significantly higher (i.e. favouring vaccination less) than the HR in the younger group (Appendix Table A17). For instance, the HR for the composite outcome of death or respiratory failure was 0.48 (95% CI 0.39-0.60) in individuals < 60 years-old and 0.85 (95% CI 0.76-0.96) in those ≥ 60 years-old (ratio between the two: 1.77, 95% CI 1.38-2.26, p < 0.0001). The only exception was the composite of death and hypercoagulopathy/DVT/PE (HR in the older group 0.93, HR in the younger group 0.72, ratio between the two 1.30, 95% CI 1.00-1.70, p = 0.054).

Discussion
It is established that vaccination protects against hospitalisation, ICU admission, and death from COVID-19 (Agrawal et al., 2021;Antonelli et al., 2022;Bahl et al., 2021;Butt et al., 2021;Cabezas et al., 2021;Glatman-Freedman et al., 2021;Haas et al., 2021;Hyams et al., 2021;Mateo-Urdiales et al., 2021;Roest et al., 2021) but little was known about other outcomes of breakthrough SARS-CoV-2 infections. The data presented in this study, from a large-scale electronic health records network, confirm that vaccination protects against death and ICU admission following breakthrough SARS-CoV-2 infection and provide estimates of HR for these outcomes in a general population. Our study also shows that vaccination against COVID-19 is associated with lower risk of additional outcomes that had not been assessed in previous studies, namely respiratory failure, hypoxaemia, oxygen requirement, hypercoagulopathy or venous thromboembolism, seizures, psychotic disorder, and hair loss.
On the other hand, previous vaccination did not appear to be protective against several previously documented outcomes of COVID-19 such as long-COVID features, arrhythmia, joint pain, type 2 diabetes, liver disease, sleep disorders, and mood and anxiety disorders. The narrow confidence intervals (related to the high incidence of these outcomes post-COVID) rules out the possibility that these negative findings are merely a result of lack of statistical power. The inclusion of death in a composite endpoint with these outcomes rules out survivorship bias as an explanation. Instead, these negative findings might indicate that these outcomes arise through different pathophysiological mechanisms than outcomes which are affected by prior vaccination. For example, for anxiety and depression, it might be that antagonistic forces are at play with the protective effect of the vaccine being counteracted Table 1 Baseline characteristics and major outcomes for the vaccinated and unvaccinated cohorts, before and after propensity score matching. Only characteristics with an overall prevalence above 5% after matching are presented here; for additional baseline characteristics see appendix Table A1. Vaccinated (unmatched) Unvaccinated ( by the additional stressor of being infected despite being vaccinated. The absence of a protective effect against long-COVID features is concerning given the high incidence and burden of these sequelae of COVID-19 (Taquet et al., 2021a). However, the risk of several individual long-COVID features were significantly associated with prior vaccination (but did not survive correction for multiple comparisons): myalgia (HR 0.78, 95% CI 0.67-0.91), fatigue (HR 0.89, 95% CI 0.81-0.97), and pain (HR 0.90, 95% CI 0.81-0.99), with potentially additional protection after a second dose of the vaccine against abnormal breathing (HR 0.89, 95% CI 0.81-0.98) and cognitive symptoms (HR 0.87, 95% CI 0.76-0.99). Relative differences in the incidence of individual long-COVID features might explain why our findings differ from those of an app-based survey suggesting that vaccination is overall protective against long-COVID symptoms (Antonelli et al., 2022). Other reasons might explain this difference. First, the present study used data routinely collected from a general population rather than self-selected individuals. Second, this study uses a large sample size and matches cohorts on a broad range of recorded comorbidities. To further decrease selection bias, the control cohort of this study was selected among those having received an influenza vaccine thus helping control for the confounding effect of vaccine hesitancy and related health behaviours (Latkin et al., 2021). Third, by using symptoms and diagnoses extracted from health records rather than self-reported, our study might capture the most severe presentation of symptoms and these might be less prone to demand characteristics and other forms of detection bias. Finally, there might be differences in the type of vaccines used between the two study. In Fig. 1. Hazard ratios for the outcome within 6 months of infection with SARS-CoV-2 between individuals vaccinated vs. unvaccinated against COVID-19. HR lower than 1 indicate outcomes less common among vaccinated individuals. Horizontal bars represent 95% confidence intervals. Each outcome is a composite outcome with death as a component to address competing risks. The contribution of the outcome of interest to the overall incidence of the composite endpoint is encoded by the colour. particular, no ChAdOx1 nCov-19 ('Oxford/AstraZeneca') vaccine was used in our study (as this vaccine was not used in the USA).
The findings that vaccination against SARS-CoV-2 does not protect against some of the post-acute outcomes of COVID-19 should not obscure the fact that vaccination remains an important protective factor against these outcomes at the population level, since the best way to prevent those outcomes is to prevent SARS-CoV-2 infection in the first place. This is also the case in those ≥60 years-old (Polack et al., 2020). However, these findings highlight that some post-acute outcomes of SARS-CoV-2 (and notably long-COVID presentations) are likely to persist even after successful vaccination of the population, so long as breakthrough infections occur. These findings thus help in determining the necessary service provision. They also underline the urgency to identify other preventive or curative interventions to mitigate the impact of such COVID-19 sequelae.
Death was included in a composite endpoint with each outcome of interest to address competing risks (Manja et al., 2017). This implies that for outcomes much rarer than death (which is the minority of outcomes investigated), HRs might be driven by death rather than the outcome of interest. For instance, the 6-month incidence of psychotic disorders in the unvaccinated cohort was 1.11% whereas the death rate was 4.5% in this cohort. Hence the HR of 0.75 for the composite endpoint of death and psychotic disorder is in part driven by differences in death rate. Observing the HR for the outcome of interest in isolation (i.e. not composed with death) is informative so long as it is lower than 1. A HR lower than 1 suggests a lower rate of the outcome of interest despite survivorship bias (which in the present study increases HRs). If the HR is higher than 1, it is impossible to know whether the outcome is more common among vaccinated individuals or if it is a result of survivorship bias. HRs for all outcomes in isolation are provided in Appendix Tables A18-A22 for the primary and secondary analyses. For instance, the HR for psychotic disorder measured in isolation was 0.79, indicating that the incidence of this outcome might indeed be lower in vaccinated than unvaccinated individuals and its significant HR might not be merely driven by differences in death rates.
Importantly, the protective effects of the vaccine against acute severity of infection and some post-acute sequelae appears to affect primarily those <60 years-old. In this group, the effects were large and robust, whereas in the ≥60 year-old group, effects were smaller and not statistically robust. This finding regarding age cannot be explained by differences in statistical power since the younger and older subgroups have similar sample sizes (4633 and 4657 respectively) and outcomes are in general more frequent in the older subgroup. Instead, it might be due to differences in contributions of pathophysiological pathways of breakthrough infections between younger and older individuals. In younger patients, effective B-cell response to vaccination might be followed by infection with variants against which antibodies have less neutralising activity (Wu et al., 2021). In older patients, the B-cell response to vaccination might itself be ineffective (Siegrist and Aspinall, 2009) so that the clinical presentation of breakthrough infection is similar to that of infection in unvaccinated individuals. Importantly, among older people, the vaccine remains effective at preventing severe COVID-19 outcomes by preventing COVID-19. Therefore our findings do not contradict reports of vaccine effectiveness against severe COVID-19 outcomes in older people for vaccine effectiveness also counts the number of outcomes prevented by preventing infection altogether.
This study has several limitations beyond those inherent to research using EHRs (Casey et al., 2016;Taquet et al., 2021b) (summarised in the Appendix methods A1) such as the unknown completeness of records, no validation of diagnoses, and sparse information on socioeconomic and lifestyle factors. First, we do not know which SARS-CoV-2 variant individual patients were infected with and this might affect the protective effect of vaccines. There is evidence that variants of concerns are overrepresented in breakthrough infections (McEwen et al., 2021). Since such variants tend to be associated with worse outcomes (Nyberg et al., 2021), their enrichment in breakthrough infections means that some HRs presented in this study might be conservative estimates. Given the time scale of the study, no individual infected with the B1.1.529 ('Omicron') variant were included. Second, it might be that vaccination status affects the probability to seek or receive medical attention, particularly for less severe outcomes. Third, this study says nothing about the outcomes in patients infected with SARS-CoV-2 but who did not have it recorded in their EHR (e.g. because they did not get tested). This implies that our data are probably enriched for more severe forms of COVID-19 which may influence the hazard ratios observed (in particular the apparent lack of association in the older subgroup). Fourth, this study was not designed to investigate whether the association between vaccination status and outcomes of subsequent SARS-CoV-2 infection was moderated by time interval between vaccine and infection; although we conducted this as a secondary analysis, its results should be interpreted with caution, as noted earlier. Fifth, we could not compare the different vaccines against each other since the majority had received the Pfizer/BioNTech vaccine. Sixth, prior infection was not used as a comparator to assess whether individuals with a second occurrence of SARS-CoV-2 infections are at a lower risk of COVID-19 outcomes. Seventh, no adjustment was made for the medications used at the time of COVID-19 infection. Finally, as an observational study, causation cannot be inferred (though the specificity of the association with some outcomes and not others and the dose-response relationship support causality as an explanation).
In summary, the present data show that prior vaccination against COVID-19, especially after two doses, is associated with significantly less risk of many but not all outcomes of COVID-19, in younger but not older individuals. These findings may inform service planning, contribute to forecasting public health impacts of vaccination programmes, and highlight the urgent need to identify or develop additional preventive and curative interventions for sequelae of COVID-19.

Role of funding source
The funding source had no role in study design; in the collection, analysis, and interpretation of data; in the writing of the report; and in the decision to submit the paper for publication.

Fig. 3.
Hazard ratios for the outcomes within 6 months of infection with SARS-CoV-2 between individuals who received one dose of the vaccine (vs. unvaccinated individuals), those who received two doses of the vaccine (vs. unvaccinated individuals), vaccinated vs. unvaccinated individuals under the age of 60, and vaccinated vs. unvaccinated individuals over the age of 60. HR lower than 1 indicate outcomes less common among vaccinated individuals. Horizontal bars represent 95% confidence intervals. Each outcome is a composite outcome with death as a component to address competing risks. The contribution of the outcome of interest to the overall incidence of the composite endpoint is encoded by the colour. Only a subset of representative outcomes is displayed. The same figures with all outcomes are presented in the Appendix Fig. A6-A9.

Contributors
MT and PJH conceptualised the study, acquired funding, defined the methodology, and were involved in project administration and supervision. MT and QD curated the data, did the formal analyses, and validated the findings. MT created visualizations and wrote the original draft. MT, QD, and PJH reviewed and edited the manuscript for style and content. PJH and MT were granted unrestricted access to the TriNetX Analytics network for the purposes of research, and with no constraints on the analyses performed nor the decision to publish. The corresponding author had full access to all the data in the study, conducted the analysis, and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.