Biological factors associated with long COVID and comparative analysis of SARS-CoV-2 spike protein variants: a retrospective study in Thailand

Background Post-acute COVID-19 syndrome (long COVID) refers to the persistence of COVID-19 symptoms, or the emergence of new symptoms, after recovery from the acute infection. Even though it is not fatal, it represents a significant global public health burden. Despite many reports on long COVID, its prevalence and associated biological factors remain unclear, and the available data are limited. This research aimed to determine the prevalence of long COVID during the two distinct epidemic periods in Thailand caused by the Delta and Omicron variants of SARS-CoV-2, and to investigate the biological factors associated with long COVID. In addition, the spike protein amino acid sequences of the Delta and Omicron variants were compared to determine the frequency of mutations and their potential biological implications. Methods A retrospective cross-sectional study was conducted among participants with confirmed COVID-19 at Maharat Nakhon Ratchasima Hospital who had been infected between June 2021 and August 2022 and had recovered for at least three months. Demographic data and long COVID experience were collected via telephone interview. Biological factors were analyzed using binary logistic regression. SARS-CoV-2 spike protein amino acid sequence datasets of the Delta and Omicron variants in Thailand were retrieved from GISAID to determine mutation frequencies and to identify possible roles of the mutations based on published data. Results Data were collected from a total of 247 participants: 106 from the Delta and 141 from the Omicron epidemic period. Apart from COVID-19 severity and health status, the baseline data of participants from the two periods were remarkably similar. The prevalence of long COVID was higher in the Omicron period than in the Delta period (74.5% vs. 66.0%). The biological factors associated with long COVID were epidemic variant, age, treatment with symptomatic medicines, and vaccination status. 
When the spike protein sequences of the two variants were compared, the Omicron variant exhibited more amino acid changes in its receptor-binding domain (RBD) and receptor-binding motif (RBM). Critical Omicron changes within these regions have been reported to enhance virus transmissibility and resistance to the host immune response. Conclusion This study provides informative data on long COVID in Thailand. More attention should be given to long COVID caused by particular virus variants and to other biological factors in order to prepare healthcare management strategies for COVID-19 patients after recovery.

may have different effects on the occurrence of long COVID symptoms. To our knowledge, no published data have demonstrated a correlation between viral genetic markers and long COVID outcomes. This may be because most researchers consider long COVID to be associated with specific host genetic variants (Batiha et al., 2022).
Besides the genetic variants of SARS-CoV-2, long COVID might be connected to several host biological variables. The development of long COVID has been associated with several characteristics, including illness severity, older age, sex, and elevated levels of certain inflammatory markers (Batiha et al., 2022). Nevertheless, data on Asian populations, particularly the Thai population, are extremely limited. This study aimed to identify biological risk factors associated with the development of long COVID in Thai patients, focusing on SARS-CoV-2 variants and host factors. Furthermore, the amino acid variations found in the spike protein of the SARS-CoV-2 Delta and Omicron variants were compared with respect to their roles in pathogenesis and transmission.

Study design and participants
This cross-sectional study collected retrospective data from confirmed COVID-19 cases who acquired SARS-CoV-2 infection during the Delta and Omicron epidemic periods, as reported by the Department of Medical Sciences of the Ministry of Public Health in Thailand (DMSc, 2023). Data were analyzed to determine differences in the prevalence and characteristics of long COVID symptoms between the two periods and to identify biological factors associated with long COVID. A minimum sample size of 206 COVID-19-positive cases was required. This was calculated using the n4Studies application (https://www.facebook.com/n4Studies/) with the method for estimating a proportion in an infinite population, setting a prevalence difference proportion of 16% (ONS, 2022) and an alpha error of 0.05. Medical records of confirmed SARS-CoV-2 RT-qPCR-positive cases at Maharat Nakhon Ratchasima Hospital were reviewed for two periods: June to November 2021 (Delta period) and December 2021 to the first week of August 2022 (Omicron period). All confirmed COVID-19 cases who were ≥ 18 years old, were infected by SARS-CoV-2 for the first time within these periods, and had recovered from the infection for at least three months were eligible. Eligible individuals were approached at random via telephone between March and April 2023 and asked to provide verbal consent for a single telephone interview. Recall bias could compromise the validity of the assessment. To reduce it, we first evaluated participants' recall by asking them when they had been infected, and all participants were able to recall the period of their infection. Individuals who were pregnant during the COVID-19 infection, could not be contacted, had no recollection of their long COVID experience, had died before the interview, or could not communicate in Thai were excluded.
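The sample-size step can be checked against the usual formula for estimating a single proportion in an infinite population, n = z²p(1 - p)/d². The text gives p = 0.16 and alpha = 0.05 but does not state the absolute precision d. The sketch below therefore assumes d = 0.05. It illustrates the standard formula and is not a reproduction of the n4Studies implementation.

```python
from math import ceil
from statistics import NormalDist


def sample_size_single_proportion(p: float, d: float, alpha: float = 0.05) -> float:
    """Sample size for estimating a proportion in an infinite population.

    n = z^2 * p * (1 - p) / d^2, where z is the two-sided standard normal
    critical value for the chosen alpha.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return z ** 2 * p * (1 - p) / d ** 2


# p = 0.16 comes from the text; d = 0.05 is an ASSUMED absolute precision,
# because the text does not report it.
n = sample_size_single_proportion(p=0.16, d=0.05, alpha=0.05)
print(f"n = {n:.1f} (rounded up: {ceil(n)})")
```

Under these assumptions n is about 206.5. This is close to the reported minimum of 206, although conventional rounding up would give 207.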
A peer-evaluated questionnaire was used to record retrospective self-reported data from each participant during the interview. The questionnaire covered basic demographic data and the participant's experience of COVID-19, including COVID-19 severity, the presence of long COVID symptoms, drug(s) used for treatment, and vaccination status. Following the World Health Organization (WHO, 2022a), long COVID was defined in this study as the persistence of existing infection symptoms or the emergence of new symptoms within three months of the initial SARS-CoV-2 infection. These symptoms had to last at least two months and could not be explained by an alternative diagnosis. The spectrum of long COVID disorders was grouped into seven categories: cardiovascular (chest pain, fast-beating heart or palpitations); neurology (headaches, lightheadedness, pins-and-needles sensations, change in smell or taste, difficulty thinking or concentrating); dermatology (hair loss, rash); psychology (depression or anxiety, sleep problems); respiratory (difficulty breathing or shortness of breath, coughing); generalized symptoms (fatigue and myalgia); and others (diarrhea, stomach pain, constipation, etc.).
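The case definition above can be expressed as a small predicate over self-reported symptoms. This is a hypothetical sketch, not part of the study protocol. The record layout (`SymptomReport`) is an assumption, and so is the conversion of "three months" and "two months" into 90 and 60 days.

```python
from __future__ import annotations

from dataclasses import dataclass

# The seven symptom categories used in this study.
CATEGORIES = {"cardiovascular", "neurology", "dermatology", "psychology",
              "respiratory", "generalized", "others"}

ONSET_LIMIT_DAYS = 90    # ASSUMED day count for "within three months"
MIN_DURATION_DAYS = 60   # ASSUMED day count for "at least two months"


@dataclass
class SymptomReport:
    """One self-reported symptom (hypothetical record layout)."""
    category: str
    onset_day: int           # days after infection; 0 = present during acute illness
    duration_days: int       # how long the symptom persisted
    other_explanation: bool  # True if an alternative diagnosis explains it

    def __post_init__(self) -> None:
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")


def meets_definition(report: SymptomReport) -> bool:
    # Onset within ~3 months, lasting >= ~2 months, no alternative explanation.
    return (report.onset_day <= ONSET_LIMIT_DAYS
            and report.duration_days >= MIN_DURATION_DAYS
            and not report.other_explanation)


def long_covid_categories(reports: list[SymptomReport]) -> list[str]:
    """Sorted categories with at least one qualifying symptom (empty = no long COVID)."""
    return sorted({r.category for r in reports if meets_definition(r)})


reports = [
    SymptomReport("generalized", onset_day=0, duration_days=120, other_explanation=False),
    SymptomReport("dermatology", onset_day=30, duration_days=20, other_explanation=False),
    SymptomReport("respiratory", onset_day=150, duration_days=90, other_explanation=False),
]
print(long_covid_categories(reports))
```

In this toy example, only the persisting fatigue-type symptom qualifies. The rash is too short-lived, and the respiratory symptom begins too late.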
COVID-19 severity was assessed in the questionnaire using an explanation of specific classification criteria. The condition was classified as mild, moderate, or severe according to the NIH categories (NIH, 2022). Participants were asked which drug(s) they had used for treatment from the five most commonly used categories in Thailand. These were Andrographolide herbal medicine, Favipiravir, Molnupiravir, symptomatic medicines, and steroids and others. Symptomatic medicines are any therapy that treats only the symptoms of a disease, not its underlying cause, such as antipyretic, cold, or cough medicines.
This study was approved by the Ethical Committee of the Maharat Nakhon Ratchasima Hospital Institutional Review Board (MNRH IRB), Ministry of Public Health, Thailand (Certificate No. 139/2022). Participant identifiers were removed from the data and replaced with a participant code. The collected data will be destroyed immediately after publication.

Comparing genetic variations in the spike gene between the Delta and Omicron variants
The Global Initiative for Sharing Avian Influenza Data (GISAID) database (https://gisaid.org) is a free online database. Its purpose is to facilitate the rapid exchange of data on key pathogens, such as influenza, SARS-CoV-2, respiratory syncytial virus (RSV), Mpox virus, and arboviruses such as chikungunya, dengue, and Zika. The database offers genetic sequences and related clinical and epidemiological data for human viruses. It also provides geographical and species-specific data for avian and other animal viruses. Together, these data improve understanding of the evolution and transmission of viruses during epidemics and pandemics.
In this study, the GISAID database was searched for Delta and Omicron SARS-CoV-2 variant datasets collected in Thailand between July 1, 2021 and August 31, 2022. Only complete sequences were included: 206 Delta and 292 Omicron sequences. These sequences were derived from individuals with different backgrounds in terms of gender, clinical status, and area of residence in Thailand. Mutations causing amino acid changes within the spike protein of the two variants were examined to compare the percentage of accumulated mutation frequency. Key mutations were then reviewed. These were defined as mutations whose percentage of accumulated mutation frequency differed between the two variants by more than 50%. The review assessed whether their roles in pathogenesis, transmission, and immune evasion might help to explain long COVID outcomes.
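The frequency comparison and the key-mutation rule can be sketched as follows. The sketch assumes each sequence is summarized as a set of amino acid changes (e.g. `N501Y`, `L24del`) relative to the reference spike. It reads "a difference ... over 50%" as a difference of more than 50 percentage points. The helper names and the toy sequences are hypothetical and are not GISAID data.

```python
from __future__ import annotations

from collections import Counter


def mutation_frequencies(sequences: list[set[str]]) -> dict[str, float]:
    """Percentage of sequences that carry each amino acid change."""
    total = len(sequences)
    counts = Counter(change for changes in sequences for change in changes)
    return {change: 100.0 * n / total for change, n in counts.items()}


def key_mutations(freq_a: dict[str, float], freq_b: dict[str, float],
                  threshold: float = 50.0) -> dict[str, float]:
    """Changes whose frequency differs by more than `threshold` percentage points.

    Positive values: more frequent in variant A; negative: more frequent in B.
    """
    changes = set(freq_a) | set(freq_b)
    diffs = {c: freq_a.get(c, 0.0) - freq_b.get(c, 0.0) for c in sorted(changes)}
    return {c: d for c, d in diffs.items() if abs(d) > threshold}


# Toy, hypothetical mutation sets (four sequences per variant), NOT GISAID data.
delta_toy = [
    {"L452R", "T478K", "P681R"},
    {"L452R", "T478K", "P681R"},
    {"L452R", "T478K"},
    {"T478K", "P681R"},
]
omicron_toy = [
    {"T478K", "N501Y", "P681H"},
    {"T478K", "N501Y", "P681H"},
    {"T478K", "N501Y"},
    {"T478K", "N501Y", "P681H"},
]

delta_freq = mutation_frequencies(delta_toy)
omicron_freq = mutation_frequencies(omicron_toy)
keys = key_mutations(delta_freq, omicron_freq)
print(keys)
```

A change shared at equal frequency by both toy variants (here T478K) is not flagged, even though it is common. The rule selects mutations that distinguish the variants, not mutations that are merely frequent.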

Statistical analysis
All data were analyzed using the Statistical Package for the Social Sciences (Version 25.0; SPSS Inc., Chicago, IL, USA). Frequencies, percentages, means, and standard deviations were calculated for three purposes: to characterize baseline demographic data, to compare long COVID data between the Delta and Omicron epidemic periods, and to compare genetic variations in the spike protein of the SARS-CoV-2 variants. Binary logistic regression with the enter method (all predictors entered simultaneously) was used to determine the association between biological factors and long COVID outcome. The strength of association was expressed as odds ratios with 95% confidence intervals, and P < 0.05 was considered significant. Goodness of fit for the binary logistic regression model was summarized by the -2 log-likelihood (-2LL) and Nagelkerke R² (pseudo-R²).
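The two model-summary statistics are related. With D denoting -2LL, Cox-Snell R² = 1 - exp((D_model - D_null)/n). Nagelkerke R² divides this by its maximum attainable value, 1 - exp(-D_null/n), so that it can reach 1. The sketch below is a minimal illustration of these formulas, not SPSS output. The null-model -2LL uses 175 long COVID cases out of 247, back-calculated from the reported prevalences (66.0% of 106 and 74.5% of 141). The fitted-model -2LL of 260.0 is an arbitrary illustrative value, not a result of this study.

```python
from math import exp, log


def null_minus2ll(events: int, n: int) -> float:
    """-2 log-likelihood of an intercept-only (null) logistic model."""
    p = events / n
    return -2.0 * (events * log(p) + (n - events) * log(1.0 - p))


def cox_snell_r2(d_null: float, d_model: float, n: int) -> float:
    # d_null, d_model: -2LL of the null and fitted models
    return 1.0 - exp((d_model - d_null) / n)


def nagelkerke_r2(d_null: float, d_model: float, n: int) -> float:
    # Divide Cox-Snell R² by its maximum possible value so that it can reach 1
    max_cox_snell = 1.0 - exp(-d_null / n)
    return cox_snell_r2(d_null, d_model, n) / max_cox_snell


n = 247
d_null = null_minus2ll(events=175, n=n)  # 175 back-calculated from reported prevalences
d_model = 260.0  # ILLUSTRATIVE value only, not a result of this study
print(f"-2LL null = {d_null:.2f}")
print(f"Nagelkerke R² = {nagelkerke_r2(d_null, d_model, n):.3f}")
```

A lower model -2LL relative to the null indicates a better fit. Because Nagelkerke R² is a rescaled pseudo-R², it should be read as an approximate measure of explanatory power rather than a literal proportion of variance.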

Demographic characteristics
A total of 490 confirmed COVID-19 cases at Maharat Nakhon Ratchasima Hospital were randomly selected and asked for informed consent to participate in a telephone interview. All had been infected between June 2021 and August 2022, during the Delta and Omicron epidemic periods in Thailand. A total of 251 individuals enrolled voluntarily, of whom 247 met the inclusion and exclusion criteria and were included in the analysis. Of these, 106 were infected during the Delta epidemic period and 141 during the Omicron epidemic period. The flowchart of the enrollment process is shown in Fig. 1.
Table 1 compares the baseline demographics and clinical characteristics of confirmed COVID-19 cases from the Delta and Omicron epidemic periods. The mean age of the participants was 40.75 ± 14.20 years in the Delta period and 35.61 ± 11.70 years in the Omicron period. Female sex, age 18 to 29 years, obesity, and blood group O were the most prevalent characteristics in both epidemic periods. Nevertheless, during the Delta period, more than half of the cases (n = 54, 50.9%) involved immunocompromised hosts with at least one underlying disease. Hypertension, diabetes, hypercholesterolemia, asthma, and allergies were the most common underlying conditions. In contrast, infection with SARS-CoV-2 during the Omicron period primarily affected healthy hosts (n = 125, 88.7%). More than one-third of the Delta cases (n = 38, 35.8%) had symptoms classified as moderate to severe, whereas the Omicron cases predominantly had mild symptoms. Favipiravir and symptomatic medicines were the main treatments used by participants in both periods. Most participants infected during the Delta period had not yet been vaccinated against COVID-19 (n = 63, 59.4%). Conversely, more than half of those infected during the Omicron period had received at least three vaccine doses (n = 80, 56.7%). Regarding vaccine type, inactivated vaccine was predominant during the Delta period (n = 24, 55.8%), whereas mRNA vaccine was predominant during the Omicron period (n = 75, 56.4%). Notably, most participants in both epidemic periods reported long COVID, and several symptoms had comparable prevalence in the two periods.
Fig. 2 compares the prevalence of self-reported long COVID symptoms between the Delta and Omicron epidemic periods. Symptoms were grouped into seven categories: cardiovascular, neurology, dermatology, psychology, respiratory, generalized symptoms of fatigue and myalgia, and others. Symptoms from every category were reported in both groups, but the frequency of each symptom differed between the groups. Participants infected during the Delta period predominantly reported psychological disorders, including insomnia, anxiety, and depression. In contrast, respiratory and neurological abnormalities were more prevalent in the Omicron period.
Table 2 compares the baseline demographics of participants without and with long COVID. Non-long COVID cases had a mean age of 40.86 ± 13.74 years, and long COVID cases had a mean age of 36.56 ± 12.59 years. Frequencies were comparable between the two groups for almost all analyzed variables. Notably, 60% of the long COVID cases had been infected during the Omicron period and 40% during the Delta period. In addition, participants with moderate to severe COVID-19 had a higher rate of long COVID.
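Prevalence within each period and the share of long COVID cases contributed by each period are different quantities. The sketch below computes both from back-calculated counts. These counts come from the reported prevalences (66.0% of 106 and 74.5% of 141); rounding them to whole participants is an assumption. The resulting total of 175 matches the n given for Fig. 2.

```python
# Participants per epidemic period (from the text).
n_delta, n_omicron = 106, 141

# Long COVID counts back-calculated from the reported prevalences;
# rounding to whole participants is an ASSUMPTION.
lc_delta = round(0.660 * n_delta)
lc_omicron = round(0.745 * n_omicron)
lc_total = lc_delta + lc_omicron

# Prevalence: long COVID cases / all participants in that period.
prev_delta = lc_delta / n_delta
prev_omicron = lc_omicron / n_omicron

# Share: long COVID cases from that period / all long COVID cases.
share_delta = lc_delta / lc_total
share_omicron = lc_omicron / lc_total

print(f"Long COVID cases: Delta {lc_delta}, Omicron {lc_omicron}, total {lc_total}")
print(f"Prevalence: Delta {prev_delta:.1%}, Omicron {prev_omicron:.1%}")
print(f"Share of long COVID cases: Delta {share_delta:.0%}, Omicron {share_omicron:.0%}")
```

The 60%/40% split therefore describes the composition of the long COVID group. The within-period prevalences are 74.5% and 66.0%.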
A binary logistic regression analysis was used to determine biological factors associated with long COVID. Variant epidemic period, age, treatment with symptomatic medicines, and vaccination status were found to be significant predictors of long COVID outcomes.

Notes.
Data are reported as n (%) for categorical variables and mean ± SD for continuous variables. a Based on Weir & Jan (2023). b Based on WHO (2022a). c Any form of medical drug that targets only the symptoms of a disease, such as antipyretic, cold, or cough medicines, rather than its underlying cause. Abbreviations: BMI, Body Mass Index; No, Number.
The odds of developing long COVID were significantly higher for SARS-CoV-2 infection during the Omicron period than during the Delta period [OR = 2.976 (95% CI [1.202-7.365]); p = 0.018]. Each 1-year increase in age was associated with lower odds of developing long COVID [OR = 0.969 (95% CI [0.942-0.996]); p = 0.025]. Compared with other antiviral and immunosuppressive drugs, treatment of COVID-19 with symptomatic medications was associated with higher odds of developing long COVID [OR = 3.804 (95% CI [1.149-12.590]); p = 0.029]. Notably, compared with vaccination with at least one dose, non-vaccination was a significant risk factor for long COVID (p = 0.026). Together, the ten independent variables shown in Table 2 accounted for 19.3% of the variation in long COVID outcome. The remaining 80.7% may be attributable to variables not included in the binary logistic regression model. The dataset for this analysis is available in File S1.
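Each coefficient, its standard error, and a two-sided Wald p-value can be recovered from a published OR and 95% CI. This assumes the CIs are Wald intervals of the form exp(β ± 1.96·SE), the usual construction for binary logistic regression in SPSS. The sketch below is a consistency check on rounded published values, not a re-analysis. The vaccination estimate is omitted because its OR and CI are not given in this section.

```python
from math import log
from statistics import NormalDist


def wald_from_or_ci(odds_ratio: float, lower: float, upper: float,
                    level: float = 0.95) -> tuple[float, float, float, float]:
    """Recover beta, SE, z and a two-sided p-value from an OR and its CI.

    ASSUMES a Wald interval: CI = exp(beta +/- z_crit * SE).
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    beta = log(odds_ratio)
    se = (log(upper) - log(lower)) / (2 * z_crit)
    z = beta / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return beta, se, z, p


# (OR, lower CI, upper CI, reported p) taken from the text above.
reported = {
    "Omicron vs Delta period": (2.976, 1.202, 7.365, 0.018),
    "Age (per 1 year)": (0.969, 0.942, 0.996, 0.025),
    "Symptomatic medicines": (3.804, 1.149, 12.590, 0.029),
}

for label, (odds_ratio, lower, upper, p_reported) in reported.items():
    beta, se, z, p = wald_from_or_ci(odds_ratio, lower, upper)
    print(f"{label}: beta={beta:.3f}, SE={se:.3f}, z={z:.2f}, "
          f"p={p:.3f} (reported {p_reported})")
```

The recovered p-values differ from those reported by roughly 0.002 or less. This is what one would expect when the ORs and CIs are rounded to three decimals.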

Genetic variations in the spike gene between the Delta and Omicron variants isolated in Thailand
The SARS-CoV-2 spike protein has been hypothesized to contribute to long COVID, particularly neuropsychiatric symptoms (Theoharides, 2022). Mutations in this protein in the Delta and Omicron variants may therefore influence the prevalence and characteristics of long COVID differently. This analysis compared the mutation frequencies observed in each region of the spike protein across the two variants, as shown in Fig. 3. Relative to the wild-type spike protein sequence, most amino acid changes were observed in the S1 subunit rather than the S2 subunit. The Omicron variant had more amino acid changes than the Delta variant in nearly all regions of the S1 subunit, except the signal peptide (SP). Amino acid substitutions in the fusion peptide (FP) and heptad repeat 1 (HR1) regions of the S2 subunit were similar between the two variants. No amino acid changes were observed in the cytoplasmic domain (CD) or the S2′ protease cleavage site of the S2 subunit in the Omicron variant.
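Assigning each amino acid change to a spike region can be sketched as below. The region boundaries are an assumption based on one commonly used annotation of the SARS-CoV-2 spike (for example, SP 1-13, NTD 14-305, RBD 319-541, and RBM 437-508). Fig. 3 may use slightly different limits. The parser and function names are hypothetical.

```python
from __future__ import annotations

import re

# ASSUMED region boundaries (1-based residue positions in the reference
# spike); one commonly cited annotation, not taken from this study.
SPIKE_REGIONS = [
    ("SP", 1, 13),
    ("NTD", 14, 305),
    ("RBD", 319, 541),
    ("RBM", 437, 508),        # the RBM lies inside the RBD
    ("S1/S2 site", 682, 685),
    ("FP", 788, 806),
    ("S2' site", 815, 816),
    ("HR1", 912, 984),
    ("HR2", 1163, 1213),
    ("TM", 1213, 1237),
    ("CD", 1237, 1273),
]
S1_LAST_RESIDUE = 685  # ASSUMED S1/S2 boundary

# Reference residue, position, then a substituted residue or "del".
MUTATION_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z]|del)$")


def parse_mutation(label: str) -> tuple[str, int, str]:
    match = MUTATION_PATTERN.match(label)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def locate(label: str) -> tuple[str, list[str]]:
    """Return the subunit and every annotated region containing the mutation."""
    _, position, _ = parse_mutation(label)
    subunit = "S1" if position <= S1_LAST_RESIDUE else "S2"
    regions = [name for name, start, end in SPIKE_REGIONS if start <= position <= end]
    return subunit, regions


for label in ["L24del", "A27S", "R408S", "N501Y"]:
    print(label, locate(label))
```

Under these assumed boundaries, the mutations of unknown function named in the results (L24del, P25del, P26del, A27S, R408S) fall in the NTD and RBD. A position can belong to more than one region because the RBM is nested in the RBD.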
The frequency of each mutation in the S1 and S2 subunits was compared between the two variants and is presented as heatmaps in Figs. 4 and 5. The predominant high-frequency mutations were found in the S1 subunit of both variants. They were concentrated in the N-terminal domain (NTD), receptor-binding domain (RBD), receptor-binding motif (RBM), and the protease cleavage site between the S1 and S2 subunits (S1/S2). In the S2 subunit, high-frequency mutations were observed in the FP region of the Omicron variant and the HR1 region of both variants. Finally, Table 3 summarizes a literature review of established or suspected evidence for the pathogenic roles of the key mutations of each variant. Key mutations were defined here as those with a between-variant difference in percentage frequency greater than 50%. The Omicron sequences from Thailand had more key mutations than the Delta sequences. Most key mutations in the Delta sequences were associated with enhanced ACE2 binding affinity, viral infectivity, and immune evasion. The only Delta key mutation without any evidence regarding its function was D950B, located in the HR1 region of the S2 subunit. Most key mutations in the Omicron variant had functions comparable to those in the Delta variant. However, a few had additional roles in facilitating spike cleavage, virus fusion, and virus transmissibility, and impairment of spike function has also been reported. In addition, the functions of five mutations in the NTD and RBD of the S1 subunit remain unknown: L24del, P25del, P26del, A27S, and R408S. The dataset for this analysis is available in File S2.

DISCUSSION
The potential for a high public health burden due to long COVID must be considered amid the ongoing global transmission of SARS-CoV-2 and its rapidly emerging variants. Research comparing long COVID prevalence and characteristics across virus variants is scarce. This study therefore collected data on long COVID prevalence and clinical symptoms in the Delta and Omicron epidemic periods in Thailand.
Consistent with previous studies, most participants in both epidemic periods were female (Antonelli et al., 2022; WHO, 2022b). In our sample, healthy people were infected more often during the Omicron period than during the preceding Delta period. This may reflect the increased transmissibility of Omicron. All SARS-CoV-2 strains appear capable of inducing long COVID. However, reported prevalence varies considerably between studies owing to differences in participant selection, study sites, ethnicity, self-report bias, and time frame of analysis (Antonelli et al., 2022; Du et al., 2022). Over sixty percent of participants in each of our two groups had long COVID. The long COVID prevalence in the Delta period found in this study was comparable with a previous report from Thailand (Wangchalabovorn, Weerametachai & Leesri, 2022). However, self-reported data may be influenced by confounding factors, especially recall bias for events that occurred long ago. This study reduced this bias by using a question to assess participants' recall, but this may not have eliminated it entirely. Post-acute COVID-19 syndrome, colloquially known as long COVID, comprises a wide spectrum of clinical symptoms, as described in this study. Although the Delta and Omicron groups presented long COVID characteristics at varying rates, all symptom categories were observed in both groups. As reported in studies from Europe and Asia (Menges et al., 2021; Xiong et al., 2021), fatigue (categorized here as a generalized symptom) and respiratory symptoms were prevalent. The precise mechanisms underlying long COVID remain unknown, but several hypotheses have been proposed (Batiha et al., 2022; Chen et al., 2023). These include immune dysregulation, persistent inflammatory reactions, and antibody-dependent enhancement (ADE) from a non-neutralizing antibody response. Other proposed mechanisms are autoimmune mimicry, viral persistence, reactivation of latent pathogens, and alterations in the host microbiome. With respect to viral persistence and shedding, the longest reported durations of viral RNA shedding were 83, 59, 126, and 60 days in the upper respiratory tract, lower respiratory tract, gastrointestinal tract, and blood, respectively (Chen et al., 2023). The sustained presence of SARS-CoV-2 RNA or antigen, and the immune responses directed against it, may determine the development of long COVID (Files et al., 2021; Chen et al., 2023).
In this study, binary logistic regression identified four biological factors significantly associated with long COVID: variant epidemic period, age, treatment with symptomatic medicines, and vaccination status. Infection during the Omicron period was associated with almost three-fold higher odds of long COVID sequelae compared with the Delta period. This finding differs from the results of previous studies (Antonelli et al., 2022; Du et al., 2022). In their systematic review and meta-analysis, Du et al. (2022) found no significant difference in long COVID prevalence between strains but did identify significant differences in the specific symptoms associated with each strain. Our Omicron period was long, covering the BA.1, BA.2, BA.4, and BA.5 sub-variant epidemics, which carry many critical mutations that increase fitness for infection (Tian et al., 2022). This difference in the analysis period may account for our result. In agreement with previous studies (Peghin et al., 2021; Maglietta et al., 2022; Notarte et al., 2022; Subramanian et al., 2022; Yoo et al., 2022), older age did not increase the likelihood of long COVID in this analysis. However, this differs from some other studies (Sudre et al., 2021; Thompson et al., 2022). To our knowledge, this study is the first to identify treatment with symptomatic medications and vaccination status as substantial risk factors for the development of long COVID. Long COVID may be caused by persistence of viral antigen that drives ongoing activation of the host immune response. The use of symptomatic medications lacking antiviral activity might therefore favor its development. Compared with those who had received at least one vaccine dose, unvaccinated individuals had six-fold higher odds of developing long COVID. This finding is supported by a recent cohort study (Catala et al., 2024). Moreover, consistent with other prior studies, our analysis found no association between long COVID development and female sex or initial COVID-19 severity (Townsend et al., 2020; Simani et al., 2021; Al-Kuraishy et al., 2022; Al-Thomali et al., 2022).
The accumulation of mutations in the SARS-CoV-2 spike protein gene is a significant factor in the emergence of novel variants of concern (VOCs). The Omicron variant exhibited more mutations than earlier VOCs (Alpha, Beta, Gamma, and Delta). More than 60% of its mutations accumulated in the spike protein gene rather than in other regions of the genome (Tian et al., 2022). Viral infectivity and transmissibility are determined by amino acid changes in the receptor-binding domain (RBD) and receptor-binding motif (RBM) of the SARS-CoV-2 spike protein. Furthermore, resistance to the host antibody (Ab) response is attributed to amino acid changes in the spike protein, which is the primary target of neutralizing antibodies (Tian et al., 2022). Recently, it has been speculated that the spike protein, either alone or together with other inflammatory mediators, may contribute to the induction of long COVID (Theoharides, 2022).
Unfortunately, spike protein amino acid sequences from participants with and without long COVID in this study were not available. Viral mutations associated with the development of long COVID therefore could not be compared and discussed explicitly. Instead, we performed a comprehensive analysis of the available spike protein sequence datasets retrieved from GISAID. Because of this constraint, a long COVID outcome cannot be directly attributed to a specific viral mutation. Nevertheless, the analysis of spike protein amino acid sequences from the Delta and Omicron variants in Thailand yielded insights into the frequency of mutations and the function of critical mutations in these two variants. Consistent with a previous study (Harvey et al., 2021), the Omicron dataset contained at least twice as many mutations in the RBD and RBM as the Delta dataset. This variation in a critical region of the spike protein might support an association between spike protein function and the progression of long COVID. Some amino acid changes in the RBD and RBM enhance binding affinity for the angiotensin-converting enzyme 2 (ACE2) receptor, which SARS-CoV-2 uses for cell entry. Enhanced binding to ACE2 on mast cells has been hypothesized to be linked to mast cell activation syndrome (MCAS), one of the proposed underlying mechanisms of long COVID. Dysregulated release of inflammatory mediators in MCAS may induce unusual symptoms in long COVID patients (Batiha et al., 2022).

CONCLUSION
Our retrospective cross-sectional study identified SARS-CoV-2 variant, age, treatment with symptomatic medications, and non-vaccination as biological factors associated with long COVID. Identifying these factors should help in developing a suitable health management strategy. Furthermore, the frequency and biological implications of mutations observed in the spike protein gene of the Delta and Omicron variants could help explain long COVID development. They could also provide a scientific reference for monitoring, prevention, and vaccine development.

Figure 2 Comparison of the percentage of participants reporting each self-reported long COVID symptom in the Delta and Omicron variant periods (total n = 175). DOI: 10.7717/peerj.17898/fig-2

Kiatratdasakul et al. (2024), PeerJ, DOI 10.7717/peerj.17898
Table 2 Baseline demographics of the study population comparing non-long COVID and long COVID cases, and variables associated with the risk of long COVID (n = 247).
a Reference group: participants who did not use the drug category in question. b Any form of medical drug that targets only the symptoms of a disease, such as antipyretic, cold, or cough medicines, rather than its underlying cause. c The -2 log-likelihood (-2LL) and Nagelkerke R² (pseudo-R²) are reported to summarize model goodness of fit following binary logistic regression. *Significant at P < 0.05.