Presence, Location and Conservation of Putative G-Quadruplex Forming Sequences in Arboviruses Infecting Humans

Guanine quadruplexes (G4s) are non-canonical nucleic acid structures formed by guanine (G)-rich tracts that assemble into a core of stacked planar tetrads. G4s are found in the human genome and in the genomes of human pathogens, where they are involved in the regulation of gene expression and genome replication. G4s have been proposed as novel pharmacological targets in humans and their exploitation for antiviral therapy is an emerging research topic. Here, we report on the presence, conservation and localization of putative G4-forming sequences (PQSs) in human arboviruses. The prediction of PQSs was performed on more than twelve thousand viral genomes, belonging to forty different arboviruses that infect humans, and revealed that the abundance of PQSs in arboviruses is not related to the genomic GC content, but depends on the type of nucleic acid that constitutes the viral genome. Positive-strand ssRNA arboviruses, especially Flaviviruses, are significantly enriched in highly conserved PQSs, located in coding sequences (CDSs) or untranslated regions (UTRs). In contrast, negative-strand ssRNA and dsRNA arboviruses contain few conserved PQSs. Our analyses also revealed the presence of bulged PQSs, accounting for 17–26% of the total predicted PQSs. The data presented highlight the presence of highly conserved PQS in human arboviruses and present non-canonical nucleic acid-structures as promising therapeutic targets in arbovirus infections.


Introduction
Vector-borne diseases are bacterial, viral, or parasitic infections transmitted to the human host through the bite of infected arthropod species, such as mosquitoes, ticks, midges, and flies [1]. In 2020, the World Health Organization (WHO) declared that vectorborne diseases accounted for approximately 20% of all infectious diseases [2]. In the case of viral infections, arthropod-borne virus (arbovirus) infections are a global threat, as travel and trade contribute to the spread of vectors and viruses over large geographical areas, and climate change favours disease transmission [3][4][5]. Arboviruses are a large group of RNA viruses belonging to different families and genera, of which about fifty members are known to infect humans. A few members of arboviruses cause mild flu-like symptoms and joint pain. The vast majority of Arboviruses cause severe and life-threatening disease, with mortality rates as high as 50% [2,6,7]. Specific anti-arbovirus treatments are not available and vaccines have been developed against less than 10% of the arboviruses. As a result, arbovirus infections are controlled solely by prevention strategies to hinder the spread of viruses in the environment and among humans. Efforts to prevent and treat vector-borne viral diseases must be intensified. In fact, in 2022, WHO launched the Global Arbovirus Initiative to promote all initiatives to control arboviruses with epidemic and pandemic potential [8].
Approved antiviral drugs target viral proteins involved in key viral steps, from viral entry to viral gene expression and genome replication. Direct targeting of nucleic acid is very sporadic, as achieving selective targeting has always been extremely challenging. Nucleic acids have been shown to fold into structures alternative to the classical double helix, which do not obey the Watson-Crick hybridization canon and are therefore defined as noncanonical nucleic acid structures. Among these non-canonical structures, G-quadruplexes (G4s) have been shown to play key biological roles both at the human and viral level [9][10][11][12][13][14][15]. G4s can form in G-rich sequences of DNA or RNA, in which four guanines (Gs) are linked by Hoogsteen-type hydrogen bonds to form planar square structures called G-quartets. The stacking of successive G-quartets leads to the formation of the G4 structure, which is supported and stabilized by physiological cations, such as potassium or sodium [16]. G4s have been identified primarily in mammalian cells, but more recently their presence in viruses, bacteria and parasites has also been investigated [17]. At the viral level, G4s are involved in the control of key viral processes, such as transcription, genome replication and the induction or maintenance of the viral latency [12].
Several algorithms have been validated to predict the presence and distribution of putative quadruplex (G4)-forming sequences (PQSs) in genomes [18]. The different algorithms calculate the presence of PQSs or the G4 folding propensity, taking into account the number of Gs and G islands as well as the loop length. G4 prediction algorithms have been trained on the human genome [18,19]. To date, few bioinformatic analyses have predicted PQSs in microorganisms using tools such as the well-established QGRS and G4Hunter [18][19][20][21][22][23][24][25]. Viral genomes have been shown to contain G4s that do not strictly follow the rules of canonical G4s, but include bulges, mismatches and stem loops [26,27]. Therefore, PQS prediction algorithms that take into account the possibility of G4s folding from imperfect G-runs should be used to better estimate the presence of PQSs on viral genomes. Recently, Bioconductor's pqsfinder tool was released as a flexible tool for analyzing putative PQSs that also contain bulges or mismatches [28].
This work shows that arboviruses embed both canonical and bulged PQSs and that the different viral families show different patterns of PQSs enrichment or depletion. The conservation of each predicted PQS among virus isolates was also analysed to correlate the presence of highly conserved putative G4 sequences with their possible biological role. Our data provide new information on the evolutionary conserved PQSs among human arboviruses, provide insights into unexplored aspects of arbovirus biology and reveal innovative anti-arbovirus targets.

Prediction of PQSs in Human Arboviruses
Arboviruses were grouped according to the Expasy ViralZone and NCBI taxonomy classifications [29,30]. A total of 40 different arboviruses infecting humans was retrieved, which were further divided into three groups on the basis of the type of nucleic acid constituting their genomes: 1 dsRNA, 16 negative-strand ssRNA and 23 positive-strand ssRNA (Table 1). For each virus, the complete set of sequenced genomes was downloaded from the NCBI database. Partially sequenced and unverified genomes, as well as genomes containing nucleotide strings longer than five nucleotides without base assignment (i.e., NNNNN) were not considered for further analysis. For each virus, the nucleotide sequence to be considered as reference genome was retrieved from the NCBI Reference repository. The accession numbers are listed in Table 1.
First, for each virus, the GC content of the reference genome alone and of all the sequenced virus isolates was calculated and expressed as an average value (Table 1). Taken together the arboviruses have an average GC content of 44%. Positive-strand ssRNA viruses reference genomes have an average GC content around 50% (48-55%), whereas negative-strand ssRNA viruses reference genomes display GC contents that span from 33% to 50%. The Banna virus, which is the only arbovirus with a dsRNA segmented genome, has a GC content of 37% to 42%, depending on the segment. Analysis of the mean GC content of all sequenced genomes per virus provided data on the conservation of G and C residues, possibly involved in G4 formation. For the majority of positive-strand ssRNA viruses, the GC content was conserved, with the exception of Dengue strains 1 and 2. Notably, Dengue 1 and 2 are also the viruses with the highest number of sequenced genomes among all analyzed viruses, 2095 and 1764, respectively. The analysis of the negative-strand ssRNA viruses showed that the family members with segmented genomes shared a lower GC content conservation (e.g., Crimean Congo hemorrhagic fever virus), whereas the members with the genome composed of a single linear molecule of RNA (e.g., Chandipura virus) showed a very high GC content conservation. Among segmented negative-strand ssRNA viruses, the S segment, coding for non-structural proteins, was the less conserved. Once again, the viruses with the highest numbers of sequenced genomes showed the highest variability in GC content values. The conservation of the GC content in the segments of the Banna virus, the only dsRNA virus in this analysis, was very segment-dependent. Segment 9 was the less conserved and codes for the outer-capsid protein VP9. In this case too, we could analyze more sequences from this segment than from the other eleven. This may be because VP9 has been studied and recognized as the protein involved in host attachment and viral internalization [31]. Table 1. Analysed arboviruses data. The table reports the analysed viruses in alphabetical order. The columns indicate the virus name (Virus), the viral Genus/Family each virus belongs to (Genus, Family), the viral genomic nucleic acid type (Genome), the viral genomic structure (Genome structure), the name of each segment in case of segmented genomes (Segments), the analysed reference genome NCBI entry (Reference genome), the total number of analysed genomes/segments (Total analysed genomes and segments), the number of analysed sequences of each segment (Analysed segments), the average GC content of the Reference genomes and of the entire group of analysed genomes per virus (% GC Reference genomes and % GC all analysed genomes, respectively). Colors correspond to dsRNA (grey), negative-strand RNA ((-)ssRNA, yellow), positive-strand RNA ((+)ssRNA, blue) viruses. In segmented RNA viruses, the viral segments S, M and L are ordered by length.  Next, the pqsfinder algorithm was run on all reference genomes. The algorithm was set up to recognize sequences with G-runs containing at least two G residues. Each PQSs could have loops with a maximum length of 12 nucleotides and a maximum of one loop with a length of zero nucleotides. The pqsfinder algorithm was used to identify canonical PQSs and PQSs harboring a single bulge (Table 2). PQSs containing mismatches (non-G bases in the G-quartet) were excluded from the prediction. The minimum acceptable score was set at 12, in order to exclude PQSs that were characterized by short G-runs, together with long loops and a bulge, and therefore unlikely to form. Both the positive and the negative RNA strands were analyzed for the presence of PQS, as they represent two different stages of viral infection and are both essential in the viral replication cycle (Table 2) [32].  Family), the viral genomic nucleic acid type (Genome), the viral genomic structure (Genome structure), the name of each segment in case of segmented genomes (Segments), the total number of PQSs, the number of canonical (no bulges) and the number of bulged PQSs predicted in viral genomes (PQSs in viral genomes, Canonical PQSs and Bulged PQSs, respectively), the percentage of bulged PQSs on the total number of predicted PQSs (% bulged PQSs), The percentage of bulged and canonical PQSs conserved in more than 80% of analysed viral genomes (% conserved bulged PQSs and % conserved canonical PQSs, respectively), the rounded average total number of predicted PQSs on shuffled genomes and the statistical significance of the difference between the number of viral vs. shuffled PQSs (PQSs in shuffled genomes and p-values PQSs viral vs shuffled genomes, respectively). The symbols (↑), (↓) and (=) indicate that the number of PQSs predicted in the viral genomes is higher, lower or equal to the number of PQSs predicted in the shuffled genomes. Colors correspond to dsRNA (grey), negative-strand RNA ((-)ssRNA, yellow), positive-strand RNA ((+)ssRNA, blue) viruses. In segmented RNA viruses, the viral segments (S, M and L) are ordered by length.   The analysis that considered canonical and bulged PQSs showed that the reference genomes of positive-strand ssRNA viruses, Flavivirus and Alphavirus, are particularly enriched in PQSs. In particular, the Japanese encephalitis, the Langat, the Louping ill, the Tick-borne encephalitis and Zika viruses were predicted to embed more than one hundred PQSs in their genomes. In general, the members of the Flavivirus family embed an average of 94 PQSs per genome. The second group of positive-strand RNA viruses, the Alphaviruses have an average of 67 PQSs per genome, with the Semliki Forest virus topping the list with 95 PQSs. Negative-strand ssRNA viruses, with the genome consisting of a single linear RNA strand, showed an average of 37 PQSs, with the Chandipura virus and the Australian bat lyssavirus showing the highest number of PQSs (i.e., 46 and 45, respectively) and the non-Indiana Vesicular stomatitis virus strains showing the fewest (i.e., 29). Segmented negative-strand ssRNA viruses and the dsRNA Banna virus, although not so different in GC content from the other viruses, were predicted to have very few PQSs (average of 10 and 2, respectively). A closer look at the PQSs strand location (Table S1) showed that among the positive-strand RNA viruses, Flaviviruses embed more PQSs in the positive strand (i.e., the viral genome, but also the viral mRNA) [33], whereas Alphaviruses have members with equal strand distribution (Chikungunya, Mayaro, O'nyong-nyong, Ross River, Sagiyama, Semliki Forest, Venezuelan and Western equine encephalitis) and members with PQSs mainly located in the antigenome strand (negative-strand) [34]. Single linear negative-strand RNA viruses have more PQSs in the positive strand, i.e., in the antigenome which corresponds also to the viral mRNA. Segmented negative-strand viruses showed the highest variability, with PQS distributed on both strands, depending on the virus.
The number of canonical PQSs was then calculated, excluding the bulged sequences from the first prediction ( Table 2). The maximum loop length and minimum sequence score remained the same as in the previous analysis. This analysis showed that approximately 22% of the calculated PQSs are non-canonical, when arboviruses are considered as a single group. Looking at single classes of viruses, 21% of PQSs in Alphaviruses (positive-strand ssRNA viruses), 26% in negative-strand ssRNA viruses with a single linear RNA and 23% in segmented RNA viruses contain a bulge. Flaviviruses, the viruses with the highest number of PQSs, are less likely to have bulged PQSs (17%). The Banna virus has 22% of non-canonical PQSs (Table 2).
Next, the significance of the predicted PQSs was then calculated. To assess whether the predicted PQSs in arboviruses were statistically relevant or random, the results from viral genomes were compared with those obtained by viral genome simulation. Shuffled genomes (one hundred per virus), with the same nucleotide composition but different order with respect to the references, were generated. The presence of PQSs was predicted using the same parameters as in the first analyses (Table 2). To estimate the statistical significance of PQSs prediction, data on viruses (Reference genomes) and on shuffled genomes were subjected to one-sample t-test and p-values were calculated [35]. The one-sample t-test was used to determine whether the average PQSs number of the shuffled genomes (one hundred per virus) was significantly different from the PQSs number predicted on the relative viral reference genome. p-values lower than 0.001 were considered significant. Significance analysis and generation of corresponding p-values indicated that 52% of the considered viral genomes/segments were significantly enriched in PQSs, while 37% showed significant depletion in PQSs compared to the presence of G-runs on shuffled genomes/segments. The remaining 11% of viral genomes/segments showed no significant enrichment or depletion in PQSs. The PQS prediction was highly significant for positive-strand RNA viruses, with the exception of the Eastern equine encephalitis virus and the Ross River virus. Flaviviruses are all enriched in PQSs, whereas Alphaviruses, with the exception of Barmah Forest and Semliki Forest viruses, are depleted in PQSs. When considering negative-strand RNA genomes, viruses with single linear RNA genomes are all statistically enriched in PQSs, whereas segmented viruses display segments with PQSs enrichment, PQSs depletion and segments that are not significantly enriched in PQSs (i.e., Rift Valley fever virus). The Banna virus (dsRNA) has segments 10 and 11 with non-significant p-values, the other 10 segments have a statistically significant PQS prediction, despite the low number of predicted PQSs.

Conservation of Predicted PQSs and Genomic Location of Highly Conserved PQSs
Once the presence of PQSs in arboviruses had been predicted and their statistical relevance had been assessed, their distribution within the genomes was examined (Figure 1 and Figures S1-S5, density panels). PQS density distribution was calculated using pqsfinfer density function: it indicates if and where PQSs were clustered within the genome of interest; high scoring PQSs clustered in high density regions are considered to have a higher folding potential. We observed that, in general, PQSs were widely distributed across the length of the genome and that PQSs with high scores, and therefore more likely to form, tended to cluster together. RNA viruses are prone to genomic mutations to enhance their environment/host adaptability [36,37], so the conservation of PQSs across all sequenced isolates of each virus species was assessed, hypothesizing that the presence of a conserved PQS in a poorly conserved genomic environment would strengthen the hypothesis of a significant biological function. The conservation rate of each predicted PQSs in all genome/segment sequences we retrieved from the NCBI database was calculated (Figure 1 and Figures S1-S5). We considered the conservation analysis to be significant when at least 5 isolates per virus were available.
The different RNA virus populations have different mutation rates [38]. Positivestrand ssRNA viruses have high mutation rates, followed by negative-strand ssRNA viruses. DsRNA viruses have the lowest mutation rate of the three viral classes analyzed. Notably, PQSs do not seem to follow this rule, as many members of the Flavivirus and Alphavirus (e.g., West Nile virus and Semliki Forest virus) have highly conserved PQSs throughout the genome. The Eastern equine encephalitis and the Ross River viruses, which would be excluded by the previously calculated p-value on shuffled genomes, have highly conserved PQSs. On the contrary, few negative-strand ssRNA viruses have highly conserved PQSs. We found few conserved PQSs in the segments of the Uukuniemy and the Sandfly viruses and in the genomes of the Chandipura and Vesicular stomatitis viruses. In the case of segmented negative-stranded RNA viruses, we found that segments with significant p-values (Table 2) did not have non-significantly conserved PQSs. The Banna virus (dsRNA) was not only poor in terms of predicted PQSs, although they were statistically significant, but also showed a very low rate of PQSs conservation among the virus isolates. Taken together, the conservation analysis revealed that the conservation of a particular PQS among isolates belonging to the same virus was not related to the initial prediction score (i.e., the folding propensity and associated stability), nor did it depend on the presence of other PQSs in the vicinity (density).
The conservation of canonical and bulged PQSs was calculated and their percentage of conservation was reported if they were conserved in more than 80% of the analysed viral genomes. Notably, in positive polarity viruses (Flaviviruses and Alphaviruses) bulged and canonical PQSs were conserved to the same extent, with viruses such as the Sagiyama and the Semliki Forest viruses having more than 87% of fully conserved canonical and bulged PQSs. The pattern of conservation among negative polarity RNA viruses is much more diverse. Segmented RNA viruses do not appear to conserve bulged PQSs, nor do they have a high conservation rate of canonical PQSs. Single linear negative polarity RNA viruses tend to conserve both canonic and bulged PQSs, with viruses such as the Australian bat Lyssavirus with no conservation potential and viruses such as the Isfahan virus and the Vesicular stomatitis (non-Indiana strains) having high conservation rates of both canonical and bulged PQSs. The Banna virus (dsRNA) showed no strong conservation of either type of PQSs ( Table 2).
The genomic location of all PQSs with more than 85% conservation was then determined ( Figure 1 and Figures S1-S5, conservation panel, Figure 2 and Figure S6). Genome coordinates were obtained for 5 -and 3 -untranslated (UTR) and coding sequences (CDS). For the majority of arboviruses, the CDSs are well defined in the Reference genomes, whereas the UTRs are more inconsistently annotated. When missing in the annotation file, the UTRs were manually defined as the regions preceding the first CDS and closing the genome after the last nucleotide of the last CDS. Flaviviruses showed a strong tendency to have and preserve PQSs in coding regions but also in 3' UTRs. The vast majority of conserved PQSs of Alphaviruses are embedded in coding sequences, with the exception of the Semliki Forest and Ross River viruses, which also have conserved PQSs in the UTRs. Negative-strand viruses preserve PQSs in coding sequences, with exceptions such as Bunyamwere La Crosse virus, Dugbe virus, Sadfly Sicilian virus and Rift Valley fever virus that embed conserved PQSs also in UTRs of L and M segments (Figure 2 and Figure S6).

Figure 2.
Conserved Genomic localization of PQSs in arboviruses. Plots reporting the annotation of highly conserved PQSs. Each viral genome was divided into three regions: Untranslated regions (5 and 3 UTRs) and coding sequences (CDS). PQSs were annotated on the basis of the official NCBI annotation of each viral Reference genome. In panels with long virus names, the word "segment" has been abbreviated to "seg". The Vesicular stomatitis virus non-Indiana strains has been abbreviated to Vesicular stomatitis virus non-Ind.

Discussion
The presence and possible key roles played by nucleic acid secondary structures, in particular G4s, during the viral cycle of major human pathogens, such as HIV-1, HSV-1 and many others [12], has begun to be demonstrated. Understanding the regulation of G4 folding in viruses has attracted much attention due to the potential use of G4s as targets for innovative antiviral therapies [14]. The most comprehensive viral genome analyses have been performed using algorithms that predict canonical putative G4s [18,22,39], or using algorithms that penalize sequences with cytosine runs [20,21,23]. These pattern-based algorithms do not consider non-canonical G4 forming sequences, such as those containing bulges or stem loops. Notably, both G4s folding with bulges or forming stem loops have been reported at the viral level [26,27,40].
Arboviruses are a major threat to humans with no specific pharmacological treatment and few prevention strategies [4]. Here we challenged the PQS Finder algorithm, which was designed to be imperfection-tolerant and validated on the human sequence data [28], with the genomes of arbovirus that infect humans. We had previously performed an extensive analysis on human viral pathogens using a traditional approach [39]. In this study we extended and included seven novel members of the arboviruses (Bunyavirus snowshoe hare, Chandipura, Dhori, Isfahan, Punta Toro, Sandfly fever Sicilian, Tick-borne encephalitis, Possawan encephalitis, and Usutu viruses) and examined the presence of PQSs and their conservation in all sequenced) virus isolates (up to February 2023). More than twelve thousand genomes/segments were analysed, belonging to the forty different arbovirus that infect humans. The present work provides new data showing that: a. arboviruses harbor bulged PQSs in addition to canonical ones; b. the vast majority of the predicted PQSs are statistically significant when compared with shuffled sequences; c. not only the canonical but also the bulged PQSs are conserved; d. the conserved PQSs are mainly located in coding sequences and 3'UTRs.
Our data show that frequency of PQSs is not related to the GC content of viral genomes, confirm that the clustering of G-runs is not random, and suggest a specific biological role for G4 structures at the arboviral level. Viruses, especially those with an RNA genome, such as that of arboviruses, mutate with high frequency [36,38,41]. The comparison with randomly generated shuffled genomes showed that members of the arbovirus family are statistically enriched or depleted in PQSs, revealing that certain members of this RNA virus family, prevent the generation of novel regions that could fold into G4s, despite their high mutation rate.
The conservation of PQSs in viruses is one of the strongest indications of the biological relevance of G4s. The utmost conservation of G-tracts in a G4-forming pattern indicates that they are required for infection/replication/transmission of the virus. Our data show that, depending on the virus class, both canonical and bulged PQSs are conserved among virus isolates, suggesting that also bulged G4s play a role in the biology of arboviruses. In addition, our analysis suggests that PQSs in the coding sequences and 3'UTRs of positive-strand ssRNA viruses, especially Flaviviruses, could play an essential role during viral infection.
The role of G4s in regulating transcription and translation when embedded in coding regions has begun to be elucidated at the human level [42]: our data emphaticize their regulatory role also in arboviruses, especially Flaviviruses where PQSs are mainly located on the positive RNA strand, which acts as both viral genome and viral mRNA. Furthermore, since the 3 UTR of positive-strand ssRNA viruses regulates numerous aspects of the viral life cycle such as replication/translation and the complex network of the host-cell interactions, the highlighted presence of several G4s at this genomic level paves the way for a deeper understanding of G4s as regulators of novel aspects of arbovirus infection.
Our data also show that G4s are not particularly abundant or conserved in negativestrand ssRNA or dsRNA arboviruses. Negative-strand and dsRNA viruses have more complex viral cycles than positive-polarity ssRNA viruses: under these conditions, G4-mediated slowing of viral transcription and replication may be more likely to be avoided [43].

Viral Genomes Selection
Accessible viral genomic sequences (12240 in total) belonging to the 40 arboviruses infecting humans were downloaded from the genome database of the NCBI in January 2023. Genomes were grouped in FASTA-format files. Dengue viruses were clustered according to the four reported serotypes (Dengue virus 1-4) [44]. Vesicular stomatitis viruses (VSVs) were divided into two separated groups, the first including the Indiana serotype and the second containing all the other serotypes [45]. For each virus, the NCBI entry code indicated in the NCBI reference repository was considered as reference genome [46]. For the West Nile virus, two different reference genomes (lineage 1 and Kunjiun subtype) are considered as reference, the NCBI Reference Sequence NC_009942.1 (lineage 1) was set as reference genome. Differently from Dengue virus isolates, West Nile sequences are not registered indicating the lineage or the subtype, so that all genomes corresponding to the West Nile virus were aligned together. FASTA files were purged from unverified or partially sequenced genomes, as well as from genome sequences containing multiple stretches of nucleotides lacking base assignments (i.e., NNN). NCBI accession codes and FASTA files containing all viral complete genomic sequences are shown in Table 1 and contained in Supplementary Material (aligned_genomes.zip), respectively. Genomes were aligned using the Jalview platform [47].

Prediction of PQS
All the analyses were performed using R (version 4.2.2). The FASTA files containing the reference genomes and the FASTA files containing the multiple alignments were loaded onto the R platform and GC content was calculated using Biostrings (2.66) [48,49]. PQS prediction was performed on the reference genome using pqsfinder [28] (version 2.14.1) with the following parameters: deep = TRUE, min_score = 12, max_bulges = 1, max_mismatches = 0, loop_max_len = 12. The deep parameter has been set to TRUE to allow detection of PQS clusters. Canonical PQS prediction was performed retrieving the sequences displaying no bulges from the initial PQS prediction. PQS density was predicted using the pqsfinder density function. The PQS score was automatically assigned by the pqsfinder algorithm. All R scripts created to generate predictions, density, score, GC content, and conservation rates are available in the Supplementary Material.

Shuffling and Statistical Analyses
The R universalmotif (version 1.16.0) was used to shuffle the reference genome sequences [50]. Each reference was shuffled one hundred times using the linear method with k-let = 1. For each shuffled sequence, a PQS prediction was performed with the pqsfinder parameters indicated in Section 4.2.1. To estimate the significance of the analysis, the one-sample t-test was performed comparing the PQS predictions of the shuffled genomes with the PQS prediction of the reference genome. Significance was expressed as a p-value.

Conservation of PQS
To calculate the percentage of conservation of each PQS the vmatchPattern function of Biostrings (version 2.66) was used, setting the parameter with.indels = TRUE, to count PQS with longer loops as conserved. The multiple aligned genomes were loaded onto the R platform and the number of times each predicted PQS was present in the aligned genomes was counted. The percentage of conservation was also calculated, taking into account the number of aligned genomes analysed. The conservation values were then plotted together with the density pattern and the PQS scores using Gviz (version 1.42.0) [51].

Annotation of Conserved PQS
Only PQSs with more than 80% conservation were annotated on viral genomes. GTF files were uploaded using rtracklayer (version 1.58.0) [52]. Annotation was performed using annotatr (version 1.24.0), with the length of each conserved PQS set as the minimum overlap [53]. The region preceding the first CDS was considered the "5'UTR", while the region following the last CDS was considered the "3'UTR". Bar graphs were generated using ggplot2 (version 3.4.0).

Conclusions
Arboviruses are a heterogeneous family of viruses that are transmitted to humans by arthropod vectors. This work has shown that many members of the family, mainly belonging to the Flavivirus subgroup, embed highly conserved PQSs in their genomes. The conserved PQSs are located in coding regions and at genome ends, reinforcing the critical role of G4s in the regulation of viral cycles. These findings pave the way for a broader understanding of the mechanisms regulating arbovirus infections and suggest that highly conserved PQSs may be novel and innovative antiviral targets against arbovirus infections.

Conflicts of Interest:
The authors declare no conflict of interest.