Stability of the Virome in Lab- and Field-Collected Aedes albopictus Mosquitoes across Different Developmental Stages and Possible Core Viruses in the Publicly Available Virome Data of Aedes Mosquitoes

Our study revealed that the virome was very stable across all developmental stages of both lab-derived and field-collected Aedes albopictus. The data representing the core virome in lab A. albopictus proved the vertical transmission route of these viruses, forming a “vertically transmitted core virome.” Field mosquitoes also contained this stable vertically transmitted core virome as well as additional viruses, which probably represented “environment-derived core virome” and which therefore were less stable over time and geography. By further screening publicly available SRA viral metagenomic data sets from mosquitoes belonging to the genus Aedes, some of the identified core ISVs were shown to be present in the majority of SRAs, such as Phasi Charoen-like phasivirus and Guadeloupe mosquito virus. How these core ISVs influence the biology of the mosquito host and arbovirus infection and evolution deserves to be further explored.

mately 1,300 kilometers apart. Third, we conducted a comparative meta-analysis using comparisons between the viruses identified in this study and 48 publicly available virome SRA data sets of Aedes mosquitoes. Finally, phylogenetic analyses were performed on the following three selected viruses: Aedes phasmavirus (APV), which was highly present in both field and lab A. albopictus from China; Guadeloupe mosquito virus (GMV)-related viruses; and Phasi Charoen-like phasivirus (PCLPV), which was found in the majority of investigated virome data sets.

RESULTS
Comparison of the eukaryotic viromes in lab-and field-collected A. albopictus. Pools of eggs, larvae, pupae, and (male/female) adults from a lab colony in Wuhan, as well as pools of larvae, pupae, and adults from the field in Guangzhou underwent metagenomic sequencing. Averages of 26.2 million trimmed reads were obtained per pool (Table 1) and were de novo assembled into 1,657,229 contigs in total. The clustering of 71,303 contigs longer than 500 bp from all samples at 95% nucleotide (nt) identity over 80% of the read length resulted in 56,419 representative contigs. According to BLASTx annotation results obtained using DIAMOND, the majority (93%) of the representative contigs belonged to Eukaryota (mainly derived from the mosquito host genome) and 179 contigs were assigned as eukaryotic viruses. Eukaryotic viral reads in each pool occupied 0.2% to 1.8% of the trimmed reads, except for the 17-GZ-larva pool, which contained 15.8% viral reads. Most of the eukaryotic viral contigs belonged to 80 different viruses (including several viruses with segmented genomes), although some of these had very low similarities to known viruses in the database (Fig. 1A). No known pathogenic arboviruses were identified, and the closest relatives of the identified viruses were generally found in insects or plants. Only 20 of the 80 viral species belonged to an established viral genus or family (e.g., Flavivirus, Iflavirus, Phasivirus, Quaranjavirus, Rhabdoviridae, Virgaviridae, Totiviridae, Nodaviridae), whereas the remaining viral genomes were most closely related to viruses discovered in recent years that current lack a formal taxonomical classification.
The heat map shown in Fig. 1A displays the number of reads mapped against each of the representative (partial) viral genomes as well as the percentage of coverage of each virus per sample (see Table S3 and Fig. S1 in the supplemental material). The virome in field-collected A. albopictus from both 2017 and 2018 showed significantly higher richness and diversity than that in lab-derived mosquito pools. The adjusted P values of the Wilcoxon test determined for the Chao1 and Shannon indices were 0.012 and 0.008, respectively (Fig. S1). The virome profiles of the different pools of lab colonies were relatively stable, except that several viral species were absent from or were present in low abundance in the larva pool (Fig. 1A). The field-collected larva, pupa, and adult pools collected in GZ in 2018 and the adult pool collected in 2017 also displayed very similar viral communities. However, the larval pool collected in GZ in 2017 contained more than 30 additional unique viral species with almost 100% length coverage that were almost completely absent in the adults collected at the same time and in the same place. Furthermore, the lab and field A. albopictus pools had 20 viral species in common. Among them, some contigs distantly related (47% BLASTx identity) to Yongsan bunyavirus 1 (YBV1) were present in all lab-and field-derived mosquito pools with high abundance and length coverage (Fig. 1). Some viruses such as Yongsan tombus-like virus 1 and Phasi Charoen-like phasivirus appeared to be present at higher levels in the lab-derived A. albopictus samples. In contrast, reads of several viruses appeared to be present in higher numbers in the field samples, as was the case for Guato virus and Guadeloupe mosquito mononega-like virus (Fig. 1A). Furthermore, field mosquito pools contained many more viruses which were absent in lab-derived samples (Fig. 1A), as exemplified by Wenzhou sobemo-like virus 4 and Hubei mosquito virus 2 (both with close relationship to Guadeloupe mosquito virus) (Fig. 1B). Prevalence of viruses in public SRA virome data sets derived from Aedes mosquitoes. All 48 available sets of viral metagenomic data representing Aedes sp. from the public database (SRA) (see Table S1 in the supplemental material) were further analyzed to determine the conserved prevalence of ISVs in various Aedes mosquitoes. These samples were collected from locations on different continents, including the United States, Puerto Rico, Australia, Thailand, Guadeloupe, China, and Kenya. The mosquito species in the majority of the samples was A. aegypti, except for one sample from China (A. albopictus) and five from southwestern Australia (A. alboannulatus or A. camptorhynchus).
In order to investigate the potential presence of a "core virome," we analyzed the presence/absence of each virus across the geographic origins reflected in the available SRA entries. The number of viral species shared among different locations is displayed in Fig. 2A, and virus names corresponding to each overlapping category can be found in Table S5. Twenty viruses belonging to the families Totiviridae, Phenuiviridae, Orthomyxoviridae, Xinmoviridae, Virgaviridae, Rhabdoviridae, Phasmaviridae, and Flaviviridae and six unclassified viruses (Chuvirus Mos8Chu0, Kaiowa virus, Hubei odonate virus 15, Guadeloupe mosquito virus, Humaita-Tubiacanga virus, Trichoplusia ni TED virus) were identified from at least five locations. Notably, the total number of viral species present in samples from the United States and Puerto Rico was much lower than that in samples from other locations, which could have been due to the presence of different viruses in these samples, less optimal VLP enrichment procedures used, and/or the lower sequencing depths.
The presence or absence of each virus in each of the analyzed virome data sets is shown in the heat map in Fig. 2B. Only the 42 viral species present in a minimum of three locations and containing at least one contig longer than 1.5 kb are displayed in the figure, and they were grouped by collection locations and viral families. Totiviridae was the most prevalent viral family, containing two highly abundant species (Aedes aegypti totivirus and Australian Anopheles totivirus) with positivity rates across samples of 63.8% and 60.3%, respectively. Phasi Charoen-like phasivirus belonging to the Phenuiviridae was present in all four Aedes species and at all locations (except SRAs from the United States, in which only three of the investigated viruses were identified). Among the unclassified viral species, Chuvirus Mos8Chu0 virus and Kaiowa virus were present in 40 and 35 of 58 samples, respectively, and were found in all four Aedes species. Guadeloupe mosquito virus was detected in field A. aegypti and A. albopictus mosquitoes from Puerto Rico, Thailand, Guadeloupe, Kenya, and China. Humaita-Tubiacanga virus (absent in lab-derived and wild A. albopictus from this study) was found in field A. albopictus from Yunnan (China) and field A. aegypti from 5 locations.  The viral species shown in the heat map present in samples from at least three locations and containing at least one contig that is longer than 1,500 bp. The specific viral species was considered to be present in the sample as long as the sample had one contig (Ͼ500 bp) assigned to the species.

Phylogeny of three selected viruses.
Considering the abundance, variance, universality, and number of complete coding genomes of viruses obtained, phylogenetic analysis was conducted on the following three selected viruses: (i) Aedes phasmavirus (APV; see below), a novel virus distantly related to YBV1 with high abundance in both field and lab A. albopictus samples from China; (ii) Phasi Charoen-like phasivirus (PCLPV), present in all four Aedes species and all locations (except the United States) of screened virome data sets; and (iii) Guadeloupe mosquito virus (GMV), occupying the majority of reads in the field-collected A. albopictus samples from Guangzhou, widely distributed in SRA data sets and showing a high level of variance. Complete coding viral genomes of APV, PCLPV, and GMV recovered from analyzed virome data sets of Aedes mosquitoes as well as corresponding reference genomes from GenBank were used for phylogeny. YBV1 is a newly identified virus reported in 2019 that showed high prevalence in A. vexans nipponii from Korea (29) and is classified in the family Phasmaviridae. We identified a novel virus, APV, in this family in both lab and field A. albopictus samples from China whose closest reference was YBV1. PCLPV, a widely distributed mosquito virus, has been identified from lab colonies, mosquito cell lines, and wild Aedes mosquitoes in eight counties or regions (China, Puerto Rico, Thailand, Kenya, the United States, Australia, Guadeloupe, and Brazil) ( Table 2). It has been suggested that it is maintained in nature through vertical transmission (30). A recent study has reported that GMV is closely related to Wenzhou sobemo-like virus 4 and Hubei mosquito virus 2 (28). GMV-related viruses were detected only in mosquito samples collected from the field as listed in Table 2 (i.e., were not detected in the mosquito cell lines or lab colonies), indicating that it was likely acquired horizontally from the environment.
Phylogeny of Aedes phasmavirus. In both lab-and field-derived A. albopictus sequencing data from samples collected in China in 2017 and 2018, three contigs of APV in each sample were found to share the highest amino acid identities of 50.8%, 48.1%, and 43.6% with the S, M, and L segments of YBV1 (GenBank accession no. MH703047.1, MH703046.1, and MH703045.1), respectively. In total, 10 viral genomes of APV with three segments were recovered from all sequenced A. albopictus samples (17-lab-egg, 17-lab-larva, 17-lab-pupa, 17-lab-adultM, 17-lab-adultF, 17-GZ-larva, 17-GZ-adult, 18-GZ-larva, 18-GZ-pupa, and 18-GZ-adult). The longest lengths were of 1,185, 2,022, and 6,468 nt, which corresponded to the nucleocapsid, glycoprotein, and RNA-dependent RNA polymerase (RdRp) genes, respectively. For all three segments, the nucleotide identity among the 10 assembled viral genomes ranged between 96% and 100%, which indicated that they represent closely related variants. In a separate phylogenetic analysis of the three segments, the 10 viral genomes of APV clustered together within the genus Orthophasmavirus in the family Phasmaviridae (Fig. 3). Their closest relative was YBV1, which was identified in A. vexans mosquitoes collected in South Korea in 2016, but with low levels of nucleotide similarity (ranging from 52% to 57%) in the coding regions of the three segments. Thus, the APVs present in the lab and field A. albopictus samples collected in China appeared to represent a novel species in the genus Orthophasmavirus. Phylogeny of Phasi Charoen-like phasivirus. Since the abundances of PCLPV reads in the A. albopictus samples from Guangzhou and lab-derived mosquitoes were relatively low, no complete genome was recovered from these pools. All three segments of PCLPV with a complete coding region were successfully recovered in one A. , and on two more genomes obtained from lab Aag2 cell lines in Belgium (Fig. 4). It was interesting that all of the PCLPVs collected in distant geographic locations, in different years, and from field mosquitoes versus lab cell lines clustered very closely in the three maximum-likelihood (ML) trees of each segment. The nucleotide similarities among the genomes of PCLPVs were very high, ranging from 94% to 98% for the S segment, from 95% to 99% for M segment, and from 94% to 99% for L segment. . The viral genomes contained two segments encoding an RdRp and one hypothetical protein on segment 1 (1,500 to 1,600 nt) and a capsid and one hypothetical protein encoded by segment 2 (3,000 to 3,200 nt). GMV, Wenzhou sobemo-like virus 4, and Hubei mosquito virus 2 are all newly described and currently unclassified viruses with a distant relationship to the families Luteoviridae and Sobemoviridae (28).

Phylogeny of Guadeloupe mosquito virus-related viruses. Twelve viral genomes (with complete coding regions) similar to GMV, Wenzhou sobemo-like virus 4, and
The phylogenetic analysis performed on the basis of segments 1 and 2 indicated that these sequences fell into three clades (Fig. 5). The genomes identified in samples collected in Thailand in 2015, Puerto Rico in 2014, and Kenya in 2018 clustered together with the Guadeloupe mosquito viruses found in Guadeloupe in 2016 and 2017, forming clade 1, with the nucleotide identity in this clade ranging from 90% to 99% for both segments. Five genomes recovered from A. albopictus collected in Guangzhou in 2017 and 2018, one recovered from the same host collected in Yunnan in 2016, and Wenzhou sobemo-like virus 4 clustered together in clade 2, with nucleotide identities among them ranging from 94% to 98%. The third clade consisted of Hubei mosquito virus 2 and one sequence from a sample collected in Kenya in 2018, with nucleotide identities ranging from 70% to 100%. The nucleotide similarities for the two segments among these three clades ranged from 54% to 69%.

DISCUSSION
In this study, we first characterized the eukaryotic virome across different developmental stages of lab-derived and field-collected A. albopictus from China. Five pools from lab mosquitoes (containing 1,250 individuals in total) and five pools from fieldcollected mosquitoes (6,600 individuals in total) were analyzed (Table 1). Considering that the rearing conditions were stable in the lab and that the colony has been reared in the lab for many years, we collected lab-derived samples only in 2017 as no scientifically significant changes over time were anticipated. To take potential yearly virome fluctuations for field mosquitoes into account, they were collected in two consecutive years (2017 and 2018). The proportion of the eukaryotic viral reads in each sample (ranging from 0.2% to 15.8%) was comparable to that reported from previous mosquito virome studies (24,28), suggesting that the sequencing depth was sufficient for virome analysis. As expected, the virome diversity in lab A. albopictus was lower than that in field mosquitoes (Fig. 1), probably due to the less complex environment and the availability of clean water and food resources. Unlike the bacterial communities seen in fieldcollected mosquitoes (13,(31)(32)(33)(34)(35), it is interesting that the virome in lab-derived A. albopictus was very stable across all developmental stages, consistent with vertical transmission of these viruses. For the larvae only, relatively few viral reads were identified. This finding suggested that the virus might remain dormant at a very low concentration without or with very low rates of replication. The fact that these viruses were also present in field mosquitoes suggested that A. albopictus seems to contain a "vertically transmitted core virome" as was described before for A. aegypti and Culex quinquefasciatus from Guadeloupe (28), for which stable distinct core eukaryotic viromes were identified which possessed nearly identical viruses across time and space (28). In addition, another set of viruses was found to be shared by the field-collected A. albopictus mosquitoes across different stages, suggesting that they have a "core virome" with higher richness and diversity (Fig. S1). Whether these additional viruses were acquired from the environment or the lab-derived mosquitoes lost these viruses in captivity remains to be determined. All these vertically transmitted core viromes in A. albopictus deserve more attention with respect to their effects on vector competence for important medically relevant arboviruses. Due to the mosquito samples having originated from only one lab colony or one location in the field, further surveillance over a larger geographic range and longer periods of time will be needed to confirm the stability of the virome profile across different developmental stages. In addition, the larval pool collected in the wild in 2017 was an outlier, as it contained many unique  viruses, which probably originated from the particular water environment they lived in, which could have been contaminated by other coinhabiting insects or larvae. Alternatively, it could be the case that a number of mosquitoes other than A. albopictus were included in the pool by accident. These results further suggest a significant influence of the breeding site ecology on the mosquito viral community.
Since some viruses (e.g., PCLPV, GMV, and Aedes aegypti totivirus) identified in A. albopictus from China were also identified as part of the core virome in A. aegypti from Guadeloupe (28), we were interested in determining how these core viruses in Aedes mosquitoes were further distributed around the world. Therefore, we explored 48 Aedes sp. viral metagenomic data sets from a public database (SRA). The samples were from different countries on different continents (China, the United States, Australia, Kenya, Thailand, Puerto Rico, and Guadeloupe) and from different mosquito species (A. aegypti, A. albopictus, A. alboannulatus, and A. camptorhynchus) and were treated with different wet-lab and sequencing procedures using different amounts of pooled mosquitoes and sequencing depths, but a highly prevalent set of widely distributed viruses, such as Totiviridae, Orthomyxoviridae, Xinmoviridae, Flaviviridae, PCLPV, and GMV, were able to be identified on the family or species level (Fig. 2). How these conserved viruses in Aedes mosquitoes interact with or influence an arbovirus infection is an interesting point for further studies. A previous study explored the effect of coinfections of PCLPV and cell-fusing agent virus (CFAV) on the replication of arboviruses in cell line Aa23 derived from A. albopictus (36). CFAV-PCLPV-positive Aa23 cells strongly inhibited the growth of two flaviviruses (ZIKV and DENV) and completely blocked infection by La Crosse virus (bunyavirus). Although the exact blocking mechanism was not known, on the one hand, the results suggested that the data generated from laboratory cell lines persistently infected by mosquito-specific viruses (MSVs) should be interpreted with caution. On the other hand, the intra-MSV interactions need to be considered when the influence of MSVs on vector competence is explored.
Although many viruses were very prevalent in Aedes mosquitoes, such as Chuvirus Mos8Chu0, Kaiowa virus, Wuhan mosquito virus 6, Whidbey virus, Aedes aegypti totivirus, and Australian Anopheles totivirus, phylogenetic analysis was further performed on GMV and PCLPV, which had the greatest number of complete coding genomes, and on APV, which was highly abundant in both field and lab A. albopictus from China. APV was found to be distantly related to the YBV1 identified from A. vexans from South Korea forming a separate clade. All the Chinese strains from both lab-and fieldcollected mosquitoes were very similar (Fig. 3). Also, for PCLPV, the genomes found in samples from different years, locations, and habitats were very similar for all three segments (Fig. 4). Interactions between bunyaviruses and arthropods occurring over 20 million years had been previously demonstrated (37)(38)(39)(40)(41). The findings indicating that APV and PCLPV seem to be very closely related were puzzling and might suggest a rather relatively recent spread of this virus or a very low evolutionary rate, which would be unexpected for RNA viruses. However, the effects of these viruses on their host are very poorly understood. A recent study revealed that preexisting infection of Aag2 cells with PCLPV did not affect the infection and growth of the mosquito-borne viruses in genus Flavivirus, Alphavirus, and Rhabdovirus in cell culture (42). GMV is a recently described two-segmented, currently unclassified virus. Three separate clades were observed for GMVs and GMV-related viruses, largely clustering according to location. This group of viruses should be proposed as a new family, and their role in arbovirus infection needs to be further studied.
In summary, our results reveal that the virome profile was very stable across different stages in both lab-and field-derived Aedes albopictus and that a number of possible core viruses exist in Aedes mosquitoes of at least four species in multiple locations in seven countries. Since the number of available Aedes virome data sets is still rather limited and since wet-lab procedures, sequencing depths, and pool sizes differed greatly among the analyzed data sets, the core viruses need to be further confirmed by next-generation sequencing (NGS) or PCR. To fully characterize and understand the genetic and phenotypic diversity of mosquito-specific viruses, samples from individual PCLPV; and (iii) 18-GZ-larva-seg1 (MT361057) and 18-GZ-pupa-seg2 (MT361060) for GMV. The consensus sequences of these viruses were generated from the bam files using SAMtools and bcftools (58).
The nucleotide sequences of the complete genomes or complete coding regions in the genomes (segments) of these viruses were used to determine their evolutionary history together with appropriate reference strains from GenBank. Alignments of the sequences were performed with MAFFT v7.222 (59) using the most accurate algorithm, L-INS-I, with 1,000 cycles of iterative refinement. Ambiguously aligned regions were removed by trimAl v1.2 (60) using an automated trimming heuristic, which was optimized for maximum-likelihood (ML) phylogenetic tree reconstruction. The phylogenetic trees for each segment were reconstructed from 1,000 ultrafast bootstrap ML tree replicates using IQ-TREE v1.6.11 (61) with best-fit model selection by ModelFinder (62). FigTree v1.4.3 (63) was used for phylogenetic tree visualization.
Data availability. Accession numbers of the viruses obtained from our data set are listed in Table S2, and viral genome sequences recovered from the SRA data sets can be found at https://github.com/ Matthijnssenslab/AedesVirome. All viral sequences used to construct phylogenetic trees can be found at https://github.com/Matthijnssenslab/AedesVirome.

SUPPLEMENTAL MATERIAL
Supplemental material is available online only. FIG S1, PDF file, 0.2 MB.