Genomic Analysis Highlights Putative Defective Susceptibility Genes in Tomato Germplasm

Tomato (Solanum lycopersicum L.) is one of the most widely grown vegetables in the world and is impacted by many diseases which cause yield reduction or even crop failure. Breeding for disease resistance is thus a key objective in tomato improvement. Since disease arises from a compatible interaction between a plant and a pathogen, a mutation which alters a plant susceptibility (S) gene facilitating compatibility may induce broad-spectrum and durable plant resistance. Here, we report on a genome-wide analysis of a set of 360 tomato genotypes, with the goal of identifying defective S-gene alleles as a potential source for the breeding of resistance. A set of 125 gene homologs of 10 S-genes (PMR 4, PMR5, PMR6, MLO, BIK1, DMR1, DMR6, DND1, CPR5, and SR1) were analyzed. Their genomic sequences were examined and SNPs/indels were annotated using the SNPeff pipeline. A total of 54,000 SNPs/indels were identified, among which 1300 were estimated to have a moderate impact (non-synonymous variants), while 120 were estimated to have a high impact (e.g., missense/nonsense/frameshift variants). The latter were then analyzed for their effect on gene functionality. A total of 103 genotypes showed one high-impact mutation in at least one of the scouted genes, while in 10 genotypes, more than 4 high-impact mutations in as many genes were detected. A set of 10 SNPs were validated through Sanger sequencing. Three genotypes carrying high-impact homozygous SNPs in S-genes were infected with Oidium neolycopersici, and two highlighted a significantly reduced susceptibility to the fungus. The existing mutations fall within the scope of a history of safe use and can be useful to guide risk assessment in evaluating the effect of new genomic techniques.


Introduction
Tomato (Solanum lycopersicum L.) is one of the most widely grown vegetables in the world. The species is subjected to many diseases, which cause substantial economic losses. Breeding for disease resistance is thus a key objective in tomato improvement. Breeders' efforts have been mainly focused on the introgression of resistance genes (R-genes) in elite genotypes, a strategy which is time consuming and often not durable [1], however, as most resistance genes confer race-specific resistance and are frequently overcome by a pathogen's new virulent race.
Most pathogens require the cooperation of the host to establish a compatible interaction, which is mediated by host susceptibility (S) genes [2][3][4]. The identification of S-genes' spontaneous mutants represents an emerging breeding strategy for durable and broad-spectrum resistance [5]. A mutated S-gene named Mildew Locus O (MLO) was found to induce resistance to Blumeria graminis f. sp. hordei in barley and has been used for over 70 years in breeding programs [6]. Many MLO orthologs have been identified history of safe use can be demonstrated as a result of traditional usage and/or widespread cultivation, the donor plant and/or gene/allele and the associated trait can be subjected to a reduced risk assessment [37]. In other words, the risk assessment will consider both the probability for such an allele to be obtained by conventional breeding or to be already in place in the breeders' gene pool. A genomic survey on the genetic diversity already present in the germplasm of a species can assist this step. The knowledge of existing defective alleles in the germplasm of a species can assist this risk assessment step and provide a resource for tomato genomic-assisted breeding programs as well as tailored gene editing approaches for resistance to biotic stresses.
Here, we analyzed in a set of 360 resequenced tomato genotypes, a set of 125 genes belonging to ten S-gene families (PMR4, PMR5, PMR6, MLO, BIK1, DMR1, DMR6, DND1, CPR5, and SR1), with the goal of evaluating the frequency of high-impact mutations and highlight potential sources of broad-spectrum and recessively inherited resistance. The identified mutations were screened to assess their likely impact on protein functionality. Genotypes carrying high-impact homozygous SNPs in S-genes were assayed for resistance to O. neolycopersici, of which two highlighted reduced susceptibility. Moreover, sgRNA sequences were designed for eight S-genes, and they were made available for the creation of optimal gene editing constructs.

Results and Discussion
In order to identify natural mutant alleles of tomato S-genes, we analyzed the genomic diversity of the cultivated tomato germplasm consisting of a set of 360 genotypes (Table S1). The data were divided into different datasets: (1) a collection of 168 big-fruited S. lycopersicum accessions (fruit weight = 111.33 ± 68. 19) and 17 modern commercial hybrids (F1), altogether called BIG); (2) a collection of 53 S. pimpinellifolium accessions (fruit weight = 2.04 ± 0.85 g, called PIM); (3) a collection of 112 S. lycopersicum var. cerasiforme accessions (fruit weight = 13.29 ± 9.54, called CER). The whole collection of 360 genotypes was referred to as ALL. We selected 10 S-genes (Table S2), of which some are known to reduce susceptibility to pathogens when knocked out or knocked down [2]. The selected S-genes including PMR4, PMR5, PMR6, MLO, BIK1, DMR1, DMR6, DND1, CPR5, and SR1, which facilitate host compatibility by being involved in host recognition and penetration, negative regulation of host immunity, or pathogen proliferation. This work represents the first examination at a genomic level of S-genes and existing putative defective alleles in the Solanaceae family.
Initially, a blastP analysis was performed ( Figure 1) to identify homologs from the 10 chosen genes. A total of 125 S-gene homologs were obtained and used for further analyses ( Table 1). The genome sequences of 360 accessions [38] were analyzed (Table S1, genotypes) for SNP mining, and 11,620,517 SNPs/indels were detected across 34,725 tomato gene locations. The number of SNPs over 185 accessions (BIG) was 7,744,233 (67%). In the 125 gene member subset (Table 1), 54,000 SNPs/indels were observed using the SNPeff pipeline. Among these, 51,000 had no effect on protein function, with them being synonymous with SNPs or located in intergenic regions. A total of 1500 SNPs had a low impact, and 1300 had a moderate impact. A total of 119 high-impact SNPs were observed. The distribution of these SNPs was studied among the 10 S-genes ( Figure 2).    Figure 1. Flowchart of the high-impact SNP mining process within the available sequenced tomato germplasm (the data were originally retrieved from Lin et al. 2014 [38]).  Despite differences in the number and type of genes considered, recent analyses on the nucleotide diversity of S-genes in other species such as apple [39,40] and grape [41] have been conducted. The number and density of SNPs observed in grape (V. vinifera) was 15 SNPs per Kb (1SNP every 66 bp), while in both wild species and hybrid/wild Vitis species, it was 18 SNPs per Kb (1 SNP every 55 bp) [41]; in apple (M. domestica), in Mlo-like genes, values of~41 SNPs per Kb and 1SNP every 24 bp were observed [39]. These values were higher than the ones we obtained, i.e., 1 SNP every 1031 bp in the whole dataset and 1 SNP every 472 bp in tomato (BIG), reflecting the different genetic structures of the species, the homozygosity level, and their domestication history.
Our analyses (Table 1) showed that when both wild and cultivated tomato genotypes were considered, the number of SNPs and their density were higher (119 SNPs with a density of 1 SNP per gene). However, when only "big tomato" genotypes were considered, the number of SNPs and their density was halved (58 SNPs with a density of 0.5 SNPs per gene); this suggests that there is a specific reservoir of S-gene alleles in the wild tomato germplasm that can be used for breeding. Haplotype analysis of the 119 SNPs was conducted, revealing the presence of specific conserved haplotypes ( Figure S1) that were clearly distinguishable from other haplotypes, providing useful information for breeders.
We analyzed the potential impact of 119 highly detrimental mutations, including frameshift-inducing mutations that result in major damage such as knock-out mutations. However, there are also many moderate-impact mutations (1326) that may lead to changes in protein conformation and function. Although we did not delve into these effects in detail, they are worth monitoring in order to gain a deeper understanding of altered S-genes. Among the 119 SNPs, 10 were validated in 10 genotypes readily available within the research group facilities (http://eurisco.ecpgr.org, accessed on 1 April 2022) through Sanger sequencing with a 90% validation rate (Table S3); indeed, some non-validated SNPs were mutations detected in a heterozygous condition or possessed the same allelic profile as the reference; the emergence of such heterozygous/reference-like SNPs during the validation step can be explained by the high genetic diversity existing within the analyzed germplasm set (Figure 3), as observed by Li et al. [15]. The number of SNPs in each family was related to their length, but the SNP density appeared higher in certain genes (Table 1, PMR 4, PMR5, PMR6, MLO, BIK1, and CPR5) and lower in others (DMR1, DND1, SR1, and DMR6). This difference might be due to the fact that some genes are single-copy genes or present in a nodal position (hub) within the cell regulation network, hardly supporting deleterious SNPs [42]. On the contrary, the presence of multiple genes in a gene family may mitigate the impact of deleterious mutations [43]. In specific cases, such as DMR1, a single-copy tomato gene exhibited a deleterious mutation (a gained stop codon) in homozygosity, but its potential impact on protein functionality was likely reduced, as the causative SNP was located in the last six codons of the gene (1129/1134) (Table S4). In some others (e.g., BIK1-like genes), many occurrences were observed since all the 51 serine-threonine kinases, belonging to the RLCK (clade VII) repertoire, were analyzed.
Based on the nucleotide diversity (Pi) analysis, we observed that bottleneck events appear to be present in some S-genes ( Figure S2, BIK-Solyc01g028830 and Solyc05g007050; DMR1-Solyc04g008760; DND-Solyc02g088560; MLO-Solyc06g082820; PMR4-Solyc01g006350; and PMR6-Solyc06g071020) considering BIG varieties in relation to the other two groups (PIM and CER). In general, we found that the PIM group showed the highest diversity in S alleles, while the CER group exhibited a moderate level of diversity, and the BIG group showed the least diversity in accordance with the known tomato history of domestication. Indeed, in few cases the genetic variation is reduced in BIG and CER in a similar way while not in PIM (e.g., MLO-Solyc02g077570 and PMR4-Solyc01g006350); in others (e.g., BIK-Solyc06g005500; BIK-Solyc06g083500; and PMR6- The number of SNPs in each family was related to their length, but the SNP density appeared higher in certain genes (Table 1, PMR 4, PMR5, PMR6, MLO, BIK1, and CPR5) and lower in others (DMR1, DND1, SR1, and DMR6). This difference might be due to the fact that some genes are single-copy genes or present in a nodal position (hub) within the cell regulation network, hardly supporting deleterious SNPs [42]. On the contrary, the presence of multiple genes in a gene family may mitigate the impact of deleterious mutations [43]. In specific cases, such as DMR1, a single-copy tomato gene exhibited a deleterious mutation (a gained stop codon) in homozygosity, but its potential impact on protein functionality was likely reduced, as the causative SNP was located in the last six codons of the gene (1129/1134) (Table S4). In some others (e.g., BIK1-like genes), many occurrences were observed since all the 51 serine-threonine kinases, belonging to the RLCK (clade VII) repertoire, were analyzed.
Based on the nucleotide diversity (Pi) analysis, we observed that bottleneck events appear to be present in some S-genes ( Figure S2, BIK-Solyc01g028830 and Solyc05g007050; DMR1-Solyc04g008760; DND-Solyc02g088560; MLO-Solyc06g082820; PMR4-Solyc01g006350; and PMR6-Solyc06g071020) considering BIG varieties in relation to the other two groups (PIM and CER). In general, we found that the PIM group showed the highest diversity in S alleles, while the CER group exhibited a moderate level of diversity, and the BIG group showed the least diversity in accordance with the known tomato history of domestication. Indeed, in few cases the genetic variation is reduced in BIG and CER in a similar way while not in PIM (e.g., MLO-Solyc02g077570 and PMR4-Solyc01g006350); in others (e.g., BIK-Solyc06g005500; BIK-Solyc06g083500; and PMR6-Solyc06g071020), the genetic variation is reduced in all the three groups.

Homozygous SNPs/Indels
The number of genotypes with two SNPs was 174 (whole dataset) and 36 (BIG tomatoes), while those with three or more SNPs were 114 and 14 (Table 2, Figure 4), respectively. This high representation can be explained by the presence of multigene families such as BIK1-like which might present some degree of redundancy. While examining those high-impact mutations, the results revealed that certain mutations appeared frequently in the cultivated germplasm and were preserved across various genotypes, as displayed in Table S5. One example is BIK1 (Solyc05g024290, SNP in chr5:31013858), which could be maintained under selective pressure in clustering genotypes within the germplasm materials ( Figure 3, e.g., Rowpac, M-82, Santa Chiara, Hunt101, Puno I, and E-6203). The genotypes carrying a high number of SNPs (three or more) were approximately a dozen (e.g., Panama, N 739, Rowpac, Micro-Tom, Guayaquil, Droplet, M-82, Hawaii 7998, and KR2), and information about these SNPs is provided in Table S4. Certain mutations, such as BIK1-like/Solyc01g008860 and DMR1-like/Solyc04g008760 in specific genotypes (e.g., N-739/TS-074), appeared to be of lower relevance as they were present in the final percentile of the sequence length (Table S4). Table 2. Detailed statistics on the allelic richness of the tomato genotypes (BIG, from Lin et al. 2014 [38]) considering the high-impact SNPs in the whole gene dataset and in the selected S-genes.

High-Impact SNPs
High-Impact SNPs in S-Genes

Heterozygous SNPs/indels
The incidence of deleterious SNPs in S-genes in a heterozygous condition was comparatively lesser than that of homozygous ones, as observed in both the complete germplasm collection (ALL) and the BIG tomato varieties (Table 2, Figure 4). This frequency may be due to the genetic structure of tomato as an inbred species, which tends to have a low number of heterozygous mutations [15]. However, the number appears relatively high because such mutations, although harmful, can be maintained in the genome if the normal allelic copy continues to function. This high frequency is particularly noticeable in the case of multiple member S-genes (e.g., BIK1-like genes) which may exhibit some redundancy and have no effects, or due to the position of the SNP within the gene (e.g.,

Heterozygous SNPs/Indels
The incidence of deleterious SNPs in S-genes in a heterozygous condition was comparatively lesser than that of homozygous ones, as observed in both the complete germplasm collection (ALL) and the BIG tomato varieties (Table 2, Figure 4). This frequency may be due to the genetic structure of tomato as an inbred species, which tends to have a low number of heterozygous mutations [15]. However, the number appears relatively high because such mutations, although harmful, can be maintained in the genome if the normal allelic copy continues to function. This high frequency is particularly noticeable in the case of multiple member S-genes (e.g., BIK1-like genes) which may exhibit some redundancy and have no effects, or due to the position of the SNP within the gene (e.g., DMR1/Solyc04g008760 in TS-113 and BIK1-like/Solyc01g008860 in Chiclayo, Table S5). If two SNPs are considered, the number of genotypes was 89 (ALL) and 10 (BIG), while if three SNPs are considered, the number of genotypes decreases to 54 (ALL) and 3 (BIG). Some heterozygous mutants for S-genes were also identified, which have a 50% chance of acquiring resistance through natural mutagenic effects.

sgRNA Design
Introgression of S-genes' alleles through breeding into elite varieties is possible, but it is a long and labor-intensive process and has limitations due to linkage drag. To address this issue, in analogy with the work from Prajapati and Nain [44], sgRNA sequences were designed for eight of the proposed S-genes (Table S6) and made available to a wider audience through the creation of optimal gene editing constructs. In total, 113 sgRNAs were designed, considering only the highly specific categories (A0, B0, A0.1, and B0.1) for CRISPR-Cas9-mediated genome editing to minimize off-target events. Specifically, 39 A0, 20 A0.1, 48 B0, and 6 B0.1 sgRNAs were designed. Each gene was equipped with at least one useful sgRNA, with PMR4, PMR5, PMR6, MLO1, and BIK1 having the most sgRNAs at 13, 15, 20, 8, and 50, respectively.

Disease Assay
As a preliminary assay, five genotypes, readily available within the research group facilities (http://eurisco.ecpgr.org, accessed on 1 April 2022), were selected for a disease assay to assess their resistance to O. neolycopersici (On). They included three varieties (PunoI/TS-108, Droplet/TS-296, and M82/TS-003) with deleterious SNPs and two varieties with no deleterious SNPs in the S-genes (VF-36/TS-01 and Moneymaker/TS-02). M-82 carried three mutated genes (BIK1-like Solyc05g024290 and Solyc04g050970, and PMR4like/Solyc01g073750), which introduced a stop codon and produced truncated proteins. Puno-I carried two mutated genes (BIK1/Solyc05g024290 and PMR4/Solyc01g073750) in the middle of the gene, resulting in truncated proteins. Droplet had four high-impact mutations, including one in the BIK1-like gene (Solyc04g050970), two in the Mlo1-like gene (Solyc02g077570), and one in the PMR4-like gene (Solyc01g073750). These varieties showed sequences that predicted the presence of truncated susceptibility proteins in a homozygous state. To assess whether these selected varieties with deleterious SNPs in S-genes had higher resistance to powdery mildew, we inoculated all of them with O. neolycopersici and evaluated the disease index (Tables 3 and S7, Figure S3). Two of them (Puno1 and M-82) showed reduced susceptibility to O. neolycopersici based on visual scoring of disease symptoms, while no significant differences in the disease index were observed in the others. The reason for this incomplete resistance may lie in the genes under consideration (BIK1like: Solyc05g024290 and Solyc04g050970). The RLCK family encodes for a series (~50) of serine/threonine protein kinases with a role in post-translational regulation through, in the case of BIK-1, the phosphorylation of FLS2 and BAK1 [45,46]. The latter gene is involved in pathogen-associated molecular pattern (PAMP)-triggered immunity (PTI) signaling, including calcium signaling and defense responses downstream of FLS2. With the RLCK subfamily VII being a large clade (46 members in Arabidopsis; 51 in the present work), whose members play a role both specifically or redundantly in immune signaling, some BIK1-like genes could have a vicarious role in case of the emergence of mutant forms (e.g., Solyc04g050970 (49.186.199 bp, chromosome 4) in M82 and Solyc05g024290 (31.013.858 bp, chromosome 5) in the PunoI and M82 genotypes. Moreover, the Mlo1-like (Solyc02g077570) and PMR4-like (Solyc01g073750) genes were found to differ from the SlMlo1 and PMR4 genes (Table 1), which were previously known to provide complete resistance in the pres-ence of a loss-of-function allele. Our research was an extensive genomic study incorporating a small pilot study on the impact of mutations on pathogenesis. We carried out pathogenesis assays using plant material readily available at our academic institutions. However, restrictions imposed by the recent Nagoya protocol on plant material transfer and difficulties in obtaining material for phytosanitary reasons limited our scope. We propose further research on accessions such as Panama, N739, and Rowpac (which have 6, 5, and 5 homozygous deleterious SNPs respectively)-a poorly characterized plant material that deserves further investigation. These materials should also be analyzed using different fungal pathogens (Phytophthora infestans, Botrytis, etc.) or bacteria (Pseudomonas syringeae).

Data Mining on S-Genes
A preliminary blastP (https://ftp.ncbi.nlm.nih.gov/blast, accessed on 1 March 2022) analysis allowed us to identify the possible orthologs for susceptibility genes, using information from different plant species (from Schie and Takken [2]; Table S1) and by considering as a preferential choice criterion the e-value (range 0-1 × 10 −10 ) and the percentage of similarity and the query coverage. Since many genes were present in multigene families, the filtering criteria varied, and previous functional annotations were used to filter out non appropriate candidates.

SNP/Indel Data
The genotypic data discussed in Lin et al.'s study [38] were retrieved from SGN (ftp://ftp.solgenomics.net/genomes/tomato_360, accessed on 1 April 2022) as raw vcf files. The data derived from 360 genotypes (Table S1) were divided into different datasets: (1) a collection of 168 big-fruited S. lycopersicum accessions and 17 modern commercial hybrids (F 1 ), altogether called BIG); (2) a collection of 53 S. pimpinellifolium accessions (called PIM); (3) a collection of 112 S. lycopersicum var. cerasiforme accessions (called CER). the whole collection of 360 genotypes was referred to as ALL. A principal component analysis (PCA) was conducted using the R-based ClustVis suite (https://biit.cs.ut.ee/clustvis, accessed on 1 June 2022). The dataset used for PCA was the whole dataset pruned and filtered using vcftools (https://vcftools.github.io, accessed on 1 June 2022), using the option --maxmissing = 0.2, for filtering loci. The genetic variation of the S alleles was analyzed using the nucleotide diversity (Pi) index implemented in vcftools (https://vcftools.github.io, accessed on 1 June 2022) among the different groups (PIM, CER, and BIG). We focused on a 100 kb region, centered around each deleterious SNP with a 5 k window.

SNP Annotation
The SNP data were newly annotated using the v2.5 assembly with ITAG2.4 information. The SnpEff v5.0 program was adopted to infer functional annotation of any SNPs/indels and any potential deleterious effect on protein structure [47]. The effect of each SNP/indel was classified into four of classes of effects: (1) high effect, as variants changing the frameshift, thereby introducing/eliminating stop codons or modifying splice sites; (2) moderate effect, as variants altering the aminoacidic sequence; (3) low effect, as synonymous variants in coding regions; and (4) modifier effect, as variants located outside the coding sequence (non-transcribed regions or introns). Annotated vcf files from each individual were merged into a single file to integrate the entirety of the information. Bedtools intersect (https://github.com/arq5x/bedtools2, accessed on 1 June 2022) was used to screen for overlaps between the genomic features related to the S-genes (in gff format), and the SNP positions emerged from the SnpEff analysis; genomic coordinates were lifted over from SL2.50 to SL5.0 [48]. Functionally annotated SNPs from both the BIG and ALL dataset were inspected for different categories (high, moderate, and low impact) and were considered and counted for each accession, through custom bash scripts. Conserved deleterious SNPs were utilized as informative markers for generating haplotypes of SNPs, and the resulting haplotype information was analyzed around the S-genes using the software tool Tassel. All the categories were decomposed into homozygous and heterozygous SNPs/indels. A subset of SNPs was validated through Sanger sequencing (BMR Genomics Service, Padova, Italy) of the PCR-amplified gene fragments using the primers listed in Table S3.

Single Guide RNA (sgRNA) Design on Target Genes
The CRISPR-PLANT v2 platform (http://omap.org/crispr2/CRISPRsearch.html, accessed on 1 July 2022) was used to design sgRNAs in S-genes using the gene code as a query for the scan of the SL2.5 genome. We selected sgRNAs only present in exons, discarding the ones with a high possibility to give off-targets. Then, the rest of the sgRNAs were selected using their quality, based on the mismatch score in their seed sequence. The sgRNAs were divided by the CRISPR-PLANT software into different quality classes (A0, B0, A0.1, B0.1, A1, B1, A2, and B2), with A0 being the most specific and B2 being the least specific. The sgRNA sequence of each selected S-gene and the relative quality is reported in Table  S4; only the A0, A0.1, B0, and B0.1 classes were reported in the output as highly specific sgRNAs for CRISPR-Cas9 mediated genome editing.

Disease Assay
Thirty seeds of selected accession, three with mutations (M-82, Puno-I, and Droplet) and two controls (VF-36 and MoneyMaker) were sowed and then inoculated with the Wageningen University isolate of O. neolycopersici (On) by spraying 4-week-old plants with a suspension of conidiospores obtained from the leaves of infected tomato Moneymaker plants and adjusted to a concentration of 3.5 × 10 4 spores per ml. The Moneymaker variety was used as the susceptible control. The inoculated plants were grown at 20 ± 2 • C with 70 ± 15% relative humidity and a day length of 16 h in a greenhouse of Unifarm of Wageningen University & Research, the Netherlands, and placed randomly within the greenhouse. Disease index scoring was carried out 10 and 12 days after inoculation. Symptoms were scored visually using a scale from 0 to 3 as described by Huibers et al. [14]. Statistical differences between each variety and the control were analyzed using a two-tailed t-test (* p < 0.05).

Conclusions
In this study, we conducted a comprehensive genomic survey of various tomato genotypes to identify putative defective alleles of susceptibility genes. Our analysis revealed the presence of natural homozygous/heterozygous deleterious alleles, which we further validated through Sanger sequencing. Interestingly, we observed that certain genotypes carrying high-impact homozygous SNPs in S-genes exhibited significantly reduced susceptibility to O. neolycopersici. These findings offer valuable insights for plant genetics and have the potential to enhance genomic-assisted breeding programs focused on developing resistance to biotic stresses. Nonetheless, it is important to acknowledge that incorporating desirable alleles into elite genotypes can be a time-consuming process with challenges such as linkage drag. To address this, we have also explored the application of a gene editing approach using single guide RNA (sgRNA) design. This alternative method shows promise in disabling targeted genes, presenting a powerful means to obtain elite tomato genotypes resistant to biotic stresses. Furthermore, our genomic survey can contribute to the evaluation and risk assessment of new genomic techniques by tracking existing alleles in the context of their "History of Safe Use" [36].
This study underscores the significance of publicly available data in enabling further analyses and, more importantly, highlights the wealth of potentially beneficial alleles already present in the existing tomato breeding pool. If proven to reduce disease susceptibility, these genes could serve as long-lasting sources of tolerance against various pathogens.

Data Availability Statement:
The sequencing data used in this study are openly available in the NCBI database (SRA/SRP045767).