Characterization of full-length transcriptome and mechanisms of sugar accumulation in Annona squamosa fruit

Annona squamosa is a multipurpose fruit tree employed in nutritional, medicinal, and industrial fields. Its fruit is significantly enriched in sugars, making it an excellent species to study sugar accumulation in fruit. However, the scarcity of genomic resources hinders genetic studies in this species. This study aimed at generating large-scale genomic resources in A. squamosa and deciphering the molecular basis of its high sugar content. Herein, we sequenced and characterized the full-length transcriptome of A. squamosa fruit using PacBio Iso-seq. In addition, we analyzed the changes in sugar content over five fruit growth and ripening stages, and we applied RNA-sequencing technology to investigate the changes in gene expression related to sugar accumulation. A total of 783,647 circular consensus sequences were generated, from which we obtained 48,209 high-quality, full-length transcripts. Additionally, 1,838 transcription factors and 1,768 long non-coding RNAs were detected. Furthermore, we identified 10,400 alternative splicing events from 2,541 unigenes having on average 2–4 isoforms. A total of 15,061 simple sequence repeat (SSR) motifs were discovered and up to three primer pairs were designed for each SSR locus. Sugars mainly accumulate during the ripening stage in A. squamosa. Most of the genes involved in sugar transport and metabolism in the fruit were progressively repressed overgrowth and ripening stages. However, sucrose phosphate synthase involved in sucrose synthesis and more importantly, isoamylase, alpha-amylase, beta-amylase, 4-alphaglucanotransferase genes involved in starch degradation displayed positive correlations with sugar accumulation in fruit. Overall, we provide here a high-quality, full-length transcriptome assembly which will facilitate gene discovery and molecular breeding of A. squamosa. We found that starch degradation during fruit ripening was the main channel for sugar accumulation in A. squamosa fruit, and the key genes positively linked to sugar accumulation could be further studied to identify targets for controlling sugar content in A. squamosa fruit.


Introduction
Annona squamosa Linn. also known as sugar apple is a popular fruit tree widely cultivated in tropical and subtropical areas. The soft granular juicy and sugary pulp of the fruit is enriched in dietary fibers, carbohydrates, and various vitamins, minerals, and fatty acids (Brandão and Santos, 2016). The seeds contain a significant amount of oil with various potential industrial applications (Mariod et al., 2010;Omkaresh et al., 2017;Zahid et al., 2018a;Abdualrahman et al., 2019). Besides, A. squamosa has been shown to possess important medicinal attributes. Different tissues, including leaf, root, fruit, stem, seed, flower, and consumer choice (Vimolmangkang et al., 2016); therefore, the development of cultivars with high sugar content is a priority in fruit breeding programs.
Sugar transport, metabolism, and accumulation have been well studied in plants (Nguyen-Quoc and Foyer, 2001). Sugar transporters are the main carriers of sucrose from source tissues to sink tissues such as fruit. They play a preponderant role in sugar accumulation in fruit. Besides, several genes, such as sucrose synthase and invertases, are involved in the conversion of sucrose into glucose, fructose, and other sugars. Activities of sucrose phosphate synthase or sucrose phosphate phosphatase lead to the re-synthesis of sucrose from fructose 6-phosphate and uridine diphosphateglucose (Rolland et al., 2006;Li et al., 2012). In addition to sucrose transport and metabolism in fruit, starch degradation is another important mechanism for sugar accumulation. Starch is mainly accumulated in developing fruit, while during fruit ripening, starch is hydrolyzed into soluble sugars by the activity of several genes such as amylases (Cordenunsi-Lysenko et al., 2019). It has been shown that starch content increased up to 35% in the pulp of developing banana while at late ripening, starch content was less than 1%. Meanwhile, soluble sugars increased from 1% in developing banana to reach up to 20% of the fresh weight at the ripening stage (Soares et al., 2011).
Using the RNA-sequencing approach, the global change in gene expression levels and mechanisms of sugar metabolism have been reported in various fruit crops such as banana, melon, and watermelon (Asif et al., 2014;Gao et al., 2018;Schemberger et al., 2020). However, the molecular mechanisms of the significantly high sugar content in A. squamosa fruit have not been elucidated. In non-model species without reference genome sequence, full-length transcriptome generated by third-generation sequencing platforms, such as PacBio RS and Nanopore, provides high quality basic genomic resources (Chen et al., 2019;Qian et al., 2019;Yue et al., 2020).
The scarcity of genomic resources in A. squamosa limits gene discovery, functional analysis, and molecular breeding (Gupta et al., 2015;Liu et al., 2016). This study aimed to generate a high-quality, full-length transcriptome in A. squamosa and clarify the molecular basis of its high sugar content. We successfully sequenced and characterized for the first time the full-length transcriptome of A. squamosa using PacBio Sequel platform. With the relatively high error rate of PacBio Sequel, we integrated data from Illumina short-read sequencing at five fruit growth and ripening stages to improve sequencing accuracy (Koren et al., 2012). We further examined changes in gene expression in sugar transport and metabolism, and we also investigated mechanisms of high sugar content in A. squamosa fruit.

Plant materials
Two cultivars of Annona squamosa Linn. ('African Pride' and 'Gefner') were used as plant materials in this study. The cultivars 'African Pride' and 'Gefner' originated in South Africa and Israel, respectively (Paull and Duarte, 2011). These two cultivars were used in this study because they are widely grown in China and very popular among customers. The materials were obtained from the Horticultural Research Institute, Guangxi Academy of Agricultural Sciences, China, and formal identification of the materials was conducted by the corresponding author of this paper. Thirty fruit samples, for each cultivar, were collected from 5-year-old trees at five growth and ripening stages from April to July 2019 (Fig. 1). Fruit from each cultivar and stage were separated into three replicates (ten fruits/ replicate) and pulp samples were collected, frozen in liquid nitrogen, and used for RNA-seq. In addition, samples from both cultivars at all stages were mixed and used for PacBio sequel sequencing.

Analysis of fruit sugar content
The soluble sugars such as sucrose, glucose, and fructose of A. squamosa fruit collected at the different growth and ripening stages were analyzed using high-performance liquid chromatography (HPLC) as described by Folgado et al. (2017). Approximately 200 mg of each sample was accurately weighed in three replicates and inserted in a 2 mL EP tube. Then, 0.6 mL 2-chlorophenylalanine (4 ppm) methanol (−20°C) was added and vortexed for 30 s; 100 mg glass beads were added, and the samples were put into a TissueLysis II tissue grinding machine. Samples were ground at 25 Hz for 60 s. The tubes were placed in an ultrasound bath at room temperature for 15 min; Then, centrifuged at 25°C for 10 min at 1750×g, and the supernatant was filtered through a 0.22-μm membrane to obtain the prepared samples for liquid chromatographymass spectrometry. Chromatographic separation was performed in a Thermo Ultimate 3000 system equipped with an ACQUITY UPLC ® HSS T3 (150 × 2.1 mm, 1.8 μm, Waters) column maintained at 40°C. The temperature of the autosampler was 8°C. Gradient elution of analytes was carried out with 0.1% formic acid in water (D) and 0.1% formic acid in acetonitrile (C) or 5 mM ammonium formate in water (B) and acetonitrile (A) at a flow rate of 0.25 mL/ min. The concentration of sugars in the tissues was calculated based on the molar concentration obtained from the measurements, and the initially fresh weight used.
Library construction and single molecule real-time sequencing Total RNA was extracted by grinding mixed A. squamosa fruit tissues from the five growth and ripening stages and two cultivars in TRIzol reagent (Life Technologies) on dry ice and processed following the protocol provided by the manufacturer. The integrity of the RNA was determined with the Agilent 2100 Bioanalyzer and agarose gel electrophoresis. The purity and concentration of the RNA were determined with the Nanodrop microspectrophotometer (ThermoFisher). The mRNA was enriched by Oligo(dT) magnetic beads. Then the enriched mRNA was reverse transcribed into cDNA using Clontech SMARTer PCR cDNA Synthesis Kit. PCR cycle optimization was used to determine the optimal amplification cycle number for the downstream large-scale PCR reactions. Then, the optimized cycle number was used to generate double-stranded cDNA. In addition, >4 kb size selection was performed using the Blue Pippin TM Size-Selection System and mixed equally with the no-sizeselection cDNA. Then, large-scale PCR was performed for the next SMRTbell library construction. cDNAs were DNA damage repaired, end repaired, and ligated to sequencing adapters. The SMRTbell template was annealed to the sequencing primer and bound to polymerase and sequenced on the PacBio Sequel platform using P6-C4 chemistry with 10 h movies by Gene Denovo Biotechnology Co. (Guangzhou, China).

PacBio long-read processing
The raw sequencing reads of cDNA libraries were classified and clustered into transcript consensus using the SMRT Link v5.0.1 pipeline (Gordon et al., 2015) supported by Pacific Biosciences. Briefly, circular consensus sequence (CCS) reads were extracted out of the subreads BAM file. Then, CCS reads were classified into full-length nonchimeric (FL), non-full-length (nFL), chimeras, and short reads based on cDNA primers and poly-A tail signal. Short reads (<50 bp) were discarded. Subsequently, the full-length non-chimeric (FLNC) reads were clustered by Iterative Clustering for Error Correction (ICE) software to generate the cluster consensus isoforms. To improve the accuracy of PacBio reads, two strategies were employed. First, the nonfull-length reads were used to polish the above-obtained cluster consensus isoforms by Quiver software (Chin et al., 2013) to obtain the FL polished, high-quality consensus sequences (accuracy ≥ 99%). Next, the low-quality isoforms were further corrected using Illumina short reads obtained from the same samples by the LoRDEC tool (version 0.8) (Salmela and Rivals, 2014). Then, the final transcriptome isoform sequences were filtered by removing the redundant sequences with software CD-HIT-v4.6.7 (Fu et al., 2012) using an identity threshold of 0.99.

Annotation of isoforms and transcription factor analysis
Basic annotation of isoforms includes protein functional annotation, pathway annotation, COG/KOG functional annotation, and Gene Ontology (GO) annotation. To annotate the isoforms, they were BLAST analyzed against the NCBI non-redundant protein (Nr) database (http:// www.ncbi.nlm.nih.gov), the Swiss-Prot protein database (http://www.expasy.ch/sprot), the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (http://www.genome. jp/kegg), and the COG/KOG database (http://www.ncbi.nlm. nih.gov/COG) with BLASTx program (http://www.ncbi.nlm. nih.gov/BLAST/) at an E-value threshold of 1e−5 to evaluate sequence similarity with genes of other species. GO annotation was analyzed by Blast2GO software (Conesa et al., 2005) with Nr annotation results of isoforms. Then, the functional classification of isoforms was performed using the WEGO software . Protein coding sequences of isoforms were aligned by hmmscan to PlantTFdb (http:// planttfdb.gao-lab.org/; Jin et al., 2017) to predict transcription factor (TF) families.

Characterization of long non-coding RNAs
The software CNCI (version 2) (Sun et al., 2013) and CPC (version 0.9-r2, Kong et al., 2007) were used to assess the protein-coding potential of transcripts without annotations by default parameters for potential long non-coding RNAs (lncRNA).

Alternative splicing detection and validation
To analyze alternative splicing (AS) events of transcript isoforms, COding GENome reconstruction Tool (Cogent) (Li et al., 2017) was firstly used to partition transcripts into gene families based on k-mer similarity and reconstruct each family into a coding reference genome based on De Bruijn graph methods. Then, the software SUPPA (Alamancos et al., 2015) was used to analyze alternative splicing events of transcript isoforms.

RNA-Seq library construction and sequencing
After high-quality RNA was extracted, mRNA was enriched by Oligo(dT) beads. Then, the enriched mRNA was fragmented into short fragments using fragmentation buffer and reverse transcribed into cDNA with random primers. Second-strand cDNA was synthesized by DNA polymerase I, RNase H, dNTP, and buffer. Then, the cDNA fragments were purified with QiaQuick PCR extraction kit, end repaired, poly(A) added, and ligated to Illumina sequencing adapters. The ligation products were size selected by agarose gel electrophoresis, PCR-amplified, and sequenced using Illumina HiSeqTM4000 by Gene Denovo Biotechnology Co. (Guangzhou, China).

Gene expression quantification and differentially expression analysis
In order to get high-quality clean reads, raw reads were filtered according to the following rules: (a) Removing reads containing adapters; (b) Removing reads containing more than 10% of unknown nucleotides (N); (c) Removing lowquality reads containing more than 40% of low quality (Qvalue ≤ 20 bases). The high-quality clean reads were mapped to the reference transcriptome using short reads alignment tool Bowtie2 (Langmead and Salzberg, 2012) by default parameters. The gene abundances were calculated and normalized to Reads Per kb per Million reads (RPKM) (Mortazavi et al., 2008).
To identify differentially expressed genes (DEG) across groups, the edgeR package (http://www.r-project.org/) was used. We identified genes with a fold change ≥2 and a false discovery rate (FDR) <0.05 in a comparison as significant DEGs. DEGs were then subjected to enrichment analysis of KEGG pathways (Kanehisa et al., 2008).

Gene expression analysis using quantitative real time PCR
The quantitative real-time PCR (qRT-PCR) was performed to validate the RNA-seq analysis on RNA extracted from pulp samples of A. squamosa fruits at 30 days after pollination and 6 days after harvest. Technical procedures were as fully described by Dossa et al. (2019). Specific primer pairs were designed using PrimerPremier 3 tool (primer length: 18 ± 2 bp; search node: automatic; amplicon size: 70-200 bp; melting temperature is 60 ± 3°C) for the selected genes (Tab. S1). RNA was first transcribed into cDNA using a cDNA synthesis kit (Invitrogen, Carlsbad, CA, USA). The qRT-PCR was conducted on a Roche Lightcyler ® 480 instrument using the SYBR Green Master Mix (Vazyme), according to the manufacturer's protocol. The GAPDH gene was used as an endogenous control.

Portray of the full-length transcriptome sequencing and annotation
High-quality RNAs were extracted from the fruits of two A. squamosa cultivars ('African Pride' [AP] and 'Gefner' [LZ]) at three growth (30, 100, and 130 days after pollination [DAP]) and two ripening stages (6 and 8 days after harvest [DAH]) ( Fig. 1) and used for single-molecule real-time (SMRT) Bell libraries on the PacBio Sequel platform using the latest P6-C4 chemistry. The specific bioinformatics analysis pipeline for data is outlined in Fig. 2A. In total, 57 Gb clean reads with 32,757,220 subreads were generated (Tab. S1). A total of 783,647 circular consensus sequences (CCS) with a mean length of 2,398 bp were classified as FL transcripts based on the presence of 5'-primers, 3'-primers, and poly(A) tails. After polishing, clustering, and demultiplexing of FL transcripts, we obtained 48,209 nonredundant high-quality FL non-chimeric (FLNC) transcripts with an average length of 2,076.6 bp, N50 of 2,500 bp, and GC content of 44.58% (Tab. S2). The length distribution of the obtained FLNC sequences ranged from 61 to 11,414 bp ( Fig. 2B), which is similar to previous SMRT reports in plants (Feng et al., 2019;Yue et al., 2020).
Functional annotation of the FLNC transcripts was conducted by using four different public databases, including Nr, Swissprot, KOG, and KEGG (Apweiler, 2001;Koonin et al., 2004;Kanehisa et al., 2008;Wilke et al., 2012). We identified homologs for 45,989, 39,569, 32,035 and 23,647 FLNC transcripts in Nr, Swissprot, KOG and KEGG, respectively. This represents in total 46,008 successfully annotated FLNC transcripts (95% of the total FLNC transcripts) with 19,993 transcripts (41% of the total FLNC transcripts) simultaneously annotated in all databases (Fig. 2C). The comparison of the FLNC transcript sequences with available genome sequenced plant species from the Nr database showed that A. squamosa shared the highest transcript number with Nelumbo nucifera (Fig. 2D).
We further compared the FLNC sequences to PlantTFdb (Jin et al., 2017) to predict TF families. In total, 1,838 TF encoding genes were detected and classified into 53 families (Tab. S3). C3H and bHLH family members were the most abundant, while STAT and S1Fa-like families had each a single member.

Alternative splicing analysis
Since PacBio sequencing generates full-length transcripts, it offers the possibility of identifying alternative splicing (AS) by directly comparing isoforms of the same gene without de novo assembly (Qian et al., 2019). In this study, we identified 10,400 AS events from 2,541 unigenes (Tab. S3). Most of the unigenes had 2-4 isoforms, while COGENT005277 featured 48 isoforms (Fig. 3A, Tab. S4). To obtain detailed information, the number of unigenes falling in each of the seven primary AS types, including exon skipping (SE), alternative 5' and 3' splice sites (A5/A3), mutually exclusive exons (MX), intron retention (RI), and alternative first and last exons (AF/AL), was investigated (Fig. 3B). RI was by far the most abundant AS event (47% of the total unigenes regulated by AS) in A. squamosa, while only one unigene (COGENT004817) showed MX AS event (Fig. 3C, Tab. S4).

Long non-coding RNAs and Simple Sequence Repeat characterizations
The FL transcript sequences that were not annotated to the four databases were submitted to the CNCI (Sun et al., 2013) and CPC (Kong et al., 2007) programs, and only lncRNAs predicted by both programs were kept. From the 48,209 FLNC transcripts, 1,768 (3.67%) lncRNAs were commonly identified by both programs in A. squamosa (Fig. 4A).
SSRs are easily detected by the standard PCR technology, representing one of the most used molecular markers (Dossa et al., 2017). Here, we characterized the SSRs in A. squamosa transcriptome. In total, we annotated 15,061 SSRs contained in 11,574 sequences, representing 24% of the total FLNC transcripts (Tab. S5). Di-and tri-nucleotide SSRs were the most abundant (6,534 and 5,881, respectively), while pentanucleotide SSRs were the least represented (466) (Fig. 4B). Moreover, 1403 SSRs present in compound formation were detected (Tab. S5). Using Primer3 (version 1.1.4), we designed up for three primer pairs to each SSR locus.

RNA-seq and differentially expressed transcripts overgrowth/ripening stages
We further sequenced short reads transcriptome from fruit samples of the two cultivars at the five growth and ripening stages using the Illumina platform. The short reads were used in the polishing process of FL transcripts ( Fig. 2A). Globally, 30 samples were sequenced (three biological replicates for each sample), generating over 276 Gb raw data. After cleaning, 273 Gb data were kept for subsequent analysis, and quality assessment showed a high Q30 score (>93%) (Tab. S6). Statistics of the short reads mapping to the FL transcriptome are presented in Tab. S7. Overall, high mapping ratios, ranging from 86.71 to 90.98 %, were obtained. Gene expression was estimated based on the Reads Per kb per Million reads (RPKM) method, and overall, 47,967 genes were found expressed in all samples. Based on the gene expression profiles, we performed a principal component analysis (PCA) of the 30 samples to assess the quality of the biological replicates and clustering patterns of samples. As shown in Fig. 5A, all biological replicates clustered together, showing high correlations between them. In addition, we observed that samples from the two cultivars grouped together at the different stages except at stage 4 (6 DAH), indicative of profound transcriptome changes between the two cultivars at this specific stage. The PC1 separated the first two stages (30 and 100 DAP) from the latter stages, while the PC2 clearly separated stages 4 and 5 from earlier stages.
In order to detect differentially expressed genes (DEG), pairs of samples were compared by using stage 1 (30 DAP) as a reference for each cultivar. Genes with |log 2 Fold Change| >1 and false discovery rate (FDR) <0.05 were selected as DEGs (Fig. 5B). The numbers of DEGs ranged from 13,842 (AP-1-VS-AP-2) to 31,399 (AP-1-VS-AP-5), reflecting that as the fruit maturation process goes on, more gene expression levels are altered. In particular, the majority of DEGs were downregulated, and as suspected from the PCA result, conspicuous transcriptome changes occurred at stages 4 and 5 in both cultivars.
KEGG annotation of the DEGs revealed that most of them are involved in metabolism, particularly in carbohydrate metabolism, energy metabolism, and amino acid metabolism pathways (Fig. S1).
To validate the DEG analysis, we selected ten genes involved in carbohydrate metabolism and performed qRT-PCR  analysis at 30 DAP and 6 DAH (Fig. S2). The GAPDH gene was used as an endogenous control. The regulation patterns of the expression levels of the selected genes were similar to those in the RNA-seq report with a strong Pearson correlation score (R 2 = 0.83). This result confirms well the reliability of the DEG analysis and subsequent interpretation.

Differentially expressed transcripts involved in sugar and starch metabolism
Sugar content in fruit is an important quality criterion; therefore, we studied the genes involved in sugar accumulation in A. squamosa. First, we analyzed the pattern of sugar accumulation in A. squamosa fruit overgrowth and ripening stages in the two cultivars. As shown in Tab. 1, the trend of sugar accumulation was quite similar between the two cultivars. Globally, sugar content increased from the early growth stages to the ripening stages. Glucose and fructose were the dominant sugars, and a striking sugar accumulation occurred at 6 DAH. That indicates that sugar accumulates mainly post-harvest (during ripening) in A. squamosa. To uncover the molecular mechanism of sugar accumulation in A. squamosa, we investigated the changes in gene expression related to sugar metabolism. In total, we detected 257 DEGs belonging to 27 families playing various functions in sugar accumulation. We mapped these DEGs to the sugar metabolism pathway, and the general patterns of regulation of these gene families in late stages (100 DAP, 130 DAP, 6 DAH and 8 DAH) compared to the first stage (30 DAP) are presented in Fig. 6. Genes highlighted in red and blue are those constantly up-regulated and down-regulated, respectively, after 30 DAP. Genes highlighted in gray display both up-regulation and down-regulation patterns (mixed regulation) at specific developmental stages after 30 DAP. Genes in white boxes are those not differentially expressed between developmental stages (Fig. 6).
Collectively, analysis of sugar metabolism-related DEGs revealed that starch degradation, which is triggered during ripening, is the most active channel for soluble sugar accumulation in A. squamosa fruit.

Differentially expressed transcripts involved in sugar transport
Sucrose transport from source tissues to sink tissues is mediated by special genes belonging to the families of sucrose transporters (SUC), sugar will eventually be exported transporters (SWEET), sugar transport protein (STP), etc. In this study, we searched for all DEGs annotated as SUC, SWEET and STP. In total, four, six, and six SUC, SWEET, and STP DEGs, respectively, were FIGURE 6. The differentially expressed genes (DEG) between late stages (100 DAP, 130 DAP, 6 DAH and 8 DAH) compared to the first stage (30 DAP) were mapped to the starch and sucrose metabolism pathway. In total, 257 DEGs belonging to 27 families were identified and the general patterns of regulation of these gene families were highlighted with colors. Genes highlighted in red, blue, white, and gray are those constantly down-, up-, no-and mixed regulated (down-and up-regulated), respectively. identified across the different growth and ripening stages (Tab. 2). SUCs were globally upregulated in late stages as compared to the 30 DAP, indicating that sucrose transport to A. squamosa fruit is continuous. Nonetheless, the increase in SUC gene expression was not very high, meaning this process is not very decisive for sugar accumulation. On the opposite, SWEET and STP genes were more active at 30 DAP and subsequently strongly repressed at later stages. Collectively, our data suggest that sugar transporters play a preponderant role at the early growth stages, but, during ripening, their contribution to sugar accumulation in A. squamosa fruit is very limited.

Discussion
In this study, we generated, for the first time, full-length transcriptome data for A. squamosa. Although the numerous nutritional, medicinal, and industrial applications of this species, the lack of genome sequence and limited transcriptome data (Gupta et al., 2015;Liu et al., 2016) impair genomic research in A. squamosa. Hence, our FL transcriptome provides crucial genomic data to instigate gene expression, gene discovery, cloning, and evolutionary studies. We identified 48,209 high-quality FL non-chimeric (FLNC) transcripts with an average length of 2,076.6 bp, N50 of 2,500 bp (Tab. S2). The longest unigene has a length of 11,414 bp, which could not be identified by short-read sequencer such as Illumina (Hoang et al., 2017). However, the number of unigenes detected in this study was lower than Illumina based RNA-seq (71,948 unigenes), which is understandable since RNA-seq can recover more transcripts than PacBio sequencing due to relatively higher sequencing depth (Rhoads and Au, 2015).
The advantage of full-length transcriptome sequencing is the possibility to uncover the alternative splicing (AS) events and isoforms (Qian et al., 2019), which are major mechanisms for the enhancement of transcriptome diversity (Keren et al., 2010). We observed that only~5% of the total unigenes yielded different isoforms in A. squamosa fruit transcriptome (Tab. S4), which is quite lower than reports in Crocus sativus (33.7%, Qian et al., 2019), Lateolabrax maculatus (28%, Tian et al., 2019) and Gossypium australe (50,83%, Feng et al., 2019). This low percent of genes with isoforms could be explained by the fact that only fruit tissue was used in this study, while other studies combined various tissues. Therefore, we propose that future works should sample various tissues such as flower, leaf, stem, seed, and root in order to have a better insight into the complexity of AS and isoforms in A. squamosa transcriptome and to discover more genes. Intron retention (RI) was found to be the most abundant AS in A. squamosa (Fig. 3), which is common in plants (Marquez et al., 2012;Shen et al., 2014;Feng et al., 2019). We also reported lncRNAs and, more importantly, simple sequence repeat (SSR) that represents a popular marker type for plant genotyping (Dossa et al., 2017). Only a few SSR markers developed in related species have been used for diversity analysis in A. squamosa (Nagori et al., 2018); therefore, the SSR markers along with their primer sequences provided in this study (Tab. S5) will be a great resource for future diversity studies and molecular breeding.
A. squamosa fruit is one of the sweetest fruits containing up to 28% total sugars at the ripening stage (Brandão and Santos, 2016). Sugar content analysis in developing fruit compared to ripening fruit demonstrated that sugars mainly accumulate during the ripening stage with a continuous increase of sugars over the fruit growth and ripening stages (Tab. 1). Sugar accumulation is controlled by both the translocation of sugars and their metabolism in developing and ripening fruits (Yativ et al., 2010). By analyzing genes involved in sugar metabolism, we found that most of the genes were highly active at the early growth stage (30 DAP) but were repressed at late growth and ripening stages. Similar observations were found for sugar transporter related genes and starch synthesis genes (Tab. 2, Figs. 6 and 7). On the opposite, genes involved in the synthesis of sucrose were found up-regulated during ripening, and this correlates with the detection of sucrose in ripe fruit (Tab. 1). Likewise, genes involved in starch degradation were markedly induced during ripening. For example, we observed the up-regulation of sucrose phosphate synthase genes after 30 DAP. It was reported that over-expression of sucrose phosphate synthase genes enhances sucrose content in sugarcane and tomato fruit Anur et al., 2020). We identified several invertase genes downregulated after 30 DAP in both cultivars, indicative of their negative correlation with sugar accumulation. In accordance with our report, Wang et al. (2015) demonstrated that ectopic over-expression of EjVIN from loquat in tobacco significantly decreased sucrose content. In this study, we noticed the high activity of sucrose synthase genes at early developmental stages, while at late stages, the expression levels of sucrose synthase genes were strongly downregulated. The action of sucrose synthase genes on sucrose cleavage was reported to regulate the import and compartmentation of sucrose in the early stage of tomato fruit development (Demnitz-King et al., 1997;D'Aoust et al., 1999). In addition, it has been documented through transgenic experiments that suppression of sucrose synthase gene expression leads to reduced starch accumulation in potato tubers (Zrenner et al., 1995) and carrot taproots (Tang and Sturm, 1999). These results showed the key roles of sucrose synthase genes in sucrose import and starch accumulation in sink tissues. Therefore, we conclude that the high activity of sucrose synthase genes promotes sucrose import and starch accumulation in developing A. squamosa fruits. And during fruit ripening stages, the accumulated starch is degraded, contributing to the fruit sweetness. In support of this conclusion, we observed several alphaamylase, beta-amylase, 4-alpha-glucanotransferase, and isoamylase genes highly induced during A. squamosa fruit ripening, which are known to be involved in sugar degradation in fruit (Purgatto et al., 2001;Radchuk et al., 2009). Concerning sugar transporters, we observed that they play a preponderant role at the early growth stages of A. squamosa fruit. Meteier et al. (2019) found that overexpression of the VvSWEET4 in grapevine promotes sugar Log2 fold change of the sugar transporter genes in Annona squamosa at late growth and ripening stages as compared to early stage (30 DAP).

Gene ID
Symbol AP-2/AP-1 AP-3/AP-1 AP-4/AP-1 AP-5/AP-1 LZ-2/LZ-1 LZ-3/LZ-1 LZ-4/LZ-1 LZ-5/LZ-1 transport. Similarly, ectopic expression of PbSUT2 enhances sucrose content in tomato fruit . Collectively, our data showed that during early fruit development stages in A. squamosa, sucrose is transported and loaded into fruit, and starch is highly accumulated. During late development and ripening stages, starch is degraded to promote sugar accumulation in the fruit. Extensive studies in banana, tomato, and apple also showed a shift from starch synthesis to starch breakdown in ripening fruit leading to the accumulation of soluble sugars, with a significant impact on fruit taste and flavor (Beck and Ziegler, 1989;Nascimento et al., 2000;Mota et al., 1997;Thammawong and Arakawa, 2010;Maria et al., 2016;Cordenunsi-Lysenko et al., 2019). Interestingly, we identified 56 genes involved in starch degradation during ripening, which should be further studied to uncover the transcription factors that regulate their expression levels (Xiao et al., 2018). This information will be cardinal for controlling the level of starch degradation during A. squamosa fruit ripening, since it has been shown that genotypic variation in sugar content is mainly associated with the ability to breakdown accumulated starch (Cordenunsi-Lysenko et al., 2019).

Conclusions
We generated a high-quality, full-length transcriptome assembly using PacBio Iso-seq, representing the most complete genomic data available in A. squamosa. We further provided extensive short reads data and elucidated the molecular mechanisms of sugar accumulation in developing and ripening fruits. Our work will facilitate future functional and evolutionary studies as well as molecular breeding mainly aiming at controlling sugar content in A. squamosa fruit. Figure S1. KEGG annotation of the differentially expressed genes identified between pairs of samples. AP = African pride, LZ = Gefner; Stages 1-5 represent 30 DAP, 100 DAP, 130 DAP (Harvest day), 6 DAH and 8 DAH. DAP = day after pollination and DAH = day after harvest. Figure S2. qRT-PCR validation of ten selected genes. (A-J) Gene expression analysis based on qRT-PCR. The x-axis represents the ten genes while the y-axis represents the relative expression of each gene. The bars show standard deviation. AP = African pride, LZ = Gefner; Stages 1 and 4 represent 30 DAP and 6 DAH. (K) Pearson correlation analysis between RNA-seq and qRT-PCR expression profiles. DAP = day after pollination and DAH = day after harvest.

Supplementary Figures
Supplementary Tables   Table S1. Primer sequences of genes used for qRT-PCR validation of differentially expressed genes Table S2. PacBio sequencing data statistics Table S3. List of genes annotated as transcription factors Table S4. Alternative events and isoforms identified in Annona squamosa full-length transcriptome  . The funder has no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Conflicts of Interest:
The authors declare that they have no conflicts of interest to report regarding the present study.