Identification and functional analysis of SWEET gene family in Averrhoa carambola L. fruits during ripening

Sugar Will Eventually be Exported Transporters (SWEETs), a class of sugar efflux transporters, have been studied extensively because of their roles in phloem loading for long-distance sugar transport, fruit development, and stress regulation. Several plant species are known to possess SWEET genes; however, little is known about their presence in Averrhoa carambola L. (Oxalidaceae), an evergreen fruit crop (star fruit) grown in tropical and subtropical regions of Southeast Asia. In this study, we established an Averrhoa carambola L. unigenes library from fruits of ‘XianMiyangtao’ (XM) by RNA sequencing (RNA-seq). A total of 99,319 unigenes, each longer than 200 bp and with a combined length of 72.00 Mb, were identified, of which 51,642 (52.00%) were annotated. Additionally, 10 AcSWEET genes were identified from the unigenes library and classified, followed by a comprehensive analysis of their structures, conserved motif compositions, and evolutionary relationships. Moreover, the expression patterns of AcSWEETs in the ‘XM’ cultivar during fruit ripening were confirmed using quantitative real-time PCR (qRT-PCR). Combined with the soluble sugar and titratable acid contents during ripening, these patterns suggested that AcSWEET2a/2b and AcSWEET16b might participate in sugar transport during fruit ripening. This work presents a general profile of the AcSWEET gene family in Averrhoa carambola L., which can support further studies on the functional roles of AcSWEET genes.


INTRODUCTION
In higher plants, the end-products of photosynthesis include glucose, fructose, and sucrose. These sugars provide carbon skeletons for large molecules such as proteins and nucleic acids, serve as osmolytes and signaling molecules, are temporarily stored in plant vacuoles, and are transported to different organs as nutrients for cellular growth and development (Ayre, 2011; Wind, Smeekens & Hanson, 2010; Braun, 2012). Currently, sucrose transporters (SUTs), monosaccharide transporters (MSTs), and Sugar Will Eventually be Exported Transporters (SWEETs) are the transporter families known to mediate sugar translocation (Chen et al., 2015a; Eom et al., 2015).
The SWEET gene family was first discovered in Arabidopsis (Chen et al., 2010). SWEETs play diverse physiological roles, such as phloem loading for long-distance sugar transport (Braun, 2012; Zhang et al., 2019), seed filling, nectar secretion (Lin et al., 2014), fruit development (Chong et al., 2014; Zheng et al., 2014), pollen nutrition, plant-pathogen interaction (Chen, 2014; Zhou et al., 2015), and stress regulation (Chandran, 2015; Yue et al., 2015). They are characterized by seven transmembrane helices, comprising two 3-transmembrane-helix MtN3 motifs that are connected in tandem by a linker transmembrane helix (Xuan et al., 2013). They mainly transport hexoses or sucrose across the plasma membrane, thus acting as sugar efflux transporters (Chen et al., 2010; Chen et al., 2012). Because appropriate carbohydrate distribution is vital for crop yield and quality, it is necessary to evaluate the functional roles of SWEET genes in plants, especially in fruit-bearing trees.
In plants, SWEET genes have been classified into four clades according to their functional properties: clade I and II proteins transport hexoses, clade III proteins mainly transport sucrose and minor quantities of fructose, and clade IV proteins transport both monosaccharides and disaccharides (Chen et al., 2010; Chong et al., 2014; Wei et al., 2014; Eom et al., 2015). SWEET genes belonging to the same clade may perform similar functions, although their roles may differ among plant species. Previous studies have identified SWEET sugar transporters in several plants, including Arabidopsis (Chen et al., 2010), rice (Yuan et al., 2013), and several fruit-bearing trees, such as orange (Zheng et al., 2014), grape (Chong et al., 2014), apple (Wei et al., 2014), pear (Li et al., 2017), loquat (Wu et al., 2017), litchi (Xie et al., 2019), and pineapple (Guo et al., 2018).
Averrhoa carambola L. (family: Oxalidaceae) is an evergreen plant native to tropical and subtropical regions of Southeast Asia and has been grown in China for more than 2000 years. Its ripe fruit is sweet, juicy, and slightly sour, with a pleasant aroma, and has traditional medicinal value (Hu et al., 2005). The composition of sugars and organic acids in Carambola fruit varies significantly with cultivar and fruit developmental stage, but the mechanism of sugar transport in Carambola remains to be elucidated.
Here, we developed a Carambola unigenes library from fruits of 'XianMiyangtao' ('XM'), a sweet carambola variety. The fruit has high sugar content, good flavor, and low acidity, and can be eaten fresh. It is one of the main carambola varieties currently grown in China. We analyzed the protein structure and function, as well as the phylogenetic relationships, of ten AcSWEET genes identified in A. carambola. Additionally, we studied the expression patterns of AcSWEET genes during fruit ripening. Based on the results, we constructed a general profile of the AcSWEET family, which helps in understanding the complex mechanism of sugar translocation in this fruit-bearing tree.

Plant materials
We collected all fruit samples from Carambola 'XM' trees growing in Yunxiao County, Fujian Province, China (latitude: 24.0168°; longitude: 117.2753°). The fruit samples were classified into three categories based on the ripening stage at harvest: young fruits (XM1, 40 d after flowering), developing fruits (XM2, 60 d after flowering), and ripe fruits (XM3, 80 d after flowering).
Samples were selected for uniform appearance and absence of damage. For each category, we harvested 30 fruits, which were used for the determination of sugar and acid content, RNA-seq, and qRT-PCR. They were randomly assigned to three groups of ten fruits each. The pulp of these samples was collected, cleaned, cut, flash-frozen in liquid nitrogen, and stored at −80 °C.

Determination of soluble sugars content and titratable acidity (TA)
The sulfuric acid-anthrone colorimetric method was used to determine the glucose, fructose, and sucrose content, following the method of Hu et al. (2005). Titratable acidity (TA) was measured by titration with NaOH solution, following the method of Zhu et al. (2013). Each experiment was performed in replicates, and data were represented as mean ± SD.

RNA extraction, sequencing and bioinformatic analysis
Total RNA was extracted following the method of Shan et al. (2008), followed by treatment with DNase (Takara, China). Agarose gel electrophoresis (1%) was then performed to assess RNA degradation and contamination. RNA purity was analyzed with a NanoPhotometer® spectrophotometer (IMPLEN, USA). RNA concentration was determined with a Qubit® RNA Assay Kit in a Qubit® 2.0 Fluorometer (Life Technologies, Carlsbad, CA, USA). Finally, RNA integrity was assessed with the RNA Nano 6000 Assay Kit on the Bioanalyzer 2100 system (Agilent Technologies, USA).
An Illumina HiSeq 4000 platform was used to generate 150 bp/200 bp paired-end reads (Biomarker Technologies Co., China). The data were then processed and assembled using Trinity, yielding the Carambola unigenes library. BLAST (v2.2.26) was used to compare the unigene sequences against the Nr, Swiss-Prot (a manually annotated and reviewed protein sequence database), Clusters of Orthologous Groups (COG), Gene Ontology (GO), euKaryotic Ortholog Groups (KOG), protein family (Pfam), evolutionary genealogy of genes: Non-supervised Orthologous Groups (eggNOG), and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases, using an E-value ≤ 1e−5. Gene expression was quantified as fragments per kilobase of transcript per million mapped fragments (FPKM). The transcriptome data have been submitted to the National Center for Biotechnology Information (NCBI) database (SRA accession: PRJNA647672), and this Transcriptome Shotgun Assembly project has been deposited at DDBJ/EMBL/GenBank under the accession GJAU01000000. The version described in this paper is the first version, GJAU00000000.1.
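As an illustration of the FPKM normalization used to quantify expression, the following minimal Python sketch computes FPKM for a single unigene. The function name and the input numbers are hypothetical and are not taken from this study; the formula is the standard definition (fragments mapped to the transcript, scaled per kilobase of transcript length and per million mapped fragments in the library).

```python
def fpkm(fragments: int, length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments.

    fragments: fragments mapped to this unigene
    length_bp: unigene length in base pairs
    total_mapped_fragments: all mapped fragments in the library
    """
    if length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bp -> kb) * 1e6 (fragments -> millions of fragments)
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


# Hypothetical example: 500 fragments on a 1,000 bp unigene
# in a library with 20 million mapped fragments.
example_fpkm = fpkm(500, 1000, 20_000_000)
print(example_fpkm)  # 25.0
```

Because the value is scaled by both transcript length and library size, FPKM values can be compared between unigenes of different lengths and between libraries of different sequencing depths.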

Identification of SWEET gene family members
We used Hidden Markov Model (HMM) analysis and the Simple Modular Architecture Research Tool (SMART) to identify SWEET family members. The HMM profile of the protein family was downloaded from the Pfam protein family database (http://pfam.xfam.org/) using its Pfam accession number. We used the hmmsearch command in the HMMER package, with an E-value ≤ 1e−3, to identify the protein domain. The results of the HMMER sequence alignment were screened to remove protein sequences that were 45% shorter than the length of the HMM model domain, while retaining the longest protein sequence among alternatively spliced variants. All nonredundant protein sequences were retrieved and further analyzed with SMART (a Simple Modular
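The screening steps described in this section can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the authors' pipeline: the `Hit` record, its field names, and the example values are hypothetical, the hits are assumed to have been parsed already from the HMMER output, and the length rule is interpreted here as "keep a hit only if its aligned length is at least a chosen fraction of the HMM model length", with that fraction left as a parameter because the wording of the criterion is ambiguous.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    gene_id: str      # hypothetical unigene identifier
    protein_id: str   # hypothetical isoform/protein identifier
    evalue: float     # E-value reported for the domain hit
    aligned_len: int  # residues aligned to the HMM domain
    protein_len: int  # full length of the predicted protein


def filter_hits(hits, model_len, max_evalue=1e-3, min_fraction=0.45):
    """Keep significant, sufficiently long hits; one (longest) protein per gene."""
    best = {}
    for hit in hits:
        if hit.evalue > max_evalue:
            continue  # not significant
        if hit.aligned_len < min_fraction * model_len:
            continue  # too short relative to the model (assumed interpretation)
        current = best.get(hit.gene_id)
        if current is None or hit.protein_len > current.protein_len:
            best[hit.gene_id] = hit  # retain the longest protein per gene
    return sorted(best.values(), key=lambda h: h.gene_id)


# Hypothetical hits against a model of hypothetical length 100.
hits = [
    Hit("g1", "g1.p1", 1e-20, 80, 240),
    Hit("g1", "g1.p2", 1e-18, 80, 260),  # longer isoform of g1, kept
    Hit("g2", "g2.p1", 0.5, 90, 250),    # fails the E-value cutoff
    Hit("g3", "g3.p1", 1e-10, 30, 120),  # fails the length cutoff
]
kept = filter_hits(hits, model_len=100)
print([h.protein_id for h in kept])
```

Keeping the thresholds as parameters makes it easy to test how sensitive the candidate list is to the E-value and length cutoffs.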

Protein domain prediction, similarity analysis and phylogenetic analysis
HMMER profile hidden Markov models (http://hmmer.org/) were used for biological sequence analysis and to predict the protein domains of SWEET genes. The alignment results were output by Clustal 2.1. Transmembrane (TM) structures were detected using TMHMM (http://www.cbs.dtu.dk/services/TMHMM/) and ConPred II (Arai et al., 2004). The ProtParam tool was used to predict the molecular weight and isoelectric point, and ClustalW 2 was used to construct the phylogenetic tree of the SWEET gene family.

Quantitative real-time PCR analysis (qRT-PCR)
The expression profile of each member of the AcSWEET gene family was determined by qRT-PCR using the 2^−ΔΔCt method, with glyceraldehyde-3-phosphate dehydrogenase (AcGAPDH) as the reference gene. We used the Data Processing System v3.01 to calculate the least significant differences (α = 0.05) for mean separation. All experiments were performed in triplicate. File S1 lists the primer sequences used for qRT-PCR.
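For readers unfamiliar with relative quantification, the sketch below shows the standard 2^−ΔΔCt calculation, assuming that this is the method referred to here. The function name and all Ct values are hypothetical and serve only to illustrate the arithmetic; the choice of calibrator sample (for example, the earliest ripening stage) is also an assumption.

```python
def relative_expression(ct_target: float, ct_reference: float,
                        ct_target_cal: float, ct_reference_cal: float) -> float:
    """Relative expression by the 2^-ddCt method.

    ct_target / ct_reference: Ct of the target and reference gene in the sample
    ct_target_cal / ct_reference_cal: the same Ct values in the calibrator sample
    """
    delta_ct_sample = ct_target - ct_reference            # normalize to reference gene
    delta_ct_calibrator = ct_target_cal - ct_reference_cal
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the target amplifies two cycles earlier in the
# sample than in the calibrator, while the reference gene is unchanged.
fold = relative_expression(ct_target=24.0, ct_reference=18.0,
                           ct_target_cal=26.0, ct_reference_cal=18.0)
print(fold)  # 4.0
```

A result above 1 indicates higher expression than in the calibrator, and a result below 1 indicates lower expression. The calculation assumes roughly equal and near-100% amplification efficiency for the target and reference primers.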

Statistical analysis
All statistical analyses were performed using SPSS Statistics software 19.0 (SPSS Inc., USA) for Windows, and the values were represented as mean ± SE.

Sequencing production statistics and de novo assembly
Total RNA from XM1, XM2, and XM3 fruits (n = 3) was used to construct nine cDNA libraries. Quality control of the raw reads from the nine sequencing libraries yielded 78.53 Gb of clean data. The Q30 percentage of each sample was greater than 94%, and the GC content ranged from 44.8% to 46%. Thus, the sequencing data obtained in this study were of good quality.

Functional annotation
BLASTX was used to annotate all the 99,319 unigenes sequences against the Nr, GO, Swiss-Prot, KOG, COG, Pfam, eggNOG, and KEGG for comprehensive analysis. The annotation of 51,642 unigenes (52.00%) was performed with a significance threshold of  (Fig. 1B).

GO classification
The Gene Ontology (GO) database provides a standardized system for the functional classification of genes. It is used to comprehensively describe the functional characteristics of expressed genes in different organisms. Here, the GO database was used to categorize the functions of the expressed Carambola unigenes. Analysis with the Blast2GO and WEGO software revealed that 24,794 of the 41,386 unigenes previously annotated against the Nr database were classified into three main GO categories (molecular function, biological process, and cellular component) and 55 subcategories (File S3). The major subcategory within molecular function was 'catalytic activity', which was probably related to the continuous cell proliferation during fruit development and the vigorous metabolic activity in fruit organs.

Differentially expressed genes
The RNA-Seq analysis revealed that the numbers and expression profiles of differentially expressed genes (DEGs) differed among the three stages. These DEGs were clustered into seven distinct expression patterns, among which the genes in group 5 were all upregulated from stage 1 to stage 3 (Fig. 2). Group 5 contained several genes related to sugar metabolism, including genes annotated with sucrose metabolic process and sucrose-phosphate synthase activity. Thus, group 5 appeared to be closely related to fruit ripening.

Identification of SWEET genes in Carambola
The putative Carambola SWEETs were identified by a protein BLAST search of the Pfam motif PF03083 against the Carambola unigenes library and from the annotation of the library. As shown in File S4, we obtained 10 SWEET genes in A. carambola, each containing MtN3/saliva domains (Pfam motif PF03083). In parallel, we obtained 12 SWEET genes from the Nr annotation data of the unigenes, 10 of which were consistent with the above identification. The other two were unigenes c94236.graph_c0 and c89229.graph_c0, whose annotation references were SWEET5 and SWEET13, respectively. Because these two genes showed extremely low expression in Carambola fruit and were supported only by the Nr annotation information, they were not counted as SWEET genes in this study.
The SWEET genes in Carambola fruit, hereafter referred to as AcSWEETs, were labeled as AcSWEET1a, AcSWEET1b, AcSWEET2a, AcSWEET2b, AcSWEET3, AcSWEET11, AcSWEET15, AcSWEET16a, AcSWEET16b, and AcSWEET17, based on their percentage of similarity to Arabidopsis, Oryza, and Vitis SWEET genes. MAFFT analysis showed that the sequences of these ten SWEET proteins were different from each other (File S5). Among these SWEET proteins, AcSWEET15 was the longest, i.e., contained 305 amino acids, while AcSWEET2a contained only 183 amino acids (File S4).

Protein domain prediction of the SWEET proteins in Carambola fruit
Multiple sequence alignment of the ten AcSWEET protein sequences showed that the transmembrane (TM) sequences were partially conserved. Protein annotation revealed that all ten proteins contained the MtN3/slv motif, which is conserved in plant SWEETs. AcSWEET1a, AcSWEET2a, AcSWEET2b, AcSWEET3, AcSWEET11, AcSWEET15, AcSWEET16a, and AcSWEET17 contained two MtN3/slv conserved domains, while AcSWEET1b and AcSWEET16b had only one MtN3/slv domain (Fig. 3). This is probably because the AcSWEET sequences were all derived from Carambola unigenes rather than full-length genes, consistent with the results in loquat. Using TMHMM and ConPred II, we found that all the Carambola SWEET proteins contained TM structures. Seven TMs were detected in AcSWEET2a, AcSWEET3, AcSWEET11, AcSWEET15, AcSWEET16a, and AcSWEET17. However, only six, two, five, and four TMs were identified in AcSWEET1a, AcSWEET1b, AcSWEET2b, and AcSWEET16b, respectively (Fig. 3).

Phylogenetic analysis of putative SWEET proteins in Carambola
A neighbor-joining phylogenetic tree was constructed using MEGA 7 software to explore the evolutionary relationships between the AcSWEET proteins identified in this study and SWEET proteins from other plant species. We obtained a total of 79 amino acid sequences of SWEET proteins, including 10 AcSWEET sequences, 17 AtSWEET sequences, 21 OsSWEET sequences, 15 VvSWEET sequences, and 16 LcSWEET sequences (File S4). Based on the results, the SWEET proteins were classified into four clades (Fig. 4): clade I contained AcSWEET1a/1b/2a/2b/3, clade III contained AcSWEET11/15, and clade IV contained AcSWEET16a/16b/17, whereas none of the AcSWEET proteins was found in clade II. AcSWEET1a, AcSWEET2b, and AcSWEET17 showed high sequence similarity with VvSWEET1, VvSWEET2b, and VvSWEET17a, respectively. AcSWEET2a and AcSWEET16b had high sequence similarity with AtSWEET2 and OsSWEET16, respectively.

Differential expression of the SWEET genes in the fruit of Carambola
We used RNA-Seq and qRT-PCR to study the expression of the ten SWEET genes in Carambola fruits at three distinct ripening stages: young fruits (XM1, 40 d), developing fruits (XM2, 60 d), and ripe fruits (XM3, 80 d) post-flowering. The ten AcSWEET genes were expressed differently across the three developmental stages. AcSWEET1a/1b/2a/2b/16b/17 had higher expression during fruit development than AcSWEET3/11/15/16a, which were expressed only at certain stages. Among the six highly expressed AcSWEET genes, the expression of AcSWEET16b at all three stages of fruit development was much higher than that of the other five, suggesting that AcSWEET16b plays a greater role throughout fruit development. The expression of AcSWEET2a/2b decreased during fruit development, suggesting that they might act mainly during the early and middle stages of fruit ripening (Table 3).
Next, to validate the accuracy of the RNA-Seq results, we verified the expression of six AcSWEETs (AcSWEET1a/1b/2a/2b/16b/17) in Carambola using qRT-PCR. The qRT-PCR results were consistent with the RNA-Seq data (Fig. 5 and File S6).

Soluble sugars content and organic acids in 'XM' at distinct ripening stages
As the fruit matures, 'XM' carambola changes color from green to yellow (Fig. 6). We found that in Carambola fruits, sugar initially accumulates in the form of glucose and fructose, and later as sucrose, and we observed a gradual increase in the levels of soluble sugars as the fruits ripened. The soluble sugar contents of XM3 and XM2 were 71.33 mg g−1 FW and 45.27 mg g−1 FW, which were 3.26 times and 2.07 times that of XM1, respectively (Fig. 7). The organic acid content during 'XM' fruit development followed the opposite trend to soluble sugars, i.e., as the fruit matured, the organic acid content gradually decreased. The organic acid content of XM3 fruit was 0.22%, only 20.75% of that of XM1 fruit (Fig. 7).
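The ratios reported in this paragraph can be cross-checked with a few lines of Python. The sketch below back-calculates the XM1 values implied by the reported XM2 and XM3 figures; these XM1 values are derived here for illustration only and are not reported in the text, and the variable names are hypothetical.

```python
# Reported soluble sugar contents (mg per g fresh weight) and ratios to XM1.
xm3_sugar, xm3_ratio = 71.33, 3.26
xm2_sugar, xm2_ratio = 45.27, 2.07

# XM1 soluble sugar content implied by each reported ratio.
xm1_from_xm3 = xm3_sugar / xm3_ratio
xm1_from_xm2 = xm2_sugar / xm2_ratio

# Reported organic acid content of XM3 (%) and its fraction of the XM1 value.
xm3_acid, xm3_acid_fraction = 0.22, 0.2075
xm1_acid = xm3_acid / xm3_acid_fraction

print(f"implied XM1 sugar: {xm1_from_xm3:.2f} and {xm1_from_xm2:.2f} mg/g FW")
print(f"implied XM1 organic acid: {xm1_acid:.2f} %")
```

Both ratios imply nearly the same XM1 sugar content, which indicates that the reported fold values are internally consistent.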

Expression patterns of AcSWEET2a/2b and AcSWEET16b during fruit development
A desirable trait for Carambola fruit is a suitable ratio of soluble sugars to organic acids (Hu et al., 2005). We performed a systematic analysis of the expression of AcSWEET genes at three ripening stages: young fruits (XM1), developing fruits (XM2), and ripe fruits (XM3) at 40 d, 60 d, and 80 d post-flowering, respectively.
We found that the AcSWEET16b gene was highly expressed at all three stages of fruit development, and its expression was much higher than that of the other nine AcSWEET genes. AcSWEET16b belonged to clade IV, which implied that it could transport both hexose and sucrose (Klemens et al., 2013; Xie et al., 2019). Together, these observations suggested that AcSWEET16b plays a fundamental role in Carambola fruit development.
The AcSWEET2a/2b genes belonged to clade I. In Arabidopsis, SWEET2 is mainly located in the vacuolar membrane and transports hexose (Chen et al., 2015b). Interestingly, AcSWEET2a/2b showed the highest expression at the young fruit stage, and their expression decreased as the fruit matured. This pattern was consistent with the expression patterns of MdSWEET2.4 in apple (Wei et al., 2014) and SlSWEET2a in tomato (Feng et al., 2015). It is also known that in Carambola fruits, sugar initially accumulates in the form of glucose and fructose, and later as sucrose (Hu et al., 2005). Thus, we inferred that AcSWEET2a/2b and AcSWEET16b participate synergistically in sugar transport during Carambola fruit development. One possible explanation is that, because sugar accumulated mainly as glucose and fructose during the initial and middle stages of fruit development, the hexose load exceeded the transport capacity of AcSWEET16b, which in turn resulted in high expression of AcSWEET2a/2b. In the later stage of fruit development, sugar accumulated mainly as sucrose, which could be accommodated by AcSWEET16b, reducing the demand for AcSWEET2a/2b, so that their expression gradually decreased.

AcSWEET gene family in Carambola
Recently, SWEET genes have been analyzed in over 20 plant species, and these analyses indicated that they are involved in fruit development and ripening (Chen, 2014; Eom et al., 2015). However, no reports have analyzed the SWEET gene family in Carambola fruit.
In this study, we identified and characterized SWEET genes in Carambola by comparing them with their homologs in Arabidopsis and grapevine and by using the transcriptome annotation information. Previous studies have reported that the number of SWEET genes in higher plants varies from 7 to 108, with 17 in Arabidopsis (Chen et al., 2010), 17 in grapevine (Chong et al., 2014), 21 in rice (Yuan & Wang, 2013), 16 in litchi (Xie et al., 2019), 7 in loquat (Wu et al., 2017), and 108 in wheat (Gautam et al., 2019). In this study, we isolated 10 AcSWEET genes from Carambola fruit, encoding proteins of 183 aa to 305 aa, consistent with studies in other plants, such as 183-305 aa in loquat (Wu et al., 2017), 229-300 aa in litchi (Xie et al., 2019), 233-308 aa in tomato (Feng et al., 2015), 171-333 aa in banana (Miao et al., 2017), and 215-340 aa in apple (Zhen et al., 2018). Phylogenetic analysis of the SWEET genes of Arabidopsis, Oryza, Vitis, and litchi divided them into four clades. However, based on our results, the AcSWEET genes were classified into only three clades, clade I, III, and IV, containing 5, 2, and 3 AcSWEET genes, respectively, and we did not find any AcSWEET genes in clade II. This could be attributed to the fact that the samples in this study were obtained only from fruits rather than from all organs of Carambola, and that the AcSWEET genes were derived from the transcriptome database, not the genome.

Transcriptome sequencing enriched Carambola sequence information
Ripe Carambola fruit is known to be sweet, juicy, and slightly sour. The ratio of soluble sugar to organic acid has been shown to have a significant effect on the flavor of the carambola fruit (Hu et al., 2005). However, limited genomic data are available for Carambola, which hinders the study of the molecular mechanisms of sugar and organic acid transport in Carambola fruits. Here, the fruit of 'XM', a sweet carambola variety with high sugar content, good flavor, and low acidity, was analyzed by RNA-seq to explore the mechanism responsible for fruit flavor during ripening. Based on the sequencing results, 99,319 unigenes were generated, a number comparable to the 119,701 unigenes of Chinese bayberry (Lin, Zhong & Zhang, 2019) and the 82,036 unigenes of litchi (Lu et al., 2014). A total of 51,642 unigene sequences (52.00%) were annotated, which enriched the Carambola sequence database and provided basic data for subsequent research. However, as Carambola is a niche fruit, there is a lack of bioinformatics data on it. As a result, 48.00% of the unigene sequences were not annotated, indicating that about half of the sequences had no apparent homologs, some of which were likely genes with novel functions. Thus, the Carambola unigenes library established in this study accumulated data that could be used for follow-up research on gene mining, cloning, and

CONCLUSIONS
This work is the first report to establish a Carambola unigenes library, containing 99,319 unigenes with a minimum length of 200 bp. Among these, a total of 51,642 unigenes (52.00%) were annotated. Additionally, we isolated and characterized ten AcSWEET genes and studied their structures, conserved motifs, and evolutionary relationships. The expression patterns of AcSWEETs during fruit ripening, combined with the soluble sugar content and titratable acidity, indicated that AcSWEET2a/2b and AcSWEET16b probably participate in sugar transport during fruit ripening. Thus, this study lays the foundation for further research on the functional roles of the AcSWEET genes in the development of Carambola fruits. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.