Transcriptome Sequencing of Agave amaniensis Reveals Shoot-Related Expression Patterns of Expansin A Genes in Agave

Agave species are widely planted for fiber production. However, the molecular basis of agave fiber development has not been well understood. In this study, we performed a transcriptomic analysis in A. amaniensi, a well-known variety with high-quality fiber production. Approximately 43.87 million clean reads were obtained using Illumina sequencing. The de novo assembly produced 66,746 unigrams, 54% of which were annotated in a public database. In the Nr database, 21,490 unigenes of A. amaniensis were shown to be most closely related to Asparagus officinalis. Nine expansin A orthologs with full coding regions were obtained, which were named EXP1a, EXP1b, EXP2, EXP3, EXP4a, EXP4b, EXP11, EXP12, and EXP13. The maximum likelihood phylogenetic tree revealed the species-specific expansion of expansin genes in Arabidopsis, rice and agave. The expression analysis suggested the negative correlation between the expression of expansin genes and the leaf growth rate, except AhEXP11. Moreover, expansin genes were differentially affected by abiotic and biotic stresses. Notably, AhEXP2 expression level was highly upgraded after the infection of Phytophthora nicotiana. Nutrient deficiency also influent expansin genes expression. Together, our research will benefit future studies related to fiber development, disease resistance and nutrient usage in agave.


Introduction
Agave, a perennial tropical hard leaf fiber crop, has been extensively cultivated as an economic crop for fiber production, food, and medicinal compounds in the tropic areas. It is also well known for its adaptation to the abiotic stresses such as xeric environments [1]. Its diverse usages and unique adaptation to environmental stresses make Agave an excellent candidate as a model crassulacean acid metabolism (CAM) crop [2]. It is composed of approximately 210 species around the world [3]. Among these, A. amaniensis Trel. & W. Nowell (1933) are famous for fibers production. A single plant can grow about 20 leaves, which are large, up to 150-200 cm long, and covered with a thick coating of waxy, pinkishblue, thin fibers. A. amaniensis also has more than 2000 fiber bundles per leaf, twice as many as the common agave. Therefore, they are used to provide males to generate hybrid cultivars, such as H11648 ((A. amaniensis Trel. and Nowell × A. angustifolia Haw.) × A. amaniensis). H11648 with high fiber production and improved fiber-related traits becomes the main cultivar for sisal fiber production, especially in the tropical areas of Brazil, China and Africa [4,5].
To date, several studies related to agave fiber properties and some genes with fibrillar developmental correlations have been identified. For example, by comparing the transcriptomes with domesticated and wild agave species, a series of candidate genes regulating fructan, fiber, and stress response-related traits were identified in the varieties A. tequilana, A. sisalana, and A. deserti. Moreover, 12 cellulose synthase genes (CesA) in Asparagus genome and 38 CesA sequences from A. H11648, A. americana, A. deserti and A. tequilana were further identified [6]. Cinnamyl Alcohol Dehydrogenase (CAD) genes were also characterized in A. hybrid H11648, A. deserti, A. tequilana, A. americana and A. angustifolia, respectively. The expression analysis indicated that CAD1, CAD2, CAD4 and CAD6 were conservatively expressed, which may provide candidate targets for manipulation to improve the lignin properties [7]. However, the huge genomes and the long growth period of agave make it difficult to use genetic strategies to study the molecular mechanisms of fiber traits [8]. Therefore, the mechanisms of fiber development of agave are still not well understood. Due to the rapid development of transcriptome sequencing technology, we can use functional genomics to completely reveal genes related to fiber traits of agave species [9].
Plant cell walls are composed of cellulose, hemicellulose, lignin, and various other components [10]. The complex reticular structure determines the shape and function of cells along with growth and development, as well as their response to external environmental stimuli. During plant growth, cell shape and function are accompanied by a period of metamorphosis in which expansins play an important role [11]. Expansins are known to be important for non-enzymatic protein-wall loosening activity and to be involved in cell expansion and other developmental events during which cell wall modification occurs [12]. The expansin family contains a large amount of genes, and proteins encoded by this family are short: typically 225-300 amino acid residues in length. Generally, expansin family proteins can be divided into four subfamilies: α-expansin subfamily (generally named "EXPA"), β-expansin subfamily (named with "EXPB"), expansin-like A subfamily ("EXLA"), and expansin-like B subfamily ("EXLB") [13], where dicotyledonous plants (Arabidopsis thaliana, poplar, grape, etc.) have more α-subfamily genes, while monocotyledonous plants (rice, wheat) have more β-subfamily expansin genes. In addition to its roles in cell wall metabolism such as cell elongation and expansion for regulating fiber elongation and fruit ripening [14,15], expansin is associated with multiple biotic and abiotic stress tolerance and plant-fungal interaction processes [16,17].Thus, we sequenced and assembled the leaf transcriptome of A. amaniensis based on Illumina sequencing to gain insight into gene expression levels related to fiber traits, and further performed a comparative analysis of expansin genes in six agave species. Together, these results provide new insights into the molecular underpinnings to study lignin traits in agave.

Identification and Cloning of Expansin A Genes in Agave Species
In order to identify expansin A genes in agave species, expansin A proteins from Arabidopsis (26) and rice (33) were first selected to search orthologs in the Asparagus genome. In this way, 17 expansin genes were generated (Table S2). After these genes were combined, the 76 proteins were used to identify orthologs in the transcriptome datasets of six agave species, including A. amaniensis, A. deserti, A. tequilana, A. americana, A. angustifolia and A. H11648. Thirty orthologs with full coding regions were obtained and named as EXP1a, EXP1b, EXP2, EXP3, EXP4a, EXP4b, EXP11, EXP12, and EXP13 based on the sequence similarity. These full-length EXP genes were further used to amplify the gaps or genes in agave species for each group, which finally generated a total of 54 expansin genes (Table S3. Each agave species has nine genes).

Phylogeny of Expansin Genes
The 130 expansin A genes (mentioned in Section 2.2) from Arabidopsis, rice, asparagus and six agave species were selected and used for phylogenetic analysis. After phylogenetic analysis, these selected genes were clustered into five groups ( Figure 3). Typically, the expansin genes of the six agave species were grouped together, with equal numbers present in each group. Each agave species had two genes in Group I, one gene in Group II, three genes in Group III, one gene in Group IV, and two genes in Group V, the numbers of which were much smaller than those in Arabidopsis. Because of that, Arabidopsis contained five, two, five, two, and twelve expansin genes in Group I-V, respectively. In addition, more agave sequences were present in Group III, while more rice sequences were present in Groups II and IV, with the most occurring in Group IV.

Expression Patterns of Expansin A Genes in A. H11648
In order to detect expression patterns of expansin A genes, A. H11648 was selected for further Quantitative Real-Time PCR (qRT-PCR) analysis. Expression patterns were estimated at different leaf developmental stages ( Figure 4). The result revealed that these nine genes had different expression patterns. In shoot, all nine expansin genes had high expression levels, with only AhEXP11 being slightly lower. In unexpanded leaf, AhEXP11, AhEXP1b and AhEXP13 had high expression levels. AhEXP11 and AhEXP13 also had high expression levels in expanded leaf. In other words, the expression levels of AhEXP11 and AhEXP13 were high in the three leaf developmental stages. The difference was that the expression level of AhEXP11 increased with development, whereas AhEXP13 decreased but still maintained a high level. The expression patterns of AhEXP1a and AhEXP4a were similar, with high expression levels during the shooting stages but a drop in the unexpanded leaf and expanded leaf. In summary, there was a downward trend in expression of the agave expansin genes except for AhEXP11. The most important thing to mention was that the expression of the AhEXP1a and AhEXP4a decreased significantly with the leaf growth.

Expression of Expansin A Genes under Abiotic and Biotic Stresses
A. H11648 is often infected by pathogenic microorganisms during its growth and development, while A. H11648 is more tolerant to Cu and Pb stresses. Therefore, Cu and Pb stresses as abiotic stresses and one biotic stress infected with P. nicotianae Breda were evaluated to assess the expression of the expansin genes in A. H11684 leaves, respectively. Seven genes were expressed differentially under one of these stresses, i.e., AhEXP4b and AhEXP12 under Cu stress, AhEXP1b, AhEXP2, AhEXP3, AhEXP4a, AhEXP4b and AhEXP12 under Pb stress. The genes expression level of AhEXP1a, AhEXP1b, AhEXP2, AhEXP12 and AhEXP13 were highly upgraded after infected with P. nicotiana, especially AhEXP2. The expression levels of genes AhEXP11 and AhEXP13 did not change much under control, abiotic and biotic stresses stress, and both were at very high levels ( Figure 5).

Expression Patterns of Expansin A Genes in A. H11648
In order to detect expression patterns of expansin A genes, A. H11648 was selected further Quantitative Real-Time PCR (qRT-PCR) analysis. Expression patterns were e mated at different leaf developmental stages ( Figure 4). The result revealed that these n genes had different expression patterns. In shoot, all nine expansin genes had high expr sion levels, with only AhEXP11 being slightly lower. In unexpanded leaf, AhEXP AhEXP1b and AhEXP13 had high expression levels. AhEXP11 and AhEXP13 also had h expression levels in expanded leaf. In other words, the expression levels of AhEXP11 a AhEXP13 were high in the three leaf developmental stages. The difference was that expression level of AhEXP11 increased with development, whereas AhEXP13 decreas but still maintained a high level. The expression patterns of AhEXP1a and AhEXP4a w similar, with high expression levels during the shooting stages but a drop in the un panded leaf and expanded leaf. In summary, there was a downward trend in express

Expression of Expansin A Genes under Nutrient Deficiency
Further nutrient deficiency treatments were carried out to study the expression profiling of expansin genes. The full Hoagland nutrient solution was used as control (F). Hoagland nutrient solutions without nitrogen (N-), phosphorus (P-), potassium (K-) and water (W) control were used to test the impact of nutritional factors. The results indicated that AhEXP1b, AhEXP2, AhEXP3, AhEXP4a and AhEXP4b were sensitive to nutrient nitrogen, while AhEXP1a, AhEXP11 and AhEXP13 were less sensitive ( Figure 6). Most expansin genes were less sensitive to nutrient phosphorus, including AhEXP1b, AhEXP2, AhEXP11, AhEXP4a, AhEXP4b, AhEXP11 and AhEXP13, but the expression of AhEXP1a was decreased. The expressions of AhEXP4a were upregulated under K-. Additionally, AhEXP1a, AhEXP2 and AhEXP4b showed upregulated expressions in W, but expression of AhEXP12 was decreased.

Expression of Expansin A Genes under Abiotic and Biotic Stresses.
A. H11648 is often infected by pathogenic microorganisms during its growth and development, while A. H11648 is more tolerant to Cu and Pb stresses. Therefore, Cu and Pb stresses as abiotic stresses and one biotic stress infected with P. nicotianae Breda were evaluated to assess the expression of the expansin genes in A. H11684 leaves, respectively. Seven genes were expressed differentially under one of these stresses, i.e., AhEXP4b and AhEXP12 under Cu stress, AhEXP1b, AhEXP2, AhEXP3, AhEXP4a, AhEXP4b and AhEXP12 under Pb stress. The genes expression level of AhEXP1a, AhEXP1b, AhEXP2, AhEXP12 and AhEXP13 were highly upgraded after infected with P. nicotiana, especially AhEXP2. The expression levels of genes AhEXP11 and AhEXP13 did not change much under control, abiotic and biotic stresses stress, and both were at very high levels ( Figure  5).

Expression of Expansin A Genes under Nutrient Deficiency
Further nutrient deficiency treatments were carried out to study the expression pro filing of expansin genes. The full Hoagland nutrient solution was used as control (F). Hoa gland nutrient solutions without nitrogen (N-), phosphorus (P-), potassium (K-) and wate (W) control were used to test the impact of nutritional factors. The results indicated tha while AhEXP1a, AhEXP11 and AhEXP13 were less sensitive ( Figure 6). Most expansi genes were less sensitive to nutrient phosphorus, including AhEXP1b, AhEXP2, AhEXP11 AhEXP4a, AhEXP4b, AhEXP11 and AhEXP13, but the expression of AhEXP1a was de creased. The expressions of AhEXP4a were upregulated under K-. Additionally, AhEXP1a AhEXP2 and AhEXP4b showed upregulated expressions in W, but expression of AhEXP1 was decreased.

Characterization of A. amaniensis Transcriptome
Over the past decade, RNA sequencing (RNA-seq) has become an indispensable tool for transcriptome-wide analysis of differential gene expression and differential splicing of mRNAs [18]. Compared to DNA microarray-based methods, RNA-Seq offers less background noise and a greater dynamic range for detection [19]. Most importantly, RNA-Seq directly reveals sequence identity, which is crucial for analysis of unknown genes and novel transcript isoforms [20]. Because of the large genomes of agave, transcriptome analysis becomes an efficient method for gene mining. However, the expressed sequence tags (EST) sequences of agave species were limited and rare in the NCBI database, which influences the understanding of the sisal genome, transcriptome, and the agave fiber development [5]. Although transcriptomes of A. H11648 [5], A. angustifolia [7], A. tequilana, A. sisalana, and A. deserti [6] were revealed by Illumina Sequencing, and a series of candidate genes were predicted to affect fructans, fiber traits and stress response-related traits, genetic sequence, and transcriptome information about A. amaniensis have not been submitted to GenBank, leaving a knowledge gap in molecular basis. Therefore, we assembled the transcriptome of A. amaniensis with 66,746 unigenes. According to this new assembly, the unigenes number was much the same as that in A. angustifolia (66,314) but far less than A. H11648 (148,046). Because transcriptome sequencing requires high-quality samples, the discrepancy of identified unigenes in agave varieties might be caused by differences in sample collection, sample quality, and transcriptome assembly methods [21]. Together, 18% of the unigenes had sequence lengths over 1000 bp, and 15% of the sequences were in the 500-1000 bp range (Table S1). The longer sequence length provides a greater opportunity to identify homologs in public databases. Therefore, about 54% of unigenes have homologs in public databases (Figure 1), which facilitated the prediction of agave gene functions.

Candidate Expansin A Genes in Shoot Development of Agave
Expansins are known to be involved in the loosening of plant cell walls, which allows for cell expansion and elongation. This process is important for growth and development in many plants, including those used for fiber production such as cotton [22], flax [23] and ramie [15].
In these crops, expansins are believed to play a role in fiber development by promoting the elongation and thickening of individual cells, leading to longer and stronger fibers. As an important fiber crop in tropical area, the quality and yield of fiber production in agave are the important traits for agave cultivation. One of the agave varieties, A. amaniensis, has been used for phytosteroid production for a long time [24]. This variety is also known for its large leaves, which can grow up to 150-200 cm in length. These leaves are typically harvested for their strong and durable fibers, which are composed of more than 2000 fibrous bundles per leaf. However, the plant's large size and slow growth rate make it less suitable for commercial fiber production. Nowadays, A. amaniensis is usually used as male parent to generate the famous Agave hybrid cultivar H11648. Thus, we reasoned that A. amaniensis might be rich in genes related to fiber development.
After transcriptome sequencing in A. amaniensis, we retrieved expansin genes. As a result, we identified 9 expansin genes in A. amaniensis and generated a total of 54 expansin genes in the 6 agave species (Table S3. Each agave species has 9 genes). The expansin gene family size in asparagus (17) (Table S2) and agave (9) is smaller than that in Arabidopsis (26) and rice (33). The smaller size of agave expansin genes might be due to tissue-specific expression or the limitation of RNA-Seq in our study. Although a series of saponinrelated [25] and fructan-related [26] research studies have been reported, few reports were related to fiber development. Therefore, these expansin genes we identified can serve as guidance for further studies on agave fiber development. The phylogenetic analysis indicated that expansin genes have a conserved evolutionary pattern among different species (Figure 3), regardless of some species-specific expansion of expansin genes, such as Arabidopsis in group V and rice in group II and IV.
In shoot, all nine expansin genes had high expression levels, but with the leaf growth, most of the expression genes decreased significantly, especially AhEXP1a and AhEXP4a. Results took on low expression in leaf samples (Figure 4). The negative correlation between the expression of expansin genes and the leaf elongation rate is contrary to expansin's molecular function. Considering the species-specific expression of expansin, we reasoned that the decreased expression level of expansin during leaf growth is due to their tissuespecific expression [27] or transcriptional regulations [5], which needs further study in the future.
Abiotic and biotic stresses, along with artificial selection, determine the economic traits for agave domestication. In our study, Cu, Pb and fungus' effects on expansin expression are evaluated. Two, seven, and four expansin genes were significantly up-or down-regulated under the three stresses, especially AhEXP2, AhEXP3 and AhEXP4a (Figure 5), indicating expansin genes might have different response patterns to abiotic/biotic stresses. Sisal zebra mosaic disease caused by P. nicotiana is one of the most serious diseases in sisal production. Notably, after infection with P. nicotiana, AhEXP2 expression level was highly upgraded, implying it might be involved in the synthesis of secondary metabolites associated with disease resistances. This is mainly due to the different sensing and regulatory networks upstream of various antioxidant enzymes, which are generally produced in plants to clean reactive oxygen species triggered by stress [28]. Expansin genes are also involved in N, P and K responses. In our cases, AhEXP2, AhEX4a and AhEXP4b are involved in the N response, but have minimal responses to P. However, AhEX1a was down-regulated under P deficit condition. Although some reports in Brassica napus examined the crosstalk among the three nutrients [29], more efforts are needed to reveal the interactions among N, P and K nutrients to finally facilitate the application of fertilizers in agave.

Plant Materials and RNA Extraction
A. amaniensis and A. H11648 were planted and normally managed in the Wenchang experimental field (19.00 • N, 110.33 • E) of the Environment and Plant Protection Institute, Chinese Academy of Tropical Agricultural Sciences. Shoots, unexpanded leaves, and expanded leaves were collected separately from two-year-old plants at different stages of development [30].
Leaves treated with abiotic and biotic stress were conducted using one-year-old plants, which has been described in our previous study [30]. A. H11648 is tolerant to stress from heavy metals, such as copper and lead [31,32]. Thus, in this study, we utilized CuSO 4 solution and Pb(NO 3 ) 2 solution watering as the abiotic stresses. The CuSO 4 solution concentration was 1 g/Kg (heavy metal salt/soil) and the Pb(NO 3 ) 2 solution concentration was 1.3 g/Kg (heavy metal salt/soil). About two weeks later, the leaves of treated A. H11648 plants started curling. At this point, the leaves were collected as test samples. Zebra disease is a serious threat to the main cultivar A. H11648 worldwide. The pathogen has been identified as P. nicotianae Breda [33]. So, in our study, P. nicotianae Breda strain was inoculated on A. H11648 leaves as the biotic stress. The strain of P. nicotianae Breda used for inoculation was isolated earlier from our laboratory. The young parts of the sisal leaves were selected for inoculation. The leaves were firstly disinfected with alcohol, then washed with sterilized water, and afterwards they were dried and inoculated. The control group was inoculated without P. nicotianae Breda.
As high levels of irrigation and fertilizer are required to sustain high yields of agaves species, the full (F), nitrogen free (N-), phosphorus free (P-), potassium free (K-) Hoagland nutrient solutions and water (W), were used to irrigate A. H11648 to study the expression pattern of agave expansin genes under nutrient deficiency conditions [30].
The tests for each set of treated and untreated leaves were repeated three times with different individual plants as biological replicates. The collected leaves were immediately frozen into liquid nitrogen. The total RNA was extracted according to the manufacturer's protocol using the RNA extraction kit (Tiangen Biomart, Beijing, China). The total RNA samples were immediately placed into liquid nitrogen and stored at −80 • C.

Transcriptome Sequencing, Assembly and Annotation
The total RNA samples of A. amaniensis were used for library construction and Illumina sequencing (Genoseq Technology Co., Ltd. Wuhan, China) [34]. After quality detection, total RNA was used for mRNA extraction. mRNA was collected using the magnetic bead method and then fragmented using Illumina TruSeq RNA kit (San Diego, CA, USA). The first cDNA was synthesized using mRNA and reverse transcriptase M-MuLV with random hexamer primers as guidance. Additionally, the second-strand cDNA was synthesized using RNase H and DNA polymerase I. Next, the double-strand cDNA fragments were modified with a single 'A' bases and added with adaptors. Additionally, the cDNA library was constructed after gel purification and PCR amplification. Sequencing of the cDNA library was performed with the HiSeq platform of Illumina, which generated paired-end raw reads of 150 bp in length.
We submitted the raw reads to the public Sequence Read Archive (SRA) database and the accession number was PRJNA917773 [35]. Clean reads were obtained by filtering the adaptors and low-quality sequences using the software Cutadapt (version 4.4) and Trimmomatic (version 0.38), respectively [36][37][38]. The assembly of de novo transcriptome of A. amaniensis was carried out with the Trinity software (version 2.8.4), which was annotated according to public databases. Four public databases, Nr (NCBI non-redundant protein database) [39], GO (the Gene Ontology) [40], KEGG (the Kyoto Encyclopedia of Genes and Genomes) [41,42], and the Swiss-Prot database [43], were selected for functional annotation. The Blastx method was selected to search orthologs in the four public databases with a cut-off E value of 10 −5 [5].

Characterization and Phylogeny of Expansin Genes
The expansin A proteins from model plants A. thaliana (26) and Oryza sativa (33) were selected to search orthologs in asparagus genome using the Tblastn method with a cut-off E value of 10 −5 [15,44]. The generated expansin genes were used to identify orthologs in the transcriptome datasets of six agave species, which were previously published. Additionally, the transcriptome data from the species of A. deserti [45], A. tequilana, A. americana, A. amaniensis, A. angustifolia and A. H11648 [5,30,46]. Asparagus officinalis (Asparagoideae) and agave species (Agavoideae) all belonged to Asparagaceae with close phylogenetic position, so Asparagus officinalis was chosen as the reference species [47,48].
The full-length expansin genes of the nine species (Arabidopsis, rice, A. officinalis, A. deserti [45], A. tequilana, A. americana, A. amaniensis, A. angustifolia and A. H11648) were aligned using the ClustalX method for phylogenetic analysis. The maximum likelihood tree with a bootstrap value of 1000 trials was constructed using the MEGA 7.0 software [49]. The conserved amino acid residues of these protein sequences of the nine species were further detected by alignment using the DNAMAN software (version 9.0.1.116).

Expression Patterns of Agave Expansin Genes
The expression patterns of agave expansin genes were tested using qRT-PCR at the developmental stages of shoot, non-expanding leaf, and expanding leaf. The total RNA was extracted using the RNA extraction kit according to the manufacturer's protocol (Tiangen Biomart, Beijing, China), and then digested using DNase I enzymes. All the RNA samples were reverse transcribed to the cDNA using the Reverse Transcription System (Promega, Madison, WI, USA). The qRT-PCR reaction system was 20 µL, including a 1 µL cDNA template, 10 µL TransStart Tip Green qPCR Supermix (Transgen Biotech, Beijing, China), 0.5 µL forward primer (10 µM), 0.5 µL reverse primer (10 µM), 0.4 µL Passive Reference Dye (50×) (Transgen Biotech, Beijing, China), and 7.6 µL ddH 2 O. Technical replicates were performed three times for each sample. Nine pairs of specific primers for agave expansin genes and one pair for reference gene phosphatase 2A (PP2A) as an endogenous control were synthesized; these were designed using Primer 3 software (version 0.4.0) ( Table 1) [50]. The qRT-PCR procedure was conducted with the program of an initial stage (94 • C, 30 s), 40 cycles in a stage of 94 • C for 5 s and 60 • C for 30 s and a final dissociation stage by a QuantStudio 6 Flex Real-Time PCR System (Thermo Fisher Scientific, Waltham, MA, USA). The ∆∆Ct method was used to calculate relative expression levels with PP2A as an endogenous reference gene [51,52]. The data were analyzed according to variance and plotted.