Identification of circularRNAs and their targets in Gossypium under Verticillium wilt stress based on RNA-seq

Circular RNAs (circRNAs), a class of recently discovered non-coding RNAs, play a role in biological and developmental processes. A recent study showed that circRNAs exist in plants and play a role in their environmental stress responses. However, cotton circRNAs and their role in Verticillium wilt response have not been identified up to now. In this study, two CSSLs (chromosome segment substitution lines) of G.barbadense introgressed into G. hirsutum, CSSL-1 and CSSL-4 (a resistant line and a susceptible line to Verticillium wilt, respectively), were inoculated with V. dahliae for RNA-seq library construction and circRNA analysis. A total of 686 novel circRNAs were identified. CSSL-1 and CSSL-4 had similar numbers of circRNAs and shared many circRNAs in common. However, CSSL-4 differentially expressed approximately twice as many circRNAs as CSSL-1, and the differential expression levels of the common circRNAs were generally higher in CSSL-1 than in CSSL-4. Moreover, two C-RRI comparisons, C-RRI-vs-C-RRM and C-RRI-vs-C-RSI, possessed a large proportion (approximately 50%) of the commonly and differentially expressed circRNAs. These results indicate that the differentially expressed circRNAs may play roles in the Verticillium wilt response in cotton. A total of 280 differentially expressed circRNAs were identified. A Gene Ontology analysis showed that most of the ‘stimulus response’ term source genes were NBS family genes, of which most were the source genes from the differentially expressed circRNAs, indicating that NBS genes may play a role in Verticillium wilt resistance and might be regulated by circRNAs in the disease-resistance process in cotton.

Although circRNAs are universally present across the eukaryotic tree of life (Wang et al., 2014), circRNAs in plants are reported with much less frequency than in mammalian research. Very recent reports have shown that circRNAs exist in plants, and circRNAs were comprehensively identified by an analysis of RNA-seq data in rice, Arabidopsis, barley, tomato, wheat, and soybean (Lu et al., 2015;Ye et al., 2015;Sun et al., 2016;Darbani, Noeparvar & Borg, 2016;Zuo et al., 2016;Wang et al., 2017;Zhao et al., 2017). CircRNAs in rice exhibit tissue-specific expression and regulate the expression of their parental genes (Lu et al., 2015). CircRNAs in barley function in response to the micronutrients iron and zinc (Darbani, Noeparvar & Borg, 2016). Sun et al. (2016) found that chloroplasts are an active area for circRNA generation, and complementary sequences exist in not only the intron regions but also the surrounding splice sites in Arabidopsis. Moreover, circRNAs have been suggested to play a role in the plant response to environmental stress. For example, the expression profiles of circRNAs have been reported for rice responding to Pi-starvation stress and in Arabidopsis responding to heat, low-and high-light stresses (Pan et al., 2018;Ye et al., 2015). The differential expression of circRNAs was reported in tomato fruits affected by chilling stress (Zuo et al., 2016). Wang et al. (2017) revealed a possible connection between the regulation of circRNAs and the expression of functional genes in wheat leaves associated with dehydration resistance. The mechanism of how circRNAs participate in the stress response is attributed to the fact that it exhibits alternative splicing circularization patterns and acts as a negative regulator of their parental genes .
Cotton (Gossypium spp.) is one of the most economically important crop plants and is the most important textile fibre crop in the world. Verticillium wilt, caused by the soil-borne fungal pathogen Verticillium dahliae, is the most destructive cotton disease worldwide. Gossypium hirsutum and Gossypium barbadense are the most widely cultivated cotton species today and originate from the interspecific hybridization between the Agenome species Gossypium arboreum (A2) and the D-genome species Gossypium raimondii (D5). G. hirsutum has a higher yield but a weaker Verticillium wilt resistance, while G. barbadense has a lower yield but a higher resistance to Verticillium wilt (Wang et al., 2008;Wang et al., 2012). Chromosome segment substitution lines (CSSLs), known as interspecific introgression lines, possess the same genetic background as the recurrent parent. Recently, cotton CSSLs have been developed by transferring valuable alien genes from G. barbadense to improve the G. hirsutum species (Saha et al., 2013;Zhai et al., 2016). However, a systematic analysis of cotton circRNAs has not been reported thus far. In this study, two CSSLs of G. barbadense introgressed into G. hirsutum, a line highly resistant to Verticillium wilt (named CSSL-1) and a line susceptible to Verticillium wilt (named CSSL-4), were used to study the role of circRNAs in the cotton Verticillium wilt response through an RNA-seq approach. Our results will enrich the knowledge on plant circRNAs and help to reveal the role and mechanism of circRNAs in cotton Verticillium wilt resistance.

Materials and growth conditions
A highly aggressive defoliating fungus, Verticillium dahliae strain V991 (Sun et al., 2013;Zhang et al., 2012), was used for inoculation. For conidial production, V991 was cultured on potato dextrose agar (PDA) and was then inoculated into liquid Czapek's medium at 25 • C for 7 days. The concentration of the conidial suspensions was adjusted to 1 ×10 7 conidia/mL with sterile distilled water for subsequent use.
Two CSSLs of G. barbadense introgressed in G. hirsutum, CSSL-1 and CSSL-4, derived from an interspecific cross of G. hirsutum (CCRI36) ×G. barbadense (Hai1), in which CCRI36 is a susceptible cultivar (G. hirsutum L.) and Hai1 is a highly resistant line (G. barbadense L.), were obtained from the Cotton Institute of the Chinese Academy of Agricultural Sciences of China. CSSL-1 is highly resistant to Verticillium wilt, while CSSL-4 is susceptible to Verticillium wilt .
CSSL-1 and CSSL-4 were cultivated in a controlled-environment chamber under a 16 h light/8 h dark photoperiod with 80% relative humidity at 28/25 • C (day/night). The roots of CSSL-1 and CSSL-4 in the two-true-leaf seedling stage were inoculated with a V. dahliae conidia suspension using a root dip method (He et al., 2014). Control plants were mock-inoculated with sterile water. The roots and stems were harvested at the squaring

Library preparation and Illumina sequencing
Total RNA was extracted from eight samples with three biological replicates to construct RNA-seq libraries. In total, 24 libraries were constructed: C-RRI-1, C-RRI-2, C-RRI-3, C-SRI-1, C-SRI-2, C-SRI-3, C-RSI-1, C-RSI-2, C-RSI-3, C-SSI-1, C-SSI-2, C-SSI-3, C-RRM-1, C-RRM-2, C-RRM-3, C-SRM-1, C-SRM-2, C-SRM-3, C-RSM-1, C-RSM-2, C-RSM-3, C-SSM-1, C-SSM-2, and C-SSM-3. The TRIzol reagent (Invitrogen) was used to extract the total RNA following the manufacturer's instructions. The quality and concentration of the extracted total RNA was validated using an ultraviolet spectrometer and electrophoresis on a denaturing formaldehyde agarose gel. The Ribo-Zero Magnetic Kit (Epicentre, Madison, WI, USA) was used to remove the rRNAs, and RNase R (Sigma Aldrich, St. Louis, MO, USA) was added to digest the linear RNAs. The circRNAs were then disrupted into short fragments using fragmentation buffer (Ambion). Using these short fragments as templates, the first-strand cDNA was synthesized using random hexamer primers, and the second-strand cDNA was synthesized by adding buffer, RNase H, dNTPs, and DNA polymerase I. Then, the resulting short fragments were purified using a QiaQuick PCR (Qiagen, Valencia, CA, USA) purification kit and eluted using EB buffer for end repair, and adenine (A) and sequencing adapters were added. Suitable fragments were collected through agarose gel electrophoresis and selected as templates for PCR to form the RNA-seq libraries. Finally, the libraries were sequenced using an Illumina Hiseq 2000 sequencer.

Read mapping and identification and classification of the circRNAs
The low-quality reads, adaptor reads, and rRNA reads were removed from the raw sequencing RNA-seq reads and were uniquely aligned using TopHat2 (Kim et al., 2013) against the G. hirsutum genome (Gossypium_Hirsutum_v1.1 version) downloaded from the Cotton Research Institute, Nanjing Agricultural University (http://mascotton.njau.edu.cn/). The reads that did not map to the reference genome were named Unmapped Reads, and 20 bp from the two ends of Unmapped Reads were extracted and named the Anchors Reads. The Anchors Reads were independently mapped to the reference genome, and find_circ (v1.2, https://github.com/marvin-jens/find_circ) was used with the default parameters to analyse and identify the candidate circRNAs. Novel circRNAs were identified by blast searches against the circBase database (http://www.circbase.org/) using the candidate circRNAs with E-value of <e −10 .
A circRNA was formed between the junction of a downstream splice donor and an upstream splice acceptor. The end points of the downstream donor and the upstream acceptor were a pair of breakpoints. The circRNAs were further classified into six types according to the location of the pair of breakpoints and the circRNA's origin in the genome as follows: annot_exons (the pair of breakpoints was located in the starting point of an exon and the end point of another exon and the circRNA originated from the multiple exons between the breakpoints); one_exon (the pair of breakpoints was located in a same exon and the circRNA originated from the nucleotides between the breakpoints); intronic (the pair of breakpoints was located in the same intron and the circRNA originated from the nucleotides between the breakpoints); exon_intron (the circRNA contained one or multiple exons, but the breakpoints were not located at the starting point of an exon or the end point of an exon. The circRNA originated from the nucleotides between the first breakpoint and the starting point of the first exon or multiple exons and the nucleotides between the end point of the last exon and the second breakpoint); intergenic (the pair of breakpoints was located in the fragment between two genes and the circRNA originated from the nucleotides between the breakpoints); and antisense (the circRNA originated from the antisense strand of a gene).

Validation of circRNAs in cotton
Total RNA was isolated from root and stem of CSSL-1 and CSSL-4 seedlings using TRIzol reagent (Invitrogen, Carlsbad, CA, USA) according to the manufacturer's protocol, and used subsequent cDNA preparation with random primers. The divergent primers were then designed on the flanking sequences of head-to-tail splicing sites of circRNAs. The divergent primers, that is, the forward primer being located downstream of the reverse primer when they are aligned to genomic sequence. Polymerase chain reactions (PCRs) were done using the divergent primers and cDNA templates. PCR products were then separated using agarose gel electrophoresis, and purified with gel extraction kit. Sanger sequencing were performed to further confirm the presence of the back-spliced junction sites.

Identification and analysis of circRNA source genes
The genes producing circRNAs were called the parental genes or source genes of circRNAs. When the reads containing head-to-tail splicing sites of circRNAs were uniquely aligned to the genomic DNA sequence of annotated genes, the circRNAs originated from the annotated genes and the annotated genes were called the source genes of the circRNAs. Due to their origination from the fragment between two genes, the intergenic circRNAs had no source genes and are described as 'NA'. The source genes of the circRNAs were then subjected to Gene Ontology (GO) and KEGG Ontology (KO) analyses, and the cut-off for judging significant enrichment was set to a corrected P-value ≤0.05.

Expression analysis of the circRNAs
The RPM (reads per million mapping) was used to obtain the relative expression levels of the circRNAs. Then, differential expression analysis of the circRNAs was performed using edgeR (Robinson, McCarthy & Smyth, 2010) with the default parameters. The fold change of the RPM values greater than 2 (|log2 (FC)| >1) and P value lower than 0.05 (P < 0.05) were used as the threshold for detecting differentially expressed circRNA.

RNA sequencing and identification of circRNAs in cotton
To analyse the role of circRNAs in Verticillium wilt response, the two CSSLs of G. barbadense introgressed in G. hirsutum, CSSL-1 and CSSL-4, were inoculated with V. dahliae, and the root and stem of the inoculated plants and the plants in the control group were harvested to construct RNA-seq libraries. In total, 24 RNA-seq libraries (eight samples, C-RRI, C-RSI, C-RRM, C-RSM, C-SRI, C-SSI, C-SRM, and C-SSM, with three biological replicates) were constructed and sequenced. A total of 2,285,685,898 raw reads were obtained. After the low-quality reads, adaptor reads, and rRNA reads were filtered out, a total of 2,270,853,468 clean reads were used to align against the G. hirsutum genome. In total, 450,211,774 reads (approximately 19.82%) did not map to the reference genome and were used to identify the circRNAs using find_circ. Based on the sequence reads, 686 circRNA candidates were identified and named from ghi_circ_000001 to ghi_circ_000686 (Table S1, Dataset S1). The identified circRNAs were all novel circRNAs based on the results of BLAST searches against the circBase database with an E-value of <e −10 .
To confirm our identification circRNAs, divergent primers (Table S2) were designed for three randomly selected circRNAs to perform PCR. The PCR products were further analyzed by agarose gel electrophoresis and Sanger sequencing. The results showed that all circRNAs had bands of expected size and validated back-spliced junction sites (Fig. 1), indicating that our identification of circRNA are credible.

Properties of cotton circRNAs
According to the genome origination, the 686 circRNAs were classified into six types, including annot_exons, one_exon, intronic, exon_intron, intergenic, and antisense, containing 43, 24, 45, 385, 87, and 102 circRNAs, respectively (Fig. 2). The members of the exon_intron type were the most prevalent among the six types. The vast majority of the circRNAs, including the annot_exons, one_exon, intronic, and exon_intron-type circRNAs (497/686), originated from the sense strands of annotated protein-coding genes, indicating that the origination mechanism of circRNAs is closely related to the splicing mechanism of precursor mRNA (pre-mRNA), which is consistent with previous studies showing a non-canonical mode of RNA splicing or back-splicing events Salzman et al., 2012). The annot_exons, one_exon, and exon_intron-type circRNAs containing the exon sequence were in the majority of the 686 circRNAs (approximately 65.9%). The result is consistent with the studies in other species, such as Arabidopsis thaliana and Oryza sativa, whose circRNAs originated from exons of a single protein-coding gene (named Exonic circRNAs), accounting for 50.5% and 85.7%, respectively (Ye et al., 2015).
The length of circRNAs is uneven and ranges from approximately one hundred nucleotides to approximately 100 thousand nucleotides (Table S1). The mean length of the annot_exons, one_exon, intronic, exon_intron, intergenic, and antisense circRNAs are 530, 238, 30,878, 21,901, 14,719, and 19,326 nucleotides, respectively, indicating that the circRNAs containing exons (annot_exons, one_exon, and exon_intron) were shorter than the other circRNAs (intronic, intergenic, and antisense), and the circRNAs originating only from exons (annot_exons and one_exon) were very short, only several hundred nucleotides. These differences are due to having more and longer non-coding regions than the protein-coding regions in the eukaryotic genome. The distribution of circRNAs was also uneven among the chromosomes. The number of circRNAs on the A-genome chromosomes (A01-A13) of G. hirsutum ranged from 14 to 53, while the number on the D-genome chromosomes (D01-D13) ranged from 12 to 66 (Fig. 3, Table S1). Nevertheless, the distribution of the circRNAs was even between the A-genome and D-genome of G. hirsutum, 333 and 332 circRNAs, respectively, indicating that the contributions of the two subgenomes to generating the circRNAs were approximately equal. Thus, circRNAs are important and universal in cotton as in other species, including archaea, mammals, and other plants (Danan et al., 2012;Salzman et al., 2012;Jiang et al., 2014;Gruner et al., 2016;Lu et al., 2015;Ye et al., 2015;Sun et al., 2016;Darbani, Noeparvar & Borg, 2016).

Identification of circRNA source genes
The annotated genes producing circRNAs were the source genes of the circRNAs, while the circRNAs without source genes were described as 'NA'. A total of 599 of the identified 686 circRNAs originated from 416 source genes, and 87 intergenic-type circRNAs originated from the fragment between two genes, having no source genes (Table S1). More commonly, a source gene generated a single circRNA, but 103 genes generated two or more circRNAs, totalling 286 circRNAs. For example, the Gh_A05G3459 and Gh_D05G1184 genes each generated seven and eight circRNAs, respectively (Table S1). The result was also consistent with the conclusion that circRNAs possess an alternative splicing pattern Gao et al., 2016).

Functional categorization of circRNA source genes
The circRNA source genes were subjected to GO and KO analyses. The GO categories indicated the functions of the source genes. The GO contains three main functional  Figure 3 The circRNA distribution on G. hirsutum chromosomes. A01-A13 represent A-genome chromosomes 01-13, respectively, while D01-D13 represent D-genome chromosomes 01-13, respectively. Full-size DOI: 10.7717/peerj.4500/ fig-3 categories, biological process, cellular component, and molecular function. The 416 source genes were further categorized into 11 biological process terms, eight cellular component terms, and nine molecular function terms, in total 28 functional terms (Fig. 4, Table S3). The major enrichments for the biological process terms were 'cellular process', 'metabolic process', 'single-organism process', and 'response to stimulus'. The 34 source genes in the 'stimulus response' term may function in response to Verticillium wilt. Further analysis showed that 20 of the 34 genes belong to the NBS (nucleotide binding site) gene family (Table S4), which is the largest disease-resistance gene family in plants and plays a central role in recognizing effectors from pathogens and in triggering downstream signalling during a plant's response to pathogen invasion (Yue et al., 2012;McHale et al., 2006). The major enrichments for molecular function terms were 'binding' and 'catalytic activity'. The levels of enrichment were not obviously different among the cellular component terms except that the 'extracellular region' term contained significantly fewer genes. The KO categories were used to reflect the biology pathway of the source genes, and 251 of the 416 the source genes were assigned to 91 KEGG pathways (Table S5). The top 20 pathway enrichments are shown in Fig. 5. The major pathways were 'RNA transport', 'flavonoid biosynthesis', 'phenylpropanoid biosynthesis', and 'ribosome', containing eight, seven, six and six genes, respectively.

The relative expression analysis of circRNAs in tissues of cotton under different treatments
The relative expression of the identified 686 circRNAs was analysed among the eight root and stem samples of inoculated and mock CSSL-1 and CSSL-4, C-RRI, C-RSI, C-RRM, C-RSM, C-SRI, C-SSI, C-SRM, and C-SSM. The relative expression levels of the circRNAs in each library (Fig. 6, Table S6) were obtained according to the RPM method. The mean RPM value of a circRNA from three biological replicates was used as the relative expression level of the circRNAs of the samples (Table S7) for further analysis. The number of circRNAs existing in each sample was shown in Fig. 7A. Most of the 686 circRNAs were detected in each sample (Fig. 7B, Table S7), and 319 of the 686 circRNAs were expressed in all 8 samples (Table S8). The CSSL-1 samples (C-RRI, C-RRM, C-SRI, and C-SRM) and CSSL-4 samples (C-RSI, C-RSM, C-SSI, and C-SSM) contained 680 common circRNAs and only 2 and 4 different circRNAs, respectively (Fig. 8A). Separately comparing the root and stem samples of CSSL-1 and CSSL-4, there were 629 and 579 common circRNAs and only 15 and 27, and 17 and 32 different circRNAs, respectively (Figs. 8B, 8C). These data indicated that the number of circRNAs detected in the CSSL-1 and CSSL-4 samples was not different. The distribution of the circRNAs among the root samples showed that 482 circRNAs (70.3%) were detected in all four samples, and 663 (96.6%) were detected in at least two of the four samples (Fig. 7D). The distributions of the circRNAs among stem samples showed that 450 circRNAs (65.6%) were detected in all four samples, and 656 (95.6%) were detected in at least two of the four samples (Fig. 7E). These results indicated that circRNAs were widespread in tissues. However, the root samples (C-RRI, C-RRM, C-RSI, and C-RSM) obviously contained more circRNAs than the stem samples (C-SRI, C-SRM, C-SSI, and C-SSM) (Fig. 7B). In addition, 15 of the 686 circRNAs were not detected in any of the root samples, while 58 of the 686 circRNAs were not detected in any of the stem samples (Fig. 7C), indicating that more circRNAs existed in the root of cotton than in the stem. between stem and root (C-SRI vs C-RRI, C-SRM vs C-RRM, C-SSI vs C-RSI, and C-SSM vs C-RSM), Group 2 contained four comparisons between two root samples (C-RRM vs C-RRI, C-RSI vs C-RRI, C-RSM vs C-RRM, and C-RSM vs C-RSI), and Group 3 contained four comparisons between two stem samples (C-SRM vs C-SRI, C-SSI vs C-SRI, C-SSM vs C-SRM, and C-SSM vs C-SSI) (Fig. 7F).
The results showed that 391 of the 686 circRNAs were differentially expressed in at least one of the 12 comparisons (Table S21). A total of 280 of the 686 circRNAs were differentially expressed in at least one of the eight comparisons in Group 2 and Group 3 (Table S21).
The number of differentially expressed circRNAs in the comparisons in Group 1 was generally greater than that in the comparisons in Group 2 and Group 3. Additionally, each comparison in Group 1 exhibited more up-regulated circRNAs than down-regulated circRNAs (Fig. 7F), which was related to the above results showing that the number of circRNAs expressed in the roots was obviously greater than that in the stems. The results indicated that different circRNAs are expressed in different tissues, which is consistent with the characteristic that circRNAs often show tissue-, cell-or developmental stagespecific expression Memczak et al., 2013;Salzman et al., 2013). Therefore, G. hirsutum should have non-identified and novel circRNAs in other tissues, and the 280 circRNAs differentially expressed in at least one of the 8 comparisons between two root or stem samples may be related to the cotton Verticillium wilt response. Moreover, the number of circRNAs that were differentially expressed in the root comparisons in Group 2 was much higher than that in the stem comparisons in Group 3 (Fig. 7F), indicating that the circRNAs exhibited a stronger and more rapid response to V. dahliae in the root. Therefore, the differentially expressed circRNAs in root comparisons were further analysed. Firstly, the differentially expressed circRNAs in CSSL-1 and CSSL-4 root comparisons, C-RRM-vs-C-RRI and C-RSM-vs-C-RSI, were analysed and compared. The number of circRNAs differentially expressed in C-RRM-vs-C-RRI in the inoculated CSSL-1 was only approximately half of that in C-RSM-vs-C-RSI in the inoculated CSSL-4 (Fig. 8D). More differentially expressed circRNAs in the susceptible line compared to the resistant line indicated that the differential expression of circRNAs may play a role in the cotton Verticillium wilt response. There were 25 differentially expressed circRNAs in common in C-RRM-vs-C-RRI and C-RSM-vs-C-RSI (Fig. 8D). In addition, nine and two of the 25 circRNAs were up-regulated and down-regulated, respectively, in both C-RRM-vs-C-RRI and C-RSM-vs-C-RSI (Fig. 9A), and the differential expression level analysis of the total 11 common circRNAs showed that their levels were generally higher in C-RRM-vs-C-RRI than in C-RSM-vs-C-RSI (Fig. 10), indicating that the differential expression level might be one of the reasons for the resistance to Verticillium wilt differences in cotton. Therefore, the differentially expressed circRNAs may play a role in the resistance or susceptibility to Verticillium wilt.
Then, the differentially expressed circRNAs in the two C-RRI comparisons, C-RRIvs-C-RRM and C-RRI-vs-C-RSI, were analysed and compared. There were 27 common up-regulated and nine common down-regulated circRNAs, for a total of 36 circRNAs, between the two C-RRI comparisons (Fig. 9C). These common circRNAs accounted for up to 54.5% and 41.4% of the total differentially expressed circRNAs in C-RRM-vs-C-RRI and C-RSI-vs-C-RRI, respectively. C-RRI is an inoculated resistant line sample, and its two comparisons (vs RRM, a mock resistant line sample, and vs C-RSI, an inoculated susceptible line sample) had a large proportion of common circRNAs, approximately 50%, which also indicated that the differentially expressed circRNAs may play a role in the response to Verticillium wilt. Finally, the differentially expressed circRNAs in all the root comparisons were analysed. There were 12 common differentially expressed circRNAs in the four root comparisons (Table 1), speculating that the 12 circRNAs may be more sensitive to Verticillium wilt. Eleven of the 12 circRNAs had source genes (Table 1), and nine of the 11 source genes had defined functions according to the gene annotations of the G. hirsutum genome (Table 1). Notably, two source genes might respond to pathogen invasions. The Gh_A01G0389 gene, the source gene of ghi_circ_000009, was annotated as a D-mannose binding lectin protein gene with an apple-like carbohydrate-binding domain, which functions to specifically recognize diverse carbohydrates and mediate a wide variety of biological processes, such as cell-cell and host-pathogen interactions, serum glycoprotein turnover, and innate immune responses. The Gh_D01G0335 gene, source gene of ghi_circ_000336, was annotated as the NB-ARC domain-containing disease resistance protein gene belonging to the NBS gene family.
Considering the stem samples, the number of differentially expressed circRNAs in C-SSM-vs-C-SSI in the susceptible line was also greater than that in C-SRM-vs-C-SRI in the resistant line, but the difference was small (Fig. 7A). However, there were only four and two common differentially expressed circRNAs between C-SRM-vs-C-SRI and Glycosyl hydrolases family 32 protein C-SSM-vs-C-SSI, and between C-SRM-vs-C-SRI and C-SSI-vs-C-SRI, respectively (Figs. 9B, 9D). Moreover, an expression analysis of the circRNAs in the stem comparisons showed that no circRNAs were differentially expressed in all four stem comparisons (Table S21). These results also indicated that the circRNAs exhibited a stronger, more rapid response to V. dahliae in the root than in the stem. Above all, the differentially expressed circRNAs may play a role in Verticillium wilt resistance. A total of 280 circRNAs were differentially expressed in the root comparisons and stem comparisons in Group 2 and Group 3 (Table S21), and 247 of the 280 circRNAs had source genes (Table S22). According to the annotation of these source genes, 13 nonredundant source genes of the 17 circRNAs were related to defence response (GO:0006952) ( Table 2, Table S22). A conserved domain analysis in NCBI showed that 11 of the 13 source genes were NBS family genes, containing an NB-ARC domain related to disease resistance (Table 2). Therefore, the NBS family of genes may play a role in Verticillium wilt resistance, and they might be regulated by circRNAs in the disease-resistance process in cotton.

DISCUSSION
Recently, studies showed that circRNAs exist in plants and play a role in responding to environmental stress (Ye et al., 2015;Zuo et al., 2016;Wang et al., 2017;Liu et al., 2016). Verticillium wilt is the most destructive disease in cotton production worldwide. To elucidate the role of circRNAs in cotton Verticillium wilt response, two CSSLs of G. barbadense introgressed in G. hirsutum, CSSL-1 (a highly resistant line) and CSSL-4 (a susceptible line), were used to construct an RNA-seq library and to analyse circRNA based on V. dahliae inoculation.
In contrast with the number of mammalian studies, few comprehensive studies on circRNAs in plants have been conducted thus far. According to publicly available RNA-Seq data, 12,037, 6,012, and 854 circRNAs were identified in rice, Arabidopsis, and  (Lu et al., 2015;Ye et al., 2015;Zuo et al., 2016). Recently, 88 and 5,372 circRNAs were identified and isolated in wheat and soybean, respectively Zhao et al., 2017). Our identification and analysis of cotton circRNAs thus enriched plant circRNA research. Here, 686 novel circRNAs in cotton were identified, and the contributions of the two subgenomes (A-genome and D-genome) of G. hirsutum to the generated circRNAs was approximately the same, indicating that circRNAs are universally present in cotton.
Concerning the characteristics of circRNAs, first, the origination of circRNAs' mechanism of precursor mRNA (pre-mRNA) splicing and alternative splicing was considered Salzman et al., 2012). In this study, with the exception of intergenic-and antisense-type circRNAs, the other circRNAs (497/686) originated from the sense strand of annotated protein-coding genes, indicating that the origination of circRNAs is closely related to the splicing mechanism of precursor mRNA (pre-mRNA). Moreover, 103 circRNA source genes generated two or more circRNAs, indicating that the circRNAs possessed an alternative splicing pattern Gao et al., 2016). Second, the circRNAs act as miRNA sponges Hansen et al., 2013;Ebert, Neilson & Sharp, 2007;Franco-Zorrilla et al., 2007). In this study, the majority of circRNAs (approximately 65.9%), including the annot_exons, one_exon, and exon_intron types of circRNAs, contained the exon sequence, which is consistent with the results in Arabidopsis thaliana and Oryza sativa, showing approximately 50.5% and 85.7% exonic circRNAs (Ye et al., 2015), respectively. Acting as miRNA sponges is one of main functions of circRNAs, while miRNAs originate mainly from introns and target numerous mRNAs via combining completely or partially with mRNAs (Kim & Nam, 2006). The miRNAs combine with circRNAs and mRNAs; thus, the fact that the circRNAs originated mainly from exon-like mRNAs is reasonable. A close relationship among circRNAs, miRNAs, and mRNAs has been reported, and they may have a corresponding core sequence, which is why miRNAs can combine with circRNAs or mRNAs. However, the origination mechanism of the intergenic and antisense circRNAs remains unknown. The antisense-type circRNAs can also function as miRNA sponges. For example, the mammalian circRNA CDR1as (ciRS-7), an antisense-type circRNA, possesses 74 miR-7 binding sites and acts as a repressor of miR-7 (Hansen et al., 2013). Third, the expression of circRNAs has tissue-specificity Memczak et al., 2013). In this study, an expression analysis of the circRNAs in different tissues showed that the number of circRNAs expressed in the root was obviously greater than in the stem, and the number of differentially expressed circRNAs in comparisons between the stem and the root was generally greater than for the comparisons between two root samples or stem samples. Therefore, non-identified and novel circRNAs should be present in the other tissues of G. hirsutum.
Our analysis of circRNAs in cotton showed that the differentially expressed circRNAs may play a role in cotton Verticillium wilt response. The differential expression of circRNAs are reported in rice responding to Pi-starvation stress (Ye et al., 2015). In wheat, 62 differentially expressed circRNAs are considered to be involved in the dehydration responsive process , and 163 circRNAs are expressed in response to chilling stress in tomato (Zuo et al., 2016). Verticillium wilt is the most destructive disease in cotton production. The study of circRNAs of two chromosome segment substitution lines, CSSL-1 (a highly resistant line) and CSSL-4 (a susceptible line), in Verticillium wilt response showed that the numbers of circRNAs in CSSL-1 and CSSL-4 was not different, and there were many common circRNAs. However, the number of differentially expressed circRNAs in CSSL-4 was much greater than that in CSSL-1 in response to V. dahliae. Meanwhile, the differential expression level of the common circRNAs in CSSL-1 was generally higher than that in CSSL-4. Moreover, C-RRI, as an inoculated resistant line treatment, and its two comparisons, C-RRM-vs-C-RRI and C-RSI-vs-C-RRI, had a large proportion (approximately half) of common differentially expressed circRNAs. These data indicated that the differentially expressed circRNAs may play a role in Verticillium wilt resistance in cotton.
Here, a total of 280 differentially expressed circRNAs were identified in cotton. The analysis of the source genes of these circRNAs showed that 13 non-redundant source genes of 17 differentially expressed circRNAs were related to a defence response, and 11 of the 13 source genes were NBS family genes (Table 2). GO analysis of the source genes of all 686 of the circRNAs showed that 20 of the 34 'stimulus response' term genes belong to the NBS gene family (Table S4). Furthermore, 11, more than half of the 20 NBS genes, were the source genes of the differentially expressed circRNAs. Moreover, 12 circRNAs were commonly and differentially expressed in all the root comparisons (Table 1), and the source gene of ghi_circ_000336, the Gh_D01G0335 gene, was annotated as an NBS family gene. The NBS gene family is the largest disease resistance gene family in plants and plays a central role in recognizing effectors from pathogens and in triggering downstream signalling during the plant response to pathogen invasion (Yue et al., 2012;McHale et al., 2006;Zhao et al., 2017;Xiang et al., 2017). These results indicated that the NBS family genes in cotton may play a role in Verticillium wilt resistance and might be regulated by circRNAs in the disease-resistance process. Our results not only expanded the knowledge of circRNA in plant circRNAs, but also contributed to reveal the role and mechanism of circRNAs in the cotton Verticillium wilt response.

CONCLUSIONS
A total of 686 novel circRNAs were identified in cotton. The circRNAs exhibit stronger and more rapid response to V. dahliae in root than in stem. A total of 280 differentially expressed circRNAs were identified and they may play a role in cotton Verticillium wilt response. The disease resistance-related genes, NBS family genes, may play a role in cotton Verticillium wilt resistance via being regulated by their circRNAs. Our study will help to reveal the role and mechanism of circRNAs in cotton Verticillium wilt resistance.

ADDITIONAL INFORMATION AND DECLARATIONS
Funding