Genomic Identification and Comparative Expansion Analysis of the Non-Specific Lipid Transfer Protein Gene Family in Gossypium

Plant non-specific lipid transfer proteins (nsLTPs) are involved in many biological processes. In this study, 51, 47 and 91 nsLTPs were identified in Gossypium arboreum, G. raimondii and their descendant allotetraploid G. hirsutum, respectively. All the nsLTPs were phylogenetically divided into 8 distinct subfamilies. Besides, the recent duplication, which is considered cotton-specific whole genome duplication, may have led to nsLTP expansion in Gossypium. Both tandem and segmental duplication contributed to nsLTP expansion in G. arboreum and G. hirsutum, while tandem duplication was the dominant pattern in G. raimondii. Additionally, the interspecific orthologous gene pairs in Gossypium were identified. Some GaLTPs and GrLTPs lost their orthologs in the At and Dt subgenomes, respectively, of G. hirsutum. The distribution of these GrLTPs and GaLTPs within each subfamily was complementary, suggesting that the loss and retention of nsLTPs in G. hirsutum might not be random. Moreover, the nsLTPs in the At and Dt subgenomes might have evolved symmetrically. Furthermore, both intraspecific and interspecific orthologous genes showed considerable expression variation, suggesting that their functions were strongly differentiated. Our results lay an important foundation for expansion and evolutionary analysis of the nsLTP family in Gossypium, and advance nsLTP studies in other plants, especially polyploid plants.


Results
Genomic identification of putative nsLTPs in G. raimondii, G. arboreum and G. hirsutum. The availability of G. raimondii, G. arboreum and G. hirsutum genome sequences makes it possible to identify all the nsLTPs in the three Gossypium species. The BLASTP program was utilized to search for candidate nsLTPs in cotton with the query sequences from Arabidopsis. Initially, 104, 104 and 182 protein sequences were identified in G. raimondii, G. arboreum and G. hirsutum, respectively. After screening and selection as described in the materials and methods, a total of 189 nsLTPs were confirmed and described (Table S2). Among them, G. raimondii and G. arboreum contain a similar number of nsLTPs (47 and 51, respectively), despite the fact that G. raimondii has a much smaller genome size (885 Mb/1 C) than G. arboreum (1,746 Mb/1 C). In G. hirsutum (2,173 Mb), 91 nsLTPs were identified, representing almost a two-fold increase over the number of nsLTPs in its diploid progenitors. We designated the genes identified in G. raimondii, G. arboreum and G. hirsutum as GrLTPs, GaLTPs and GhLTPs, respectively.
Phylogenetic analysis of the nsLTP family. To determine the evolutionary relationships of nsLTPs, phylogenetic analysis of the identified nsLTPs in Gossypium and Arabidopsis was completed with the MrBayes and PHYLIP tools (Fig. 1, Fig. S1). There were similar results with high support values from each method. According to Boutrot's classification system, the nsLTP family in Gossypium was divided into 8 subfamilies (Type I, II, III,  IV, V, VI, VIII and IX), and no Type VII nsLTPs were identified in cotton ( Fig. 1, Fig. S1, Table S2). The member proportion was different in each subfamily (Fig. S2a). The Type I subfamily (33.33%) contained the most members, followed by Type II (23.28%), Type V (16.93%) and Type IV (11.64%). The least represented subfamily was Type IX (1.59%). A similar member distribution in each subfamily was found in each Gossypium species (Fig. S2b). Besides, the proportion of nsLTPs in Type I was 35.29% and 38.30% in G. arboreum and G. raimondii, respectively, whereas it decreased to 29.67% in G. hirsutum. The proportion of members in both Type II and Type VIII in G. hirsutum (25.27% and 7.69%, respectively) was higher than that in G. arboreum (23.53% and 3.92%, respectively) and G. raimondii (19.15% and 4.26%, respectively). Moreover, not all the subgroups were present in each Gossypium species, and no Type III and Type IX nsLTPs existed in G. arboreum.
Additionally, all the 189 nsLTP sequences were evaluated with OrthoMCL clustering. With the default stringency, 21 orthologous groups (OGs) in 8 subfamilies were shown (Table S3). Each subfamily contained one or more OGs, and each distinct OG was shared by only one specific subfamily. The OG distributions were similar to the phylogenetic classifications of the nsLTP family in the three Gossypium species. The protein structures were highly diverse in all the identified nsLTPs (Table S1). The amino acid lengths of the nsLTPs in the Type I, Type V and Type VIII subfamilies were relatively longer, while the proteins in Type II and Type III had relatively shorter amino acid lengths. A similar distribution in the molecular weight of the nsLTPs also existed.
Conserved protein motifs and exon/intron structure of nsLTPs. The main characteristic of plant nsLTPs is the presence of 8 CM in highly conserved positions. Using the WebLogo program, the sequence logos of the identified nsLTPs in Gossypium were generated to further confirm the conservation of amino acid residues (Fig. 2). Moreover, a variable number of inter-cysteine amino acid residues was displayed through multiple alignments, and 8 nsLTP types were therefore identified based on the typical spacing of the 8 CM (Table 1). Between the conserved Cys 1 and Cys 2 residues, Type II and VIII nsLTPs contained 6-8 residues, while Type V and IX contained 14 and 13 residues, respectively. Between the conserved Cys 4 and Cys 5 residues, Type I nsLTPs contained 19-20 residues, while the other types contained relatively fewer residues (8)(9)(10)(11)(12). Between the conserved Cys 6 and Cys 7 residues, Type III nsLTPs contained 12 residues, while the other types contained more residues (19)(20)(21)(22)(23)(24)(25)(26)(27). Between the conserved Cys 7 and Cys 8 residues, Type I nsLTPs contained more residues (13-15) than Type III and Type IX nsLTPs (6).
The MEME motif search tool was employed to identify 20 distinct conserved motifs in the nsLTPs (Fig. 3). Based on the distribution of the predicted motifs, the 189 nsLTPs were categorized into 8 distinct subfamilies,  which was consistent with the classification from the phylogenetic analysis. All the nsLTPs had either Motif 4 or 3, which represented Cys 2 . Motif 13, 11 and 14 corresponded to Cys 1 , and Motif 7, 18, 19, 6 and 15 mapped to Cys 3 Cys 4 . In addition, Motif 8, 9, 5, 1 and 12 corresponded to Cys 7 and Cys 8 . Motif 1, 2, 6, 10 and 15 mapped to Cys 5 XCys 6 , and different residues were found in the central position of the Cys 5 XCys 6 motif. Seven hydrophilic residues (Arg, Gly, Glu, Asp, Gln, Ser and Lys) and five hydrophobic residues (Tyr, Phe, Leu, Val and Met) existed at the X position of the Cys 5 XCys 6 motif in the 189 nsLTPs. Moreover, some subfamily-specific motifs were identified. For instance, Motif 5, 7 and 10 were only present in Type I nsLTPs, and Motif 15 only existed in Type VI nsLTPs.
The gene structure of the nsLTP family was also investigated (Fig. 3). Our results revealed low diversity in the distribution of intronic regions amid the exonic sequences. The intron patterns, formed by relative position and phase, were highly conserved within each phylogenetic group. The number of introns per gene varied from 0 to 2. No introns were identified in Type II and Type III nsLTPs, while some nsLTPs in other subfamilies were interrupted by 1-2 introns positioned 15 to 78 bp downstream of the codon, which almost encoded the Cys 8 in the 8 CM.
Chromosomal localization and duplication of nsLTPs in cotton. Based on the genomic location of nsLTPs (Table S2), chromosomal distribution diagrams of the nsLTP family were generated for the three Gossypium species (Fig. 4). In G. arboreum, the 51 nsLTPs were distributed unevenly on 12 chromosomes, and the number of nsLTPs on each chromosome varied widely (Fig. 4a). Chromosome 6 had the largest number of nsLTPs with 11 members, followed by Chromosome 1 with 9 genes. In contrast, only 1 gene was located on Chromosome 5, and no nsLTPs were found in Chromosome 2. In G. raimondii, the 43 nsLTPs were located on 13 chromosomes, with Chromosome 8 containing the largest number of genes (9) (Fig. 4b). Two genes were present on Chromosome 1, 2, 3, 5 and 9, whereas only a single nsLTP was localized on Chromosome 4, 10 and 12 each. In addition, a total of 30 and 35 nsLTPs were mapped in the 9 A t and 10 D t subgenome chromosomes of G. hirsutum, respectively (Fig. 4c). D t -Chromosome 6 contained the maximum number of nsLTPs with 9 genes, followed by A t -Chromosome 2, D t -Chromosome 1 and D t -Chromosome 8 with 6 genes each, while only one gene was present on A t -Chromosome 13, D t -Chromosome 5, D t -Chromosome 7 and D t -Chromosome 10 each. Besides, no nsLTPs were located on A t -Chromosome 1, 3, 10 and 12, and D t -Chromosome 3, 4 and 12. Moreover, several nsLTP clusters were detected on chromosomes such as the top of D t -Chromosome 6 and the bottom of A t -Chromosome 2 in G. hirsutum.
Gene duplication events were investigated to elucidate the expansion pattern of the nsLTP family in Gossypium. In our study, 7, 5 and 6 duplicated gene pairs were identified in G. arboreum, G. raimondii and G. hirsutum, respectively ( Table 2). In G. arboreum and G. raimondii, the duplication events were concentrated in similar subfamilies (including Type I, II and V), whereas in G. hirsutum the duplication events occurred in Type II, IV, V and VIII. This result revealed the strong expansion preference for some nsLTP subfamilies in Gossypium. Meanwhile, based on the sequence analysis and the chromosomal distribution, 2 and 2 paired genes in G. arboreum and G. hirsutum, respectively, were identified to be involved in segmental duplication events, while the other 14 pairs were linked to tandem duplication events.
In addition, the Ka/Ks ratio of each duplicated gene pair was calculated to assess the molecular evolutionary rates ( Table 2). In this study, 83.3% of the duplicated nsLTPs from cotton have mainly undergone purifying selection pressure after the duplication events ( Table 2). As purifying selection apparently constrains the divergence of the duplicated genes, the functions of the duplicated nsLTPs might not diverge much during subsequent evolution. Moreover, the divergence times between the duplicated gene pairs were analyzed. In G. arboreum and G. raimondii, most of the Ks values were less than 0.17, and their corresponding duplication age might be less than 32.69 million years ago (MYA). Only one duplication event (GaLTP47/51) occurred approximately 194.74 MYA. In G. hirsutum, the Ks values were 0.03-0.13, and the duplication ages were estimated to be 5.84-24.50 MYA.
Orthologous gene analysis and synteny block detection. The evolution of nsLTPs was analyzed among the three Gossypium species. We identified 23 and 24 orthologous gene pairs within the A t and D t subgenomes, respectively, in G. hirsutum and their corresponding ancestral A and D diploid genomes (Fig. 5, Table S4).  Table 1. Diversity of eight cysteine motifs in different types of nsLTPs identified in Gossypium. Character "X" represents any amino acid, and the Arabic numeral following "X" stands for the number of amino acid residues.    Of the orthologous gene pairs, most were distributed in Type I, II, IV and V (Fig. S3, Table S4). Besides, 24 GrLTPs have no orthologs in the D t subgenome of G. hirsutum, and 31 GaLTPs have no orthologs in the A t subgenome.
Meanwhile, the Ks values of the orthologous gene sets (A t vs A and D t vs D) were also compared. The divergence times between the allotetraploid and its progenitor genomes were estimated to be 3.46-3.54 MYA (Ks peaks at 0.0180 and 0.0184, respectively) (Fig. S4). The similar divergence time suggested that the nsLTPs in A t and D t subgenomes might have evolved symmetrically. The conservation and rearrangement of certain chromosome segments may play an important role in the adaptive evolution of the three Gossypium species. In our study, we discovered 3 conserved syntenic blocks between the A t subgenome in G. hirsutum and the A diploid genome in G. arboreum, and 1 conserved syntenic block between the D t subgenome in G. hirsutum and the D diploid genome in G. raimondii. This result indicated that the collinear relationships of the nsLTPs between the A t subgenome and the G. arboreum genome were higher than those between the D t subgenome and the G. raimondii genome. These conserved segments contain different numbers of nsLTPs, ranging from 2 to 3. According to the analysis, the genes situated on G. hirsutum A t Chromosome 11, 4 and 5 were predicted to have corresponding orthologs on G. arboreum Chromosome 1, 10 and 7, respectively, whereas the genes on G. hirsutum D t Chromosome 11 were found to originate from the conserved block on G. raimondii Chromosome 11 (Fig. 5). In particular, six paired nsLTPs (GhLTP64/GaLTP49, GhLTP65/GaLTP50, GhLTP38/GaLTP20, GhLTP39/GaLTP19, GhLTP82/GaLTP22 and GhLTP83/GaLTP22) were located in genomic regions with synteny between the G. hirsutum and G. arboreum genomes, while three paired nsLTPs (GhLTP32/GrLTP42, GhLTP33/GrLTP41 and GhLTP34/GrLTP40) were located in genomic regions with synteny between the G. hirsutum and G. raimondii genomes.

Analysis of orthologous gene expression patterns in cotton.
In the current study, expression patterns of the orthologous genes were compared in G. raimondii, G. arboreum and G. hirsutum (Figs 6 and 7 and Fig. S5). Among the intraspecific orthologous genes, 5 paired genes (GaLTP47/51, GaLTP49/50, GhLTP10/86, GhLTP53/74 and GrLTP40/41) shared almost equivalent expression patterns in the tested tissues and leaves inoculated with V. dahliae (Fig. 6). However, this was not the case for GhLTP82/83, GrLTP13/14 and GrLTP21/22. The expression patterns of these duplicated gene pairs were strongly divergent. GhLTP83 and GrLTP14 showed predominant expression in stems compared with GhLTP82 and GrLTP13, respectively. GrLTP21/22 also showed reciprocal silencing in roots and stems. Apart from the tissue-specific expression biases, GhLTP82/83 and GrLTP13/14 showed similar expression profiles after V. dahliae infection, while GrLTP21/22 displayed reverse expression pattern in response to V. dahliae.
Among the interspecific orthologous genes, 10 pairs between G. hirsutum and G. arboreum and 6 pairs between G. hirsutum and G. raimondii were selected to evaluate their expression relationship (Fig. 7, Fig. S5). Among them, the 4 paired genes (GaLTP25/GhLTP69, GaLTP35/GhLTP42, GaLTP36/GhLTP86 and GaLTP36/GhLTP10) shared similar expression patterns, and both genes were up-regulated after V. dahliae infection. However, the other 12 pairs displayed strong expression bias towards one copy in different organs, and the responses to V. dahliae were drastically different (Fig. 7, Fig. S5).

Expression profile of nsLTP members in cotton.
In the current study, a total of 78 genes (16 GaLTPs,19 GrLTPs and 43 GhLTPs) from all the subfamilies were selected, and the expression patterns of these genes varied significantly in different tissues (Fig. 8). Generally, Type V nsLTPs were mainly expressed at high levels in roots and young stems, while nsLTPs in Type I, II, IV and IX were predominantly expressed in stems and leaves. Additionally, nsLTPs play an important role in the protective mechanisms against pathogens in plants. The nsLTPs   for selected duplicated nsLTPs from G. arboreum, G. raimondii  and G. hirsutum in different organs (a-h) and leaves after V. dahliae treatment (i-p).   from some plants have reported to show obvious antifungal activities against V. dahliae 24-27 , we thus investigated the expression patterns of the nsLTPs in response to V. dahliae to take a further step towards understanding the function of cotton nsLTPs against V. dahliae. As displayed in Fig. 8, most nsLTPs in Type I and V showed a marked increase in transcript levels after V. dahliae inoculation, and different genes responded to different stages of V. dahliae infection. Some Type II nsLTPs were also slightly up-regulated in response to V. dahliae stress, while some nsLTPs in Type III, IV, VI, VIII and XI showed a decline in expression levels after treatment with V. dahliae.

Discussion
The categorization of nsLTPs based on phylogenetic clustering provided comprehensive information about the gene family and facilitated further functional analysis. In the current study, the 189 nsLTPs identified in G. raimondii, G. arboreum and G. hirsutum were classified into 8 subfamilies (Type I, II, III, IV, V, VI, VIII and IX) through phylogenetic analysis, and no Type VII nsLTPs were found in Gossypium (Fig. 1, Fig. S1). It has been speculated that Type VII nsLTPs appeared specifically in monocots while Type IX nsLTPs were unique to dicots 3,9 , and our results in Gossypium further confirmed this viewpoint. In addition, G. arboreum lost all the Type III and Type IX nsLTPs (Fig. S2), suggesting that the evolution of plants not only involves gene retentions, but also is accompanied by gene losses and mutations 4,28 . Meanwhile, the proportion of nsLTPs in each subfamily indicated that Type I seemed to have contracted while Type VIII expanded in G. hirsutum compared with its diploid progenitors (Fig. S2b). The gene retentions and losses might be associated with the related functions during plant evolution 29 .
As for multigene families, the analysis of gene expression profiles often provides useful clues for functional assessment. The tissue-specific expression patterns of nsLTPs (Fig. 8) indicated their important roles in performing diverse developmental and physiological functions in cotton. Besides, Type I nsLTPs were reported to be involved in plant defense against phytopathogens 4,10,30 , and were classified as PR-14 family. Such a role is particularly consistent with our results (Fig. 8). Most Type I nsLTPs showed increased expression levels after V. dahliae infection. In addition, many Type V nsLTPs were also significantly induced after V. dahliae attack, suggesting that they may participate in the pathogen response. In a previous study, the expression of Type V nsLTPs was observed primarily in the vascular bundles, and they were deduced to be involved in defense signaling 10 . Additional work is needed to further characterize the role of Type V nsLTPs and their functional involvement in signal transduction. Moreover, the nsLTPs in other subgroups showed no expression changes or a decline of expression levels when treated with V. dahliae, suggesting that they may participate in other biological processes or respond to the attacks of other types of pathogens 3,4,[30][31][32] .
Most nsLTPs within each subgroup shared common motif compositions, and the motif distributions among the different cotton species did not vary much (Fig. 3). Motif 5 and 7 corresponded to the highly conserved residues located in Pro-Tyr-X-Ile-Ser and Thr/Ser-X1-X2-Asp-Arg/Lys, respectively, and the two consensus pentapeptides were only found in Type I nsLTPs. Our result is consistent with previous studies 4,33 . Besides, the properties of the amino acid may determine the Cys pairing style, thus influencing the overall folding of the protein 30,32,34 . Generally, nsLTP1 has a hydrophilic amino acid in the CXC motif, whereas nsLTP2 contains a hydrophobic residue in the same position 30,32 . In our study, Motif 10 represented the pattern with a hydrophilic residue separating Cys 5 and Cys 6 in Type I, while Motif 1, 2, 6 and 15 corresponded to the Cys 5 XCys 6 with a hydrophobic residue in the central position in the other subfamilies. Among the five hydrophobic residues in the CXC motif, Leu is the most frequent residue (75.40%), and this result is in accordance with previous studies 3,4 . Besides, Cys 5 -Val-Cys 6 , which corresponded to Motif 15, was only found in four Type VI nsLTPs.
The intron-exon pattern carries the imprint of the evolution of a gene family 35,36 . Previous studies have indicated the generality that some nsLTPs in Type I, III, IV, V and VI contained introns, while no introns were identified in Type II, VIII and IX nsLTPs 3,4,32 . However, our result showed some differences. In Gossypium, introns were found in Type VIII and IX, while no introns were identified in Type III (Fig. 3). The nsLTP family evolved in early diverged land plants, and new subfamilies evolved during land plant evolution 2,32 . Intron loss events were considered the main cause for the formation of novel nsLTP types, and also contributed to novel gene formation within the specific gene subfamilies 32 . Therefore, compared with the nsLTPs in other plants, Type IX and some Type VIII nsLTPs in Gossypium still possessed introns, while Type III genes have evolved with no introns harbored. The adoption of different nsLTP types could help evolved plants adapt to the stressful conditions on land gradually, so the nsLTP family was selected and expanded in land plants 2,32 .
The role of gene duplication in the genesis of evolutionary novelty and complexity has long been recognized 37,38 . The results in our study indicated that both tandem and segmental duplication events contributed to the expansion of the nsLTP family in G. arboreum and G. hirsutum, while tandem duplication contributed to the nsLTP expansion in G. raimondii (Fig. 4, Table 2). Previous studies have indicated that the ancient and recent whole-genome duplication (WGD) events occurred in Gossypium approximately 115-146 and 13-20 MYA, respectively, and the predicted hybridization and formation of allotetraploid cotton occurred approximately 1.5 MYA 14,15,23 . In the current study, most of the nsLTP duplication events in G. arboreum and G. raimondii might have occurred fewer than 32.69 MYA, and were considered the recent duplication events. The duplication event occurred 194.74 MYA in G. arboreum was therefore regarded as the ancient hexaploidization event. In G. hirsutum, the duplication events occurred approximately 5.84-24.50 MYA, suggesting that these duplication events might have occurred before the interspecific hybridization and subsequent polyploidization in G. hirsutum.
In addition, the orthologous genes in G. arboreum, G. raimondii and G. hirsutum were identified (Fig. 5, Table S4). The results revealed that 31 GaLTPs lost their orthologs in the A t subgenome of G. hirsutum and 24 GrLTPs lost their orthologs in the D t subgenome. These genes were mainly distributed in Type I, Type II, Type IV and Type V, and the proportions of these GrLTP and GaLTP genes distributed in each subfamily were complementary (Fig. S3). For example, in Type I, 66.67% of the GaLTPs had no orthologs in the A t subgenome, whereas a relatively lower ratio (50%) of the GrLTPs lost their orthologs in the D t subgenome. On the contrary, in Type Scientific RepoRts | 6:38948 | DOI: 10.1038/srep38948 V, only 33.33% of the GaLTPs lost their orthologs, while many more GrLTPs (75%) lost their orthologs. The ratios in other subfamilies such as Type IV, VI and VIII showed similar complementary phenomena, suggesting that the loss and retention of nsLTPs in G. hirsutum might not be random. During the process of polyploidization, chromosomal recombinations and rearrangements resulted in gene losses and gene retentions as duplicate orthologs 39,40 . The regular losses in conjunction with retentions might have avoided severe functional loss and redundancy of the genes in plants.
Duplications of genomic content often occur during DNA replication and recombination through independent mechanisms 38,41 . Some duplicated genes might have been subjected to diversification of their functions due to the divergent changes in their protein sequences together with cis-regulatory elements 37 . The nsLTPs may have undergone independent duplication events followed by functional diversification in each species during speciation 42,43 . In rice and wheat, the nsLTPs may have suffered from a complex evolutionary selection mechanism including subfunctionalization 43,44 . The functional diversification of nsLTPs appears to be not random, but seems to have originated from the specific biological processes over the course of evolution. In the current study, five pairs of duplicated genes shared similar expression profiles (Fig. 6). These results indicated that these duplicated genes might retain some essential functions during subsequent evolution, and the similar expression patterns may be related to the highly similar protein architecture of these duplicated genes. However, three paired duplicated genes showed significant expression divergence (Fig. 6). The diversity might be caused by the significant variation in gene regulation after the duplication events 45,46 . The differential expression patterns of duplicated genes in Gossypium demonstrated that these genes might have experienced functionalization during the evolutionary process.
As a prominent mode of speciation in flowering plants, allopolyploidy can cause genomic changes, including chromosomal rearrangements and changes in gene expression [45][46][47][48] . Recent studies have indicated the unequal expression of orthologous genes in allotetraploids such as Gossypium 13,49 , Brassica 47 , Oryza minuta 50 and the synthetic allopolyploid Arabidopsis 51 , and the gene expression levels in these allopolyploids deviated from the expression of their orthologous genes in the progenitors. In our study, the expression patterns of genes in G. hirsutum revealed considerable differences from the expression of their orthologous genes in its progenitors ( Fig. 7 and Fig. S5), indicating that the functions of the orthologous genes were strongly differentiated over the course of evolution.
In summary, 189 nsLTPs were identified in G. raimondii, G. arboreum and G. hirsutum in this study. Comprehensive study of the nsLTPs in Gossypium provided some important features of the gene family such as gene structure, expansion and expression. The findings here provide researchers with novel findings about the molecular evolution and expansion history of the nsLTP family in Gossypium, and offer a good opportunity to further investigate the nsLTP family in plants, especially polyploid plants. Besides, the analysis of the cotton nsLTPs in response to V. dahliae might help guide future identification of disease-resistant genes and exploitation of wilt-resistant varieties.

Materials and Methods
Plant materials and V. dahliae spore treatment. The seeds of G. raimondii, G. arboreum (Shixiya1) and G. hirsutum (TM-1) were kindly supplied by National Medium-term Gene Bank of Cotton in China. Cotton seeds were surface-sterilized with 0.1% HgCl 2 for 10-15 min and washed with sterile water for at least 5 times. Sterilized seeds were sown on the half-strength Murashige and Skoog (MS) solid medium. After 3-day germination, the seedlings were transferred to half-strength Hoagland nutrient solution at pH 6.0. Fresh nutrient solution was continuously aerated with pumps and air stones, and was changed every 3 days. Plants were cultivated in a growth chamber at 28 °C with a photoperiod of 16 h light and 8 h dark. For tissue-specific expression analysis, roots, stems and leaves were harvested from 3-week-old cotton seedlings, and quick-frozen in liquid nitrogen, then stored at − 80 °C until RNA isolation. Three plants constituted an individual replicate and three biological replicates were sampled for RNA extraction.
V. dahliae isolate Anyang (ACCC no. 36207) is a popular V. dahliae strain isolated from Anyang, Henan Province, China. It was supplied by the Cotton Research Institute of the Chinese Academy of Agricultural Science and preserved at the Agricultural Culture Collection, Beijing, China. V. dahliae mycelia growing on potato dextrose agar (PDA) plates were transferred to Czapek's medium (containing 2.0 g·l −1 NaNO 3 , 1.0 g·l −1 K 2 HPO 4 , 0.5 g·l −1 MgSO 4 • 7H 2 O, 1.0 g·l −1 KCl, 0.01 g·l −1 FeSO 4 • 7H 2 O and 30.0 g·l −1 sucrose) on a shaker at 150 rpm at 25 °C for sub-culturing. The spore concentration was determined with a haemocytometer and adjusted to a concentration of 10 7 spores ml −1 with sterile distilled water prior to use. Three-week old plants were inoculated with V. dahliae spores using the root-dip method, and the leaves were collected 0 h, 6 h, 12 h and 24 h after V. dahliae inoculation to examine the expression patterns of the nsLTP family in cotton under V. dahliae treatment.

RNA isolation and quantitative real-time PCR (qRT-PCR). Total RNA was extracted using an
RNAprep Pure Plant kit (Tiandz, China), and first-strand cDNA was synthesized from DNase-treated RNA with a PrimerScript 1st Strand cDNA synthesis kit (TaKaRa). Gene-specific primers were designed based on their coding sequences (CDSs) and then synthesized commercially (Generay, Shanghai, China) (Table S5). qRT-PCR was performed with SYBR premix Ex Taq (TaKaRa) and the CFX96 Realtime System (Bio-Rad, France) by strictly following the manufacturer's instructions. The qRT-PCR machine was programmed with 40 cycles and an annealing temperature of 60 °C. The cotton EF1α gene was used as an endogenous control for all the qRT-PCR analyses. The relative transcription levels were calculated using the 2 −ΔΔCT method. Three technical replicates were performed for each sample. The expression profiles of the nsLTPs were clustered using Cluster 3.0 software 52 .
Sequence retrieval and structural analysis. All the nsLTP sequences of Arabidopsis (Table S1) were collected from TAIR (http://www.arabidopsis.org/index.jsp) and were used as queries by searching against the cotton genome database 14,15,23 using the BLASTP program with default parameters (E_value ≤ 10) (http://cgp.genomics. org.cn/page/species/index.jsp). The conserved LTP domain of each putative LTP sequence in Gossypium was confirmed in the Conserved Domain Database in National Center for Biotechnology Information (NCBI) (http:// www.ncbi.nlm.nih.gov). Then, the candidate nsLTP sequences were manually examined for the presence of the 8 CM, and proteins lacking the essential Cys residues were excluded. After that, the putative proline-rich proteins and the proteins without NSSs (checked with SignalP 4.0, http://www.cbs.dtu.dk/services/SignalP) were also removed. Additionally, the proteins with C-terminal GPI anchor signals (predicted with big-PI Plant Predictor, PSORT and PredGPI) were excluded [53][54][55] . The protein sequences of the 2S-albumins (At2S1 to At2S4) 56 and alpha amylase inhibitor (RATI) 57 were then BLAST-searched against the rest of the candidate nsLTPs to exclude possible inhibitors and cereal storage proteins. Moreover, the proteins with more than 120 amino acids at maturity were discarded. Through these searching and screening procedures, the rest of the nsLTPs were finally confirmed and used for the following analysis. Primary and secondary protein structures were predicted with ProtParam (http://web.expasy.org/protparam/) and SOPMA (http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page= npsa_sopma.html). In addition, the nsLTPs were clustered using OrthoMCL to identify the orthologous groups (OGs) 58 . The inflation parameter was set to the default.
Phylogenetic analysis and sequence alignment. All the nsLTP sequences in this study were aligned using ClustalX version 2.1 59 . The PHYLIP program (http://evolution.genetics.washington.edu/phylip.html) was used to estimate the maximum-likelihood phylogeny for all the nsLTP sequences in Arabidopsis and the three Gossypium species with the JTT model. In addition, MrBayes version 3.1.2 was used to conduct Bayesian analysis for the nsLTP sequences in the three Gossypium species 60 . Trees were visualized with Figtree version 1.4.0 (http:// tree.bio.ed.ac.uk/software/figtree/). Protein motif and gene structure analysis. The conserved motifs of the nsLTPs were investigated using the online MEME program 61 . The analysis was performed with a set of parameters as follows: the optimum motif width was set to ≥ 6 and ≤ 200; the maximum number of motifs was set to 20 62 . Sequence logos for the conserved domains were generated using the WebLogo tool (http://weblogo.berkeley.edu/). Genomic schematic diagrams of the nsLTPs were obtained by comparing the genomic sequences and their predicted CDSs using the GSDS tool (http://gsds.cbi.pku.edu.cn/).

Chromosomal mapping and gene duplications. The chromosome location information of the nsLTPs
was searched in the cotton genome database. MapInspect software (http://www.plantbreeding.wur.nl/uk/ software-mapinspect.html) was used to generate chromosomal distribution images for these nsLTPs in G. raimondii, G. arboreum and G. hirsutum. Gene duplication events were investigated as described in previous reports 62 . Orthologous groups of the nsLTP family among the three Gossypium species were generated using the OrthoMCL database. Conserved syntenic blocks were inferred by running the OrthoClusterDB tool available at GDR (http:// genome.sfu.ca/cgi-bin/orthoclusterdb/runortho.cgi). The diagram was plotted using Circos 63 . Furthermore, the selection pressure for each orthologous gene pair was calculated by the Ka (non-synonymous substitution rate)/ Ks (synonymous substitution rate) ratio 64 . The formula "t = Ks/2r" was used to estimate the divergence time, and a neutral substitution rate (r) of 2.6 × 10 −9 was used in the current study.