From Gene to Transcript and Peptide: A Deep Overview on Non-Specific Lipid Transfer Proteins (nsLTPs)

Non-specific lipid transfer proteins (nsLTPs) stand out among plant-specific peptide superfamilies due to their multifaceted roles in plant molecular physiology and development, including their protective functions against pathogens. These antimicrobial agents have demonstrated remarkable efficacy against bacterial and fungal pathogens. The discovery of plant-originated, cysteine-rich antimicrobial peptides such as nsLTPs has paved the way for exploring the mentioned organisms as potential biofactories for synthesizing antimicrobial compounds. Recently, nsLTPs have been the focus of a plethora of research and reviews, providing a functional overview of their potential activity. The present work compiles relevant information on nsLTP omics and evolution, and it adds meta-analysis of nsLTPs, including: (1) genome-wide mining in 12 plant genomes not studied before; (2) latest common ancestor analysis (LCA) and expansion mechanisms; (3) structural proteomics, scrutinizing nsLTPs’ three-dimensional structure/physicochemical characteristics in the context of nsLTP classification; and (4) broad nsLTP spatiotemporal transcriptional analysis using soybean as a study case. Combining a critical review with original results, we aim to integrate high-quality information in a single source to clarify unexplored aspects of this important gene/peptide family.


Introduction
The non-specific lipid transfer proteins (nsLTPs) are a plant-specific superfamily of cysteine-rich AMPs (antimicrobial peptides). They received this name due to their ability to bind to several hydrophobic molecules, such as phospholipids and fatty acids, among others. nsLTPs are characterized by their reduced size (6.5-10.5 kDa) and the presence of eight cysteine residues (8CM domain), which form four disulfide bonds [1]. They are associated with various plant biological processes, such as growth and development, abiotic stress responses, besides plant defense [2][3][4][5][6][7][8]. The mentioned plant-originated AMPs have demonstrated remarkable efficacy against bacterial and fungal pathogens [9]. The discovery of cysteine-rich AMPs such as nsLTPs has paved the way for exploring plants as potential biofactories for synthesizing antimicrobial compounds, which holds significant promise for their application as biotherapeutic agents in the field of antimicrobial drug development [9].
Antibiotics 2023, 12, 939 5 of 25 In the barley (Hordeum vulgare) genome, 70 HvnsLTPs (Hordeum vulgare nsLTPs) were identified (Table 1), which were classified into five groups ('1', '2', 'C', 'D', and 'G') [27]. Each of these genes shared common structures. Considering their expansion mechanisms, the 70 HvnsLTP genes presented 15 tandem duplication repeats (encompassing 36 genes). The HvnsLTPs' baseline expression profiles in different tissues across developmental stages indicated that this group of genes might perform a variety of functions [27]. In addition, the differential expression profile indicated that HvnsLTP genes might have diverged in terms of the cis-regulatory elements of their promoters [27].
For the Solanaceae family, there are data on nsLTPs for potato (StnsLTPs, Solanum tuberosum nsLTPs). Li et al. [29] found 83 StnsLTP genes in potato genomes, categorized into eight types, namely, '1', '2', '4', '5', '7', '8', '12*', and '13*' (Table 1). Chromosome distribution and collinearity analyses suggested that the expansion of the StnsLTP gene family was enhanced by tandem duplications. In turn, Ka/Ks analysis showed that 47 pairs of duplicated genes have gone through purifying selection during evolution. StnsLTP genes were expressed mainly in younger tissues. Furthermore, StnsLTPs contained a large number of stress-responsive, cis-acting elements in their promoter regions. These results indicated that StnsLTPs might play significant and functionally varied roles in potato plants.
In Arachis duranensis, Song et al. [31] discovered 64 AdnsLTPs (Arachis duranensis nsLTPs) genes, which were divided into six groups ('1', '2', 'C', 'D', 'E', and 'G'; Table 1), anchored over nine chromosomes. Considering the AdnsLTPs' expansion mechanisms, the study revealed some gene clustering by tandem duplication, while other family members showed segmental duplication in several chromosomes. Following treatments with high salt (NaCl, 250 mM), PEG, low temperature (4 • C), and abscisic acid, the AdnsLTPs' expression levels were altered. Three AdnsLTPs were linked to nematode infection resistance. The DOF and WRI1 transcription factors were suggested as potential controllers of the AdnsLTP response to nematode infection.
Fang et al. [30] found 330 TansLTPs (Triticum aestivum nsLTPs) genes in wheat (T. aestivum) ( Table 1). Such a quantitative result can be considered an update of the 461 nsLTP loci found by Kouidri et al. [26] for the same species. To date, T. aestivum is the plant with the highest number of nsLTPs. The TansLTPs clustered into five groups ('1', '2', 'C', 'D', and 'G') by phenetic analysis (Table 1). Gene structure and MEME pattern analyses showed that different groups of nsLTPs had similar structural compositions. Chromosome anchoring revealed that all five groups were distributed on 21 chromosomes. Furthermore, 31 gene clusters were identified as tandem duplications, and 208 gene pairs were identified as segmental duplications. Data mining of RNA-seq libraries, covering multiple stress conditions, showed that the transcript levels of some of the nsLTP genes could be strongly up-regulated by drought and high salt (NaCl, 250 mM) stresses.
Vangelisti et al. [36], studying sunflower (Helianthus annuus) HansLTPs (Helianthus annuus nsLTPs), observed the existence of four ('1', '2', '3', and '4') groups ( Table 1). The authors did not explicitly classify the observed groups according to the available classification systems. The HansLTPs (101 in total) were further examined by looking into potential gene duplication sources, which revealed a high prevalence of tandem-in addition to whole-genome duplication (WGD) events. This finding is consistent with polyploidization events that occurred during the evolution of the sunflower genome. Three ('1', '3', and '4') of the four HansLTP groups responded uniquely to environmental cues, including auxin, abscisic acid, and the saline environment. Interestingly, sunflower seeds were the only source of expression for HansLTP group '2' genes.
In line with the reports mentioned above and other works in Table 1, it is observed that the nsLTP genes act multifunctionally and show genetic variability even within accessions of the same species. nsLTPs are present in a wide range of plants, showing gene expression in different tissues, developmental stages, and stressful conditions.

Filling the Gap: Discovering and Classifying nsLTPs in New Plant Genomes
To provide genomic information for nsLTPs in plants not yet studied in the previous topic, and to update nsLTP data for some species with improved genome versions that have been made available, the following plant genomes were scrutinized ( Table 2)  Three distinct and complementary strategies were used to retrieve nsLTPs in the genomes selected (information about the applied methodologies in Supplementary Material S1). The nsLTP exhaustive mining applied to 12 evaluated genomes (Table 2) returned 258 candidate sequences by BLASTp search, 344 by the cysteine pattern-based strategy (RegEx mining), and 1191 by the machine learning approach (HMMER tool).
The machine learning approach (HMMER tool) recovered a more comprehensive number of sequences considering nsLTP domains. Strategies based on machine learning are emerging as the future of DNA/RNA/protein sequence identification and bioinformatics, in general. The search with the RegEx approach was more restrictive, on the other hand, and not as accurate as the local alignment (BLASTp) method when observing the presence of the conserved nsLTP domain. However, it is worth mentioning that the BLASTp strategy did not recover sequences from the hypothetical proteomes of C. richardii, S. moellendorffii, and M. polymorpha (Table 2). When working with sequences as diverse as nsLTPs, it is advisable to combine several mining methods to increase the chances of finding the maximum number of sequences combined with subsequent data curation. In the present work, however, almost all of the sequences retrieved by the BLASTp and RegEx approaches were also retrieved by the machine-learning-based strategy .
From the analyzed species pool (Table 2), S. moellendorffii (21) and G. hirsutum (218) presented, respectively, the lowest and highest number of nsLTPs. Our search confirmed the tendency of these peptides to be encoded by large gene families: nine (75%) of the 12 ana-lyzed species had more than 50 nsLTP loci in their respective genomes (Table 2). There was no correlation (r = 0.10; methodology in Supplementary Material S1) between genome size and the number of nsLTPs in the analyzed species pool (Table 2). However, angiosperms and the analyzed gymnosperm have a higher amount of nsLTPs than pteridophytes and the analyzed bryophyte (Table 2). This fact may be associated with the evolution of these basal groups, as shown by Edstam et al. [1]. Pteridophytes and bryophytes are phylogenetically closer to green algae than the other clades analyzed. This scenario is possibly responsible for the nsLTPs' reduced number in the mentioned clades since no nsLTPs were identified in green algae [1].
The present work also updated the data for the nsLTP content in cotton (G. hirsutum), potato (S. tuberosum), common liverwort (M. polymorpha), and spikemoss (S moellendorffii) from the respective updated genome versions (Ghirsutum_527_v2.1, Stuberosum_686_v6.1, Mpolymorpha_320_v3.1, and Smoellendorffii_91_v1.0 | Phytozome database). While Li et al. [6] and Li et al. [29] identified, respectively, 91 and 83 nsLTP loci for the G. hirsutum and S. tuberosum (Table 1), 218 and 105 nsLTP loci were identified in our data mining approach ( Table 2). For M. polymorpha and S. moellendorffii, 21 and 36 nsLTPs were found, respectively (Table 2), in the present study (compared to 13 and 23 nsLTPs reported by Fonseca-García et al. [34]; (Table 1)). Genome assemblies are never perfect since they are models for the actual genome. It is hard to completely rule out all potential technological or algorithmic flaws, and no single assembly can accurately capture all the variety within populations of a species. Thus, published genomes that have an active research community are continuously improved. Such modified and updated versions are potential sources for changing minor paradigms.
Group separation, obtained from the implemented neighbor-joining (NJ) approach (methodology in Supplementary Material S1), reprised the nine groups in the Edstam et al. [1] classification. The strategy of performing NJ analysis from 8CM domain sequences (as performed by Edstam et al. [1] and Xue et al. [38]) promoted better group discrimination. The NJ tree derived from the complete nsLTP sequences (8CM + upstream and downstream regions) did not present such a level of resolution (only six groups were formed. As will be seen in the 'How structural nsLTP proteomics correlates with current nsLTP classification systems?' section, nsLTP sequences show high variability in the amino acid sequence. Outside the 8CM region, the mentioned variability is accentuated, which causes greater noise in the distance analysis, resulting in an efficiency reduction in the formation of 'true' groups. Although reduced, compared to other nsLTP regions, the variability of the 8CM region was also a significant factor in the analysis using the NJ method. This is evidenced by the low bootstrap values of the first branches formed (Supplementary Figure S2) and in nsLTP NJ analysis for a manifold of species (see works by Fang et al. [30] and Vangelisti et al. [36], among others, in Table 1). Bootstrap values reflect the proportion of trees/replicates in which a recovered grouping is presented (in other words, a measure of support for that group). Despite reduced bootstrap values, the obtained tree topology was in accordance with the composition of the characterized seed sequences (methodology in Supplementary Material S1) used to perform the nsLTP classification. As will be seen in the 'How structural nsLTP proteomics correlates with current nsLTP classification systems?' section, nsLTP sequences show high variability in the amino acid sequence. Outside the 8CM region, the mentioned variability is accentuated, which causes greater noise in the distance analysis, resulting in an efficiency reduction in the formation of 'true' groups. Although reduced, compared to other nsLTP regions, the variability of the 8CM region was also a significant factor in the analysis using the NJ method. This is evidenced by the low bootstrap values of the first branches formed (Supplementary Figure S2) and in nsLTP NJ analysis for a manifold of species (see works by Fang et al. [30] and Vangelisti et al. [36], among others, in Table 1). Bootstrap values reflect the proportion of trees/replicates in which a recovered grouping is presented (in other words, a measure of support for that group). Despite reduced bootstrap values, the obtained tree topology was in accordance with the composition of the characterized seed sequences (methodology in Supplementary Material S1) used to perform the nsLTP classification.
Regarding the nsLTPs' composition in the scrutinized species, L. sativa, despite not having the highest amount of nsLTPs in its genome (Tables 2 and 3), was the species Figure 1. Neighbor-joining tree (based on 8CM nsLTP domains) of 1191 non-specific lipid transfer proteins (nsLTPs) predicted in Marchantia polymorpha, Ceratopteris richardii, Selaginella moellendorffii, Thuja plicata, Glycine max, Gossypium hirsutum, Lactuca sativa, Manihot esculenta, Mimulus guttatus, Populus trichocarpa, Sinapis alba, Solanum tuberosum, and Spinacea oleracea genomes. All amino acid sequences were aligned using ClustalX2. The obtained result was visualized with the iTOL program. Legend: the numbers (1-4) inside the circles on the edge of the tree indicate different nsLTP subgroups of a given nsLTP group; at the rectangles, the nsLTP group classification is available.
Regarding the nsLTPs' composition in the scrutinized species, L. sativa, despite not having the highest amount of nsLTPs in its genome (Tables 2 and 3), was the species which presented the greatest variety of these peptides (Table 3), with at least one member of each of the nine groups found. M. polymorpha, in turn, had the lowest variety of nsLTPs (presenting members only for the 'LTP1', 'LTP2', 'LTPD', 'LTPG', 'Unknown 1', and 'Unknown 4' groups; Table 3), a fact that is possibly associated with its nsLTPs' small genomic amount. Our results for this species, however, indicated that it presented nsLTP groups (e.g., 'LTP1', 'LTP2', 'Unknown 1', and 'Unknown 4') not yet identified in previous studies (i.e., in Edstam et al. [1] and Fonseca-García et al. [34]).  In another context, considering the different groups, 'nsLTPG' was present in the 12 analyzed plant species, while the 'Unknown 4' group was specific to four species (C. richardii, L. sativa, M. polymorpha, and S. moellendorffii; Table 3; Supplementary Figure S2

The Landscape of the nsLTP Expansion in Plant Genomes
Gene duplication is a crucial evolutionary mechanism providing new genetic material to a genome. New proteins often arise from duplicated copies generating unprecedented capabilities by diversifying protein functionality (paralogy). We analyzed and categorized such mechanisms (methodology in Supplementary Material S1) for the 12 species presented in Table 1. In addition, soybean (G. max) was also analyzed in this context. Although already studied in terms of nsLTP mining and classification [34], this species does not present information about nsLTPs' expansion mechanisms.
Singleton nsLTPs were a minority in the analyzed genomes ( Figure 2), indicating a high duplication rate of loci associated with this gene family. In the present study, two main mechanisms with different performance levels were identified as responsible for nsLTP expansion ( Figure 2): dispersed and tandem duplications. Despite different duplication mechanisms, it is clear that similar evolutionary forces acted in the nsLTPs' expansion in these different genomes.
Dispersed duplicated genes are prevalent in different plant genomes [19]. They generate paralogs that are neither near each other on chromosomes nor show conserved synteny [39]. This is in line with the random nature of the distribution of the analyzed loci in their respective anchor species.
Regarding the second most active mechanism, gene tandem duplication has been commonly found to be important for plant adaptation to rapidly changing environments. In contrast, genes expanded by non-tandem mechanisms tend to have intracellular regulatory roles [40]. Thus, the set of duplication mechanisms observed (mostly tandem and dispersed) may be associated with the heterogeneous physiological role that nsLTPs play in plants. The mentioned peptide group plays roles ranging from plant defense (for a review, see Liu et al. [8]) to liquid secretion, cuticular wax accumulation, pollen, and seed development, among other functions [20]. Dispersed duplicated genes are prevalent in different plant genomes [19]. They generate paralogs that are neither near each other on chromosomes nor show conserved synteny [39]. This is in line with the random nature of the distribution of the analyzed loci in their respective anchor species.
Regarding the second most active mechanism, gene tandem duplication has been commonly found to be important for plant adaptation to rapidly changing environments. In contrast, genes expanded by non-tandem mechanisms tend to have intracellular regulatory roles [40]. Thus, the set of duplication mechanisms observed (mostly tandem and dispersed) may be associated with the heterogeneous physiological role that nsLTPs play in plants. The mentioned peptide group plays roles ranging from plant defense (for a review, see Liu et al. [8]) to liquid secretion, cuticular wax accumulation, pollen, and seed development, among other functions [20].
Considering the genesis of the distribution pattern, the dispersed one would be more expected in polyploids than in diploids. However, this assumption was not Considering the genesis of the distribution pattern, the dispersed one would be more expected in polyploids than in diploids. However, this assumption was not confirmed when comparing the two pteridophytes since there was a significant prevalence of dispersed distribution in S. moellendorffii (diploid with 2n = 16), while in the polyploid C. richardii (2n = 78), the two categories dispersed (44) and tandem (41) had similar prevalence ( Figure 2).
In the diploid gymnosperm T. plicata (2n = 22), the species with the largest genome among those analyzed (ca. 12.5 Gb), the tandem dispersal mechanism clearly prevailed ( Figure 2), indicating a possible importance of segmental duplications in the genesis of the high number of nsLTPs in this species.
Concerning the angiosperms (Figure 2), the species with the highest number of nsLTPs (218) was cotton (G. hirsutum, 2n = 52) with a prevalence of the dispersed mechanism, indicating the importance of WGD (i.e., polyploidy) in the expansion of nsLTPs. A similar result was observed in soybean (G. max), also polyploid (2n = 40). In contrast, this trend was not observed in potato (S. tuberosum), also considered polyploid (2n = 48), where a prevalence of the tandem (50 genes) mechanism was observed, followed by the dispersed mechanism (38). Considering the above, the mechanism of distribution and expansion of nsLTPs (and other gene families) needs to be carefully analyzed for each taxon and lineage without generalizations. This is especially true considering that some polyploids undergo diploidization processes with the retention of duplicate copies of some regions and the elimination of other ones (e.g., Renny-Byfield et al. [41]).
Future actions, such as the transcriptional expression study of tandem duplicated loci, will add more information about the evolutionary impact of identified duplication events. According to Liu et al. [8], the duplicated nsLTP genes might keep certain crucial activities in the course of subsequent evolution. The identical expression patterns could be explained by the duplicated genes' remarkably similar protein architecture. In turn, duplicated genes with a considerable variation in expression may result from duplication events with considerable alteration of gene regulation.

nsLTP Lowest Common Ancestor (LCA) Analysis
Understanding the evolution, diversity, and abundance of nsLTPs within plants is not a trivial task. On this point, evaluating available genomic data using bioinformatics approaches may help the nsLTP origin and diversification studies. Here, we use LCA analysis (methodology in Supplementary Material S1) for the first time with nsLTPs, generating an nsLTP phylogenetic tree according to a taxonomic rank obtained from the NCBI Taxonomy database (Figure 3) using the Edstam et al. [1] classification. The LCA approach corresponds to an evolutionary method based on orthologous groups, part of Ukkonen's Algorithm to construct suffix trees, aiming to determine the common ancestor between two or more distinct organisms based on a taxonomic tree [42,43]. The branching pattern in a phylogenetic tree (Figure 3) reflects how species or other taxonomic groups evolved from a series of common ancestors.
The nsLTP sequences used to generate the phylogenetic tree were grouped in different taxonomic categories, including Magnoliophyta (914 matches), Lycopodiopsida (five matches), and Bryopsida (one match) for the Viridiplantae clade ( Figure 3). The LCA analysis confirmed the presence of nsLTP classes in plants, with some of them (e.g., LTP1 and LTP2) widely distributed among higher plants (Figure 3), corroborating the works by Edstam et al. [1], Salminen et al. [20], and Missaoui et al. [44]. Additionally, no green algae were observed in our LCA tree, strengthening the hypothesis of nsLTP origin in land plants about 470 million years ago [1].
The scientific literature mentions that higher plants exhibit exclusive nsLTPs (with different cysteine patterns) when compared to non-vascular plants (such as the moss Physcomitrella patens). These nsLTPs show low sequence similarity to the nsLTP genes of non-vascular plants [1]. From this presumption, one question arises: why do higher plants present exclusive classes not present in primitive ones? Most of these nsLTP classes probably result from evolution during terrestrial environment occupation, likely linked to their role in stress adaptation and defense, as proposed by Edstam et al. [1] and Liu et al. [8].
The earliest nsLTP groups, found in all land plants, seem to be members of nsLTPd and nsLTPg, presenting probably a common origin. Furthermore, the LCA tree revealed that nsLTPs from some species tend to cluster together-based on their structural features-with members of other distinct taxa (for more information, see the 'Structural proteomics' section), especially when considering early diverging angiosperms [26,34,44]. This could indicate that duplication events in the nsLTP family occurred independently in the different plant groups, even at the intraspecific level, leading to multi-functionalization of nsLTPs. As reviewed by Amador et al. [7], this great diversity and apparent redundancy of nsLTPs in angiosperm genomes, and the neofunctionalization process, contribute to paralogs' emergence and their multiplicity of functions.  The earliest nsLTP groups, found in all land plants, seem to be members of nsLTPd and nsLTPg, presenting probably a common origin. Furthermore, the LCA tree revealed that nsLTPs from some species tend to cluster together-based on their structural features-with members of other distinct taxa (for more information, see the 'Structural proteomics' section), especially when considering early diverging angiosperms [26,34,44]. This could indicate that duplication events in the nsLTP family occurred independently in the different plant groups, even at the intraspecific level, leading to multi-functionalization of nsLTPs. As reviewed by Amador et al. [7], this great diversity and apparent redundancy of nsLTPs in angiosperm genomes, and the neofunctionalization process, contribute to paralogs' emergence and their multiplicity of functions. The molecular evolution of plant nsLTPs evidenced by LCA (Figure 3) showed that nsLTP diversification was mainly based on gene duplications, a known driving evolutionary force and an important factor in increasing genome complexity. Duplication events are probably the main mechanism of nsLTP evolution, including tandem and dispersed duplications (as discussed in the 'The landscape of the nsLTP expansion in plants' topic). In addition to the results presented here, the works by Liang et al. [35] and Yang et al. [45], among others, also corroborate this statement.
The presence/occurrence of horizontal gene transfer to/from bacteria should also be addressed (Figure 3). Supplementary Figure S1 and Figure 3 point to nsLTP diversification based on multiple duplication events in Viridiplantae, possibly associated with terrestrial environment occupation by land plants, and it was directly associated with leaf cuticle formation to avoid dehydration. Possible duplication events are indicated by black dots (Figure 3 and Supplementary Figure S1), pointing out shared nsLTP subgroups among related taxa. These duplication events are thought to have affected the nsLTP gene family, allowing both conservation and divergence of gene function [27,44]. Our LCA analysis, using plant nsLTPs as seeds in alignment against RefSeq-NCBI, also identified a few proteins from bacteria (Gammaproteobacteria (28 matches) and Bacilli (one match)) as well as Metazoan (Mammalia (one match); Figure 3) that contained the nsLTP motif and domain. The mammalian protein sequence identified (accession number XP_036992713, Supplementary Figure S1) regards the Jamaican fruit-eating bat (Artibeus jamaicensis), showing high similarity (87.78%) to a plant nsLTP1 amino acid sequence. Since no other nsLTP genes were identified for Metazoans, we believe this protein sequence may be a plant contamination of the genomic Metazoan sequencing data. We also obtained hits for two different species of bacteria [Acinetobacter baumannii (WP_143045904; Supplementary Figure S1) and Paenibacillus sp. (WP_083442884; Supplementary Figure S1)], presenting similarity to true non-plant nsLTP proteins and conserved domains related to nsLTPs from plants.
Here, we consider this discovery as a possible case of HGT (horizontal gene transfer) since HGT is a well-known and pervasive evolutionary mechanism in prokaryotes for different gene families that could improve the adaptive ability of prokaryotes in changing environments [29,46]. Anyway, contamination cannot be discarded in all situations. Therefore, the presence of such sequences homologous to plant nsLTPs demands further investigation and confirmation.
Our LCA results for nsLTPs do not indicate that the complete nsLTP domain is present in proteins from species outside the plant kingdom. It is clear that to elucidate all questions regarding this kind of 'evolutionary history noise' of the nsLTP gene family depends on a broader range of sequenced species deposited on databases that will contribute to future studies to elucidate nsLTP functions and relevant mechanisms associated with their evolutionary origin and diversification, even considering HGT events.
Besides studying the nsLTP structures, we aimed to analyze protein modeling data to test whether resemblances/differences in the three-dimensional conformation and physicochemical parameters (physicochemical variables in Supplementary Material S1) result in clusters similar to those obtained with the parameters adopted by Boutrot et al. [17] or Edstam et al. [1] in their classification systems.
In general, for both analyzed works, it was observed that the three-dimensional structures of the different nsLTP types/groups were similar ( Figure 4A; Supplementary Figure S3A). Exceptions, however, were found for the theoretical models of type '3' nsLTPs (Boutrot et al. [17] classification; Figure 4A) and the nsLTPc group (Edstam et al. [1] classification), which presented a different structural configuration, since that they have only two disulfide bonds ('Cys1-Cys5', 'Cys2-Cys3'). These structures are mainly composed of α-helices, common in nsLTPs, which can vary from three to four and are connected by small loops and an unstructured C-terminal tail ( Figure 4A; Supplementary Figure S3A).  Table S1; Supplementary Table S2). Such heterogeneity is commonly observed in nsLTPs (and other cysteine-rich peptides). It may be related to the fact that they are involved in a multitude of plant molecular mechanisms, including defense against pathogens [50].
Given the diversity in the nsLTPs' amino acids' linear composition, there is a need to understand the peptide 'sequence-structure-classification' relationship. Such information will enable the understanding of how the obtained three-dimensional structures and their physicochemical characteristics relate to the classification systems [1,17] scrutinized here. In general, such a cavity is a conserved feature in nsLTPs [7]. It is noteworthy that the association between hydrophobic molecules and hydrophobic pockets occurs with different affinities, depending on the molecule size and structure. However, this association does not cause any change in the protein's three-dimensional conformation [49].
Despite the structural conservation observed in the generated models ( Figure 4A; Supplementary Figure S3A), the nsLTPs analyzed by Boutrot et al. [17] and Edstam et al. [1] presented a high diversity of amino acid residues among the conserved cysteine sites (Supplementary Tables S1 and S2). Such heterogeneity is commonly observed in nsLTPs (and other cysteine-rich peptides). It may be related to the fact that they are involved in a multitude of plant molecular mechanisms, including defense against pathogens [50].
Given the diversity in the nsLTPs' amino acids' linear composition, there is a need to understand the peptide 'sequence-structure-classification' relationship. Such information will enable the understanding of how the obtained three-dimensional structures and their physicochemical characteristics relate to the classification systems [1,17] scrutinized here.
Considering the above, when analyzing the RMSD distance, i.e., the measure of the average distance between the atoms (usually the backbone atoms) of superimposed protein models (RMSD values in Figure 4A, Supplementary Figure S3A, and Supplementary Tables S3 and S4), it was possible to observe that most representative present similar values. The exceptions, however, were type '3' nsLTPs (Boutrot et al. [17] classification) and the 'nsLTPc' group (Edstam et al. [1] classification). As mentioned earlier, these nsLTP classes presented a different three-dimensional structure than their peers. The almost generalized similarity was also found in the structural PCA performed with the models. The results presented in Figure 5A,B are inconsistent with the Boutrot et al. [17] and Edstam et al. [1] classification systems since they showed a generalized similar three-dimensional conformation. The explanation for the clustering pattern obtained in Figure 5A,B is related to the conservation of the structure to the detriment of the amino acids' diversity among the cysteine residues anchored in the different nsLTP types/groups. The three-dimensional structural conservation makes it challenging to differentiate the nsLTPs into distinct and isolated groups. However, it is worth noting that there is some incipient group formation, such as type '3', type '5' (except for 5.1, which has five amino acid residues less in the C-terminal tail compared to the others of type '5'), and type '6' ( Figure 5A). The mentioned types slightly stand out from the other structures in the principal components (PCs) area ( Figure 5A). In the structural PCA considering the sequences classified by Edstam et al. [1], however, such a small separation does not occur ( Figure 5B).
In another context, when analyzing the physicochemical properties with PCA ( Figure 6A,B), the absence of individualized groups was also observed. As mentioned before, the nsLTP sequences are very variable in terms of linear amino acid composition. Such diversity also results in variant physicochemical properties. In the sequence similarity index (Supplementary Table S1), groups '4', '5', '8', '9', and 'Y' (Boutrot et al. [17] classification) presented higher indices of intragroup similarity. Considering the classification by Edstam et al. [1], there was no group formation based on the similarity index (Supplementary Table S3).
Despite the almost general conservation of tertiary structure ( Figure 4A; Supplementary Figure S3A), the surface charge qualitatively varied (positive or negative, but with a preponderance of positive charges in the studied sets) among the nsLTP types/groups (Figure 7 and Supplementary Figure S4). In addition, the number of hydrophobic, acidic, basic, and neutral amino acid residues also was diverse. Such observations may be associated with the different nsLTP molecular roles.
Considering both PCA analyses (based on (i) physicochemical features of aa residues or (ii) the peptide tertiary structure), no correlation could be made with the classification systems proposed by Boutrot et al. [17] and Edstam et al. [1]. Even intragroup, there was significant variation in the physicochemical characteristics of the amino acid residues of nsLTPs. In turn, considerable conservation of the tertiary structure was observed (except for type 3, classification by Boutrot et al. [17], or type C, classification by Edstam et al. [1]). Even being not yet ideal, these classifications are less artificial than the initially proposed grouping [11,12] into 'nsLTP1' and 'nsLTP2' subfamilies-based mainly on molecular mass and sequence identity.
In summary, current classifications target mainly the linear structures of nsLTPs, without major implications considering their specific functions. A future association of their roles, in terms of functional omics, with the physical-chemical and tertiary characteristics uncovered here may help elaborate a less artificial classification system reflecting the function of nsLTPs. Antibiotics 2023, 12, x FOR PEER REVIEW 18 of 28  before, the nsLTP sequences are very variable in terms of linear amino acid composition. Such diversity also results in variant physicochemical properties. In the sequence similarity index (Supplementary Table S1), groups '4', '5', '8', '9', and 'Y' (Boutrot et al. [17] classification) presented higher indices of intragroup similarity. Considering the classification by Edstam et al. [1], there was no group formation based on the similarity index (Supplementary Table S3).  Despite the almost general conservation of tertiary structure ( Figure 4A; Supplementary Figure S3A), the surface charge qualitatively varied (positive or negative, but with a preponderance of positive charges in the studied sets) among the nsLTP types/groups (Figure 7 and Supplementary Figure S4). In addition, the number of hydrophobic, acidic, basic, and neutral amino acid residues also was diverse. Such observations may be associated with the different nsLTP molecular roles. Considering both PCA analyses (based on (i) physicochemical features of aa residues or (ii) the peptide tertiary structure), no correlation could be made with the classification systems proposed by Boutrot et al. [17] and Edstam et al. [1]. Even intragroup, there was significant variation in the physicochemical characteristics of the amino acid residues of nsLTPs. In turn, considerable conservation of the tertiary structure was observed (except for type 3, classification by Boutrot et al. [17], or type C, classification by Edstam et al. [1]). Even being not yet ideal, these classifications are less artificial than the initially proposed grouping [11,12] into 'nsLTP1′ and 'nsLTP2′ subfamilies-based mainly on molecular mass and sequence identity.
In summary, current classifications target mainly the linear structures of nsLTPs, without major implications considering their specific functions. A future association of their roles, in terms of functional omics, with the physical-chemical and tertiary characteristics uncovered here may help elaborate a less artificial classification system reflecting the function of nsLTPs.

Expression in Soybean Plants: From Baseline to Differential Expression
Baseline gene expression analysis in 14 tissues/developmental stages of soybean (methodology in Supplementary Material S1) revealed that GmnsTLPs (Glycine max nsLTPs) were expressed on all analyzed substrates (Figure 8). Eighty-three (~70% ;  Supplementary Table S5) of the 120 GmnsLTP loci (Supplementary Table S6) were expressed in at least one of the analyzed substrates. About 15% (13 GmnsLTP loci) were expressed in only one or two different tissues/developmental stages, while

nsLTP Transcriptional Expression: Soybean as a Case Study Expression in Soybean Plants: From Baseline to Differential Expression
Baseline gene expression analysis in 14 tissues/developmental stages of soybean (methodology in Supplementary Material S1) revealed that GmnsTLPs (Glycine max nsLTPs) were expressed on all analyzed substrates (Figure 8). Eighty-three (~70% ; Supplementary Table S5) of the 120 GmnsLTP loci (Supplementary Table S6) were expressed in at least one of the analyzed substrates. About 15% (13 GmnsLTP loci) were expressed in only one or two different tissues/developmental stages, while approximately 44% (37 GmnsLTP loci) were expressed in at least 10 different tissues or developmental stages. Eighteen percent (15 GmnsLTP loci) were expressed on all analyzed treatments. GmnsLTPs were expressed mainly in flower (59 GmnsLTP loci), followed by 'one centimeter pod' (57) and root (57) (Figure 8). The lowest representation of expressed GmnsLTP loci occurred in 'seed 42DAF' (Figure 8).
Regarding the group with the highest number of expressed loci, GmnsLTPg stood out in 13 of the 14 tissues/stages studied (Figure 8), followed by the GmLTP1 group in seven, and by the GmnsLTPd group, in six tissues/stages. Interestingly, none of the analyzed tissues/stages presented expression of GmnsLTPc or GmnsLTPe representatives. The explanation for their absence remains an open question. In another context, there was no direct relationship between tissues/developmental stages and expression level of the different tested loci; it was observed, however, that there were loci with high levels of expression (RPKM counts > 100) in several tissues (Supplementary Table S5). These may become future targets for gene manipulation, with consequent functional characterization of GmnsLTPs.
Our analysis showed that nsLTP1, nsLTPg, nsLTPd, and nsLTP2 groups are expressed in all studied situations. At the loci level, however, promiscuous expression of certain GmnsLTPs (Supplementary Table S5) was found, while others were explicitly expressed only in a given tissue/stage of development. Regarding the first-mentioned expression group, such ubiquity in soybean tissues may be associated with the defense role played by these proteins. Some nsLTPs have been classified as belonging to the pathogen-related (PR) protein group, specifically, the PR14 family [51]. A broad line of evidence indicates an nsLTP role in plant defense (for review, see Liu et al. [8]). Notably, the highest amount of expressed GmnsLTP loci was found in floral, root, and young fruit ('one centimeter pod') tissues. Flowers have been implicated as hubs of disease transmission [52], while root tissue is a site commonly inhabited by microorganisms (whether pathogens [53] or symbionts [54]). In turn, soybean pods are also substrates affected by important diseases (as in soybean pod blight [55]).
transmission [52], while root tissue is a site commonly inhabited by microorganisms (whether pathogens [53] or symbionts [54]). In turn, soybean pods are also substrates affected by important diseases (as in soybean pod blight [55]).
Regarding the GmnsLTP loci with specific expression, in addition to being able to perform defense functions, such proteins may be associated with less generalist functions, such as cuticular wax accumulation, liquid secretion, pollen and seed development, and seed germination [20]. As mentioned, nsLTPs are considered critical proteins for the plant's survival and land colonization. Regarding the GmnsLTP loci with specific expression, in addition to being able to perform defense functions, such proteins may be associated with less generalist functions, such as cuticular wax accumulation, liquid secretion, pollen and seed development, and seed germination [20]. As mentioned, nsLTPs are considered critical proteins for the plant's survival and land colonization.
Regarding the differential expression approach, four assays were analyzed (methodology in Supplementary Material S1) covering 20 different treatments (Figure 9; Supplementary Table S7). The expression of 80 distinct GmnsLTP loci were detected (Figure 9; Supplementary Table S7).
nsLTPs are often associated with defense against pathogens (e.g., Xue et al. [38]). Interestingly, there was no detection of differential expression (−1 > Log 2 FC > 1 and FDR < 0.05) after inoculation with Fusarium oxysporum (pathogenic and non-pathogenic isolates) in soybean root tissue (up to 96 h post-inoculation). No differential expression was observed in response to inoculation with MAMPs (flg22 and chitin) in two parental soybean lines and two progeny lines (from 1 to 8; Figure 9). Detailed analysis of these assays showed that some GmnsLTPs exhibited transcriptional modulation (Log 2 FC) with associated FDR < 0.05. However, the mentioned Log 2 FCs did not show values above the cut-off (−1 > Log 2 FC > 1) limits. Thus, a possible reason for the absence of up-regulation for assays involving biotic conditions in soybean may be associated with a methodological bias of the differential expression. The mining of transcripts encoding nsLTP proteins in assays under abiotic conditions, however, revealed the up-regulation of several GmnsTLPs, namely: 3, for the treatment 'dehydration stress' at '12 h' vs. 'control' (number 10; Figure 9); 15 for the treatment 'salt stress' at '12 h' vs. 'control' (number 13; Figure 9); 8 for the treatment 'salt stress' at '6 h' vs. 'control' (number 14; Figure 9); and 2, for the treatment 'water deficit, 30% soil field capacity' vs. 'control, 70% soil field capacity' at 'Zeitgeber time 0' (number 15; Figure 9). The mentioned amount was associated with 23 non-redundant loci (indicated by red spheres in Figure 9).
The studied data demonstrated an important association between GmnsLTPs and salt stress response, highlighting the GmnsTLPd and GmnsTLPg groups. These presented, respectively, eight and five up-regulated loci for the treatment of 'salt stress' at '12 h' vs. 'control' (number 13; Figure 9), as well as four and three up-regulated loci for the treatment 'salt stress' at '6 h' vs. 'control' (number 14; Figure 9). The nsTLPs' participation in the response to saline conditions has been reported in the scientific literature. Akhiyarova et al. [56] assumed that these proteins could participate in salt-induced pea root suberization or transport of phloem lipid molecules. In turn, Xu et al. [57] observed that nsLTP enhances salt and drought tolerance in N. tabacum. Such information adds importance to this peptide group, reinforcing its multifunctional role in plants.

Conclusions, Perspectives, and Open Questions
There is growing interest in nsLTPs due to their multifunctional roles in plant molecular physiology, which has resulted in some comprehensive reviews regarding their structure and function. Due to this, the present work has proposed a different approach. In addition, to recover published information about the omics of the mentioned gene family, we focused on investigating topics such as evolution and structure using strategies that are still little applied for nsLTPs (as is the case with LCA, expansion mechanisms, and high-throughput protein modeling).
Our investigation corroborates previous works confirming the absence of the nsLTPs and their protein domain in species outside the plant kingdom. Sequences from other groups in databanks with similarity to nsLTPs were considered contamination artifacts (as in the case of the bat Artibeus jamaicensis). Concerning the sequences found in bacteria (A. baumannii and Paenibacillus sp.), they can also be contaminants, or may result from horizontal gene transfer, which needs experimental confirmation.
Tandem and dispersed duplication events figure as the main nsLTP evolution mechanisms in plants. Interestingly the number of nsLTP representatives may vary when comparing plants with similar complexity, but they do not ever correlate with the genome size or the ploidy level.
The gene expression analysis in soybean revealed a complex transcriptional regulation of these peptides, which can be transcribed in a specific or a promiscuous way (considering tissues or developmental stages). Additionally, we emphasized nsLTPs' up-regulation in different tissues under abiotic stresses (such as high salt and drought).
An insight regarding the 'sequence-structure-classification' relationship, in turn, revealed a high conservation of the nsLTPs' three-dimensional structure, in contrast with a significant variation in its physicochemical characteristics (even intragroup). It is clear that the initial classification system (dividing them into nsLTP1 and 2) is artificial. Physicochemical variables should be associated with phenetics and functional features to generate a functional classification system for this peptide group.
Several challenges need to be overcome before we can completely comprehend the roles that nsLTPs play in plants, despite all the efforts made to uncover the diverse spectrum of functions of these molecules. We still cannot explain why nsLTPs exert diverse activities in plants despite sharing the considerable number of structural similarities uncovered here. Research into nsLTP function and evolution shall be accelerated by systems biology studies, with further downstream biochemical analysis and transgenic (or gene edition) research, which will also clarify the precise functions in planta of these peptides. The informational body available here enriches our comprehension of this heterogenic, plantspecific peptide family.
Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/antibiotics12050939/s1, Figure S1: Radial presentation of lowest common ancestor (LCA) dendrogram, representing relationships from 950 different RefSeq-NCBI nsLTP sequences. Black dots represent points of nsLTP gene duplication. Legend: n(number): LCA levels; * LCA levels not found in the obtained tree; black dots indicate possible duplication events; Figure S2: Neighbor-joining (NJ) tree (based in 8CM nsLTP domains), showing bootstrap values and sequences identification of 1191 non-specific lipid transfer proteins (nsLTPs) predicted in Marchantia polymorpha, Ceratopteris richardii, Selaginella moellendorffii, Thuja plicata, Glycine max, Gossypium hirsutum, Lactuca sativa, Manihot esculenta, Mimulus guttatus, Populus trichocarpa, Sinapis alba, Solanum tuberosum, and Spinacea oleracea genomes. All amino acid sequences were aligned using ClustalX2, and the NJ tree was constructed using the bootstrap neighbor-joining method (1000 replicates). The obtained result was visualized with the iTOL program; Figure S3: Threedimensional reference structures for different types of nLTPs classified by Edstam et al. [1] and visualized in Pymol. A. Theoretical models' alignment of different nsLTP types, solved by Alpha-Fold2, presenting their respective RMSD (root-mean-square deviation of atomic positions) values. For each presented structure, black arrows indicate the location of the N-and C-terminal sites and α-helices. B. Hydrophobic clusters, analyzed by ProteinTools, relative to the theoretical threedimensional reference models of the different nsLTP groups classified by Edstam et al. [1]. The colored spheres represent different clusters formed from interactions between nearby amino acid residues; Figure S4: Adaptive Poisson-Boltzmann solver (APBS)-generated electrostatic surface potential plotted on the solvent accessible surface for nsLTP theoretical three-dimensional models (sequences from Edstam et al. [1]); Supplementary Material S1: Methodology used for nsLTP manuscripts data mining and carrying out the original analyses of the present work; Supplementary Material S2: 8CM domain sequences of the different used species (12 different species + Medicago truncatula, Arabidopsis thaliana, and Oryza sativa categorized nsLTP seed-sequences) to obtain the neighbor-joining tree; Supplementary Material S3: nsLTP protein sequences from Boutrot et al. [17]; Supplementary Material S4: nsLTP protein sequences from Edstam et al. [1]; Table S1: Sequence similarity of nsLTP sequences from Boutrot et al. [17]. Highlighted in red: similarity below 50% between the correlated sequences. Highlighted in green: similarity above 50% between the correlated sequences; Table S2: Sequence similarity of nsLTP sequences from Edstam et al. [1]. Highlighted in red: similarity below 50% between the correlated sequences. Highlighted in green: similarity above 50% between the correlated sequences; Table S3: Root-mean-square deviation (RMSD) between nsLTP sequences from Boutrot et al. [17]. In red: structures presenting greater distances compared to the others; Table S4: Root-mean-square deviation (RMSD) between nsLTP sequences from Edstam et al. [1]. In red: Structures presenting greater distances from the others; Table S5: nsLTP baseline expression analysis in different soybean tissues, presenting the respective expressed loci ID (for Glyma 1.1 and Glyma 2.0 assemblies), nsLTP name, nsLTP group (Edstam et al. [1] classification), and RPKM-normalized log2-transformed counts; Table S6: Information about soybean nsLTPs, presenting the loci ID and classification by Edstam et al. [1]; Table S7: