Allele mining, amplicon sequencing and computational prediction of Solanum melongena L. FT/TFL1 gene homologs uncover putative variants associated with seed dormancy and germination

The FT/TFL1 gene homolog family plays a crucial role in the regulation of floral induction, seed dormancy and germination in angiosperms. Despite its importance, the FT/TFL1 gene homologs in eggplant (Solanum melongena L.) have not been characterized to date. In this study, we performed a genome-wide identification of FT/TFL1 genes in eggplant using in silico genome mining. The presence of these genes was validated in four economically important eggplant cultivars (Surya, EP-47 Annamalai, Pant Samrat and Arka Nidhi) through PacBio RS II amplicon sequencing. Our results revealed the presence of 12 FT/TFL1 gene homologs in eggplant, with evidence of diversification among FT-like genes suggesting possible adaptation to various environmental stimuli. The amplicon sequencing also revealed the presence of two alleles for certain genes (SmCEN-1, SmCEN-2, SmMFT-1 and SmMFT-2), of which SmMFT-2 was associated with seed dormancy and germination. This association was further supported by the observation that seed dormancy is rarely reported in domesticated eggplant cultivars but is commonly observed in wild species. A survey of the genetic regions in domesticated cultivars and a related wild species, S. incanum, showed that the alternative allele of S. incanum was present in some members of the Pant Samrat cultivar but was absent in most other cultivars. This difference could contribute to the differences in seed traits between wild and domesticated eggplants.


Introduction
Flowering Locus T (FT)/Terminal Flower 1 (TFL1) gene homologs are important regulators of flowering time, a fundamental process in angiosperms that involves a morphologically complex shift from vegetative to reproductive development [1]. This process has a direct impact on crop yield [2] and has been the focus of much research, particularly on genetic modification of flowering time and flowering responses for improved crop productivity [3]. Gene duplications in the FT/TFL1 genes have resulted in multiple paralogs with diversified functions rarely reported. Plants utilize these paralogs as an intrinsic strategy to refine floral responses to various environmental and endogenous signals [4]. Acquiring an in-depth understanding of the FT/TFL1 gene family is crucial in expediting the development of new cultivars that possess desirable characteristics, such as improved flowering time and enhanced productivity [4,5]. Optimal timing of flowering is crucial, especially as the seasons progress, to ensure that seeds are set under suitable conditions and to maximize their chances of survival [6]. Multiple endogenous and exogenous signals are integrated into various pathways to regulate flowering time. For example, in the model plant Arabidopsis thaliana, floral initiation is modulated via a number of pathways, including the photoperiod, ambient temperature, age, vernalization, autonomous and hormonal pathways [7]. These pathways converge on key integrators such as the mobile florigen Flowering Locus T (FT) [8].
FT is a member of the Phosphatidylethanolamine Binding Protein (PEBP) gene superfamily, which is highly conserved from bacteria and yeast to plants and mammals and has diversified functions. For instance, PEBP proteins in animals participate in the control of cell growth and differentiation. In plants, they act as key players in the floral transition along with other developmental processes. In angiosperms, PEBP genes are grouped into three main clades: Flowering Locus T (FT), Terminal Flower 1 (TFL1) and Mother of FT and TFL1 (MFT). Based on the discovery of MFT-like genes in both basal and land plants, it is postulated that they are the evolutionary ancestor of both FT-like and TFL1-like genes, as the latter two clades are only observed in gymnosperms and angiosperms [9]. In Arabidopsis, six members representing these three clades have been identified: FT and Twin Sister of FT (TSF) are FT-like genes; TFL1, Arabidopsis Centroradialis Homologue (ATC) and Brother of FT and TFL1 (BFT) are TFL1-like genes; and MFT is placed in the MFT-like clade [10]. Although these genes share high sequence similarity, they have diverged enough to play antagonistic roles as either floral promoters or repressors. FT and TFL1 proteins are small and mobile and are involved in transcriptional regulation but do not possess a DNA-binding domain [11]. FT interacts with the bZIP transcription factor FD through 14-3-3 proteins and thus promotes floral initiation by activating floral meristem identity genes such as Apetala 1 (AP1) and Suppressor of Overexpression of Constans 1 (SOC1) in the shoot apical meristem (SAM) [12].
On the other hand, the FT paralog, TSF, promotes flowering redundantly with FT but shows distinct floral activation under short-day conditions [13]. MFT also acts redundantly with FT in flowering time regulation: overexpression of the gene results in slightly early flowering, while the loss-of-function mutation was aphenotypic [14]. The characterization of MFT homologs in several plants has revealed different roles in flowering time regulation. For instance, MFT homologs have been reported to have no effect on the flowering transition in species such as Populus nigra [15], Glycine max [16], Citrus latifolia [17] and Picea abies [18], whereas MFT in Dendrobium nobile [19] and Hevea brasiliensis [20] delayed flowering time. MFT homologs also play a critical role in seed dormancy and germination. In Arabidopsis, MFT negatively regulates germination under far-red light conditions while strongly promoting seed dormancy [21]. Similarly, in Triticum aestivum, MFT functions as a negative regulator of seed germination and a positive regulator of dormancy [22]. In contrast to FT-like genes, TFL1 in Arabidopsis maintains the indeterminate plant architecture and delays the flowering transition. ATC, a TFL1 paralog, shows functional redundancy with TFL1 and acts as a floral inhibitor under short-day conditions [23]. Furthermore, BFT is suggested to mimic TFL1-like activity; it functions redundantly with TFL1 in inflorescence meristem development and inhibits the floral transition in high-salinity environments [24].
Besides regulating flowering, FT/TFL1 gene families participate in various indispensable crop developmental events. Recent reports show that FT-like proteins are involved in tuberization in potato [25], cessation of meristem growth in tomato [26], stomatal control in Arabidopsis [27], bulb formation in onion [28] and plant architecture in maize [29], among others. FT paralogs within a species can respond differently to environmental and endogenous cues. For example, in rice, Heading Date 3A (HD3A), an FT ortholog, triggers flowering under short-day (SD) conditions, whereas Rice Flowering Locus T 1 (RFT1) functions as a floral promoter under long-day (LD) as well as SD conditions [10]. Beyond gene discovery, genetic variation in these gene families has been shown to accelerate innovation in the traits governed by these genes. In tomato, combinations of allelic variations in FT and TFL1 genes have been exploited to optimize flowering signals and thus increase crop productivity [2]. A similar approach could be applied to eggplant. However, eggplant FT/TFL1 gene homologs have not yet been identified or characterized.
Cultivated eggplant ranks as the third most important crop species in Solanaceae, following tomato and potato [30], with a global production of around 58.6 million tons in 2021 [31]. Eggplant supplies various nutrients to the human diet, such as fibers, proteins, vitamins, minerals, phenylpropanoid compounds and antioxidants [32]. Eggplant is a photoperiod-insensitive plant [33]. The generation of advanced germplasm with improved yield is one of the major breeding objectives in eggplant [34]. With this goal in mind, we performed an extensive in silico mining of the genes from multiple eggplant genomes and extended the search for allelic variations to four commercially important cultivars using PacBio long-read amplicon sequencing. Here, we characterise the FT/TFL1 gene homologs in eggplant and provide new insights into their functions and potential applications in eggplant breeding.

In silico mining of FT/TFL1 gene homologs from eggplant genome assemblies
To identify homologs of the FT/TFL1 gene, we conducted a BLAST survey using FT/TFL1 coding sequences from various plant species as queries against three publicly available eggplant genome assemblies. The collections of coding sequences from the 'Nakate-Shinkuro' cultivar [35], eggplant line '67/3' [36], and cultivar HQ-1315 [37] were used and referred to as Sme_r2.5.1, S. melongena-67/3, and S. melongena-HQ, respectively. A consensus sequence with 100% identity was generated by comparing the gene sets extracted from the three genomes. The resultant nucleotide sequences were converted into protein sequences using the Fgenesh gene prediction tool (http://www.softberry.com/) and further annotated using a BLASTp analysis. The gene structure predictions from Sme_r2.5.1 and S. melongena-HQ were utilized, and the coding sequences were compared with the corresponding genomic sequences from the parental scaffolds of the genome assemblies to validate exon/intron boundaries. This process was also complemented by manual curation.
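The cross-assembly comparison described above can be illustrated with a minimal sketch. It is not the pipeline used in this study: the function name `consensus_cds`, the dictionary layout and the toy sequences are assumptions made for the example, and only the three assembly labels are taken from the text. The sketch keeps a coding sequence when it is shared at 100% identity by at least two assemblies.

```python
from collections import Counter

def consensus_cds(cds_by_assembly, min_support=2):
    """Return the CDS shared at 100% identity by at least `min_support` assemblies.

    cds_by_assembly maps an assembly label to the CDS string mined from it.
    Returns (sequence, supporting_assemblies), or (None, []) when no sequence
    reaches the required support.
    """
    # Normalise case so that identical sequences compare equal.
    normalised = {name: seq.upper() for name, seq in cds_by_assembly.items() if seq}
    counts = Counter(normalised.values())
    if not counts:
        return None, []
    sequence, support = counts.most_common(1)[0]
    if support < min_support:
        return None, []
    supporters = sorted(name for name, seq in normalised.items() if seq == sequence)
    return sequence, supporters

# Toy sequences (hypothetical, not real eggplant CDSs).
example = {
    "Sme_r2.5.1": "ATGCCTAGAGAATAG",
    "S. melongena-67/3": "ATGCCTAGAGAATAG",
    "S. melongena-HQ": "ATGCCTAGGGAATAG",  # differs from the others at one base
}
seq, supporters = consensus_cds(example)
print(seq, supporters)
```

A sequence that fails the support threshold is returned as `None`, which flags the gene for manual curation rather than silently choosing one assembly.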

Sequence alignment and phylogenetic analysis
The amino acid sequences of FT/TFL1 proteins from various plant species were downloaded from the NCBI non-redundant database (https://www.ncbi.nlm.nih.gov). Multiple sequence alignment was carried out with ClustalW using default parameters. A neighbour-joining phylogenetic tree was constructed with Molecular Evolutionary Genetics Analysis (MEGA) software version 10.2.6 [38], using the Poisson model with gamma-distributed rates. Nodal reliability in the phylogenetic tree was evaluated with 10,000 bootstrap replicates.
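For readers unfamiliar with the distance measure behind such a tree, the following sketch computes the pairwise distances that a neighbour-joining analysis takes as input. It uses the standard Poisson correction, d = -ln(1 - p), and its gamma variant, d = a[(1 - p)^(-1/a) - 1], where p is the proportion of differing residues and a is the gamma shape parameter. The shape parameter is not reported in the text, so any value passed to `gamma_shape` is an assumption, as are the function names, the toy sequences and the pairwise treatment of gaps.

```python
import math

def p_distance(seq_a, seq_b):
    """Proportion of differing residues, skipping columns with a gap in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # pairwise deletion of gapped columns (an assumption)
        compared += 1
        differing += a != b
    if compared == 0:
        raise ValueError("no comparable columns")
    return differing / compared

def poisson_distance(p, gamma_shape=None):
    """Poisson-corrected distance; with gamma_shape set, the gamma-distance variant."""
    if not 0 <= p < 1:
        raise ValueError("p must lie in [0, 1)")
    if gamma_shape is None:
        return -math.log(1.0 - p)
    return gamma_shape * ((1.0 - p) ** (-1.0 / gamma_shape) - 1.0)

# Toy aligned peptides (hypothetical).
p = p_distance("MSRD-PLV", "MSKDAPLI")
print(round(p, 4), round(poisson_distance(p), 4))
```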

Functional domain and promoter analysis
To investigate the evolutionary relationships among FT homologs from different plant species, an alignment of the amino acid sequences was performed with a focus on conserved regions at exon II (position 85) and segment B of exon IV (positions 128-141). We included sequences from eggplant, Arabidopsis, onion, sugar beet, longan, soybean, sunflower, tobacco, sugarcane, Norway spruce and tomato. Variations in the critical motifs of the eggplant FT homologs were analysed via sequence comparisons. In addition, the ~8 kb regions upstream of the start codon (the translational initiation site), which could potentially cover the promoter regions of the FT paralogs, were analysed for the presence of transposon fragments via an NCBI BLASTn survey. The protein sequences for the FT/TFL1 gene family were sourced from The Arabidopsis Information Resource (TAIR), GenBank and Phytozome as described previously [1]. The sequences were manually verified to ensure accuracy.

Plant materials
The eggplant cultivars Surya, EP-47 Annamalai, Pant Samrat and Arka Nidhi were procured from the World Vegetable Center (AVRDC). The corresponding AVRDC accessions for the cultivars were VI045276, VI047336, VI045550 and VI045274, respectively. The freshly obtained seeds were cultivated in the glasshouse at the Biotechnology Research Institute, Universiti Malaysia Sabah, to compare variations in FT/TFL1 homologs among the cultivars. Three biological replicates representing each cultivar were used for the downstream applications.

Amplicon generation, SMRT library preparation, and PacBio sequencing
The genomic DNA of the eggplant cultivars was extracted from leaf samples using a modified CTAB-based method [39]. Six regions, SmTFL1, SmCEN-1, SmCEN-2, SmCEN-4, SmMFT-1 and SmMFT-2, were amplified in three biological replicates for each cultivar. An asymmetric barcode system (i.e. a different combination of barcodes attached to the forward and reverse primers) was utilized to assign unique barcode combinations to the homologs in each plant. The barcodes were introduced to the amplicons through a two-step PCR protocol according to the Barcoded Universal Primer workflow (https://www.pacb.com). In the first step, PCR1, primers carrying universal and gene-specific sequences were used with genomic DNA as the template. The second step, PCR2, introduced barcodes to the amplicons by performing PCR on the amplicon template generated in PCR1. A list of the primers used is available in S1-S3 Tables. Note that this multiplexed amplicon sequencing run also included homologs from mutant populations of the aforementioned cultivars.
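The logic of the asymmetric barcode design can be sketched as follows. The cultivar and gene names come from the text, but the barcode labels, the number of barcodes and the function `assign_barcode_pairs` are hypothetical, and the sketch covers only the 72 amplicons from the four cultivars (4 cultivars × 3 replicates × 6 genes), not the mutant populations sequenced in the same run.

```python
from itertools import product

cultivars = ["Surya", "EP-47 Annamalai", "Pant Samrat", "Arka Nidhi"]
genes = ["SmTFL1", "SmCEN-1", "SmCEN-2", "SmCEN-4", "SmMFT-1", "SmMFT-2"]
replicates = [1, 2, 3]

# Hypothetical barcode labels; the real barcode names and counts are not given in the text.
forward_barcodes = [f"F{i:02d}" for i in range(1, 10)]  # 9 forward barcodes
reverse_barcodes = [f"R{i:02d}" for i in range(1, 9)]   # 8 reverse barcodes

def assign_barcode_pairs(amplicons, forward, reverse):
    """Give each amplicon a unique (forward, reverse) barcode combination."""
    pairs = list(product(forward, reverse))
    if len(pairs) < len(amplicons):
        raise ValueError("not enough barcode combinations for the amplicons")
    return dict(zip(amplicons, pairs))

amplicons = list(product(cultivars, replicates, genes))
layout = assign_barcode_pairs(amplicons, forward_barcodes, reverse_barcodes)
print(len(amplicons), "amplicons;", layout[("Pant Samrat", 1, "SmMFT-2")])
```

Because every amplicon is identified by a pair of barcodes rather than a single one, 17 barcoded primers are enough in this toy layout to distinguish 72 amplicons.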
The two-step PCR was conducted using the Kapa HiFi HotStart ReadyMix PCR kit. Reactions consisted of 1x reaction buffer (Kapa HiFi HotStart ReadyMix), 0.3 μM of forward primer, 0.3 μM of reverse primer and approximately 50-100 ng of genomic DNA in a 25 μl total volume. Cycling parameters were ~1 min per kb of gene at 95˚C, followed by 30 cycles of 20 seconds at 98˚C, 15 seconds at 60˚C and 1 min per kb of gene at 72˚C, followed by a final extension of 1 min per kb of gene at 72˚C. Barcodes were attached to the amplicons in the second round of PCR under identical conditions, except that 10-20 ng of DNA template (amplicons from the first round of PCR) was used.
All the amplicons were pooled in equimolar amounts and the pooled sample was purified with 1.0x volume of AMPure PB Beads (Beckman Coulter, Woerden, the Netherlands) before eluting in 37 μl of elution buffer. A SMRTbell library was constructed from the pooled amplicons with a starting amount of 1.26 μg of the sample pool, following the standard procedures for SMRTbell adapter ligation. Sequencing of the library was conducted with standard procedures using P6v2C4 chemistry (Pacific Biosciences, California, USA) with six hours of movie time.
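Equimolar pooling implies that longer amplicons must contribute proportionally more mass, because the molar mass of double-stranded DNA scales with its length. The sketch below shows this arithmetic under stated assumptions: the amplicon lengths are hypothetical (the real lengths are not given in the text), the function name is illustrative, and the total of 1260 ng simply reuses the 1.26 μg figure mentioned above as an example pool size.

```python
def equimolar_masses(lengths_bp, total_ng):
    """Split total_ng across amplicons so that each contributes the same number of moles.

    For double-stranded DNA the molar mass is proportional to length, so equal
    moles means mass proportional to amplicon length.
    """
    total_length = sum(lengths_bp.values())
    return {name: total_ng * length / total_length
            for name, length in lengths_bp.items()}

# Hypothetical amplicon lengths in bp.
lengths = {"SmTFL1": 1500, "SmCEN-1": 2500, "SmMFT-2": 4000}
masses = equimolar_masses(lengths, total_ng=1260)
for name, ng in masses.items():
    print(f"{name}: {ng:.2f} ng")
```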

PacBio sequence data processing
The sequencing data files in .h5 format were converted into .bam files using the bax2bam program (version 0.0.9). Demultiplexing of the .bam files was performed using the pblima program (version 1.11.0). Finally, the phased amplicon sequences were obtained through the long amplicon analysis protocol implemented in the pblaa program (version 2.4.2). The program generated the subread coverage for each allele, while the amplicon coverage was calculated manually as the total number of amplicons sequenced within each sample. To do this, the subread identities for each allelic sequence were retrieved from one of the pblaa output files, and the number of unique ZMWs in the pool of subreads was counted. The bioinformatic analysis programs were installed through Miniconda 3 (https://conda.io/miniconda.html).
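The manual ZMW count described above could be automated along the following lines. This sketch relies on the PacBio subread naming convention `movie/ZMW/start_end`, in which several subreads from the same sequencing well share the movie name and ZMW (hole) number. The function name and the example read names are hypothetical, and the step of extracting the names from the pblaa output is not shown.

```python
def count_unique_zmws(subread_names):
    """Count distinct (movie, ZMW) pairs among subread names of the form movie/zmw/start_end."""
    zmws = set()
    for name in subread_names:
        parts = name.strip().split("/")
        if len(parts) < 2:
            raise ValueError(f"unexpected subread name: {name!r}")
        zmws.add((parts[0], parts[1]))
    return len(zmws)

# Hypothetical subread names: two subreads from ZMW 101 and one from ZMW 202.
names = [
    "m_example_movie/101/0_2500",
    "m_example_movie/101/2550_5000",
    "m_example_movie/202/0_2480",
]
print(count_unique_zmws(names))
```

Counting ZMWs rather than subreads avoids counting the same amplicon molecule several times, since one circularised template yields multiple subreads.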
Each set of gene homologs from the four cultivars was manually transferred to MEGA v10.2.6 [38] and aligned using the ClustalW program to screen for nucleotide variations among the alleles. The detected variants were further characterized using the Sorting Intolerant from Tolerant (SIFT) program with the UniProt-SwissProt + TrEMBL 2010_09 database under default parameters. The MFT-like genes were compared with the transcripts of W-4 (S. incanum L.) and the Ramnagar Giant cultivar [40]. The comparison was made using the BLASTn program against the MFT-like genes mined from their respective RNA-Seq data, which were downloaded from the National Center for Biotechnology Information (NCBI; https://www.ncbi.nlm.nih.gov) under the primary accession numbers GAYR00000000 and GAYS00000000.
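Screening an alignment for nucleotide variations, done here by inspection in MEGA, amounts to walking two aligned sequences column by column. The sketch below is one possible way to do this and is not the procedure used in the study; the function name, the output format and the toy sequences are assumptions. Positions are 1-based coordinates on the reference allele, and an insertion is reported at the last reference base preceding it.

```python
def list_variants(ref_aln, alt_aln):
    """List differences between two aligned sequences ('-' marks a gap).

    Returns tuples (reference_position, kind, ref_base, alt_base).
    """
    if len(ref_aln) != len(alt_aln):
        raise ValueError("sequences must be aligned to the same length")
    variants = []
    ref_pos = 0
    for r, a in zip(ref_aln.upper(), alt_aln.upper()):
        if r != "-":
            ref_pos += 1  # advance only on real reference bases
        if r == a:
            continue
        if a == "-":
            kind = "deletion"
        elif r == "-":
            kind = "insertion"
        else:
            kind = "substitution"
        variants.append((ref_pos, kind, r, a))
    return variants

# Toy aligned alleles (hypothetical).
for variant in list_variants("ATGCCTA-GA", "ATGTC-AGGA"):
    print(variant)
```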

Identification of FT/TFL1 gene homologs in S. melongena
The analysis of FT/TFL1 gene homologs in three publicly available genome assemblies, Sme_r2.5.1, S. melongena-67/3 and S. melongena-HQ, resulted in the discovery of several FT-like, TFL1-like and MFT-like gene sequences. The same numbers of TFL1-like and MFT-like genes were obtained from the predicted coding sequences (CDSs) of each genome assembly: two MFT-like and five TFL1-like genes. However, the number of FT-like genes varied among the three eggplant genomes, with two, four and five gene sequences found in Sme_r2.5.1, S. melongena-67/3 and S. melongena-HQ, respectively, as summarised in Table 1.
The comparison of the coding sequences of each gene obtained from the three genome assemblies revealed that at least two sequences with 100% identity were present for each gene, lending higher confidence to the sequences (Table 2). SmFT-5 was excluded from subsequent analyses as only partial CDS fragments were obtained from the mining process. Applying the gene prediction tool FGENESH to the corresponding genomic sequence produced similar results, indicating that further investigation is necessary to properly identify the gene structure of this homolog.
The gene structures of all the FT/TFL1 gene homologs (except for SmFT-5) were consistent with the typical structure reported for this gene family, i.e. four exons and three introns placed at conserved positions, as seen in Arabidopsis [23]. However, the lengths of the introns were variable, as depicted in Fig 1. Exons I and IV varied in length from 192 to 216 bp and from 209 to 233 bp, respectively; exon IV of SmFT-4 was an exception, with an unusual length of 110 bp. In contrast, exon II and exon III remained conserved in length at 62 bp and 41 bp, respectively, across all analysed genes [23].
Further analysis of SmFT-4, which had an unusually short exon IV, revealed the presence of a premature stop codon caused by a single base pair mutation. Based on the amino acid sequence alignment (Fig 2), the codon at this position was expected to encode tryptophan (W; TGG), but it was changed to a stop codon (TGA) by a G-to-A substitution. The remaining nucleotide sequence beyond the stop codon still corresponds to the rest of a putative full-length coding sequence. The coding sequences of this homolog mined from both S. melongena-HQ and S. melongena-67/3 were identical.
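The effect of such a G-to-A substitution can be reproduced on a toy sequence. The sketch below scans a coding sequence codon by codon and reports the first in-frame stop codon; the sequences are invented for illustration and are not the SmFT-4 sequence, and the function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def first_inframe_stop(cds):
    """Return the 1-based codon index of the first in-frame stop codon, or None."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3 + 1
    return None

# Toy CDS (hypothetical): ATG GCT TGG CAA TAA
wild_type = "ATGGCTTGGCAATAA"
# A G-to-A substitution in the third base of codon 3 turns TGG (Trp) into TGA (stop).
mutant = wild_type[:8] + "A" + wild_type[9:]

print(first_inframe_stop(wild_type), first_inframe_stop(mutant))
```

A stop codon reported before the final codon of an otherwise intact reading frame is the signature of a premature termination of the kind described for SmFT-4.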
In contrast, SmMFT-2 had only one residue before a stop codon in segment D, as shown in

Phylogenetic analysis of S. melongena FT/TFL1 gene homologs
A neighbour-joining phylogenetic tree was constructed to analyse the phylogenetic relationships between the FT/TFL1 homologs of S. melongena (except for SmFT-5) and those of other angiosperms. The analysis revealed three major subfamilies: SmFT-1, SmFT-2, SmFT-3 and SmFT-4 belong to the FT-like subfamily; SmTFL1, SmCEN-1, SmCEN-2, SmCEN-3 and SmCEN-4 belong to the TFL1-like subfamily; and SmMFT-1 and SmMFT-2 belong to the MFT-like subfamily (Fig 3). The results showed that S. melongena FT/TFL1 proteins are most closely related to those of other members of the Solanaceae family, such as Solanum lycopersicum and Nicotiana tabacum.
SmFT-1, SmFT-2 and SmFT-3, the putative orthologs of FT, showed the characteristic features of FT-like proteins. These include the conserved amino acids Tyr85 and Gln140 (Tyr84 and Gln139 in SmFT-1, Tyr82 and Gln140 in SmFT-2, Tyr86 and Gln141 in SmFT-3, and Tyr83 and a missing Gln in SmFT-4). Furthermore, the highly conserved amino acid sequence in exon IV critical for FT activity, LGRQTVYAPGWRQN, as well as the highly conserved LYN triad, were identical in SmFT-1 [23]. However, minor variations in the LGRQTVYAPGWRQN motif were observed in SmFT-2 and SmFT-3. With regard to the LYN triad, SmFT-1 and SmFT-3 were fully conserved, while SmFT-2 displayed FHN instead. In the next subfamily, the putative orthologs of TFL1-like genes, SmTFL1, SmCEN-1, SmCEN-2, SmCEN-3 and SmCEN-4, displayed conservation of the amino acid residues His88 and Asp144 at the corresponding positions (His86 and Asp142 in SmTFL1, His87 and Asp142 in SmCEN-1, His84 and Asp139 in SmCEN-2, His90 and Asp146 in SmCEN-3, and His88 and Asp144 in SmCEN-4) (Fig 2). SmMFT-1 and SmMFT-2, which were grouped together in the third subfamily, carried the critical amino acid residue Trp, which is distinct from the Tyr found in FT and the His found in TFL1. Based on the sequence alignment, both SmFT-1 and SmFT-3 possess the amino acids Y85, Y134, W138 and Q140, which are typical of an inducer FT. These residues are well established as factors distinguishing activator from repressor activity in FT [1]. However, SmFT-2 deviates at these critical positions, as it contains a non-tyrosine amino acid at position 134 and a non-tryptophan amino acid at position 138. This deviation in SmFT-2 (Fig 4) is identical to that of the repressor FT found in Nicotiana tabacum (NtFT1). Additionally, the residues E109 and N152, which are important for floral activity [1], were screened in all three FT-like genes of S. melongena and were found to be present at their respective positions.
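The residue screen described above can be written down as a simple rule. The sketch below is an illustrative heuristic only: the function name, the dictionary layout and the three output labels are assumptions, positions follow Arabidopsis FT numbering, and a sequence-based label of this kind is a prediction that still requires functional validation. The residues used for SmFT-2 (N at 134 and S at 138) are those reported for this homolog in the Discussion.

```python
# Residues typical of an inducer FT, at Arabidopsis FT reference positions.
ACTIVATOR_RESIDUES = {85: "Y", 134: "Y", 138: "W", 140: "Q"}

def predict_ft_role(residues):
    """Classify an FT-like protein from the residues at the diagnostic positions.

    residues maps a reference position to the observed amino acid.
    Returns (label, deviations), where deviations lists the non-activator residues.
    """
    deviations = {pos: aa for pos, aa in residues.items()
                  if pos in ACTIVATOR_RESIDUES and aa != ACTIVATOR_RESIDUES[pos]}
    if 134 in deviations or 138 in deviations:
        return "putative repressor", deviations
    if deviations:
        return "uncertain", deviations
    return "putative activator", deviations

print(predict_ft_role({85: "Y", 134: "Y", 138: "W", 140: "Q"}))  # SmFT-1-like pattern
print(predict_ft_role({85: "Y", 134: "N", 138: "S", 140: "Q"}))  # SmFT-2-like pattern
```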
Of interest, the upstream region of SmFT-1 from the start codon (a putative promoter region) was subjected to a BLASTn survey against the NCBI nonredundant (nr) database. The results revealed the presence of a retrovirus-related polyprotein from transposon RE1 with a length of 2711 bp located at position -1052 to -3763 from the ATG region. This transposon sequence was also found in SmFT-1 of all three cultivars.

Comparison of eggplant and tomato FT gene homologs
The FT homologs of eggplant (S. melongena) and tomato (S. lycopersicum) shared common features, as depicted in Table 3. Initially, six FT homologs were identified in tomato and referred to as SlSP3D, SlSP6A, SlSP5G, SlSP5G1, SlSP5G2 and SlSP5G3 [42]. However, further investigation revealed that these represent only five FT genes [43]. SmFT-1 was predicted to be a floral promoter, with no alterations detected at the critical amino acids determining floral transition. One promoter, SlSP3D, has been identified in tomato to date [42]. The changes at amino acid positions 134, 137 and 138 in both SmFT-2 and SlSP5G suggest that the former plays a repressor role [42].
Additionally, SmFT-4 and SlSP6A contained a premature stop codon in their last exons. Most notably, the screening of coding sequences from the S. melongena-HQ genome resulted in the identification of two separate partial coding sequences for SmFT-5, which was similar to FTL1 in tomato [43].

FT/TFL1 gene variants discovered across three different genomic resources
The comparison of each allele of the FT/TFL1 gene homologs mined across the three genome assemblies uncovered variants in the coding regions, as indicated in Table 4. The variants were identified in SmFT-2 and SmCEN-4. The SmFT-2 allele sequence obtained from S. melongena-HQ differed from the consensus CDS at two amino acid positions. Likewise, the variant obtained for SmCEN-4 had two variations in the protein sequence in comparison to the consensus CDS. According to the SIFT prediction, both variations detected in SmFT-2 were predicted to affect protein function, while the variations detected in SmCEN-4 were tolerated.

SmMFT-1, SmMFT-2, SmCEN-1, SmCEN-2, SmCEN-4 and SmTFL1 across different cultivars using PacBio's long-range amplicon sequencing
In order to dissect the allelic sequence variations in the gene pool of FT/TFL1 homologs of eggplant, PacBio RS II long-range amplicon sequencing was employed. Here, the sequencing of

Table 3 (partial). Eggplant homolog | Tomato homolog | Shared features
SmFT-3 | SlSP5G2 | a. SmFT-3 has an amino acid change at position 137, whereas SlSP5G2 has changes at positions 137 and 138 [42]. b. Neither carries additional residues between positions 134 and 137 of the FT protein [42].
SmFT-4 | SlSP6A | Both have premature stop codons in their last exons [42].

Differences between the genotypes comprised base substitutions and deletions of single base pairs as well as of a stretch of multiple base pairs, as shown in Table 5. The variations fell in the non-coding regions, with a small number of them in the genic regions, specifically in SmMFT-2. The comparison between the coding regions of these gene homologs and the consensus sequences mined from the genome assemblies showed that they were identical; notably, SmCEN-4 of these cultivars was found to be identical to Variant 2. The gene sequences were deposited in NCBI GenBank with the following accessions:

SmFT
Unlike the other sequenced genes, SmMFT-2 displayed more than 90× amplicon coverage, with ~460-490 subread coverage, distributed across nearly all samples, as shown in Table 6. Since higher coverage offers greater possibilities to unveil heterozygosity [44], the samples were screened for such occurrences. Altogether, two alleles were detected for the gene. SmMFT-2_allele1 was found in all cultivars examined, including Pant Samrat. Interestingly, in Pant Samrat, SmMFT-2_allele1 was discovered along with SmMFT-2_allele2, and the two were detected in a 1:1 ratio. We verified this with sequences from mutant populations of Pant Samrat (S4 Table). The gene coverages of SmMFT-2 in the mutant population were also similar to those of the control samples detailed in Table 6.
Table 7. Variations at positions 61, 277 and 350 were predicted to impact protein function (SIFT analysis).
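The text does not state how the 1:1 allele ratio in Pant Samrat was assessed. One simple check, given here as a sketch under stated assumptions rather than as the procedure used in the study, is an exact two-sided binomial test of the per-allele read counts against an expected fraction of 0.5. The function names and the counts in the example are hypothetical.

```python
from math import comb

def allele_fraction(count_a, count_b):
    """Fraction of reads supporting the first allele."""
    return count_a / (count_a + count_b)

def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely than the
    observed count k under a binomial(n, p) model.
    """
    probs = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    threshold = probs[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(pr for pr in probs if pr <= threshold))

# Hypothetical read counts for the two SmMFT-2 alleles in one sample.
allele1_reads, allele2_reads = 240, 235
total = allele1_reads + allele2_reads
print(round(allele_fraction(allele1_reads, allele2_reads), 3))
print(binomial_two_sided_p(allele1_reads, total) > 0.05)
```

A large p-value means the counts are compatible with a heterozygous 1:1 ratio; it does not by itself rule out a mixture of homozygous plants, which is why replicate-level data remain informative.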

Comparative study of MFT-like gene sequences in S. melongena with transcripts of its wild relative, S. incanum
The MFT-like genes were mined from the de novo assembled transcriptome sequences of S. incanum, the wild relative of S. melongena. The coding sequence of the SmMFT-1 gene had one variation compared to the MFT-1 transcript of S. incanum (hereafter referred to as SiMFT-1). The corresponding amino acid change from threonine (T) to serine (S) was predicted to be tolerated (SIFT analysis).
Next, the exploration of MFT-2 genes in S. incanum unveiled the homolog to be heterozygous (one of the alleles is termed as SiMFT-2_allele1 and the other SiMFT-2_allele2, hereafter).

Discussion
In this study, the availability of multiple genome assemblies of eggplant provided a platform to mine sequences of FT/TFL1 gene homologs with high confidence. This was achieved by comparing sequences from different resources to derive highly identical versions. Altogether, 12 members of the gene family were uncovered, and their putative genomic organizations were found to be similar to those in Arabidopsis. While the numbers of TFL1-like and MFT-like gene homologs were the same across the genome assemblies, the number of FT paralogs differed. These differences may be due to the quality of the genome assemblies. The Sme_r2.5.1 assembly [35] was highly fragmented and yielded only two FT paralogs, while the chromosome-level genome assemblies of S. melongena-67/3 [36] and S. melongena-HQ [37] yielded four and five FT paralogs, respectively. The presence of five FT paralogs in eggplant is in agreement with the duplication and divergence of this gene cluster reported in various botanical families, including Solanaceae [45,46], Brassicaceae [47] and Salicaceae [48], whereas single genes have been reported in species such as grapevine, apple and citrus. Additionally, eggplant contains two MFT-like genes, similar to other Solanum species such as tomato but in contrast to other dicots such as Arabidopsis, which has a single gene [49]. The presence of more than one MFT-like gene has also been reported in several monocot genomes, for instance, two genes in rice and three genes in maize. Furthermore, it is interesting to note that eggplant, with its 12 FT/TFL1 genes, shares the same number of genes with its congener, tomato [42,43]. This suggests that the genome mining of FT/TFL1 genes in eggplant has likely resulted in a complete or near-complete collection of the homologs.

Characteristics of FT/TFL1 gene homologs in S. melongena
The sequence analysis of FT-like genes of S. melongena revealed structural and putative functional divergence among individual paralogs, as predicted through protein sequence alignment with previously characterized FT promoters and repressors. Screening of the residues Tyr-134 and Trp-138, commonly present in floral activators [1], revealed that SmFT-1 matched the predicted promoter, suggesting a role in promoting floral induction by analogy with Arabidopsis FT.
SmFT-2, however, exhibited variations in these critical regions, with Y134N and W138S substitutions. This is consistent with the functional shift observed in the sugar beet FT ortholog BvFT2, where three mutations in the external loop region (residues Tyr-134, Gly-137 and Trp-138) were associated with conversion of BvFT2 into a floral repressor. To investigate the effect of individual changes in these residues, mutations were introduced at these positions in Arabidopsis FT. Mutations such as G137A, G137W, G137E (which is similar to G137Q) and G137R did not impart any repressive activity on Arabidopsis FT. In contrast, point mutations at Tyr-134 and Trp-138 were able to convert FT into a TFL1-like molecule, and manipulating either residue was sufficient to confer floral repressive activity on FT [50]. Similar functional shifts have been reported for FT orthologs of various species. Interestingly, the amino acid variations observed in SmFT-2 at these conserved positions were identical to those of the tobacco floral repressors NtFT1, NtFT2 and NtFT3 [1] as well as that of tomato (SlSP5G) [42], as previously documented. This suggests that floral repressing activity was acquired by SmFT-2 through evolution of the FT clade. Finally, SmFT-3 showed variation at the Gly-137 position. However, further functional validation is required to elucidate the impact of this change in eggplant. In addition to the residues described, mutations at other amino acids such as Tyr-85, Glu-109, Gln-140 and Asn-152 have been reported to impact the functional specificity of FT and TFL1 in Arabidopsis [50]. These residues were found to be invariant in the genes examined. The mutations at Tyr-134 and Trp-138 indicate successive evolutionary changes of the FT clade after the divergence of FT-like and TFL1-like genes [1].
We also identified a transposable element in the putative promoter of SmFT-1, located upstream of the homolog (-1052 to -3763 from the start codon). Transposable elements constitute a significant portion of plant DNA, and their effect on gene expression can vary depending on their location [51]. Further investigation is necessary to determine whether this specific element affects the expression of the gene. Additionally, one of the SmFT paralogs, SmFT-4, contains a premature stop codon in exon IV. A comparable truncation has been found in tomato: the FT homolog SlSP6A has a premature stop codon in exon IV, and its expression has not been detected in various organs, leading to its prediction as a pseudogene [42]. In addition to the FT genes, analysis of the TFL1-like and MFT-like gene sequences in eggplant showed that they were invariant at the known critical amino acids.
Variations in FT paralogs reflect a strategy of the crop to precisely time flowering in response to diverse external and internal stimuli [42]. For example, in sugar beet, BvFT1 is a floral repressor that prevents flowering under short-day conditions and before vernalization by repressing BvFT2 expression [52]. Similarly, some Solanaceous crops have undergone gene duplication and divergence in FT homologs to fine-tune flowering initiation in response to environmental stimuli. In tobacco, NtFT4 promotes flowering whereas NtFT1, NtFT2 and NtFT3 are floral repressors, all of which are expressed under short-day conditions [10]. Recently, a novel FT gene, NtFT5, was discovered to be expressed regardless of day length, suggesting a regulatory role under both long-day and short-day conditions [53]. In tomato, three FT orthologs act as floral repressors; one of them is triggered by long days, while the other two are triggered by short days. Moreover, an FT homolog of tomato, known as SFT/SP3D, is a floral promoter whose expression is insensitive to photoperiod [42]. As such, the divergence of FT paralogs in eggplant is likely to confer differential activities influencing floral transitions in response to environmental cues. The identification of gene homologs provides additional avenues to further our understanding of flowering mechanisms in eggplant. The genome mining indicates the presence of variants such as Variant1 of SmFT-2 and Variant2 of SmCEN-4. Substitutions found in Variant1 were predicted to affect protein function (SIFT analysis), while the changes in Variant2 were predicted to be tolerated. Taken together, our findings support the postulation that individual eggplant cultivars carry gene variants that are absent from the reference genome, that these can be unveiled through resequencing of various accessions, and that such variants are expected to influence phenotypic traits [36].

Comparison of FT homologs between eggplant and tomato
Five FT homologs were identified from the genomes of eggplant and tomato [42,43]. We found basic similarities between the FT homologs of both Solanum species, as described in Table 3. Interestingly, we obtained two partial coding sequences from the genomic resources of eggplant that corresponded to a single gene, SmFT-5. We retrieved the upstream and downstream sequences of these partial sequences from the corresponding genomic scaffolds and subjected them to Fgenesh gene structure prediction (S2 Fig). Fgenesh predicted two partial sequences. We expect that some mutations may have caused inaccuracies in the predicted gene structure, although it is also possible that the genome assembly process introduced sequence variations. In tomato, a comparable circumstance was encountered during the genome-wide identification of FT/TFL1 gene homologs, where six FT gene sequences were initially identified, but two were later found to correspond to a single gene, called Flowering Locus T-Like 1 (FTL1) [42,43]. A 2 bp deletion in the FTL1 gene had resulted in the premature termination of the protein, producing a fragmentary PEBP domain [43]. These findings from another Solanum species suggest that further investigations are necessary to verify the gene structure of eggplant SmFT-5.

Utilization of long-range targeted sequencing of Pacbio in the allele mining of FT/TFL1 gene homologs across different cultivars
Since allele mining is an efficient way to gather gene variants and study their implications in agricultural adaptations [54], we extended our search for additional variants to the commercial cultivars Surya, EP-47 Annamalai, Pant Samrat and Arka Nidhi. We used Pacbio RSII long-range targeted amplicon sequencing to discover different genotypes in SmMFT-1, SmMFT-2, SmCEN-1, SmCEN-2, SmCEN-4 and SmTFL-1. We found that, except for SmCEN-4 and SmTFL-1, each of the genes showed two different genotypes among the cultivars. Screening for variations between the alleles of each gene revealed that most variations fell in intronic regions, while a few resided in exonic regions, which is consistent with the high conservation of protein coding sequences [55].
We used a single barcode for all the gene homologs (~1 kbp to ~3 kbp in length) extracted from an individual plant, despite the high sequence similarity among these members of a multi-gene family. This was possible because of the long-read property of the Pacbio platform, which removes the need to fragment long sequences and reconstruct them during assembly [56]. However, as the amplicons sequenced were of unequal sizes, the gene coverages obtained were also uneven. Nonetheless, some plant samples showed heterozygous loci that contain both alleles. Here, the long-read amplicon sequencing provided a straightforward means of variant phasing [57].
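The reason full-length reads make phasing straightforward can be illustrated with a short sketch: because each read spans the whole amplicon, the bases it carries at all variant sites belong to the same haplotype and can simply be read off together. The code below is a simplified illustration under stated assumptions, not the pipeline used in this study. It assumes reads that are already aligned to a common coordinate system without indels, and the function name, the 0-based variant positions, the toy reads and the `min_fraction` threshold are all hypothetical.

```python
from collections import Counter


def call_haplotypes(reads, variant_positions, min_fraction=0.2):
    """Count the combination of bases each read carries at the variant sites and
    keep the combinations supported by at least min_fraction of the reads."""
    haplotypes = Counter(
        "".join(read[pos] for pos in variant_positions) for read in reads
    )
    total = sum(haplotypes.values())
    # Rare combinations are treated as sequencing or PCR errors and dropped.
    return {hap: n for hap, n in haplotypes.items() if n / total >= min_fraction}


# Toy data invented for illustration: two well-supported haplotypes and one
# read carrying an error-like combination at the two variant sites.
reads = ["ACGTA"] * 6 + ["ATGTC"] * 5 + ["ACGTC"]
calls = call_haplotypes(reads, variant_positions=[1, 4])
is_heterozygous = len(calls) == 2
```

Under these assumptions, a sample with two well-supported haplotypes is called heterozygous, and the phase of its variants follows directly from the retained base combinations.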
Sequencing of the gene homolog SmMFT-2, however, yielded nearly uniform coverage, which was also the highest obtained in this study. Some members of the Pant Samrat cultivar were shown to be heterozygous at the locus (i.e. SmMFT-2_allele1 and SmMFT-2_allele2). However, SmMFT-2_allele2 was absent from all other cultivars examined, as only SmMFT-2_allele1 was detected among them. Comparison of the two alleles indicates the presence of four non-synonymous mutations, three of which (at positions 61, 277 and 350) could impact protein function, as predicted by the SIFT analysis.
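The distinction between synonymous and non-synonymous differences made in the allele comparison above can be sketched in a few lines. This is an illustrative example only, not the method used in this study. It assumes two aligned, in-frame coding sequences of equal length and the standard genetic code, and the function name and toy sequences are hypothetical. Predicting whether a non-synonymous change affects protein function (as done here with SIFT) is a separate step that this sketch does not attempt.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, with codons enumerated in TCAG order (TTT, TTC, ...).
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def coding_differences(cds_a, cds_b):
    """List every differing nucleotide between two aligned coding sequences as
    (1-based position, codon_a, codon_b, aa_a, aa_b, kind)."""
    a, b = cds_a.upper(), cds_b.upper()
    if len(a) != len(b) or len(a) % 3:
        raise ValueError("expected aligned, in-frame sequences of equal length")
    differences = []
    for i, (base_a, base_b) in enumerate(zip(a, b)):
        if base_a == base_b:
            continue
        start = i - i % 3  # first base of the codon containing position i
        codon_a, codon_b = a[start:start + 3], b[start:start + 3]
        aa_a, aa_b = CODON_TABLE[codon_a], CODON_TABLE[codon_b]
        # Each difference is classified by the full codon it sits in, so two
        # changes in one codon are reported with the same amino acid pair.
        kind = "synonymous" if aa_a == aa_b else "non-synonymous"
        differences.append((i + 1, codon_a, codon_b, aa_a, aa_b, kind))
    return differences


# Toy alleles invented for illustration: M-A-K versus M-A-R.
toy_allele1 = "ATGGCTAAA"
toy_allele2 = "ATGGCCAGA"
toy_result = coding_differences(toy_allele1, toy_allele2)
```

In the toy example, the change at position 6 leaves alanine unchanged, whereas the change at position 8 replaces lysine with arginine.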

Comparative study of SmMFT gene sequences with transcripts of its wild relative, S. incanum
MFT gene homologs are considered to be the ancestors of FT/TFL1 gene sequences. However, their function is not well understood. In addition to their redundant floral inductive function, these homologs also play important roles in seed dormancy and germination, in line with their seed-specific expression [58]. In Arabidopsis, MFT has been shown to increase dormancy during the seed development stage, while promoting germination in after-ripened seeds imbibed with exogenous ABA. In contrast, Triticum aestivum MFT (TaMFT) positively regulates seed dormancy while inhibiting seed germination [16]. In S. melongena, seed dormancy is considered to be a trait that has undergone human selection since it is rarely reported in this species [59]. Nevertheless, the trait is commonly observed in wild Solanum species [60]. S. incanum, a wild ancestor of S. melongena, has been reported to display a slow and low germination rate, taking about 30 days to reach 15 to 50% germination [61]. Therefore, we undertook a comparative genetic analysis to determine the coding sequence differences that could exist between S. incanum and the cultivars examined in this study.
Through the mining of MFT-like genes in the S. incanum transcriptome, we identified an allele related to the SmMFT-1 homolog. We found one non-synonymous nucleotide variation between SmMFT-1 and the corresponding allele detected in the S. incanum transcript. Based on the results of the SIFT analysis, the corresponding amino acid variation was predicted to be tolerated.
With regard to the MFT-2 gene, S. incanum was heterozygous, and its two alleles were named SiMFT-2_allele1 and SiMFT-2_allele2. As in the cultivar Pant Samrat, SiMFT-2_allele1 and SiMFT-2_allele2 differed at positions 61 and 277 (numbered according to the coding sequence of the SmMFT-2 gene). SiMFT-2_allele1 was detected in, and found to be identical across, the cultivars Surya, EP-47 Annamalai, Pant Samrat and Arka Nidhi. All the cultivars were homozygous for this allele, except for Pant Samrat. The other allele of S. incanum (SiMFT-2_allele2) was represented in some members of Pant Samrat, where SmMFT-2_allele2 shared the same nucleotides at positions 61 and 277. However, SmMFT-2_allele2 carried two additional mutations at positions 350 and 367. The absence of the SmMFT-2_allele2 genotype in most cultivars could be responsible for the discrepancies in seed traits between the wild species and most cultivars. However, these findings need to be further supported by empirical data.
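The relationship between the two alternative alleles can be summarised with simple set operations on the variant positions reported above. This is only a bookkeeping sketch: the positions are those given in the text (relative to allele1, in SmMFT-2 coding-sequence coordinates), and the variable names are hypothetical.

```python
# Coding-sequence positions at which each alternative allele differs from
# allele1, as reported in the text.
si_mft2_allele2 = {61, 277}            # S. incanum
sm_mft2_allele2 = {61, 277, 350, 367}  # Pant Samrat

# Positions common to the wild and cultivated alternative alleles.
shared = sorted(si_mft2_allele2 & sm_mft2_allele2)
# Positions found only in the Pant Samrat allele.
private_to_sm = sorted(sm_mft2_allele2 - si_mft2_allele2)
```

Here `shared` holds positions 61 and 277, and `private_to_sm` holds the two additional positions, 350 and 367.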

Conclusions
The functional interpretation of FT/TFL1 gene homologs through computational approaches indicates divergence among the FT paralogs. Similar functional alterations have been demonstrated to modulate floral regulation in various species, including Solanaceous crops like tomato [42] and tobacco [10]. Therefore, such changes in the FT paralogs of eggplant could be implicated in the crop's differential responses to environmental signals. This study has also uncovered unique variations in the MFT-2 gene sequences across different cultivars, as well as in comparison to the wild relative, S. incanum. These variations suggest a possible role of these alleles in regulating seed dormancy and germination. The hypotheses derived from this study provide a basis for directing future functional validation of floral regulation, seed dormancy and germination in eggplant.