A dataset on multi-trait selection approach for the evaluation of F1 tomato hybrids along with their parents under hot and humid conditions in Bangladesh

This dataset aims to evaluate the use of multiple trait-based selection methods with multi-trait genotype-ideotype distance index (MGIDI) models to identify superior summer F1 tomato hybrids suitable for the climatic conditions of countries like Bangladesh. The dataset was generated using 14 cross combinations from a Line × Tester mating design, along with seven parental lines and two tester parents of tomatoes with diverse genetic bases and heat tolerance qualities in a randomized complete block (RCB) design. The likelihood ratio (LR) test indicated highly significant genotype effects for most of the analyzed traits. A heatmap of correlation analyses between 16 traits identified a highly significant positive correlation (r > 0.8) between NFrPC and NFPC and between AFW and FW, preliminarily indicating a clear trace of multicollinearity among these traits. The traits NFPP, YPP, and Yield showed the highest predicted genetic gains, indicating their potential for substantial improvement through selection. Additionally, the heritability estimates ranged from 0.54 to 0.99, highlighting high heritability across the traits, which suggests favourable conditions for effective selection strategies. The strengths and weaknesses of hybrids AVTOV1002×C41 and AVTOV1010×C41 were evaluated based on their contributions to MGIDI across four major factors. These hybrids demonstrated strong performance, particularly excelling in traits associated with FA1, FA2, and FA4. The dataset of MGIDI can be universally applied to rank treatments based on desired values of multiple traits, with its potential for rapid expansion in evaluating various types of plant experiments.

The seeds of the F1 hybrids and their parents were sown in a well-prepared seedbed. Forty-day-old tomato seedlings were then transplanted into the main field under transparent polytunnels. The polytunnels were 2.3 m wide and contained two-unit beds, each measuring 0.8 m by 1 m, with a 30-cm drain between the 14-unit beds. Each unit bed had double rows, accommodating 24 plants. Most of the data were collected from randomly selected plants: five plants per parental line and per cross. Fruits per plant, yield per plant, and yield per hectare were calculated from the plot yield.

Data source location
The experiment was conducted at the vegetable experimental field under the polytunnels of the

Value of the Data
• The dataset of the multi-trait genotype-ideotype distance index (MGIDI) helps select superior treatments/genotypes in plant experiments by combining desired traits, enhancing breeding efficiency, and reducing reliance on univariate analyses. Its straightforward, graphical approach allows quick interpretation and application of results, identifying effective traits and balancing strengths and weaknesses.
• Farmers and agricultural practitioners can optimize resource allocation by choosing the best-performing hybrids for cultivation, leading to improved resource utilization and increased productivity.
• Other researchers can reuse these datasets to validate and further develop the MGIDI index in different agricultural contexts or crop species, expanding its applicability and refinement.

Background
Tomato (Solanum lycopersicum L.), of the Solanaceae family, is widely grown in Bangladesh and other parts of the world for its taste, nutritional value, uses, and commercial importance [1][2][3][4][5]. In developed countries, hybrid tomatoes are popular for their high yield and quality. In Bangladesh, however, hybrid seed use is limited, necessitating the development of high-yielding, high-quality, and widely adaptable hybrid varieties. In horticultural experiments, evaluating multiple traits is common, but identifying genotypes/treatments that excel across many traits is challenging. Researchers often choose univariate analyses and post-hoc tests for mean comparisons, suggesting that the benefits of a multi-trait framework may be underutilized. Classical linear multi-trait selection indexes exist, but multicollinearity and arbitrary weighting coefficients can hinder genetic gains [6][7][8]. In this dataset, we have used the MGIDI (Multi-trait Genotype-Ideotype Distance Index), introduced by Olivoto and Nardino [7], which offers a novel approach to selecting genotypes and recommending treatments based on multiple traits. MGIDI provides more efficient and accurate treatment recommendations by focusing on desired or undesired crop characteristics. It is unique, easy to interpret, and free from weighting coefficients and multicollinearity limitations.

Variance components, genetic parameters and phenotypic correlations
The likelihood ratio (LR) test indicated highly significant genotype effects (p < 0.01) for most of the analyzed traits. Except for NPBLH, NFPC and NFrPC, all traits had the genotypic variance (σ²g) as the main component of the phenotypic variance (σ²p) (Table 1; Fig. 1). The broad-sense heritability on a genotype-mean basis (h²) ranged from 0.37 (NPBLH) to 0.99 (AFW). High values of heritability (h² > 0.8) were observed for FW, AFW, NFPP, YPP, Yield, DFPF and NLPF, suggesting good prospects of selection gains for these traits. The accuracy of genotype selection (As) for the trait means was greater than 0.70, enabling precise prediction of the genetic value of the trait. A heatmap of correlation analyses was conducted between the 16 traits to preliminarily identify those contributing to multicollinearity (Fig. 2). A highly significant positive correlation (r > 0.8) was found between NFrPC and NFPC, as well as between AFW and FW.

Notes (Table 1): ****, ** and * significant at p < 0.0001, p < 0.01 and p < 0.05, respectively; ns, not significant. LRT, likelihood ratio test for genotype; AIC, Akaike's Information Criterion for the selected model; σ²p, phenotypic variance; h², heritability; As, the accuracy of genotype selection; CVg and CVr, the genotypic and the residual coefficient of variation, respectively; CV ratio, the ratio between genotypic and residual coefficient of variation. See Table 4 for the full trait names.
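As a minimal sketch of how the genetic parameters listed for Table 1 can be derived from variance components, the Python snippet below assumes the usual definitions for a model with random genotype effects: σ²p = σ²g + σ²e, heritability on a genotype-mean basis h² = σ²g / (σ²g + σ²e / r) with r replications, accuracy = √h², and CV = 100 × √σ² / mean. The function name and the numeric inputs are hypothetical; they are not values from this dataset, and the source does not state these formulas explicitly.

```python
import math


def genetic_parameters(var_g, var_e, trait_mean, n_reps):
    """Derive genetic parameters from variance components.

    Assumed definitions (not stated explicitly in the article):
    phenotypic variance = genotypic + residual variance, and
    heritability on a genotype-mean basis divides the residual
    variance by the number of replications.
    """
    var_p = var_g + var_e
    h2_mean = var_g / (var_g + var_e / n_reps)
    accuracy = math.sqrt(h2_mean)
    cv_g = 100.0 * math.sqrt(var_g) / trait_mean
    cv_r = 100.0 * math.sqrt(var_e) / trait_mean
    return {
        "var_p": var_p,
        "h2_mean": h2_mean,
        "accuracy": accuracy,
        "cv_g": cv_g,
        "cv_r": cv_r,
        "cv_ratio": cv_g / cv_r,
    }


# Hypothetical variance components for one trait in a trial with two replications.
params = genetic_parameters(var_g=9.0, var_e=1.0, trait_mean=20.0, n_reps=2)
for name, value in params.items():
    print(f"{name}: {value:.3f}")
```

A CV ratio above one, as in this illustrative input, means that genotypic variation exceeds residual variation for the trait.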

Factor analysis and predicted selection gains
Four principal components were retained, which explained 79.6 % of the total variation among the traits (Table 2). Thus, it was possible to reduce the data dimensionality by 75 % (from 16 traits to four factors) while keeping a high explanatory power. After varimax rotation, the average communality (h) was 0.796, ranging from 0.52 (wilt) to 0.94 (FW), indicating that a high proportion of each variable's variance was explained by the factors. The 16 traits were grouped into the four factors (FA) as follows: in FA1, the fruit-related traits FL, FW and AFW with positive loadings, and NFPC, NFrPC and NLPF with negative loadings; in FA2, the traits NFPP, YPP, Yield, DFPF and wilt (with positive loadings); in FA3, the traits FSI, TSS and TLCV (with positive loadings); and in FA4, the plant-related traits PHLH and NPBLH (with negative loadings) (Table 2).
The predicted genetic gain (SG) for effective traits in the MGIDI index is presented in Table 3. Results indicated a higher selection differential (SD%) for major measured traits, such as NFPP, YPP, Yield, AFW and TSS. The estimates of heritability on the entry-mean basis ranged from 0.54 (NPBLH) to 0.99 (AFW, NFPP, YPP, Yield and DFPF), which were high for all filtered traits. This suggests that there are good prospects of selection gains for these traits. The selected traits with the highest genetic gains (SG%) were NFPP (32.70 %), YPP (29.90 %), and Yield (21.90 %). The only trait with undesired selection gain (-21.50 %) was AFW.
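The quantities reported in Table 3 can be illustrated with a short Python sketch. It assumes the common definitions SD = Xs − X0 and SG = SD × h², with percentages expressed relative to the overall mean X0; these formulas and the `selection_gain` helper are assumptions for illustration, and the input values are hypothetical rather than taken from the dataset.

```python
def selection_gain(x_overall, x_selected, h2, goal="increase"):
    """Selection differential and predicted gain for a single trait.

    Assumed definitions: SD = Xs - X0, SG = SD * h2, and both are
    expressed as a percentage of the overall mean X0.
    """
    sd = x_selected - x_overall
    sg = sd * h2
    # A gain matches the goal when its sign agrees with the desired direction.
    matched = sg > 0 if goal == "increase" else sg < 0
    return {
        "SD": sd,
        "SD%": 100.0 * sd / x_overall,
        "SG": sg,
        "SG%": 100.0 * sg / x_overall,
        "goal": 100 if matched else 0,
    }


# Hypothetical trait: overall mean 2.0, mean of selected genotypes 2.5, h2 = 0.8.
gain = selection_gain(x_overall=2.0, x_selected=2.5, h2=0.8)
print(gain)

# Hypothetical trait where an increase is desired but the selected mean is lower.
undesired = selection_gain(x_overall=50.0, x_selected=40.0, h2=0.9)
print(undesired["goal"])
```

The second call shows how a negative gain on a trait targeted for increase is flagged with a goal value of 0, the situation described for AFW.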

Treatment ranking according to the multi-trait index
Fig. 3 presents a brief visual illustration of the rankings of genotypes according to their MGIDI index values, and highlights the genotypes selected under the given selection criteria. Out of all the genotypes, AVTOV1002×C41 and AVTOV1010×C41 were selected and are highlighted in red, indicating their superior performance. Additionally, two other genotypes, AVTOV1007×C41 and AVTOV1001×C41, were also ranked among the top four genotypes based on their performance across multiple traits. These genotypes possess favourable characteristics for the evaluated traits.

Fig. 2. Phenotypic correlation heatmap between the traits evaluated (the practice of genotype selection involved a preliminary examination of traits that contribute to multicollinearity). See Table 4 for the full trait names.

Strengths and weaknesses of hybrids
Fig. 4 represents the strengths and weaknesses of the selected genotypes, showing the contribution of each of the four factors to the MGIDI. Factors with a greater contribution are plotted closer to the centre, while those with a lesser contribution are plotted toward the edge. The information provided by these contributions can assist in the selection of appropriate parents in crossbreeding programs. FA1 had a smaller effect on hybrids AVTOV1010×C41, AVTOV1001×C41 and AVTOV1002×C41, indicating that these hybrids were good performers for most FA1-related traits, namely NFPC, NFrPC, FL, FW, AFW and NLPF. FA2 had the lowest effect on hybrids AVTOV1010×C41 and AVTOV1001×C41, indicating that these two hybrids have strengths in NFPP, YPP, Yield, DFPF and wilt. FA3 had a lower impact on the AVTOV1010×C41 hybrid, suggesting that this hybrid performed well for most of the FA3-correlated traits, namely FSI, TSS, and TLCV. Finally, FA4 had a smaller effect on hybrids AVTOV1010×C41, AVTOV1001×C41 and AVTOV1002×C41, indicating that these three genotypes have strengths in PHLH and/or NPBLH. The ranking of genotypes based on their combinations of multiple traits revealed that hybrids AVTOV1002×C41 and AVTOV1010×C41 are the two highest performing.

Notes (Table 3): X0 = overall mean, Xs = mean of selected hybrids and their parents, SD = selection differential, h² = broad-sense heritability on the entry-mean basis, SG = selection gain, goal = selection gains match desired sense (100 for yes and 0 for no). See Table 4 for the full trait names.

Location and cultivation environment
The experiment was conducted at the Olericulture Division of the Horticulture Research Centre (HRC) of the Bangladesh Agricultural Research Institute (BARI), Gazipur-1701 (23°59′27.7″N, 90°24′42.4″E, 8.4 masl). The climate of the experimental site is subtropical, characterized by heavy rainfall from May to October and medium to scanty rainfall during the rest of the year. The monthly average minimum and maximum temperatures during the crop period were 24.7 °C and 32.5 °C, respectively. The monthly average relative humidity was 79.35 %. The monthly average rainfall during the crop period was 183.29 mm. The soil of the experimental site belongs to the general soil type (Shallow Red Brown). The topsoil was clay loam in texture, with a pH ranging from 6.0 to 6.6 and an organic matter content of 0.84 %. The experimental area was flat, above flood level, and had an irrigation and drainage system available. The experimental area was fertilized with the recommended dose of fertilizers (550, 450 and 250 kg/ha of urea, TSP and MOP, respectively, and cow dung at 10 t/ha).

Fig. 4. The strengths and weaknesses view of the selected genotypes is shown as the proportion of each factor on the computed multi-trait genotype-ideotype distance index (MGIDI). The smaller the proportion explained by a factor (closer to the external edge), the closer the traits within that factor are to the ideotype. The dashed line shows the theoretical value if all the factors had contributed equally. The traits grouped into each factor were: FA1: NFPC, NFrPC, FL, FW, AFW and NLPF; FA2: NFPP, YPP, Yield, DFPF and wilt; FA3: FSI, TSS and TLCV; and FA4: PHLH and NPBLH.

Plant material and experimental design
The experiment was laid out in a randomized complete block design (RCBD) with two replications. The plant materials were seeds of 14 selected cross combinations from a Line × Tester mating design and their parents: seven female parents (AVTOV1001, AVTOV1002, AVTOV1006, AVTOV1007, AVOV1008, AVTOV1009 and AVTOV1010) and two male parents (C41 and BARI-4), with diverse genetic bases and heat tolerance qualities. Altogether, seeds of 23 genotypes were sown densely on 18 May 2012 in the primary seedbed. Forty-day-old tomato seedlings were transplanted into the main field under transparent polytunnels at the same location where the F1 (experimental hybrids) were synthesized. The polytunnels were 2.3 m wide with two-unit beds of 0.8 m × 1 m, keeping a 30 cm drain between the 14-unit beds. Each unit bed contained double rows accommodating 24 plants.

(From Table 4: the disease incidence scale given by Mew and Ho [9]; % decrease.)
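For readers who want to reproduce a comparable layout, the sketch below shows one way to randomize an RCBD in Python: every genotype appears once per block, and the order is shuffled independently within each block. The genotype labels are placeholders, the seed is arbitrary, and the `rcbd_layout` helper is hypothetical; the article does not describe how the original randomization was carried out.

```python
import random


def rcbd_layout(genotypes, n_blocks, seed=None):
    """Return one independently shuffled genotype order per block."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        order = list(genotypes)  # copy so the input list is left unchanged
        rng.shuffle(order)
        layout.append(order)
    return layout


# Placeholder labels for 23 entries (14 hybrids plus 9 parents) in two blocks.
genotypes = [f"G{i:02d}" for i in range(1, 24)]
layout = rcbd_layout(genotypes, n_blocks=2, seed=1)
for block_number, order in enumerate(layout, start=1):
    print(f"Block {block_number}: {' '.join(order)}")
```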

Assessed traits and collection of data
The harvests began at the maturation stage and were carried out twice a week. Throughout the production cycle, five random competitive plants per treatment were selected, and observations were recorded. Observations for all 16 characters were recorded for each of the genotypes and developed F1 hybrids (Table 4).

Statistical analysis
The theory of the MGIDI index is arranged into four main steps to select the best genotypes based on information from multiple traits [6-8,10].

Rescaling the traits
Let $X_{ij}$ be a two-way table with $i$ rows (genotypes/treatments) and $j$ columns (traits). The rescaled value for the $i$th row and $j$th column ($rX_{ij}$) is given by [6]:

$$rX_{ij} = \frac{\eta_{nj} - \varphi_{nj}}{\eta_{oj} - \varphi_{oj}} \times (\theta_{ij} - \eta_{oj}) + \eta_{nj} \qquad (1)$$

where $\varphi_{oj}$ and $\eta_{oj}$ represent the minimum and maximum original values for the $j$th trait, respectively, while $\varphi_{nj}$ and $\eta_{nj}$ represent the new minimum and maximum values for the $j$th trait after rescaling, respectively. The original value for the $j$th trait of the $i$th genotype is represented by $\theta_{ij}$. The values of $\varphi_{nj}$ and $\eta_{nj}$ were selected based on the desired gains for each trait: for traits with positive desired gains, $\varphi_{nj} = 0$ and $\eta_{nj} = 100$ were used, while for traits with negative desired gains, $\varphi_{nj} = 100$ and $\eta_{nj} = 0$ were used, as suggested by Olivoto and Nardino [7].
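The rescaling step can be sketched in a few lines of Python. The `rescale_trait` function below is a hypothetical helper, not part of the metan package, and the trait values are illustrative. It maps each trait linearly onto a 0-100 scale so that 100 always corresponds to the desired extreme: the maximum for traits to be increased, the minimum for traits to be decreased.

```python
def rescale_trait(values, increase=True):
    """Linearly rescale one trait to 0-100, with 100 as the desired extreme."""
    old_min, old_max = min(values), max(values)
    if old_max == old_min:
        raise ValueError("trait has no variation and cannot be rescaled")
    # New minimum and maximum depend on the direction of the desired gain.
    new_min, new_max = (0.0, 100.0) if increase else (100.0, 0.0)
    slope = (new_max - new_min) / (old_max - old_min)
    return [slope * (value - old_max) + new_max for value in values]


# Illustrative trait means for three genotypes.
trait = [10.0, 15.0, 20.0]
print(rescale_trait(trait, increase=True))   # highest original value maps to 100
print(rescale_trait(trait, increase=False))  # lowest original value maps to 100
```

After this step every trait points in the same direction, which is what allows the ideotype to be defined as a vector of 100s.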

Factor analysis
The second step is to compute an exploratory factor analysis (FA) with $rX_{ij}$ to account for the correlation structure and dimensionality reduction of the data, as follows:

$$X = \mu + L\mathbf{f} + \varepsilon \qquad (2)$$

where $X$ is a $p \times 1$ vector of rescaled observations; $\mu$ is a $p \times 1$ vector of standardized means; $L$ is a $p \times f$ matrix of factorial loadings; $\mathbf{f}$ is an $f \times 1$ vector of common factors; and $\varepsilon$ is a $p \times 1$ vector of residuals, with $p$ and $f$ being the number of traits and common factors retained, respectively. The eigenvalues and eigenvectors are obtained from the correlation matrix of $rX_{ij}$. The initial loadings are obtained considering only factors with eigenvalues higher than one. Then, the varimax rotation criterion [11] is used for the analytic rotation and estimation of final loadings. The scores are then obtained as follows:

$$F = Z(A^{T}R^{-1})^{T} \qquad (3)$$

where $F$ is a $g \times f$ matrix with the factorial scores; $Z$ is a $g \times p$ matrix with the (rescaled) standardized means; $A$ is a $p \times f$ matrix of canonical loadings; and $R$ is a $p \times p$ correlation matrix between the traits. Here $g$, $f$ and $p$ represent the number of rows/genotypes/treatments, factors retained and analyzed traits, respectively.
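Two bookkeeping steps of the factor analysis are easy to illustrate without a linear algebra library: retaining factors with eigenvalues above one, and computing communalities as the sum of squared loadings per trait. The Python sketch below uses hypothetical eigenvalues and loadings chosen independently for illustration; they do not come from Table 2, and the eigen decomposition and varimax rotation themselves are not shown.

```python
def retained_factors(eigenvalues):
    """Count factors with eigenvalue > 1 and their share of total variance (%)."""
    total = sum(eigenvalues)  # equals the number of traits for a correlation matrix
    kept = [value for value in eigenvalues if value > 1.0]
    explained = 100.0 * sum(kept) / total
    return len(kept), explained


def communalities(loadings):
    """Sum of squared loadings per trait across the retained factors."""
    return [sum(loading ** 2 for loading in row) for row in loadings]


# Hypothetical eigenvalues of a 4 x 4 correlation matrix.
eigenvalues = [2.0, 1.2, 0.5, 0.3]
n_kept, explained = retained_factors(eigenvalues)
print(n_kept, round(explained, 1))

# Hypothetical loadings: one row per trait, one column per retained factor.
loadings = [[0.9, 0.1], [0.8, -0.2], [0.1, 0.7], [-0.3, 0.6]]
h = communalities(loadings)
print([round(value, 2) for value in h])
print(round(sum(h) / len(h), 4))  # average communality
```

A communality close to one means that the retained factors reproduce nearly all of that trait's variance.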

Ideotype planning
By definition [Eq. (1)], the ideotype has the maximum rescaled value (100) for all analyzed traits. Thus, the ideotype can be defined by a $1 \times p$ vector $I$ such that $I = [100, 100, \ldots, 100]$. The scores for $I$ are also estimated according to Eq. (3).

The MGIDI index
The fourth and last step is the estimation of the multi-trait genotype-ideotype distance index (MGIDI), which is used to rank the treatments based on the desired values of the studied traits, as follows [6][7][8]:

$$\mathrm{MGIDI}_i = \left[\sum_{j=1}^{f} (\gamma_{ij} - \gamma_{j})^{2}\right]^{0.5} \qquad (4)$$

where $\mathrm{MGIDI}_i$ is the multi-trait genotype-ideotype distance index for the $i$th row/genotype/treatment; $\gamma_{ij}$ is the score of the $i$th row/genotype/treatment in the $j$th factor ($i = 1, 2, \ldots, g$; $j = 1, 2, \ldots, f$), with $g$ and $f$ being the number of rows/genotypes/treatments and factors, respectively; and $\gamma_{j}$ is the $j$th score of the ideotype. The row/genotype/treatment with the lowest MGIDI is then closer to the ideotype and therefore presents desired values for all the $p$ traits. The selection differential for all traits was computed considering a selection intensity of 10 %, i.e., the first two treatments/genotypes with the lowest MGIDI index were selected. The proportion of the MGIDI index of the $i$th row/genotype/treatment explained by the $j$th factor ($\omega_{ij}$) is used to show the strengths and weaknesses of genotypes/treatments and is computed as [6][7][8]:

$$\omega_{ij} = \frac{\sqrt{D_{ij}^{2}}}{\sum_{j=1}^{f} \sqrt{D_{ij}^{2}}} \qquad (5)$$

where $D_{ij}$ is the distance between the $i$th genotype/treatment and the ideotype for the $j$th factor. Low contributions of a factor indicate that the traits within such a factor are close to the ideotype.
Data manipulation and index calculation were performed in the R Software version 4.3.1 (R Core Team, 2024) using the package metan v1.18.0 [12].
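The final step can be illustrated independently of R. The Python sketch below is not the metan implementation; it is a minimal illustration that assumes factor scores are already available, computes the Euclidean distance of each genotype to the ideotype, expresses each factor's absolute distance as a share of the summed distances, and selects the genotypes with the lowest index. All function names, genotype labels and scores are hypothetical.

```python
import math


def mgidi_index(scores, ideotype_scores):
    """Euclidean distance between a genotype's factor scores and the ideotype's."""
    return math.sqrt(sum((g - i) ** 2 for g, i in zip(scores, ideotype_scores)))


def factor_contributions(scores, ideotype_scores):
    """Share of the summed absolute distances attributable to each factor."""
    distances = [abs(g - i) for g, i in zip(scores, ideotype_scores)]
    total = sum(distances)
    if total == 0:
        return [0.0 for _ in distances]  # genotype coincides with the ideotype
    return [distance / total for distance in distances]


def n_selected(n_genotypes, intensity_percent):
    """Number of genotypes to select for a given selection intensity (at least one)."""
    return max(1, round(n_genotypes * intensity_percent / 100.0))


# Hypothetical factor scores on two factors for three genotypes and the ideotype.
factor_scores = {"A": [1.0, 2.0], "B": [4.0, 3.0], "C": [-2.0, -2.0]}
ideotype = [4.0, 6.0]

index = {name: mgidi_index(s, ideotype) for name, s in factor_scores.items()}
ranking = sorted(index, key=index.get)  # lowest index first
selected = ranking[: n_selected(len(ranking), 10)]
contrib_a = factor_contributions(factor_scores["A"], ideotype)

print(index)
print(ranking, selected)
print([round(value, 3) for value in contrib_a])
print(n_selected(23, 10))  # 23 genotypes at 10 % intensity
```

With 23 genotypes and a 10 % selection intensity, the helper returns two, which matches the number of selected hybrids reported in the text.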

Limitations
None.

Fig. 1. Estimated variance components for the traits evaluated. See Table 4 for the full trait names.

Fig. 3. Treatment ranking based on the MGIDI index (the selected genotypes are shown in red and the unselected in black circles in the electronic version of the article. The circle represents the cut point according to the selection pressure ≈10 %).

Table 1
Deviance analysis and genetic parameters for traits evaluated.

Table 2
Eigenvalues, explained variance, factorial loadings after varimax rotation, and communalities obtained in the factor analysis.

Table 3
Predicted genetic gain for the effective traits in the MGIDI index.

Table 4
Code, description and goal for selection of the traits evaluated.