USE OF COMPUTATIONAL INTELLIGENCE IN THE GENETIC DIVERGENCE OF COLORED COTTON PLANTS

The objective of this work was to analyze the genetic diversity among 12 colored fiber cotton genotypes using conventional methods and artificial neural networks, based on the technological characteristics of the fiber and on productivity in terms of cottonseed and cotton fiber yield. The experiment was conducted in an experimental area located at Fazenda Capim Branco, belonging to the Federal University of Uberlândia, in the city of Uberlândia, Minas Gerais. Twelve genotypes of colored fiber cotton were evaluated, 10 from the Cotton Genetic Improvement Program (PROMALG): UFUJP 01, UFUJP 02, UFUJP 05, UFUJP 08, UFUJP 09, UFUJP 10, UFUJP 11, UFUJP 13, UFUJP 16 and UFUJP 17, and two commercial cultivars: BRS Rubi (RC) and BRS Topázio (TC). The experimental design was complete randomized blocks (CRB) with three replications. The following evaluations were carried out at full maturation: yield of cottonseed (kg ha-1) and the technological characteristics of the fiber, namely fiber length, micronaire, maturation, length uniformity, short fiber index, elongation and strength, measured with the HVI (High Volume Instrument) device. Genetic dissimilarity was measured using the generalized Mahalanobis distance and, after obtaining the dissimilarity matrix, the genotypes were grouped using a hierarchical clustering method (UPGMA). Through computational intelligence, a discriminant analysis and the Kohonen Self-Organizing Map (SOM) were performed with Artificial Neural Networks (ANNs). SOM was able to detect differences and organize the similarities between accessions in a more coherent way, forming a larger number of groups than the method that uses the Mahalanobis matrix. It was also more accurate than the discriminant analysis, since it differentiated groups more coherently with respect to their phenotypic behavior.
The methods that use computational intelligence proved to be more efficient in detecting similarity, with Kohonen's Self-Organizing Map being the most adequate to classify and group cotton genotypes.


Introduction
Cotton is grown in more than 72 countries on five continents, and more than 90% of the world's production comes from the species Gossypium hirsutum, a large part of it white fiber (Borém and Freire 2014). Cotton is considered the most important natural textile fiber in the world, as it is used to dress almost half of the global population (Cardoso et al. 2019).

In Brazil, cotton is an important agricultural commodity. The country is the fourth largest producer in the world and the second in export volume, with Mato Grosso, Bahia and Minas Gerais as the largest producing states (ABRAPA 2020).
Cotton plants also produce naturally colored fiber, which has a small niche market. This naturally colored cotton is important because the fiber does not need to be dyed, which eliminates the water used in dyeing and reduces production costs (Dutt et al. 2008). However, these fibers are of low quality when compared to white fibers, and therefore research into the genetic improvement of the plants is required (Cardoso 2019).
In this sense, one of the pillars of plant breeding is genetic diversity, as it makes it possible to identify superior hybrid combinations with greater heterotic effect and greater heterozygosity, in order to find genotypes with characteristics of interest (Cruz et al. 2014).
The diversity among parent plants is usually measured with biometric models, through cluster analysis, principal components or canonical variables. On the other hand, there is computational intelligence, which uses models that simulate the human brain, where learning is done through mistakes, successes and experiences (Cruz and Nascimento 2018).
Computational intelligence is an alternative to conventional analysis. It has the advantage of being non-parametric, so it can analyze data even if they are unbalanced, contain experimental errors or violate model assumptions (Cruz and Nascimento 2018). Among the techniques used in plant breeding, artificial neural networks, fuzzy logic and evolutionary computing stand out.
Artificial neural networks (ANNs) simulate human behavior, with neurons and synapses transmitting information (estimating weights between them), making mistakes and getting it right, learning from experience and making decisions. In plant breeding, they are used to classify and group genotypes and to study genetic diversity, prediction of genetic value, adaptability and stability, among other things (Haykin 2008; Nascimento et al. 2013; Oliveira et al. 2013; Bhering et al. 2015; Cardoso et al. 2019).
In cotton, ANNs have been shown to be efficient. Hu et al. (2019), analyzing a cotton yarn quality prediction model based on a recurrent artificial neural network, found that it gave better accuracy. Cardoso et al. (2019) found ANNs to be more efficient than conventional statistics for studies of adaptability and stability in cotton.
One class of ANNs is Kohonen's Self-Organizing Map (SOM), which recognizes patterns and clusters and organizes data (Cruz and Nascimento 2018). It detects the dissimilarity between genotypes through competitive learning: weights are determined for the winning neuron, a radius establishes its neighborhood, and individuals are classified into neurons.
SOM is used in several areas of scientific knowledge. Rodrigo et al. (2012) observed consistent results when using SOM to assess the gait of individuals with Parkinson's disease. Silva (2018) used SOM to estimate genetic divergence in corn and found that the ANN results diverged from those of multivariate methods.
Based on the above, the objective of this work was to analyze genetic diversity through conventional methods and artificial neural networks among 12 colored fiber cotton genotypes, using the technological characteristics of the fiber, yield and productivity of cotton and cottonseed.

Material and Methods
The experiment was conducted in an experimental area located at Fazenda Capim Branco (18°52'S; 48°20'W and 805 m altitude), belonging to the Federal University of Uberlândia, in the municipality of Uberlândia, Minas Gerais, in the 2013/14, 2014/15, 2015/16, 2016/17 and 2017/18 seasons. The city has an average air temperature of 22.4 °C, an average relative humidity of 70% and an average annual rainfall of 1,584 mm. The soil of the experimental area is a dystrophic Dark Red Latosol with a clay texture.
The experimental design was complete randomized blocks (CRB) with three replications. The experimental plot consisted of four rows of five meters, spaced one meter apart, with the useful area composed of the two central rows, discarding 0.5 m from each end of the row.
At full maturity, the weight of cottonseed (kg ha-1) and fiber yield were evaluated. The technological characteristics of the fiber were analyzed in the fiber quality analysis laboratory of the Minas Gerais Association of Cotton Producers (AMIPA). These technological characteristics were fiber length, micronaire, maturation, length uniformity, short fiber index, elongation and strength, measured with the HVI (High Volume Instrument) device.
The data were submitted to univariate and multivariate analyses of variance, from which the means used in the subsequent analyses were obtained. The genetic dissimilarity between pairs of genotypes was estimated by the generalized Mahalanobis distance ($D^2_{ii'}$):
$$D^2_{ii'} = \delta' \Psi^{-1} \delta$$
in which $D^2_{ii'}$ is the generalized Mahalanobis distance between genotypes $i$ and $i'$; $\Psi$ is the matrix of residual variances and covariances; $\delta' = [d_1 \; d_2 \; \dots \; d_v]$, where $d_j = Y_{ij} - Y_{i'j}$; and $Y_{ij}$ is the mean of the $i$-th genotype for the $j$-th variable.
After obtaining the dissimilarity matrix, the genotypes were grouped using the Unweighted Pair Group Method with Arithmetic Mean (UPGMA), generating a dendrogram in which the distance between a genotype $k$ and the group formed by individuals $i$ and $j$ is given by:
$$d_{(ij)k} = \frac{d_{ik} + d_{jk}}{2}$$
Through computational intelligence, a discriminant analysis and Kohonen's Self-Organizing Map (SOM) were performed using Artificial Neural Networks (ANNs). The SOM had a feedforward architecture with an input layer and an output layer, the latter called the topological map, and its training is divided into three stages (Cruz and Nascimento 2018):
1st Stage: Definition of the topological map and establishment of random weights. The following parameters were used to build the SOM: a two-dimensional map with three rows and three columns of neurons (Figure 1), 2000 epochs, a neighborhood radius of 2, the dist activation function (Euclidean distance) and a hexagonal topology. The synaptic weights and an input vector Xi are then initialized.
2nd Stage: Given the input values, the distances for the competition were calculated, and the winning neuron was established as the one with the shortest distance to the input data. The neighboring neurons had their weights adjusted in relation to the input, with the neighborhood adjustment governed by the learning rate (η) through an expression with the following terms: η = measurement of the learning rate; w = weight of neurons; xi = input vector; f(x) = half of the learning rate.
3rd Stage: Each input participates in the competition, which completes one epoch, and stage 2 is repeated until there are no major changes between the input weights and the current weights.
The discriminant analysis was performed by means of ANNs, using a neural network of the Multilayer Perceptron (MLP) type formed by two layers containing between two and five neurons each, with the logarithmic activation function. The training algorithm chosen was Trainlm (Levenberg-Marquardt backpropagation). Training was set at 5000 epochs and an error rate of 0.01. The network had 1000 observations, of which 80% were used for training and 20% for validation (Figure 2). The analyses were performed using the statistical program GENES, integrated with the R and Matlab software (Cruz 2016).
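The three SOM stages described above can be illustrated with a minimal Python sketch. It is not the GENES/R/Matlab implementation used in this work: it assumes a rectangular 3 x 3 grid instead of the hexagonal topology, a Gaussian neighborhood function, a linearly decaying learning rate and radius, fewer epochs than the 2000 used here, and hypothetical standardized means for four genotypes labeled A to D. The names `train_som` and `best_matching_unit` are illustrative.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid position whose weights are closest (Euclidean) to x."""
    return min(weights, key=lambda node: math.dist(weights[node], x))


def train_som(data, rows=3, cols=3, epochs=200, lr0=0.5, radius0=2.0, seed=0):
    """Train a small SOM and return {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Stage 1: define the topological map and start from random weights.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        # Assumed schedule: learning rate and radius shrink linearly.
        frac = 1.0 - epoch / epochs
        lr = lr0 * frac
        radius = max(radius0 * frac, 0.5)
        for x in data:
            # Stage 2: competition, the winner is the neuron nearest the input.
            win = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_dist = math.dist(node, win)
                if grid_dist <= radius:
                    # Assumed Gaussian neighborhood: the winner moves most.
                    h = math.exp(-(grid_dist ** 2) / (2 * radius ** 2))
                    for j in range(dim):
                        w[j] += lr * h * (x[j] - w[j])
        # Stage 3: an epoch ends once every input has competed.
    return weights


# Hypothetical standardized means (two traits) for four genotypes.
genotype_means = {
    "A": [0.05, 0.10], "B": [0.10, 0.05],
    "C": [0.90, 0.95], "D": [0.95, 0.90],
}
som = train_som(list(genotype_means.values()))
groups = {name: best_matching_unit(som, x) for name, x in genotype_means.items()}
print(groups)
```

Genotypes assigned to the same neuron would be read as one group, and the grid distance between neurons as a measure of dissimilarity between groups. With these toy inputs, genotypes from the two well-separated pairs are expected to fall on different neurons.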

Results and Discussion
The means of the characteristics formed distinct groups for every characteristic, which indicates variability among the evaluated genotypes. In general, the commercial genotypes had the best averages, with the exception of elongation, which shows that the variability between these genotypes needs to be explored (Table 1), given the responsiveness of the PROMALG genotypes to the environment. For the dendrogram, the cut was made at the abrupt change in level (Miranda 2019), which resulted in four distinct groups (Figure 3). The cophenetic correlation coefficient (r) was 0.95, which indicates that the dendrogram is a good graphic representation of the dissimilarity matrix (Nardino et al. 2017).
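To make the relation between the UPGMA dendrogram and the cophenetic correlation concrete, the following Python sketch clusters a small dissimilarity matrix by average linkage and correlates the original distances with the heights at which each pair is first joined. The four genotypes G1 to G4 and their distances are hypothetical, not values from this experiment, and the helper names `upgma` and `pearson` are illustrative.

```python
from itertools import combinations
from math import sqrt


def upgma(dist):
    """Average-linkage clustering of {frozenset({a, b}): distance}.

    Returns (merges, cophenetic): merges is a list of
    (cluster_a, cluster_b, height); cophenetic maps each pair to the
    height at which it is first joined in the dendrogram.
    """
    labels = sorted({x for pair in dist for x in pair})
    clusters = [frozenset([x]) for x in labels]
    merges, coph = [], {}

    def avg(c1, c2):
        # Mean of all original distances between members of two clusters.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: avg(*p))
        height = avg(c1, c2)
        merges.append((c1, c2, height))
        for a in c1:
            for b in c2:
                coph[frozenset((a, b))] = height
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges, coph


def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical dissimilarities between four genotypes.
d = {
    frozenset(("G1", "G2")): 2.0, frozenset(("G3", "G4")): 3.0,
    frozenset(("G1", "G3")): 10.0, frozenset(("G1", "G4")): 11.0,
    frozenset(("G2", "G3")): 9.0, frozenset(("G2", "G4")): 12.0,
}
merges, coph = upgma(d)
pairs = list(d)
r = pearson([d[p] for p in pairs], [coph[p] for p in pairs])
print([h for _, _, h in merges], round(r, 3))
```

A value of r close to 1 means the dendrogram heights distort the original dissimilarities very little, which is the sense in which 0.95 is read as a good representation.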
The four groups formed by the UPGMA method were strongly influenced by fiber strength and cottonseed productivity, the factors that contributed most to genetic dissimilarity (Figure 4). Therefore, the greater the variation in cottonseed productivity and fiber strength, the greater the divergence between the genotypes. The most productive and strongest genotypes, BRS-Rubi (RC) and BRS-Topázio (TC), are in the same group. The Mahalanobis method is one of the most used in breeding to estimate dissimilarity; however, its reliability requires that the data follow a multinormal distribution and that the residual covariance matrices be homogeneous. To circumvent these limitations, computational intelligence is an alternative, as it depends only on learning and makes no assumptions about the model. It uses non-linear structures such as ANNs, which emulate the human brain, simulating and adjusting information through synaptic weights, similar to biological neural connections (Cruz and Nascimento 2018).
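The dependence of the Mahalanobis distance on the residual covariance matrix, which is why the assumptions above matter, can be seen in a short Python sketch of D² = δ'Ψ⁻¹δ. The means and covariance matrices below are hypothetical, and the helper names `solve` and `mahalanobis_d2` are illustrative; the sketch solves a linear system rather than inverting Ψ explicitly.

```python
def solve(matrix, rhs):
    """Solve matrix @ z = rhs by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Partial pivoting: bring the largest entry of the column up.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def mahalanobis_d2(mean_i, mean_k, psi):
    """Generalized Mahalanobis distance between two genotype mean vectors."""
    delta = [a - b for a, b in zip(mean_i, mean_k)]
    z = solve(psi, delta)  # z = inv(psi) @ delta
    return sum(d * zj for d, zj in zip(delta, z))


# Hypothetical means of two traits for two genotypes.
g1, g2 = [10.0, 5.0], [6.0, 4.0]

# Uncorrelated residuals: each squared difference is scaled by its variance.
print(mahalanobis_d2(g1, g2, [[4.0, 0.0], [0.0, 1.0]]))

# Identity matrix: D^2 reduces to the squared Euclidean distance.
print(mahalanobis_d2(g1, g2, [[1.0, 0.0], [0.0, 1.0]]))
```

The same pair of genotypes is closer or farther apart depending on Ψ, so a residual covariance matrix that is poorly estimated or not homogeneous changes the dissimilarity matrix and, through it, the UPGMA groups.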
Through discriminant analysis with graphic dispersion using the ANNs, eight distinct groups were formed. The RC and TC genotypes remained isolated, as did UFUJP-16 and UFUJP-17, similar to the dendrogram. However, the ANNs were more representative for the other clusters, which were more coherent with the means of the genotypes (Figure 5). The genotypes with the lowest productivity (UFUJP-01, UFUJP-05 and UFUJP-13) were allocated to different groups, which was not observed in the dendrogram (Figure 3). This suggests that each method attributes a different importance to the characteristics. The UFUJP-05 genotype had the second highest elongation (8.99) and UFUJP-01 one of the longest fiber lengths (24.29).
The lowest productivity was 1829.41 kg ha-1 (UFUJP-13) (Table 1), and this genotype was grouped with UFUJP-08 and UFUJP-10. Intermediate productivity was decisive for grouping UFUJP-02 and UFUJP-09, as they had the fifth and sixth highest productivity, with UFUJP-02 having a high length value and a high short fiber index. The UFUJP-09 genotype has the highest IFC and the lowest fiber yield and, due to its unique characteristics, is isolated.
The larger number of groups formed by the ANNs was due to the method not being affected by experimental errors and tolerating unbalanced data that do not meet the assumptions. Another relevant point is that ANNs are not based only on means and variances (Cruz and Nascimento 2018), but also increase the number of observations and decrease the apparent error rate by quantifying the weights between neurons.
The Kohonen Self-Organizing Map (SOM) using ANNs has the ability to detect and organize the similarities of the input patterns through competitive learning, simulating the cerebral cortex with connections between the strongest neurons due to their proximity (Braga 2011; Cruz and Nascimento 2018).
Light colors show less distance between neurons, which means that the characteristics have more importance for the distinction of groups. On the other hand, dark colors represent greater distances, and therefore IFC, ALG and Pkg were the highest determining weights, respectively, in the formation of groups, which agrees with the contribution analysis of Singh (1981) only for productivity. It is possible to check the characteristics and their weights in the activation of each SOM neuron. UHML, STR and Re% were correlated with each other, as they have the same distance pattern, represented by the same color pattern. Only the IFC and ALG characteristics did not contribute to the formation of the row 1, column 1 neuron, where RC and TC were grouped, which suggests that they were the characteristics that contributed the least to the classification (Figure 6). It is possible to observe a group for the RC and TC genotypes, corroborating the other methods (Figure 7). This distinction is justified by their having the highest averages for all characteristics.
There was a good representation of the SOM method. The genotypes with lower productivity were isolated (UFUJP-01, UFUJP-05 and UFUJP-13), demonstrating that this characteristic was very important in all methods to determine the classification.
The high simulation capacity of neural networks expands the input data by estimating new values, validating them and adjusting the weights for each variable in the connections between neurons. The groups are organized by similarity through competitive learning (Cruz and Nascimento 2018), which allows a better distinction between genotypes (Figure 8). The neuron in row 1, column 3 grouped the most genotypes, namely those with the lowest productivity and low fiber quality. This grouping was distant from the neurons that grouped the highest productivity, in the first column, so there is less similarity between these accessions. This method distinguishes and classifies neurons (genotypes) by distance, that is, the closer the neurons, the greater the affinity between them. SOM was able to detect differences and organize similarities between accessions in a more coherent way, forming a larger number of groups than the UPGMA method and the graphical dispersion, and it was also more accurate than the discriminant analysis. This does not corroborate Silva (2018), who, analyzing the genetic divergence between partially inbred lines of corn by multivariate methods and artificial neural networks, found greater coherence for the canonical variables.

Conclusions
The methods that use computational intelligence proved to be more efficient at detecting similarity. Kohonen's Self-Organizing Map was the most suitable for classifying and grouping the colored fiber cotton genotypes.