Characterization of Selected Maize Inbred Lines Adapted to Highland Agro-Ecologies of Ethiopia Using Morphological and Molecular Genetic Distances

Characterization of available genetic diversity is a crucial step in effective crop improvement, as it provides the basis for analyzing combining ability and exploiting heterosis of inbred lines in hybrid breeding. Characterization involves evaluating quantitative and qualitative attributes of given genotypes in order to determine their usefulness, structure, genetic variability and the relationships among them. The objectives of this study were to characterize elite maize inbred lines adapted to highland agro-ecologies and to classify groups of similar inbred lines by means of cluster and principal component analysis based on morpho-agronomic and SNP marker data. A total of twenty-three maize inbred lines from the highland breeding program at Ambo Plant Protection Research Center formed the core plant materials and were characterized for 23 morpho-agronomic traits and with SNP markers. The lines were sown in a 12 × 2 alpha lattice design with two replications at each location (Ambo and Holetta agricultural research centers). The inbred lines differed highly significantly for all traits evaluated (p ≤ 0.01). The PCA indicated that the first nine principal components (PCs), with eigenvalues greater than unity, accounted for 83% of the total variation among the 23 inbred lines for all traits. Cluster analysis based on morpho-agronomic traits using the unweighted pair group method with arithmetic averages (UPGMA) grouped the 23 tested lines into five distinct classes and an outlier genotype, whereas cluster analysis based on the molecular genetic distance matrix categorized the entries into four main groups. Five inbred lines (L5, L8, L18, L12 and L7) with comparatively high yield and other phenotypic characters of interest were selected using the morpho-agronomic traits and SNP-based genotyping for cultivar development and germplasm utilization.


Introduction
Maize (Zea mays L., 2n = 20) is a major crop of the world, belonging to the tribe Maydeae of the grass family Poaceae. It has great worldwide significance as food, feed and a source of industrial products [1], and it plays a vital role in the world economy and market [2]. Apart from its use in manufacturing mixed feed, maize is a raw material for producing corn starch, corn oil and corn syrups [3,4]. Globally, maize is the most-produced cereal, followed by rice and wheat, but in terms of dietary intake it ranks third after rice and wheat [5]. It is an important source of protein, accounting for up to 60% of the daily human protein supply in sub-Saharan Africa [6]. Phenotypic and molecular characterization of elite highland maize inbred lines is vital for determining the existing genetic variability and the relationships among them, so that the germplasm can be further utilized in improved cultivar development. Characterization of available genetic diversity is a crucial step in effective crop improvement, as it provides the basis for analyzing the combining ability of inbred lines and exploiting heterosis. It involves evaluating quantitative and qualitative attributes of given genotypes in order to differentiate them and determine their usefulness, structure, genetic variability and the relationships among them [7,8].
The magnitude and nature of genetic variability among genotypes influence the choice of breeding approach for the genetic improvement of a crop [9]. Genetic diversity is the probability that two randomly sampled alleles are different [10], and genetic distance reflects the amount of genetic difference present among genotypes. These measures can be calculated from morphological characteristics and/or molecular markers. Even though phenotypic evaluation is useful for grouping inbred lines and populations, phenotypic traits have limitations in distinguishing variation among highly related genotypes and elite breeding germplasm [11,12] due to genotype by environment interaction (GEI). Advances in molecular technology have produced a shift towards detecting individual differences using molecular markers such as SNP markers [13,14]. Knowledge of the nature and magnitude of genetic variability of elite maize inbred lines is essential; however, only a limited number of highland maize inbred lines have been characterized so far, because little research has been conducted for this agro-ecology.
Knowledge of the magnitude of variation in breeding material and of the associations among new elite maize germplasm adapted to the highland agro-ecology of Ethiopia is therefore essential for identifying parents for further breeding in developing improved cultivars [15,16]. Research reports have indicated large genetic differences among most lines of quality protein maize adapted to the highlands of Ethiopia [17]. In a molecular characterization study, high genetic distances among most pairs of maize inbred lines were reported [18]. The maize inbred lines used in the present study were previously developed from foundation populations from Kenya and advanced in the Ethiopian highland maize breeding program at Ambo Plant Protection Research Center. However, they had not been characterized to identify their usefulness and their prospects for use in the breeding program. Therefore, the objectives of this study were to characterize elite maize inbred lines adapted to highland agro-ecologies and to classify groups of similar inbred lines by means of cluster and principal component analysis based on morpho-agronomic and SNP marker data.

Study area description
This study was conducted at two highland maize research testing sites, Ambo and Holetta, in the main season of 2014. Ambo is located west of Addis Ababa at 8°59′ N latitude and 37°51′ E longitude, at an elevation of 2225 meters above sea level (m.a.s.l.). It lies in the wet sub-humid highland agro-ecological region of central Ethiopia. The area receives an average annual rainfall of 1115 mm. The rainy season extends from April to October, with maximum rainfall from June to August. The soil of the experimental site is a clay Vertisol. Maximum and minimum temperatures at the centre are 26°C and 11°C, respectively. Holetta is also located west of Addis Ababa, at 9°03′ N latitude and 38°30′ E longitude, at an altitude of 2390 m.a.s.l. The average annual rainfall of the area is 1100 mm. The soil of this site is a Eutric Nitisol, and the minimum and maximum temperatures are 6°C and 26°C, respectively.

Experimental design and field management
The evaluated lines were sown in a 12 × 2 alpha lattice design with two replications at each location (Ambo and Holetta Agricultural Research Centers). The trials were planted during the last week of May in the 2014 main season at both testing sites. Trials were hand planted when soil moisture was adequate to ensure good germination. Each entry was planted with two seeds per hill in a two-row plot 3.75 m long, with 0.75 m between rows and 0.25 m between plants within a row. At 35 days after germination each entry was thinned to a single seedling per hill to give a plant population of 53,333 plants ha⁻¹.
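The stated plant population follows directly from the row and plant spacing. The short sketch below reproduces the arithmetic; the function name is illustrative and not part of the original protocol.

```python
def plants_per_hectare(row_spacing_m: float, plant_spacing_m: float,
                       plants_per_hill: int = 1) -> float:
    """Plant density implied by a rectangular planting grid."""
    area_per_hill_m2 = row_spacing_m * plant_spacing_m
    return 10_000 / area_per_hill_m2 * plants_per_hill

# Spacing reported in the text: 0.75 m between rows, 0.25 m within rows,
# one seedling per hill after thinning.
density = plants_per_hectare(0.75, 0.25)
print(round(density))  # 53333
```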
Weeds were controlled by applying the pre-emergence herbicide Primagram Gold 660 SC at a rate of three liters per hectare. Fertilizers were applied at both locations: 69 kg ha⁻¹ of phosphate (P₂O₅) in the form of diammonium phosphate (DAP) and one-third of the 119 kg ha⁻¹ of nitrogen (N), in the form of urea, were applied as basal dressing at sowing. The remaining two-thirds of the nitrogen were side-dressed in equal splits at 35 days after germination and before tasseling (male flowering).
Days to anthesis was recorded as the number of days from planting to 50% pollen shed. Anthesis-silking interval was calculated as the difference between days to silking and days to anthesis. Leaf orientation was recorded at flowering. Silk, stem and tassel colors were recorded on a plot basis. Tassel size was recorded as small, medium or large, while tassel peduncle length and tassel length were measured after the milk stage. Leaf length was measured from the ligule to the apex, and leaf width was taken midway along the leaf length. Leaf area was the area of the uppermost ear leaf, computed as maximum width × length × 0.75 and expressed in square centimeters.
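The two derived traits described above can be written as simple functions. This is a minimal sketch; the function names and the example measurements are hypothetical, and only the formulas come from the text.

```python
def anthesis_silking_interval(days_to_silking: float, days_to_anthesis: float) -> float:
    """ASI: days to silking minus days to anthesis."""
    return days_to_silking - days_to_anthesis

def leaf_area_cm2(length_cm: float, max_width_cm: float,
                  shape_factor: float = 0.75) -> float:
    """Leaf area of the uppermost ear leaf: maximum width x length x 0.75."""
    return max_width_cm * length_cm * shape_factor

# Hypothetical measurements for one plant.
print(anthesis_silking_interval(108, 105))  # 3
print(leaf_area_cm2(80.0, 9.0))             # 540.0
```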
The number of primary tassel branches was counted on a plant basis. Ear diameter was measured at the mid-section of the ear. Number of rows per ear was recorded as the average number of kernel rows per ear, and number of kernels per row as the average number of kernels per row. Plant height was measured as the distance from ground level to the first tassel branch, while ear height was measured as the distance from ground level to the node bearing the uppermost ear.
Ear aspect was scored on a 1-5 scale based on visual evaluation of harvested ears for general performance with regard to diseases and uniformity. Plant aspect was scored as the overall phenotypic appearance of the plant, where 1 = excellent and 5 = poor. Ear length, from the base to the tip, was measured as the average length of 10 randomly sampled ears from each plot. Grain weight from all the ears of each experimental unit was measured and used to calculate grain yield, adjusted to 12.5% moisture content and expressed in t ha⁻¹.
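The text does not give the formula used for the moisture adjustment. The sketch below assumes the common dry-matter correction, in which plot grain weight is scaled by (100 − measured moisture) / (100 − 12.5) and then converted to tonnes per hectare using the plot area. The function name, the example grain weight and the example moisture reading are hypothetical; the plot area is derived from the two-row, 3.75 m, 0.75 m layout described earlier.

```python
def grain_yield_t_ha(grain_weight_kg: float, moisture_pct: float,
                     plot_area_m2: float,
                     standard_moisture_pct: float = 12.5) -> float:
    """Plot grain weight converted to t/ha at a standard moisture content.

    Assumed formula (not stated in the source):
    kg/plot * (100 - MC) / (100 - 12.5) * (10,000 / plot area) / 1000
    """
    dry_matter_ratio = (100.0 - moisture_pct) / (100.0 - standard_moisture_pct)
    adjusted_kg = grain_weight_kg * dry_matter_ratio
    return adjusted_kg * (10_000.0 / plot_area_m2) / 1000.0

# Two rows, 0.75 m apart, 3.75 m long -> 5.625 m^2 per plot.
plot_area = 2 * 0.75 * 3.75

# Hypothetical plot: 2.0 kg of grain harvested at 20% moisture.
example = grain_yield_t_ha(2.0, 20.0, plot_area)
print(round(example, 2))  # 3.25
```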

DNA extraction and genotyping
Plants were raised in plastic trays in a screen-house at the Biosciences for Eastern and Central Africa (BecA) hub in Nairobi, Kenya. A single leaf from each of 10 seedlings per sample was stacked together, the tips were trimmed off, and approximately equal amounts of leaf segment were cut at once to make a bulk, which was transferred into 1.2 mL strip tubes containing two 4-mm stainless steel grinding balls (Spex CertiPrep, USA).
Genomic DNA was extracted using the cetyltrimethylammonium bromide (CTAB) method [19]. DNA concentration was measured using the Quant-iT™ PicoGreen® dsDNA assay kit (Invitrogen™, Paisley, UK) and the Tecan Infinite F200 Pro plate reader (Grödig, Austria), and normalized to 50 ng/μL. The quality of the extracted DNA was checked by digesting 250 ng of genomic DNA from 10 randomly selected samples with 3.6 units of ApeKI restriction enzyme (New England Biolabs, Boston, USA) at 75°C for three hours.
DNA samples were shipped to the Genomic Diversity Facility at Cornell University, where they were genotyped using genotyping by sequencing (GBS) [20] with ApeKI as the restriction enzyme and 96-plex multiplexing.

Data analysis
Data were collected on 23 morpho-agronomic traits according to the maize descriptors of the International Board for Plant Genetic Resources (IBPGR) and CIMMYT [21]. Abbreviations, the list of traits used in the study and their descriptions are summarized in Tables 2 and 3. Phenotypic data were analyzed using PROC PRINCOMP of SAS version 9.1.3 [22] and MINITAB [23].

Table 3: Genetic distance matrix based on phenotypic traits (above diagonal) and SNP markers (below diagonal) for all pair-wise combinations of 23 inbred lines adapted to highland areas of Ethiopia.
During data analysis, entries were considered fixed factors, whereas replications and incomplete blocks within replications were treated as random factors. Multivariate analyses, namely principal component analysis (PCA) and cluster analysis, were performed. PCA was used to assess the importance and contribution of the morpho-agronomic characters in explaining the variation.
Trait means were standardized so that all traits had equal weight in the analysis [24]. The standardized mean values (the mean of each trait was subtracted from the data values and the result divided by the standard deviation) were used to perform PCA. Euclidean distance matrices from both the morpho-agronomic and the molecular marker data were used in cluster analysis to determine phenotypic and genotypic interrelationships among the lines.
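The standardization and distance steps described above can be sketched as follows. This is an illustration only: the trait values and line labels are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not say which form was used.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical trait means for four lines; columns are grain yield (t/ha),
# plant height (cm) and days to anthesis.
trait_means = {
    "A": [4.0, 180.0, 101.0],
    "B": [1.2, 150.0, 110.0],
    "C": [3.1, 175.0, 104.0],
    "D": [2.5, 160.0, 107.0],
}

def standardize(rows):
    """Z-score each trait column: (value - trait mean) / trait SD."""
    scaled_columns = []
    for col in zip(*rows):
        m, s = mean(col), stdev(col)  # sample SD is an assumption
        scaled_columns.append([(v - m) / s for v in col])
    return [list(r) for r in zip(*scaled_columns)]

def euclidean_matrix(rows):
    """Pairwise Euclidean distances between rows (lines)."""
    size = len(rows)
    return [[sqrt(sum((a - b) ** 2 for a, b in zip(rows[i], rows[j])))
             for j in range(size)] for i in range(size)]

names = list(trait_means)
z = standardize([trait_means[name] for name in names])
dist = euclidean_matrix(z)
```

After standardization every trait has mean 0 and standard deviation 1, so plant height in centimeters no longer dominates grain yield in tonnes per hectare when distances are computed.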
Correspondence between the two distance matrices was determined by the product-moment correlation derived from the Mantel Z test using NTSYS-pc version 2.1. Molecular distances between all pairs of inbred lines were calculated using the genotyping by sequencing (GBS) analysis pipeline in TASSEL version 4.0.28 [25].
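The study ran the Mantel test in NTSYS-pc. As an illustration of the idea only, and not of that software's implementation, the sketch below correlates the upper triangles of two distance matrices and estimates a one-sided p-value by jointly permuting the rows and columns of one matrix. The matrices, the number of permutations and the seed are all hypothetical.

```python
import random
from math import sqrt

def upper_triangle(m, order=None):
    """Off-diagonal upper-triangle values, optionally after relabelling lines."""
    n = len(m)
    idx = order if order is not None else list(range(n))
    return [m[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def mantel(m1, m2, permutations=999, seed=0):
    """Return (r, one-sided permutation p-value) for two distance matrices."""
    rng = random.Random(seed)
    x = upper_triangle(m1)
    r_obs = pearson(x, upper_triangle(m2))
    order = list(range(len(m1)))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)  # permute rows and columns of m2 together
        if pearson(x, upper_triangle(m2, order)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)

# Hypothetical morphological and SNP distance matrices for four lines.
morph = [[0, 2, 6, 7], [2, 0, 5, 6], [6, 5, 0, 3], [7, 6, 3, 0]]
snp = [[0, 0.16, 0.30, 0.34], [0.16, 0, 0.32, 0.30],
       [0.30, 0.32, 0, 0.20], [0.34, 0.30, 0.20, 0]]
r, p = mantel(morph, snp)
```

With only four lines there are just 24 distinct relabellings, so the p-value here is coarse; with 23 lines the permutation distribution is far richer.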
The filtered sequences were aligned to the maize reference genome B73 RefGen_v1 [26] using the Burrows-Wheeler alignment tool. This procedure provided a total of 3,825 polymorphic SNPs covering all 10 chromosomes of the maize genome. SNP loci with an allele frequency of not less than 0.05 and no missing values were selected and used in the genetic diversity analysis of the inbred lines.
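The SNP selection rule can be illustrated with a small filter. The sketch assumes that "allele frequency" means minor allele frequency and that genotypes are coded as the count of the alternate allele (0, 1 or 2) with None for a missing call; both are assumptions made for illustration, not details given in the text.

```python
def minor_allele_frequency(calls):
    """MAF for diploid calls coded as alternate-allele counts (0, 1, 2)."""
    p = sum(calls) / (2 * len(calls))
    return min(p, 1 - p)

def keep_snp(calls, min_maf=0.05):
    """Keep a locus only if it has no missing calls and MAF >= min_maf."""
    if any(c is None for c in calls):
        return False
    return minor_allele_frequency(calls) >= min_maf

# Hypothetical loci scored on 23 inbred lines.
one_carrier = [2] + [0] * 22          # MAF = 2/46, about 0.043
two_carriers = [2, 2] + [0] * 21      # MAF = 4/46, about 0.087
with_missing = [2, 2, None] + [0] * 20

print(keep_snp(one_carrier))   # False
print(keep_snp(two_carriers))  # True
print(keep_snp(with_missing))  # False
```

Under these assumptions, a variant carried by a single homozygous line out of 23 falls just below the 0.05 threshold and would be dropped.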
A dendrogram was constructed from the genetic dissimilarity matrix using the unweighted pair group method with arithmetic average (UPGMA), and the resulting tree was visualized using MEGA version 6.0.
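UPGMA repeatedly merges the two closest clusters and sets the distance from the merged cluster to every other cluster to the size-weighted average of the two old distances. The sketch below is a minimal pure-Python version written for illustration; it is not the MEGA implementation, and the labels and distances are hypothetical. Merge heights are reported as the inter-cluster distance at which the merge occurs.

```python
def upgma(labels, dist):
    """Return the merge history as (members_a, members_b, merge_distance)."""
    n = len(labels)
    clusters = {i: (labels[i],) for i in range(n)}
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    merges = []
    next_id = n
    while len(clusters) > 1:
        (a, b), height = min(d.items(), key=lambda kv: kv[1])
        merges.append((clusters[a], clusters[b], height))
        size_a, size_b = len(clusters[a]), len(clusters[b])
        updated = {}
        for c in clusters:
            if c in (a, b):
                continue
            d_ac = d[(min(a, c), max(a, c))]
            d_bc = d[(min(b, c), max(b, c))]
            # Size-weighted average keeps every original pair equally weighted.
            updated[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        d.update(updated)
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges

# Hypothetical distance matrix for four lines.
labels = ["A", "B", "C", "D"]
dist = [
    [0.00, 0.16, 0.30, 0.34],
    [0.16, 0.00, 0.32, 0.30],
    [0.30, 0.32, 0.00, 0.20],
    [0.34, 0.30, 0.20, 0.00],
]
merges = upgma(labels, dist)
for members_a, members_b, height in merges:
    print(members_a, members_b, round(height, 3))
```

In this example A and B join first at 0.16, C and D join at 0.20, and the two pairs join at 0.315, which is the mean of the four between-pair distances.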

Across locations variance analysis in phenotypic traits
The combined analysis of variance across locations (Table 2) indicated that the 23 entries were highly significantly different for all traits evaluated (p ≤ 0.01). Substantial variability existed among the lines, as shown by the wide ranges between minimum and maximum values (Figure 1). The lowest grain yield was recorded for L14 (1.13 t ha⁻¹), while L5 had the highest (4.04 t ha⁻¹). Days to male flowering ranged from 100 (L7) to 112 (L22). Inbred line L17 had the highest number of kernel rows per ear (16), while L14 had the lowest (10). Euclidean distances based on SNP markers between pair-wise comparisons of all 23 entries varied from 0.16 to 0.35 (Table 3), and the overall average genetic distance was 0.32.
The highest genetic distance (0.35) was recorded for L9 vs. L1, L11 vs. L1 and L12 vs. L5, while L17 and L16 were the most similar pair, with a distance of 0.16. The UPGMA dendrogram derived from the genetic distance matrix categorized the entries into four main groups (Figure 2). Cluster I consisted of seven inbred lines, cluster II of one inbred line and cluster III of six inbred lines. Cluster IV contained more lines than any of the other groups.
The correlation between the molecular and morphological genetic distances, as tested by the two-way Mantel test, was low in magnitude (r = 0.20) but positive.

Phenotypic cluster analysis
Cluster analysis based on morpho-agronomic traits grouped the 23 tested lines into five distinct classes and an outlier genotype (Figure 2). The number of genotypes per cluster ranged from two (clusters IV and V) to nine (cluster I). Cluster I consisted of nine inbred lines characterized by the highest number of leaves, latest anthesis date, highest ear height, highest thousand kernel weight, and highest ear diameter and ear position. Cluster II consisted of six entries that showed the maximum values for leaf orientation and silk color and the smallest values for stem color and ears per plant. Genotypes in cluster III were distinguished by the highest leaf area, leaf width and leaf length, long tassels, more ears per plant and high grain yield. Inbred lines in cluster IV had the highest tassel length, number of primary tassel branches and plant height, whereas their grain yield was the lowest. The unique features of genotypes in cluster V were the greatest number of kernel rows per ear and more kernels per row; in contrast, their plant aspect and ear aspect were poor (Table 4). Maize line L22 was an outlier from the remaining lines, exhibiting either the maximum or the minimum value for the characters studied.

Table 4 column headings: Characters, C-I, C-II, C-III, C-IV, C-V.

Principal component analysis
Agro-morphological variability was explained by a total of 23 components. The PCA indicated that the first nine principal components (PCs), with eigenvalues greater than unity, accounted for 83% of the total variation among the 23 inbred lines for all traits (Table 5). Under the first PC (17%), the major characters explaining the variation were thousand kernel weight, plant height, silk color and grain yield. The second PC, which contributed 13% of the total variation, resulted predominantly from leaf length, leaf area, ear aspect and tassel size. The third PC, which explained 12% of the total variation, was governed by tassel peduncle length and leaf orientation. The most discriminating traits under the fourth PC, which contributed 10% of the total variation, were tassel length, number of primary tassel branches, number of kernels per row and grain yield. Number of kernel rows per ear, grain yield, ears per plant, ear aspect, plant aspect and leaf orientation were important traits contributing to more than one principal component.
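The "eigenvalue greater than unity" rule can be made concrete. When PCA is run on standardized traits, the eigenvalues sum to the number of traits, so each eigenvalue divided by that sum is the share of variation explained. The sketch below applies the rule to a hypothetical set of five eigenvalues; the function name and the values are illustrative, not the study's results.

```python
def retained_components(eigenvalues, threshold=1.0):
    """Count components with eigenvalue > threshold and their share of variance."""
    total = sum(eigenvalues)
    kept = [ev for ev in eigenvalues if ev > threshold]
    return len(kept), sum(kept) / total

# Hypothetical eigenvalues from a PCA on five standardized traits (sum = 5).
eigenvalues = [2.0, 1.5, 0.8, 0.5, 0.2]
n_kept, share = retained_components(eigenvalues)
print(n_kept, round(share, 2))  # 2 0.7
```

By the same logic, and assuming the PCA here was based on the 23 standardized traits, a first PC explaining 17% of the variation would correspond to an eigenvalue of roughly 0.17 × 23 ≈ 3.9.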
As seen in Figure 3, the inbred lines in this study were distributed across all four quadrants of the principal component axes.
Line L22, which remained solitary in the clusters of Figure 2, was again plotted far apart from the other groups in Figure 3. The traits of inbred lines 11, 16 and 13 contributed highly and positively to the first principal component (PCI), while those of inbred lines 21, 5 and 8 contributed highly and negatively to PCI. On the other hand, the traits of inbred lines 2, 20, 10, 23, 13, 21 and 4 contributed highly and positively to PCII, whereas those of inbred lines 22, 12, 5, 8, 19 and 17 contributed highly but negatively to the same component.

Discussion
Characterization of genetic diversity and relationships among elite inbred lines within a given set of maize germplasm using phenotypic and molecular markers is fundamentally useful in crop improvement. It informs how the assembled germplasm can be used for further breeding, such as selecting parental lines [27], assigning heterotic groups [7,8] and creating a core set of germplasm [28].
Significant differences were found among the 23 inbred lines studied, indicating a high level of variation for various characters, which makes selection possible for the improvement of grain yield and other agronomic traits. Both morpho-agronomic and molecular characterizations were effective in distinguishing the germplasm. Inbred lines from the same group tend to make redundant contributions to crop genetic improvement; therefore, genetically distinct genotypes should be selected depending on their distances estimated from morpho-agronomic traits and/or marker information [18]. In the current study, more variation was observed for the agronomic characteristics (LA, TKWT, PH, EH, KPR and GY) than for the morphological traits (Figure 1).
The broad range of means of the inbred lines for morpho-agronomic data implies that there is great potential for the development of hybrids or open-pollinated varieties (OPVs) using these materials. The wide range in leaf area (339.14-632.65 cm²), for instance, suggests the opportunity to develop cultivars suitable for different purposes, such as intercropping and animal feed. Similar results were reported by earlier researchers in this area of study [29-32]. Morpho-agronomic data revealed the presence of significant genetic variability between the lines studied. In grouping the inbred lines tested, the dendrogram also indicated the resolution power of morpho-agronomic characters. In most cases in this study, lines clustered in the same group were related by pedigree. The best-yielding inbred line in cluster III can be utilized in the formation of higher-yielding cultivars, whereas genotypes grouped in cluster II can be used in breeding for reduced ear height and optimum plant height. Lucchin et al. [33] grouped 20 Italian flint maize landraces into clusters using agronomic characters and morphological traits. Similarly, 15 morphological characters classified 62 traditional highland maize accessions into three groups [15]. A diversity analysis of 45 maize inbred lines using morphological data [34] effectively grouped closely related inbred lines, in agreement with pedigree information. On the other hand, reports consistently point out that prevailing environmental conditions strongly influence morphological markers [15,34].
The broad range of variability observed between genotypes based on morpho-agronomic traits was further substantiated by principal component analysis, which showed that the overall variation could not be explained by a small number of eigenvectors. The first nine PCs contributed 83% of the total variation in the current study. Classification of entries was mainly due to grain yield and other agronomic characters, including plant height, ear height, ears per plant, thousand kernel weight, number of kernel rows per ear, number of kernels per row and ear diameter. PCI represented variables reflecting grain yield and its components, whereas the remaining components reflected morphological traits that contributed to yield. Morpho-agronomic characters such as leaf length, number of leaves, anthesis date, grain yield, plant aspect, kernels per row, number of primary tassel branches, leaf width and stem color had high values in the first nine components, indicating their importance as maize descriptors and their usefulness for differentiating maize genotypes. Overall, PCA made it possible to identify the most important characters for classifying the variability among the genotypes. Gissa and Kamara et al. [31,35] similarly used PCA to identify traits that explained the majority of the variation among different maize genotypes. The 83% of total variation explained in the present study is higher than the 71.8% obtained in [15]. According to those findings, traits such as ear height, kernels per row, thousand kernel weight, tassel size and leaf length contributed most to the total variability. The outcome of the PCA was consistent with the results of cluster analysis, in that the major differences between clusters were attributed to the same traits that contributed most to the first and second principal components.
The moderate molecular genetic distances exhibited by the inbred lines suggest reasonable genetic diversity, which enables a breeder to make crosses and maximize heterosis. Crosses involving parents from the most divergent clusters are expected to manifest maximum heterosis and create variability in genetic architecture. Several research reports have shown that grouping by UPGMA clustering of molecular marker data follows pedigree data [32,36]. In this study too, almost all of the SNP-based clustering agreed with the pedigree data, in agreement with earlier studies [31,37,38]. According to the pedigree data of the current study, lines L1, L2, L3 and L4 showed a close relationship, having the common parent TUXCML159, and hence grouped in the same cluster. Similarly, inbred lines L5, L6, L7, L8, L9, L10 and L11 were grouped in the same cluster and have line SADVLACML176 in common, just as lines L12, L13, L14, L15, L16, L17, L18, L19 and L20 were close due to the common parent P502 SRCML 384X176 in their pedigree. In some cases, the diversity analyses based on phenotypic and molecular markers resulted in a similar pattern of grouping of inbred lines; however, the association between the two marker types was low. Similarly, an earlier study [39] that used morphological data and SSR markers in diverse maize inbred lines found a correlation coefficient of 0.23 between the two. A study on 36 genotypes of sorghum using molecular and morphological data also confirmed the absence of a significant relationship between the markers [40]. The low association between morpho-agronomic and genetic markers may have several causes: the distribution of markers in the genome, the number of markers used and the nature of the evolutionary mechanism underlying the variation measured can all affect genetic distance estimates [41].

Conclusion
The genetic variability and relationships found among all pairs of lines reflect the uniqueness of the majority of the inbred lines used in the current work. The information from the present experiment is valuable for selecting the best parents for improved variety development. In general, the two genetic distance measures were positively associated. SNP markers grouped the genotypes into four clusters using the UPGMA clustering algorithm, in accordance with their pedigree information. The results of the current work point to the strength of SNP markers for diversity analysis and for clustering related inbred lines together more effectively than morpho-agronomic characters. This information is useful for better understanding the genetic relationships among the identified inbred lines and for using them efficiently in breeding programs to develop improved hybrid maize varieties adapted to the highland agro-ecological zone of Ethiopia.