Phenotypic correlation, path coefficient and multivariate analysis for yield and yield-associated traits in groundnut accessions

Abstract
Yield is a complex quantitative trait largely influenced by the environment. Direct selection for grain yield is less efficient in improving groundnut productivity. The selection efficiency can be enhanced by exploiting the relationship between yield and its related traits. Moreover, the use of genetically diverse parents is essential to generate genetic variation for successful selection of genotypes in a breeding program. Therefore, the study aimed at analysing the relationship between grain yield and its related traits and determining the morphological diversity among selected groundnut genotypes under natural groundnut rosette disease (GRD) infestation. The genotypes were evaluated in a 7 × 4 alpha lattice design with three replications. Data were collected on yield and yield-related traits. Correlation, path coefficient and multivariate analyses were done. The results revealed that yield was directly associated with plant height, number of pods per plant, hundred seed weight, GRD incidence and number of secondary branches. Therefore, these traits should be considered in selection when improving groundnut for yield. Cluster analysis revealed the existence of diversity among the evaluated groundnut genotypes, with no influence of geographical origin on the clustering pattern. The principal component analysis (PCA) biplot was effective in showing the genetic distance among the genotypes and the results were comparable with those of the cluster analysis. Moreover, Shannon-Weaver diversity indices revealed the existence of high diversity among the genotypes, which implies that groundnut improvement for yield is possible through selection in breeding.


ABOUT THE AUTHOR
Nelson Mubai is a researcher and lecturer at the School of Agricultural Sciences, Save University of Mozambique. His areas of research include plant breeding, genetics and biotechnology. Julia Sibiya is an Associate Professor in Plant Breeding and Academic Leader in the School of Agricultural, Earth and Environmental Sciences, University of KwaZulu-Natal, South Africa. Her research focuses mainly on maize and sorghum and on the supervision of postgraduate students. James Mwololo is a scientist working at the International Crops Research Institute for the Semi-Arid Tropics as a groundnut breeder. His current research focuses on the development of improved groundnut varieties using modern technologies and innovations to address global challenges. Patrick Okori is a Principal Scientist at the International Crops Research Institute for the Semi-Arid Tropics and country director of its Malawi office. His research focus includes legume plant breeding and genetics, coupled with extensive knowledge in allied socio-economic disciplines needed to unlock genetics in crops.

PUBLIC INTEREST STATEMENT
Groundnut yield is one of the key traits that influence adoption of improved varieties. The development of high yielding varieties would be made possible through effective selection. Moreover, the use of genetically diverse parents is essential to generate genetic variation for successful selection in breeding. The study analyzed the relationship between grain yield and its related traits and determined the diversity among selected groundnut accessions based on morphological traits under natural rosette disease (GRD) infestation. The results revealed that plant height, number of pods, hundred seed weight and GRD incidence would be key traits as a proxy to yield. The multivariate analysis revealed existence of genetic diversity and effectively showed the genetic distance among the groundnut accessions. The implication is that groundnut improvement for yield is possible through breeding. The findings are of significance to breeding programs to exploit the genetic variation while using the key traits to support selection.

Introduction
Cultivated groundnut (Arachis hypogaea L., AABB, 2n = 4x = 40), also known as peanut, is a legume crop that originated in South America through hybridization of its diploid ancestors, Arachis duranensis (AA) and Arachis ipaensis (BB), followed by spontaneous chromosome doubling (Talawar, 2004; Bertioli et al., 2015; Zhang et al., 2016). The crop is grown in tropical and subtropical countries for its high-quality oil (47-53%) and easily digestible protein (24-36%) (Maiti, 2002; Singh & Nigam, 2016). It is the sixth and third most important source of vegetable oil and protein, respectively, and ranks 13th among the world's food crops (Singh & Nigam, 2016). However, several biotic, abiotic and socio-economic factors constrain groundnut production in Malawi and other developing countries (Chala et al., 2014; Chikowo et al., 2015). Groundnut rosette disease (GRD) is among the major biotic constraints. It is a viral disease caused by a complex of three agents, namely Groundnut rosette assistor virus (GRAV), Groundnut rosette virus (GRV) and a satellite RNA (satRNA) associated with GRV, and is transmitted by an aphid (Aphis craccivora Koch). Therefore, the development of high-yielding cultivars that are resistant to both biotic and abiotic stresses and carry farmers' preferred traits should be a continuous priority.
Plant breeding programs aim to improve one or more traits at the same time, with yield increase being the most important objective (Mandal et al., 2017; Yusuf et al., 2017). Grain yield is a complex quantitative trait resulting from an interplay of various related traits (Acquaah, 2009; Kiranmai et al., 2016). It is largely influenced by the growing environment and generally has low heritability (Luz et al., 2011; Mukherjee et al., 2016). Hence, direct selection for yield is less efficient in improving groundnut productivity. Nevertheless, efficiency in yield improvement can be enhanced by exploiting the relationship between yield and its associated traits. Correlation and path-coefficient analysis make it possible to identify the traits that contribute most to progress in selection (Zaman et al., 2011). Trait association studies are more important in groundnut than in other crops because the pods are formed underground, so it may not be possible to effect proper selection prior to harvesting (Kiranmai et al., 2016). Correlation and path-coefficient analyses have been reported in groundnut (Patil et al., 2006; Rao et al., 2014). However, their estimates are influenced by the environment and/or the genotypes used (Kiranmai et al., 2016).
The selection of genetically diverse parents is essential for a successful breeding program, as it provides an opportunity for the development of new improved cultivars with desirable traits (Govindaraj, 2015; Niveditha et al., 2016). Cluster analysis and principal component analysis (PCA) are useful tools for determining the genetic relationships among genotypes in crop improvement. Cluster analysis groups genetically similar genotypes together, whereas PCA creates a scatter plot of genotypes in which the geometrical distances among them reflect their genetic distances with minimum distortion (Ali et al., 2015; Jolliffe, 2002; Mohammadi et al., 2003; Pereira et al., 2015). This study aimed at analysing the relationship between grain yield and its associated traits through correlation and path analyses among selected groundnut genotypes under natural GRD infestation. The key objective was to identify the traits contributing the most to yield and the genotypes that would be useful in future groundnut improvement.

Plant materials
Twenty-eight groundnut genotypes, comprising 25 accessions originating from various countries and three cultivars released in Malawi were evaluated under natural GRD infestation (Table 1).

Experimental site
The genotypes were evaluated at ICRISAT Malawi, located at Chitedze Agricultural Research Station (33°38ʹE and 13°85'S), in the 2008/2009 and 2017/2018 cropping seasons. The station is located 16 km west of Lilongwe (Malawi) at an altitude of 1146 meters above sea level (masl). The accessions were evaluated under natural GRD infestation, since the station is a hotspot area with high GRD pressure during the growing season, especially for late-planted groundnut. Based on long-term climatic data, the station has an average minimum and maximum temperature of 16°C and 24°C, respectively, with a mean annual rainfall of 892 mm.

Experimental design and management
The 28 groundnut genotypes were evaluated in a 7 × 4 alpha lattice design with three replications. Border rows of genotype JL24, which is highly susceptible to GRD, were sown around the trial to enhance GRD inoculum build-up. Each genotype was hand-sown in a three-row plot at a spacing of 0.6 m between rows and 0.15 m between plants within a row. Two seeds were planted per hill and later thinned to one seedling per hill three weeks after planting. Fertilizers and pesticides were not applied and three hand weedings were carried out. The trial was conducted under rainfed conditions supplemented with irrigation when necessary. Harvesting and shelling were done manually.

Data collection
Data on quantitative and qualitative traits were collected. These included disease incidence and severity, days to flowering and maturity, number of branches, plant height, yield and its components and shelling percentage. Disease incidence was measured according to Waliyar et al. (2007), while yield and agronomic traits were measured as described for groundnut descriptors (IBPGR & ICRISAT, 1992). Data were collected on five randomly selected plants and 10 mature pods that were randomly chosen except for disease incidence and yield (Waliyar et al., 2007).

Percentage of disease incidence (PDI)
Visual scoring for GRD development was carried out at 60, 80 and 100 days after sowing (DAS). The number of plants showing GRD symptoms in each plot was obtained by counting and PDI was calculated as follows:
$$\mathrm{PDI} = \frac{\mathrm{NIP}}{\mathrm{TP}} \times 100$$
Where: PDI is the percentage of disease incidence, NIP is the number of plants showing GRD symptoms and TP is the total number of plants in a plot.
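The incidence calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name, the plot size and the symptomatic plant counts are hypothetical and are not data from the study.

```python
def percent_disease_incidence(n_infected: int, n_total: int) -> float:
    """Return PDI = (NIP / TP) * 100 for a single plot."""
    if n_total <= 0:
        raise ValueError("total number of plants must be positive")
    if not 0 <= n_infected <= n_total:
        raise ValueError("infected count must lie between 0 and the plot total")
    return 100.0 * n_infected / n_total


# Hypothetical counts of symptomatic plants in one plot at each scoring date (DAS).
symptomatic_by_das = {60: 3, 80: 7, 100: 12}
plants_in_plot = 40  # hypothetical plot stand

pdi_by_das = {
    das: percent_disease_incidence(count, plants_in_plot)
    for das, count in symptomatic_by_das.items()
}
print(pdi_by_das)
```

Keeping one value per scoring date makes it easy to follow disease progress within a plot across the three assessments.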

Days to flowering (DTF) and days to maturity (DTM)
Days to flowering and maturity were determined as the number of days between sowing date and the date when 50% of plants in a plot had flowered and matured, respectively.

Plant height and number of branches
Plant height (PH), and number of primary (NPB) and secondary branches (NSB) were recorded at 85 DAS. Plant height was measured from the ground to the top of the main stem axis using a ruler, whereas branch numbers were counted. These traits were recorded on the five randomly chosen plants in each plot and a mean was calculated.

Yield and yield components
The number of pods per plant (NPP) was recorded during harvesting by counting the mature pods on the five selected plants and a mean was determined for each plot. Pod length (PL) and pod width (PW) were measured on 10 randomly chosen pods, at the longest and widest points, respectively. The pods were sun dried to approximately 8-10% moisture content and then weighed to determine pod yield per plot. A pod sample of approximately 100 g, randomly drawn from each plot, was shelled, the seed was weighed and the shelling percentage (SP) was determined as follows:
$$\mathrm{SP} = \frac{\mathrm{SW}}{\mathrm{PWT}} \times 100$$
Where: SP is the shelling percentage, SW is the seed weight and PWT is the pod weight before shelling.
One hundred seeds were counted and weighed from the shelled samples and the hundred seed weight (HSW) was recorded and expressed in grams. Seed yield (SYD) was estimated using the formula: Where: SYD is the seed yield, PY is the pod yield per plot (Patil et al., 2006), PS is the plot area (m²) and SP is the shelling percentage (expressed as a fraction).
Qualitative data were recorded on 10 traits, following groundnut descriptors (IBPGR & ICRISAT, 1992). The recorded qualitative data included growth habit and branching type (recorded at podding stage), stem surface, leaf shape, leaf colour and flower colour (recorded at flowering), pod constriction (recorded at harvest), seed colour, primary seed colour and seed size (recorded after shelling).

Path-coefficient analysis
Path-coefficient analysis was carried out using two procedures, conventional and sequential path analysis, to establish the direct and indirect effects of yield component traits on yield. The conventional path analysis considered all the traits as first-order predictors with grain yield as the response variable, whereby the correlation coefficients were partitioned into direct and indirect effects (Dewey & Lu, 1959). Sequential path analysis used sequential stepwise multiple regressions, in SAS version 9.4 (SAS Institute, 2015), to organize the traits into first- and second-order predictors, based on their contribution to the variation in grain yield and minimum collinearity (Mohammadi et al., 2003). Traits with minimal contribution and/or high multi-collinearity are automatically dropped by the multiple regression model. The procedure was repeated, taking each first-order predictor as the dependent variable to find its associated predictors, which were second-order predictors for grain yield. The direct and indirect effects in the different path orders were estimated (Dewey & Lu, 1959) and classified as negligible (0.00-0.09), low (0.10-0.19), moderate (0.20-0.29) and high (0.30-0.99) (Lenka & Misra, 1973). Tolerance (TOL) and variance inflation factor (VIF) were used to measure the level of multi-collinearity for each predictor trait. Tolerance was calculated as:
$$\mathrm{TOL} = 1 - R_j^2$$
where $R_j^2$ is the coefficient of determination for the prediction of the j-th variable by the other predictor variables. Tolerance is therefore the amount of variance of the selected independent variable that is not explained by the other independent variables. The variance inflation factor is the inverse of tolerance (VIF = 1/TOL) and designates the extent of the effects of the other independent variables on the variability of the selected independent variable (Hair et al., 1995; Paul, 2006). Generally, a variance inflation factor greater than five is evidence of high multi-collinearity (Akinwande et al., 2015; Belsley et al., 2005).
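The collinearity diagnostics can be illustrated with a minimal Python sketch. It relies on the standard identity that, for standardized predictors, the VIF of the j-th predictor equals the j-th diagonal element of the inverse of the predictor correlation matrix, so that TOL = 1/VIF = 1 − R²_j. The function names are hypothetical, and the two-trait matrix is built only from the correlation between days to flowering and days to maturity reported in this study (r = 0.86); it is not the full predictor set analysed in SAS.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def tolerance_and_vif(predictor_corr):
    """Return (TOL, VIF) lists, one value per predictor."""
    inverse = invert(predictor_corr)
    vif = [inverse[j][j] for j in range(len(inverse))]
    tol = [1.0 / v for v in vif]
    return tol, vif


# Illustrative two-predictor case: days to flowering and days to maturity.
traits = ["DTF", "DTM"]
corr = [[1.00, 0.86],
        [0.86, 1.00]]
tol, vif = tolerance_and_vif(corr)
for name, t, v in zip(traits, tol, vif):
    flag = "high multi-collinearity" if v > 5 else "acceptable"
    print(f"{name}: TOL = {t:.4f}, VIF = {v:.2f} ({flag})")
```

With more predictors in the matrix the VIF of each trait can only stay the same or increase, which is why a full model can show high multi-collinearity even when a single pairwise correlation does not cross the threshold of five.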

Cluster analysis
The measured variables were standardized to unit variance by dividing each observation by the standard deviation of the trait (Gan et al., 2007). The standardized values were used for cluster analysis by the unweighted pair group method with arithmetic mean (UPGMA, i.e. average linkage) based on Euclidean distance, using PROC CLUSTER in SAS version 9.4 (SAS Institute, 2015). The associations among the genotypes were determined using Jaccard similarity coefficients based on squared Euclidean distances. The genetic distances obtained from the cluster analysis were used to construct a dendrogram depicting the relationships among the genotypes, using PROC TREE in the same software.
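The standardization and average-linkage steps can be sketched in plain Python as follows. This is a teaching sketch under stated assumptions, not a reproduction of PROC CLUSTER: the genotype names (G1 to G4) and trait values are hypothetical, the function names are invented for illustration, and the Jaccard similarity step mentioned above is not implemented.

```python
import math
from statistics import stdev


def standardize(rows):
    """Scale each trait (column) to unit variance by dividing by its standard deviation."""
    sds = [stdev(column) for column in zip(*rows)]
    return [[value / sd for value, sd in zip(row, sds)] for row in rows]


def upgma(names, rows):
    """Average-linkage clustering on Euclidean distances.

    Returns the merge history as (sorted member names, linkage distance) tuples.
    """
    clusters = {i: [i] for i in range(len(rows))}  # cluster id -> member indices
    next_id = len(rows)
    merges = []
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                a, b = clusters[ids[x]], clusters[ids[y]]
                # UPGMA distance: mean of all pairwise distances between members.
                d = sum(math.dist(rows[i], rows[j]) for i in a for j in b)
                d /= len(a) * len(b)
                if best is None or d < best[0]:
                    best = (d, ids[x], ids[y])
        d, first, second = best
        merged = clusters.pop(first) + clusters.pop(second)
        merges.append((sorted(names[k] for k in merged), d))
        clusters[next_id] = merged
        next_id += 1
    return merges


# Hypothetical genotypes scored for two traits (e.g. plant height, pods per plant).
names = ["G1", "G2", "G3", "G4"]
data = [[10.0, 30.0], [11.0, 32.0], [20.0, 50.0], [21.0, 49.0]]

merges = upgma(names, standardize(data))
for members, distance in merges:
    print(members, round(distance, 3))
```

Dividing by the standard deviation before computing distances keeps traits measured on large scales from dominating the clustering.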

Principal component analysis
Principal component analysis (PCA) was performed based on the correlation matrix in SPSS version 25 (Bryman & Cramer, 2012) using the standardized variables. The PCA biplot was plotted using Genstat 18th Edition (Payne, 2014). Only the principal components with eigenvalues greater than one were considered in determining variability among the accessions (Iezzoni & Pritts, 1991).

Shannon-Weaver diversity index
The Shannon-Weaver diversity index (H') was calculated using Microsoft Excel 2016 (Hutcheson, 1970). The index was used as a measure of phenotypic diversity of each qualitative trait and was determined as follows:
$$H' = -\sum_{i=1}^{n} p_i \ln p_i$$
Where: n is the number of phenotypic classes for a trait and p_i is the proportion of accessions in the i-th class of an n-class trait. Each value of the diversity index was divided by its maximum value (ln n) to keep the values between zero and one.

Correlation analysis
Table 2 shows the magnitude of the relationships among the quantitative traits. The results show a high degree of association between some of the traits. Grain yield had significant (P < 0.001) positive correlations with plant height (r = 0.66) and number of pods per plant (r = 0.87). However, it showed non-significant (P > 0.05) negative correlations with days to flowering (r = −0.26), days to maturity (r = −0.21), number of secondary branches (r = −0.12), pod width (r = −0.17) and pod length (r = −0.20). Further, it showed non-significant positive correlations with number of primary branches (r = 0.15) and hundred seed weight (r = 0.19), but a strong negative correlation with rosette incidence (P < 0.01, r = −0.66). Number of pods per plant had a strong positive correlation with plant height (P < 0.01, r = 0.51), but a weak positive correlation with shelling percentage (P > 0.05, r = 0.27). Positive correlation coefficients were also recorded between hundred seed weight and days to flowering (P < 0.05, r = 0.47), days to maturity (P < 0.01, r = 0.57), number of primary branches (P > 0.05, r = 0.32), number of secondary branches (P < 0.01, r = 0.52), pod width (P < 0.001, r = 0.82) and pod length (P < 0.001, r = 0.61). Hundred seed weight had a non-significant negative correlation with disease incidence (P > 0.05, r = −0.09). Days to maturity had a high positive correlation with days to flowering (P < 0.001, r = 0.86) and number of secondary branches (P < 0.001, r = 0.84). However, it had a moderate positive correlation with number of primary branches (P < 0.001, r = 0.49) and a moderate negative correlation with plant height (P < 0.01, r = −0.49).
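The significance levels attached to correlation coefficients of this kind can be checked with the usual t-test for a Pearson correlation, t = r√(n − 2)/√(1 − r²). The sketch below assumes that the coefficients were computed on the means of the 28 genotypes (26 degrees of freedom), which the text does not state explicitly. The critical values are the standard two-tailed t-table entries for 26 degrees of freedom, and the function names are hypothetical.

```python
import math

N_GENOTYPES = 28  # assumption: correlations computed on genotype means

# Two-tailed critical t values for 26 degrees of freedom (standard t table).
CRITICAL_T_DF26 = {0.001: 3.707, 0.01: 2.779, 0.05: 2.056}


def t_statistic(r: float, n: int = N_GENOTYPES) -> float:
    """t statistic for testing that a Pearson correlation differs from zero."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def significance(r: float) -> str:
    """Return the strictest tabulated significance level reached by r (n = 28)."""
    t = abs(t_statistic(r))
    for alpha in (0.001, 0.01, 0.05):
        if t > CRITICAL_T_DF26[alpha]:
            return f"P < {alpha}"
    return "P > 0.05"


# A few coefficients taken from the correlation results above.
for label, r in [("yield vs plant height", 0.66),
                 ("HSW vs days to flowering", 0.47),
                 ("HSW vs days to maturity", 0.57),
                 ("HSW vs primary branches", 0.32)]:
    print(f"{label}: r = {r:+.2f}, t = {t_statistic(r):.2f}, {significance(r)}")
```

Under this assumption, a coefficient needs an absolute value of roughly 0.37 or more to be significant at the 5% level.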

Conventional path analysis
The conventional path analysis considered all the traits as first-order predictors with grain yield as a response variable. The estimates of direct and indirect effects are shown in Table 3. High levels of multi-collinearity were observed among some predictor traits. The indirect effects were lower in magnitude than the direct effects. Number of pods per plant recorded the highest positive direct effect on grain yield of 0.586, followed by days to maturity (0.332), plant height (0.281), grain yield per plant (0.259) and hundred seed weight (0.155). Disease incidence (0.019), shelling percentage (0.018) and number of primary branches (0.079) showed the lowest and negligible positive direct effects on grain yield. The number of secondary branches (−0.271) had moderate negative direct effect on yield. In addition, pod length (−0.047), pod width (−0.020) and days to flowering (−0.012) showed negligible negative direct effects. The highest positive indirect effect on yield was grain yield per plant via number of pods per plant (0.451) while the most negative indirect effect was GRD incidence through number of pods per plant (−0.410).
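The partitioning of a correlation into direct and indirect effects (Dewey & Lu, 1959) can be sketched as follows. The direct effects p solve the normal equations R p = r, where R is the correlation matrix among the predictor traits and r holds their correlations with yield; the indirect effect of trait i through trait j is r_ij × p_j. The example uses only two predictors, plant height and number of pods per plant, with the correlations reported in this study, so the resulting coefficients are illustrative and will not match Table 3, which is based on the full set of traits. All function names are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictor correlation matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def path_analysis(traits, r_xx, r_xy):
    """Return per-trait direct and indirect effects plus the residual effect."""
    direct = solve(r_xx, r_xy)
    effects = {}
    for i, trait in enumerate(traits):
        indirect = {
            other: r_xx[i][j] * direct[j]
            for j, other in enumerate(traits) if j != i
        }
        effects[trait] = {"direct": direct[i], "indirect": indirect}
    explained = sum(p * r for p, r in zip(direct, r_xy))  # R-squared
    residual = math_sqrt(max(0.0, 1.0 - explained))
    return effects, residual


def math_sqrt(value):
    """Square root helper kept local so the sketch needs no imports."""
    return value ** 0.5


def classify(effect):
    """Lenka & Misra (1973) classes for the magnitude of a path coefficient."""
    size = abs(effect)
    if size < 0.10:
        return "negligible"
    if size < 0.20:
        return "low"
    if size < 0.30:
        return "moderate"
    return "high"


traits = ["PH", "NPP"]
r_xx = [[1.00, 0.51],
        [0.51, 1.00]]   # correlation between plant height and pods per plant
r_xy = [0.66, 0.87]     # correlations of each trait with grain yield

effects, residual = path_analysis(traits, r_xx, r_xy)
for trait, parts in effects.items():
    print(trait, round(parts["direct"], 3), classify(parts["direct"]),
          {k: round(v, 3) for k, v in parts["indirect"].items()})
print("residual effect:", round(residual, 3))
```

For each trait the direct effect plus the sum of its indirect effects reproduces its correlation with yield, which is a convenient check on any path table.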

Sequential path analysis
The sequential path analysis (Table 4 and Figure 1) had low multi-collinearity for all the predictor traits. The predictor traits were classified into first- and second-order predictors. This ordering provided a better understanding of their interrelationships and relative contribution to grain yield. Based on the sequential stepwise multiple regressions, plant height, number of pods per plant, grain yield per plant and hundred seed weight were considered first-order predictors, which accounted for 88% of the variation in yield (Figure 1). These traits had low to high positive direct effects on grain yield, with the highest effect being observed for number of pods per plant (0.552), followed by grain yield per plant (0.276), plant height (0.236) and hundred seed weight (0.177). The indirect effects of grain yield per plant (0.425) and plant height (0.282) on overall yield were the highest (Tables 4 and 5). These indirect effects were higher in magnitude than the corresponding direct effects, while the rest were lower.
The second-order predictors included number of secondary branches, rosette disease incidence (%), pod length and pod width. The sequential path analysis of the second-order predictors over the first-order predictors revealed that 44% of the variation in plant height was due to number of secondary branches and groundnut rosette disease incidence. They both had high negative direct effects on plant height, of −0.388 and −0.510, respectively, and negligible indirect effects. In the same path order, groundnut rosette disease incidence (−0.698) and pod length (−0.405) had high negative direct effects on number of pods per plant and together accounted for 63% of the variation in number of pods per plant. Pod width and rosette incidence explained 70% of the variation in hundred seed weight. Pod width had a high positive direct effect (0.859), whereas groundnut rosette disease incidence showed a moderate negative direct effect (−0.231). These two second-order predictors had lower indirect effects on hundred seed weight. The remaining traits (days to flowering and maturity, number of primary branches and shelling percentage) were automatically dropped by the multiple regression model of the sequential path analysis. The implication is that these traits had a negligible contribution to the variation in yield.

Cluster analysis
Cluster analysis showed clear variation among the evaluated groundnut accessions (Figures 2 and 3). At a truncation level of 0.85 on the Jaccard coefficient scale, the genotypes were grouped into four clusters. The similarity coefficients ranged between 0.4 and 0.85, indicating a high level of genetic diversity among the genotypes. The cluster means for the recorded quantitative traits are shown in Table 6. Apart from other differences among the clusters, botanical grouping prevailed. Cluster II was the largest, with 13 genotypes (46.4% of the total germplasm), which were mostly Spanish and Valencia types. The genotypes in this cluster had low yield. Clusters I and III had seven (25%) and three (11%) of the genotypes, respectively, most of which were Virginia types. High hundred seed weight was observed in Cluster I. Cluster IV had the fewest genotypes (two), and these were high-yielding, rosette-resistant Spanish types. Genotype ICG 6813 was distinct and close to the first cluster, while ICG 14985 and ICG 12509 were distinct and close to the fourth cluster.

Principal component analysis
The first three principal components accounted for most of the variation observed and cumulatively explained 77.44% of the total variation among the 13 quantitative traits (Table 7). The first principal component (PC1) had an eigenvalue of 5.27 and explained 40.5% of the total variation. It partitioned the genotypes mainly based on grain yield, yield per plant, number of pods per plant, plant height and groundnut rosette disease incidence. This component therefore discriminated the genotypes based on productivity and response to groundnut rosette disease, separating them according to their yield superiority and resistance. The second principal component (PC2) accounted for 24.8% of the total variation, with most of the variation being attributed to days to flowering and maturity and number of primary and secondary branches. This component can be associated with physiological and genealogical traits, whereby genotypes were separated based on their botanical groups (Spanish, Valencia and Virginia). The traits that contributed most to the third principal component (PC3), which accounted for 12.09% of the total variation, were pod width and pod length.
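Because the PCA was based on the correlation matrix, the eigenvalues sum to the number of traits, so the share of variation explained by a component is its eigenvalue divided by 13. The short sketch below checks the reported PC1 figure and back-calculates the eigenvalues implied by the reported percentages for PC2 and PC3. These implied eigenvalues are derived here for illustration and are not values quoted from Table 7; the function names are hypothetical.

```python
N_TRAITS = 13  # quantitative traits included in the PCA


def percent_explained(eigenvalue: float, n_traits: int = N_TRAITS) -> float:
    """Share of total variation (in %) explained by one component of a correlation-based PCA."""
    return 100.0 * eigenvalue / n_traits


def retained_by_kaiser(eigenvalues):
    """Keep only the components whose eigenvalue exceeds one."""
    return [ev for ev in eigenvalues if ev > 1.0]


pc1_percent = percent_explained(5.27)

# Eigenvalues implied by the reported percentages for PC2 and PC3 (derived, not reported).
implied_eigenvalues = [pct / 100.0 * N_TRAITS for pct in (24.8, 12.09)]

print(f"PC1 explains {pc1_percent:.1f}% of the total variation")
print("implied eigenvalues for PC2 and PC3:",
      [round(ev, 2) for ev in implied_eigenvalues])
print("components retained:",
      len(retained_by_kaiser([5.27] + implied_eigenvalues)))
```

All three components have eigenvalues above one, which is consistent with the retention rule stated in the methods.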

Principal component analysis biplot
The principal component analysis (PCA) biplot (Figure 3) shows the relationship among the different variables and genotypes with respect to the first two principal components. The geometrical distances among genotypes in the biplot reflect the genetic distances among them. Smaller angles between trait vectors pointing in the same direction indicate a high correlation between the traits in relation to the discrimination of genotypes. Genotypes excelling in a particular trait were plotted closer to the vector line and further in the direction of that particular vector, often on the vertices of the convex hull. Genotypes ICGV-SM 99568 and ICG 12988 excelled in grain yield, which was contributed mostly by number of pods per plant, shelling percentage and plant height. Genotypes ICG 13942 and ICG 6057, being Virginia types, were plotted in the direction of late-maturing genotypes, as expected. Genotypes ICGV-SM 90704 and CG7 were clustered together in the direction of high hundred seed weight and high number of secondary branches. Genotype ICG 12509 was plotted in the direction of high disease incidence. The high-yielding and less diseased genotypes (lower incidence values) were plotted on the positive side of the biplot. It was evident that the PCA scores effectively separated the genotypes.

Shannon-Weaver diversity index
Diversity indices (H') were determined to compare phenotypic diversity across the 10 qualitative traits recorded on the groundnut genotypes. Generally, high diversity indices were observed, ranging from 0.95 for leaf colour to 0.9996 for flower colour. The diversity indices indicate the existence of high diversity for the qualitative traits among the genotypes, and this was consistent with the results of the cluster analysis.
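The normalized index can be computed as in the following minimal Python sketch, which takes H' = −Σ p_i ln p_i and divides it by ln n. The class counts are hypothetical scores for 28 accessions and are not data from the study; the function name is invented, and n is taken here as the number of classes actually observed, which is an assumption.

```python
import math
from collections import Counter


def normalized_shannon(scores):
    """Shannon-Weaver index divided by its maximum (ln n), giving a value in [0, 1]."""
    counts = Counter(scores)
    n_classes = len(counts)  # assumption: n = number of observed classes
    if n_classes < 2:
        return 0.0  # a single class carries no diversity
    total = sum(counts.values())
    h = -sum((c / total) * math.log(c / total) for c in counts.values())
    return h / math.log(n_classes)


# Hypothetical two-class trait scored on 28 accessions.
balanced = ["yellow"] * 14 + ["orange"] * 14
unbalanced = ["yellow"] * 22 + ["orange"] * 6

h_balanced = normalized_shannon(balanced)
h_unbalanced = normalized_shannon(unbalanced)
print(round(h_balanced, 4), round(h_unbalanced, 4))
```

Values close to one, as reported here, mean that the accessions are spread almost evenly over the phenotypic classes of a trait.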

Description of variable codes: SYD-seed yield, PH-plant height, NPP-number of pods per plant, SYDP-seed yield per plant, HSW-hundred seed weight, NSB-number of secondary branches, PDI-percentage of disease incidence, PL-pod length and PW-pod width.

Correlation analysis
Grain yield had positive correlations with number of pods per plant, plant height, shelling percentage, hundred seed weight and number of primary branches. Similar associations have been reported in previous studies by Zaman et al. (2011) and Rao et al. (2014). These positive associations suggest that selecting for these traits would simultaneously contribute to improvement in yield. The strong positive correlation between grain yield and number of pods per plant may suggest that these traits share some common genes (Almeida et al., 2014; Kozak & Azevedo, 2014). Moreover, one SSR marker has been linked to both traits, and another marker to pod length and hundred seed weight (Gomez Selvaraj et al., 2009), which agrees with the strong positive correlation between the two traits observed in the current study. The positive correlation between grain yield and plant height may indicate that tall genotypes have more capacity to accumulate photo-assimilates, resulting in higher yields. For breeding programs, this suggests that plant height should be a key trait in selection for yield. The implication is that selection based on these highly positively correlated traits could maximize yield in groundnut.
Grain yield showed a strong negative correlation with rosette disease incidence, confirming previous reports that the disease has a devastating effect on groundnut productivity (Mohammed et al., 2018; Muitia, 2011; Van der Merwe et al., 2001), with yield losses of up to 100% depending on the stage of growth at which infection occurs. Grain yield also showed weak negative correlations with days to flowering and maturity. Similar results were observed by Khan et al. (2000), Meta and Monpara (2010), and Rathod and Toprope (2018). Differences in flowering pattern and days to 50% flowering lead to variation among genotypes, which possibly contributed to the weak negative correlation. This agrees with previous reports that variation in pod number and ultimate yield is due to the timing and initial rate of flower production as well as the genotype (Craufurd et al., 2000). However, weak positive correlations between yield and days to flowering and maturity were reported by Reddy et al. (2017). This is particularly true among the Virginia types, which flower and mature late and therefore have sufficient time to accumulate photo-assimilates, resulting in higher yields. The negative and/or weak positive correlation between flowering, a phenological trait, and yield makes it a poor trait for selection. The number of secondary branches per plant had a weak negative correlation with yield, contradicting the strong positive correlations reported by Balaraju and Kenchanagoudar (2016). This disparity in correlation coefficients could be due to differences in the genotypes and/or environments used in these studies.

Conventional and sequential path analysis
The correlation analysis may not provide a clear picture of the importance of each secondary trait in determining yield (Dewey & Lu, 1959; Kozak & Azevedo, 2014). Wright (1921) developed path-coefficient analysis, which partitions the correlation coefficients into direct and indirect effects, allowing the contribution of each trait to yield to be estimated. Several researchers have used conventional path analysis (all the traits used as first-order predictors) in groundnut, and the traits often highlighted in this regard were number of pods per plant (Patil et al., 2006; Rao et al., 2014), plant height (Reddy et al., 2017), hundred seed weight (Rao et al., 2014; Zaman et al., 2011), days to maturity and number of secondary branches (Patil et al., 2006). The conventional path analysis in the current study revealed that number of pods per plant, days to maturity, plant height and hundred seed weight had positive direct effects on grain yield, ranging from low to high, in agreement with findings from earlier studies (Patil et al., 2006; Rao et al., 2014; Reddy et al., 2017). The number of secondary branches had a moderate negative direct effect on yield, whereas the effects of pod width and pod length, though negative, were negligible. These results are contrary to earlier reports by Patil et al. (2006) that these traits contribute positively to grain yield. The difference between the current and previous studies could be explained by the genotypes used and their reaction to groundnut rosette disease. For example, the Virginia types (which generally produce a high number of secondary branches) and Valencia types (which have long pods) were low yielding, mainly because they were susceptible. Hence, more studies should be conducted under both rosette disease and disease-free environments to ascertain the contribution of these traits to yield across the three botanical groups.
High levels of multi-collinearity were observed for some predictor traits in the conventional path analysis in the current study. Although conventional path analysis easily identifies the direct and indirect effects of secondary traits on grain yield, it usually leads to high levels of multi-collinearity, which confound the detection and interpretation of the actual contribution of each of the traits to yield (Blalock, 1963; Mohammadi et al., 2003).
The specific contribution of each trait to yield was therefore explored further through sequential path analysis, which automatically drops traits with a negligible contribution to yield and/or with high multi-collinearity. The sequential path analysis resulted in low multi-collinearity for all the predictor traits and allowed these traits to be ordered into first- and second-order predictors through sequential stepwise multiple regression. This provided a better understanding of the interrelationships among the traits and their relative contribution to yield (Kozak & Azevedo, 2014; Olivoto et al., 2017). The magnitude of the contribution of the secondary traits to grain yield varied, and this should be considered for more efficient selection (Figure 1). The results revealed that high grain yield was directly associated with taller plant types, a higher number of pods per plant, higher yield per plant and higher hundred seed weight. These are further linked to greater pod width and pod length, and to lower rosette disease incidence and number of secondary branches. In line with the current results, the number of pods per plant and hundred seed weight have consistently been reported to have positive and significant direct effects on grain yield (Rao et al., 2014; Reddy et al., 2017; Zaman et al., 2011). Hence, more emphasis should be given to these traits when selecting for grain yield in groundnut. Generally, path analysis is influenced by the environment and/or the genotypes used, which may explain the divergence between the current and the earlier reports (Kiranmai et al., 2016).

Cluster and principal component analysis
Clustering genotypes based on their agro-morphological characters is useful as it assists in the identification and selection of best performers and genetically diverse parents for use in breeding pipelines (Govindaraj, 2015; Niveditha et al., 2016). The study indicated the presence of diversity among the tested groundnut genotypes. Groundnut genotypes grouped in different clusters could be evaluated for combining ability to constitute a pool of elite parents. These findings are supported by previous reports by Siddiquey et al. (2006) and Banerjee et al. (2007) that there is high genetic diversity in groundnut. The grouping of the genotypes indicated that geographical origin had no influence on the clustering pattern. Moreover, this is an indication that geographical diversity is not a measure of genotypic diversity. Similar results were reported in groundnut by Ariyo (1987) and Makinde and Ariyo (2010), and in maize by Subramanian and Subbaraman (2010). The high Shannon-Weaver diversity indices, which indicated the existence of high diversity for the qualitative traits among the accessions, are consistent with the results of the cluster analysis. The findings of the current study are also consistent with previous studies that reported high diversity indices for qualitative traits in groundnut (Gokidi, 2005; Upadhyaya et al., 2002).
Description of trait codes: DTF = days to flowering, DTM = days to maturity, NPB and NSB = number of primary and secondary branches, respectively, PH = plant height, NPP = number of pods per plant, PW = pod width, PL = pod length, SYD = seed yield, SYDP = seed yield per plant, SP = shelling percentage, HSW = hundred seed weight and PDI = percentage of disease incidence.
The principal component analysis revealed three components with eigenvalues greater than one. Components with eigenvalues greater than one are meaningful and theoretically carry more information than any single variable alone (Iezzoni & Pritts, 1991). The traits correlated with the three meaningful principal components are important as they contributed the most towards divergence of the groundnut accessions. The first and the second components explained most of the variation among the accessions. Similar results were reported in groundnut (Makinde & Ariyo, 2010) and in soybean (Aondover et al., 2013; El-Hashash, 2016). The first component had an eigenvalue of 5.27 and was effective in partitioning yield and GRD-related traits. This component can be called the productivity and GRD response dimension, since it discriminated the genotypes according to their yield and reaction to groundnut rosette disease. The second component was correlated with days to flowering, days to maturity, and number of primary and secondary branches, and separated the accessions in such a way that the Virginia types were plotted together on the positive side of the biplot. The third component was associated with pod width and pod length, suggesting that it represents pod size. These findings agree with the report by Niveditha et al. (2016) that the first principal component is highly correlated with yield-related traits in groundnut. Moreover, the PCA biplot was effective in showing the genetic distance among the accessions, with results consistent with those of the cluster analysis. For instance, ICGV-SM 99568 and ICG 12988 were clustered together in both analyses. A similar trend was reported earlier in groundnut by Niveditha et al. (2016). As such, resistance to groundnut rosette disease, which severely compromises productivity, should be a must-have trait in breeding programs.
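The link between an eigenvalue and the share of variation it explains can be checked with a short calculation. When PCA is run on the correlation matrix of k standardised traits, the total variance equals k, so a component's share is its eigenvalue divided by k. The sketch below assumes that the PCA used the correlation matrix and all 13 traits listed in the trait-code description; neither assumption is stated explicitly in the text. Apart from 5.27, the eigenvalues in the list are hypothetical and serve only to illustrate the eigenvalue-greater-than-one rule.

```python
def explained_share(eigenvalue, n_traits):
    """Share of total variance for a PCA on a correlation matrix."""
    return eigenvalue / n_traits


def retained_components(eigenvalues):
    """Keep components whose eigenvalue exceeds one."""
    return [ev for ev in eigenvalues if ev > 1.0]


N_TRAITS = 13  # assumption: all 13 coded traits entered the PCA
pc1_share = explained_share(5.27, N_TRAITS)
print(f"PC1 explains {pc1_share * 100:.1f}% of the total variation")

# Only 5.27 comes from the text; the remaining values are hypothetical
eigenvalues = [5.27, 2.10, 1.30, 0.90, 0.60]
print(len(retained_components(eigenvalues)), "components retained")
```

Under these assumptions the first component accounts for 5.27 / 13 ≈ 40.5% of the total variation, which agrees with the figure given in the Conclusions.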
High diversity for the qualitative traits among the genotypes was evident and this was consistent with the results of the cluster analysis. Flower colour had the highest diversity index while leaf colour recorded the lowest, indicating the greatest and the least variation for these traits, respectively. These findings are consistent with previous studies that reported high diversity indices for qualitative traits in groundnut (Gokidi, 2005; Upadhyaya, 2003; Upadhyaya et al., 2002). The implication is that these qualitative traits can complement quantitative traits in the selection process as part of groundnut improvement initiatives.
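For reference, the Shannon-Weaver index of a qualitative trait is H' = −Σ p_i ln p_i, where p_i is the proportion of accessions in phenotypic class i. The sketch below also divides by ln(n), with n the number of observed classes, so that values fall between 0 and 1; whether the study applied this normalisation is not stated, so it is an assumption. The flower-colour and leaf-colour scores are hypothetical.

```python
import math
from collections import Counter


def shannon_weaver(observations, normalise=True):
    """Shannon-Weaver index for one qualitative trait."""
    counts = Counter(observations)
    total = sum(counts.values())
    h = -sum((c / total) * math.log(c / total) for c in counts.values())
    if normalise and len(counts) > 1:
        h /= math.log(len(counts))  # scale by the maximum possible value
    return h


# Hypothetical scores for 10 accessions (illustration only)
flower_colour = ["yellow", "orange", "yellow", "orange", "yellow",
                 "orange", "yellow", "orange", "yellow", "orange"]
leaf_colour = ["green"] * 9 + ["dark green"]

print(f"flower colour H' = {shannon_weaver(flower_colour):.2f}")
print(f"leaf colour   H' = {shannon_weaver(leaf_colour):.2f}")
```

A trait whose classes are evenly represented approaches the maximum index, whereas a trait dominated by a single class gives a value close to zero.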

Conclusions
The sequential path analysis clearly indicated that high yield was directly associated with taller plant types, a higher number of pods per plant and higher hundred seed weight. These traits should therefore be prioritized when improving grain yield in groundnut. The cluster analysis revealed the existence of diversity among the evaluated groundnut genotypes, and geographical origin did not have any influence on the clustering pattern. The first principal component explained 40.5% of the total variation and was mainly associated with genotype, yield and groundnut rosette disease incidence traits. The PCA biplot was effective in showing the genetic distance among the genotypes and their discrimination based on key traits of importance in groundnut. These results were consistent with those of the cluster analysis. Moreover, the Shannon-Weaver diversity indices revealed the existence of high diversity among the genotypes, a key driver for groundnut improvement through selection. Genotypes ICGV-SM 99568 and ICG 12988 were the most superior among the genotypes tested and could therefore be exploited in groundnut breeding to improve yield.