Identification of Genomic Regions Associated with Agronomic and Disease Resistance Traits in a Large Set of Multiple DH Populations

Breeding maize lines with improved agronomic performance under optimum and drought conditions, together with increased resistance to diseases such as maize lethal necrosis (MLN), is one of the most sustainable approaches for the sub-Saharan African region. In this study, 879 doubled haploid (DH) lines derived from 26 biparental populations were evaluated under artificial inoculation of MLN, as well as under well-watered (WW) and water-stressed (WS) conditions, for grain yield and other agronomic traits. All DH lines were used for analyses of genotypic variability, association studies, and genomic predictions for grain yield and other yield-related traits. A genome-wide association study (GWAS) using a mixed linear FarmCPU model identified SNPs associated with the studied traits: 7 and 8 SNPs for grain yield; 16 and 12 for anthesis date; 7 and 8 for anthesis silking interval; 14 and 5 each for ear height and plant height; and 15 and 5 for grain moisture under WW and WS environments, respectively. Similarly, about 13 and 11 SNPs associated with gray leaf spot and turcicum leaf blight, respectively, were identified. Eleven SNPs associated with senescence under WS management, which mark putative drought-tolerance QTLs, were identified. Under MLN artificial inoculation, 12 and 10 SNPs associated with MLN disease severity and AUDPC, respectively, were identified. Genomic prediction under WW, WS, and MLN artificial inoculation revealed moderate-to-high prediction accuracy. The findings of this study provide useful information on the genetic basis of MLN resistance, grain yield, and other agronomic traits under MLN artificial inoculation, WW, and WS conditions. This information can be used for further validation, for developing functional molecular markers for marker-assisted selection, and for implementing genomic prediction to develop superior elite lines.


Introduction
Maize is an important staple food crop in sub-Saharan Africa (SSA) where a large area is under maize production [1]. In east Africa, 82.48 million hectares (m ha) were covered by maize and about 156.21 million tons of maize grain were produced with productivity of 1.89 tons per ha (http://www.fao.org/faostat/, accessed on 2 November 2021). Both biotic and abiotic stresses are the major threats to crop production, particularly maize in SSA. Drought stress, high costs of improved seeds and fertilizers [2], and biotic stresses such as maize lethal necrosis (MLN) disease are the limiting factors for maize production in east Africa.
MLN was first reported in Kenya in 2011 and later in Tanzania, Uganda, Rwanda, D.R. Congo, and Ethiopia [3][4][5]. Maize chlorotic mottle virus (MCMV) and sugarcane mosaic virus (SCMV) were confirmed as the pathogens that jointly cause MLN disease [5][6][7]. Both MCMV and SCMV are transmitted by insect vectors (MCMV by thrips and beetles in a semipersistent manner; SCMV by aphids) [5,8]. MCMV has also been confirmed to be transmitted through seed and infested soil, making the management of MLN more challenging [6,9,10,11]. Depending on the maize growth stage and how conducive the environment is to the MLN-causing pathogens, yield losses range from 30 to 100% [12]. Thus, the management of MLN demands proper identification of resistant germplasm sources and associated genes or quantitative trait loci (QTL) that aid in developing resistant hybrids or varieties [13].
Doubled haploid (DH) lines are completely homozygous, an advantage over lines developed through pedigree breeding; this allows precise phenotyping over multiple locations and years [14]. Further, high genetic variance in DH lines enhances response to selection [15] by increasing heritability for various traits. Compared to breeding under well-watered (WW) conditions, the genetic variability, trait heritability, disease resistance, and selection gain are very low for breeding under water stress (WS) conditions [16]; thus, WS conditions make the identification of the best genotypes and the expression of complex traits difficult. These challenges are addressed through established managed drought-stress and disease screening facilities, which help to retain genetic variation and to select for good yield under stress conditions. An understanding of the maize crop's behavior under WS for grain yield and yield-related traits, together with a proper statistical design and breeding scheme, helps to select the best genotypes under WS environments [16,17].
Advances in next-generation sequencing tools have promoted genome-wide association studies (GWAS) in many crops, including maize [18]. Association analysis is based on the non-random association between genotypes and phenotypes of diverse, distantly related individuals [19]. A marker-phenotype association can be declared significant when the marker polymorphism is located within the linkage disequilibrium (LD) region. To detect associations for complex traits, a minimum average LD with a cut-off point of r² = 0.1 was used [19]. In maize, LD decays within approximately 1, 2, and 200-500 kb in landraces, diverse inbred lines, and commercial elite inbred lines, respectively [20].
GWAS is useful in allele mining by dissecting quantitative traits [19]. QTL or gene mapping consists of constructing a linkage map and identifying genomic regions associated with the targeted QTL [21]. QTL mapping helps to understand the genetic inheritance of quantitative traits [22,23]. Breeding for drought tolerance is complex since the trait is influenced by the environment and by many genes with small effects [24]. In maize, about 239 QTLs related to drought tolerance have been reported [25,26]. Five drought-tolerance QTLs closely linked to grain yield were reported by Agrama and Moussa [27]. Semagn et al. [2] reported four meta-QTLs associated with grain yield under both drought and optimum management. The high QTL detection power and fine mapping resolution are exploited by joint linkage association mapping in multiple biparental populations [28][29][30]. The identification and validation of novel genomic regions associated with economically important traits under WW and WS, as well as under MLN, are important to accelerate the development of climate-resilient improved maize varieties, to enhance maize production by smallholder families, and to contribute to food security [12,31,32].
Genomic selection (GS) uses genome-wide markers to predict the breeding values of individuals by capturing the effects of both major and minor genes [33]. In GS, the effects of all markers are estimated from the training population, and then the genomic estimated breeding values (GEBVs) of the untested but genotyped lines are computed [33]. Lines in the testing population are only genotyped, not phenotyped, which shortens the breeding cycle and increases the genetic gain per unit time. GS is effective in several crops over a wide range of marker densities, trait complexities, and breeding populations [34][35][36], and varying levels of prediction accuracy have been achieved in different studies.
To understand how WS affects grain yield and other key traits, this study was performed using a tropical maize population under drought and optimum conditions across multi-location field trials and the MLN effect under artificial inoculation in Kenya. The objectives of the study were to (i) evaluate the large set of 879 tropical and subtropical maize DH lines for their responses to MLN disease severity under artificial inoculation, grain yield (GY), and other yield-related traits under WW and WS conditions; (ii) identify genomic regions and putative candidate genes associated with these traits across the three management conditions; and (iii) assess the potential of GS within management conditions. This study will provide valuable information for uncovering the genetic basis of GY under WW and WS conditions.

Plant Materials and Field Trials
In this study, 1462 DH lines from 40 populations were phenotyped in multiple locations under WW, WS, and MLN artificial inoculation conditions. Of these, 879 DH lines derived from 26 DH populations were genotyped, and only these 879 DH lines were used for the final analyses. Twenty-six parental lines were used to develop these DH populations (Supplementary Table S1). Among these, three lines with LapostaSequiaC17 background are known for their drought tolerance, whereas other CIMMYT maize lines such as CML312, CML395, CML442, CML444, and CML539 are commonly used as parents of the single cross testers for most commercial hybrids released in east and southern Africa. Additionally, new lines that showed a better level of resistance to foliar diseases were used so that biotic and abiotic stress tolerance were brought together in one set of lines. The DH lines were crossed to a single cross tester from the opposite heterotic group. All DH lines were grouped into 17 sets and planted as 17 trials. Seven commercial checks (DK8031, H517, Pioneer30G19, PAN4M19, DUMA43, DH04, and WE1101) were repeated in each trial and acted as connecting genotypes across trials. Both genotypes and checks were replicated two times. All the DH lines were evaluated in 17 connected trials under WW (Kakamega and Kiboko), WS (Kiboko), and MLN (Naivasha) conditions in Kenya. Each plot consisted of a single 4 m row, and the trials were arranged in an α-lattice design with two replications. Two seeds were planted per hill and thinned to one, with 75 cm spacing between rows and 25 cm between plants. Eleven commercial checks were used in each trial. All recommended agronomic practices were applied uniformly to each trial.

Mass Production and Artificial Inoculation of MLN Viruses
The detailed protocol for the preparation of inoculum is explained in earlier studies [12,13,37]. In brief, the stock isolates of MCMV and SCMV were mass-produced in separately managed greenhouses. The sap of both MCMV and SCMV was extracted using 0.1 mM potassium phosphate extraction buffer (pH 7.0) at a 1:10 ratio, and the extracts were mixed at a ratio of 4 SCMV:1 MCMV to create the MLN inoculum. The inoculum was sieved through cheesecloth, and then carborundum was added to the mixture at a rate of 0.02 g/mL to create wounds that enhance attachment and penetration of the virus particles into the host plant. Before field inoculation, the mixture of MCMV and SCMV inoculum was checked with target pathogen-specific antibodies using an enzyme-linked immunosorbent assay (ELISA). Field inoculation was performed using a motorized backpack knapsack sprayer four weeks after planting, and a second inoculation was made one week after the first spray to ensure uniform inoculation.
Two weeks after the second inoculation, the establishment, development, and presence of the MLN-causing viruses were rechecked using ELISA kits. MLN disease severity (MLN-DS) was rated on a scale of 1-9, where 1 is highly resistant with no MLN disease symptoms and 9 is highly susceptible, with necrosis symptoms or total death of the plant. The MLN disease data were recorded four times at ten-day intervals starting from the third week post-inoculation. The progress of MLN disease, expressed as the area under the disease progress curve (AUDPC), was calculated from the MLN-DS data recorded over the four time points [5,32,38].
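As an illustration of how an AUDPC value can be obtained from four severity ratings, the following minimal Python sketch applies the commonly used trapezoidal rule. The function name, the example scores, and the rating days (21, 31, 41, and 51 days post-inoculation) are hypothetical and are not taken from the study; the exact formula used by the authors is given in the cited protocols.

```python
def audpc(scores, days):
    """Area under the disease progress curve by the trapezoidal rule.

    scores: disease severity ratings (here the 1-9 MLN-DS scale)
    days:   rating dates in days post-inoculation, in increasing order
    """
    if len(scores) != len(days) or len(scores) < 2:
        raise ValueError("need at least two ratings with matching dates")
    total = 0.0
    for i in range(len(scores) - 1):
        mean_score = (scores[i] + scores[i + 1]) / 2.0
        interval = days[i + 1] - days[i]
        total += mean_score * interval
    return total


# Hypothetical plot: four ratings taken ten days apart.
example_scores = [2, 3, 5, 6]
example_days = [21, 31, 41, 51]
example_audpc = audpc(example_scores, example_days)
```

A line whose scores rise slowly accumulates a smaller area than one that reaches the same final score early, which is why AUDPC complements the single MLN-DS rating.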

Phenotyping and Data Analysis
The DH lines were evaluated at Kakamega under WW management, at Kiboko under both WW and WS at different sites, and at Naivasha under artificial inoculation of MLN for two seasons. Grain yield (GY, ton/ha), anthesis date (AD, 50% of pollen shed), silking date (SD, 50% of silking), anthesis silking interval (ASI, the difference between anthesis and silking dates), plant height (PH, cm) measured from the ground level to the base of the tassel after the milk stage, ear height (EH, cm) measured from the ground level to the node bearing the uppermost ear after the milk stage, moisture (MOI, percent moisture content of the grain at harvest, measured using a moisture meter), senescence (SEN, percent of leaves that had lost chlorophyll relative to green leaves at mid-silking), gray leaf spot (GLS, recorded using a 1-9 rating scale), turcicum leaf blight (TLB, recorded using a 1-9 rating scale), common rust (CR, recorded using a 1-9 rating scale), and MLN-DS (recorded using a 1-9 rating scale) were recorded and analyzed. All the traits were phenotyped in all the trials but not under all the management conditions. For example, AD, ASI, EH, PH, MOI, and GY were phenotyped under both WW and WS; TLB, GLS, and CR under WW; SEN and ear rot (ER) under WS; and MLN-DS and AUDPC under MLN management conditions. Statistical model fitting for the different traits was checked by plotting histograms of the standardized residuals. A plot of residuals against fitted values showed that the residuals were symmetrically distributed with constant variance for all traits; thus, the data were not transformed. The phenotypic traits were analyzed with the restricted maximum likelihood (REML) method implemented in the multi-environment trial analysis (META) R software developed at CIMMYT [39]. The following mixed model was used for the across-environment data analyses.
$$Y_{ijkl} = \mu + G_i + L_j + (GL)_{ij} + R(L)_{kj} + B(RL)_{ljk} + e_{ijkl}$$
where $Y_{ijkl}$ is the phenotypic observation of the ith genotype at the jth environment in the kth replication of the lth incomplete block, $\mu$ is the overall mean, $G_i$ is the genetic effect of the ith genotype, $L_j$ is the effect of the jth environment, $(GL)_{ij}$ is the genotype by environment interaction, $R(L)_{kj}$ is the effect of the kth replication at the jth environment, $B(RL)_{ljk}$ is the effect of the lth incomplete block in the kth replication at the jth environment, and $e_{ijkl}$ is the residual. The broad-sense heritability ($H^2$) of the selected traits was calculated as follows:
$$H^2 = \frac{\sigma^2_G}{\sigma^2_G + \sigma^2_{GxE}/L + \sigma^2_e/(LR)}$$
where $\sigma^2_G$, $\sigma^2_{GxE}$, $\sigma^2_e$, $L$, and $R$ refer to the genotypic variance, the genotype by environment interaction variance, the error variance, the number of environments, and the number of replications, respectively. Best linear unbiased estimates (BLUEs) and best linear unbiased predictions (BLUPs) were calculated for all traits. The phenotypic distributions of the traits and Pearson's correlation coefficients were computed and displayed using R scripts (http://www.R-project.org, accessed on 17 November 2021).
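The heritability calculation can be sketched in a few lines of Python. The sketch assumes the standard entry-mean formulation for multi-environment trials, H² = σ²G / (σ²G + σ²GxE/L + σ²e/(L·R)), which matches the variance components and the L and R terms defined above; the function name and the variance values in the example are hypothetical and are not results from this study.

```python
def broad_sense_h2(var_g, var_ge, var_e, n_env, n_rep):
    """Entry-mean broad-sense heritability across environments.

    var_g:  genotypic variance
    var_ge: genotype by environment interaction variance
    var_e:  error (residual) variance
    n_env:  number of environments (L)
    n_rep:  number of replications per environment (R)
    """
    phenotypic = var_g + var_ge / n_env + var_e / (n_env * n_rep)
    if phenotypic <= 0:
        raise ValueError("phenotypic variance must be positive")
    return var_g / phenotypic


# Hypothetical variance components for a trait tested in
# two environments with two replications each.
h2_example = broad_sense_h2(var_g=0.5, var_ge=0.4, var_e=1.2, n_env=2, n_rep=2)
```

Because the interaction and error variances are divided by L and L·R, adding environments or replications raises H² for the same underlying variance components.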

Genotypic Data Analyses
All 879 DH lines were genotyped with a high-density genotyping-by-sequencing (GBS) platform using the pre-developed protocol at the Institute for Genomic Diversity, Cornell University, Ithaca, USA [2,31,40]. DNA was extracted from young leaves using the cetyltrimethylammonium bromide (CTAB) method [41]. The raw GBS data had a total of 955,120 SNP loci distributed across the maize genome. The raw GBS SNP data were imputed using the default parameters of the filling method [24,42]. Different filtering criteria were applied to the raw data to obtain the input data for the LD and GWAS analyses. For LD, the raw data were filtered for no missing data and >10% minor allele frequency (MAF). The BLUPs for the selected traits (MLN-DS, AUDPC, AD, ASI, GY, EH, MOI, SEN, and PH) across environments were used for the GWAS. SNP quality screening was performed using the trait analysis by association, evolution, and linkage (TASSEL v.5.2.24) software [43] by discarding SNPs with a MAF of <0.05 and heterozygosity of >0.05, which resulted in 226,940 SNPs. The SNPs and the physical distances between them were used to detect genome-wide LD [44]. LD decay was calculated at r² = 0.2 and r² = 0.1 using the average pairwise distance, where the nonlinear model of r² was used [45,46]. Scatter plots and fitted smooth curves for estimating LD decay were plotted using the LOESS function in R [47].
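The SNP quality screen described above was run in TASSEL; the following minimal Python sketch only illustrates the logic of the two criteria for a single SNP. It assumes genotype calls coded as alternate-allele counts (0, 1, 2) with None for missing calls, and the function name and example calls are hypothetical.

```python
def passes_quality_filter(calls, min_maf=0.05, max_het=0.05):
    """Return True if one SNP passes the MAF and heterozygosity screen.

    calls: genotype calls for one SNP across lines, coded as the
           alternate-allele count (0, 1, or 2); None marks a missing call.
    """
    observed = [c for c in calls if c is not None]
    if not observed:
        return False  # nothing called, so the SNP cannot be evaluated
    alt_freq = sum(observed) / (2.0 * len(observed))
    maf = min(alt_freq, 1.0 - alt_freq)
    het_rate = sum(1 for c in observed if c == 1) / len(observed)
    # Discard SNPs with MAF < min_maf or heterozygosity > max_het.
    return maf >= min_maf and het_rate <= max_het


# Hypothetical SNPs scored on four lines.
polymorphic_homozygous = [0, 2, 0, 2]   # MAF 0.5, no heterozygotes
monomorphic = [0, 0, 0, 0]              # MAF 0
mostly_heterozygous = [0, 1, 1, 2]      # heterozygosity 0.5
```

In a DH panel, residual heterozygous calls mostly reflect genotyping error or paralogous reads, which is one reason a strict heterozygosity cut-off is reasonable for such material.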

Population Structure and GWAS
The genetic relationship tree for the 879 DH lines was constructed using the Darwin 6.0.21 software. In the first step, the genetic distance matrix was calculated based on the mean Euclidean method, where homozygotes were assigned 100% and heterozygotes 50% similarity. In the second step, the unweighted neighbor-joining clustering method was employed to construct the diversity tree of the genotypes. The population structure of the 879 DH lines, which had both phenotypic and genotypic data, was analyzed and sub-grouped using the STRUCTURE software version 2.3.4 with 6745 SNPs [48,49], based on the variability of allele frequencies both within and between populations. The number of population structure clusters (K) was tested from one to five with ten iterations. The most likely number of clusters was determined from the delta K value and the highest ln P(D) using the online Structure Harvester software. Each color bar represents a unique population genetic subcluster at p = 0.001. The length of the burn-in period was set to 10,000, and the Markov Chain Monte Carlo (MCMC) run was set to 10,000 cycles [48].
GWAS analysis was performed with the R package "FarmCPU" (Fixed and random model Circulating Probability Unification) [50]. GBS marker data in the "hapmap" format were converted to numeric format (0, 1, 2) with the "GAPIT" package [51]. The first three principal components (PCs) obtained from TASSEL [43] were used as an input for GWAS in FarmCPU. The kinship matrix was calculated with the default kinship algorithm. The analysis was performed with a maxLoop of five, a p threshold of 0.01, a quantitative trait nucleotide (QTN) threshold of 0.01, and a MAF threshold of 0.05. The maxLoop refers to the total number of iterations used. The p threshold is the p-value cut-off for SNPs selected into the model in the first iteration, the QTN threshold is the p-value cut-off for SNPs selected into the model from the second iteration, and the MAF threshold is the minimum MAF of the SNPs used in the analysis. To determine the significance threshold, a multiple testing correction was conducted in which the total number of tests was estimated based on the average extent of LD at r² = 0.1. Accordingly, significant associations were declared when the p values in independent tests were less than 9 × 10⁻⁶ [38,52]. A BLAST search against the maize reference genome "B73" was performed for the significant SNPs; subsequently, the candidate genes adjacent to or at exactly the same position as the significant SNPs were identified and annotated, and their biological functions were described for each of the studied target traits (http://blast.maizegdb.org/home.php, accessed on 23 November 2021; http://www.maizegdb.org, accessed on 23 November 2021). CurlyWhirly Version 1.19 was used to plot and visualize the first three PCA components (https://ics.hutton.ac.uk/curlywhirly/, accessed on 19 November 2021).
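For readers unfamiliar with the delta K statistic, the following Python sketch shows one common way to compute it from replicate STRUCTURE runs. It is a simplification that uses the mean ln P(D) per K, delta K = |L(K+1) - 2 L(K) + L(K-1)| / sd(L(K)), and it assumes consecutive K values; the function name and the ln P(D) values are hypothetical and do not come from this study.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Delta K for each interior K from replicate ln P(D) values.

    lnp_by_k: dict mapping K (consecutive integers) to a list of
              ln P(D) values, one per replicate run at that K.
    Returns a dict mapping K to delta K; the smallest and largest K
    have no value because the second difference is undefined there.
    """
    ks = sorted(lnp_by_k)
    avg = {k: mean(lnp_by_k[k]) for k in ks}
    result = {}
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        spread = stdev(lnp_by_k[k])
        if spread == 0:
            continue  # delta K is undefined when replicates are identical
        second_diff = abs(avg[next_k] - 2 * avg[k] + avg[prev_k])
        result[k] = second_diff / spread
    return result


# Hypothetical ln P(D) values from two replicate runs per K.
example_runs = {
    1: [-1000, -1002],
    2: [-900, -902],
    3: [-700, -702],
    4: [-690, -694],
}
example_delta_k = delta_k(example_runs)
best_k = max(example_delta_k, key=example_delta_k.get)
```

In this toy example ln P(D) keeps rising up to K = 4, but the rate of change breaks most sharply at K = 3, which is the situation the delta K criterion is designed to detect.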

Genomic Predictions
The BLUEs of the phenotypic traits were used for the GS analysis. Ridge-regression BLUP (RR-BLUP) with five-fold cross-validation was applied. From the GBS data, a subset of 6745 SNPs distributed uniformly across the genome, with no missing values and a minor allele frequency >0.10, was used for GS in the GWAS panel under the different management conditions. Details of the implementation of the RR-BLUP model are described in Zhao et al. [36]. We applied a five-fold cross-validation 'within population' approach, where both the training and validation sets were derived from within the association panel under the different management conditions. The prediction accuracy was calculated as the correlation between the genomic estimated breeding values (GEBVs) and the observed phenotypes. The sampling of the training and validation sets was repeated 100 times for each approach.
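The cross-validation scheme can be illustrated with a short Python sketch. It is not the implementation of Zhao et al. [36]: it uses plain ridge regression with a fixed, assumed shrinkage parameter (lam) in place of the variance-component-based shrinkage of RR-BLUP, runs a single five-fold split instead of 100 repetitions, and works on simulated markers and phenotypes. All names and values are hypothetical.

```python
import numpy as np


def ridge_marker_effects(markers, phenotypes, lam):
    """Solve (Z'Z + lam * I) b = Z'y for the marker effects b."""
    n_markers = markers.shape[1]
    lhs = markers.T @ markers + lam * np.eye(n_markers)
    return np.linalg.solve(lhs, markers.T @ phenotypes)


def cv_accuracy(markers, phenotypes, lam=1.0, n_folds=5, seed=0):
    """Mean correlation between predicted and observed values over folds."""
    rng = np.random.default_rng(seed)
    n_lines = len(phenotypes)
    folds = np.array_split(rng.permutation(n_lines), n_folds)
    accuracies = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n_lines), test_idx)
        # Center markers and phenotypes using the training set only.
        marker_means = markers[train_idx].mean(axis=0)
        pheno_mean = phenotypes[train_idx].mean()
        effects = ridge_marker_effects(
            markers[train_idx] - marker_means,
            phenotypes[train_idx] - pheno_mean,
            lam,
        )
        gebv = pheno_mean + (markers[test_idx] - marker_means) @ effects
        accuracies.append(np.corrcoef(gebv, phenotypes[test_idx])[0, 1])
    return float(np.mean(accuracies))


# Simulated DH-like panel: 200 lines, 50 homozygous markers coded 0/2.
sim_rng = np.random.default_rng(1)
sim_markers = sim_rng.choice([0.0, 2.0], size=(200, 50))
sim_effects = sim_rng.normal(size=50)
sim_phenotypes = sim_markers @ sim_effects + sim_rng.normal(scale=2.0, size=200)
accuracy = cv_accuracy(sim_markers, sim_phenotypes)
```

Centering with training-set means only keeps information from the validation fold out of the model, so the reported correlation reflects prediction of lines that were genotyped but not phenotyped.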

Phenotypic Variations and Correlations
A normal distribution was observed for each trait under WW and WS conditions (Figure 1). The analysis of variance revealed significant genotypic variance (Table 1). The mean performance for AD showed 0.41 days earliness under WS compared to WW conditions. The range of ASI was wider under WS (−3.95 to 8.17 days) than under WW (−3 to 4.5 days) conditions. The means of PH and EH were reduced significantly under WS compared to WW conditions. Further, the range of distribution was reduced drastically for SEN under WS. The combined analysis in META-R revealed that the studied genotypes had a wide range of responses to MLN, with MLN-DS ranging from 2 to 9 (Figure 2, Table 1). GY had moderate broad-sense heritability (H²) under both WW and WS conditions, while MLN-DS and TLB had relatively high H² of 0.67 and 0.80, respectively.
Under WS management, the genotypes CKDHL140940, CKDHL142056, CKDHL142091, CKDHL141377, and CKDHL142061 produced the highest GY values of 5.67, 4.91, 4.85, 4.81, and 4.77 t/ha, respectively. The genotype CKDHL140037 had an earlier AD (62.11 days) than the grand mean (67.55 days) and the best check Duma43 (62.68 days). Genotype CKDHL140091 showed an ASI of 0.38 days, lower than the best check (0.74 days) and the grand mean (2.28 days). A plant height that is neither too tall nor too short (185.55 cm) was obtained in the genotype CKDHL140125, even though it was slightly higher than that of the best check DK8031 (183.88 cm) and the grand mean (176.40 cm). Under WW management, the genotype CKDHL141097 performed better than the best check CML444 and the overall mean, with GY values of 9.17, 7.20, and 8.28 t/ha, respectively. An AD of 65.93 days was recorded in the CKDHL140933 genotype, which was earlier than the overall mean (67.78 days) and comparable to the best check Duma43 (64.47 days) under WW management. Similarly, the genotype CKDHL140876 had an ASI of 0.37 days, which was lower than the grand mean (0.42 days) but not better than the check CZL04003 (0.35 days). About 52 DH lines were rated from two to four and showed resistance reactions to MLN, while 735 DH lines rated from four to seven were grouped as moderately resistant, and the remaining 92 DH lines rated from seven to nine were grouped as susceptible genotypes (Figure 2).
Correlations with >0.10 and >0.15 were significant at 0.05 and 0.01 levels, respectively. AD, anthesis date; ASI, anthesis silking interval; CR, common rust; EH, ear height; ER, ear rot; GLS, gray leaf spot; GY, grain yield; MOI, grain moisture; PH, plant height; SEN, senescence; and TLB, turcicum leaf blight.

Genetic Relationship, Population Structure, and Linkage Disequilibrium
The distribution of the selected markers is presented graphically in Supplementary Figure S1. A kinship matrix was developed that depicts the relatedness among the DH lines used (Supplementary Figure S2). Population relationship analyses in the Darwin software produced a neighbor-joining dissimilarity tree of the 879 DH lines, which was constructed based on the genetic distance matrix of 0.01 calculated by the Euclidean method. The populations were clustered into three main diverse groups with many subtrees (Supplementary Figure S3). The first group contained about 440 DH lines derived from CML395/CML505, represented by red; the second group had LaPostaSeq C7-F64 as a common parent, with 174 individuals represented by blue; and the third group had either LaPostaSeq C7-F86 or LaPostaSeq C7-F18 as a common parent, with 265 DH lines represented by purple (Supplementary Figure S3).
Population structure analyses of the 879 DH lines indicated three to four clusters based on the delta K value and the highest ln P(D) values (Figure 4). An Evanno table was constructed in Structure Harvester with the highest values of 204,444.45 for ln P(K), 156.41 for the standard deviation of ln P(K), and 1307.01 for delta K. The delta K line plot suggested that the population could be structured into two to four groups (Figure 4). Pairwise marker LD decay was measured as r² and plotted against the distance between markers. An LD sliding window with 11,225 comparisons was obtained from adjacent markers, where each dot represents the distance between a pair of markers in the window and their squared correlation coefficient. LD decayed to the r² cut-off points of 0.2 and 0.1 at average physical distances of 3.69 and 10.49 kb, respectively (Figure 5). The first four principal components explained 10.51%, 7.63%, 6.16%, and 3.19% of the total variation (Supplementary Figure S4).
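The squared correlation coefficient used as the LD measure above can be illustrated with a minimal Python sketch. It assumes fully homozygous lines coded 0/2, for which the squared Pearson correlation between two markers' genotype codes equals the haplotype-based r²; the function name and the marker vectors are hypothetical.

```python
def ld_r2(snp_a, snp_b):
    """Squared Pearson correlation between two markers' genotype codes."""
    if len(snp_a) != len(snp_b) or not snp_a:
        raise ValueError("markers must be scored on the same lines")
    n = len(snp_a)
    mean_a = sum(snp_a) / n
    mean_b = sum(snp_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(snp_a, snp_b))
    ss_a = sum((a - mean_a) ** 2 for a in snp_a)
    ss_b = sum((b - mean_b) ** 2 for b in snp_b)
    if ss_a == 0 or ss_b == 0:
        raise ValueError("r2 is undefined for a monomorphic marker")
    return cov * cov / (ss_a * ss_b)


# Hypothetical markers scored on four DH lines (0 = reference, 2 = alternate).
marker_1 = [0, 0, 2, 2]
marker_2 = [0, 0, 2, 2]   # identical pattern: complete LD with marker_1
marker_3 = [0, 2, 0, 2]   # independent pattern: no LD with marker_1
marker_4 = [0, 0, 2, 0]   # partial LD with marker_1
```

Plotting such r² values against the physical distance between marker pairs, and reading off where the fitted curve crosses 0.2 or 0.1, gives the LD decay distances reported in kb.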

GWAS Results
Based on the marker p-value at the significance threshold cut-off (p = 9 × 10⁻⁶), the marker positions, putative candidate genes, and their biological functions were annotated for each trait. The GWAS results for all traits are summarized using Manhattan plots (Figure 6A,B) and QQ plots (Supplementary Figure S5). The GWAS analyses identified SNPs associated with the studied traits, i.e., 7 and 8 SNPs were associated with GY; 16 and 12 SNPs with AD; 7 and 8 SNPs with ASI; 14 and 5 SNPs with EH; 14 and 5 SNPs with PH; and 15 and 5 with MOI under WW and WS management, respectively (Table 2 and Supplementary Table S2). Similarly, 14 and 11 SNPs were associated with GLS and TLB resistance under WW environments, respectively. Under the WS environment, 11 SNPs were associated with SEN, whereas 12 and 10 SNPs were associated with MLN-DS and AUDPC, respectively, under MLN artificial inoculation (Tables 2 and 3). Some of the SNPs with the highest significance values were closely associated with putative genes governing the studied target traits (Supplementary Table S2).
Table 2. Number of SNPs significantly associated with grain yield and other traits at 5% false discovery rate (FDR) threshold level under well-watered (WW), water stress (WS), and MLN management.
Trait    WW    WS    MLN
GY       7     8     -
AD       16    12    -
ASI      7     8     -
EH       14    5     -
PH       14    5     -
MOI      15    5     -
GLS      14    -     -
TLB      11    -     -
SEN      -     11    -
MLNDS    -     -     12
AUDPC    -     -     10
Total    98    54
The SNPs most strongly associated with the studied traits, and their chromosomal positions, were S5_206615806 and S7_157468954 on chromosomes 5 and 7 for GY (Table 4); S8_148392640 and S10_88394535 on chromosomes 8 and 10 for AD; S2_194040196 and S9_136924349 on chromosomes 2 and 9 for ASI (Table 5); S5_27226539 and S8_158986117 on chromosomes 5 and 8 for PH; S4_166924899 and S7_87194068 on chromosomes 4 and 7 for EH (Table 6); and S8_162561752 and S5_200299111 on chromosomes 8 and 5 for MOI (Supplementary Table S2) under WW and WS environments. The SNPs S1_87301408 and S1_92348483 on chromosome 1 were associated with GLS and TLB resistance, whereas the SNP S3_205474517 was associated with SEN under the WS environment. Under MLN artificial inoculation, the SNPs S3_184235364 and S1_22259426 were among the markers most strongly associated with MLN-DS and AUDPC values, respectively (Tables 3-6 and Supplementary Table S2).

Genomic Prediction
The

Discussion
MLN is the major challenge to maize production in SSA, specifically in east African countries. CIMMYT in collaboration with national research institutions has developed resistance breeding strategies against MLN. A large number of maize genotypes were screened, and MLN disease-resistant source materials and resistance QTLs were identified to develop resistant varieties by integrating both conventional and molecular breeding techniques [12,31,37,38]. Nevertheless, searching additional MLN disease-resistant lines, evaluation of the genotype's performance, identification and validation of QTLs associated with the target disease, GY, and other related traits play a vital role in the development of MLN disease-resistant varieties. In this study, 879 maize DH lines derived from 26 different populations were genotyped, and the performance of genotypes were evaluated under WW, WS, and MLN artificial inoculation management conditions. Among these 879 lines, 440 DH lines shared LapostaSeqC7 background lines as one of the parent, and the line LapostaSeqC7-F64 alone used as one of the parent to develop >250 DH lines, so, data was analyzed combinedly rather making it into subgroups based analyses.
A significant genotype, genotype by environment interaction variances, and moderate to high broad-sense heritability were observed for GY and other related traits AD, ASI, PH, EH, TLB, MOI, and GLS measured under WW and WS conditions similar with the results reported by Yuan et al. [24]. MLN-DS and AUDPC were highly heritable with 0.67 and 0.74, respectively, which is consistent with earlier reported studies [31,37,53,54]. Several genotypes have been evaluated by CIMMYT against MLN disease in search for resistant materials [12,24,31,32,38,55]; with the current study, we identified about 52 MLN disease resistant/tolerant genotypes while most of other genotypes were susceptible. Some of the maize genotypes with a score from 2 to 3 against MLN-DS were CKLMLN145667, CKLMLN145667, CKLMLN144135, CKLMLN145119, CKLMLN145173, CKLMLN143806, and CKLMLN143351, which could be selected as resistant materials to MLN disease. The mean performance of lines for GY was 7.54 t/ha and 2.7 t/ha under WW and WS environments, respectively, which has revealed a similar result in earlier study [56]. The GY had positive correlations with both EH and PH and negative correlations with ASI and MOI under WW and WS management, respectively, which could help in an indirect selection for the GY under WW and WS conditions [24].
The number of SNPs required to achieve maximum mapping resolution depends on the magnitude of LD and LD decay with genetic distance [57]. For GWAS, a large population is required since the LD or correlation between alleles in different genomic locations is generally based on the historical recombination between polymorphisms. In this study, we observed that the LD decay at r 2 = 0.1 and 0.2 cut-offs were 10.49 and 3.69 kb, respectively. Similarly, [54] in the IMAS association panel also reported the genome-wide average LD decay of 14.97 kb at r 2 = 0.1 and 5.23 kb at r 2 = 0.2 [54], and a similar range of LD decay was also reported by Rashid et al. [58] in their association panel. LD decay in tropical maize germplasm was rapid compared to the temperate germplasm; possibly due to a broader genetic base, resulting from high recombination events [59]. This provides an opportunity for breeders to select germplasm that integrates high GY with disease resistance and abiotic stress tolerance.
For population structure analyses, the Delta K line plot, principal component analyses, and population genetic distance relationship analyses suggested that the utilized DH populations are structured into three to four groups. In STRUCTURE, the optimum number of subgroups was determined based on the output log-likelihood of data (LnP (D. The peaks of the line plot ( Figure 4) suggest that the population could be divided into three or four distinct groups in order of possibility, with the K = 4 of delta K intersecting with LnP (D) showing a higher possibility. When K = 4, all lines were grouped as a mixed group and were further divided into three groups. The DH populations used in this study were grouped into CML395/CML505 derived DH lines, LaPostaSeq C7-F64 derived DH lines (174 individuals), and LaPostaSeq C7-F86 and LaPostaSeq F18 derived DH lines (265 individuals) (Figure 4). Due to the inclusion of DH lines derive from crosses of selected inbred lines in the panel, we observed moderate structure in the present study. Several researchers also been reported moderate structure in the tropical maize germplasm [29,31,37,53,54,60].
In this study, we identified the significant SNPs associated with target traits under WW, WS, and MLN artificial inoculations (Tables 3-6). The results of this study for MLN-DS and AUDPC are similar to the reports in the biparental and DH population studied for the MLN-DS, AUDPC, and other traits genetic architecture [12,31,38,53,54,60]. Several putative candidate genes associated with the significant markers were identified for each of the studied traits (Tables 3-6). For GY under WW, two putative candidate genes, GR-MZM2G017470 and GRMZM2G030713, were identified, both located on chromosome 1 and, respectively, described as Dof zinc finger protein DOF3.6-like and O-fucosyltransferase 36 synthesis biological functions; whereas the candidate genes, GRMZM2G472167 on chromosome 1 and GRMZM2G019404 on chromosome 2, identified under WS were functionally described as peptide transporter PTR2 mha2 that involved in seed germination maternal control and plasma-membrane H+ATPase 2 that aid in activating secondary transport, respectively [61][62][63]. These genes are more relevant to plants' response to drought stress.
The putative candidate genes GRMZM2G142383 and GRMZM2G124136, detected for AD under WW and WS, are functionally designated as uridine kinase-like protein 2, chloroplastic, which is involved in the pyrimidine salvage pathway [64], and putative glycerol-3-phosphate transporter 4, which has transmembrane transporter activity [65], respectively. The SNPs S8_3482389 and S2_205904889 on chromosomes 8 and 2, which were closely linked to ASI under both WW and WS, were associated with the putative candidate genes GRMZM2G136158 and GRMZM2G105869, respectively. These candidate genes are described as peroxidase 24, which aids in responding to environmental stresses such as wounding, pathogen attack, and oxidative stress [66], and histone-lysine N-methyltransferase SUVR3, which is known to be involved in the development of pollen and the female gametophyte, flowering, plant morphology, and responses to stresses [67], respectively.
Two important SNPs were linked to PH: S6_161804186 under WW, associated with the candidate gene GRMZM2G170625, and S2_43203188 under WS, co-located with the candidate gene GRMZM2G114523. These candidate genes have been described as Jacalin-related lectin 3 and lysine histidine transporter-like 6, respectively. Jacalin-related lectins are proteins that bind carbohydrates and play an important role in plant development and in resistance to fungal pathogens [68]. Lysine histidine transporter-like 6 helps to transport amino acids within or between cells and is involved in plant uptake of amino acids [69]. The SNP S2_184012021, linked to EH under WW management, was associated with the putative candidate gene GRMZM2G116196, described as AUGMIN subunit 5 (AUG5), which is essential for gametophyte and sporophyte development [70]. Another annotated gene, GRMZM2G365374, identified under WS, encodes heat shock 70 kDa protein (HSPA1A), which is known to respond to heat-shock stress [71].
The SNPs S1_188031152 and S8_11662494, associated with MOI, were linked to the well-described putative candidate genes GRMZM2G419436 and GRMZM2G700386, respectively. The gene GRMZM2G419436 is characterized as wall-associated receptor kinase 5 (WAK5), which controls cell expansion, morphogenesis, and development [72], while the GRMZM2G700386 gene, characterized as β-1,2-xylosyltransferase XYXT1, is involved in the xylosylation of xylan, the major hemicellulose of the primary and secondary walls of angiosperms [73]. The SNP S1_204865984, linked to SEN under the WS environment, was annotated to the putative candidate gene GRMZM2G328309, described as ribonuclease E/G-like protein, chloroplastic, which belongs to a family of proteins that plays a pivotal role in RNA metabolism [74].
The putative genes GRMZM2G009591 and GRMZM2G101117, annotated from the SNPs S1_246469847 and S7_82649117 linked to GLS disease resistance, have been characterized as pyrophosphate fructose 6-phosphate 1-phosphotransferase and GDSL esterase/lipase, respectively. The first gene, GRMZM2G009591, is known to catalyze the phosphorylation of D-fructose 6-phosphate [75]; the second gene, GRMZM2G101117, is known for the hydrolytic activities of GDSL esterase and lipase enzymes [76]. Under WW management, the putative candidate genes GRMZM2G039173, GRMZM2G071023, and GRMZM2G106119 were identified based on the SNPs S6_157820129 and S4_212595942 associated with TLB resistance. Rédei [77] described the GRMZM2G039173 gene as a major facilitator superfamily protein that aids in transporting small solutes along chemiosmotic ion gradients, while the second putative gene, characterized by Chai et al. [78], functions as a probable NAD kinase 2, chloroplastic, which is actively involved in the protection of the chloroplast against oxidative damage and in the synthesis of chlorophyll.
The MLN-DS-associated SNPs S3_184235364 and S6_38115747 are annotated with the candidate genes GRMZM2G429982 and GRMZM5G818106, which have osmotin-like protein and phospholipase A1-II 7 functions, respectively [79,80]. Kumar et al. [80] characterized the candidate gene GRMZM2G429982 as being involved in biotic and abiotic stress tolerance in plants, whereas the candidate gene GRMZM5G818106 has been described as protective against high temperature, cold, salt, and drought [79]. Wu et al. [81] reported that the putative candidate gene GRMZM2G003752, characterized as fasciclin-like arabinogalactan protein 10, responds to abiotic stress and mediates the growth and development of the plant. This candidate gene was annotated from the S2_16652265 marker associated with AUDPC values. Similarly, the S10_125845596 marker was linked to AUDPC values, and the putative candidate gene GRMZM2G003917 was identified from it. This gene has been described by Wu et al. [81] as a fasciclin-like arabinogalactan protein 7 (FLA7) gene responsible for the development of microspores and for maintaining proper plant cell expansion under salt stress.
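For readers less familiar with the AUDPC trait referred to above, the following minimal Python sketch shows the standard trapezoidal definition of the area under the disease progress curve. It is assumed, not confirmed by this section, that the AUDPC values in this study follow the same definition; the function name, scoring dates, and severity scores are hypothetical.

```python
def audpc(days, scores):
    """Area under the disease progress curve by the trapezoidal rule.

    days   : scoring times (e.g., days after inoculation), in increasing order
    scores : disease severity score recorded at each scoring time
    """
    if len(days) != len(scores) or len(days) < 2:
        raise ValueError("need at least two paired (day, score) observations")
    total = 0.0
    for i in range(len(days) - 1):
        mean_score = (scores[i] + scores[i + 1]) / 2  # average of adjacent scores
        interval = days[i + 1] - days[i]              # time between the two scorings
        total += mean_score * interval
    return total


# Made-up scores for one line scored three times, one week apart.
example_audpc = audpc([0, 7, 14], [1, 3, 5])
print(example_audpc)  # 42.0
```

A line whose severity scores rise slowly accumulates a smaller area than one that reaches high severity early, which is why AUDPC complements a single end-point severity score.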
In the present study, a total of 98, 54, and 22 SNPs associated with various agronomic traits under WW, WS, and MLN conditions, respectively, were identified. Some of these SNPs are located within gene models whose function is associated with biotic or abiotic stress mechanisms. The favorable alleles can be identified by resequencing the detected candidate genes from contrasting lines, and these SNPs could potentially be converted to simple PCR-based markers for MAS in molecular breeding [82]. Similarly, several GWAS studies have reported large numbers of SNPs associated with important traits in maize [83,84].
High genetic gain can be achieved for complex traits by integrating modern tools into maize breeding [85,86]. With several genotyping service providers offering a lower cost per sample and with advanced statistical models available, genomic prediction is routinely applied in maize for several quantitative traits [24,85,86]. In the present study, we compared the prediction accuracies under WW and WS conditions (Figure 7). As expected, for all the common traits measured in both WW and WS conditions, the prediction accuracies were slightly higher under WW conditions than under WS conditions. The observed accuracy for all traits under WW, WS, and MLN conditions reveals the effect of heritability, as the traits with higher heritability generally had higher prediction accuracy. The main factors affecting genomic prediction accuracy are the relationship between the training and testing populations, training population sizes, the population structure of training and testing sets, marker densities, genetic architecture and heritability of target traits, genotype by environment interactions, and statistical methods [36,62,87,88]. Knowing the genetic architecture of the target traits makes it possible to improve prediction accuracy while implementing GS [35,89]. The moderate-to-high accuracies observed in this study for the association panel offer promise in breeding for MLN and drought tolerance. The prediction accuracy of the association panel for MLN-DS and AUDPC is in agreement with earlier studies on MLN [31] and MCMV [38]. The prediction correlations observed for GY and other agronomic traits are comparable to those reported in earlier maize studies under different stresses [24,62,85]. In GS, AD and ASI had higher accuracy than GY, which is expected, as these traits are less complex than GY [24,61,62].
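The link between heritability and prediction accuracy described above can be illustrated with a small simulation. The sketch below is a hedged illustration only: it uses ridge regression on marker scores as a stand-in for a genomic prediction model, which is an assumption and not necessarily the model used in this study, and it works on simulated data rather than the DH panel. All function names, panel sizes, and parameter values are hypothetical. Accuracy is measured as the mean correlation between predicted and observed phenotypes across cross-validation folds.

```python
import numpy as np


def simulate_panel(n_lines, n_markers, h2, rng):
    """Simulate homozygous marker scores (-1/1) and phenotypes with heritability ~h2."""
    X = rng.choice([-1.0, 1.0], size=(n_lines, n_markers))
    effects = rng.normal(0.0, 1.0, n_markers)      # many loci with small effects
    g = X @ effects                                # true genetic values
    noise_sd = np.sqrt(g.var() * (1.0 - h2) / h2)  # residual SD giving var(g)/var(y) ~ h2
    return X, g + rng.normal(0.0, noise_sd, n_lines)


def ridge_predict(X_train, y_train, X_test, lam):
    """Closed-form ridge regression of centered phenotypes on marker scores."""
    mu = y_train.mean()
    A = X_train.T @ X_train + lam * np.eye(X_train.shape[1])
    beta = np.linalg.solve(A, X_train.T @ (y_train - mu))
    return mu + X_test @ beta


def cv_accuracy(X, y, lam, n_folds, rng):
    """Mean Pearson correlation between predicted and observed values over folds."""
    idx = rng.permutation(len(y))
    accs = []
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        pred = ridge_predict(X[train], y[train], X[test], lam)
        accs.append(np.corrcoef(pred, y[test])[0, 1])
    return float(np.mean(accs))


rng = np.random.default_rng(0)
n_lines, n_markers = 300, 200
results = {}
for h2 in (0.9, 0.1):
    X, y = simulate_panel(n_lines, n_markers, h2, rng)
    # Shrinkage set from the simulated heritability; in practice the variance
    # components would be estimated from the data.
    lam = n_markers * (1.0 - h2) / h2
    results[h2] = cv_accuracy(X, y, lam, n_folds=5, rng=rng)

acc_high, acc_low = results[0.9], results[0.1]
print(f"h2=0.9: {acc_high:.2f}   h2=0.1: {acc_low:.2f}")
```

Under this simulation the high-heritability trait is predicted considerably more accurately than the low-heritability one, in line with the trend described for the traits in this study. The same cross-validation loop can be reused with any other prediction model by replacing `ridge_predict`.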
GWAS results revealed that GY and other agronomic traits evaluated under WW and WS conditions are complex in nature, controlled by many loci with minor effects, and influenced by environmental factors. Therefore, they are difficult to track effectively through conventional breeding alone. Integrating GS with leads from GWAS results can increase prediction accuracy as well as the accumulation of favorable alleles with both minor and major effects.

Conclusions
Phenotypic evaluation of 879 DH lines under artificial MLN inoculation identified about 52 genotypes resistant/tolerant to MLN-DS, while seven of the selected genotypes (CKLMLN145667, CKLMLN145667, CKLMLN144135, CKLMLN145119, CKLMLN145173, CKLMLN143806, and CKLMLN143351) can be used as sources of resistance to MLN. GWAS identified SNPs associated with the studied traits, i.e., about seven and eight SNPs for GY; 17 and 31 for anthesis date; 10 and 22 for anthesis silking interval; 14 and 6 for ear height; and 15 and 5 for moisture content under WW and WS environments, respectively. Similarly, about 13 and 11 SNPs associated with GLS and TLB, respectively, were detected. Eleven SNPs significantly associated with senescence were identified under WS management. Under MLN artificial inoculation, a total of 12 and 10 SNPs were associated with MLN-DS and AUDPC traits, respectively; these SNPs and the identified candidate genes for each trait can be used in trait improvement programs in maize breeding. GS under WW, WS, and MLN artificial inoculation environments revealed moderate-to-high prediction accuracies. All the SNPs detected in this study need further validation before being introduced into breeding pipelines; once validated, they will greatly help in understanding the complex genetic architecture of traits under WW and WS. Overall, the present study identified several significant SNPs associated with GY and other agronomic traits that can help in the selection of donor lines with favorable alleles for multiple traits. These results provide insights into the genetics of MLN resistance and other agronomic traits under optimum and drought stress conditions.