Multi-Trait Genome-Wide Association Studies of Sorghum bicolor Regarding Resistance to Anthracnose, Downy Mildew, Grain Mold and Head Smut

Multivariate linear mixed models (mvLMMs) are widely applied for genome-wide association studies (GWAS) to detect genetic variants affecting multiple traits with correlations and/or different plant growth stages. Subsets of multiple sorghum populations, including the Sorghum Association Panel (SAP), the Sorghum Mini Core Collection and the Senegalese sorghum population, have been screened against various sorghum diseases such as anthracnose, downy mildew, grain mold and head smut. Still, these studies were generally performed in a univariate framework. In this study, we performed GWAS based on the principal components of defense-related multi-traits against the fungal diseases, identifying new potential SNPs (S04_51771351, S02_66200847, S09_47938177, S08_7370058, S03_72625166, S07_17951013, S04_66666642 and S08_51886715) associated with sorghum’s defense against these diseases.


Introduction
Sorghum (Sorghum bicolor (L.) Moench), a multipurpose crop used as a source of food, fodder, feed and fuel, is ranked among the top five cereal crops worldwide [1]. Still, the yields are highly constrained due to fungal pathogens that cause severe diseases in the crop. Among various fungal diseases, anthracnose caused by Colletotrichum sublineola Henn. ex Sacc. and Trotter 1913 is one of sorghum's most destructive fungal diseases, with annual yield losses of up to 100% [2]. Downy mildew, caused by Peronosclerospora sorghi, is another critical disease of sorghum that can cause severe epidemics, resulting in considerable yield losses [3]. Head smut, caused by the soil-borne facultative biotrophic basidiomycete Sporisorium reilianum (Kühn) Langdon and Fullerton (syns. Sphacelotheca reiliana (Kühn) G.P. Clinton and Sorosporium reilianum (Kühn) McAlpine), is a serious global sorghum disease as well [4,5]. Grain mold, a complex fungal disease associated with many fungal species, such as Fusarium spp., Curvularia lunata (Wakker) Boedijn, Alternaria alternata (Fr.) Keissl. (1912) and Phoma sorghina (Sacc.) Boerema, Dorenb. and Kesteren, is considered one of the most important biotic constraints of grain sorghum production worldwide [3].
Sorghum's resistance to diseases caused by fungal pathogens heavily relies on novel sources of resistance genes or defense response genes identified by genome-wide association studies (GWAS) that test hundreds of thousands of genetic variants across many genomes to find those statistically associated with a specific trait or disease [6]. In GWAS, linear mixed models (LMMs or MLMs) have been extensively applied, and most GWAS are conducted under a univariate framework. Still, recent studies began to apply the multivariate linear mixed model (mvLMM) to GWAS due to the increased importance of detecting genetic variants that affect multiple traits or different growth stages [7]. Principal component analysis (PCA) is a valuable tool that has been widely used for the multivariate analysis of correlated variables, including mvLMM, and it has been shown that multivariate GWAS approaches yield a higher true-positive quantitative trait nucleotide (QTN) detection rate than comparable univariate approaches [8,9].
In sorghum pathology, multiple subsets of sorghum populations, including the Sorghum Association Panel (SAP), the Sorghum Mini Core Collection and the Senegalese sorghum population, have been screened against various sorghum diseases such as anthracnose, downy mildew, grain mold and head smut. Still, these studies were performed in a univariate framework [10][11][12][13][14][15][16][17]. The studies measured multiple disease-related traits in the three different populations, and both single-nucleotide polymorphism (SNP) data and phenotypic data are publicly available. In this study, a multivariate GWAS was conducted based on the top principal components (PC GWAS with LMMs) of defense-related traits against fungal pathogens, resulting in the identification of new potential SNPs associated with sorghum's defense against these diseases.

Sorghum Association Panel
Prom et al. [12,17] evaluated 377 lines from the Sorghum Association Panel (SAP) for grain mold resistance at the Texas A&M AgriLife Research Farm, Burleson County, Texas, by inoculating them with A. alternata alone; with a mixture of A. alternata, Fusarium thapsinum and Curvularia lunata; or with untreated water as a control. Based on the phenotypic data, a univariate GWAS with over 79,000 SNP loci from a publicly available genotyping-by-sequencing dataset for the SAP lines [18] was conducted using the TASSEL version 5.2.55 [19] association mapping software to identify chromosomal locations associated with differences in the grain mold response. Using the phenotypic data of the three treatments, a PCA was performed with TASSEL version 5.2.88 [19], and a multivariate GWAS (PC GWAS with PC1 as the phenotype data and PC2 and PC3 as covariates) was conducted through LMMs (Table S1). False associations were reduced by removing SNPs with greater than 20% unknown alleles and SNPs with a minor allele frequency (MAF) below 5% [20].
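The two SNP filters described above (more than 20% unknown alleles, MAF below 5%) can be illustrated with a short sketch. This is not the TASSEL implementation; the function name `filter_snps`, the 0/1/2 allele-count coding and the use of `np.nan` for unknown calls are assumptions made only for illustration.

```python
import numpy as np

def filter_snps(genotypes, max_missing=0.20, min_maf=0.05):
    """Return a boolean mask of SNPs (columns) that pass both filters.

    genotypes: lines x SNPs array coded as 0/1/2 copies of the alternate
    allele, with np.nan for unknown calls (assumed coding).
    """
    g = np.asarray(genotypes, dtype=float)
    # Fraction of lines with an unknown call at each SNP
    missing_rate = np.isnan(g).mean(axis=0)
    # Alternate-allele frequency computed from called genotypes only
    called = (~np.isnan(g)).sum(axis=0)
    alt_sum = np.nansum(g, axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        alt_freq = np.where(called > 0, alt_sum / (2.0 * called), np.nan)
    # Minor allele frequency is the smaller of the two allele frequencies
    maf = np.minimum(alt_freq, 1.0 - alt_freq)
    return (missing_rate <= max_missing) & (maf >= min_maf)

# Toy example: 5 lines x 4 SNPs
toy = np.array([
    [0, 0, np.nan, 2],
    [1, 0, np.nan, 2],
    [2, 0, 1, 2],
    [1, 0, 0, 0],
    [0, 0, 2, 2],
])
keep = filter_snps(toy)
filtered = toy[:, keep]
```

In the toy matrix, the second SNP is monomorphic (MAF of 0) and the third has 40% unknown calls, so only the first and fourth columns are retained.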

Sorghum Mini Core Collection
Ahn et al. [13] performed a univariate GWAS based on phenotypic data of 242 Mini Core lines which had been screened for anthracnose, downy mildew and head smut resistance. The screening results were combined with over 290,000 SNP loci from an updated version of a publicly available genotyping-by-sequencing (GBS) dataset for the Mini Core collection [21][22][23][24], and the GAPIT (Genome Association and Prediction Integrated Tool) R package was used to identify chromosomal locations associated with differences in disease responses [13,25,26]. With the phenotypic data of the four pathogens, a PCA was performed with TASSEL version 5.2.88 (Table S1) [19], and a multivariate GWAS (PC GWAS based on PC1-3, with PC1 as the phenotype data and PC2 and PC3 as covariates) was performed through LMMs, applying the same filtering criteria to reduce false associations [20].
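As a rough illustration of how several disease traits can be collapsed into principal components before the association step, the sketch below computes PC scores with NumPy. The PCA in this study was run in TASSEL; the function name `trait_pcs`, the unit-variance scaling of the traits and the random toy data are assumptions, and missing phenotypes are not handled.

```python
import numpy as np

def trait_pcs(traits, n_components=3):
    """Principal-component scores for a lines x traits matrix.

    Traits are centered and scaled to unit variance (an assumption; the
    source does not state whether scaled traits were used).
    Returns (scores, explained_variance_ratio).
    """
    x = np.asarray(traits, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    # SVD of the standardized matrix; rows of vt are the PC loadings
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    scores = u * s                      # columns are PC1, PC2, ...
    var = s ** 2 / (z.shape[0] - 1)     # variance captured by each PC
    ratio = var / var.sum()
    return scores[:, :n_components], ratio[:n_components]

# Toy data: 20 lines scored for 3 hypothetical disease traits
rng = np.random.default_rng(0)
toy_traits = rng.normal(size=(20, 3))
pc_scores, explained = trait_pcs(toy_traits)
response = pc_scores[:, 0]        # PC1 used as the phenotype
covariates = pc_scores[:, 1:3]    # PC2 and PC3 used as covariates
```

Because the PC scores are mutually uncorrelated by construction, PC1 can serve as a single composite phenotype while the remaining PCs enter the model as covariates.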

Senegalese Sorghum Population
A total of 159 to 163 Senegalese sorghum lines were evaluated for their responses to C. sublineola (traits included average score for seedling inoculation, highest score for seedling inoculation, and average score for 8-leaf-stage sorghum inoculation) and S. reilianum (traits included the appearance rate of dark spots in seedlings and the average time of the spot appearance) [10,11,14]. A total of 193,727 SNP data points, originally genotyped using GBS, were extracted from an integrated sorghum SNP dataset based on sorghum reference genome version 3.1.1 [21][22][23][24]. Using TASSEL version 5.2.88 [19], PCA was applied to the three traits for C. sublineola (average scores for seedling and 8-leaf-stage inoculations and the highest score for seedling inoculation), and the two traits for S. reilianum (spot appearance rate and time) were used to generate PC1 and PC2. Furthermore, all five traits were analyzed with PCA to identify potential plant defense-related loci commonly associated with the two diseases. Multivariate GWAS (PC GWAS with PC1 as the phenotype data and the other PCs as covariates) was performed using the method described above (Table S1).
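The structure of the PC GWAS (PC1 as the response, remaining PCs as covariates, one SNP tested at a time) can be sketched as follows. This is a deliberately simplified stand-in, not the method used in the study: it fits ordinary least squares rather than the LMMs run in TASSEL, so it omits the random effect that accounts for relatedness among lines, and it uses a normal approximation for the p-value to avoid extra dependencies. The function name `snp_scan` and the simulated data are hypothetical.

```python
import math
import numpy as np

def snp_scan(y, genotypes, covariates):
    """Per-SNP p-values for y ~ intercept + covariates + SNP (OLS).

    Simplification: no kinship/random effect, unlike an LMM.
    The p-value is two-sided, from a normal approximation to the t statistic.
    """
    y = np.asarray(y, dtype=float)
    g = np.asarray(genotypes, dtype=float)
    c = np.asarray(covariates, dtype=float).reshape(len(y), -1)
    base = np.column_stack([np.ones(len(y)), c])
    pvals = np.empty(g.shape[1])
    for j in range(g.shape[1]):
        x = np.column_stack([base, g[:, j]])
        beta, _, rank, _ = np.linalg.lstsq(x, y, rcond=None)
        dof = len(y) - x.shape[1]
        if rank < x.shape[1] or dof <= 0:
            pvals[j] = np.nan   # e.g. monomorphic SNP, effect not estimable
            continue
        resid = y - x @ beta
        sigma2 = resid @ resid / dof
        # Standard error of the SNP coefficient (last column of x)
        se = math.sqrt(sigma2 * np.linalg.inv(x.T @ x)[-1, -1])
        z = beta[-1] / se
        pvals[j] = math.erfc(abs(z) / math.sqrt(2.0))
    return pvals

# Simulated example: 200 lines, 5 SNPs, SNP index 2 has a true effect
rng = np.random.default_rng(1)
n = 200
sim_geno = rng.integers(0, 3, size=(n, 5)).astype(float)
sim_cov = rng.normal(size=(n, 2))                 # stand-ins for PC2 and PC3
sim_y = 1.5 * sim_geno[:, 2] + sim_cov[:, 0] + rng.normal(size=n)
sim_p = snp_scan(sim_y, sim_geno, sim_cov)
```

In a real analysis, population structure and kinship would need to be modeled, which is the reason mixed models are preferred over this plain regression.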
The SNPs that passed the Bonferroni correction were mapped back to the published sorghum reference genome sequence, version 3.1.1, available at Phytozome 13 (https://phytozome.jgi.doe.gov, accessed on 5 March 2023) [27], to determine their specific chromosome locations. The top SNPs that failed to pass the Bonferroni correction were also checked to identify the nearest genes with previously reported roles in biotic or abiotic resistance/stress responses.
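For orientation, a Bonferroni threshold is the family-wise significance level divided by the number of tests. The sketch below assumes a significance level of 0.05 and uses the 193,727 SNP data points reported for the Senegalese population as the number of tests; neither value is stated in the source as the one actually used, and the number of SNPs remaining after filtering may be smaller.

```python
import math

def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test p-value cutoff that controls the family-wise error rate at alpha."""
    return alpha / n_tests

# Assumed number of tests: the SNP data points reported for the
# Senegalese population (the post-filtering count is not given).
n_snps = 193_727
threshold = bonferroni_threshold(n_snps)

print(f"p-value threshold: {threshold:.3e}")
print(f"-log10 threshold:  {-math.log10(threshold):.2f}")
```

A SNP is declared significant only if its p-value falls below this cutoff, which is why the threshold becomes more stringent as the number of tested SNPs grows.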

Results
Multivariate GWAS identified nine SNPs that passed the Bonferroni threshold (Figure 1 and Table 1). Among the nine SNPs, only the SNP locus S01_72868925 had previously been reported as a top SNP in univariate GWAS studies [10]; the other eight SNPs were newly identified as potential sources of resistance to fungal pathogens in sorghum.
Table 1. Annotated genes nearest to the most significant SNPs associated with multi-trait GWAS in all three sorghum populations. The distance, in base pairs, to the nearest genes and the p-value are shown. * = passed Bonferroni correction.
In this study, we revisited multiple studies that applied univariate GWAS in experiments on sorghum pathology. We reanalyzed the data in multivariate frameworks, obtaining eight novel SNPs potentially associated with sorghum's defense against fungal pathogens (Figure 1 and Table 1). Sobic.005G141700 was identified as a top candidate for grain mold resistance under different inoculations in the SAP lines. Sobic.005G141700 is speculated to be associated with D-mannose binding lectin. Although not an identical molecular marker, one study reported that the SNP locus S02_61590648, a coding region similar to the mannose-binding lectin coding region of pepper, was a top candidate for sorghum defense against downy mildew [13], and mannose-binding lectins are essential for plant defense signaling during pathogen attacks as they recognize specific carbohydrates on pathogen surfaces [29].
In the Sorghum Mini Core Collection, the SNP locus S04_51771351, which passed the Bonferroni threshold, was only 381 bp away from the zinc finger-related gene (Sobic.004G167500). Zinc finger proteins have been shown to play diverse roles in plant stress responses [30], and multiple previous studies have identified zinc finger-related proteins among candidate sorghum defense-related genes against fungal pathogens [12,13,15].
Nine SNP loci associated with anthracnose resistance passed the Bonferroni threshold in the Senegalese sorghum lines. Except for the SNP locus S01_72868925 (tagged to Sobic.001G451800), a reported SNP for potential anthracnose resistance in sorghum seedlings associated with leucine-rich repeat [10], the other eight SNPs were newly identified and had not previously been reported as top SNPs in univariate GWAS studies. The SNP locus S02_66200847 is located close to Sobic.002G280800, associated with the cAMP response element binding protein, which has remarkable roles in plant defense, salt stress, and ethylene responses [31]. The SNP locus S09_47938177 is associated with Sobic.009G126000, which encodes endo-beta-mannosidase. This gene regulates sugar metabolism and cell wall function [32][33][34].
The SNP locus S08_7370058, located 1117 bp away from Sobic.008G065700, is associated with splicing factors. Splicing factors function in various physiological processes, including plant disease resistance and abiotic stress responses [35,36]. The SNP locus S03_72625166 tagged Sobic.003G421201, a gene related to hexokinase-3. In Nicotiana benthamiana Domin, hexokinases are known to play a role in controlling programmed cell death [37]. Sobic.007G096700 tagged by SNP locus S07_17951013 is similar to Ole e 1 proteins involved in pollen tube development [38]. It is unclear whether the gene is associated with sorghum defense or if it is simply a false positive. If the SNP is a false positive, the SNPs exhibiting high LD with SNP locus S07_17951013 in Table 1 may potentially be truly associated with sorghum defense response.
The SNP locus S04_66666642, located in the glucose-related gene Sobic.004G334300, was also listed as a top candidate. Numerous studies have shown that sugars play a crucial role in plant defense responses to various abiotic and biotic stress factors [39]. The MYB transcription factor (Sobic.008G112200) was tagged by the SNP locus S08_51886715, and MYB transcription factors are reported to be associated with sorghum defense against fungal diseases such as smut and head smut [11,40]. Although no SNPs passed the Bonferroni threshold, PC GWAS based on the traits associated with head smut and combinations of the traits related to anthracnose and head smut in the Senegalese sorghum lines identified a few SNPs of potential importance.
Although not the same SNPs, zinc finger- (Sobic.008G125400) and calcium-related binding proteins (Sobic.002G113900) have been identified as top candidates conferring defense in plants in several GWAS studies [14]. Another SNP locus, S05_8867065, tagged Sobic.005G073200, which showed similarity to the DUF803 domain. This domain was reported among the top candidates that differed between resistant and susceptible types of Upland cotton (Gossypium hirsutum L.) against the southern root-knot nematode [Meloidogyne incognita (Kofoid & White, 1919)] [41]. Lastly, the SNP locus S03_9787536 tagged DUF3339 (Sobic.003G108200). Although not in an antagonistic relationship, mycorrhizal symbiosis upregulated DUF3339 in Vaccinium myrtillus L. 1753 [42]. Among the top SNPs identified in this study, the SNP locus S01_72868925 was previously reported as a top SNP in univariate GWAS studies [10]. Still, the other SNPs were newly listed, indicating the exceptional potential of multivariate GWAS for sorghum-pathogen interactions.
Nearly all the candidate genes identified in this study through multivariate GWAS had known defense functions in plants. The genes' functions can be further analyzed and validated through advanced techniques such as real-time quantitative reverse transcription PCR (Real-time qRT-PCR), RNA sequencing analysis (RNA-Seq) and CRISPR-Cas9-associated gene editing. However, validating these functions requires tremendous time and funding. Monocot crops, such as maize and sorghum, are notoriously difficult to transform using Agrobacterium-mediated transformation and regeneration protocols, with complications arising in both methodologies for each of these crops and long tissue culture periods being required for the regeneration of plants from transformed tissue [43].
Hence, it is essential to continue mining novel SNPs associated with sorghum defense against fungal pathogens in order to obtain more precise knowledge on crucial sorghum defense genes before functional validations of candidate genes are carried out. Applying multi-trait GWAS in sorghum pathology will provide avenues to identify further defense-related sorghum genes that have yet to be revealed. Additionally, conducting multivariate GWAS on multiple fungal pathogens of sorghum could reveal defense-related genes that are either pathogen-specific or universally involved in sorghum defense.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/pathogens12060779/s1, Table S1: Principal components generated from subsets of the three populations.
Author Contributions: E.A.: conceptualization, methodology, software, validation, formal analysis, investigation, original draft. L.K.P.: resources, data curation, methodology, writing-review and editing. C.M.: project administration, funding acquisition, supervision, writing-review and editing. All authors have read and agreed to the published version of the manuscript. Enabling marker-assisted selection for sorghum disease resistance in Senegal and Niger. E.A. and L.K.P. were supported by the U.S. Department of Agriculture, Agricultural Research Service. Mention of any trade names or commercial products in this article is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the U.S. Department of Agriculture. USDA is an equal opportunity provider and employer, and all agency services are available without discrimination.
Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.