Multispectral and thermal infrared data, visual scores for severity of common rust symptoms, and genotypic single nucleotide polymorphism data of three F2-derived biparental doubled-haploid maize populations

Three F2-derived biparental doubled haploid (DH) maize populations were generated for genetic mapping of resistance to common rust. All three populations share the same susceptible parent but have different resistance donor parents. Populations 1 and 3 consist of 320 lines each, and population 2 consists of 260 lines. The DH lines were evaluated for their susceptibility to common rust in two years, with two replications in each year. For phenotyping, a visual score (VS) for susceptibility was assigned. Additionally, multispectral and thermal infrared data derived from an unmanned aerial vehicle (UAV) were recorded and combined into different vegetation indices (“remote sensing”, RS). The DH lines were genotyped with the DArTseq method to obtain single nucleotide polymorphism (SNP) data. After quality control, 9051 markers remained. Missing values were “imputed” with the empirical mean of the marker scores at the respective locus. We used the data to compare genome-wide association studies and genomic prediction based on different phenotyping methods, that is, on either VS or RS data. The data may be of interest for reuse, for instance for benchmarking genomic prediction models, for phytopathological studies addressing common rust, or for the specification of vegetation indices.
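The per-locus mean imputation mentioned above can be sketched as follows. This is a minimal illustration, assuming the genotypes are held as a list of rows (DH lines) by columns (markers) with scores coded −1, 0 and 1 and missing calls represented as None; the function name impute_marker_means is hypothetical, and the fallback of 0.0 for a locus without any observed call is an assumption, not part of the described procedure.

```python
from typing import List, Optional


def impute_marker_means(genotypes: List[List[Optional[float]]]) -> List[List[float]]:
    """Replace missing calls (None) by the mean of the observed scores at that locus.

    Rows are DH lines, columns are markers; scores are assumed to be coded -1, 0, 1.
    """
    n_markers = len(genotypes[0]) if genotypes else 0
    locus_means = []
    for j in range(n_markers):
        observed = [row[j] for row in genotypes if row[j] is not None]
        # Assumption: a locus with no observed call at all is imputed with 0.0.
        locus_means.append(sum(observed) / len(observed) if observed else 0.0)
    # Keep observed calls, substitute the locus mean for missing ones.
    return [
        [locus_means[j] if row[j] is None else float(row[j]) for j in range(n_markers)]
        for row in genotypes
    ]
```

Because the imputed value is a mean, imputed entries are generally not restricted to −1, 0 or 1, which downstream code should tolerate.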


Value of the Data
The data set offers phenotypic data on maize DH lines for different traits related to susceptibility to common rust. The traits comprise visual scores (VS) and remote sensing (RS) traits, including vegetation indices. Moreover, the data set provides genotypic data on single nucleotide polymorphisms (SNPs). This combination of phenotypic and genotypic data can, for instance, be used further
• for benchmarking genomic prediction models with different traits and different types of cross-validation, for instance related to the population structure,
• for benchmarking models for genome-wide association studies (GWAS), for instance models including cofactors or interactions of loci,
• for phytopathological studies addressing common rust,
• as a reference data set for high-throughput phenotyping in resistance breeding,
• for the specification of vegetation indices.
In particular, it may be of value for
• scientists working in high-throughput agricultural phenotyping and breeding,
• statistical geneticists,
• phytopathologists.
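One cross-validation scheme related to population structure is to hold out one whole population at a time, so that a model trained on two populations predicts the third. A minimal sketch follows, assuming each DH line is labeled with its population number; the function name leave_one_population_out is hypothetical, and this scheme is one illustrative choice, not necessarily the one used in [1].

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def leave_one_population_out(populations: List[int]) -> Dict[int, Tuple[List[int], List[int]]]:
    """Map each held-out population to (training indices, test indices).

    populations[i] is the population label of DH line i.
    """
    indices_by_pop: Dict[int, List[int]] = defaultdict(list)
    for idx, pop in enumerate(populations):
        indices_by_pop[pop].append(idx)

    folds = {}
    for held_out in sorted(indices_by_pop):
        test = indices_by_pop[held_out]
        # Train on all lines from the remaining populations.
        train = [i for i, pop in enumerate(populations) if pop != held_out]
        folds[held_out] = (train, test)
    return folds
```

With the line counts given above (320, 260 and 320 lines), holding out population 2 trains on 640 lines and tests on 260; a random k-fold split within a single population would be the natural contrast.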

Background
The objective in generating this data set was to explore the potential of remote sensing (RS) phenotyping methods in resistance breeding, in particular in comparison to low-throughput visual scoring (VS) and when the phenotypes are used for follow-up genetic evaluations of the plant material. We compared VS and RS traits with respect to the results of downstream genome-wide association studies and genomic prediction [1]. The present article describes the data in more detail to provide a solid basis and ideas for secondary use.

Data Description
Three different biparental, F2-derived DH populations were generated. All of them share the same parent, which is susceptible to common rust, whereas the parent resistant to common rust differs between populations.
The DH lines were genotyped for single nucleotide polymorphisms (SNPs; for more details, see Experimental Design, Materials and Methods). Genotypic data are available in the file Loladze_et_al_genotypes_GID.txt.gz (see Table 1). DH lines should be fully homozygous by construction, so a heterozygous call at a marker indicates an error either in genotyping or in the process of creating the DH line.
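Since Table 1 is not reproduced here, the exact layout of the genotype file is not given in this section. The following sketch reads a gzip-compressed text matrix under loudly stated assumptions: tab-separated fields, a header row whose first field is an ID column followed by marker names, one row per DH line starting with its GID, and "NA" for missing calls. The function name read_genotypes and all of these layout details are hypothetical and should be checked against Table 1.

```python
import gzip
from typing import Dict, List, Optional, Tuple


def read_genotypes(
    path: str, sep: str = "\t", missing: str = "NA"
) -> Tuple[List[str], Dict[str, List[Optional[float]]]]:
    """Read an assumed GID-by-marker matrix from a .txt.gz file.

    Assumed layout (hypothetical): header = ID column + marker names,
    then one row per DH line: GID followed by one score per marker.
    """
    genotypes: Dict[str, List[Optional[float]]] = {}
    with gzip.open(path, "rt") as handle:
        header = handle.readline().rstrip("\r\n").split(sep)
        markers = header[1:]
        for line in handle:
            if not line.strip():
                continue  # skip blank lines
            fields = line.rstrip("\r\n").split(sep)
            gid, calls = fields[0], fields[1:]
            # Parse as float so that mean-imputed values are also accepted.
            genotypes[gid] = [None if c == missing else float(c) for c in calls]
    return markers, genotypes
```

Parsing scores as floats rather than integers is deliberate: if the distributed file already contains mean-imputed values, they will not be restricted to −1, 0 and 1.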
Fig. 1 illustrates the distribution of heterozygous calls (the number of "0"s) relative to the total number of calls (the sum of the numbers of "−1", "0" and "1"s) for each individual line in each of the three populations. In populations 1, 2 and 3, respectively, 15, 23 and 22 individuals show a relative heterozygosity above 5%. Fig. 2 illustrates the corresponding distribution for each locus, computed across all lines of the respective population. Of the 9051 markers, 221, 883 and 316 show a heterozygosity higher than 5% in populations 1, 2 and 3, respectively.
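The relative heterozygosity described above, the share of "0" calls among all non-missing "−1", "0" and "1" calls, can be computed per line (rows) or per locus (columns) of a population's genotype matrix. A minimal sketch, assuming missing calls are stored as None; the function names are hypothetical, and returning 0.0 for a line or locus without any call is an assumption.

```python
from typing import List, Optional, Sequence

Matrix = List[List[Optional[float]]]


def relative_heterozygosity(calls: Sequence[Optional[float]]) -> float:
    """Share of heterozygous calls (0) among all non-missing calls (-1, 0, 1)."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0  # assumption: no calls means no evidence of heterozygosity
    return sum(1 for c in observed if c == 0) / len(observed)


def per_line(matrix: Matrix) -> List[float]:
    """Relative heterozygosity of each DH line (row), as summarized in Fig. 1."""
    return [relative_heterozygosity(row) for row in matrix]


def per_locus(matrix: Matrix) -> List[float]:
    """Relative heterozygosity of each marker (column), as summarized in Fig. 2."""
    return [relative_heterozygosity(col) for col in zip(*matrix)]


def count_above(values: Sequence[float], threshold: float = 0.05) -> int:
    """Number of lines or loci whose relative heterozygosity exceeds the threshold."""
    return sum(1 for v in values if v > threshold)
```

Applied to the unimputed calls of each population separately, count_above(per_line(...)) and count_above(per_locus(...)) correspond to the counts of individuals and markers above 5% reported above.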
The raw phenotypic data are provided in six files, one for each combination of population and year (e.g., Pop2_2020_raw.txt; see Table 1). The raw data include the VS as well as the RS traits. To illustrate the properties of the data, Fig. 3 shows the distributions of the raw VS data across populations and years. The adjusted phenotypic data are provided in separate files (see Table 1), and Fig. 4 illustrates the distribution of the adjusted VS across populations and years. File names, file content, and the data format are described in Table 1.
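Fig. 3 and Fig. 4 summarize the VS distributions as boxplots per combination of population and year. The statistics behind such boxplots can be sketched with the Python standard library, assuming the scores have already been parsed into (population, year, score) records. The record layout and the function name five_number_summaries are hypothetical, and the quartiles use statistics.quantiles with method="inclusive", which may differ slightly from the quartile rule of the software used to draw the figures.

```python
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (minimum, first quartile, median, third quartile, maximum)
Summary = Tuple[float, float, float, float, float]


def five_number_summaries(
    records: Iterable[Tuple[int, str, float]]
) -> Dict[Tuple[int, str], Summary]:
    """Group scores by (population, year) and compute boxplot statistics."""
    groups: Dict[Tuple[int, str], List[float]] = defaultdict(list)
    for population, year, score in records:
        groups[(population, year)].append(score)

    summaries: Dict[Tuple[int, str], Summary] = {}
    for key, scores in sorted(groups.items()):
        if len(scores) < 2:
            continue  # quartiles need at least two observations
        q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
        summaries[key] = (float(min(scores)), q1, median, q3, float(max(scores)))
    return summaries
```

The same function applies unchanged to adjusted VS values, since it only requires numeric scores per group.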

Limitations
Results obtained from this data set will have limited generalizability to other traits in the context of high-throughput phenotyping, RS, genomic prediction or genome-wide association studies. For the benchmarking of models and methods, the data set provides one specific example, maize and the disease common rust, so results obtained in this context will be specific to it.

Data Availability
Replication Data for: Use of Remote Sensing for Genome-Wide Association Studies and Genomic Prediction (Original data) (Dataverse) [2] .

Fig. 1.Boxplots of the relative heterozygosity per individual line for the three populations.

Fig. 3. Boxplots of VS raw data across the six combinations of population and year of evaluation.

Fig. 4. Boxplots of adjusted VS across the six combinations of population and year of evaluation.

Table 1
File names, content and data formats of the data set.