An intensive study plot to investigate chestnut tree reproduction

Pollination is a key step for fruit production. To provide a tool for future in-depth analysis of pollination in chestnut, we describe in detail a chestnut orchard (location, genotype, phenotype and seed set of all trees). Chestnuts, which are insect-pollinated trees, have been planted on a large scale around the world for nut production. Orchards are planted with clonal varieties selected from crosses between the European chestnut (Castanea sativa) and the Japanese chestnut (C. crenata) or the Chinese chestnut (C. mollissima), because the latter two species are tolerant to blight and ink diseases. To characterize chestnut genetic resources and accurately model male and female fitness as well as pollen exchanges in orchards, we characterized all chestnut trees of the INRAE chestnut germplasm collection located near Bordeaux (France). All chestnut trees were geolocated and genotyped using 79 SNP and 98 SSR loci. We scored their flowering phenology using the chestnut BBCH scale and precisely described their phenotype (height, diameter at breast height (DBH), canopy diameter…), their capacity to produce pollen (flower type, catkin length…) and their fruit production (number of burrs, seed set…). We geolocated 275 trees and genotyped 273 of them. We identified 113 unique genotypes and assigned each genotype to species. To assess phenology, we evaluated 244 trees twice a week for 6 weeks, from early June to mid-July. We also described tree phenotypes with 11 variables, pollen production with 5 variables and fruit production with 3 variables. All measurements were recorded in 2018, except seed set, which was measured in two consecutive years (2018 and 2019). The collected data are detailed enough to allow precise modelling of pollen exchanges between trees. Parts of these data have already been published in scientific articles.
Data are available at: https://data.inrae.fr/dataset.xhtml?persistentId=doi:10.15454/GSJSWW
Associated metadata are available at: https://metadata-afs.nancy.inra.fr/geonetwork/srv/fre/catalog.search#/metadata/02c5ca07-1536-4f89-9a0c-9e8d44a91287


Background
The INRAE chestnut germplasm collection is located in Villenave d'Ornon near Bordeaux (Gironde, France). This collection consists of two experimental plots called "A" and "E", comprising 29 and 215 chestnut trees, respectively, from three species: the European chestnut (Castanea sativa), the Japanese chestnut (Castanea crenata) and the Chinese chestnut (Castanea mollissima), as well as their hybrids. We characterized the chestnut trees from these study plots, other chestnut trees planted across the INRAE experimental station, and all remaining chestnut trees found within a radius of 1 km around the chestnut collection (Fig. 1). We genotyped all these trees with SNP markers, identified 113 genotypes ("Multilocus Matches" function from GenAlEx, Peakall and Smouse 2012) and assigned them to chestnut species (STRUCTURE analysis, Pritchard et al. 2000; Larue et al. 2021b). We further described the architecture, male catkins and fruit production of all trees in 2018, and repeated seed set measurements in 2019.

Identification and geolocation of chestnut trees
We identified 275 chestnut trees and geolocated them with a Garmin 64st. Tree positions were verified and corrected using QGIS software (QGIS Desktop 3.16.4) with satellite photos from IGN BdOrtho. Tree coordinates are expressed in Lambert 93. Each tree received a unique identifier according to its position, and this ID is used as the reference across all files. An introduction register of the INRAE chestnut germplasm has existed since 1950, but for this paper, with a few exceptions for illustration purposes, no attempt was made to systematically use common names for the varieties.
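Because Lambert 93 is a projected coordinate system expressed in metres, distances between trees can be approximated directly from the (x, y) coordinates, for instance to list the trees found within a given radius of a focal tree. The sketch below is only illustrative: the tree identifiers and coordinates are invented and are not taken from the dataset.

```python
import math

# Hypothetical tree IDs with made-up Lambert 93 (x, y) coordinates in metres.
trees = {
    "A_01": (417000.0, 6415000.0),
    "A_02": (417003.0, 6415004.0),
    "ISO_1": (417300.0, 6415400.0),
    "FAR_1": (419000.0, 6415000.0),
}


def distance_m(p, q):
    """Euclidean distance in metres between two projected points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def trees_within(trees, centre_id, radius_m):
    """Return the sorted IDs of all other trees within radius_m of centre_id."""
    centre = trees[centre_id]
    return sorted(
        tree_id
        for tree_id, xy in trees.items()
        if tree_id != centre_id and distance_m(centre, xy) <= radius_m
    )


neighbours = trees_within(trees, "A_01", 1000.0)
print(neighbours)
```

The same pairwise distances could serve as input to a pollen dispersal model, which is one of the intended uses of the geolocation data.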

Genotyping
Leaves were sampled from all identified trees and stored at −20 °C until analysis. DNA isolation was performed with a custom CTAB protocol (Larue et al. 2021b). Samples were genotyped at 120 single nucleotide polymorphism (SNP) markers on the Agena MassARRAY platform (Larue et al. 2021b). We identified all samples having the same multilocus genotype using the "Multilocus Matches" function from GenAlEx 6.503 (Peakall and Smouse 2012) and carefully inspected the results manually. We also computed "Multilocus near matches" to verify that different genotypes differ at multiple markers.
Once the unique multilocus genotypes (the genets) were identified, we obtained the consensus genotypes by summarizing genotypic data from all ramets of each genet. Finally, we used STRUCTURE software (Pritchard et al. 2000) to assign each ramet to species, as explained in Larue et al. (2021b).
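The logic of clonal identification and consensus genotypes can be illustrated with a minimal sketch. It is not the GenAlEx algorithm: it assumes that genotypes are stored as lists of calls with `None` for missing data, that two samples match when they agree at every locus called in both, and that the consensus keeps the most frequent call per locus. The sample names and genotypes are invented.

```python
from collections import Counter


def count_mismatches(a, b):
    """Number of loci called in both genotypes where the calls differ."""
    return sum(1 for x, y in zip(a, b) if x is not None and y is not None and x != y)


def group_clones(genotypes, max_mismatches=0):
    """Greedy grouping: each sample joins the first group whose first member it matches."""
    groups = []
    for sample_id, calls in genotypes.items():
        for group in groups:
            reference = genotypes[group[0]]
            if count_mismatches(reference, calls) <= max_mismatches:
                group.append(sample_id)
                break
        else:
            groups.append([sample_id])
    return groups


def consensus(ramet_calls):
    """Most frequent non-missing call at each locus across the ramets of one genet."""
    result = []
    for locus_calls in zip(*ramet_calls):
        observed = [c for c in locus_calls if c is not None]
        result.append(Counter(observed).most_common(1)[0][0] if observed else None)
    return result


# Invented example: four ramets genotyped at three SNP loci.
genotypes = {
    "T1": ["AA", "AG", "CC"],
    "T2": ["AA", "AG", None],
    "T3": ["AG", "GG", "CC"],
    "T4": ["AA", None, "CC"],
}

genets = group_clones(genotypes)
first_consensus = consensus([genotypes[s] for s in genets[0]])
ramets_per_genet = len(genotypes) / len(genets)
```

Raising `max_mismatches` above zero gives a rough analogue of a near-match check, which is useful for spotting genotypes that differ at only one or two markers.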

Phenotyping
The architecture of each ramet was described using the diameter at breast height (at 1.3 m) of all stems > 1 cm, total height and average canopy diameter (in meters). We then calculated the basal area (in square meters). To estimate the capacity of each ramet to produce pollen, we measured the density of male flowers of unisexual catkins (number of flowers per square meter). We identified flower type (Fig. 2) according to Solignat and Chapa (1975), measured catkin length (in centimeters) and diameter (in millimeters), and estimated relative stamen density. The phenology of all trees was recorded twice a week in late spring 2018 (from June to mid-July) using a specifically developed standardized scale (Larue et al. 2021c). Briefly, at each visit, each tree received three scores: one for male flowers of unisexual male catkins, one for female flowers and one for male flowers of bisexual catkins. We estimated burr production (number of burrs per square meter) in July by counting the number of burrs in the canopy or on the ground underneath each tree. Finally, we collected burrs in the fall and estimated seed set by counting the number of developed nuts per burr (female inflorescences of chestnut trees are composed of three female flowers located side by side). Each flower, if pollinated, produces a fruit surrounded by the pericarp; if pollination fails, the pericarp is still present but remains empty. If none of the three flowers of an inflorescence is pollinated, the burr contains three empty nuts.

Fig. 1 Map of INRAE chestnut genetic collection and of all isolated chestnuts found within a radius of 1 km around the collection

Access to the data and metadata description
This dataset is composed of 17 Excel files: 1 file describing all variables, entitled "0_0_Read_Me.xlsx", and 16 data files, described below:
We labeled and mapped all chestnut trees located inside and outside the INRAE plantations:
- 1_1_List_Chestnuts.xlsx contains a unique identifier for each chestnut tree.
- 1_2_List_INRAE_Chestnuts_Germplasm_Collection.xlsx is a list of all chestnut trees that are part of the two plantations making up the INRAE germplasm collection (excluding isolated trees that were not planted as part of this germplasm collection).

We genotyped all these chestnut trees at SNP markers:
- 2_1_Genotypes.xlsx is the corresponding raw data.
- 2_2_Genotypes_Genalex_Input.xlsx includes the SNP genotypes in GenAlEx format.

We identified trees having the same multilocus genotype (clones):
- 3_1_Clonal identification.xlsx

We compiled a list of all unique multilocus genotypes (the genets):
- 4_1_Consensus_genotypes_Genets.xlsx

We then attributed a genet to each ramet (several ramets of the same genet are grafted on different rootstocks):
- 5_1_Genotypes_Ramets.xlsx

Using a Bayesian approach (Structure), we assigned each genet to the different gene pools corresponding to different chestnut species:
- 6_1_Species_Identification_Genets.xlsx

We listed 275 chestnut trees: 244 adult trees in the INRAE germplasm collection (plot A: 29 trees; plot E: 215 trees), 24 small trees in the nursery, and 7 adult trees outside the INRAE campus. All trees are geolocated, but all the young trees of the nursery are represented by a single GPS point. Of the 275 trees identified, two died before leaves could be collected for DNA isolation, so we have the genotypes of 273 individuals. We identified 113 unique genotypes (genets), with an average of 2.4 ramets/genet. All this information is summarized in:
- 8_1_Summary.xlsx

All chestnuts are located on a map:
- 8_2_Map.xlsx

Technical validation
We validated the dataset first by hand and then using numerical and graphical analyses with R software (v4.0.4). Laboratory and measuring equipment was regularly calibrated, and standards were used for each analysis. Genotyping errors with SNPs using the MassARRAY platform are extremely rare (Guichoux et al. 2011; Larue et al. 2021b), and it is therefore possible to characterize a large number of samples quickly, reliably and at low cost. The genetic characterization of this collection is a first step towards the creation of a database describing chestnut cultivars. Users will be able to genotype their samples with the same markers (or a subset of them) and compare their results with the database.

Reuse potential and limits
The data presented here are simple to reuse. The Excel files can be easily imported into R by saving them as .txt or .csv files with minor modifications. Parts of the data were used successfully in previous studies, demonstrating their usefulness and portability. For instance, we successfully performed Structure analyses (Pritchard et al. 2000) with 68 SNPs and 94 SSRs (Larue et al. 2021b), showing that the SNP markers are very reliable for identifying clones, species and interspecific hybrids, including advanced hybrids.
The collected phenology data are also very detailed, allowing inter-varietal and interspecific comparisons, as performed in Larue et al. (2021c). In Larue et al. (2021a), we describe in detail the phenology of two ramets for each of eight genets. We show that whereas the phenology of the different genets can vary greatly, it is very repeatable among ramets of the same genet. Seed set measurements also provide valuable data that allow us to highlight differences in the probability of fertilization according to flower type. In particular, we have shown that astaminate trees have a higher seed set than staminate trees (results not shown).
A limitation of the present work is that we have not attempted to provide common names for all accessions studied. This would require a lot of curation, which is under way. Indeed, by genotyping the chestnut collection, we have highlighted problems of varietal identification. In some cases, a single genet has been designated with several cultivar names, a case of synonymy. In other cases, different genets have received the same cultivar name, a case of homonymy. At this point, trees are therefore named only according to their position in the plot.
In the literature, male catkins are classified in four categories according to the length of stamen filaments: astaminate (no stamens emerging from the flowers), brachystaminate (stamens 1-3 mm), mesostaminate (stamens 3-5 mm) and longistaminate (stamens > 5 mm), with pollen production of the tree strongly depending on flower type. However, variation of pollen production across trees is a continuous trait and these categories have limits.
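The four categories can be written as a simple threshold rule, which also makes their limits visible. This is a sketch under stated assumptions, not a classification procedure from the dataset: a length of `None` stands for flowers with no emerging stamens, lengths below 1 mm are grouped with the brachystaminate class, and the boundary values of 3 mm and 5 mm are assigned to the lower class, since the categories as stated do not settle these cases.

```python
from typing import Optional


def flower_type(filament_mm: Optional[float]) -> str:
    """Classify a catkin from its stamen filament length in millimetres.

    Assumptions (not from the source): None means no stamens emerge,
    and boundary values (3 mm, 5 mm) fall in the lower class.
    """
    if filament_mm is None:
        return "astaminate"
    if filament_mm <= 3.0:
        return "brachystaminate"
    if filament_mm <= 5.0:
        return "mesostaminate"
    return "longistaminate"


# Two nearly identical filament lengths end up in different classes.
almost_equal = [flower_type(2.9), flower_type(3.1)]
print(almost_equal)
```

Filaments of 2.9 mm and 3.1 mm fall into different categories even though they differ by only 0.2 mm, which illustrates why a continuous measure of pollen production can be more informative than the class label.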
Note also that when we calculated the seed set, we measured the number of fruits per burr, i.e. the number of flowers in each female inflorescence that give a fruit, but not burr set, i.e. the percentage of female inflorescences that give burrs. The probability of fecundity is therefore overestimated. To better estimate pollination success, it would be necessary to measure both burr set and seed set, which would be very labor-intensive.
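The difference between the two measures can be made concrete with a short sketch. It assumes three female flowers per inflorescence, as described for chestnut, and assumes that an inflorescence that does not develop into a burr yields no fruit; the function names and the counts in the example are hypothetical and do not come from the dataset.

```python
FLOWERS_PER_INFLORESCENCE = 3  # three female flowers per chestnut inflorescence


def seed_set(nuts_per_burr):
    """Proportion of flowers giving a developed nut, among collected burrs only."""
    if not nuts_per_burr:
        raise ValueError("at least one burr is required")
    return sum(nuts_per_burr) / (FLOWERS_PER_INFLORESCENCE * len(nuts_per_burr))


def burr_set(n_burrs, n_inflorescences):
    """Proportion of female inflorescences that developed into burrs."""
    if n_inflorescences <= 0:
        raise ValueError("n_inflorescences must be positive")
    return n_burrs / n_inflorescences


def fruiting_success(nuts_per_burr, n_inflorescences):
    """Proportion of all female flowers giving a developed nut.

    Assumption: inflorescences that did not become burrs produced no fruit.
    """
    return burr_set(len(nuts_per_burr), n_inflorescences) * seed_set(nuts_per_burr)


# Invented example: 4 burrs collected from 8 female inflorescences.
developed_nuts = [3, 1, 0, 2]
per_burr = seed_set(developed_nuts)
overall = fruiting_success(developed_nuts, 8)
```

In this invented example, seed set alone is 0.5, whereas the proportion of all female flowers that gave a fruit is 0.25, which shows how ignoring burr set inflates the estimate.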