Genetic Status of the Swedish Central collection of heirloom apple cultivars

Cultivated apple is one of the most widely grown fruit crops worldwide. With the introduction of modern apple cultivars, from foreign and national breeding programs, the use of local cultivars decreased during the 20th century. In order to minimize genetic erosion and avoid loss of special genotypes, a number of local clonal archives were established across Sweden, with the goal of retaining old and local cultivars. About 220 apple cultivars, appointed for preservation, obtained the status of mandate cultivars. Initially, they were identified based on pomological traits, but prior to the establishment of the Swedish Central Collection they were genotyped with simple sequence repeat (SSR) markers. SSR markers helped to evaluate the status of the preserved material, as well as to find the best possible true-to-type source for propagation, thus guiding the establishment of the Central Collection. Recently, 215 accessions from this collection were genotyped using the 20 K apple Infinium® single nucleotide polymorphism (SNP) array, in order to gain insight into its genetic structure. The initial SSR analysis confirmed the identity of multiple samples with the same cultivar name grown in different locations and identified several mislabeled samples. In the subsequent SNP analysis we identified 30 clonal relationships and a number of parent-offspring relationships, including 18 trios. We also identified five cultivar samples with inconsistent ploidy levels between the SNP and SSR data, in some cases indicating problematic samples preserved in either the Central Collection or some of the local clonal archives. These cultivars need further investigation to ensure their true-to-typeness. Furthermore, the Swedish Central Collection has continued to grow since the onset of this work and now contains additional cultivars, which should be included in future studies. 
The results indicate that a number of the preserved mandate cultivars hold high potential value for modern breeding programs.


Introduction
The cultivated apple (Malus domestica Borkh.) is one of the world's most widely grown fruit crops. It belongs to the family Rosaceae, is diploid (2n = 2x = 34), and has life-history traits characteristic of perennial fruit crops, i.e., it is outcrossing and to a large extent self-incompatible, has a long juvenile period and a long life span, and is often clonally propagated (Gaut et al., 2015). This generally leads to a high degree of heterozygosity and a large proportion of genetic diversity being retained following domestication (Miller and Gross, 2011). These life-history traits also facilitate close genetic relationships between very old cultivars and those emerging from modern breeding programs. In Sweden, a program studying pollination compatibility between cultivars was established in Alnarp in the 1920s. Seeds from interesting crosses were sown and resulted in a number of cultivars being released (Nilsson, 1987). The current Swedish apple breeding program was established in Balsgård in the 1940s, and since then has released a number of cultivars adapted to the Scandinavian climate, both for commercial production and home gardening (Nybom, 2019). Some of the cultivars released from these two programs have been appointed for preservation, despite their modern source.
As modern cultivars increased in popularity among commercial growers, a national inventory and collection of local cultivars was initiated in 1979 by Nordiska Genbanken (now NordGen), to prevent loss of genetic resources. This resulted in the establishment of the first local clonal archives spread across the country and a compilation of pomological descriptions (Nilsson, 1987). Today the responsibility for conservation of heirloom apple cultivars lies with the National Gene Bank of Vegetatively Propagated Crops, similarly to other Scandinavian countries, e.g. Norway (Gasi et al., 2016) and Finland (Heinonen and Bitz, 2019). Starting in 2013, the local clonal archives were complemented by the establishment of the Swedish Central Collection of mandate cultivars at SLU in Alnarp, with the majority of the trees being planted in 2014 and 2015. Mandate cultivars are heirloom cultivars assigned for preservation by the National Gene Bank and are defined as either i) being of local origin according to lore, or ii) originating from Swedish breeding programs, or iii) foreign cultivars with a long documented history of cultivation (Hjalmarsson, 2019, 2020). Thus, the primary criterion for the selection of mandate cultivars was their specific cultural importance and not their inherent genetic diversity. The Swedish Central Collection of mandate cultivars is expected to include genetic variation relevant for local adaptation, as it contains cultivars with a wide range of climate hardiness (Nilsson, 1987). This collection has the potential to become a rich source of favorable alleles for the Swedish apple breeding program.
https://doi.org/10.1016/j.scienta.2020.109599 Received 28 February 2020; Received in revised form 9 June 2020; Accepted 9 July 2020
Current knowledge concerning relationships among the Swedish mandate cultivars is limited, and mainly based on historical pomological records (Nilsson, 1987), pedigree records in the breeding programs, and previous studies using SSR markers (Garkava-Gustavsson et al., 2008; Urrestarazu et al., 2016). Recently, the pedigrees of some cultivars in the collection have been elucidated using a large set of SNP markers (Muranty et al., 2020). Clarifying unknown pedigrees greatly enhances the prospects for germplasm utilization, by increasing the accuracy of pedigree-based quantitative trait loci analyses (Howard et al., 2017) and reducing the risk of unwanted inbreeding in future crosses.
Genotyping of gene bank material is crucial in order to ensure that genetic variation is maintained, but also to make gene bank material a useful resource for breeding. In the past, SSR markers have been widely used in germplasm studies of apple at the national, regional, and European level (Garkava-Gustavsson et al., 2008, 2013; Gasi et al., 2016; Heinonen and Bitz, 2019; Larsen et al., 2017; Lassois et al., 2016; Marconi et al., 2018; Patocchi et al., 2009; Routson et al., 2009; Urrestarazu et al., 2016; van Treuren et al., 2010). More recently, genotyping-by-sequencing (GBS), SNP arrays, and diversity array technology (DArT) markers have been used to unravel apple pedigrees (Larsen et al., 2018; Muranty et al., 2020; Ordidge et al., 2018; Vanderzande et al., 2017). The IRSC 8 K SNP array (Chagné et al., 2012) available for apple has been useful in elucidating pedigrees (Howard et al., 2017), while the 20 K Infinium® SNP array (Bianco et al., 2014) that followed is currently being used in an ongoing apple pedigree project (Howard et al., 2018). The latter has also been used in the construction of a high-density genetic map and QTL-discovery for several traits (Allard et al., 2016; Di Guardo et al., 2017; van de Weg et al., 2018). Thus, genotyping a collection with the 20 K SNP array not only provides robust assessments of pedigree inference and population structure, but also improves the usefulness of the collection by allowing integration with larger germplasms and facilitates its implementation in a breeding-by-design concept (Peleman and Van Der Voort, 2003) that utilizes published information on the genetic control of relevant traits.
In the present study, we first demonstrate how SSR markers were used to assist the establishment of the Swedish Central Collection of mandate cultivars by identifying the best possible true-to-type source of graft wood for the trees intended for preservation in the Central Collection. Recently, we used the 20 K apple SNP array to characterize 215 apple accessions from the Swedish Central Collection. The genotypic data was used to describe the status of the collection by identifying clonal relationships, putative 1st degree relationships, and possible parent-offspring relationships, and by assessing population structure.

Plant material
For SSR analysis, young leaves from a total of 204 apple cultivars were collected in May-early June (2007-2009) and stored at −80°C until use. Samples were collected from different clonal archives, the stock collection of the Swedish Elite Plant Station (EPS, holding true-to-type and virus-free genotypes in stock orchards in Fjälkestad), the research- and breeding-oriented germplasm collection at the Swedish University of Agricultural Sciences (SLU-Balsgård) and the Institute of Horticulture in Kiev (Ukraine). If several trees of the same cultivar were present in different collections, all of them were sampled and analyzed, resulting in 340 samples (Supplementary File 1).
For SNP genotyping, leaf samples were collected from 215 accessions of Malus domestica Borkh. in the Swedish Central Collection of mandate cultivars (Alnarp, Sweden), including two rootstocks: 'A2', on which the collection is grafted, and 'Bemali' (Supplementary File 1). Young leaves from growing shoots of a single tree were collected at the end of May 2018, freeze-dried, and stored at −80°C. Extraction of total genomic DNA was performed using the DNeasy Plant Mini Kit (Qiagen) following the standard protocol. The quality and concentrations of the DNA samples were measured by spectrometry (Nanodrop, Thermo Scientific). The sample names used in this work are the accepted names according to the Swedish Utility and Cultivated Plants Database (SKUD) in January 2020, whenever such were available.
Amplifications were performed for each primer pair separately as described by Garkava-Gustavsson et al. (2008). PCR products were pseudo-multiplexed before being analyzed on a 3730 DNA analyzer (Applied Biosystems). The size of the amplified products was calculated based on an internal standard (500 ROX™ Size Standard, Applied Biosystems) and evaluated using GeneMapper® software v. 3.0 (Applied Biosystems).

Flow cytometry
The flow cytometry analyses of the samples showing more than two alleles were conducted at Plant Cytometry Services (JG Schijndel) in the Netherlands. Fresh leaves were chopped in ice-cold buffer with DAPI, and flow cytometry was performed on a CyFlow ML (Partec GmbH, Münster, Germany) using Lactuca sativa as internal standard, and 'Cox Orange' as a ploidy level standard.

Analysis of SSR-data
The SSR fragments were scored in terms of loci and alleles, and thus the allelic composition of each sample was determined. The simple matching coefficient (Sokal and Michener, 1958) was then calculated for all the analyzed samples with the NTSYS-pc statistical package v. 2.2 (Rohlf, 2005) to produce a similarity matrix and detect accessions with identical allelic profiles (a few mismatches in one allele were allowed for the genotypes to be considered identical). Allelic compositions were examined to verify the identity of cultivars with their known sports, to ensure that accessions with the same name had identical profiles, and to identify problematic cases (e.g., obvious identification mistakes and mislabeling).
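As a minimal sketch of the similarity step, assume each SSR profile is encoded as a 0/1 vector of allele absence/presence across all scored alleles. This encoding is an assumption; the text only states that NTSYS-pc was used. The simple matching coefficient is then the share of positions at which two profiles agree. The sample names, profiles, and the `max_mismatches` parameter below are hypothetical.

```python
from itertools import combinations

def simple_matching(x, y):
    """Simple matching coefficient: share of positions where two 0/1 profiles agree."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    return sum(a == b for a, b in zip(x, y)) / len(x)

def identical_pairs(profiles, max_mismatches=0):
    """Return name pairs whose profiles differ at no more than max_mismatches positions."""
    pairs = []
    for (name1, p1), (name2, p2) in combinations(profiles.items(), 2):
        mismatches = sum(a != b for a, b in zip(p1, p2))
        if mismatches <= max_mismatches:
            pairs.append((name1, name2))
    return pairs

# Hypothetical presence/absence profiles (not data from the study)
profiles = {
    "sample_A": [1, 0, 1, 1, 0, 1],
    "sample_B": [1, 0, 1, 1, 0, 1],
    "sample_C": [1, 1, 0, 1, 0, 1],
}
```

Raising `max_mismatches` mimics the tolerance for a few mismatches mentioned above; the value to use is a judgment call not specified in the text.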
Next, all triploid cultivars and a tetraploid cultivar were removed from the data set. The genetic diversity of the remaining 150 unique profiles of diploid cultivars was estimated using GenAlEx 6.503 (Peakall and Smouse, 2006, 2012). Genetic diversity parameters analyzed included: number of different alleles (Na), number of effective alleles (Ne), Shannon's information index (I), observed heterozygosity (Ho), expected heterozygosity (He), unbiased expected heterozygosity (uHe) and inbreeding coefficient (F).
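The per-locus parameters listed above can be illustrated with a short sketch that uses their standard definitions for diploid data: Ne = 1/Σp², I = −Σp·ln p, He = 1 − Σp², uHe = 2N/(2N − 1)·He, and F = (He − Ho)/He. This is not the GenAlEx implementation; the function name and the example genotypes (allele sizes in bp) are hypothetical.

```python
import math
from collections import Counter

def locus_diversity(genotypes):
    """Diversity statistics for one locus from diploid genotypes given as (allele, allele) tuples."""
    n = len(genotypes)  # number of individuals
    counts = Counter(allele for g in genotypes for allele in g)
    freqs = [c / (2 * n) for c in counts.values()]
    sum_p2 = sum(p * p for p in freqs)
    he = 1.0 - sum_p2
    ho = sum(1 for a, b in genotypes if a != b) / n
    return {
        "Na": len(counts),                          # number of different alleles
        "Ne": 1.0 / sum_p2,                         # number of effective alleles
        "I": -sum(p * math.log(p) for p in freqs),  # Shannon's information index
        "Ho": ho,                                   # observed heterozygosity
        "He": he,                                   # expected heterozygosity
        "uHe": (2 * n / (2 * n - 1)) * he,          # unbiased expected heterozygosity
        "F": (he - ho) / he if he > 0 else float("nan"),  # inbreeding coefficient
    }

# Hypothetical genotypes at a single SSR locus
stats = locus_diversity([(100, 100), (100, 104), (104, 104), (100, 104)])
```

With two equally frequent alleles and half of the individuals heterozygous, Ho equals He and F is zero, which is the pattern reported for most loci in this study.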

SNP genotyping
The SNP genotyping was carried out using the 20 K Infinium® SNP array (Illumina Inc.) (Bianco et al., 2014). Samples comprising 200 ng of genomic DNA from each of the 215 accessions were analyzed, following the standard Illumina protocol detailed in Chagné et al. (2012). Intensity data were analyzed using the workflow described by Vanderzande et al. (2019), with one major deviation. In Vanderzande et al. (2019) the cluster definitions were obtained from ASSiST (Di Guardo et al., 2015), while in this project we used manually adjusted cluster definitions and a subset of SNPs obtained through an ongoing apple pedigree project (Howard et al., 2018) consisting of 10,368 SNPs. The cluster definitions were loaded into Genome Studio (GS) v 2.0 (Illumina Inc.), and the SNP genotype calls obtained were used for further analyses. Sample quality was assessed using the overall call rate and the histogram distributions of the B-allele frequency and the R-parameter.

SNP-data analysis
Putative ploidy levels were determined using frequency plots according to Chagné et al. (2015), as described by Vanderzande et al. (2019). Duplicate individuals were identified using PLINK 1.9 (Chang et al., 2015), using the PI_HAT parameter as an approximation of the global pairwise identity-by-descent (IBD) of individuals. PLINK utilizes a method-of-moments approach to estimate the proportion of alleles shared IBD (PI_HAT, $\hat{\pi}$):

$$\hat{\pi} = P(Z = 2) + 0.5\,P(Z = 1)$$

where P(Z = 1) and P(Z = 2) are the global estimates of the proportion of loci with 1 and 2 alleles in common, respectively (Purcell et al., 2007). Duplicates as well as known sports were used as quality controls, to verify the accuracy of the SNP genotyping procedure. In the case of unexpected duplicates, these were further examined for morphology of individual trees, records from sample preparations, historical descriptions and SSR data from local archives. All duplicates were removed from the dataset, which was then analyzed with PLINK to identify putative 1st degree relationships, i.e., pairs of individuals with coefficients of relationship equal to 0.5, such as full siblings and parent-offspring relationships.
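A small sketch can make the PI_HAT definition concrete. In PLINK it is computed as P(Z = 2) + 0.5 P(Z = 1). The function name and the example Z-probabilities below are illustrative; they are theoretical expectations for each relationship class, not values estimated in this study.

```python
def pi_hat(z0, z1, z2, tol=1e-6):
    """Proportion of alleles shared IBD from the probabilities of sharing 0, 1 or 2 alleles."""
    if abs(z0 + z1 + z2 - 1.0) > tol:
        raise ValueError("Z probabilities must sum to 1")
    return z2 + 0.5 * z1

# Theoretical expectations for some relationship classes (illustrative only)
expected = {
    "duplicate or clone": pi_hat(0.0, 0.0, 1.0),
    "parent-offspring": pi_hat(0.0, 1.0, 0.0),
    "full siblings": pi_hat(0.25, 0.5, 0.25),
    "half siblings": pi_hat(0.5, 0.5, 0.0),
}
```

Parent-offspring and full-sibling pairs share the same expected PI_HAT of 0.5, which is why an additional test for Mendelian inconsistent errors is needed to tell them apart.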
The available pedigree records were verified using the R-script for counting Mendelian inconsistent errors, as described by Vanderzande et al. (2019). The pedigree records were adjusted for any errors detected and the script was re-run to identify possible parent-offspring combinations, using thresholds of 30 and 60 Mendelian inconsistent errors for parent-offspring (P-O) and parent-parent-offspring (P-P-O) combinations, respectively. To illustrate the results, network plots were generated using the igraph package (Csardi and Nepusz, 2006) in RStudio version 1.1.463.
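The logic of counting Mendelian inconsistent errors can be sketched as follows. This is a Python illustration, not the R-script of Vanderzande et al. (2019). It assumes diploid genotypes coded as the number of B alleles (0, 1, 2) with `None` for missing calls. The function names are hypothetical, and treating the thresholds as inclusive upper bounds is also an assumption.

```python
def gametes(genotype):
    """B-allele counts a diploid parent with this genotype can transmit."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]

def po_errors(parent, child):
    """Count SNPs where parent and child are opposite homozygotes."""
    return sum(1 for p, c in zip(parent, child)
               if p is not None and c is not None and abs(p - c) == 2)

def trio_errors(parent1, parent2, child):
    """Count SNPs where the child genotype cannot arise from the two parents."""
    errors = 0
    for a, b, c in zip(parent1, parent2, child):
        if a is None or b is None or c is None:
            continue  # skip SNPs with a missing call
        possible = {x + y for x in gametes(a) for y in gametes(b)}
        if c not in possible:
            errors += 1
    return errors

PO_THRESHOLD = 30   # threshold stated in the text for P-O pairs
PPO_THRESHOLD = 60  # threshold stated in the text for P-P-O trios

def is_possible(errors, threshold):
    """Assumed decision rule: accept when the error count does not exceed the threshold."""
    return errors <= threshold
```

For example, `po_errors([0, 1, 2, 2], [2, 1, 2, 0])` counts the two SNPs at which the pair carries opposite homozygous genotypes.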
Assessment of population structure was performed following removal of synonymous samples, but without pruning for linkage disequilibrium (LD) or minor allele frequencies (MAF). Principal component analysis (PCA) was carried out in PLINK and plotted in Excel 2016 (Microsoft). Statistical analysis was carried out using model-based clustering as implemented in the software STRUCTURE v2.3.4 (Falush et al., 2003, 2007; Pritchard et al., 2000). STRUCTURE was run with GNU Parallel (Tange, 2011) while assuming admixture and correlated allele frequencies between populations, and implementing the LD model, using the genetic positions from Di Pierro et al. (2016). Initially, a single-replicate analysis was run assuming 1-10 subpopulations (K), in order to determine suitable settings. Five independent replicates were then performed, with a burn-in of 15,000 and a run length of 50,000 for each K from 1 to 5. Next, the most probable number of subpopulations was estimated using the method from Evanno et al. (2005), as implemented in Structure Harvester (Earl and vonHoldt, 2012).
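The ΔK statistic of Evanno et al. (2005) can be sketched as the absolute second-order difference of the mean log-likelihood L(K) across replicates, divided by the standard deviation of L(K). The version below works on replicate means, which is an assumed simplification, and it is not the Structure Harvester code. The log-likelihood values are hypothetical.

```python
from statistics import mean, stdev

def evanno_delta_k(lnp):
    """Delta K for each K that has replicate runs at both K - 1 and K + 1.

    lnp maps K to a list of replicate log-likelihoods L(K).
    """
    m = {k: mean(values) for k, values in lnp.items()}
    delta = {}
    for k in sorted(lnp):
        if k - 1 in m and k + 1 in m:
            sd = stdev(lnp[k])
            second_diff = abs(m[k + 1] - 2 * m[k] + m[k - 1])
            # A zero standard deviation makes the ratio undefined; report infinity.
            delta[k] = second_diff / sd if sd > 0 else float("inf")
    return delta

# Hypothetical replicate log-likelihoods for K = 1..4 (not values from the study)
lnp = {1: [-100, -102], 2: [-80, -82], 3: [-78, -80], 4: [-77, -79]}
delta_k = evanno_delta_k(lnp)
```

ΔK is undefined for the smallest and largest K tested, so it cannot on its own support K = 1; for that case L(K) itself has to be inspected.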

Ploidy level and duplicates
All primer pairs produced clear and consistent DNA-profiles, which were used in the analysis. In all loci except two (CH04c06 L1 and CH02c02b) three alleles were found in some genotypes. Some of these were previously known triploid cultivars, e.g. 'Gravensteiner'. However, some cultivars were not previously described as being triploid and their ploidy level was confirmed by flow cytometry analyses, e.g. 'Sköldinge'. Analysis of the similarity matrix confirmed a number of samples with identical SSR profiles, as expected, since multiple samples of the same cultivar had been collected from different locations. Other results indicated a need for further investigation: there were 9 cases of samples with the same name having distinct SSR profiles when collected from different collections. In addition, there were 49 cases of multiple cultivars sharing the same SSR profile, indicating potentially synonymous cultivars or mislabeling (Supplementary File 1).

Genetic diversity
The number of alleles varied from 2 (CH04c06 L1) to 20 (CH02c06) (Table 1). Loci with a high number of alleles also produced high values for Ho and He, and vice versa. For each locus, Ho values (from 0.30 to 0.89) were quite similar to He values (from 0.34 to 0.88). The highest Ho was found at locus CH02b10, and the lowest at locus CH02c02b. Locus CH02b10 yielded the highest He value, while locus CH04c06 L1 had the lowest. All loci had F coefficients below 0.1, except CH02c02b, which deviated with a value of 0.49.

SNP quality check and ploidy
Of the 10,368 SNPs considered here, 406 showed significant deviations from Hardy-Weinberg equilibrium ('ChiTest100' column in the GS SNP table) in this collection. Six of these SNPs were not called, 35 displayed an excess of homozygosity and the remaining 365 SNPs displayed an excess of heterozygosity. Since the fraction of markers deviating from Hardy-Weinberg equilibrium was low (< 5%), these markers were included in the downstream analyses.

Table 1 Genetic diversity in terms of number of different alleles (Na), number of effective alleles (Ne), Shannon's information index (I), observed heterozygosity (Ho), expected heterozygosity (He), unbiased expected heterozygosity (uHe) and inbreeding coefficient (F) from the 150 unique diploid genotypes analyzed.
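The Hardy-Weinberg screening can be illustrated with a generic chi-square test on the genotype counts of a biallelic SNP. This is not the computation behind the 'ChiTest100' column in GS, and the function name and example counts are hypothetical. The final lines check that the 406 deviating SNPs are fewer than 5% of the 10,368 SNPs.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic (1 df) for Hardy-Weinberg equilibrium and the direction of deviation."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 0.0, "monomorphic"
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    if n_ab > expected[1]:
        direction = "heterozygote excess"
    elif n_ab < expected[1]:
        direction = "homozygote excess"
    else:
        direction = "none"
    return chi2, direction

CRITICAL_1DF_P05 = 3.841  # chi-square critical value for 1 df at p = 0.05

# Fraction of deviating SNPs reported in the text
fraction_deviating = 406 / 10368
```

Here `hwe_chi_square(25, 50, 25)` gives a statistic of zero, while counts consisting only of homozygotes or only of heterozygotes give large values with opposite directions.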
Using cluster definitions and a robust subset of SNPs from an ongoing apple pedigree project (Howard et al., 2018), all samples had an overall call rate above 0.88 (except for the tetraploid 'Alfa68', which had a call rate of 0.80) and p50 GC values between 0.75 and 0.81. All samples had B-allele frequency histograms (distinct heterozygous and homozygous peaks) and R-parameter histograms (low values on the x-axis) in GS that indicated good sample quality without contamination, as described by Vanderzande et al. (2019). Next, we inspected the B-allele frequency plots generated using the method described in Chagné et al. (2015) which, together with the B-allele frequency histograms in GS, allowed us to identify sample ploidy. A few samples had ploidy levels differing from those revealed based on SSR markers (Table 2). Interestingly, 'Tegnérsäpple' showed a B-allele frequency plot indicating a translocation/aneuploidy at Linkage Group 15 (Supplementary File 2).

Duplicates in the Swedish Central Collection
Duplicate samples were defined as having global pairwise estimates of IBD > 0.999, as there were no sample pairs with IBD values between 0.99 and 0.7. Analysis for the presence of duplicates resulted in 185 unique samples out of the total 215 apple accessions tested. In the case of triploid synonymous samples, their B-allele frequencies were investigated to ensure that they had the same heterozygous state (i.e., either 'AAB' or 'ABB'). All synonymous triploid samples had B-allele frequencies with correlations above 0.99, indicating that they were identical. All expected duplicates were identified, both duplicate samples and color sports. However, we identified a number of unexpected duplicates and these were compared against the SSR data obtained from the local clonal archives. Two cases, 'Rödluvan'/'Borsdorfer' and 'A2'/'Björnegårdsäpple', were clear examples of mislabeling or propagation mistakes, and the true-to-type individuals were revealed based on examination of morphological characters. In the case of 'Cox's Orange Pippin'/'Skälbyäpple', the IBD coefficients to other cultivars were investigated and were found to be consistent with the pedigree of 'Cox's Orange Pippin'. Duplicate samples were excluded from further analysis (Table 3).
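Collapsing duplicates into unique genotypes can be sketched as grouping samples linked by pairwise IBD above the 0.999 threshold stated above. The union-find approach, the function name, and the example samples and IBD values are assumptions made for illustration. They are not the procedure or data of the study.

```python
def unique_groups(samples, pairwise_ibd, threshold=0.999):
    """Group samples connected by pairwise IBD above the threshold (union-find)."""
    parent = {s: s for s in samples}

    def find(s):
        # Follow parent links to the group representative, compressing the path.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b, ibd in pairwise_ibd:
        if ibd > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for s in samples:
        groups.setdefault(find(s), []).append(s)
    return list(groups.values())

# Hypothetical samples and pairwise IBD estimates
samples = ["A", "B", "C", "D", "E"]
pairs = [("A", "B", 1.0), ("B", "C", 0.9995), ("D", "E", 0.5)]
groups = unique_groups(samples, pairs)
```

The number of groups corresponds to the number of unique genotypes. Groups with more than one member are candidate synonyms, sports, or mix-ups that need the morphological and SSR follow-up described above.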

1st degree relationships, P-O relationships, and possible crosses
Having removed duplicates, the remaining 185 samples were reanalyzed for pair-wise IBD. A threshold for putative 1st degree relationships was set at IBD = 0.484 (0.5-1/64), guided by the IBD values of samples with known relationships. A draft pedigree based on previous SSR studies (Urrestarazu et al., 2016) and historical records (Svensson, 2005) concerning the cultivars was then constructed and analyzed for pedigree errors using an R-script as described by Vanderzande et al. (2019). Erroneous parents (3 cases; Supplementary File 1) were removed and the pedigree was re-analyzed to search for possible parent-offspring combinations. Nineteen possible crosses were identified, all of which appeared plausible according to the historical record of cultivar ages (Nilsson, 1987;Svensson, 2005) (Table 4). Most of the offspring were local Swedish cultivars, with two exceptions, 'Transparante Blanche' and 'Gestreifter Wintercalvill'. These are thought to be of Baltic/Russian and German origin, respectively, and both have possible parents that are either very old or of unknown origin (Nilsson, 1987).

Possible parents
A number of possible parent-offspring relationships within the collection were also identified, heavily biased towards a few cultivars. The cultivars with the highest number of possible parent-offspring relationships were 'Gimmersta' (29), 'Grågylling' (19), 'Vitgylling' (16), 'Rosenhäger' (7), and 'Klockhammarsäpple' (6). For most cultivars, the number of possible parent-offspring relationships was in line with the number of 1st degree relationships estimated from IBD values (Supplementary File 1).

Network
Based on the possible parent-parent-offspring combinations identified, parent-offspring relations, and estimated 1st degree relationships, a single network was identified for the collection, comprising 115 cultivars interconnected through 1st degree relationships (Fig. 1). Forty-nine of the 185 unique cultivars were found to have no 1st degree relationships within the collection. A number of putative 1st degree relationships that are not supported by the test for Mendelian inconsistent errors were also highlighted. Notably, 'Sköldinge' and 'Alnarps Favorit' had a low number of possible parent-offspring relationships, while having a relatively high number of putative 1st degree relationships.

Genetic structure
In PCA, the first three principal components explained 9.0, 6.1, and 4.8 % of the genetic variation in the SNP set in the collection, with no evident population structure (Fig. 2). Accordingly, the STRUCTURE analysis yielded the highest L(K) value for K = 1 and ΔK dropped to 5 already at K = 2, indicating that the Swedish Central Collection most likely consists of one sub-population (Supplementary File 1).

SSR-analysis of local clonal archives
The SSR analysis of the local clonal archives revealed several accessions that were either duplicates or mislabeled. Some of the interesting cases involved the different accessions traditionally referred to as 'Antonovka'. 'Antonovka' from Bergianska trädgården has the same SSR profile as 'Antonovka' from Julita and 'Antonovka Pamtorutka' from Balsgård. The latter was found to be identical to 'Antonovka Polotora Funtovaja' by Urrestarazu et al. (2016), which has been the preferred name in subsequent studies (Muranty et al., 2020). In contrast, 'Antonovka' from Ekebyhov is identical to 'Antonovka Kamenichka' at Balsgård and a sample of 'Antonovka' imported from the Institute of Horticulture, Kiev (Ukraine). According to the same studies as above, the preferred name for this cultivar has been 'Antonovka Obyknovennaja' (Muranty et al., 2020; Urrestarazu et al., 2016). 'Antonovka Polutora Funtovaja' and 'Antonovka Obyknovennaja' were recently found to have a parent-offspring relationship, which was directed based on historical data, with the latter suggested as being one of the possible parents (Muranty et al., 2020). For the establishment of the Central Collection, 'Antonovka' from Bergianska trädgården, i.e. 'Antonovka Polotora Funtovaja', was used as graft wood source. Thus, the collection might benefit from also including 'Antonovka Obyknovennaja', as it is likely to have a long history of cultivation in Sweden, even though there has been some confusion in the past because both of these cultivars have commonly been designated simply as 'Antonovka'. There were also some clear cases of mislabeling, e.g. 'Eva-Lotta' in one of the local archives appears to be identical to 'Alice', and 'Hornsberg' in one of the local archives was identical to 'Hanaskogsäpple'. Revealing such cases provided valuable support for the choice of graft wood source for the establishment of the Swedish Central Collection.
Notably, some accessions from the breeding collection in Balsgård appear not to be true-to-type, including 'Domö Favorit', 'Menigasker' and 'Silva', which all have SSR profiles different from those of accessions with the same names in the local clonal archives. Though beyond the scope of this study, this might be a relevant note, as several international studies including old Swedish cultivars have been based on samples obtained from the Balsgård collection (Muranty et al., 2020; Urrestarazu et al., 2016, 2017). Of the 12 SSR-markers utilized here, only 5 are common with those used by Urrestarazu et al. (2016) and their usefulness for comparison against a larger germplasm is therefore limited. Nevertheless, SSR profiles have been made available to the MUNQ-database and MUNQ genotype codes have been assigned as previously described, whenever possible (Muranty et al., 2020; Urrestarazu et al., 2016).
In the analysis for genetic diversity only diploid accessions were considered, as triploids are considerably less efficient as parents. Thus, diversity conserved in polyploids has limited impact as an actual resource. Considering the diversity parameters analyzed, the local archives seem to retain a high degree of genetic diversity. The mean observed heterozygosity (Table 1)

Table 3 Synonymous sample names based on identity-by-descent (IBD) analysis. Samples 1-4: Synonymous sample names. SSR: Simple sequence repeat data from the samples collected from the local archives confirming that the samples are synonymous (Same) or indicating that the sampled tree was not true to type (Diff). TTT: Phenotypic characters of the sampled trees in situ indicated that one of the sampled trees was not true-to-type (Diff).

Nilsson (1987). n/a = unknown origin, old = cultivars of unclear origin, but older than ∼1850, *arrival in Sweden, ** from Svensson (2005

Cultivars originating from the current Swedish breeding program in Balsgård (diamond), from the program in Alnarp (triangle) and their respective parents (star and cross) are indicated. Cultivars which can be considered as suitable parents for future breeding efforts, for northern Sweden, are labeled with names.

2018).
It should be noted, however, that the main criterion for the establishment of the Swedish Central Collection was not genetic diversity per se, but rather cultural diversity. Consequently, several duplicates with different names that were revealed by analysis with SSR markers were nonetheless considered as distinct mandate cultivars. While being genetically identical, duplicate accessions might still be sports, which can be difficult to distinguish without detailed investigations of morphology in common gardens. Another aspect is the richness of folklore accompanying many of the old cultivars, which in some cases can be just as important as the genetic properties of an old cultivar.

Central collection - ploidy and synonyms
From the SNP markers analysis, we found the Swedish Central Collection of mandate cultivars to be comprised of 190 diploid, 24 triploid, and 1 tetraploid accessions, corresponding to 89, 10, and 1 % of all unique samples. Slightly higher frequencies of triploids have recently been reported for Belgian and Danish collections (15 % and 19 %, respectively). Similarly, we found the Swedish central collection to contain 86 % unique accessions, which is in line with the 79 % and 85 % reported for the Belgian and Danish collections, respectively (Larsen et al., 2018;Vanderzande et al., 2017).
When comparing ploidy levels with those obtained during SSR analysis of the same cultivars, but from the local clonal archives, we found five inconsistent cases. For one such case, 'Skälbyäpple', the sample analyzed from the local clonal archive was triploid and genetically identical with 'Gravensteiner', and accordingly a different source for budwood was used, which had not been analyzed prior to propagation. In the Central Collection, 'Skälbyäpple' was found to be morphologically similar to 'Cox's Orange Pippin' (Nilsson, 1987), while also having estimated global pairwise-IBD to other accessions that are in line with the pedigree of 'Cox's Orange Pippin'. According to the pomological description, 'Skälbyäpple' appears to be quite similar to 'Cox's Orange Pippin'; thus it appears likely that 'Skälbyäpple' is synonymous with 'Cox's Orange Pippin'. The other accessions with inconsistent ploidy levels identified here were 'Bosebo Sötäpple', 'Cellini', 'Holländare' and 'Pomme de Cannelle' (Table 2). Similar to 'Skälbyäpple', 'Pomme de Cannelle' was propagated from a source not analyzed by SSR markers, thus explaining the inconsistency. However, none of the other three samples appears to be synonymous with any other accession in the Central Collection, and this inconsistency thus requires further investigation.
The consistency between the use of SNP array data and flow cytometry to determine ploidy levels has previously been established (Chagné et al., 2015), making incorrect calling of ploidy levels unlikely. Consequently, the accessions from both the local clonal archives and the Central Collection should be analyzed using the same genetic markers, to clarify these inconsistencies. For example, 'Cellini' has previously been described as diploid (Urrestarazu et al., 2016), making it likely that the 'Cellini' accession present in the Central Collection is not true-to-type. Interestingly, although the 'Cellini' sample might not be true-to-type, it could still be a valuable part of the collection, as it does not appear to be synonymous with anything else in the collection. Similarly, 'Bergianäpple' has historically been sold as 'Antonovka' (Hjalmarsson, 2019), but it is now recognized as a separate cultivar with, according to our data, no close relationship to 'Antonovka' (IBD = 0.14). Regarding the synonymous samples, for nine out of the 24 pairs, SSR data were lacking for at least one of the accessions. This illustrates the usefulness of the current thorough examination of the collection, and complements the SSR-based information acquired during the establishment of the gene bank. Some of the duplicate pairs/trios reported here are either duplicates from different sources, e.g., 'Gyllenkroks Astrakan' (EPS) and 'Gyllenkroks Astrakan' (mother tree), or well-known sports such as 'Sävstaholm' and 'P.J. Bergius'. The remaining duplicate pairs would benefit from further genetic investigation of the material in the local clonal archives and from morphological studies, in order to establish whether they are synonymous accessions, mixed-up trees, or sports.
For example, 'Hedemoraäpple' appears to be synonymous to 'Grågylling' in the Swedish Central Collection and these two cultivars are described as having rather similar morphology, but 'Hedemoraäpple' is not described as having the twisted stem that is characteristic of 'Grågylling' (Nilsson, 1987). The accessions of 'Rödgylling' from the local clonal archives also had the same SSR profiles as 'Hedemoraäpple' and some accessions labeled as 'Grågylling'. According to lore, 'Hedemoraäpple' is said to originate from a seedling, and thus it would be interesting to investigate whether it is a mix-up or a stem morphology sport, or whether the 'Hedemora' tree investigated previously was not old enough to express the twisted stem trait clearly. Another example is 'Björnegårdsäpple', which is described as "particularly valuable", with a taste and aroma reminiscent of both pineapple and wild strawberries (Nilsson, 1987). In both the Swedish Central Collection and the local clonal archive analyzed, it seems to have been lost and overtaken by shoots from the rootstock. Thus, a new source has to be found for this cultivar through inventory of sources closer to the supposed origin, preferably the mother tree.

Parent-offspring test
Using a parent-offspring test for Mendelian inconsistent errors, we were able to reject some stipulated relationships. For example, 'Kaniker' was speculated by the early 20th-century pomologist Carl Dahl to be the parent of 'Oranie' (Dahl, 1943), but according to our current SNP marker analysis these cultivars clearly did not share a parent-offspring relationship. On the other hand, the data suggest that they might both be the offspring of 'Gimmersta', and their pairwise IBD estimates indicate either a half-sibling or a grandparent-offspring relationship (0.28). According to tradition, 'Flädie' is supposed to originate from a seed of a 'Gravensteiner' apple (Dahl, 1943), which is not supported by our data. However, these cultivars appear to share a cryptic relationship (IBD = 0.37). Interestingly, Larsen et al. (2017) also reported a complete lack of parent-offspring relationships involving 'Gravensteiner', despite some being suggested in pomologies. Yet another example that has been the cause of speculation is 'Sparreholm'. It was discovered as a pippin by the gardener L. G. Hedlund, who believed it to be the offspring of 'Rosenhäger' and an astrakhan-type cultivar (Dahl, 1943). Our results are consistent with this speculation, as 'Sparreholm' was identified as the possible offspring of 'Rosenhäger' and 'Gimmersta', the latter being of astrakhan type. More recently, Garkava-Gustavsson et al. (2008) speculated that 'Oranie' could be one of the parents of 'Hanaskogsäpple', based on SSR profiles, which our data seem to support. Similarly, 'Kabbarp' and 'Vittsjöäpple' were found to be genetically similar according to our SSR analysis. Accordingly, our SNP data suggest that they are likely full siblings (IBD = 0.6), with 'Gimmersta' as one of the parents, as supported by the Mendelian inconsistent errors test.
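The logic behind a pairwise test of this kind can be illustrated with a minimal sketch. It assumes diploid genotypes coded as the number of copies of one allele (0, 1 or 2) and a hypothetical missing-data code of -1; the function name and the toy genotypes are illustrative and are not taken from the pipeline used in this study. A true parent and offspring must share at least one allele at every locus, so only opposite homozygotes (0 versus 2) count as Mendelian inconsistent errors.

```python
def po_mendelian_errors(g1, g2, missing=-1):
    """Count opposite-homozygote loci between two diploid genotype vectors.

    Genotypes are coded 0/1/2 (copies of one allele); `missing` marks
    uncalled SNPs, which are skipped. Returns (errors, loci_compared).
    """
    errors = 0
    compared = 0
    for a, b in zip(g1, g2):
        if a == missing or b == missing:
            continue  # ignore loci not called in both samples
        compared += 1
        if abs(a - b) == 2:  # 0 vs 2: no allele can have been transmitted
            errors += 1
    return errors, compared


# Toy genotypes (hypothetical, for illustration only)
parent = [0, 1, 2, 2, 0, 1, -1]
child = [1, 1, 1, 2, 0, 2, 2]
unrelated = [2, 1, 0, 2, 2, 0, 1]

print(po_mendelian_errors(parent, child))      # (0, 6)
print(po_mendelian_errors(parent, unrelated))  # (3, 6)
```

In practice a small error rate is tolerated to allow for genotyping errors, so a pair is accepted when the proportion of inconsistent loci falls below a chosen threshold rather than being exactly zero.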

Possible parent-parent-offspring combinations
From the analysis of the Swedish Central Collection we identified several possible parent-parent-offspring combinations, all of which seem plausible based on historical records of the cultivar ages. All accessions involved were diploid, except for the triploid 'Ullströmsäpple', which is a possible parent of 'Fredriksdalsäpple' (Table 4, Supplementary File 1). Triploid apples perform much better as pollen donors than as mothers (Sato et al., 2007), so it is reasonable to assume that 'Ullströmsäpple' is the father. However, the procedure used here calls heterozygous SNPs in triploids simply as heterozygous, regardless of which heterozygous state they are in (e.g., AAB or ABB), and some proportion of true heterozygous SNPs can be expected not to be called, as they will have signals outside the heterozygous clusters. Thus, further clarification of putative parent-offspring relationships would clearly benefit from automated SNP calling of triploid samples, allowing more robust integration of triploids with diploid datasets.
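For a parent-parent-offspring combination, the consistency check extends naturally from pairs to trios. The sketch below assumes three diploid samples with genotypes coded 0/1/2 and a hypothetical missing-data code of -1; it does not handle triploids such as 'Ullströmsäpple', and the names used are illustrative rather than those of the software used in this study. At each locus, the offspring genotype must be obtainable by combining one gamete from each candidate parent.

```python
# Alleles (0 or 1 copies) that a diploid parent of each genotype can transmit
GAMETES = {0: (0,), 1: (0, 1), 2: (1,)}


def trio_mendelian_errors(p1, p2, child, missing=-1):
    """Count loci where `child` cannot be derived from parents `p1` and `p2`.

    All genotypes are diploid, coded 0/1/2. Loci with a missing call in
    any of the three samples are skipped. Returns (errors, loci_tested).
    """
    errors = 0
    tested = 0
    for a, b, c in zip(p1, p2, child):
        if missing in (a, b, c):
            continue
        tested += 1
        # Every genotype obtainable from one gamete of each parent
        possible = {x + y for x in GAMETES[a] for y in GAMETES[b]}
        if c not in possible:
            errors += 1
    return errors, tested


# Toy genotypes (hypothetical, for illustration only)
p1 = [0, 2, 1, 1, 2, 0]
p2 = [0, 0, 1, 2, 2, -1]
child_ok = [0, 1, 2, 1, 2, 1]
child_bad = [1, 1, 0, 0, 2, 0]

print(trio_mendelian_errors(p1, p2, child_ok))   # (0, 5)
print(trio_mendelian_errors(p1, p2, child_bad))  # (2, 5)
```

The trio test is stricter than two separate pairwise tests: a heterozygous offspring of two parents that are both homozygous for the same allele passes each pairwise comparison but fails as a trio.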

Possible parent-offspring combinations
The most common possible parents were all old cultivars of unknown origin, e.g., 'Gimmersta' (29 P-Os), 'Grågylling' (19 P-Os), 'Vitgylling' (16 P-Os), 'Rosenhäger' (7 P-Os) and 'Klockhammarsäpple' (6 P-Os). Thus 'Gimmersta', a sparsely spread local cultivar considered to be related to an astrakhan-type cultivar (Nilsson, 1987), surprisingly emerged as having the highest number of possible parent-offspring relationships in the collection. At the same time, a very old cultivar which was previously among the most commonly grown in central Sweden, 'White Astrachan' (Nilsson, 1987), is currently missing from the Central Collection. In our SSR analysis of local archives, 'Gimmersta' was found to be identical to samples of 'White Astrachan', as well as to the not true-to-type accession of 'Arvidsäpple' preserved in the breeding collection in Balsgård. The accession of 'Arvidsäpple' preserved at Balsgård has previously been found to be synonymous with 'Astrakan Bilyi' (Urrestarazu et al., 2016). In our analysis of SNP data from the Swedish Central Collection, 'Stäringe Karin' was found to be the offspring of 'Gimmersta' × 'Grågylling', whereas Muranty et al. (2020) found the same cultivar to be the offspring of 'White Astrachan' × 'Grågylling'. Although 'Gimmersta' is described as having red fruit over color (Nilsson, 1987), a plausible explanation is that 'Gimmersta' is in fact a sport of 'White Astrachan', which would then explain the rather large number of possible parent-offspring relationships.

Residual putative 1st degree relationships
Two cultivars, 'Sköldinge' and 'Alnarps Favorit', had far more putative 1st degree relationships than possible parent-offspring relationships. Many of the 1st degree relationships suggested by IBD values, but not passing the test threshold for Mendelian inconsistent errors, involved triploid cultivars or occurred within the network of 1st degree relationships. Due to their higher heterozygosities, triploids are expected to have higher IBD using the method-of-moments technique. Despite this, a few recorded heteroploid relationships were confirmed by the test for Mendelian inconsistent errors ('Alfa 68'-'Filippa' and 'Alnarps Favorit'-'Alfa 68').
The remaining pairs (those not involving 'Sköldinge' and 'Alnarps Favorit') with putative 1st degree relationships, but failing the test threshold for Mendelian inconsistent errors, generally had quite high IBD values (up to 0.6). However, they also had Z0 parameters above 0.1 (Supplementary File 1), illustrating that the method-of-moments technique implemented in PLINK generally inflates IBD values (Morrison, 2013). The structure of the Swedish Central Collection is similar to that of the Danish collection, in that an equal proportion of the collection belongs to a single network of parent-offspring relationships and has a similar distribution of 1st degree relationships (Larsen et al., 2017).
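The connection between the IBD values and the Z0 parameter can be made explicit with a short sketch. It assumes that the reported IBD value corresponds to PLINK's PI_HAT, which is defined from the estimated probabilities of sharing zero, one or two alleles identical by descent (Z0, Z1, Z2) as PI_HAT = Z2 + 0.5 × Z1. The expected values listed are the standard theoretical ones for outbred relationships and are not estimates from this dataset.

```python
def pi_hat(z0, z1, z2):
    """Proportion of the genome shared IBD, given sharing probabilities.

    z0, z1, z2 are the probabilities of sharing 0, 1 or 2 alleles IBD
    and must sum to 1.
    """
    if abs(z0 + z1 + z2 - 1.0) > 1e-6:
        raise ValueError("Z0, Z1 and Z2 must sum to 1")
    return 0.5 * z1 + z2


# Theoretical (Z0, Z1, Z2) for common relationships, assuming no inbreeding
EXPECTED = {
    "parent-offspring": (0.0, 1.0, 0.0),
    "full siblings": (0.25, 0.5, 0.25),
    "half siblings / grandparent": (0.5, 0.5, 0.0),
    "unrelated": (1.0, 0.0, 0.0),
}

for relationship, z in EXPECTED.items():
    print(f"{relationship}: PI_HAT = {pi_hat(*z)}")
```

Parent-offspring pairs and full siblings both have an expected PI_HAT of 0.5, but only a parent-offspring pair has an expected Z0 of zero. A pair with a high IBD value and a Z0 clearly above zero is therefore not expected to be parent and offspring, which matches its failure in the test for Mendelian inconsistent errors.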

Genetic structure
The first three principal components explained a greater proportion of the genetic variation (9.0 and 6.15 % for the first two dimensions) than in two recent studies with similar numbers of cultivars that used GBS (2.87 and 2.47 %) or the 8 K apple SNP array (5.9 and 4.5 %) for genotyping (Larsen et al., 2018; Vanderzande et al., 2017). No population structure was evident from the PCA, even though the dataset was not pruned for markers in LD. As reviewed by Miller and Gross (2011), perennial fruit crops generally exhibit limited population structure due to their life history traits such as clonal propagation, outcrossing, and long juvenile periods. Thus, as the Swedish Central Collection was established with the aim of preserving mainly local cultivars, a lack of population structure is to be expected. Comparable studies have found other European collections to consist of different subpopulations. However, they also contain modern cultivars (Lassois et al., 2016; Marconi et al., 2018; Pereira-Lorenzo et al., 2017; Vanderzande et al., 2017) and wild material (Larsen et al., 2018), in addition to local cultivars. An investigation of Norwegian ex situ collections found separate clustering between cultivars from southern countries and traditional Scandinavian or very winter-hardy cultivars (Gasi et al., 2016). Similar clustering patterns have been proposed elsewhere (Garkava-Gustavsson et al., 2013; Urrestarazu et al., 2016), but as the Swedish Central Collection is well curated, containing only mandate cultivars, little clustering is expected. From a genetic resource point of view, a lack of population structure indicates that allelic variants underlying, e.g., adaptation to a northern climate are present in an otherwise diverse genetic background, making the collection promising germplasm in which to identify the underlying genomic regions.
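The proportion of variation explained by each principal component can be illustrated with a minimal sketch. It assumes a samples × SNPs matrix of genotypes coded 0/1/2 with no missing values, uses NumPy, and runs on randomly generated toy data; it is not the software or the dataset used in this study, and the function name is illustrative.

```python
import numpy as np


def explained_variance_ratio(genotypes):
    """Proportion of total variance captured by each principal component.

    `genotypes` is a samples x SNPs array (coded 0/1/2, no missing data).
    Each SNP is mean-centered; the squared singular values of the centered
    matrix are proportional to the variance along each component.
    """
    x = np.asarray(genotypes, dtype=float)
    x = x - x.mean(axis=0)  # center every SNP column
    singular_values = np.linalg.svd(x, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()


# Toy data: 20 samples x 50 SNPs of random, unstructured genotypes
rng = np.random.default_rng(0)
toy = rng.integers(0, 3, size=(20, 50))
ratio = explained_variance_ratio(toy)

print("PC1 and PC2 explain:", ratio[:2])
```

Because the percentages depend on the number and type of markers, on whether markers are pruned for LD, and on the composition of the sample set, comparisons of explained variance between studies using different genotyping platforms are indicative only.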
Based on our PCA, the cultivars released from the Swedish national breeding program at Balsgård seem to encompass only a small fraction of the genetic variation available in the mandate cultivar collection along the first two principal components. Furthermore, a majority of the parents of those cultivars are not represented in the collection (Fig. 2). The Swedish Central Collection can therefore be expected to contain considerable variation that can invigorate future breeding efforts. Based on the clustering of cultivars released from the Balsgård breeding program, diploid cultivars with coordinates below zero along principal component one and above zero along principal component two might serve to broaden the gene pool (upper-left part of Fig. 2). Considering cultivation in northern Sweden as a potential breeding goal for the Swedish breeding program, the cultivars 'Gubbäpple', 'Risäter' and 'Suislepp' as well as 'Förlovningsäpple' and 'Sundsäpple' might be interesting for crosses. 'Gubbäpple' is described as having big fruits similar to 'Gravensteiner', with a balanced sweetness and acidity as well as crisp flesh and an aroma reminiscent of almonds. 'Risäter' is described as primarily suited as a cooking apple, but it is hardy, resistant to diseases and has broad branch angles. 'Suislepp' is a cultivar of Estonian origin, which is considered to combine excellent hardiness and a good, aromatic taste. 'Förlovningsäpple' and 'Sundsäpple' are only briefly described in the literature, but originate from the northern part of Sweden and can thus be expected to be adapted to the climate (Hjalmarsson, 2019; Nilsson, 1987; Svensson, 2005).
To our knowledge, these are not the recorded parents of any of the cultivars released from the Swedish breeding program in Balsgård, which has mainly been directed at commercial apple production in Sweden's southernmost province, Scania, but this does not exclude the possibility that these cultivars have been used in unsuccessful crosses.

Conclusions
SSR markers were useful for a first screening of the material preserved in local clonal archives and the collections at SLU-Balsgård and the Swedish Elite Plant Station, both to pinpoint putative identification mistakes and to provide guidance in choosing the best possible source of graft wood for the Swedish Central Collection. The latter has subsequently been analyzed using a subset of 10,368 SNPs from the 20 K SNP array. Thus, we obtained robust information from characterization of the Swedish Central Collection of mandate cultivars. A few previously unknown clonal relationships were identified, and these might merit revision of cultivar names. It was also possible to identify putative ploidy levels and possible parent-offspring relationships.
Notably, a few cases were identified where the SNP data were in conflict with data obtained from the presumed graft-wood source using SSR markers. In order to guarantee the true-to-typeness and genetic integrity of the Swedish Central Collection and its congruence with local clonal archives, these conflicting cases should be investigated further to identify the sources of the errors. Furthermore, the collection has continued to grow since the onset of this study, and thus molecular-based efforts should continue. Despite considerable phenotypic variation in traits related to hardiness and diverse geographic origins, the collection exhibits limited genetic structure. This information is expected to help improve the future development of the Swedish Central Collection and to facilitate use of the material in breeding programs.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.