Whole Genome Comparison Reveals High Levels of Inbreeding and Strain Redundancy Across the Spectrum of Commercial Wine Strains of Saccharomyces cerevisiae

Humans have been consuming wines for more than 7000 yr. For most of this time, fermentations were presumably performed by strains of Saccharomyces cerevisiae that naturally found their way into the fermenting must. In contrast, most commercial wines are now produced by inoculation with pure yeast monocultures, ensuring consistent, reliable and reproducible fermentations, and there are now hundreds of these yeast starter cultures commercially available. In order to thoroughly investigate the genetic diversity that has been captured by over 50 yr of commercial wine yeast development and domestication, whole genome sequencing has been performed on 212 strains of S. cerevisiae, including 119 commercial wine and brewing starter strains, and wine isolates from across seven decades. Comparative genomic analysis indicates that, despite their large numbers, commercial strains, and wine strains in general, are extremely similar genetically, possessing all of the hallmarks of a population bottle-neck, and high levels of inbreeding. In addition, many commercial strains from multiple suppliers are nearly genetically identical, suggesting that the limits of effective genetic variation within this genetically narrow group may be approaching saturation.

Keywords: genome sequencing; industrial yeast; comparative genomics; fermentation

Humans have been producing and consuming wines for more than 7000 yr, making wine one of the first processed agricultural products (Sicard and Legras 2011). Until the middle of the 20th century, winemaking relied on naturally occurring yeasts to complete the fermentation process. However, spontaneous fermentations such as these produced inconsistent results from vintage to vintage and, due to their protracted fermentation times, were often vulnerable to spoilage by undesirable yeast and/or bacteria.
One of the most significant technological advances in winemaking was the introduction of pure starter strains of the major wine yeast, Saccharomyces cerevisiae, in the 1950s, with the first commercial active dried starters being released in 1965. Most commercial wine fermentations are now inoculated with high numbers of these selected strains directly after the grapes are crushed, to ensure consistent, reliable and reproducible fermentations (Heard and Fleet 1985; Henick-Kling et al. 1998). Since their introduction, hundreds of strains of S. cerevisiae have been developed into a wide variety of commercial starter cultures.
Genome sequencing has shown that, in general, vineyard and wine strains form a phylogenetically related group (Fay and Benavides 2005; Liti et al. 2009; Borneman et al. 2011). Recently, this group has also been shown to contain strains from Mediterranean oaks, which may be the historical progenitor of "domesticated" wine yeasts (Almeida et al. of Europe are unrelated to "indigenous" S. cerevisiae strains, except in cases of close proximity to winemaking environs (Hyma and Fay 2013). This suggests that European wine strains have accompanied the migration of winemaking around the globe, and are maintained as distinct populations through phenotypic selection (Fay et al. 2004; Warringer et al. 2011; Clowers et al. 2015a). Interestingly, despite their common geographic origins, and roles in the production of alcoholic beverages, wine strains are also genetically distinct from S. cerevisiae strains used for brewing (Borneman et al. 2011; Dunn et al. 2012).
In order to investigate the genetic diversity that has been captured by over 50 yr of commercial wine yeast development, whole genome sequencing was performed on 212 strains of S. cerevisiae, including 106 commercial wine starter strains from nine different commercial yeast suppliers. In addition to the wine yeast strains, 13 commercially available brewing strains were also sequenced to compare general features of the two industrial groups. Comparative genomic analysis shows that, despite their large numbers, commercial strains, and wine strains in general, remain genetically similar, with a population bottle-neck and/or high levels of inbreeding apparent between isolates. In addition, commercial strains from multiple suppliers are often genetically identical, suggesting that the limits of effective natural variation within this group may have been reached.

MATERIALS AND METHODS
DNA isolation, extraction, and sequencing
Noncommercial wine yeast isolates were obtained from The Australian Wine Research Institute (AWRI) Microorganisms Culture Collection. Commercial strains were obtained as purified strains from the manufacturer, or were sourced as active-dried yeast preparations.
For strains from the AWRI culture collection, samples were grown overnight at 28° in YPD. For the commercial strains obtained from freeze-dried packets, 5 g of active dry yeast was rehydrated in 50 ml of water (38°) for 20-40 min to obtain a homogeneous suspension of rehydrated yeast; 25 ml of rehydrated yeast was then inoculated into 5 ml of YPD and grown overnight at 28°. For both culture types, 1.5 ml of the overnight culture was used for DNA extraction (Gentra Puregene Yeast/Bact kit, Qiagen). For strains from the White Labs (http://www.whitelabs.com) and WYeast (https://www.wyeastlab.com) collections, strains were grown overnight in YPD at 30° with shaking. Washed cell pellets were frozen and lyophilized, and DNA was extracted via the CTAB extraction method as described (Pitkin et al. 1996).
Genomic libraries for AWRI strains were prepared using the Nextera XT platform (Illumina), and sequenced using Illumina MiSeq, paired-end 300 bp chemistry (Ramaciotti Centre for Functional Genomics, University of New South Wales, Australia). White Labs and WYeast strains were sequenced using paired-end 100 bp chemistry (BGI).

Sequence processing and reference-based alignment
An extended Saccharomyces sensu stricto reference sequence was assembled from existing genomic sequences for S. cerevisiae (Goffeau et al. 1996), S. paradoxus, S. mikatae, S. kudriavzevii, S. uvarum (Scannell et al. 2011), and S. arboricolus (Liti et al. 2013). As a de novo assembled genome was not available for S. eubayanus, the non-S. cerevisiae contribution of the S. pastorianus genome (S. cerevisiae-S. eubayanus hybrid) was used as a proxy (Nakao et al. 2009). In addition to these reference genomes, 26 pan-genomic segments from S. cerevisiae were included in order to track the presence of these elements (Supplemental Material, File S1), which included key industry-associated elements from wine, brewing, biofuel, and sake yeasts (Ness and Aigle 1995; p. 6 in Hall and Dietrich 2007; Novo et al. 2009; Argueso et al. 2009; Borneman et al. 2011; Akao et al. 2011).

Genome analysis
Copy number analysis was performed on the per-base coverage information included in the output of samtools mpileup (v1.2; Li et al. 2009), with a custom Python script used to apply smoothing via a 10-kb sliding window with a 5-kb step. Results were presented relative to the mean coverage of all windows containing at least 10 reads.
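The custom script itself is not reproduced here. As a minimal sketch of the smoothing step, assuming that per-base depth has already been parsed into a Python list for a single sequence, the calculation might look like the following. The function name, and the reading of "at least 10 reads" as a mean window depth of 10 or more, are assumptions made for illustration only.

```python
def windowed_relative_coverage(depth, window=10_000, step=5_000, min_depth=10):
    """Smooth per-base depth with a sliding window and rescale it.

    depth     -- list of per-base read depths for one sequence
    min_depth -- ASSUMPTION: windows whose mean depth is below this value
                 are excluded from the normalising mean
    Returns a list of (window_start, relative_coverage) tuples.
    """
    if not depth:
        return []
    # Window start positions; a sequence shorter than one window gives one window.
    starts = list(range(0, max(len(depth) - window, 0) + 1, step))
    means = []
    for s in starts:
        chunk = depth[s:s + window]
        means.append(sum(chunk) / len(chunk))
    # Normalise against the mean of the windows that pass the depth filter.
    usable = [m for m in means if m >= min_depth]
    if not usable:
        raise ValueError("no window reaches the minimum depth")
    baseline = sum(usable) / len(usable)
    return [(s, m / baseline) for s, m in zip(starts, means)]
```

A value near 1 then indicates the typical copy number for the strain, while values near 0.5 or 1.5 would be the kind of signal expected from the loss or gain of one copy in a diploid. For a genome-wide baseline, windows from all chromosomes would need to be pooled before normalising, which this single-sequence sketch does not do.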
Heterozygosity levels were calculated by recording the total number of heterozygous and homozygous single nucleotide polymorphisms (SNPs) called for each strain relative to the reference using VarScan (Koboldt et al. 2012). Results were smoothed using a 10-kb sliding window with a 5-kb step, via custom Python scripts.
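The windowed summary of SNP calls can be illustrated with a short sketch. It assumes that the variant calls for one chromosome have already been reduced to (position, is_heterozygous) pairs; that input format and the function name are hypothetical and are not taken from the original scripts.

```python
from bisect import bisect_left


def windowed_heterozygosity(snps, chrom_len, window=10_000, step=5_000):
    """Count heterozygous and homozygous SNPs in sliding windows.

    snps      -- iterable of (position, is_het) pairs, 0-based positions
                 (hypothetical input format)
    chrom_len -- chromosome length in bp
    Returns a list of (window_start, n_het, n_hom, het_fraction) tuples;
    het_fraction is None for windows that contain no SNPs.
    """
    snps = sorted(snps)
    positions = [pos for pos, _ in snps]
    out = []
    for start in range(0, max(chrom_len - window, 0) + 1, step):
        # Indices of the SNPs falling in the half-open interval [start, start + window).
        lo = bisect_left(positions, start)
        hi = bisect_left(positions, start + window)
        n_het = sum(1 for _, is_het in snps[lo:hi] if is_het)
        n_hom = (hi - lo) - n_het
        frac = n_het / (hi - lo) if hi > lo else None
        out.append((start, n_het, n_hom, frac))
    return out
```

Returning None for empty windows keeps regions without any called SNPs distinct from regions in which every called SNP is homozygous.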

Data availability
All mapped sequence data have been deposited in the NCBI short read archive under the accession number SRP066835 (BioProject accession PRJNA303109). All AWRI-designated strains sequenced in this study are available from the AWRI Wine Microorganism Culture Collection, while the WL- and WY-designated strains are available from Dr. James A. Fraser.

RESULTS AND DISCUSSION
Whole-genome sequence data were generated for 212 S. cerevisiae strains, of which 106 are commercially available strains from nine different yeast supply companies. In addition to these 212 strains, another 24, from a variety of sources, and for which existing whole-genome sequence was available, were used for comparison purposes, resulting in a total of 236 strains for which analysis was performed (Table 1).
A whole-genome maximum-likelihood phylogeny was constructed based upon 1,455,253 bp of genome sequence, which exceeded coverage thresholds for SNP calling in all 236 strains (Figure 1). The resulting phylogeny displayed very clear stratification, with all but four of the commercial wine strains, and nine of the strains from the AWRI culture collection, clustering within a large, and highly related, clade containing other strains of either wine, or European origin, which is consistent with previous studies (Fay and Benavides 2005; Liti et al. 2009; Borneman et al. 2011; Dunn et al. 2012). This wine clade was characterized by little overall genetic variation, and the presence of a prominent subclade comprising a third of all of the wine strains (Figure 1). This subclade could be further divided into two distinct lineages. The first, comprising the majority, was dominated by strains associated with the Prise de Mousse (PdM) collection of champagne yeasts, such as PDM, EC1118, and N96. The second lineage was dominated by other fructophilic strains, such as Fermichamp, and the Saccharomyces interspecific hybrids S6U and VIN7.
Given that the phylogeny may be confounded by attempts at breeding between strains (either natural or as part of a strain development program), identity-by-state (IBS) analysis was also employed in order to ascertain the level of relatedness for all 27,730 pairwise combinations of strains (Figure S1). These data reinforced the highly related nature of the PdM clade, with the majority of the group displaying almost identical genotypes, such that these strains could be considered to have arisen from a single progenitor strain, or a highly interrelated progenitor population (Figure 1B). The exceptions to this are AWRI1501, NT116, NT202, and NT112, which all display a higher than average amount of IBS2|1 events, normally indicative of parent-progeny relationships. AWRI1501 is a 1n:1n S. cerevisiae × S. paradoxus interspecific hybrid, with the S. cerevisiae parent being a spore of AWRI838 (Bellon et al. 2015). The NT series of wine yeast are also the result of structured breeding, sharing a common PdM-series parent (N96). In addition to NT series strains that fell within the PdM clade, there was a higher degree of relatedness than expected from the structure of the phylogeny between the PdM clade and the strains AWRI1537 (Vin13) and AWRI2897 (ES 181). These two strains, which are adjacent on the phylogeny, display a pattern that, like the NT series, is consistent with these strains being hybrid progeny of a cross between a PdM-clade parent and a second wine yeast strain.
As with the wine strains, the nine ale strains among the 13 commercially available "brewing" strains that were sequenced formed a clade that included other known ale isolates (Figure 1). Interestingly, three of the remaining strains (WLP705, sake; WLP099, high-gravity ale; and WLP775, cider) were distributed throughout the wine yeast clade (Figure S1). These out-of-industry positions in the phylogeny were supported by the fact that both WLP705 and WLP099 contain wine-specific marker loci, while lacking the pan-genomic hallmarks of true sake yeasts from the Far East and ale-specific marker loci, respectively (Akao et al. 2011; Borneman et al. 2013) (Figure S2), suggesting that phenotypic spill-over can occur between industries for some strains.
While the proportions of non-S. cerevisiae sequences were highly variable, but generally very low, 17 strains were found to contain greater than 10% of at least one chromosome of at least one other Saccharomyces sensu stricto species, with contributions from S. kudriavzevii (n = 10) being observed most frequently (Figure 2A). Of these 17, 12 appear to contain an almost complete, second non-S. cerevisiae genome (Figure 3). At least five of these (AWRI1501, AWRI1502, AWRI1503, AWRI1504, and AWRI1505) were laboratory generated via rare mating events between a wine strain of S. cerevisiae and other Saccharomyces sensu stricto members for the production of new commercial strains (Bellon et al. 2011, 2013, 2015). Of interest, despite being used as an ale-brewing yeast, the genome of WLP862 clearly classifies it as lager yeast (S. pastorianus) (Figure 3). The remaining strains all displayed highly variable levels of aneuploidy, with strain NT50, for example, predicted to comprise a tetraploid S. cerevisiae genome with only a single copy of S. kudriavzevii chromosome XIII. Furthermore, both S6U and WLP862 were shown to contain significant genomic contributions from three different species; however, the third species was shown to contribute a minor (10%) portion of its genome (Figure 3).
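The criterion used above to flag non-S. cerevisiae content (more than 10% of at least one chromosome from another species) can be sketched as follows. How the covered fraction of each chromosome was derived is not stated in the text, so the depth cut-off in covered_fraction, the nested-dictionary input, and both function names are assumptions made for illustration.

```python
def covered_fraction(depth, min_depth=5):
    """Fraction of positions with depth >= min_depth.

    min_depth is a hypothetical cut-off; the text does not specify one.
    """
    if not depth:
        return 0.0
    return sum(1 for d in depth if d >= min_depth) / len(depth)


def foreign_contributions(coverage, threshold=0.10, host="S. cerevisiae"):
    """List non-host chromosomes whose covered fraction exceeds threshold.

    coverage -- {species: {chromosome: covered_fraction}} for one strain
    Returns {species: [chromosome, ...]} for species with at least one hit.
    """
    hits = {}
    for species, chroms in coverage.items():
        if species == host:
            continue
        above = sorted(c for c, frac in chroms.items() if frac > threshold)
        if above:
            hits[species] = above
    return hits
```

Under this sketch, a strain such as the NT50 example would be reported with a single S. kudriavzevii chromosome, whereas a full interspecific hybrid would list most or all chromosomes of the second species.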

Pan-genomic analysis
In addition to the strain-specific genomic contributions from Saccharomyces species other than S. cerevisiae, there were also significant intraspecific differences in several loci that have been shown to comprise the accessory, or noncore, elements of the pan-genome of S. cerevisiae (Figure 2B). Of these loci, two in particular, the "wine-circle" and the "RTM1-cluster," broadly define strains used for winemaking and brewing, respectively (Borneman et al. 2011, 2013). Of the 124 strains found to carry the wine-circle, 111 (90%) are found within the generic wine clade; 35 strains were shown to carry the RTM1-cluster, of which 33 (94%) were from outside the wine clade. All 11 of the brewing strains carried the RTM1-cluster, but lacked the wine-circle. Interestingly, of the 13 strains from outside of the wine clade that proved positive for the wine-circle, 11 (85%) also contain the RTM1-cluster. At least four of these are commercial wine yeast strains, and the vast majority were highly heterozygous (see below). This suggests that they represent interclade hybrids between wine and nonwine parental strains that contain marker genes for both clades, and mosaic hybrids such as those commonly observed in natural populations (Liti et al. 2009; Hyma and Fay 2013; Clowers et al. 2015b).
Like the wine-circle, the yeast stress response gene MPR1 (Shichiri et al. 2001) was also shown to be primarily associated with the wine clade. Of the 197 strains that were shown to possess MPR1, 182 were within the wine clade (92% of the wine clade strains), with the remaining 15 in the nonwine group (38%). As for the wine-circle, all of the strains from within the ale subclade were shown to lack MPR1.
Finally, there were several loci, including the biotin-prototrophy genes BIO1 and BIO6 (Hall and Dietrich 2007), that were found almost exclusively in strains of Far-Eastern origin, such as those used for the production of sake. These genetic loci do not appear to have been introgressed into either brewing or winemaking strains, and may provide a source of useful unharnessed genetic variation for future wine yeast strain development.

Figure 2
Genomic content differs across strains. (A) Sequence coverage was used to determine the genomic contribution of sequences from the Saccharomyces sensu stricto group in each strain. Each tile represents one of the 16 chromosomes of each species, except for the S. cerevisiae sequence set, which also contains 26 strain-variable accessory (pan-genome) loci. Strains are ordered according to the genome-wide SNP phylogeny, and colored as in Figure 1A. (B) A detailed display of the S. cerevisiae accessory elements of the pan-genome. Sequences of each locus can be found in File S1, and a high-resolution figure containing strain names is presented in Figure S2.

Enoferm M2 and AWRI796 appear to be genomically equivalent (see below), Lalvin ICV D254 and Uvaferm HPS (which are also equivalent to each other) are distinct from this pair (>2% IBS0 events), and display no evidence for recent common parentage. This suggests either that multiple independent horizontal transfer events occurred with this rare accessory locus, or that the strains shared a past common ancestor, but any clear IBS signal for this event has been lost.

Heterozygosity and genome renewal
As S. cerevisiae strains are generally able to undergo sporulation and mating type switching, the formation of homozygous diploids has been postulated to occur frequently in nature, leading to "genome renewal" (Mortimer et al. 1994; Bradbury et al. 2006). In order to determine the level of heterozygosity present in each strain, SNP calls made against S288c were classed as either homozygous or heterozygous based on the frequency of multiple alleles at each nucleotide position in the genome (a minimum frequency of 30% was required to call a heterozygous SNP). Data were then collected for 10 kb genomic windows (5 kb step), in which the proportions of heterozygous and homozygous SNPs were calculated (Figure 4). Levels of heterozygosity ranged considerably across the strains, but also displayed significant variation within the genome of individual strains, with evidence for large blocks of homozygosity present within otherwise heterozygous genomes (Figure S3).
Using a genome-wide 0.75 quantile cut-off of zero heterozygous SNPs to class strains as homozygous (to account for false negative variant calls), 55 strains were considered to be homozygous, with 15 of these being the result of single-spore isolation prior to sequencing (Liti et al. 2009). However, even when considering only commercial wine strains, 15% (16 of 106) are predicted to be homozygous. These homozygous strains are likely to be the products of sporulation and selfing (Mortimer et al. 1994), although there may be cases in which specific phenotypes have been introduced via backcrossing. This appears to be the case for AWRI2914 (Maurivin UOA Maxithiol), a strain that appears genetically similar, albeit homozygous, to heterozygous diploid commercial strains such as AWRI1487 (Lalvin Rhone L2056) and AWRI2928 (Enoferm RP15), but which contains the Irc7 paralog from S. paradoxus (Roncoroni et al. 2011).
Of those strains displaying the highest levels of heterozygosity, the ale yeasts and baking strains figured prominently, which may be due to the common occurrence of polyploidy in these strains. Of the commercial wine yeasts, AWRI2912 (Maurivin Primeur), AWRI2865 (Collection Cepage Pinot), AWRI1493 (71B), and AWRI2906 (Top Floral) were shown to have the highest levels of heterozygosity, and were all located outside of the European wine yeast clade. As mentioned previously, it is likely that some of these strains have undergone recent interclade hybridization events, as all contain both the wine-circle and the ale-yeast RTM1 cluster of marker genes (Figure 2C). Interestingly, there is some evidence that the trade-off for the increased genetic diversity afforded by this interclade hybridization is fermentation robustness, with 71B being considered less robust than most wine strains in some environments (Aranda et al. 2006; Schmidt et al. 2011).
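The two decision rules used in this section (the 30% allele-frequency requirement for a heterozygous call, and the upper-quartile rule for classing a whole strain as homozygous) can be illustrated with a brief sketch. It assumes that "minimum frequency of 30%" means that at least two alleles each account for 30% or more of the reads at a site, and that the 0.75 quantile is taken over per-window heterozygous SNP counts with linear interpolation. Both readings, and all function names, are assumptions rather than details confirmed by the text.

```python
def is_heterozygous_site(allele_counts, min_freq=0.30):
    """True when at least two alleles each reach min_freq of the reads.

    allele_counts -- {allele: read count} at one position (hypothetical format)
    """
    total = sum(allele_counts.values())
    if total == 0:
        return False
    supported = sum(1 for n in allele_counts.values() if n / total >= min_freq)
    return supported >= 2


def upper_quartile(values):
    """0.75 quantile using linear interpolation between order statistics."""
    if not values:
        raise ValueError("no windows supplied")
    ordered = sorted(values)
    pos = 0.75 * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def is_homozygous_strain(het_counts_per_window):
    """Class a strain as homozygous when the 0.75 quantile of its
    per-window heterozygous SNP counts is zero."""
    return upper_quartile(het_counts_per_window) == 0
```

Using a quantile rather than a strict count of zero means that a strain with a small number of windows containing heterozygous calls is still classed as homozygous, so isolated miscalls do not change the classification.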

Figure 4 Heterozygosity is commonly observed in S. cerevisiae wine strains. Heterozygosity levels observed in 50-kb sliding windows (25-kb step) across the S. cerevisiae chromosomes (I-XVI) in each strain. Box and whisker plot summaries are shown. Median values are also listed for each strain. The plot for each strain is shaded according to the phylogenetic clades defined in Figure 1A.

Genomic equivalency and strain redundancy
From the combined genomic data available, it is apparent that, even if the large PdM clade is excluded, there are many yeast strains that appear genetically identical. For some of these, this was due to multiple, independent isolates of the same strain being sequenced (e.g., AWRI1083 and AWRI1910, both NCYC 738), or the direct derivative of another strain being sequenced (AWRI767, WE 14, and AWRI795, a spontaneous mutant of WE 14). Using these control comparisons as a baseline for false-positives in the SNP calling protocol, a threshold of 0.05% total IBS0 and 1% total IBS1 events between strains was chosen to reflect strains that show overall genetic equivalence (Figure S4). By applying these parameters, there were 69 strains that displayed genomic equivalence with at least one other strain (Figure 5). These could be further divided into 23 distinct equivalence groups, with the largest of these (two in total) being defined by six strains each, and with 13 groups containing at least two independent commercial isolates.
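The pairwise IBS comparison and the equivalence test described above can be sketched as follows. The sketch assumes biallelic sites with genotypes coded as the number of non-reference alleles (0, 1, or 2), treats the two thresholds as inclusive upper bounds, and forms groups as connected components of the "equivalent" relation. None of these details is specified in the text, so they should be read as illustrative assumptions, as should the function names.

```python
from itertools import combinations


def ibs_fractions(g1, g2):
    """Return [IBS0, IBS1, IBS2] fractions for two genotype vectors.

    g1, g2 -- equal-length lists of alt-allele counts (0, 1, 2); None = missing
    """
    counts = [0, 0, 0]
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue
        # Shared alleles at a biallelic site: 2 minus the dosage difference.
        counts[2 - abs(a - b)] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no shared genotyped sites")
    return [c / total for c in counts]


def equivalent(g1, g2, max_ibs0=0.0005, max_ibs1=0.01):
    """Apply the 0.05% IBS0 and 1% IBS1 thresholds (assumed inclusive)."""
    ibs0, ibs1, _ = ibs_fractions(g1, g2)
    return ibs0 <= max_ibs0 and ibs1 <= max_ibs1


def equivalence_groups(genotypes):
    """Group strains linked by at least one equivalent pair (union-find).

    genotypes -- {strain_name: genotype vector}
    Returns a sorted list of sorted groups with two or more members.
    """
    parent = {s: s for s in genotypes}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    for a, b in combinations(genotypes, 2):
        if equivalent(genotypes[a], genotypes[b]):
            parent[find(a)] = find(b)
    groups = {}
    for s in genotypes:
        groups.setdefault(find(s), []).append(s)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)
```

With 236 strains, the loop over combinations visits 236 × 235 / 2 = 27,730 pairs, matching the number of pairwise comparisons reported. Because grouping by connected components is transitive while the threshold test is not, two strains can share a group without passing the test directly. The continuum of values seen in a closely related set such as the PdM clade is where that distinction would matter most.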
However, despite being genomically redundant at the SNP level, there were seven equivalency groups where one strain displayed a different pattern of accessory loci to the other member(s), with this generally involving a single accessory locus (Figure 5B). For example, the high-throughput SNP pipeline showed that AWRI796, AWRI1494 (Enoferm M2), and AWRI1431 (Lalvin W15) differed only by up to 29 called heterozygous differences, yet Lalvin W15 lacks the entire 45-kb aryl-alcohol cluster (Figure 5). Likewise, strain AWRI1482, a member of the large equivalence group that includes the commercial strains AWRI1487 (Lalvin Rhone L2056) and AWRI2928 (Enoferm RP15), lacks the MPR1 locus that is present in all other strains of this clade (Figure 5).
These differences in accessory loci, and the potential for small numbers of SNPs between otherwise redundant strains, likely reflect the variation that can arise during the independent isolation (and the possibility of long-term passaging) of new strains from "identical" progenitor material, or, in limited cases in this dataset, from the isolation of mutant strains from parental populations. The concerted loss of large tracts of DNA, such as the 45-kb aryl-alcohol cluster of AWRI796, supports the view of subtelomeric genomic plasticity leading to high rates of concerted gene gain and loss in these regions (Argueso et al. 2009).

The PdM clade
Within the PdM clade, there were an additional 163 pairs of strains that passed the 0.05% total IBS0, and 1% total IBS1, test for genetic equivalency. Unlike the majority of the non-PdM strains, a continuum of values was observed (Figure S4), such that there were many more pairs that fell just outside of the threshold. This reinforces the fact that, while the PdM clade has a very recent common ancestry, the highly desirable winemaking traits of PdM yeasts have seen a wide variety of isolates and strain-development programs focusing on these strains. This has resulted in a large collection of similar, but not identical, strains, as highlighted by subtle heterozygosity and pan-genome differences (Figure 6). Several strains within the group have lost one or more accessory regions relative to the other members of the clade (Figure 6A), with 11 strains lacking the accessory locus that was first identified at the subtelomeric region of chromosome XV of EC1118 (Novo et al. 2009), and six strains lacking the MPR1 stress resistance gene (Shichiri et al. 2001).
When heterozygosity patterns are examined, there are numerous examples of localized loss-of-heterozygosity (LOH) in members of the PdM clade, with some regions conserved across multiple isolates (Figure 6B). There is a characteristic LOH event that encompasses most of chromosome IV in N96 (AWRI1575 and AWRI1697), as well as AWRI1775 and AWRI1762. A smaller LOH event in the same area is also found in AWRI838, AWRI2340, AWRI1638 (Platinum), and AWRI2776 (NT 116). Likewise, there is a conserved LOH on the right arm of chromosome II in at least nine isolates, and the left arm of chromosome VII in another ten strains.
These data indicate that LOH events, which result in the loss of SNPs and potentially also of heterozygous subtelomeric accessory genes, are a common occurrence across this large, conserved group of highly successful wine yeasts. The phenotypic consequences of these large structural changes are likely to drive differences in the commercial performance of these strains.

Conclusions
Despite sequencing a large number of wine strains of S. cerevisiae, including the majority of those that are commercially distributed, all appear to represent a highly inbred population that contains relatively little genetic variation compared to the global pool of S. cerevisiae diversity. Indeed, a large percentage of the strains analyzed in this study fall within one exceptionally related clade. Strain development efforts should therefore be focused on introgressing new variation, from outside of the wine yeast clade, into these economically important yeasts in order to increase the genetic, and therefore phenotypic, diversity that can be exploited in this industry.

ACKNOWLEDGMENTS
Thanks to Chris Curtin for assisting with the selection of strains for sequencing, and Paul Chambers for critical review of this manuscript. This work was supported by Australian grape growers and winemakers through their investment body, Wine Australia, with matching funds from the Australian Government. The Australian Wine Research Institute is a member of the Wine Innovation Cluster in Adelaide.