Metagenomics insights into food fermentations

Summary
This review describes recent advances in the study of food microbial ecology, with a focus on food fermentations. High‐throughput sequencing (HTS) technologies have been widely applied to the study of food microbial consortia, and their different applications have been exploited to monitor microbial dynamics in food fermentative processes. Phylobiomics was the most explored application in the past decade. Metagenomics and metatranscriptomics, although still underexploited, promise to uncover the functionality of complex microbial consortia. The new knowledge acquired will help us understand how to make profitable use of microbial genetic resources and modulate key activities of beneficial microbes in order to ensure process efficiency, product quality and safety.


Introduction
Culture-independent techniques have changed the way food microbial ecology is studied, leading researchers to consider microbial populations as consortia. Further evolution was stimulated by the advent of high-throughput sequencing (HTS) technologies in the mid-2000s, which in the past decade have become ubiquitous in microbial ecology studies. HTS offers higher sensitivity than traditional culture-independent methods and is also considered quantitative. Two different HTS approaches can be used. In the most commonly applied option, marker genes are amplified from genomic DNA (or RNA, after a reverse-transcription step) by PCR and sequenced. Taxonomically relevant genes are usually sequenced through this approach, leading to the description of the phylobiome (van Hijum et al., 2013), that is, the taxonomic composition of the microbial community and the relative abundance of its members. In metagenomics or metatranscriptomics studies, no PCR is performed and total DNA or cDNA is sequenced. Besides the taxonomic composition of the community, this approach yields the abundance of all microbial genes. Because it is based on DNA sequencing, metagenomic analysis reveals only the potential activities of the microbial community: a given gene may not actually be expressed under the conditions studied, and DNA may arise from dead or metabolically inactive cells. To identify the genes actually expressed in a food sample, RNA sequencing (RNAseq) is the most appropriate approach, and this is what metatranscriptomics does.
The study of microbial ecology is relevant in biotechnology because it underpins not only the development of fermentations but also the understanding of the microbial interactions that drive a premium-quality process. The availability of such a powerful toolbox offers tantalizing opportunities to study food microbes and to understand how their potential functions can be changed or modulated, with the ultimate aim of improving food quality.

Monitoring microbes in food fermentations
Amplicon-based HTS targeting genes of taxonomic relevance has become the most widely exploited approach in food microbial ecology. In the past decade, it has been widely used to monitor microbial communities during the fermentation of different types of foodstuffs and beverages (Fig. 1). Table 1 reports an extensive, although not complete, list of these studies. Several questions can be addressed by describing microbial communities during fermentations. An in-depth characterization of the normal or abnormal microbial consortia at different stages of fermentation is important in order to evaluate lot-to-lot consistency, identify biomarkers of product quality or spoilage, and learn how to manipulate fermentation conditions to improve process control.
Although fungi can be very important in some kinds of food fermentations, far fewer published studies describe fungal than bacterial communities through HTS (Fig. 1). While the 16S ribosomal RNA (rRNA) gene is the common choice for bacteria, more variability in the target gene was highlighted for fungi (Table 1). The most used target is the internal transcribed spacer (ITS), mainly thanks to the availability of a well-curated database (https://unite.ut.ee). However, the uneven ITS length among species may promote preferential amplification of shorter fragments during the PCR step, and therefore a higher number of sequences for OTUs with shorter ITS fragments. Since OTU abundance is estimated from the number of reads obtained, this may lead to an incorrect estimation of OTU abundance (Bokulich and Mills, 2013b; Ercolini, 2013; De Filippis and Ercolini, 2016). Therefore, the use of other targets would also be advisable, such as the 26S (Garofalo et al., 2015; Stellato et al., 2015; Wang et al., 2015) or the 18S rRNA (Liu et al., 2015b; Minervini et al., 2015) genes.
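The effect of uneven ITS length on abundance estimates can be illustrated with a toy calculation. The sketch below assumes idealised exponential PCR in which each OTU amplifies with a fixed per-cycle efficiency, and assigns a higher efficiency to the OTU with the shorter ITS; the OTU names and efficiency values are hypothetical and are not taken from any of the cited studies.

```python
# Toy model of length-dependent PCR bias on ITS amplicons.
# Assumption: each OTU amplifies with a fixed per-cycle efficiency,
# and the OTU with the shorter ITS is given the higher efficiency.

def amplify(template_copies, efficiencies, cycles):
    """Copies per OTU after `cycles` of idealised exponential PCR."""
    return {otu: n * (1 + efficiencies[otu]) ** cycles
            for otu, n in template_copies.items()}

def relative_abundance(counts):
    """Convert absolute counts into proportions summing to 1."""
    total = sum(counts.values())
    return {otu: c / total for otu, c in counts.items()}

# Hypothetical community: two fungi present at equal template abundance.
templates = {"fungus_short_ITS": 1000, "fungus_long_ITS": 1000}
efficiency = {"fungus_short_ITS": 0.95, "fungus_long_ITS": 0.85}  # assumed values

true_share = relative_abundance(templates)
observed_share = relative_abundance(amplify(templates, efficiency, cycles=30))
print(true_share)      # both OTUs at 0.5
print(observed_share)  # the short-ITS OTU is strongly over-represented
```

Even a modest difference in per-cycle efficiency compounds over 30 cycles, so two equally abundant fungi end up with very unequal read counts.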
Although PCR-dependent, amplicon-based HTS is considered quantitative: the number of reads obtained for each operational taxonomic unit (OTU) is proportional to the abundance of that OTU in the sample, and the higher sensitivity also allows the identification of sub-populations that were previously difficult to detect. However, the data generated need to be interpreted with awareness of the PCR biases affecting culture-independent methods, which have been reviewed elsewhere (Sipos et al., 2010), such as preferential amplification due to differences in primer efficiency towards certain species, which may result in the under-representation of some clades (Pinto and Raskin, 2012).
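The quantitative reading of amplicon data, and the gain in sensitivity with sequencing depth, can be sketched as follows. The example assumes that reads are drawn independently at random from the community, which ignores the PCR biases just mentioned; the genus names and read counts are hypothetical.

```python
def relative_abundance(otu_counts):
    """Reads per OTU -> proportions; assumes reads scale with OTU abundance."""
    total = sum(otu_counts.values())
    return {otu: n / total for otu, n in otu_counts.items()}

def detection_probability(rel_abundance, reads):
    """Chance of sampling at least one read from an OTU at `rel_abundance`,
    assuming reads are drawn independently at random from the community."""
    return 1 - (1 - rel_abundance) ** reads

# Hypothetical OTU table for one sample.
sample = {"Lactococcus": 7600, "Leuconostoc": 2390, "Enterococcus": 10}
shares = relative_abundance(sample)
print(shares["Enterococcus"])  # 0.001 of the community

# A sub-population at 0.1% is usually missed with 100 reads,
# but almost always detected with 10 000 reads.
for depth in (100, 1_000, 10_000):
    print(depth, round(detection_probability(0.001, depth), 3))
```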
On several occasions, RNA was preferred to DNA as the target, because DNA may also arise from dead or inactive cells. Since RNA is more rapidly degraded, it yields a picture of the microbiota that is more likely to reflect the viable populations. Nevertheless, treatment of food samples with propidium monoazide (PMA) was recently proposed (Erkus et al., 2016) as an alternative to the use of RNA templates for describing live microbial populations. PMA selectively binds to DNA present free in the matrix, arising from dead or compromised cells, but cannot cross intact cell membranes. Therefore, treating the sample with PMA before DNA extraction inhibits PCR amplification of this DNA and yields a profile of the viable microbial community (Erkus et al., 2016). Notwithstanding the large body of literature accumulated on food microbial communities assessed by amplicon sequencing, most of the studies are essentially descriptive. In addition, mostly well-known microbial players have been identified, so that limited new information on food fermentative processes has been provided.
However, thanks to the extensive amount of 16S rRNA gene data generated by amplicon-based descriptions of food microbiota, there is currently an unprecedented opportunity for data sharing. Sequence data can be made available through public databases, e.g. the Sequence Read Archive of the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/Traces/sra) or the European Nucleotide Archive of the European Bioinformatics Institute (http://www.ebi.ac.uk/ena). In this way, researchers from all over the world can easily access datasets generated by other laboratories, re-analyse them and use them for meta-studies. FoodMicrobionet (http://www.foodmicrobionet.org, Parente et al., 2016a) was recently launched to collect the results obtained in different HTS studies of food microbial ecology and to provide an easy-to-use tool for visualizing and comparing the microbiota of diverse foodstuffs. Samples are classified using the FoodEx classification, as suggested by the European Food Safety Authority (http://www.efsa.europa.eu/en/data/datastandardisation). Further metadata include the nature of the food (raw, intermediate, finished), the use of thermal treatments and the occurrence of spoilage and/or fermentation. Users can easily extract subsets of samples for the food matrix of interest, recreate abundance tables in long and wide format, and visualize them as a network or use them in comparative studies. As an example, graphical representations of the network structure of dairy samples extracted from FoodMicrobionet (v. 2.0) are provided here (Fig. 2). Figure 2A, a bipartite network representation, shows a high abundance of starter lactic acid bacteria (SLAB) in intermediates and undefined starters from the production of different types of cheese, while non-starter LAB (NSLAB) display higher abundances in ripened cheeses, with the exception of spoiled products. This separation is even more evident in the network shown in Fig. 2B, where long-ripened (> 30 days) cheeses clustered apart from fresh/mid-ripened cheeses and production intermediates (natural whey cultures, fermented curds).
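The conversion from a long-format abundance table to a wide one, and from there to the sample-taxon edges of a bipartite network like that in Fig. 2A, can be sketched with the standard library alone. This is not FoodMicrobionet's own code or data format: the sample names, abundances and the 5% edge threshold are hypothetical choices for illustration.

```python
from collections import defaultdict

# Hypothetical long-format records: (sample, taxon, relative abundance).
long_table = [
    ("whey_culture_1", "Streptococcus thermophilus", 0.62),
    ("whey_culture_1", "Lactobacillus delbrueckii", 0.35),
    ("ripened_cheese_1", "Lactobacillus paracasei", 0.48),
    ("ripened_cheese_1", "Streptococcus thermophilus", 0.30),
    ("ripened_cheese_1", "Lactobacillus delbrueckii", 0.01),
]

def long_to_wide(records):
    """Pivot (sample, taxon, abundance) rows into {sample: {taxon: abundance}},
    filling taxa absent from a sample with 0.0."""
    taxa = sorted({taxon for _, taxon, _ in records})
    wide = defaultdict(lambda: dict.fromkeys(taxa, 0.0))
    for sample, taxon, abundance in records:
        wide[sample][taxon] = abundance
    return dict(wide)

def bipartite_edges(wide, min_abundance=0.05):
    """Sample-taxon edges weighted by abundance; weak links are dropped
    (the threshold is an arbitrary, hypothetical choice)."""
    return [(sample, taxon, ab)
            for sample, row in wide.items()
            for taxon, ab in row.items()
            if ab >= min_abundance]

wide = long_to_wide(long_table)
edges = bipartite_edges(wide)
for edge in edges:
    print(edge)
```

The resulting edge list can be loaded into any network visualization tool, with samples and taxa as the two node types.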
An interesting and relatively new application of amplicon-based HTS is the monitoring of microbial populations beyond the species level. Genes with high intraspecies polymorphism can be selected for this purpose, potentially allowing discrimination between different strains within a species. In spite of the limitations due to possible sequencing errors, this application was recently exploited to monitor Streptococcus thermophilus in whey and milk cultures and in cheeses, using the lactose permease gene (lacS, De
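The core idea of strain-level monitoring with a polymorphic gene is to tally identical amplicon sequences (sequence variants) and track their proportions over time. The sketch below uses a crude minimum-count filter as a stand-in for the error-handling step; real pipelines rely on dedicated denoising or oligotyping methods, and the short sequences shown are toy strings, not lacS fragments.

```python
from collections import Counter

def count_variants(reads, min_count=2):
    """Tally identical amplicon sequences and discard variants seen fewer than
    `min_count` times, a crude guard against sequencing errors (assumed threshold)."""
    counts = Counter(reads)
    kept = {seq: n for seq, n in counts.items() if n >= min_count}
    total = sum(kept.values())
    return {seq: n / total for seq, n in kept.items()}

# Toy reads: two genuine variants plus one singleton treated as an error.
reads = ["ACGTTA"] * 5 + ["ACGTCA"] * 3 + ["ACGTTG"]
variants = count_variants(reads)
print(variants)  # {'ACGTTA': 0.625, 'ACGTCA': 0.375}
```

Repeating the tally on samples taken at successive fermentation stages would show how the proportion of each putative strain changes.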

Exploring microbial functions directly in food
Shotgun metagenome or metatranscriptome sequencing offers several advantages over the amplicon-based approach (Fig. 3). Total DNA (or cDNA) is fragmented, used for library preparation and sequenced directly, without any prior marker-gene PCR step, thus avoiding the corresponding amplification biases. Moreover, a picture of the entire microbial community can be obtained, tracking and comparing the abundance of bacteria and other organisms at the same time, although nucleic acid extraction methods have to be chosen carefully to avoid preferential extraction from bacterial cells. The key advantage of shotgun sequencing is the possibility of monitoring the abundance of microbial activities directly in the food matrix, and hence of collecting information on the genetic capacity of the whole microbial community. Moreover, complete or draft microbial genomes can be recovered from the metagenomes, achieving strain-level resolution. Although metagenome and metatranscriptome sequencing potentially provide much more information, their cost is still substantially higher than that of amplicon-based approaches. Additional technical precautions are needed in metatranscriptomics studies. For example, when sequencing messenger RNA (mRNA), every effort must be made to 'freeze' gene expression at sampling and to protect mRNA from degradation by using stabilizing solutions, since microbial mRNA has an extremely short half-life (Rauhut and Klug, 1999; Deutscher, 2006). In addition, mRNA makes up only a small percentage of the total RNA extracted from the sample, which is mainly rRNA. Therefore, a further step is usually necessary to reduce the amount of rRNA by bacterial or eukaryotic rRNA depletion. Finally, the bioinformatic analysis of shotgun data is much more computationally intensive and complex than that of amplicon sequencing, and well-defined standard pipelines have not yet been established.
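Monitoring "the abundance of microbial activities" from shotgun reads requires some normalization, because longer genes attract more reads. One common choice, assumed here rather than attributed to the cited studies, is transcripts per million (TPM): counts are divided by gene length in kilobases and then rescaled so that each sample sums to one million. Gene names and numbers are hypothetical.

```python
def tpm(read_counts, gene_lengths_bp):
    """Transcripts-per-million style normalisation of shotgun read counts."""
    # Reads per kilobase corrects for longer genes attracting more reads.
    rpk = {g: read_counts[g] / (gene_lengths_bp[g] / 1_000) for g in read_counts}
    # Rescale so that all values in the sample sum to one million.
    scale = sum(rpk.values()) / 1_000_000
    return {g: value / scale for g, value in rpk.items()}

counts = {"gene_A": 1_000, "gene_B": 1_000}   # hypothetical read counts
lengths = {"gene_A": 2_000, "gene_B": 500}    # hypothetical gene lengths (bp)
abundance = tpm(counts, lengths)
print(abundance)
```

Both genes received the same number of reads, but gene_B is four times shorter, so after normalization it is estimated to be four times more abundant.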
The above reasons have limited the application of metagenomics and metatranscriptomics to a few pioneering studies, again mainly focused on dairy products. These studies provided insights into the microbial activities involved in the cheese ripening process. Monitoring of microbial gene expression by metatranscriptomics during the ripening of a Camembert-type (Lessard et al., 2014) and a surface-ripened cheese (Dugat-Bony et al., 2015) revealed the presence of microbial genes involved in flavour production from amino acids and showed that these activities were enhanced in the first phase of ripening. Moreover, Monnet et al. (2016) reported differences in amino acid catabolism between the two yeasts inoculated on a Reblochon-type cheese: Geotrichum candidum reached maximum expression of genes related to amino acid metabolism in the first phase of ripening, while Debaryomyces hansenii took over after 30 days, highlighting the different roles of these yeasts in flavour production. Similarly, Wolfe et al. (2014) suggested that pathways leading to the production of flavour compounds, such as those involved in branched-chain and sulfur-containing amino acid degradation, were enriched in washed-rind cheese metagenomes compared with natural- and bloomy-rind ones. These studies provided an in-depth analysis of the cheese maturation process and allowed a better understanding of the metabolic activities of the different community members and their possible interactions. Indeed, cheese microbial communities have been proposed as a simplified model to study both microbial assembly and metabolism (as affected by abiotic factors), providing a potentially useful model for understanding microbial dynamics in more complex environments (Wolfe and Dutton, 2015).
In a recent example, the evolution of bacterial activities during the manufacture and ripening of a traditional cheese made with an undefined starter was monitored by metatranscriptomics, highlighting how microbial gene expression can be manipulated by modifying process parameters. An increase in ripening temperature was shown to foster NSLAB growth and to raise the expression of genes related to proteolysis, lipolysis and the production of flavour compounds from amino acids, thus promoting cheese ripening. Studies of this kind help to shed light on the microbial community dynamics involved in fermentative processes and suggest how microbiome responses can be modulated in order to optimize production efficiency and product quality.
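Comparing gene expression between two process conditions, such as two ripening temperatures, usually starts from a per-gene log2 fold change. The sketch below computes only this effect size on already-normalized, hypothetical expression values, adding a pseudocount to avoid division by zero; a real analysis would also use biological replicates and a statistical test, which are not shown here.

```python
import math

def log2_fold_change(baseline, treatment, pseudocount=1.0):
    """Per-gene log2((treatment + pc) / (baseline + pc));
    positive values mean higher expression under `treatment`."""
    genes = set(baseline) | set(treatment)
    return {g: math.log2((treatment.get(g, 0.0) + pseudocount)
                         / (baseline.get(g, 0.0) + pseudocount))
            for g in genes}

# Hypothetical normalized expression at two ripening temperatures.
standard_temp = {"proteolysis_gene": 15.0, "housekeeping_gene": 99.0}
higher_temp = {"proteolysis_gene": 63.0, "housekeeping_gene": 99.0}

lfc = log2_fold_change(standard_temp, higher_temp)
print(lfc["proteolysis_gene"])   # 2.0, i.e. four-fold higher
print(lfc["housekeeping_gene"])  # 0.0, unchanged
```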

Targeting microbial genomic variability for food fermentation processes
Technologically relevant microbial species and strains are extensively used, alone or in mixed cultures, in industrial processes. While allowing better control and standardization of the process, the use of selected cultures can lead to a loss of microbial diversity and a flattening of the sensory properties of the final products. Recent advances in microbial genomics are making it possible to exploit the wealth of genetic and phenotypic variability existing among food-fermenting microbes (Smid and Hugenholtz, 2010). Thanks to the reduction in sequencing costs, whole-genome sequencing of several strains of the same species is now common practice. Pangenomic studies of common food-borne microbes (the pangenome being the pool of all distinct genes found within a species) have highlighted important genetic differences, which can be taken into account when selecting the best combination of species/strains for industrial starters. Moreover, recently developed bioinformatics tools make it possible to recover microbial genomes directly from metagenomes, allowing strain-level monitoring during the process and genomic comparison (Eren et al., 2015; Scholz et al., 2016). When the genomes of five Lactococcus lactis strains (two wild-type strains of non-dairy origin and three dairy isolates) were compared, only about 60% of the genes were found to be shared among them, and hundreds of accessory genes were absent from the dairy isolates (Siezen et al., 2008). Some of these coded for industrially relevant functions, such as exopolysaccharide (EPS) production and sugar and amino acid metabolism. Pangenomics may also help to understand microbial evolution and the mechanisms of adaptation to different food niches. The genome of a Lactobacillus delbrueckii subsp. bulgaricus strain used industrially for yoghurt production was compared with those of collection strains of the same subspecies, and genomic traits conferring higher efficiency in industrial fermentations were highlighted (Hao et al., 2011).
The industrial strain showed higher efficiency in EPS production and well-developed stress tolerance, explaining its initial selection for yoghurt production. Moreover, its effective proteolytic system helped explain its protocooperation with Streptococcus thermophilus (Hao et al., 2011).
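The core/accessory split underlying comparisons such as that of Siezen et al. (2008) reduces to set operations once genes have been grouped into families shared across genomes. The sketch below assumes that this clustering step has already been done; the strain labels and gene-family identifiers are hypothetical and do not reproduce the published figures.

```python
def pangenome_summary(strain_genes):
    """Pan, core and accessory gene families from {strain: set_of_families}."""
    gene_sets = list(strain_genes.values())
    pan = set().union(*gene_sets)          # found in at least one strain
    core = set.intersection(*gene_sets)    # found in every strain
    return {"pan": pan, "core": core, "accessory": pan - core,
            "core_fraction": len(core) / len(pan)}

def absent_from(group_a, group_b):
    """Families present in some strain of group_a but in no strain of group_b."""
    return set().union(*group_a.values()) - set().union(*group_b.values())

# Hypothetical gene-family content of one non-dairy and two dairy strains.
non_dairy = {"nondairy_1": {"gf1", "gf2", "gf3", "gf4", "gf5"}}
dairy = {"dairy_1": {"gf1", "gf2", "gf3", "gf6"},
         "dairy_2": {"gf1", "gf2", "gf6"}}

summary = pangenome_summary({**non_dairy, **dairy})
lost_in_dairy = absent_from(non_dairy, dairy)
print(sorted(summary["core"]), round(summary["core_fraction"], 3))
print(sorted(lost_in_dairy))  # ['gf4', 'gf5']
```

Annotating the families in `lost_in_dairy` is the step that would reveal whether they encode traits of industrial interest, such as EPS production.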
Microbial interactions can also be elucidated. Coupling metagenomics and pangenomics, Erkus et al. (2013) characterized an undefined cheese starter culture with a long history of use. Although only two species were detected, a high level of diversity was found at strain level, and genome comparison suggested that a kill-the-winner mechanism was at work: the phage sensitivity of the fittest strain was density-dependent, preventing the eradication of other genetic lineages during back-slopping regimes.
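Why density-dependent phage predation preserves diversity can be shown with a deliberately simple two-strain model. This is a toy frequency-based sketch, not the model used by Erkus et al. (2013): each back-slopping cycle the strains grow by fixed factors, a phage then removes a fraction of each strain proportional to its current relative frequency, and the culture is renormalised as if transferred to fresh milk. All parameter values are hypothetical.

```python
def backslop(p_fit, growth=(2.0, 1.5), kill=0.9, cycles=50):
    """Toy kill-the-winner model of a two-strain back-slopped culture.

    p_fit  -- starting relative frequency of the faster-growing strain
    growth -- per-cycle growth factors (hypothetical values)
    kill   -- strength of phage predation; each strain loses a fraction
              kill * (its own relative frequency), so the dominant strain
              is hit hardest
    Returns the relative frequencies [fittest, other] after `cycles`.
    """
    p = [p_fit, 1.0 - p_fit]
    for _ in range(cycles):
        survivors = [growth[i] * (1 - kill * p[i]) * p[i] for i in range(2)]
        total = sum(survivors)
        p = [s / total for s in survivors]  # renormalise after transfer
    return p

coexist = backslop(0.5)                # with density-dependent phage
winner_only = backslop(0.5, kill=0.0)  # without phage predation
print(coexist)
print(winner_only)
```

Without predation the slower strain is diluted out, shrinking by a factor of 1.5/2 relative to the fittest strain every cycle. With predation, the two strains settle where their net per-cycle growth is equal, growth[0](1 - kill p) = growth[1](1 - kill (1 - p)), which for these parameters gives p of about 0.587 for the fittest strain.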
Comparative genomics of industrial strains is providing a richer and deeper understanding of the genetic composition and variability of these important microbes, promising the rapid identification of genetic loci that shape industrially significant traits. This will enable the development of a genetic catalogue of strains, so that starter composition can be tailored at strain level to meet specific demands.

Conclusions
Food fermentations are often complex phenomena involving several microbial species and strains. We have learned how to identify and quantify them and to study their traits on the basis of state-of-the-art methodologies. As discussed, the most widely used HTS application in food microbiology is amplicon-based sequencing, which provides an in-depth description of the ecosystem studied. This is undoubtedly useful for understanding microbial dynamics and evolution during food production, as well as for identifying possible spoilage organisms. Nevertheless, the real advance brought by HTS lies in shotgun metagenomics and metatranscriptomics. These approaches are still underexploited in food microbial ecology. Their application to food fermentations may be extremely useful to explore microbial functions directly in the food matrix and to understand microbial behaviour in response to different process conditions. Moreover, recovering microbial genomes from metagenomes makes it possible to monitor the evolution of different strains during the process and to compare their genomic potential. These tools promise to be invaluable for better understanding and possibly tuning microbial activities in order to ensure process efficiency, product quality and safety.

Conflicts of interest
None declared.