Contrasting genes conferring short- and long-term biofilm adaptation in Listeria

Listeria monocytogenes is an opportunistic food-borne bacterium that is capable of infecting humans with high rates of hospitalization and mortality. Natural populations are genotypically and phenotypically variable, with some lineages being responsible for most human infections. The success of L. monocytogenes is linked to its capacity to persist on food and in the environment. Biofilms are an important feature that allow these bacteria to persist and infect humans, so understanding the genetic basis of biofilm formation is key to understanding transmission. We sought to investigate the biofilm-forming ability of L. monocytogenes by identifying genetic variation that underlies biofilm formation in natural populations using genome-wide association studies (GWAS). Changes in gene expression of specific strains during biofilm formation were then investigated using RNA sequencing (RNA-seq). Genetic variation associated with enhanced biofilm formation was identified in 273 genes by GWAS and differential expression in 220 genes by RNA-seq. Statistical analyses show that the number of overlapping genes flagged by either type of experiment is less than expected by random sampling. This novel finding is consistent with an evolutionary scenario where rapid adaptation is driven by variation in gene expression of pioneer genes, and this is followed by slower adaptation driven by nucleotide changes within the core genome.


INTRODUCTION
Listeria monocytogenes is an opportunistic non-fastidious bacterium that causes serious invasive disease in humans and animals, known as listeriosis [1,2].Infection with L. monocytogenes is caused primarily by the consumption of contaminated food products, with infections displaying the highest hospitalization rate of any food-borne pathogen, at 92 % of cases [3], and a high mortality rate, at 20-30 % of cases [4].In fact, L. monocytogenes is responsible for the third most deaths caused by any food-borne pathogen in the USA, despite infections only accounting for a small minority of food-borne disease cases.This problem is only expected to increase, as cases of L. monocytogenes have continued to rise in North America and Europe over the last decade [5,6].
L. monocytogenes is commonly isolated from animal hosts (cattle, sheep and birds) [7][8][9][10] and their food products (raw meat, pasteurized milk, fish) [11][12][13], as well as environmental sources, such as soil, vegetables, sewage and other water sources [14][15][16][17][18]. Consumption of any of these food sources can lead to infection of the host via translocation of the intestinal barrier, infection of the lymph nodes, and the pathogen subsequently entering the spleen and liver [19].Infection of healthy individuals frequently results in febrile gastroenteritis, a non-invasive disease manifesting within a day of ingesting contaminated food [20].However, in immunocompromised patients unrestricted proliferation of L. monocytogenes can cause bacteraemia, leading to invasion of additional organs, especially the brain and spinal cord, causing meningitis [21].
The population structure of L. monocytogenes reveals four phylogenetic lineages that are distinct in terms of their pathogenicity and host specificity.Lineages I and II are associated with food and human outbreak samples whereas lineages III and IV are found in a broader range of environments and are usually isolated from animal sources [22].Isolates of lineage I have been most often isolated from epidemic listeriosis samples whereas lineage II is associated with food and related food production environments [23,24].There are several hundred clonal complexes (CCs) identified for L. monocytogenes.Historically CC1 and CC2 have been the most prevalent isolates [25].However, in recent years CC1 and CC2 have been found less often in datasets and previously non-typical isolates, especially CC5, CC6 and CC121, are rising in frequency [26].The Listeria serotyping scheme has also identified 13 serotypes of L. monocytogenes [25,27].Most human cases of listeriosis are sporadic and tend to involve isolates of serotypes 1/2a, 1/2b and 4b, with serotype 4b being an important contributor (~36 % of cases) [28][29][30].
Multicellular biofilm formation is a key feature for bacterial tolerance to adverse environmental conditions [31,32].In L. monocytogenes, biofilms promote surface adherence [33,34] and persistence in food production environments [35][36][37][38], even enabling survival and replication in refrigeration conditions [39][40][41] and resistance to antimicrobials, including disinfectants [42][43][44][45].This is a major public health concern since food contamination can result in fatal listeriosis outbreaks following consumption [46,47].Interestingly, biofilm-forming strains are structurally and physiologically different from their planktonic counterparts, with the former being more coccoid than rod shaped, and displaying increased antibiotic resistance.Furthermore, biofilms formed by L. monocytogenes strains found in food industries are thicker than those formed by sporadically occurring isolates [48].Therefore, there is clear incentive to understand the genetics of biofilm formation.
Genes involved in cell motility, surface adhesion and expression of extracellular polymers are all linked to L. monocytogenes biofilm formation [48][49][50].However, biofilms are a complex phenotype that can be underpinned by numerous micro-evolutionary changes to the genome operating over different timescales.For example, short-term genetic changes may include mutations affecting gene expression such as phase variation, which is a rapid and reversible mechanism by which repeat regions can switch a particular gene between expression states [51][52][53].Phase variation within the inlA gene has been implicated as a mechanism of niche adaptation and virulence attenuation in L. monocytogenes [54].In contrast to rapid adaptation and expression state switching, de novo mutation and horizontal gene transfer (HGT), usually considered infrequent in L. monocytogenes, can also introduce genetic variation conferring novel functionality [25,55].However, these processes take place over longer timescales than phase variation.
High-throughput sequencing and growing collections of sequenced isolates provide opportunities to investigate the genetic basis of complex traits such as biofilm formation.Comparative transcriptomics and genome-wide association studies (GWAS) provide an effective method to identify associations between genetic variants and phenotypes [56][57][58].In this study we use a quantitative population genomics approach to investigate the genes involved in short-and long-term adaptation to biofilm formation in L. monocytogenes.

Bacterial sampling
The isolate collection comprised 868 L. monocytogenes isolates from a variety of sources (Table S1, available in the online version of this article).These included 760 publicly available whole genome sequences from the Institut Pasteur database [59,60] and

Impact Statement
Listeria monocytogenes is a problematic food-borne bacterium that can cause severe illness and even death in humans.Some strains are known to be more common in disease and biofilms are crucial for survival in the environment and transmission to humans.To unravel the genetic basis of biofilm formation, we undertook a study employing genome-wide association studies (GWAS) and gene transcription profiling.We identified 273 genes associated with robust biofilm formation through GWAS and discovered differential expression in 220 genes through RNA sequencing.Statistical analysis revealed fewer overlapping genes than expected by chance, supporting an evolutionary scenario where initial adaptation relies on gene expression variation, followed by slower adaptation through genetic changes within the core genome.
108 isolates sampled in Japan and sequenced as part of this study.Of the Japanese samples, 41 were sampled from food products, specifically 10 from chicken, 9 from beef, 18 from pork, 3 from salad products and 1 from fish, 35 isolates were sampled from agricultural sources (cow and colostrum), 28 isolates were sampled from human clinical cases, and 4 isolates were from environmental samples.

Genomic DNA extraction and whole genome sequencing
L. monocytogenes isolates were cultured in Todd-Hewitt broth at 37 °C for 18-24 h, and genomic DNA was extracted using the QIAmp DNA Mini Kit (Qiagen), according to the manufacturer's instructions.DNA was quantified using a Nanodrop spectrophotometer before sequencing.Genome sequencing was performed on an Illumina MiSeq sequencer (Illumina) using the Nextera XT Library Preparation Kit with standard protocols.Libraries were sequenced using a 2×300 bp paired-end v3 reagent kit (Illumina, following the manufacturer's protocols).Short-read paired-end data were filtered, trimmed and assembled using the de novo assembly algorithm, SPAdes (version 3.7) [61].The average number of contigs in 108 L. monocytogenes genomes was 564 (range: 37-3790) for an average total assembled sequence size of 2.91 Mb (range: 2.00-3.71).The average N50 was 109 313 (range: 958-583 650) and the average GC content was 38.2 % (range: 37.3-43.0).Two datasets were constructed.The first, including all isolates (n=868), and a second dataset, excluding 12 low-quality genomes (n=854) identified using CheckM [62].CheckM is a quality control software that provides an estimate of genome assembly completeness and contamination.Both datasets were analysed to test how genome quality impacted on our findings.We found that this did not have a major impact on our findings (Supplementary methods), and specifically the same biofilm-associated genes were identified.An overview of assembly information is provided in Table S1.Genomes and short-read data are archived on the NCBI GenBank and SRA depositories, associated with BioProject accession #PRJNA971143.

Phenotype testing
L. monocytogenes is commonly isolated from diverse environmental sources and can persist in harsh conditions.To understand this persistence, we sought to quantify aspects of biofilm formation.Biofilm formation was measured by a semi-quantitative adherence assay using 96-well tissue culture plates [63,64].Aliquots (5 µl) of an overnight culture were used to inoculate 200 µl of liquid media and plates were then incubated on a moving platform at either 30 or 37 °C for 48 h.Culture medium was removed and wells washed with PBS.Bacteria adhering to the plate were fixed with 150 µl of Bouin's solution before being washed with PBS.Plates were left to dry and stained with 150 µl of 0.1 % (w/v) crystal violet for 5 min.Spectrophotometric measurements were taken at OD 600 in a 96-well plate reader and the average of three replicates was calculated (BMG Omega).

Pangenome archiving and phylogenetic tree reconstruction
Whole genome assemblies were annotated using Prokka (version 1.12) [65].Annotated assemblies were processed by the PIRATE pan-genome pipeline (version 1.0.4)[66] to identify clusters of orthologous genes (COGs).Gene sequences were translated to amino acid sequences and COGs were defined by PIRATE using a wide range of amino acid percentage sequence identity thresholds for Markov cluster algorithm (MCL) clustering (45,50,60,70,80,90,95,98). The pangenome of all 868 L. monocytogenes isolates contained 23 637 COGs.The pangenome of 108 L. monocytogenes isolated in Japan contained 11 202 COGs.Genes shared by >95 % of all isolates were defined as part of the core genome.Concatenated core genome alignments were produced gene-by-gene using MAFFT (1642 genes, 1656 306 bp) and the phylogenies were inferred from these alignments using a maximum likelihood (ML) algorithm using RAxML (version 8.2.4) [67] with the generalized time-reversible (GTRGAMMA) substitution model.The resulting phylogenies and core-genome alignments were used as input to ClonalFrameML (version 1.12) [68] to account for recombination taking place within the core genome.

RNA preparation for RNAseq
To understand changes in gene expression involved in biofilm formation, we performed a differential gene expression analysis of L. monocytogenes in a planktonic state, whereby bacteria were grown in shaken media to prevent the formation of any biofilms, versus bacteria in a biofilm-forming state.The transcriptome of isolate Lm0132 was analysed.Lm0132 is a clinical isolate belonging to one of the most common L. monocytogenes complexes of lineage I, CC2.The biofilm thickness of Lm0132, recorded as a relative measure at OD 600 , was 0.34 and 0.38 at 30 and 37 °C, respectively.Strains were cultured in Todd-Hewitt broth at 37 °C and RNA was extracted at 24 h, during exponential growth phase, to maximize capture of genes involved in establishing biofilms.RNA was isolated using the SV RNA isolation kit (Promega) according to the manufacturer's instructions.23S and 16S rRNA was depleted using a MicrobExpress kit (Ambion) and genomic DNA digested using amplification-grade DNase I (Invitrogen).RNA was reverse-transcribed using random primers (Invitrogen) and Superscript III (Invitrogen) at 45 °C for 3 h and denatured at 70 °C for 15 min.Nextera XT libraries were constructed and sequenced with over 30 000 reads per sample (minimum estimated coverage above ×107).

Differential gene expression analysis
Illumina sequence reads containing expression data for Lm0132, and two independent RNA samples for each condition, were mapped to a reference transcriptome index created using Salmon (version 1.9.0).Gene expression data were mapped against a genome reference F2365 [69], to maximize the accurate characterization of differentially transcribed genes.Gene expression was calculated in units of transcripts per million (TPM), a relative abundance measure used for downstream analysis: DESeq2 in Bioconductor [70,71].Briefly, raw counts were normalized using size factors to account for differences in library depth.Counts were further log-transformed using the variance stabilizing transformation function.Gene-wise dispersion was calculated, and the estimates were reduced using empirical Bayesian shrinkage to avoid biological variance.Hypothesis testing was applied using Wald's test across the negative binomial distribution.Differences in fold-change values were calculated by determining the log 2 fold-change (LFC).A stringent cut-off of LFC<−2 and/or >2 was used to identify differentially expressed genes.An adjusted P-value was calculated for every gene with P<0.05 being considered significant.Reads from strain Lm0132 taken from the 24 h shaken media were compared with reads taken from 24 h stationary growth.

Genome-wide association studies
GWAS was performed to test for genetic variation associated with biofilm formation.Genetic variation within the L. monocytogenes pangenome was covered by extracting variable-length k-mers from the assemblies, known as unitigs.Pyseer (version 1.3.6)[72] was used to fit an elastic net model to test the association between unitigs and biofilm formation at both 30 and 37 °C.The elastic net uses regularized linear regression to simultaneously test the association of all the genetic variants with the phenotype and reports the variants predicted to be causal for the phenotype [73].The lasso default correlation filter of 25 % was used.Only unitigs with a minor-allele frequency (MAF) of >2 % were selected.In total, 872 250 unitig sequences were generated from the L. monocytogenes pangenome; 436 125 filtered variants were removed from the analysis; and 436 125 variants were tested for their association with the phenotype.Despite the elastic net's implicit control for the effects of population structure by distinguishing genetic variants that are causal for the phenotype from those inherited from the common ancestor (lineage effects), an explicit correction was also applied by using a genetic relatedness matrix calculated from the inferred population phylogeny to adjust the P-value of association variants.The phylogenetically corrected P-values were used by a suite of tools to interpret the contribution of genetic variation highlighted by the model.To infer gene function, unitigs were mapped to the L. monocytogenes F2365 complete reference genome (assembly accession: NC_002973.6)using BWA-MEM (version 0.7.17) [74].

Gene function determination
To determine gene function the KEGG database was used [75,76].Briefly, custom python scripts were employed to interface with the database by iteratively querying all L. monocytogenes schemes until a match was identified.KEGG information was downloaded via the KEGG database API and the resulting text file was parsed and gene functions were extracted.Where a gene had more than one function, all functions were extracted.

L. monocytogenes populations are highly structured
We investigated the epidemiology and whole-genome sequences of L. monocytogenes isolated from multiple sources in Japan.A total of 108 isolates were retrieved from diverse sources, including clinical, food production and agricultural sources.These isolates were tested for their biofilm production and underwent whole-genome sequencing (Table S2).In addition, 760 isolates from public databases were included in our analyses to provide context and an indication of the natural population structure of L. monocytogenes (Table S1).
Core-genome phylogenetic analysis revealed the population structure of L. monocytogenes (Fig. 1).The ML core-genome phylogeny was highly structured and included the major phylogenetic lineages I, II, III and IV.In total the dataset consisted of 277 unique sequence types (STs), of which 152 belong to 82 CCs and 125 of which did not belong to a previously defined complex.Despite our collection of samples all being sourced from Japan, the 108 isolates are representative of the diversity shown in the clinically relevant lineages I and II and were classified into 33 STs from 27 CCs.However, the addition of 760 publicly available samples revealed increased diversity, due the addition of 137 lineage III isolates, which are not observed in the Japanese isolate collection.Lineage III is the most genetically diverse lineage and included 124 STs, of which 21 belonged to 16 CCs, and 103 of which did not belong to a previously defined CC.Furthermore, the additional lineage I and II isolates also included STs not present in our Japanese collection, with 38 and 80 new STs being added for each, respectively.

Biofilm formation varies among L. monocytogenes isolates
To model the effect that changes in ambient temperature have on biofilm thickness, the extent of biofilm formation was measured at 30 and 37 °C from the absorbance value (OD 600 ) of adhered cells following fixation and staining with crystal violet (Fig. 2).Absorbance values recorded at 30 °C were significantly greater than at 37 °C with a mean reading for biofilm thickness of 0.489 (range: 0.242-0.742)and 0.413 (range: 0.208-0.671),respectively (P<0.0001).Of the 108 isolates studied, 78.7 % (n=85) produced more biofilm at 30 °C than at 37 °C.
Enhanced biofilm-forming ability was associated with multiple isolation sources.Isolates sampled from agricultural sources had significantly greater biofilm thickness than isolates sampled from clinical sources and meat products at both temperatures.Biofilm formation was also significantly greater in isolates sampled in Hokkaido, which consisted of samples from agricultural sources (n=28) and very few clinical samples (n=2) compared with those isolated in Tokyo, which mostly consisted of isolates sampled from the clinic (n=22) and meat production environments (n=35) and did not include any agricultural samples.Enhanced biofilm-forming ability was observed in isolates of lineage II compared with isolates of lineage I at 30 °C, but this difference in biofilm thickness was not observed between lineages at 37 °C.Further investigation of the biofilm-forming ability by CC revealed an unequal distribution; isolates belonging to CC7 and CC379 had significantly greater biofilm thickness than other complexes.The fact that CC7 belongs to lineage II, whilst CC379 belongs to lineage I might explain why the difference in biofilm-forming ability between the phylogenetic lineages was not consistent between temperatures.
Biofilm formation was also compared among isolates from the most prevalent serotypes (1/2a, 1/2b and 4b).Serotypes 1/2a and 1/2b showed significantly greater biofilm thickness compared with 4b isolates at both temperatures.However, it was less clear whether there was a difference in biofilm-forming ability between serotypes 1/2a and 1/2b, with 1/2a isolates showing greater biofilm thickness at 30 °C and 1/2b showing greater biofilm thickness at 37 °C.

L. monocytogenes biofilm formation is not explained by phylogenetic structure
An ML phylogeny was reconstructed from the core-genome alignment of 108 Japanese isolates and the biofilm-forming ability at 30 and 37 °C was mapped onto the tree (Fig. 3).The phylogeny revealed that biofilm-forming ability varied across the L. monocytogenes population, with high biofilm-forming strains occupying divergent phylogenetic positions.Except for CC7 and CC379, biofilm-forming ability was not strain-associated.Furthermore, there was a positive correlation between the biofilmforming ability of isolates at 30 and 37 °C, indicating that biofilm formation was independent of temperature across this range, i.e. individual isolates capable of forming thick biofilms at 30 °C also formed thick biofilm at 37 °C.This evidence implies that there are common genomic determinants underlying L. monocytogenes biofilm formation.

GWAS identifies genetic elements associated with biofilm formation
The genetic association of enhanced biofilm formation in L. monocytogenes was investigated at 30 and 37 °C in 108 Japanese isolates demonstrating a range of biofilm-forming phenotypes (Fig. 4).An ML phylogeny of the 108 isolates was inferred using a core-genome alignment and 436 125 variable-length unitig k-mers generated from the pangenome of these isolates were tested for their association with the biofilm formation phenotype.In total, 4079 and 4019 unitigs were identified by GWAS to be associated with biofilm formation at 30 and 37 °C, respectively.A Bonferroni-corrected significance threshold of −log 10 (P)=6.97 was calculated as the genome-wide significance level, and the suggestive significance threshold of −log 10 (P)=3.97 was used.To reveal genetic regions and genes associated with biofilm formation, k-mers were mapped to the reference L. monocytogenes F2365 genome (Tables S3 and S4).The KEGG database resource API [75,76] was used to derive gene functions from the L. monocytogenes database (Table S5).
At 30 °C there were five biofilm-associated k-mers exceeding the genome-wide significance threshold and 230 biofilm-associated k-mers exceeding the suggested significance threshold.These k-mers mapped to three and 114 genes, respectively (Table S3).At 37 °C there were 46 biofilm-associated k-mers exceeding the genome-wide significance threshold and 462 biofilm-associated k-mers exceeding the suggested significance threshold.These k-mers mapped to 18 and 181 genes, respectively (Table S4).In total, 676 unique k-mers were found to be significantly associated with biofilm formation; these mapped to 273 genes, including 22 genes shared by both 30 and 37°C conditions.They included genes that have previously been shown to have a functional association with biofilm formation, such as cell wall or cell membrane biogenesis [77], motility [49] and protein glycosylation [78].Additionally, genes with the potential to influence biofilm formation in L. monocytogenes included genes putatively involved in energy production and conversion, transport and metabolism of amino acids, carbohydrates, lipids and nucleotides [79][80][81][82].
Genes identified by GWAS have diverse roles.Multiple genes involved in the biosynthesis of amino acids were found to be associated with biofilm formation at both temperatures, suggesting that biofilms may increase the production of enzymes involved in producing scarce amino acids.These include aroC and aroE, which are involved in the synthesis of aromatic amino acids via the shikimate pathway.AroC has previously been demonstrated to affect biofilm formation in Escherichia coli, Streptococcus mutans and Edwardsiella tarda, and is essential for the production of intermediates, such as indole, that affect flagellar synthesis and biofilm formation [83][84][85].AroE has also previously been found to be associated with biofilm adaptation in L. monocytogenes [86].Another important mechanism by which genes involved in amino acid metabolism may be utilized in biofilm adaptation includes stress tolerance.The transcriptional repressor argR is involved in arginine biosynthesis and is induced by the central virulence regulator PrfA, known to have an important role in the capacity of L. monocytogenes to withstand low pH [87].Furthermore, argR previously been demonstrated to affect the biofilm forming ability of Streptococcus gordonii [88,89].Other genes involved in amino acid metabolism that were flagged include gcvT [90], hisJ [91] and thrB [92], which have all previously been demonstrated to affect biofilm formation in other bacterial species.
In L. monocytogenes, flagellum-mediated motility is critical for attachment to surfaces and subsequent biofilm formation [49]; for example, fliP shows increased expression in L. monocytogenes biofilm-forming isolates compared with planktonic isolates Our findings support this as variation genes involved in cell motility appeared to play an important role in biofilm formation.The major virulence determinant gene, actA, was found to be associated with biofilm formation at both temperatures.ActA has previously been shown to be critical in L. monocytogenes biofilm formation [94].In addition, the flagellar motor switch gene (fliM) was also identified [90].Furthermore, additional genes involved in cell motility were found to be associated with biofilm formation at 30 °C (fliF, fliR, flgE, flgK, motA and motB), further emphasizing the importance of motility in the ability of L. monocytogenes to form biofilms.
Genes involved in cell wall or cell membrane biogenesis are essential for biofilm formation.Indeed, multiple genes flagged by the GWAS are already known to be involved in biofilm formation.The dltB gene has been shown by mutagenesis studies of L. monocytogenes to have an important role in biofilm formation [95].d-Alanylation of extracellular lipoteichoic acids appears to be important in maintaining an appropriate surface charge for bacterial attachment to abiotic surfaces.Similarly, inlH is a stress-induced virulence gene which produces a surface protein that facilitates L. monocytogenes survival in tissues.Inactivation of inlH has been shown to enhance biofilm formation in L. monocytogenes [96].

Differential expression of biofilm-associated genes
The differential expression of genes involved in biofilm formation was investigated by comparing the transcriptome of L. monocytogenes in shaken culture versus static growth.Differential transcription during biofilm formation is the result of a combination of biofilm-related factors including stress tolerance, such as heat, oxygen and pH tolerance, resistance to antibiotics, motility, and growth-related factors.Our analysis found a total of 220 genes were differentially expressed between conditions with a log 10 -fold change >2 and a significance value of P<0.05.A reduction in gene expression was seen between the planktonic state and biofilm formation phase of bacterial growth in 160 genes, whilst 60 genes increased in expression (Table S6).The function was determined for all genes (Table S5).
Genes involved in cell wall or cell membrane biosynthesis (dltB, dltC and galE) had changes in gene expression associated with biofilm formation.The dlt genes were down-regulated in the biofilm state, whereas transcription of galE was up-regulated.Genes from the dltABCD operon have previously been implicated in biofilm formation in L. monocytogenes and other bacterial species [95].A galE mutant of Porphyromonas gingivalis was shown to have enhanced auto-aggregation and biofilm formation by increasing surface hydrophobicity, but this phenotype was only partially replicated in a galE mutant of Escherichia coli [97,98].Similarly, multiple genes involved in cell division showed changes in gene expression.The genes ftsX and mreB both showed a large increase in transcription in the biofilm phase.In L. monocytogenes the cell division gene ftsH shows increased expression in isolates forming a biofilm compared with isolates in a planktonic state [93].In addition, in Fusobacterium nucleatum, deletion of ftsX abolished biofilm formation [99].Mutation to the ftsEX operon in Bacillus veleznsis also led to defective biofilm formation [100].Furthermore, the ftsX gene is down-regulated in L. monocytogenes in response to high hydrostatic pressure, suggesting the cell-division protein has a role in adaptation to stressful conditions [101].The bacterial actin homologue mreB, which has putative function in cell organization and determining cell shape, was also up-in the biofilm phase [102,103].c-di-AMP is a secondary messenger that fulfils multiple important functions in bacteria [104].In L. monocytogenes disruption of c-di-AMP metabolism due to mutant strains lacking the phosphodiesterase PgpH affects biofilm formation and integrity of the cell envelope [105][106][107].
Genes involved in amino acid transport and metabolism were also differentially expressed.In addition to argH, which was flagged by GWAS, multiple genes involved in amino acid metabolism were identified, including aroH, asnB, dapF, hisC, purF and serC.Multiple genes involved in nucleotide metabolism or DNA replication and repair were found to be differentially expressed in L. monocytogenes.For example, both ruvA and ruvB displayed a reduction in expression in the biofilm state.Extracellular DNA is a ubiquitous structural component of biofilms that is stabilized by RuvA, and addition of the RuvABC complex results in significant biofilm disruption [108,109].The oligopeptidase gene, pepF, was also down-regulated in the biofilm state.PepF has previously been shown to influence biofilm formation in Aeromonas hydrophila [110].

Short-term adaptation and fixed genomic changes associated with biofilm formation involve distinct genes
We investigated the number of genes that were shared between the GWAS and differential gene expression analyses.Six genes exceeded the threshold for significance between the genome-wide association studies and differential expression analysis at 30 and 37 °C, respectively.No genes were identified as significantly associated with biofilm formation in all three experiments.Given the low numbers of shared genes observed between the experiments, we sought to determine quantitatively whether this lack of overlap was statistically significant using a chi-square test.A chi-square test showed there is a significant relationship between the genes identified by the differential expression analysis and by GWAS at 30 °C [χ 2 (2, N=2864)=5.85,P<0.025, and 37 °C, χ 2 (2, N=2864)=5.85,P<0.001].The number of genes shared by the two types of experiment is less than would be expected through random sampling.This suggests that short-term adaptation to biofilm formation, investigated by differential gene expression, and long-term adaptation to biofilm formation, investigated by GWAS and involving fixed genomic changes, involve distinct genes (Fig. 5).

DISCUSSION
Comparing L. monocytogenes isolates from Japan with a large and geographically diverse collection revealed a diverse and highly structured population representing the major L. monocytogenes lineages circulating globally.The comparison allowed identification of genetically related isolate clusters that are internationally distributed, suggesting recent, and possibly frequent, transmission between countries.Limited metadata associated with publicly available isolate genomes made it difficult to draw conclusions on epidemiological links between related international clusters.However, these observations highlight the importance of real-time monitoring and tracking of L. monocytogenes genotypes at the international level to identify common transmission sources.
The mixed ecological sources of samples from Japan allowed isolates to be classified according to their prevalence in different agricultural hosts, and persistence in the food production environment or clinic.Putative differences in isolate ecology can be inferred from the different CCs sourced from each site.For example, isolates sourced from clinical samples are more likely to belong to the same complexes as isolates sourced from meat products.Alternatively, complexes isolated from agricultural samples are rarely sampled from other settings, except in the case of CC1, CC6 and CC7, which include isolates sampled from all three settings.However, the evolutionary events responsible for L. monocytogenes transmission between these settings remain poorly understood.
In vitro investigation of short-term diversification and selection for mutations associated with biofilm formation has revealed convergent mechanisms of evolution, with strains of the same species often presenting identical nucleotide changes.Furthermore, several genes are known to be associated with biofilm formation in multiple species of bacteria; for example, luxS has an important role in quorum sensing and is involved in biofilm formation in multiple species including L. monocytogenes [111], Escherichia coli [112], Streptococcus mutans [113] and Helicobacter pylori [114].Importantly, in L. monocytogenes the ability to form a thick biofilm was distributed across the global phylogeny.This divergent origin of biofilm-associated genes is consistent with multiple independent emergences of this important phenotype, potentially associated with HGT as observed in lineage II isolates from a relatively broad range of environments [55].
Selection through food production environments might be assumed to favour isolates that form thick biofilms and can survive on contaminated food to infect people.However, in this study, the mean biofilm-forming ability of isolates was greatest amongst isolates sampled from agricultural sources compared with clinical sources or meat products.This observation probably reflects the heterogenous nature of biofilm formation in L. monocytogenes, limited sample numbers and the importance of other factor in human infection.
We found a positive correlation between biofilm-forming ability at the two temperatures investigated, with a significantly greater thickness observed at 30 °C than at 37 °C.These findings are consistent with previous research on the influence of temperature on biofilm formation by L. monocytogenes, which suggests that biofilm formation varies with temperature, with optimal biofilm formation being observed at 30 °C [115,116].Non-motile mutants of L. monocytogenes are not capable of forming biofilms, indicating the importance of flagellar-based motility in the formation of biofilms [49].fact, the reduction in biofilm formation 30 and 37 °C could be a result of the temperature dependence of these structures involved in cell motility [117].In this context, while our differential expression and GWAS analyses identify candidate biofilm genes, it will be important to confirm the link between sequence variation with function [118], particularly when temperature can influence biofilm formation [115].
Adaptation can occur over different timescales.Long-term adaptation is typically considered in the context of alterations to genes, resulting from mutations or recombination, that confer stable inheritable fitness advantages [119].Relatively rapid adaptation can also occur, potentially involving changes in regulation of gene expression rather that DNA sequence.This can include transcription factors [120], epigenetic modifications [121] and phase variation [122].These modifications provide reversible and dynamic responses to environmental change, which can be inherited over multiple generations [123].Biofilm adaptation has been linked to phase variation of the DNA methyltransferase (Mod) in Haemophilus influenzae and alters the expression of genes controlled by the regulon via site-specific methylation [124].Similarly, epigenetic regulatory modification can lead to distinct biofilm morphologies, despite isolates being genetically identical [125,126].
The transition between planktonic and biofilm states is a major phenotype change [127][128][129].Under the assumption that the most beneficial adaptations will increase most rapidly in the population, one might expect genes that exhibit changes in expression will also accumulate sequence variation linked to biofilm formation.However, in our study there was little evidence of overlap between biofilm genes linked to rapid and longer-term adaptation.This is consistent with a more complex adaptive process involving variation in both expression levels and DNA sequence.For example, observations of short-and long-term exposure to antibiotics have revealed initial adaptation through differential gene expression, including upregulation of efflux pumps [130].Subsequently, the SOS response promotes HGT and de novo mutation, resulting in resistance to higher antibiotic concentrations [130].
Understanding the contrasting processes linked to biofilm formation has particular significance for listeriosis.Broadly, biofilms confer physical, adaptive or virulence advantages to the pathogen.First, physical biofilm barriers promote survival in food production environments and offer protection against antibiotics prescribed to treat infections [131].Second, the accumulation of DNA -including mobile genetic elements -within biofilms facilitates HGT adaptation to changing environmental conditions.Finally, as in many pathogenic bacteria, biofilms are linked to virulence.There is clear utility in understanding contrasting genetics underpinning biofilm adaptation.Mature biofilms are linked to persistent infections and are more difficult to treat with antibiotics so early interventions are beneficial.Considering both gene expression and sequence polymorphism as important in biofilm formation may help in understanding important pathogenic lineages and guide interventions to reduce their emergence and spread.

Fig. 1 .
Fig. 1.L. monocytogenes clones isolated in Japan are globally distributed.(a) L. monocytogenes isolates (n=868) are shown on a core-genome phylogenetic tree reconstructed using an ML algorithm implemented in RAxML.The rings, from inner to outer, indicate the phylogenetic lineage of isolates (lineage I is green, lineage II is yellow, lineage III is purple, lineage IV is red); the diversity of CCs present within each lineage; the distribution of Japanese isolates across the phylogeny is indicated in black.Bar 0.1 substitutions per site.(b) The frequency of L. monocytogenes isolates by CC.Only complexes with four or more isolates are included.Bars are coloured according to isolation location (Japan is black, the rest of the world is white).

Fig. 3 .
Fig. 3. Biofilm formation is distributed throughout the clonal frame of L. monocytogenes.(a) L. monocytogenes isolates (n=108) from Japan are shown on a core-genome phylogenetic tree reconstructed using an ML algorithm implemented in RAxML.The coloured rings indicate the biofilm-forming ability of isolates at 30 °C (inner) and 37 °C (outer).Isolate Lm0132 has been highlighted using an asterisk.Bar, 0.02 substitutions per site.(b) Scatter plot of isolate biofilm formation at 30 and 37 °C.The blue dotted line indicates the line of best fit.

Fig. 4 .
Fig. 4. Genes with differential expression in transcription analysis are distinct from genes associated with biofilm formation in GWAS.(a) Differential gene expression between planktonic-and biofilm-forming states of Lm0132.The log-fold change in gene expression against its position in the F2365 genome is shown.Red lines denote a significance threshold of 100× change in gene expression.(b) and (c) Manhattan plots of unitigs associated with biofilm formation at 30 and 37 °C, respectively.The significance of each unitig k-mer's association with biofilm formation against its position in the L. monocytogenes F2365 genome is shown.The red line denotes a genome-wide significance threshold −log 10 (P)=6.97(α<0.05Bonferroni corrected with 436 125 unique variants), and the blue line denotes the suggested significance threshold −log 10 (P)=3.97.Bar charts represent the number of genes exceeding significance thresholds with a function described in the KEGG database [76].Dark grey bars represent gene functions (09142: Cell motility, 09145: Cellular community -prokaryotes, 09131: Membrane transport, 09132: Signal transduction, 09123: Folding, sorting and degradation, 99975: Protein processing, 09124: Replication and repair, 09121: Transcription, 09122: Translation, 09175: Drug resistance -antimicrobial, 09171: Infectious disease -bacterial, 09105: Amino acid metabolism, 09110: Biosynthesis of other secondary metabolites, 09101: Carbohydrate metabolism, 09102: Energy metabolism, 09107: Glycan biosynthesis and metabolism, 09103: Lipid metabolism, 09108: Metabolism of cofactors and vitamins, 09106: Metabolism of other amino acids, 09109: Metabolism of terpenoids and polyketides, 09104: Nucleotide metabolism), while light grey bars represent functional classes (09140: Cellular processes, 09130: Environmental information processing, 09120: Genetic information processing, 09160: Human diseases, 09100: Metabolism).

Fig. 5 .
Fig. 5. Biofilm adaptive changes in gene expression do not correlate with mutational changes identified by GWAS.Scatter plots of overlapping hits exceeding the suggestied significance threshold in GWAS and with a log(change) in gene expression >2.GWAS scores were determined as the proportion of the maximal −log(P-value) observed per experiment.RNA-seq scores were determined as the proportion of the maximal log(change) observed.Coloured dots indicate biofilm formation at 30 °C (blue) and 37 °C (red).The blue line indicates the line of best fit and dotted blue lines indicate 95 % confidence intervals.