Draft genome sequence data of methanotrophic Methylovulum psychrotolerans strain S1L and Methylomonas paludis strain S2AM isolated from hypoxic water column layers of boreal lakes

Methanotrophic bacteria inhabit a wide range of natural (e.g. wetlands, lakes and soils) and anthropogenic (e.g. wastewater treatment plants and landfills) environments. They play a crucial role in mitigating atmospheric emissions of the greenhouse gas methane. There is also growing interest in applying methanotrophs in the bioconversion of methane from biogas and natural gas into value-added products (e.g. chemicals and single-cell protein). Hence, isolation and genome sequencing of methanotrophic bacteria are needed to provide data on their functional capabilities. Here, we describe the de novo assembled draft genome sequence of Methylovulum psychrotolerans strain S1L, isolated from a hypoxic water column layer of boreal Lake Lovojärvi (Southern Finland), comprising a total of 5090628 bp in 11 contigs with a G+C content of 50.9% and containing 4554 coding sequences. The draft genome of strain S1L represents the first published genome of an M. psychrotolerans strain isolated from a lake ecosystem. In addition, we present the genome sequence of Methylomonas paludis strain S2AM, isolated from the water column of boreal Lake Alinen Mustajärvi (Southern Finland), comprising 3673651 bp in 1 contig with a G+C content of 48.2% and 3294 coding sequences. The draft genome of strain S2AM represents the first published genome of M. paludis. Preliminary genome annotation of both S1L and S2AM identified genes encoding oxidation of methane, methanol, formaldehyde and formate, assimilation of carbon, ammonium and nitrate, and N2 fixation, as well as various enzymes enabling survival in hypoxic conditions, i.e. high-affinity oxidase, hemerythrins, fermentation enzymes (for production of acetate, succinate and H2) and respiratory nitrite reductases. The draft genomes have been deposited at GenBank under the accessions JAGVVN000000000 for S1L and CP073754 for S2AM.



Value of the Data
• Draft genome sequences of Methylovulum psychrotolerans strain S1L and Methylomonas paludis strain S2AM provide fundamental knowledge on the functional potential of methanotrophs mitigating methane emissions from natural freshwater ecosystems and give insights into their biotechnological applicability.
• These data are beneficial for researchers in biogeochemistry, environmental microbiology, biotechnology and circular economy.
• These data can be used in predicting the function of methanotrophs in natural ecosystems under varying physicochemical conditions as well as in developing methane-based bioproduct platforms.

Data Description
Methanotrophic bacteria are widely distributed in natural (wetlands, lakes, oceans, soils) and anthropogenic (wastewater treatment plants, landfills) methane-producing ecosystems [1,2]. Two methanotrophic strains, S1L and S2AM, were isolated from hypoxic water column layers of boreal Lake Lovojärvi and Lake Alinen Mustajärvi, respectively. Based on 16S rRNA gene sequencing and phylogenetic tree analysis, we classified S1L and S2AM as representatives of Methylovulum psychrotolerans and Methylomonas paludis, respectively (see the 16S rRNA gene-based phylogenetic tree in Fig. 1A). We chose strains S1L and S2AM for draft genome sequencing to identify their functional potential because 1) methanotrophs are important mitigators of atmospheric methane emissions, 2) methanotrophs provide platforms for bioconversion of methane from biogas and natural gas to value-added bioproducts [1,3], and 3) there are no previously published genomes of M. paludis, and for M. psychrotolerans no genomes of lake strains exist, as the previously published genomes are from strains Sph1 and HV10-M2, isolated from cold methane seep and soil ecosystems, respectively.
The full statistics of the de novo assemblies and the genome characteristics of strains S1L and S2AM are reported in Table 1. The draft genome of S1L consisted of 11 contigs with a total length of 5090628 bp, a G+C content of 50.9%, 4554 coding sequences, 9 rRNA genes and 46 tRNA genes. In accordance with the 16S rRNA gene-based analyses (Fig. 1A), the phylogenomic tree analysis (Fig. 1B) and average nucleotide identities above the 95% identity level used for species-level delineation, i.e. 97.4% to M. psychrotolerans Sph1 (type strain) and 97.3% to M. psychrotolerans HV10-M2, confirmed that strain S1L is a representative of M. psychrotolerans.
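The species-level comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `same_species` and the use of `None` for a genome pair without a reported value are hypothetical conventions, not part of the authors' workflow; the 95% threshold and the two ANI values are taken from the text.

```python
from typing import Dict, Optional

# Species-level ANI threshold (%) as cited in the text.
SPECIES_ANI_THRESHOLD = 95.0


def same_species(ani_percent: Optional[float],
                 threshold: float = SPECIES_ANI_THRESHOLD) -> bool:
    """Return True when an ANI value exceeds the species-level threshold.

    `None` is a hypothetical placeholder for a genome pair for which
    the ANI tool reported no value; it is treated as "not the same species".
    """
    if ani_percent is None:
        return False
    return ani_percent > threshold


# ANI values of strain S1L against the two reference genomes (from the text).
ani_to_s1l: Dict[str, float] = {
    "M. psychrotolerans Sph1": 97.4,
    "M. psychrotolerans HV10-M2": 97.3,
}

calls = {ref: same_species(value) for ref, value in ani_to_s1l.items()}

for ref, call in calls.items():
    print(f"{ref}: ANI {ani_to_s1l[ref]}% -> same species: {call}")
```

Both reference genomes fall above the threshold, which is consistent with assigning S1L to M. psychrotolerans.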
The draft genome of S2AM consisted of 1 contig with a length of 3673651 bp, a G+C content of 48.2%, 3294 coding sequences, 9 rRNA genes and 44 tRNA genes (Table 1). The deduced amino acid sequences of the database-deposited genes of the type strain M. paludis MG30(T) (DSM 24973) [4], i.e. pmoA (CCH22593.1), mxaF (CCH22594.1) and nifH (CCH22595.1), were 100% identical to the deduced amino acid sequences of the respective genes in the genome of strain S2AM, which confirms the species-level classification of strain S2AM obtained from the 16S rRNA gene analyses (Fig. 1A). FastANI reported no average nucleotide identity values between the genome of S2AM and the genomes of other Methylomonas strains, indicating that all were < 80% [5]. The genome of S2AM also formed a separate species-level branch in the phylogenomic tree (Fig. 1B).

Table 1. Statistics of de novo assemblies and genome characteristics of strains S1L and S2AM.

Both S1L and S2AM encoded the particulate methane monooxygenase operon (pmoCAB) for conversion of methane to methanol. In addition, both strains encoded the pxm operon (pxmABC), i.e. a copper membrane monooxygenase of unknown function [6], while genes coding for the soluble methane monooxygenase (mmoXYZBCD) were not found in either strain. For the conversion of methanol to formaldehyde, the strains encoded both calcium-dependent (mxaFJGIACKLD) and lanthanide-dependent (xoxF) methanol dehydrogenases. Genes involved in tetrahydromethanopterin (H4MPT)-linked and tetrahydrofolate (H4-folate)-linked C1 transfer (i.e. formaldehyde oxidation), as well as in formate oxidation, were also identified in both strains. Genes for the RuMP pathway for carbon (formaldehyde) assimilation and for the oxidative TCA cycle were also present in both strains.
The genomes of both strains also included genes encoding nitrogen fixation (nitrogenase, nifDKH) and assimilation of ammonium (ammonium transporters, alanine dehydrogenase, glutamine synthetase/glutamate synthase) and nitrate (nitrate/nitrite transporters, nitrate reductase nasA and nitrite reductase nirBD). In accordance with their origin in hypoxic lake waters, both strains carried genes encoding enzymes that enable survival in hypoxic conditions, i.e. high-affinity oxidase (cytochrome bd-I ubiquinol oxidase), oxygen-binding hemerythrins and nirS-type dissimilatory nitrite reductase, as well as various fermentation enzymes, i.e. succinate dehydrogenase (sdhABCD) for conversion of fumarate to succinate, acetate kinase (ackA) for conversion of acetyl phosphate to acetate and hydrogen dehydrogenase (hoxFGHY) for conversion of H+ to H2 [7-9].

Isolation of strains and DNA extraction
Strains S1L and S2AM were isolated from water samples collected using a 2 dm3 Limnos water sampler (length 30 cm) (Limnos Ltd., Turku, Finland) at 3.25 m depth in Lake Lovojärvi (61°04′N, 25°02′E; area = 0.054 km2, max. depth = 17.5 m) and at 4.5 m depth in Lake Alinen Mustajärvi (61°12′N, 25°06′E; area = 0.007 km2, max. depth = 6.5 m), both located in Southern Finland, on 3 September 2019. The lakes were sampled at their deepest points, where the water was also vertically stratified with respect to O2 and temperature (T), as determined using a YSI ProODO (optical dissolved oxygen) field meter (Yellow Springs Instruments, Yellow Springs, OH, USA). At the time of sampling, T, O2 and pH of the sampling layer were 12 °C, 12 μmol L−1 and 6.83, respectively, in Lake Lovojärvi (strain S1L), and 6.3 °C, 10 μmol L−1 and 5.76, respectively, in Lake Alinen Mustajärvi (strain S2AM). Hence, the strains were isolated from hypoxic water layers.
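For readers more used to dissolved oxygen in mg L−1, the reported concentrations can be converted with the molar mass of O2. The sketch below is illustrative only: the 2 mg L−1 cutoff is a commonly used rule of thumb for hypoxia and is an assumption here, not a threshold stated by the authors, and the function name is hypothetical.

```python
# Molar mass of O2 in g/mol.
O2_MOLAR_MASS_G_PER_MOL = 31.998

# Assumed rule-of-thumb hypoxia cutoff in mg/L (not taken from the article).
ASSUMED_HYPOXIA_CUTOFF_MG_L = 2.0


def umol_per_l_to_mg_per_l(conc_umol_l: float) -> float:
    """Convert an O2 concentration from umol/L to mg/L.

    umol/L multiplied by g/mol gives ug/L; dividing by 1000 gives mg/L.
    """
    return conc_umol_l * O2_MOLAR_MASS_G_PER_MOL / 1000.0


# O2 concentrations of the sampling layers as reported in the text (umol/L).
o2_umol_l = {"S1L": 12.0, "S2AM": 10.0}

o2_mg_l = {strain: umol_per_l_to_mg_per_l(c) for strain, c in o2_umol_l.items()}

for strain, mg in o2_mg_l.items():
    below = mg < ASSUMED_HYPOXIA_CUTOFF_MG_L
    print(f"{strain}: {mg:.2f} mg/L O2, below assumed cutoff: {below}")
```

Both sampling layers come out at well under 1 mg L−1, far below the assumed cutoff.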
Strain S1L was enriched and purified using nitrate mineral salts (NMS) medium (DSMZ 921 medium; initial pH ∼6.80) supplemented with 1 μM lanthanum (LaCl3). A 1/10 dilution of NMS medium (dNMS) with 0.1 μM lanthanum was used for enrichment and purification of strain S2AM. Initially, lake water samples were incubated in serum bottles containing NMS or dNMS liquid medium with 20% methane:80% air in the headspace. After three or four rounds of subculturing in liquid medium, the enriched lake cultures were streaked on NMS or dNMS agar medium (1.5% noble agar) and incubated in an air-tight chamber with 20% methane:80% air in the headspace for 14-30 days. Colonies were observed under a stereo microscope, then picked and re-streaked until a single colony type was obtained. Culture purity was determined using a light microscope and by confirming the absence of growth in nutrient-rich medium and in NMS medium without CH4.
Genomic DNA was extracted using the GeneJET Genomic DNA Purification Kit and quantified using a Qubit 2.0 Fluorometer and a dsDNA HS Assay Kit (Thermo Fisher Scientific, Waltham, MA, USA). Using the identification service offered by Macrogen (Amsterdam, Netherlands), the 16S rRNA genes of S1L and S2AM were amplified from the DNA using primers 27F (AGAGTTTGATCMTGGCTCAG) and 1492R (TACGGYTACCTTGTTACGACTT) and sequenced with primers 785F (GGATTAGATACCCTGGTA) and 907R (CCGTCAATTCMTTTRAGTTT). The 16S rRNA gene-based phylogenetic tree was constructed using the maximum likelihood algorithm (generalized time reversible model) with 100 bootstrap replicates in MEGA X [10].
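Three of the four primers above contain IUPAC degenerate bases (M, Y, R), so each represents a small pool of exact sequences. The following sketch expands each primer into that pool; it assumes the standard IUPAC nucleotide codes, and the helper name `expand_primer` is hypothetical.

```python
from itertools import product
from typing import Dict, List

# Standard IUPAC nucleotide codes mapped to the bases they represent.
IUPAC: Dict[str, str] = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Primer sequences as given in the text.
PRIMERS: Dict[str, str] = {
    "27F": "AGAGTTTGATCMTGGCTCAG",
    "1492R": "TACGGYTACCTTGTTACGACTT",
    "785F": "GGATTAGATACCCTGGTA",
    "907R": "CCGTCAATTCMTTTRAGTTT",
}


def expand_primer(primer: str) -> List[str]:
    """Return every exact sequence represented by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(bases) for bases in product(*choices)]


variants = {name: expand_primer(seq) for name, seq in PRIMERS.items()}

for name, seqs in variants.items():
    print(f"{name}: {len(seqs)} variant(s)")
    for seq in seqs:
        print(f"  {seq}")
```

A primer with one two-fold degenerate position expands to two sequences, and 907R, with two such positions, expands to four.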

Genome sequencing and analysis
Library preparation and sequencing for long and short reads were done by Novogene Co. Ltd. (Beijing, China). For long reads, the 10 kb SMRTbell library was prepared using the SMRTbell Template Prep Kit 1.0-SPv3 (catalog number 100-991-900) (Pacific Biosciences, Menlo Park, CA, USA). In library preparation, the qualified high-molecular-weight DNA was fragmented to approx. 10 kb, followed by damage repair, end repair and adapter ligation. Afterwards, size selection was performed using a size-selection system. The SMRTbell-polymerase complex was prepared using the Sequel Binding Kit 2.0 and sequenced on a Sequel SMRT Cell 1M v2 (Pacific Biosciences). Short-read libraries were prepared using the NEBNext® Ultra™ DNA Library Prep Kit for Illumina® (catalog number E7370L) (insert size 350 bp) (New England Biolabs, Ipswich, MA, USA), and sequencing was done on the Illumina NovaSeq 6000 platform (paired-end 150 bp) (Illumina, San Diego, CA, USA).