Bacterial distribution in Twilight zone of the Indian sector of Southern Ocean: V3- V4 rDNA hypervariable region data

Twilight zones in oceans represent the oceanic waters between 200 m to 1000 m in depth, wherein sunlight is diffused and intensity is <1% of surface value. The activities and diversity of marine micro-organisms in this unique zone are understudied, especially in the Indian Sector of the Southern Ocean. For a better understanding of the microbial environment and diversity in the twilight zone of the Indian sector of Southern Ocean, samples were collected from 200m depth in eddy-influenced waters of Subtropical Front (STF), Sub-Antarctic Front (SAF), Polar Front (PF), waters off Kerguelen (Kw), and Prydz Bay (Pb) waters. In this article, next-generation sequencing (NGS) based amplicon data of 16s rDNA bacterial samples are presented. Hypervariable V3-V4 regions were sequenced using Hiseq platform, and data was processed using Mothur v 1.48.0, and database Silva 138.1nr. Total of nine different phyla is reported from the Southern Ocean at 200m, whereas at order level Synechococcales was found in STF waters only and SAR 11_ Clades were present in all stations.


a b s t r a c t
Twilight zones in oceans represent the oceanic waters between 200 m to 1000 m in depth, wherein sunlight is diffused and intensity is < 1% of surface value. The activities and diversity of marine micro-organisms in this unique zone are understudied, especially in the Indian Sector of the Southern Ocean. For a better understanding of the microbial environment and diversity in the twilight zone of the Indian sector of Southern Ocean, samples were collected from 200m depth in eddy-influenced waters of Subtropical Front (STF), Sub-Antarctic Front (SAF), Polar Front (PF), waters off Kerguelen (Kw), and Prydz Bay (Pb) waters. In this article, nextgeneration sequencing (NGS) based amplicon data of 16s rDNA bacterial samples are presented. Hypervariable V3-V4 regions were sequenced using Hiseq platform, and data was processed using Mothur v 1.48.0, and database Silva 138.1nr. Total of nine different phyla is reported from the Southern Ocean at 200m, whereas at order level Synechococcales was found in STF waters only and SAR 11_ Clades were present in all stations.

Value of the Data
• First set of metagenomic data from the twilight waters of ISO. This set of data from an under-reported oceanic regime shall add to the global databases like BacDive, Planet Microbe, etc.. • Twilight zone is a hotspot for microbial activity and diversity; reported dataset can be used in linking microbial processes (enzymatic and biogeochemical processes) to diversity. • ISO serves as source and sink of atmospheric CO 2 and other greenhouse gases; this dataset may be used in combination with proteomics to evaluate the microbial processes in these waters. • This dataset can be a link to understand microbial community structure in other twilight zones of other oceans.

Objective
Southern Ocean is an environmentally isolated biome but has a large diversity and biomass of micro and macroplankton communities. The Southern Ocean is circumpolar, without any continental barriers and is linked with the Pacific, Atlantic and Indian Oceans. The strong meridional temperature and salinity gradients make it unique from the other oceans. Such stratified water masses serve as a hotspot for diverse bacterial communities including both phototrophic and heterotrophic microbes. High bacterial diversity was observed at 200 m depth in the Equatorial Indian Ocean [1] . The Southern Ocean plays a major role in flux of atmospheric carbon dioxide, organic matter, and sinking water masses [2][3] . A strong thermocline and resultant stratification at 200 m (twilight zone) retain particulate organic matter and plankton biomass in the Indian Sector of Southern Ocean [4] .
Here we report the metagenomic dataset from samples collected from 200 m depth in the Indian sector of Southern Ocean. Amplicon bacterial diversity was performed using HiSeq method followed by bioinformatics analysis using Mothur v 1.48.0 and Silva 138.1 nr database.

Data Description
The samples collected from seven different stations in Indian Sector of the Southern Ocean ( Fig. 1 ) were treated for DNA extraction and the hypervariable regions V3-V4 of 16s rDNA were  Table 1 ). sequenced using HiSeq sequencing technique. The sequences were then subjected to bioinformatics processing using Mothur v 1.48.0 [5] . Taxonomic classification was carried out using Silva 138.1 nr database. Further downstream processing of data was generated using MicrobiomeAnalyst [6] .

Experimental Design, Materials and Methods
Seven different sampling locations were selected based on hydrography and nutrients variability ( Table 1 ). Sampling was carried out onboard SA Agulhas in austral summer, during Indian Southern Ocean expedition 9 (ISOE-09; 2016-2017). Seawater samples were collected using a Niskin bottle (10 L sampler) (Seabird Inc., USA) attached to CTD rosette equipped with Sea bird CTD system (SBE911 plus, Sea-Bird Electronics, USA) [7]. Five litres (5 L) of water samples were filtered through 0.22-μm pore size, 47 mm diameter polycarbonate filters (Merck Millipore, USA) for bacterial diversity analysis. The filters were sealed in sterile tubes, stored at -80 °C and transported to the laboratory for DNA extraction [7] .
DNA Isolation and sequencing -Total DNA was extracted from 0.22 μm pore-size polycarbonate membrane filter using Power Water DNA kit (MoBio; USA). The hypervariable region (V3-V4) of bacterial 16S rDNA was amplified using primer 341 F: 5' CCTACGGGAGGCAGCAG 3' and 806 R: 5 GGACTACHVGGGTTCTAAT 3' [1] with Hiseq Rapid V2 Kit for 2 * 250 base pair (bp) sequence. DNA sequencing was outsourced to M/s Agrigenome Laboratory Ltd. (India). The raw data of the V3-V4 sequence were deposited NCBI database. Quality scores and CG base were checked and processed for down streaming bioinformatics analysis ( Table 1 ).
Bioinformatics and statistics analysis-DNA sequences were processed downstream using Mothur v-1.48.0 (Log file attached as supplementary 1). Contigs were prepared for total 2292979 reads and were trimmed, around 1739901 sequences were removed using screen.seqs command. Thereafter, 553078 sequences were selected and further 417049 sequences were selected as unique sequences. These selected unique sequences were further aligned using Silva.nr_v138.1 database. The sequential downstream processing was carried out and a total of 250842 sequences were obtained and subsampled with reference to the smallest group of 3309 sequences. Chloroplast, mitochondria, Eukaryota, Archaea and unknown samples were removed using remove.lineage command and further shared and taxonomy files were created with cutoff value of 0.03. Grapher 10, Microbiomeanalyst and R software was used for downstream processing and final data preparation.

Ethics Statements
This article does not contain any studies with human participation or animal performed by any of the authors.

Credit Author Statement
Alok K. Sinha -Sample Collection, Methodology, Data curation, Writing original draft preparation; Bhaskar V. Parli -Visualization, Supervision and Editing; N. Anilkumar -Supervision and Project Leader.

Funding Support
This work was carried out under SOCarP project funded by Ministry of Earth Sciences (MoES), Government of India (GoI).

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data Availability
Southern Ocean Deep Watesr Sample (Original data) (NCBI)