Metagenomic insights into the abundance and composition of resistance genes in aquatic environments

A global survey was performed with 122 aquatic metagenomic DNA datasets (92 lake water and 30 seawater) obtained from the Sequence Read Archive (SRA). Antibiotic resistance genes (ARGs) and metal resistance genes (MRGs) were derived from the dataset sequences via bioinformatic analysis. The relative abundances of ARGs and MRGs in lake samples were in the ranges ND (not detected) – 1.34×10 0 and 1.22×10 − 3 – 1.98×10 − 1 copies per 16S rRNA, which were higher than those in seawater samples. Among ARGs, multidrug resistance genes and bacitracin resistance genes had high relative abundances in both lake and sea water samples. Multi-metal resistance genes, mercury resistance genes and copper resistance genes had the greatest relative abun- dance for MRGs. No signi ﬁ cant di ﬀ erence was found between epilimnion and hypolimnion in abundance or the Shannon diversity index for ARGs and MRGs. Principal coordinates analysis and permutational multivariate analysis of variance (PERMANOVA) test showed that strati ﬁ cation and geography had signi ﬁ cant in ﬂ uence on the composition of ARGs and MRGs in lakes (p < 0.05, PERMANOVA). Coastal seawater samples had sig- ni ﬁ cantly greater relative abundance and a higher Shannon index for both ARGs and MRGs than deep ocean and Antarctic seawater samples (p < 0.05, Kruskal-Wallis one-way ANOVA), suggesting that human activity may exert more selective pressure on ARGs and MRGs in coastal areas than those in deep ocean and Antarctic seawater.


Introduction
Antibiotic resistance genes (ARGs) and metal resistance genes have ancient origin (Jackson and Dugas, 2003;Wright and Poinar, 2012), but human activities accelerate the propagation and dissemination of ARGs and MRGs (Gillings, 2018;Li et al., 2017). Aquatic environments which were easily influenced by human activities and regarded as ideal mediums for the aggregation and dissemination of ARGs and MRGs (Marti et al., 2014;Pal et al., 2015). Lakes and oceans are two aquatic environments in which ARGs and MRGs ultimately collect. ARG pollution has been well documented in lakes (Bengtsson-Palme et al., 2014;Czekalski et al., 2014;Czekalski et al., 2015;Yang et al., 2017a;Yang et al., 2017b) and seas (Chen et al., 2013;Hao et al., 2018;Hatosy and Martiny, 2015;Zhang et al., 2018). MRGs have also been investigated in microbiota from lake (Edwardson and Hollibaugh, 2017) and seas (Chauhan et al., 2017;Yang et al., 2019a). ARGs and MRGs could be co-selected or cross-selected in the environment. Analysis of 5436 complete bacterial genomes showed that co-existent ARGs and MRGs occur widely in environmental and clinical bacteria (Luo et al., 2017). Plasmids containing both MRGs and ARGs were more likely to be conjugative and transferred between different bacterial taxa (Pal et al., 2015). mobile genetic elements (MGEs) is responsible for the co-selection of ARGs as a consequence of heavy metal pollution (Di Cesare et al., 2016;Imran et al., 2019). Concentrations of antibiotics in aquatic environments are typically below levels likely to impact microorganisms (Kolpin et al., 2002), while metal contamination is ubiquitous and is still rapidly increasing (Huang et al., 2019), so heavy metals could facilitates the proliferation of MRGs (Roosa et al., 2014) and accelerates the propagation of ARGs through co-selection of ARGs with MRGs (Baker-Austin et al., 2006;Di Cesare et al., 2016;Seiler and Berendonk, 2012).
Studies that have looked at ARGs and MRGs in natural lakes or seas have usually been performed at a local scale. There was no collation about ARGs and MRGs from metagenomic data of lakes and seas to obtain global diversity, biogeography and co-occurrence pattern of ARGs and MRGs in lakes and seas. In this study, we obtained metagenomic data from lakes and seas worldwide that was held in the Sequence Read Archive (SRA, https://www.ncbi.nlm.nih.gov/sra). The metagenomic data of lake water from the SRA could be classified by thermal stratification and geological sites. The metagenomic data for sea waters were classified into coastal, deep ocean, and Antarctic groups, to indicated the effect of human influence. Hence, the aims of this research were to: (1) identify the effect of thermal stratification and geography on the abundance and composition of ARGs and MRGs in lakes; (2) analyze the co-occurrence pattern of ARGs and MRGs in global lakes and seas; (3) investigate the distribution of ARGs in seawater of coastal, deep ocean, and Antarctic groups to speculate the effect of human influence on the abundance and diversity of ARGs and MRGs in seas.

Data collection and pre-processing
Metagenomic data sampled from lakes and seas worldwide were retrieved from the Sequence Read Archive (SRA) (https://www.ncbi. nlm.nih.gov/sra) database on the NCBI website using the search terms lake AND metagenome and sea AND metagenome on 2018-06-29. The data obtained were then refined using the criteria: (1) if a lake or sea site had metagenomic data for several years, data within the latest year were selected; (2) if there were datasets replicates, sequences having the greatest length were selected. These selections resulted in a complete dataset comprising 122 metagenomic sequence objects from lakes (n = 92) and seas (n = 30) (Supplementary materials, Table S1) that was used to investigate the diversity and distribution of ARGs and MRGs. Of the lake datasets, 37 items were obtained from epilimnion samples, 13 were from hypolimnion samples, and 8 were from metalimnion samples, with the remaining 34 datasets with no thermal stratification information. The lake data collected from 8 geographical regions, including Rwanda, Finland, USA, Canada, Brazil, Switzerland, Russia, Malawi and Antarctic (Fig. S1, Supplementary materials). The seawater datasets included 8 items from the deep ocean, 15 from coastal waters, and 7 from the seas around Antarctica. The raw data were downloaded and processed as follows : (1) PRINSEQ was used to check the data quality and filter the data (prinseq-lite.pl with parameter settings mean quality score ≥ 20 and number of ambiguous ≤ 1); (2) only sequences with length > 100 bp were included. The included sequences were trimmed to 100 bp to eliminate inconsistencies in sequences and reduce the bias caused by sequencing. In the final lake and ocean datasets used in the study, the number of sequences across all samples ranged from 211,168 to 403,273,514 sequences with an average value of 85,100,945 sequences.

Analysis of ARGs and MRGs
The metagenomic data was processed against the Structured ARG (SARG) database (containing 23 ARG classes and 1277 ARG subtypes) through the ARGs-OAP online pipeline (http://smile.hku.hk/SARGs) following the published procedure Yin et al., 2018) to identify ARGs. The metagenomic data were processed against the BacMet reference dataset (http://bacmet.biomedicine.gu.se/index. html, containing 23 MRG classes) to identify MRGs. The UBLAST algorithm, using Perl script supplied by the platform, was run to prescreen ARG-like, MRG-like and 16S rRNA gene sequences. The candidate ARG and MRG sequences were matched against the SARG and Bacmet databases using BLASTX. The sequences that were found which met the BLASTX criteria (alignment length 25 aa, similarity 80%, and evalue 1e −5 ) were classified according to the SARG and MRG hierarchy (Buchfink et al., 2014;Kristiansson et al., 2011). For example, 'tetracycline resistance genes' is one example of an 'ARG type', whereas 'tetA' is one of the tetracycline resistance subtypes. MRG hierarchy was also the same as ARG. ARG and MRG abundances (copies of ARGs or MRGs per copy of 16S rRNA) in the metagenomic data were calculated and recorded.

Statistical analysis
Kruskal-Wallis one-way analysis of variance on ranks was used to compare the differences in relative abundance and Shannon index of ARGs and MRGs in lakes and seas because the data failed the normality test. The R (3.4.1) vegan package was used to calculate the Shannon index and perform principal coordinates analysis (PCoA) based on the ARG and MRG subtypes (Oksanen et al., 2013). Permutational multivariate analysis of variance (PERMANOVA) was performed to analyze the effect of stratification and geography on the composition of ARGs and MRGs via adonis2 function in vegan package. The R VennDiagram package was used to analyze and graph ARGs and MRGs (Chen and Boutros, 2011). The R Hmisc package was used to perform Spearman analysis of ARGs and MRGs in lakes and seas (Harrell and Dupont, 2012), and Gephi (0.9.1; Gephi, WebAtlas, France) was used to create a network representation of the results (Bastian et al., 2009). In the network, the observed (O%) and random incidences (R%) of co-occurrence pattern of ARG and MRG were statistically analyzed and the ratio of O% to R% (O/R ratio) was used as a threshold to determine non-random co-occurrence patterns (Ju et al., 2016;Ju and Zhang, 2015). Pearson analysis of Gross domestic product (GDP) per million people (Table S2) and geometric mean of ARG abundances in lakes from each country were carried out.

ARG and MRG abundances in water from lakes and seas
The relative abundance of ARGs retrieved from 122 metagenomic data is shown in Fig. 1a and b. The relative abundance of ARGs in lakes was in the range ND (not detected)-1.34 copies per 16S rRNA, with an average value of 7.25 × 10 −2 copies per 16S rRNA. The relative abundance of ARGs in seas was in the range ND-1.40 × 10 −1 copies per 16S rRNA with an average value of 2.70 × 10 −2 copies per 16S rRNA. The median relative abundances were significantly higher in lakes than in seas (Kruskal-Wallis one-way ANOVA; p < 0.05). The average relative abundances for epilimnion, hypolimnion, and metalimnion were 2.70 × 10 −2 , 8.66 × 10 −2 , and 7.41 × 10 −2 copies per 16S rRNA. There was no significant difference in the median values of relative abundance of ARGs between epilimnion, hypolimnion, and metalimnion (Kruskal-Wallis one-way ANOVA; p > 0.05). Of the lake metagenomic datasets, 18 data were obtained from European lakes and 65 were obtained from North American lakes. There was no significant difference between European and North American lakes in relative abundance of ARGs (Fig. S2a, Kruskal-Wallis one-way ANOVA; p < 0.05). Median values of relative abundance of ARGs in coastal waters were significantly higher than those deep oceans (Kruskal-Wallis one-way ANOVA; p < 0.05). Twenty-two ARG types were identified in lakes (Fig. 2a), and 19 ARG types were found in seas (Fig. 2b). Bacitracin resistance genes and multidrug resistance genes were the two main ARG types with highest relative abundances in lakes. Multidrug resistance genes had the highest relative abundance in seas, followed by unclassified resistance genes and bacitracin resistance genes.
The relative abundance of MRGs identified from 122 metagenomic data are shown in Fig. 1c and d. The relative abundances of MRGs in lakes were in the range 1.22 × 10 −3 -1.98 × 10 −1 copies per 16S rRNA with an average value of 3.03 × 10 −2 copies per 16S rRNA. The relative abundances of ARGs in seas were in the range ND-1.78 × 10 −1 copies per 16S rRNA with an average value of 2.50 × 10 −2 copies per 16S rRNA. There was a statistically significant difference in the median values of the relative abundance of MRGs between lakes and seas (Kruskal-Wallis one-way ANOVA; p < 0.05). The median values of relative abundances in epilimnion and hypolimnion were significantly higher than those in metalimnion (Kruskal-Wallis one-way ANOVA; p < 0.05). There was no significant difference in relative abundance of ARGs between European and North American lakes (Fig. S2b, Kruskal-Wallis one-way ANOVA; p < 0.05). Fifteen MRG types were identified from lake metagenomic data, and 14 MRG types were identified from the sea data. Multi-metal resistance genes, copper resistance genes, mercury resistance genes, and arsenic resistance genes had high relative abundances in lakes (Fig. 2c). Multi-metal resistance genes, copper resistance genes and mercury resistance genes were the principal MRGs in seas having high relative abundances (Fig. 2d).

Diversity of ARGs and MRGs in lakes and seas
Four hundred and forty ARG subtypes were found in the 92 lake water metagenomic data, and 340 ARG subtypes were found in the 30 sea data. The Venn diagram shows that 250 ARG subtypes were found in both lakes and seas (Fig. S3a). Two hundred and seventy-one, 137 and 184 ARG subtypes were found in epilimnion, hypolimnion and metalimnion. Ninety-nine ARG subtypes, in 14 ARG types, were found in all lake waters ( Fig. 3a and Table S3). Three hundred and nine, 80 and 67 ARG subtypes were found in coastal, deep ocean and Antarctic sea waters. Nine ARG types, containing 38 subtypes, were found in all sea waters (Fig. 3b and Table S4).
Metagenomic data for lakes and seas contained 218 and 166 MRG subtypes. Venn diagram analysis shows that 159 MRG subtypes were found in both lakes and seas (Fig. S3b). There were 156, 108 and 113 MRG subtypes in epilimnion, hypolimnion and metalimnion (Fig. 3c). Of these, 12 MRG types, containing 84 MRG subtypes, were found in every lake stratum ( Fig. 3c and Table S5). There were 145, 65 and 93 MRG subtypes in data for coastal, deep ocean and Antarctic waters. Nine MRG types, containing 49 MRG subtypes, were found in all sea waters ( Fig. 3d and Table S6).
The Shannon indexes of ARGs in lakes and seas were in the ranges Y. Yang, et al. Environment International 127 (2019) 371-380 0-3.31 and 0-3.91 ( Fig. 4a and b), but there were no significant differences between them. No significant differences were found in the Shannon indexes of ARGs for epilimnion, hypolimnion and metalimnion. The Shannon index of ARGs in coastal sea waters was higher than for Antarctic seas and deep ocean waters. The median values of the Shannon indexes of MRGs in lakes were 3.57, which were higher than those in seas (3.25) (p < 0.05, Kruskal-Wallis one-way ANOVA; Fig. 4). No significant differences were found in the Shannon indexes of MRGs for epilimnion, hypolimnion and metalimnion (p > 0.05, Kruskal-Wallis one-way ANOVA; Fig. 4c). The Shannon index of MRGs in coastal sea waters was higher than those of Antarctic seas and deep oceans (p > 0.05, Kruskal-Wallis one-way ANOVA; Fig. 4d). PCoA of ARGs in lakes showed most water samples in Canada were placed in the right region of the ordination, while most water samples in USA were placed in the left region (Fig. 5a). Most epilimnion samples (red) were placed in the upper region of the ordination, while hypolimnion samples (green) were placed in the down region of the ordination (Fig. 5a). PERMANOVA analysis confirmed that the effect of stratification and geographical region on the composition were statistically significant (p < 0.05). This indicated that ARG composition in lakes exhibited characteristics of vertical and biogeographical distribution. Cluster analysis also showed the hypolimnion and epilimnion samples were grouped separately. Samples from the same geographical region were usually clustered together (Fig. S4). Fig. 5b shows that for the spatial distribution of MRGs in lakes, water samples from the different regions were clustered separately. Cluster analysis also confirmed this result (Fig. S5). Hypolimnion and epilimnion samples in Finland lakes were also grouped separately. PERMANOVA analysis indicated that the geographical region had significant difference on the composition of MRGs in lakes (p < 0.05). Fig. 6a describes the spatial pattern of ARGs in seawater, indicating that most water samples from deep oceans were clustered in the upper left corner and the coastal water samples were scattered in the ordination. The cluster analysis confirmed that most deep ocean water samples were clustered in Group III. Costal water samples were scattered in Group I, II and IV (Fig. S6). It can be seen that most deep ocean samples were grouped together in terms of MRGs. No clear pattern was found for coastal water samples or Antarctic seas (Fig. 6b). The cluster analysis confirmed that coastal and Antarctic water samples were scattered in all the classified groups, while most deep ocean samples were clustered in Group I (Fig. S7).

Discussion
Understanding the distribution and risk of ARGs should not be geographically restricted but rather take on a global perspective (Larsson et al., 2018), so we selected 92 (lakes) and 30 (oceans) metagenomic data from the SRA collection in order to determine the abundance, diversity and geographical distribution of ARGs and MRGs in the lakes and seas at a global level. Relative abundances of ARGs in lakes and seas were in the ranges ND-1.34 × 10 0 and ND-1.40 × 10 −1 copies per 16S rRNA. The abundances of resistance genes in lakes and seas almost falls into the range of ARG abundance previously reported in ten typical environments (sewage, swine wastewater samples, treated wastewater, river water, drinking water, soils, sediments, wastewater biofilm, sludge samples, and fecal samples), which was obtained by metagenomic analysis and is 3.2 × 10 −3 -3.1 × 10 0 copies per 16S rRNA gene (B. Li et al., 2015). The lakes are easily influenced by human activities, which may be one of the main drivers shaping the ARGs in lakes . Their surroundings could be human residential areas (e.g., Lake Mendota, an urban lake in Madison, Wisconsin, USA), or they may be in scenic areas (e.g., Lake Zug, a scenic lake in Switzerland). For example, 17,883 and 18,251 of cattle and pigs were around the Lake Zug, and 9.0% of the surround area was the urban area (Czekalski et al., 2015). The GDP per million people were significantly correlated with the geometric mean of ARGs and MRGs in lakes for each country via Pearson analysis at p < 0.05 level and p < 0.1 level, respectively (Fig. S8). GDP and population were also studied as human activities, which has also been found to outweigh the climatic contribution on the algal blooms in lake (Duan et al., 2009). These results indicated that human activities played an important role in shaping ARG distribution in lakes worldwide. Anthropogenic factors (including antibiotic concentration and socio-economic parameters) rather than environmental factors and bacterial community were the main driver on ARG structures in Chinese estuary (Zhu et al., 2017). Metagenomic analysis of fecal samples, human gut samples, pig gut samples, seawater samples and soil samples also confirmed that anthropogenic selection might be one of the main drivers shaping the ARG distribution (Zeng et al., 2019). Deep ocean sediment was considered as pristine sites, which had lower abundance and diversity of ARGs compared to estuary sediment (Chen et al., 2013). The coastal area has been experiencing the pressure of urbanization and threaten by the tourism and leisure activities. For the coastal data collected from Florida (USA) and Weihai (China), which are both affected by the urban area and tourism activities. The deep ocean and Antarctic waters were seldom Y. Yang, et al. Environment International 127 (2019) 371-380 affected by human activities. Antarctic is still regarded as one of the most pristine regions in earth, geographically isolated from other continents (Bhardwaj et al., 2018). ARG and MRG abundances in samples from coastal waters were greater than in deep ocean and Antarctic water samples. The Shannon indexes for ARGs and MRGs in the coastal water samples were also greater than in the deep ocean and Antarctic samples. These results suggest that human activities may exert more influence on ARGs and MRGs in coastal seas compared with deep ocean   and Antarctic water samples. There were high relative abundances of bacitracin resistance genes and multidrug resistance genes in lake water and seawater (Fig. 2). Similar results have been found using Roche 454 datasets, in that bacitracin resistance and multidrug resistance were the two main ARGs classes observed in human feces, soils and oceans (Nesme et al., 2014). Among bacitracin resistance genes, bacA was the main subtype detected in the lake water and seawater (Tables S3 and  S4) with very high relative abundance. bacA has been recently renamed as UppP, an abbreviation of undecaprenyl pyrophosphate phospha-tase21, which may be bacterial intrinsic because its homologues can be found in 153 genera (Antibiotic Resistance Genes Database) (Hu et al., 2013). Among the multidrug resistance genes, multidrug efflux pumps are ubiquitous in bacteria, which not only can effectively reduce the concentration of antibiotics, but also participate in other processes such as detoxification of metabolic intermediates, virulence and signal trafficking (Bao et al., 2016). Multidrug resistance genes and efflux pump mechanism were also observed as the most diverse and dominant of ARGs in a reservoir of China . Multi-metal resistance subtypes, copper resistance subtypes, and mercury resistance subtypes had high relative abundances in aquatic environments in our study. Multi-metal resistance subtypes, copper resistance subtypes, and mercury resistance subtypes were also the most abundant MRGs in both influent and effluent samples from wastewater treatment plants (Gupta et al., 2018). Besides, copper resistance genes were also one of the dominant MRG types in the plasmid metagenomes from wastewater treatment plants (A.D. Li et al., 2015). Mercury and copper have been widely used as antimicrobials in medicine and agriculture, which may be responsible for the wide occurrence and high abundance of mercury and copper resistance genes in the environment (Hobman and Crossman, 2015). Stratification and associated vertical oxygen concentrations have been found to be the main drivers of microbial community composition in lakes and reservoirs (Peura et al., 2012;Saarenheimo et al., 2016;H.H. Zhang et al., 2015). Bacterial composition is a principal driver of ARG composition in the environment (Yang et al., 2019b;Yang et al., 2018;Zheng et al., 2017;Zhou et al., 2017). We found that more ARG and MRG subtypes were observed the epilimnion than in the metalimnion or hypolimnion, although no difference was found in the Shannon index or abundance of ARGs and MRGs between epilimnion and metalimnion or hypolimnion (p > 0.05, Kruskal-Wallis one-way ANOVA). However, PCoA and cluster analysis both showed that most epilimnion and hypolimnion samples within the same country were usually clustered separately, based on the composition of ARGs and MRGs (Figs. S4 and S5), which were confirmed by the PERMANOVA analysis (p < 0.05). These results suggest that stratification and geographical region both influenced the composition of ARGs and MRGs. That is also to say, ARGs and MRGs in lakes both exhibited biogeographical patterns across countries, even on different continents (Figs. 5 and S4-S5). Geographical distributions of ARGs have also been investigated on larger scales, such as continental-wide studies of estuaries (Zhu et al., 2017) and 42 water bodies (natural lakes and reservoirs) across China . Antibiotic resistance varies geographically, which could be caused by different socioeconomic development, prescription behavior etc. (Qiao et al., 2018).
The co-occurrence of ARGs and MRGs is due to the co-selective pressure of metals, antibiotics and biocides in the environment (Luo et al., 2017;Pal et al., 2015;Pal et al., 2016). Non-random patterns of ARGs and MRGs were observed in both lakes and seas (Tables S7 and  S8). Metals are an important influence on the distribution of ARGs in lakes Yang et al., 2017b). Marine metal and petroleum hydrocarbon pollution were usually found to be at low levels (Peng, 2015;L. Zhang et al., 2015). An understanding of the long-term co-selective effect of low pollutant levels on ARGs and MRGs will help to understand the behavior of ARGs and MRGs in the marine environment.

Conclusions
Although more ARG subtypes were found in lakes than seas, no significant differences were found in the diversity Shannon index of ARGs between lakes and seas. The relative abundances of ARGs and MRGs were greater in lakes than in seas. Coastal sea waters had higher relative abundance and diversity of both ARGs and MRGs than the deep ocean and Antarctic seas. Stratification and geographical region had significant influence on the composition of ARGs in lakes.