Nested PCR Approach for petB Gene Metabarcoding of Marine Synechococcus Populations

ABSTRACT The molecular diversity of marine picocyanobacterial populations, an important component of phytoplankton communities, is better characterized using high-resolution marker genes than the 16S rRNA gene as they have greater sequence divergence to differentiate between closely related picocyanobacteria groups. Although specific ribosomal primers have been developed, another general disadvantage of bacterial ribosome-based diversity analyses is the variable number of rRNA gene copies. To overcome these issues, the single-copy petB gene, encoding the cytochrome b6 subunit of the cytochrome b6f complex, has been used as a high-resolution marker gene to characterize Synechococcus diversity. We have designed new primers targeting the petB gene and proposed a nested PCR method (termed Ong_2022) for metabarcoding of marine Synechococcus populations obtained by flow cytometry cell sorting. We evaluated the specificity and sensitivity of Ong_2022 against the standard amplification protocol (termed Mazard_2012) using filtered seawater samples. The Ong_2022 approach was also tested on flow cytometry-sorted Synechococcus populations. Samples (filtered and sorted) were obtained in the Southwest Pacific Ocean, from subtropical (ST) and subantarctic (SA) water masses. The two PCR approaches using filtered samples recovered the same dominant subclades, Ia, Ib, IVa, and IVb, with small differences in relative abundance across the distinct samples. For example, subclade IVa was dominant in ST samples with the Mazard_2012 approach, while the same samples processed with Ong_2022 showed similar contributions of subclades IVa and Ib to the total community. The Ong_2022 approach generally captured a higher genetic diversity of Synechococcus subcluster 5.1 than the Mazard_2012 approach while having a lower proportion of incorrectly assigned amplicon sequence variants (ASVs). All flow cytometry-sorted Synechococcus samples could be amplified only by our nested approach. 
The taxonomic diversity obtained with our primers on both sample types was in agreement with the clade distribution observed by previous studies that applied other marker genes or PCR-free metagenomic approaches under similar environmental conditions.

IMPORTANCE The petB gene has been proposed as a high-resolution marker gene to assess the diversity of marine Synechococcus populations. A systematic metabarcoding approach based on the petB gene would improve the characterization of Synechococcus community structure in marine planktonic ecosystems. We have designed and tested specific primers to be applied in a nested PCR protocol (Ong_2022) for metabarcoding the petB gene. The Ong_2022 protocol can be applied to samples with low DNA content, such as those obtained by flow cytometry cell sorting, allowing the simultaneous assessment of the genetic diversity of Synechococcus populations and cellular properties and activities (e.g., nutrient cell ratios or carbon uptake rates). Our approach will allow future studies using flow cytometry to investigate the link between ecological traits and taxonomic diversity of marine Synechococcus.

One basis of this study is to present an alternative to 16S rRNA gene amplicon approaches by targeting genes that may provide better resolution between closely related Synechococcus spp. The authors outline the application of nested PCR that involves multiple sequential amplification steps with different primers to increase sensitivity to low concentrations of template (e.g., samples obtained from cell sorting).
One major point of concern is with respect to article type. It reads as a Methods comparison report and less so as a Research article. For example, the first section of Results is titled: Primer Design and PCR Amplification and the first three figures (out of four) are used to establish differences between protocols.
Along these same lines, I feel that this paper is lacking context on the ecological interpretation and comparisons of the subtropical and subantarctic sampling locations. In fact, there is sparse detail presented on these sampling locations in the main text and one must investigate the Supplementary material and Methods section to become familiarized. There is some discussion on the interpretation of how relative proportions of clades might represent different sampling locations, but no data to support these interpretations (e.g., L 371 and the discussion on clade II being indicative of warmer temperature, nutrient deplete environments).
My other major concern is that the authors justify the application of their method with respect to cell sorting because it allows for molecular diversity analyses to coincide with other measurements of cellular physiology such as "nutrient stoichiometric and isotopic composition and cellular activity such as glucose, phosphorus, nitrogen and CO2 uptake rates." However, none of these measurements were demonstrated within this study or related to the metabarcoding methodology being presented. This is a major overreach that extends beyond results supported by the data.
My primary recommendation is that the authors revise this paper to clearly communicate the goal of establishing a methodology or to include more detailed reporting on the diversity and ecological considerations being drawn between the various Synechococcus clades. If the authors wish to revise the paper to be more focused on methodology, then I think the readers would also benefit from more specific materials related to reproducible, step-by-step protocols.
Other major comments:
Abstract, first sentence: "...cannot be accurately obtained using only 16S rRNA gene given its relatively low sequence divergence between closely related picocyanobacterial groups." This is a strong statement for leading a paper off because it lacks any context of what the authors consider to be a benchmark of "molecular diversity" or what "accurately" obtaining this would be.
Paragraph from L 77: Nested PCR methods are an alternative to single cell genomic approaches. This paragraph could go further in explaining the comparative advantages and disadvantages of the two approaches rather than simply leaving it at cost savings.
Paragraph from L 102: This paragraph could be improved by outlining some of the risks and disadvantages of nested PCR (i.e., errors induced from double amplification that would manifest as overestimations of diversity at the ASV-level) in the context of how these risks are mitigated and why the pros outweigh the cons. Alternatively, this material could be more clearly articulated in the Discussion.
L162: these "oceanographic patterns" should be cited to establish this line of reasoning.
L243-244: This conclusion is not justified because spurious amplification is not the only reason that an ASV will not match a given database. Please put this statement into the context and limitations of databases in general.
Minor comment: There are several mistakes in the language and grammar that can easily be fixed via revisions. Therefore, I will not note every instance.
Staff Comments:

Preparing Revision Guidelines
To submit your modified manuscript, log onto the eJP submission site at https://spectrum.msubmit.net/cgi-bin/main.plex. Go to Author Tasks and click the appropriate manuscript title to begin the revision process. The information that you entered when you first submitted the paper will be displayed. Please update the information as necessary. Here are a few examples of required updates that authors must address:
• Point-by-point responses to the issues raised by the reviewers in a file named "Response to Reviewers," NOT IN YOUR COVER LETTER.
• Upload a compare copy of the manuscript (without figures) as a "Marked-Up Manuscript" file.
• Each figure must be uploaded as a separate file, and any multipanel figures must be assembled into one file.
For complete guidelines on revision requirements, please see the journal Submission and Review Process requirements at https://journals.asm.org/journal/Spectrum/submission-review-process. Submissions of a paper that does not conform to Microbiology Spectrum guidelines will delay acceptance of your manuscript.
Please return the manuscript within 60 days; if you cannot complete the modification within this time period, please contact me. If you do not wish to modify the manuscript and prefer to submit it to another journal, please notify me of your decision immediately so that the manuscript may be formally withdrawn from consideration by Microbiology Spectrum.
Thank you for submitting your paper to Microbiology Spectrum.

Answers to referee comments
Dear editor and referee, We thank the editor and reviewer for their support of our work and their constructive remarks. Please find below our answers to your comments. Reviewer comments are in blue and our answers in black. We have also provided a file showing the differences between the two versions.

Editor Comments to Author
Please see the comments of reviewer below. As editor I would also add that more methodological detail should be added to a paper that describes a new method in order to foster further adoption.
We thank the editor for the suggestion. This manuscript details a nested PCR protocol targeting the petB gene for Illumina sequencing of Synechococcus populations. The protocol can be used with DNA templates generated by traditional filtration methods and from flow cytometry-sorted populations. To improve the methodological detail, we have created a step-by-step protocol describing the DNA extraction and the PCR amplification from flow cytometry-sorted cells, available at DOI: 10.17504/protocols.io.5qpvorkj7v4o/v1 (Private link for reviewers: https://www.protocols.io/private/2134A97076D211EDB22C0A58A9FEAC02 to be removed before publication). The link can be found in the "Availability of data and materials" section (lines 536-537). A detailed pipeline and all source code for processing the raw Illumina sequences and for data analysis are also available on GitHub (https://github.com/deniseong/marine-Synechococcus-metaB) and are listed in the same section.
A README.md file is also provided to help the reader navigate the GitHub repository. The manuscript therefore concentrates on the results obtained to validate our primers and protocols.

Reviewer Comments to Author:
Reviewer: 1
One basis of this study is to present an alternative to 16S rRNA gene amplicon approaches by targeting genes that may provide better resolution between closely related Synechococcus spp. The authors outline the application of nested PCR that involves multiple sequential amplification steps with different primers to increase sensitivity to low concentrations of template (e.g., samples obtained from cell sorting).
One major point of concern is with respect to article type. It reads as a Methods comparison report and less so as a Research article.
We agree with the reviewer; accordingly, we submitted the manuscript as a method and protocol article, not as a research article. We are not sure why the reviewer missed this information in the system.
Along these same lines, I feel that this paper is lacking context on the ecological interpretation and comparisons of the subtropical and subantarctic sampling locations. In fact, there is sparse detail presented on these sampling locations in the main text and one must investigate the Supplementary material and Methods section to become familiarized.
We have improved the description of the water masses sampled in the Materials and Methods to provide a better exploration of the environmental conditions in our results and the discussion.
Edited (lines 408-413): "The Chatham Rise is a dynamic region where northward-moving subtropical (ST) water masses mix with southward moving subantarctic (SA) water masses to form the Subtropical Convergence Zone (1, 2, 3, 4). ST waters are warm, saline and macronutrient-depleted while SA waters are cool, less saline and high-nitrate, low chlorophyll, low-silicate (HNLC-LSi) waters where low iron and silicate are the primary limiting factors for phytoplankton growth and productivity (5,6)."
However, our objective is to describe a PCR amplification protocol for flow cytometry-sorted cells that enables analysis of the genetic diversity of Synechococcus through high-throughput sequencing. We therefore used a few samples, from contrasting oceanic conditions, to validate our method. Any further exploration or comparison of ecological significance would require a broader dataset than the one provided and an in-depth discussion of the oceanographic conditions in which the samples were obtained. We believe that neither point is within the scope of this methodological manuscript.
There is some discussion on the interpretation of how relative proportions of clades might represent different sampling locations, but no data to support these interpretations (e.g., L 371 and the discussion on clade II being indicative of warmer temperature, nutrient deplete environments).
As mentioned before, for better contextualization of the environmental conditions in which our samples were obtained, we have improved the description of the water masses sampled.
My other major concern is that the authors justify the application of their method with respect to cell sorting because it allows for molecular diversity analyses to coincide with other measurements of cellular physiology such as "nutrient stoichiometric and isotopic composition and cellular activity such as glucose, phosphorus, nitrogen and CO2 uptake rates." However, none of these measurements were demonstrated within this study or related to the metabarcoding methodology being presented. This is a major overreach that extends beyond results supported by the data.
This manuscript details a nested PCR protocol targeting the petB gene for metabarcoding of Synechococcus populations. The protocol can be used with DNA templates generated by traditional filtration methods and from flow cytometry-sorted populations. We have suggested that our protocol would be of interest to those applying flow cytometry for cell physiology measurements, so that these studies can also obtain the community composition simultaneously.

As pointed out in the introduction, the literature has several examples of how flow cytometry-sorted Synechococcus cells have provided important information about their metabolism (7,8,9,10,11,12,13,14,15,16). However, only a few of these studies combined quantitative cell measurements with a fine taxonomic identification of the sorted populations by molecular methods. The current method for resolving the fine taxonomic diversity of Synechococcus with the petB marker gene is mining of metagenome sequences, which requires more DNA and is more expensive (17,18).
Our protocol provides an alternative to the scientific community. The scope of this methodological manuscript was to validate the nested PCR protocol and not to perform cell measurements.
My primary recommendation is that the authors revise this paper to clearly communicate the goal of establishing a methodology or to include more detailed reporting on the diversity and ecological considerations being drawn between the various Synechococcus clades. If the authors wish to revise the paper to be more focused on methodology, then I think the readers would also benefit from more specific materials related to reproducible, step-by-step protocols.
We thank the reviewer for this suggestion. We have created a step-by-step protocol for DNA extraction and PCR amplification, published on protocols.io at DOI: 10.17504/protocols.io.5qpvorkj7v4o/v1 (Private link for reviewers: https://www.protocols.io/private/2134A97076D211EDB22C0A58A9FEAC02 to be removed before publication) and added to the "Availability of data and materials" section (lines 536-537). A detailed pipeline and all source code for processing raw Illumina sequences and data analysis are available on GitHub (https://github.com/deniseong/marine-Synechococcus-metaB).

Other major comments:
Abstract: -first sentence "...cannot be accurately obtained using only 16S rRNA gene given its relatively low sequence divergence between closely related picocyanobacterial groups." -This is a strong statement for leading a paper off because it lacks any context of what the authors consider to be a benchmark of "molecular diversity" or what "accurately" obtaining this would be.
We have changed the statement to compare the use of high-resolution marker genes against 16S rRNA gene.
Original: "The molecular diversity of marine picocyanobacterial populations, an important component of phytoplankton communities, cannot be accurately obtained using only 16S rRNA gene given its relatively low sequence divergence between closely related picocyanobacteria groups."
Edited (Lines 11-15): "The molecular diversity of marine picocyanobacterial populations, an important component of phytoplankton communities, is better characterised using high-resolution marker genes compared to the 16S rRNA gene as it has greater sequence divergence to differentiate between closely related picocyanobacteria groups."
Paragraph from L 77: Nested PCR methods are an alternative to single cell genomic approaches. This paragraph could go further in explaining the comparative advantages and disadvantages of the two approaches rather than simply leaving it at cost savings.
We have expanded on the advantages and disadvantages of metabarcoding compared to whole genome sequencing in the same paragraph.
Original: "Metabarcoding is a suitable technique to assess the taxonomic diversity of flow cytometry sorted populations, including Synechococcus, because of its high sensitivity (e.g. trace concentrations of DNA can be PCR amplified and sequenced) and lower cost when compared with metagenome or whole genome sequencing."
Edited (Lines 82-91): "Metabarcoding is a suitable technique to assess the taxonomic diversity of flow cytometry sorted populations with high sensitivity (e.g. trace concentrations of DNA can be PCR amplified and sequenced) and of many samples, including Synechococcus. Although alternative approaches of metagenome or whole genome sequencing could similarly obtain the taxonomic diversity without metabarcoding limitations of primer and amplification biases, these approaches require a much larger sequencing depth and sample processing time which would increase the cost and time needed (19). Whole genome sequencing would instead be more advantageous when applied to functional diversity and population genetics (20,21)."
Paragraph from L 102: This paragraph could be improved by outlining some of the risks and disadvantages of nested PCR (i.e., errors induced from double amplification that would manifest as overestimations of diversity at the ASV-level) in the context of how these risks are mitigated and why the pros outweigh the cons. Alternatively, this material could be more clearly articulated in the Discussion.
We have outlined the risks of nested PCR compared to standard PCR in the paragraph starting at line 321 of the Discussion. In the existing literature that directly compares nested PCR to standard PCR, there is no consensus on whether nested PCR results in over- or underestimation of diversity: two previous studies reported lower diversity with nested PCR approaches, whereas this study obtained a higher diversity. To reduce the risk of misclassifying chimeras as ASVs and therefore overrepresenting diversity, we compared the translated protein sequences of the ASVs against the reference database. Based on this analysis, we concluded that our nested PCR instead improved specificity compared to standard PCR, likely owing to the use of two separate sets of oligonucleotide primers on the template DNA.
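For illustration only, the kind of check described above (translating each ASV and comparing the protein sequence with reference proteins) could be sketched as follows. This is a minimal sketch and not the pipeline used in the manuscript, which is in the GitHub repository: the function names, the fixed reading frame, the toy sequences, and the exact-substring matching rule are all assumptions made for the example. A "no_reference_match" result may also simply reflect a gap in the reference database.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (illustrative helper).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=0):
    """Translate a DNA string in the given frame; unknown codons become 'X'."""
    dna = dna.upper()[frame:]
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


def classify_asv(dna, reference_proteins, frame=0):
    """Label an ASV by its translated sequence (hypothetical, simplified rule).

    - 'internal_stop': translation contains a stop codon, as might happen
      for a frameshifted or chimeric sequence.
    - 'matches_reference': translation is an exact substring of a reference.
    - 'no_reference_match': neither of the above.
    """
    protein = translate(dna, frame)
    if "*" in protein:
        return "internal_stop"
    if any(protein in ref for ref in reference_proteins):
        return "matches_reference"
    return "no_reference_match"


# Toy data, not real petB sequences.
references = ["MKTAYIAK"]
asvs = {
    "asv_1": "ATGAAAACCGCGTATATTGCGAAA",
    "asv_2": "ATGAAATAAGCG",
    "asv_3": "ATGGGGGGG",
}
labels = {name: classify_asv(seq, references) for name, seq in asvs.items()}
```

A real analysis would more likely use an alignment-based similarity threshold than exact matching, and would need to determine the reading frame from the primer positions.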
L162: these "oceanographic patterns" should be cited to establish this line of reasoning
The term "oceanographic patterns" in this sentence is used to highlight the observed pattern of community composition in this study across subtropical to subantarctic waters, and not the physical and chemical oceanographic patterns. We have changed the term to clarify this sentence.
Original: "Yet, oceanographic patterns such as the higher relative abundance of clade IVa in ST compared to SA waters was captured by both approaches."
Edited (Lines 172-174): "Yet, patterns of community composition such as the higher relative abundance of clade IVa in ST compared to SA waters were captured by both approaches."
L243-244: This conclusion is not justified because spurious amplification is not the only reason that an ASV will not match a given database. Please put this statement into the context and limitations of databases in general.
We agree with the reviewer and have corrected the sentence to include the limitations of databases.
Original: "More than half of these ASVs had no sequence match in the database, indicating that they were likely to have resulted from spurious amplification." Edited (Lines 254-257): "More than half of these ASVs had no sequence match in the database, indicating that they could have resulted from spurious amplification or the current database did not have a corresponding reference sequence."