MetaFunPrimer: an Environment-Specific, High-Throughput Primer Design Tool for Improved Quantification of Target Genes

ABSTRACT Genes belonging to the same functional group may include numerous and variable gene sequences, making them difficult to characterize and quantify. Therefore, high-throughput design tools are needed to simultaneously create primers for improved quantification of target genes. We developed MetaFunPrimer, a bioinformatic pipeline, to design primers for numerous genes of interest. This tool also enables gene target prioritization by ranking the presence of genes in user-defined references, such as environment-specific metagenomes. Given inputs of protein and nucleotide sequences for gene targets of interest and an accompanying set of reference metagenomes or genomes, MetaFunPrimer generates primers for ranked genes of interest. To demonstrate the usage and benefits of MetaFunPrimer, a total of 78 primer pairs were designed to target observed ammonia monooxygenase subunit A (amoA) genes of ammonia-oxidizing bacteria (AOB) in 1,550 publicly available soil metagenomes. We demonstrate computationally that these amoA-AOB primers can cover 94% of the amoA-AOB genes observed in the 1,550 soil metagenomes, compared with an estimated 49% coverage by previously published primers. Finally, we verified the utility of these primer sets in incubation experiments that used long-term nitrogen-fertilized or unfertilized soils. High-throughput quantitative PCR (qPCR) results and statistical analyses showed significant differences in relative quantification patterns between the two soils, and subsequent absolute quantification also confirmed that target genes enumerated by six selected primer pairs were significantly more abundant in the nitrogen-fertilized soils. This tool gives microbial ecologists a new approach to assess functional gene abundance and related microbial community dynamics quickly and affordably.
IMPORTANCE Amplification-based gene characterization allows for sensitive and specific quantification of functional genes. Functional gene groups often encompass a large diversity of genes, and multiple primers may be necessary to target them. Current primer design tools are limited to designing primers for only a few genes of interest. MetaFunPrimer allows for high-throughput primer design for various genes of interest and also allows for ranking gene targets by their presence and abundance in environmental data sets. Primers designed by this tool improve the characterization and quantification of functional genes in broad gene amplification platforms and are especially powerful when combined with high-throughput qPCR approaches.
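The target-prioritization idea in the abstract (ranking genes by their presence in user-defined reference metagenomes) can be illustrated with a short Python sketch. This is not MetaFunPrimer's implementation: it assumes a hypothetical precomputed table of read counts per gene per metagenome (for example, from a prior homology search), ranks genes by the number of metagenomes in which they are detected, and breaks ties by total reads. The function name `rank_genes` and the toy data are illustrative only.

```python
from collections import Counter
from typing import Dict, List, Tuple

def rank_genes(hits: Dict[str, Dict[str, int]]) -> List[Tuple[str, int, int]]:
    """Rank genes by how many metagenomes they are detected in, then by total reads.

    `hits` maps a metagenome ID to {gene ID: read count}; counts of 0 are ignored.
    Returns (gene, n_metagenomes_detected, total_reads), most prevalent first.
    """
    presence: Counter = Counter()
    abundance: Counter = Counter()
    for gene_counts in hits.values():
        for gene, count in gene_counts.items():
            if count > 0:
                presence[gene] += 1
                abundance[gene] += count
    ranked = [(gene, presence[gene], abundance[gene]) for gene in presence]
    # Most prevalent first; ties broken by abundance, then by gene ID for stability.
    return sorted(ranked, key=lambda t: (-t[1], -t[2], t[0]))

# Hypothetical toy input: three soil metagenomes and four amoA-like genes.
hits = {
    "soil_1": {"amoA_01": 12, "amoA_02": 3},
    "soil_2": {"amoA_01": 5, "amoA_03": 8},
    "soil_3": {"amoA_01": 7, "amoA_02": 1, "amoA_04": 0},
}
for gene, n_detected, reads in rank_genes(hits):
    print(gene, n_detected, reads)
# amoA_01 3 24
# amoA_02 2 4
# amoA_03 1 8
```

Genes near the top of such a ranking would be natural first candidates for primer design; the actual scoring used by MetaFunPrimer may weight presence and abundance differently.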

Thank you for your recent submission to mSystems. Your manuscript has now been reviewed by two experts in the field and both recommend revisions to the text prior to reconsideration. Below you will find the comments of the reviewers, and I would particularly encourage you to attend to reviewer #1's comments on the focus of the manuscript and also the need to improve the supplemental material significantly. I also agree with reviewer #2 when they ask for additional application of this approach.
To submit your modified manuscript, log onto the eJP submission site at https://msystems.msubmit.net/cgi-bin/main.plex. If you cannot remember your password, click the "Can't remember your password?" link and follow the instructions on the screen. Go to Author Tasks and click the appropriate manuscript title to begin the resubmission process. The information that you entered when you first submitted the paper will be displayed. Please update the information as necessary. Provide (1) point-by-point responses to the issues raised by the reviewers as file type "Response to Reviewers," not in your cover letter, and (2) a PDF file that indicates the changes from the original submission (by highlighting or underlining the changes) as file type "Marked Up Manuscript -For Review Only." Due to the SARS-CoV-2 pandemic, our typical 60-day deadline for revisions will not be applied. I hope that you will be able to submit a revised manuscript soon, but want to reassure you that the journal will be flexible in terms of timing, particularly if experimental revisions are needed. When you are ready to resubmit, please know that our staff and Editors are working remotely and handling submissions without delay. If you do not wish to modify the manuscript and prefer to submit it to another journal, please notify me of your decision immediately so that the manuscript may be formally withdrawn from consideration by mSystems.
If your manuscript is accepted for publication, you will be contacted separately about payment when the proofs are issued; please follow the instructions in that e-mail. Arrangements for payment must be made before your article is published. For a complete list of Publication Fees, including supplemental material costs, please visit our website.
1. There is nothing quantitative about the qPCR data presented, yet the authors seem to sell their approach as a means of designing quantitative PCR primers. Admittedly, one could use qPCR for each primer pair designed; however, this is not what is presented in the example data. The authors used a Fluidigm qPCR system (one that I am familiar with) to test their 78 primer pairs against 30 soil samples from corn fields receiving different N inputs. There are no standards run to calibrate the qPCR, although this would be difficult given 78 different primer sets. In addition, some of the primers actually target the same related cluster of genes, so there would likely be cross-amplification with some primers across target groups. Not to be entirely negative about the authors' approach, I feel there is some benefit to doing what they did. It is just not appropriate to call it quantitative in the traditional way a reader would interpret this term. What the authors do is compare Ct values across all DNA samples for all primer sets run. This creates a unique fingerprint for any sample that can be used to compare it to other samples. Also, assuming that equivalent DNA masses are loaded into every sample slot, the relative Ct values for each primer pair's qPCR reaction would give a relative difference in abundance between the samples for that particular target. This is semiquantitative at best.
2. I would not normally mention the quality of the supplementary material as a major point of contention in a review, but I make an exception in this case. The supplemental tables, figures, and methods are so poorly documented and organized that it almost made me not want to review this paper. If readers want to completely understand the example presented, they have to access the supplemental material. None of the material has a caption or title associated with it. Since it is supplemental, there is no reason these figures and tables could not have extra explanatory text associated with them. In addition, the primer tables are confusing and could be better designed. For example, the 78 primer pairs used should be listed with the F and R primers side by side.
3. Although the authors do an excellent job demonstrating the successful design and application of primers targeting AOB in soils based on an assessment of 1,550 metagenomes, I feel that things might get a little more complicated with other functional genes. It is clear from Figure S1 that the 60 gene clusters identified are, for the most part, not that distantly divergent and occur in mostly closely related taxa (i.e., a narrow GC content range). This makes AOB amoA an ideal candidate for this approach, because a relatively low number of primers (78) could be designed to cover >90% of the diversity in soils. Would the same be true if you looked at nirK, nirS, nrfA, or nosZ? From my own experience with nrfA, which has 19 clades delineated by 30% amino acid sequence divergence, I would guess more than 78 primer pairs would be required to get >90% coverage. Even if we narrowed this down by ecosystem to seven or eight clades, I still think more than 78 primer pairs would be needed. In addition, the phylogenetic diversity of taxa containing nrfA genes is quite extraordinary, yielding GC contents ranging from <50% to 75%. This has an impact on PCR efficiency and potentially on how well the qPCR system will work. It would be nice if the authors would bring up some of these potential precautions or limits to how the method might be applied to other genes. Since we designed highly degenerate primers that theoretically have very good coverage for nrfA, it might be nice to convey why such degenerate primers might NOT be good for qPCR and why a multiplexed qPCR system approach is a better way to at least get some semiquantitative information from this approach.
4. I realize most of my comments address the example provided and not the primary product, the program described. The reason I did this is that this example is critical to demonstrating how one might benefit from the use of this software. Therefore, it is important to document this example more completely. I note that the authors state that they used the "Standard Methods" for qPCR defined by the Fluidigm system. I guarantee that the standard qPCR protocol provided by Fluidigm did not have functional genes from soil in mind. For example, we did some work with the same platform and were required to preamplify the genes in the DNA samples before putting them on the microarray. Did you have to do this? These details must be included in this manuscript.
5. Ln 158-159: The authors state that there are 28 primer pairs generated in EcoFunPrimer and refer us to Table S3. Table S3 has more than 28 primers, so it is unclear what the authors are referring to here.
6. Ln 160: This seems to refer to the same 28 "degenerate" primer pairs apparently shown in Table S3; however, there are no degenerate primers shown in this confusing table.
7. Figure 4: More should be included in the text and in the figure caption. Also, ANOSIM typically has an R value associated with it; this should be included in Figure 4.
11. Ln 207-208: Not sure why "novel" is used here. These are simply new primers. Also, I'm not sure how you are showing improved quantification. In case I missed something, please elaborate.
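For context on the R value requested in comment 7: ANOSIM's test statistic is R = (rB - rW) / (M / 2), where rB and rW are the mean ranks of the between-group and within-group dissimilarities and M = n(n - 1)/2 is the number of sample pairs. R near 1 means samples are more similar within groups than between them; R near 0 means no separation. The sketch below computes R only (no permutation p-value) from a dissimilarity matrix, with tied dissimilarities given averaged ranks. The toy matrix is hypothetical, and this is not the analysis used in the manuscript.

```python
from itertools import combinations
from typing import List, Sequence

def average_ranks(values: List[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def anosim_r(dist: Sequence[Sequence[float]], groups: Sequence[str]) -> float:
    """ANOSIM R from a symmetric dissimilarity matrix and per-sample group labels."""
    pairs = list(combinations(range(len(groups)), 2))
    ranks = average_ranks([dist[a][b] for a, b in pairs])
    within = [r for (a, b), r in zip(pairs, ranks) if groups[a] == groups[b]]
    between = [r for (a, b), r in zip(pairs, ranks) if groups[a] != groups[b]]
    m = len(pairs)
    return (sum(between) / len(between) - sum(within) / len(within)) / (m / 2)

# Hypothetical dissimilarities for two fertilized (F) and two unfertilized (U) soils.
dist = [
    [0, 1, 4, 5],
    [1, 0, 6, 7],
    [4, 6, 0, 2],
    [5, 7, 2, 0],
]
print(anosim_r(dist, ["F", "F", "U", "U"]))  # 1.0: all within-group pairs rank below all between-group pairs
```

In practice, packages such as vegan (R) or scikit-bio (Python) report R together with a permutation-based p-value, and both numbers would belong in the figure or caption.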

Reviewer #2 (Comments for the Author):
Lui et al. describe a tool for designing multiplex qPCR primer sets based on metagenomic data. MetaFunPrimer is based on a collection of target gene sequences from multiple sources: metagenome data and reference sequences previously deposited in databases. It is suitable for designing multiplex primer sets. The overall pipeline is acceptable, though the authors should discuss more thoroughly why they chose some subjective parameters, such as the 96% amino acid similarity threshold.
Minor comments:
1. A certain portion of the signal in qPCR Ct values can be false positive. Validation of the false-positive rate of these primer sets may be required, for example by MiSeq amplicon sequencing of some soil samples and comparison of the resulting sequences to estimate the proportion of false positives for the qPCR primers.
2. Even though the authors extensively analyzed the amoA-AOB case, readers may still be perplexed about which genes are applicable to the MetaFunPrimer pipeline. Please describe or explain the criteria for when this pipeline can be used (e.g., how many reference genes are needed?). Or, as with the microcystin-producing genes in hundreds of lake water samples you mentioned, describe more examples of using the MetaFunPrimer pipeline.
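Reviewer #2 asks why a 96% amino acid similarity cutoff was chosen. To illustrate what such a cutoff does, the sketch below implements simple greedy incremental clustering in the spirit of tools such as CD-HIT: sequences are processed longest first, and each joins the first cluster representative it matches at or above the cutoff, otherwise it founds a new cluster. It assumes pre-aligned, equal-length sequences and position-by-position identity, which real tools replace with pairwise alignment. It is not the clustering step used by MetaFunPrimer, and `greedy_cluster` is a hypothetical name.

```python
from typing import Dict, List

def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_cluster(seqs: Dict[str, str], threshold: float = 0.96) -> Dict[str, List[str]]:
    """Greedy incremental clustering: each sequence joins the first representative
    it matches at >= threshold identity; otherwise it founds a new cluster."""
    clusters: Dict[str, List[str]] = {}
    # Process longest (ungapped) sequences first, ties broken by ID.
    for sid in sorted(seqs, key=lambda s: (-len(seqs[s].replace("-", "")), s)):
        for rep in clusters:
            if identity(seqs[sid], seqs[rep]) >= threshold:
                clusters[rep].append(sid)
                break
        else:  # no representative was similar enough
            clusters[sid] = [sid]
    return clusters

# Hypothetical 40-residue aligned sequences.
base = "ACDEFGHIKLMNPQRSTVWY" * 2
seqs = {
    "seq1": base,
    "seq2": base[:-1] + "A",    # 1 mismatch: 39/40 = 97.5% identity to seq1
    "seq3": base[:-3] + "AAA",  # 3 mismatches: 37/40 = 92.5% identity to seq1
}
print(greedy_cluster(seqs))
# {'seq1': ['seq1', 'seq2'], 'seq3': ['seq3']}
```

Raising the cutoff splits the same sequences into more clusters, each of which may need its own primer pairs, so the choice of threshold directly affects how many primers are designed.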
The line numbers in the reviewers' comments refer to the original manuscript, whereas those in our Response refer to the revised manuscript. Our Responses are in blue text.

Reviewer #1 (Comments for the Author):
This manuscript describes the development of a computational tool to design qPCR primers for functional gene analysis. The authors use AOB amoA as a representative functional gene to demonstrate their software. The overall functionality and usefulness of the MetaFunPrimer program is well presented. Although there is much value in having tools like this for microbial ecology studies, there are some issues with how the example data is presented. I will detail these below.

Response:
We really appreciate your thoughtful comments and suggestions. We admit that our presentation of the example data was not very effective. We hope that our efforts to improve the manuscript have addressed your concerns.
1. There is nothing quantitative about the qPCR data presented, yet the authors seem to sell their approach as a means of designing quantitative PCR primers. Admittedly, one could use qPCR for each primer pair designed; however, this is not what is presented in the example data. The authors used a Fluidigm qPCR system (one that I am familiar with) to test their 78 primer pairs against 30 soil samples from corn fields receiving different N inputs. There are no standards run to calibrate the qPCR, although this would be difficult given 78 different primer sets. In addition, some of the primers actually target the same related cluster of genes, so there would likely be cross-amplification with some primers across target groups. Not to be entirely negative about the authors' approach, I feel there is some benefit to doing what they did. It is just not appropriate to call it quantitative in the traditional way a reader would interpret this term. What the authors do is compare Ct values across all DNA samples for all primer sets run. This creates a unique fingerprint for any sample that can be used to compare it to other samples. Also, assuming that equivalent DNA masses are loaded into every sample slot, the relative Ct values for each primer pair's qPCR reaction would give a relative difference in abundance between the samples for that particular target. This is semiquantitative at best.

Response:
We understand why our previous manuscript may be considered a 'semiquantitative' approach. We have added text to clarify the quantitative power of our approach. Additionally, we have used standards to provide absolute quantification in our example with a new set of experiments (Line 209-225, Line 329-387, Fig. 5, Fig. S2, Table S6). We would like to clarify a few points, detailed below. 1) We must first acknowledge that the present paper is mainly about a computational tool, MetaFunPrimer, which can produce the smallest number of primer pairs needed to cover the given sequences. A natural and important complement to any computational tool is experimental validation, though this evidence does not always accompany computational tool publications. In the previous version of the manuscript, we were hesitant to showcase this tool with too specific a platform (e.g., high-throughput qPCR) because we felt it might diminish the perception of its broad applicability (e.g., for standard qPCR, which is more widely accessible). However, we considered the reviewer's suggestion and agree that including experimental validation of our primers with HT-qPCR strengthens this manuscript. We have also added text to note that the usage of MetaFunPrimer is not limited to HT-qPCR (Line 276-280). 2) To demonstrate the quantitative nature that can be achieved with MetaFunPrimer and HT-qPCR, we chose 6 primer pairs and performed absolute quantification using standard DNA samples. These primers were selected on the basis of an initial screen of all 78 primer pairs, a strategy users may also adopt when selecting primers for absolute quantification. We have also added text to outline strategies for absolute and relative quantification within this example (Line 209-225, Line 329-387, Fig. 5, Fig. S2, Table S6). 3) As the reviewer mentioned, cross-amplification between primer pairs can generally occur. Our bioinformatic pipeline minimizes cross-amplification.
EcoFunPrimer, which is embedded in MetaFunPrimer, outputs multiple primer pairs for the same gene target in some cases. MetaFunPrimer then optimizes and selects the minimal set of primer pairs that can exclusively target the maximal diversity of functional genes of interest. Experimentally, unintended cross-amplification may still occur during PCR. The likelihood of this is also related to the experimental conditions chosen (e.g., HT-qPCR vs. standard qPCR, the number of probes, etc.). However, since this is part of the experimental process (e.g., qPCR optimization), we decided that it was outside the scope of this manuscript. Our addition of experimental validation of the designed primers helps demonstrate the value of the software, but we fully acknowledge that experimental optimization may be needed (Line 260-262, Line 265-266, Line 273-274, Line 300-301).
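The selection step described in this response, choosing the fewest primer pairs that together target the most gene sequences, is an instance of the set cover problem, for which a common heuristic is greedy selection: repeatedly pick the pair that covers the most still-uncovered targets. The sketch below assumes a hypothetical precomputed mapping from each candidate primer pair to the target sequences it is predicted to amplify. It ignores the exclusivity (non-overlap) constraint mentioned above and is not claimed to be MetaFunPrimer's exact algorithm. It also reports coverage, the fraction of targets hit by at least one selected pair, analogous to the coverage percentages reported in the abstract.

```python
from typing import Dict, List, Set, Tuple

def greedy_primer_selection(
    amplifies: Dict[str, Set[str]], targets: Set[str], max_pairs: int = 100
) -> Tuple[List[str], float]:
    """Greedily pick primer pairs until no pair adds new target coverage.

    Returns the chosen pair names (in selection order) and the fraction of
    `targets` covered by at least one chosen pair.
    """
    uncovered = set(targets)
    chosen: List[str] = []
    while uncovered and len(chosen) < max_pairs:
        # Pair covering the most still-uncovered targets (ties broken by name).
        best = max(sorted(amplifies), key=lambda p: len(amplifies[p] & uncovered))
        gain = amplifies[best] & uncovered
        if not gain:
            break  # remaining targets are not hit by any candidate pair
        chosen.append(best)
        uncovered -= gain
    coverage = 1 - len(uncovered) / len(targets)
    return chosen, coverage

# Hypothetical predicted amplification targets for four candidate primer pairs.
amplifies = {
    "P1": {"g1", "g2", "g3"},
    "P2": {"g3", "g4"},
    "P3": {"g4", "g5"},
    "P4": {"g1"},
}
targets = {"g1", "g2", "g3", "g4", "g5", "g6"}
chosen, coverage = greedy_primer_selection(amplifies, targets)
print(chosen, f"{coverage:.0%}")  # ['P1', 'P3'] 83%
```

Greedy selection is not guaranteed to find the true minimum set, but it is a standard approximation for set cover and makes the trade-off between primer count and coverage easy to inspect.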
2. I would not normally mention the quality of the supplementary material as a major point of contention in a review, but I make an exception in this case. The supplemental tables, figures, and methods are so poorly documented and organized that it almost made me not want to review this paper. If readers want to completely understand the example presented, they have to access the supplemental material. None of the material has a caption or title associated with it. Since it is supplemental, there is no reason these figures and tables could not have extra explanatory text associated with them. In addition, the primer tables are confusing and could be better designed. For example, the 78 primer pairs used should be listed with the F and R primers side by side.
Response: Inexcusably, our supplementary tables and figures were not well-organized in the previous manuscript. We apologize and have learned our lesson. We thank the reviewers for their patience and mentorship. We have revised all the tables and figures. The table containing primer information has been modified as suggested by the reviewer (Table S4).
3. Although the authors do an excellent job demonstrating the successful design and application of primers targeting AOB in soils based on an assessment of 1,550 metagenomes, I feel that things might get a little more complicated with other functional genes. It is clear from Figure S1 that the 60 gene clusters identified are, for the most part, not that distantly divergent and occur in mostly closely related taxa (i.e., a narrow GC content range). This makes AOB amoA an ideal candidate for this approach, because a relatively low number of primers (78) could be designed to cover >90% of the diversity in soils. Would the same be true if you looked at nirK, nirS, nrfA, or nosZ? From my own experience with nrfA, which has 19 clades delineated by 30% amino acid sequence divergence, I would guess more than 78 primer pairs would be required to get >90% coverage. Even if we narrowed this down by ecosystem to seven or eight clades, I still think more than 78 primer pairs would be needed. In addition, the phylogenetic diversity of taxa containing nrfA genes is quite extraordinary, yielding GC contents ranging from <50% to 75%. This has an impact on PCR efficiency and potentially on how well the qPCR system will work. It would be nice if the authors would bring up some of these potential precautions or limits to how the method might be applied to other genes. Since we designed highly degenerate primers that theoretically have very good coverage for nrfA, it might be nice to convey why such degenerate primers might NOT be good for qPCR and why a multiplexed qPCR system approach is a better way to at least get some semiquantitative information from this approach.
Response: Thank you for your comments. We acknowledge that challenges for primer design will vary depending on the functional gene targets of interest. This study focused on amoA-AOB because of its impact on nitrogen turnover in managed agroecosystems, for which we developed this tool. The reviewer is correct: in our experience, when primer pairs are designed for other nitrogen cycle genes, some of the genes (e.g., narG, napA) are too diverse for a reasonable number of primer pairs to cover even >50% of reference sequences. We mention this in Line 262-265. Following the reviewer's suggestion, we have added text discussing applications of these tools and user choices to accommodate varying needs (Line 163-165, Line 260-262, Line 265-266). We have also noted that universal or degenerate primers can be limited by low resolution in some cases and may lead to loss of the ability to identify specific bacterial species or strains (Line 90-96). Additionally, we have highlighted that MetaFunPrimer can also generate degenerate primers (Line 184-186, Line 318-320); users can choose how much degeneracy is allowed in a primer. In the text, we focused mainly on the features of MetaFunPrimer, with some comments on potential application to available platforms. Thus, we wrote mainly about the capabilities, advantages, and disadvantages of degenerate primers but did not comment on specific platforms.
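Two primer properties discussed in this exchange, degeneracy and GC content, can be computed directly from IUPAC nucleotide codes. Degeneracy is conventionally the number of distinct sequences a primer encodes (the product, over positions, of how many bases each code allows), and for a degenerate primer GC content becomes a range rather than a single value. The sketch below is illustrative only: it does not reproduce MetaFunPrimer's degeneracy option, whose exact definition is not given here, and the primer string is arbitrary.

```python
from itertools import product
from math import prod
from typing import List, Tuple

# IUPAC nucleotide codes and the bases each one can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded: product of bases allowed per position."""
    return prod(len(IUPAC[b]) for b in primer.upper())

def expand(primer: str) -> List[str]:
    """Every concrete sequence a degenerate primer represents."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in primer.upper()))]

def gc_range(primer: str) -> Tuple[float, float]:
    """Minimum and maximum GC percentage over all sequences the primer encodes."""
    codes = [IUPAC[b] for b in primer.upper()]
    always_gc = sum(all(x in "GC" for x in c) for c in codes)
    maybe_gc = sum(any(x in "GC" for x in c) for c in codes)
    return 100.0 * always_gc / len(codes), 100.0 * maybe_gc / len(codes)

primer = "ACGRTYNS"  # arbitrary example, not a primer from the manuscript
print(degeneracy(primer))  # 32
print(gc_range(primer))    # (37.5, 75.0)
print(expand("ACRT"))      # ['ACAT', 'ACGT']
```

The sequences encoded by a highly degenerate primer can span a wide GC range, which is one concrete reason such primers may amplify unevenly in qPCR, as the reviewer suggests.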
4. I realize most of my comments address the example provided and not the primary product, the program described. The reason I did this is that this example is critical to demonstrating how one might benefit from the use of this software. Therefore, it is important to document this example more completely. I note that the authors state that they used the "Standard Methods" for qPCR defined by the Fluidigm system. I guarantee that the standard qPCR protocol provided by Fluidigm did not have functional genes from soil in mind. For example, we did some work with the same platform and were required to preamplify the genes in the DNA samples before putting them on the microarray. Did you have to do this? These details must be included in this manuscript.

Response:
We hope that we have struck a balance between describing MetaFunPrimer and including an experimental validation of its application. We agree that the methods we used need to be more thoroughly documented. Preamplification can be a reasonable way to deal with environmental samples, but we tested our primers and soil samples under various conditions to avoid including a preamplification step, because we were concerned that this step could introduce another alteration to the samples. We found that a three-step PCR process with 10-fold-diluted soil DNA samples gave good amplification. Please see the results in Fig. 4, Fig. 5, Fig. S2, and Table S7. The details of these methods have also been added to the manuscript (Line 329-387, Material S1).
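The responses distinguish relative comparisons of Ct values from absolute quantification with standard DNA. The sketch below shows both calculations as they are conventionally done: a least-squares standard curve (Ct against log10 copy number), the amplification efficiency derived from its slope, E = 10^(-1/slope) - 1, inversion of the curve for an unknown sample, and the efficiency-adjusted fold difference between two samples for the same primer pair. The dilution series is hypothetical and this is not the authors' analysis pipeline. Without a standard curve the efficiency is unknown, which is why Ct-only comparisons are at best semiquantitative.

```python
from math import log10
from typing import Sequence, Tuple

def fit_standard_curve(copies: Sequence[float], cts: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of Ct = slope * log10(copies) + intercept.

    Returns (slope, intercept, efficiency), with efficiency E = 10**(-1/slope) - 1
    (E = 1.0 means perfect doubling per cycle, i.e. a slope of about -3.32).
    """
    xs = [log10(c) for c in copies]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, 10 ** (-1 / slope) - 1

def copies_from_ct(ct: float, slope: float, intercept: float) -> float:
    """Absolute estimate: invert the standard curve for an unknown sample."""
    return 10 ** ((ct - intercept) / slope)

def fold_difference(ct_a: float, ct_b: float, efficiency: float = 1.0) -> float:
    """Relative estimate: abundance of sample A over sample B for one primer pair.

    Assumes equal template input; a lower Ct means more starting template.
    """
    return (1 + efficiency) ** (ct_b - ct_a)

# Hypothetical 10-fold dilution series of a standard (not data from the manuscript).
copies = [1e7, 1e6, 1e5, 1e4, 1e3]
cts = [12.0, 15.32, 18.64, 21.96, 25.28]
slope, intercept, eff = fit_standard_curve(copies, cts)
print(round(slope, 2), round(eff, 2))                 # -3.32 1.0
print(round(copies_from_ct(20.3, slope, intercept)))  # roughly 31,600 copies
print(fold_difference(20.0, 22.0, eff))               # close to 4
```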
5. Ln 158-159: The authors state that there are 28 primer pairs generated in EcoFunPrimer and refer us to Table S3. Table S3 has more than 28 primers, so it is unclear what the authors are referring to here.

Your manuscript has been accepted, and I am forwarding it to the ASM Journals Department for publication. For your reference, ASM Journals' address is given below. Before it can be scheduled for publication, your manuscript will be checked by the mSystems senior production editor, Ellie Ghatineh, to make sure that all elements meet the technical requirements for publication. She will contact you if anything needs to be revised before copyediting and production can begin. Otherwise, you will be notified when your proofs are ready to be viewed.
As an open-access publication, mSystems receives no financial support from paid subscriptions and depends on authors' prompt payment of publication fees as soon as their articles are accepted.

Publication Fees:
You will be contacted separately about payment when the proofs are issued; please follow the instructions in that e-mail. Arrangements for payment must be made before your article is published. For a complete list of Publication Fees, including supplemental material costs, please visit our website.
Corresponding authors may join or renew ASM membership to obtain discounts on publication fees. Need to upgrade your membership level? Please contact Customer Service at Service@asmusa.org.
For mSystems research articles, you are welcome to submit a short author video for your recently accepted paper. Videos are normally 1 minute long and are a great opportunity for junior authors to get greater exposure. Importantly, this video will not hold up the publication of your paper, and you can submit it at any time.
Details of the video are:
· Minimum resolution of 1280 x 720
· .mov or .mp4 video format
· Provide video in the highest quality possible, but do not exceed 1080p
· Provide a still/profile picture that is 640 (w) x 720 (h) max
· Provide the script that was used
We recognize that the video files can become quite large, and so to avoid quality loss ASM suggests sending the video file via https://www.wetransfer.com/. When you have a final version of the video and the still ready to share, please send it to Ellie Ghatineh at eghatineh@asmusa.org.