Host-associated microbe PCR (hamPCR) enables convenient measurement of both microbial load and community composition

The ratio of microbial population size relative to the amount of host tissue, or ‘microbial load’, is a fundamental metric of colonization and infection, but it cannot be directly deduced from microbial amplicon data such as 16S rRNA gene counts. Because existing methods to determine load, such as serial dilution plating, quantitative PCR, and whole metagenome sequencing add substantial cost and/or experimental burden, they are only rarely paired with amplicon sequencing. We introduce host-associated microbe PCR (hamPCR), a robust strategy to both quantify microbial load and describe interkingdom microbial community composition in a single amplicon library. We demonstrate its accuracy across multiple study systems, including nematodes and major crops, and further present a cost-saving technique to reduce host overrepresentation in the library prior to sequencing. Because hamPCR provides an accessible experimental solution to the well-known limitations and statistical challenges of compositional data, it has far-reaching potential in culture-independent microbiology.


Introduction
Knowing the relative abundance of individual taxa reveals important information about any ecological community, including microbial communities. An expedient means of learning their composition in a sample is to sequence and count a defined number of 16S or 18S rRNA genes (hereafter rDNA), the internal transcribed spacer (ITS) of rRNA arrays, or other amplicons that distinguish microbial species in a sample. However, these common amplicon counting-by-sequencing methods do not provide information on the density or load of the microbes. Critically, such microbial sequence counts lack a denominator accounting for the amount of the habitat sampled, and thus, sparsely-colonized and densely-colonized samples become indistinguishable, despite most study systems being open systems in which the total number of microbial cells can vary over many orders of magnitude. Another limitation of such compositional data is that because the sum of all microbes is constrained, an increase in the abundance of one microbe reduces the relative abundance of all other microbes, creating misleading interpretations in the absence of appropriate statistical methods (Barlow et al., 2020;Gloor et al., 2017;Morton et al., 2019;Tsilimigras and Fodor, 2016). Experimental determination of the microbial load, for example by relating microbial abundance to sample volume, mass, or surface area, has led to important insights in microbiome research that otherwise would have been missed with relative abundance data (Humphrey and Whiteman, 2020;Niu et al., 2017;Props et al., 2017;Regalado et al., 2020;Smets et al., 2016;Stämmler et al., 2016;Tkacz et al., 2018;Vandeputte et al., 2017).
For many host-associated microbiome samples, in particular those from plants , nematodes (Ogier et al., 2020), insects (Ellegaard et al., 2020;Gendrin et al., 2015;Parker et al., 2020), and other organisms in which it is difficult or impossible to physically separate microbes from host tissues, a thorough DNA extraction yields both host and microbial DNA. For such samples, the amount of DNA from host and microbe is directly proportional to the number of cells sampled (Davies, 1977;Massonnet et al., 2011), and therefore the ratio of microbial DNA to host DNA is an intrinsic measure of the microbial load of the sample (Humphrey and Whiteman, 2020;Karasov et al., 2019;Karasov et al., 2018;Lebeis et al., 2015;Regalado et al., 2020).
Researchers have attempted to exploit this property and use the host rDNA amplified as a byproduct of microbial rDNA to calculate microbial load (Humphrey and Whiteman, 2020;Lebeis et al., 2015), but because host nuclear ribosomal arrays may have hundreds or thousands of copies (Rabanal et al., 2017), and organellar DNA is also overabundant, these methods are inefficient and require noisy interventions to increase the microbial signal. Sufficiently deep whole metagenome sequencing (WMS) also can in principle describe the microbial community composition and measure the microbial load, but is rarely practical because of a similar overrepresentation of host DNA (Karasov et al., 2019;Regalado et al., 2020). For example, WMS of a leaf extract from wild Arabidopsis thaliana typically yields >95% plant DNA and <5% microbial DNA. Furthermore, many WMS reads remain unclassifiable and thus unquantifiable in complex samples (Karasov et al., 2019;Regalado et al., 2020).
Most commonly, researchers combine amplicon sequencing with an additional orthogonal method. These include supplementary shallow WMS , quantitative PCR (qPCR) or digital PCR of host and/or microbial genes (Anderson and McDowell, 2015;Barlow et al., 2020;Ellegaard et al., 2020;Guo et al., 2020;Jian et al., 2020;Karasov et al., 2019;Nadkarni et al., 2002), adding sequenceable 'spike-ins' calibrated based on sample volume (Lin et al., 2019), mass (Stämmler et al., 2016), or qPCR-determined host DNA content (Guo et al., 2020), counting colony-forming units (CFU) (Chen et al., 2020;Niu et al., 2017), and flow cytometry (Props et al., 2017;Vandeputte et al., 2017). The multitude of methods and publications hints at the enduring nature of this problem. While combining amplicon sequencing with any of these other approaches improves data, it requires more work, consumes more sample material, and introduces technical caveats, such as a reliance on accurately pipetting small quantities.
Here, we introduce host-associated microbe PCR or 'hamPCR', a robust and accurate single-reaction method to co-amplify a low-copy host gene and one or more microbial regions, such as 16S rDNA. We accomplish this with a two-step PCR protocol (Carlson et al., 2013;Gohl et al., 2016;Lundberg et al., 2013;Symeonidi et al., 2020;Wen and Zhang, 2012). In hamPCR, gene-specific primer pairs bind to the 'raw' templates in a first short step, which is run for only two cycles to limit propagating amplification biases related to primer annealing and primer availability. In the second exponential step, a single set of primers with complementarity to the universal overhangs add barcodes and sequencing adapters. Such co-amplification of diverse fragments is used in many RNA-seq and WMS protocols (Kukurba and Montgomery, 2015;Quince et al., 2017). Notably, Carlson et al., 2013 similarly used a two-step PCR including a multiplexed first step of five to seven cycles to sequence and quantify both variable and joining segments at human T and B cell receptor loci, providing strong proof-of-concept for our method applied to the microbiome.
We designed our host and microbe amplicons to have slightly different lengths, such that they can be resolved by electrophoresis for quality control. We further show that after pooling finished sequencing libraries, the amplicons can be separately purified and re-mixed at any favorable ratio prior for sequencing (for example, with host DNA representing an affordable 5-10%), and sequence counts can be accurately scaled back to original levels in-silico. Thus, in stark contrast to WMS, samples with initially unfavorable host-to-microbe ratios can be easily adjusted prior to sequencing without loss of information. Because of the practical simplicity and flexibility of hamPCR, it has the potential to supplant traditional microbial amplicon sequencing in host-associated microbiomes.

Results hamPCR generates quantitative sequencing-based microbial load
The first two-cycle host-and-microbe template tagging step ('HM-tagging') of hamPCR multiplexes two or more primer pairs in the same reaction, at least one of which targets a single-or low-copy host gene (Appendix 1-figure 1). The HM-tagging primers are then cleaned with Solid Phase Reversible Immobilization (SPRI) magnetic beads (Rohland and Reich, 2012; Appendix 1-figure 2). Next, an exponential PCR of 20-30 cycles is performed using universal barcoded primers ( Figure 1a, Appendix 1-figure 1). As a host amplicon in A. thaliana samples, we targeted a   thaliana phyllospheres was used as a template for both V4 16S rDNA PCR (left, 515F and 799R) and hamPCR (right, V4 16S rDNA and GI 502 bp primers). Four replicates were produced with two cycles of the HM-tagging reaction and 30 cycles of PCR, and additional replicates with 3 to 10 HM-tagging cycles paired with 29 to 22 PCR cycles (for a constant total of 32 cycles). The stacked columns show the relative abundances of major ASVs. Boxed upper case letters demarcate groups of samples compared below. (b) Correlation of fourth-root transformed ASV abundances for the 16S rDNA samples above panel (a) box [A] to the 16S rDNA samples above box [B]. Only ASVs with a minimum relative abundance of 0.05% were compared. R 2 , coefficient of determination. fragment of the GIGANTEA (GI) gene, which is well conserved and present as a single copy in A. thaliana and many other plant species (Duarte et al., 2010). As microbial amplicons, we initially targeted widely used regions of 16S rDNA. Further considerations for primer design are discussed in Appendix 1.
To assess the technical reproducibility of the protocol, we made a titration panel of artificial samples combining varying amounts of pure A. thaliana plant DNA with pure bacterial DNA that reflects a simple synthetic community (Materials and methods). These represented a realistic range of bacterial concentrations as previously observed from WMS of wild leaves, ranging from about 0.25% to 24% bacterial DNA . All DNA preps employed heavy bead beating to ensure thorough lysis of both host and microbes, as an incomplete DNA extraction can lead to underrepresentation of hard-to-lyse cells (Albertsen et al., 2015;Yuan et al., 2012). We applied hamPCR to the panel, pairing each of three commonly-used 16S rDNA amplicons for the V4, V3V4, and V5V6V7 variable regions with either a 502 bp or 466 bp GI amplicon (Materials and methods, Supplementary file 1), such that the host and microbial amplicons differed by approximately 80 bp in length and were resolvable by gel electrophoresis. In all pairings, the GI band intensity increased as the 16S rDNA band intensity decreased (Figure 1b, Figure 1-figure supplement 1).
Focusing on the V4 16S rDNA primer set, 515F -799R, paired with the 502 bp GI amplicon, we amplified the entire titration panel in four independently-mixed technical replicates. Although plant organelle sequences can be removed bioinformatically, we attempted to block their amplification as much as possible. In addition to use of the chloroplast-avoiding 799R primer (Chelius and Triplett, 2001), plant organelle-blocking peptide nucleic acids (PNAs) (Lundberg et al., 2013) further prevented unwanted 16S rDNA signal from organelles in the pure plant sample (Figure 1b). We note that although these PNAs are widely used and extremely effective, they do not work for all plant hosts (Fitzpatrick et al., 2018) and they can interfere with analysis of certain bacteria present in some environments (Jackrel et al., 2017). We pooled the replicates and sequenced them as part of a paired-end HiSeq 3000 lane. Because the 150 bp forward and reverse reads were not long enough to assemble into full amplicons, we analyzed only the forward reads (Materias and methods), processing the sequences into Amplicon Sequence Variants (ASVs) and making a count table of individual ASVs using Usearch (Edgar, 2010).
After identifying the ASVs corresponding to host GI and the bacteria in the synthetic community, we plotted the relative abundance of A. thaliana GI, the three Sphingomonas ASVs, and the single E. coli ASV across the samples of the titration panel ( Figure 1c). There was high consistency between the four replicates, more than what was visually apparent in the gel (Figure 1c, Figure 1-figure supplement 1). We next divided ASV abundances in each sample by the abundance of the host ASV in that sample to give the quantity of microbes per unit of host, a measure of the microbial load. Plotting the data with a fourth-root transformed Y axis for better visualization of low bacterial loads, we observed consistent and accurate quantification of absolute microbial abundance from 0 up to about 16% total bacterial DNA ( Figure 1d). Through this range, the actual sequence counts for total bacteria matched theoretical expectations based on the volumes pipetted to make the titration (solid black line, Figure 1d). At 16% bacterial DNA, bacteria contributed more than 96% of sequences, and the microbe-to-host template ratio was near 25. At higher microbial loads the trend was still apparent, and the decrease in precision was likely exacerbated by the effects of small numbers; when the host ASV abundance is used as a denominator and the abundance approaches 0, load approaches infinity and sampling error has a greater and greater influence on the quotient. Eventually, this creates unacceptable uncertainty. We defined a 'noise factor' N as the full range in microbial load quotients that would result from adding a single host count and subtracting a microbial count from a sample ([microbe counts + 1] / [host counts -1]) or vice versa ([microbe counts -1] / [host counts + 1]). N increases as microbial load increases, but this is overcome with increased total sequencing depth. We determined conservatively that samples for which N > 0.22 should only be classified as 'highly colonized', and should not be used for quantitative measurements (Appendix 1- figure 3). In our case, only a minority of highly infected hosts reached bacterial abundances above the highly quantitative range.  Figure 3 continued on next page hamPCR does not distort the detected composition of the microbial community We amplified products from a wild A. thaliana phyllosphere template DNA preparation , either with four technical replicates using V4 16S rDNA primers alone, or alternatively with four technical replicates using hamPCR. After sequencing and deriving ASVs, we first compared ASV abundances within identically-prepared replicates of the pure 16S rDNA protocol to demonstrate best-case technical reproducibility of this established technique. As expected, this resulted in a nearly perfect correlation, with a coefficient of determination R 2 of 0.99 and abundance distributions that were indistinguishable by a Kolmogorov-Smirnov test ( Figure 2b). Next, we removed the ASV corresponding to A. thaliana GI from the hamPCR data and rescaled the remaining microbial ASVs to 100% to give relative abundance data. We then compared microbial ASVs from the four pure 16S rDNA replicates to those from the four rescaled hamPCR replicates. In this comparison as well, R 2 was 0.99 and the distributions were essentially identical ( Figure 2c). Thus, the inclusion of a host amplicon in the reaction did not introduce taxonomic biases.

Sensitivity to number of HM-tagging cycles and template concentration
Two HM-tagging cycles minimize amplification biases that might otherwise have compounding effects due to differential primer efficiencies for the host and microbial templates. However, for templates at borderline low concentrations, inefficiencies due to SPRI cleanup could represent a bottleneck in amplification. Additionally, some techniques that prevent off-target organelle amplification (Agler et al., 2016b;Song and Xie, 2020) may benefit from additional HM-tagging cycles. To investigate the sensitivity of the results to additional HM-tagging cycles, we applied hamPCR for 2 through 10 HM-tagging cycles, both on the wild A. thaliana phyllosphere DNA described above and on a synthetic plasmid-borne template that contains bacterial rDNA and a partial A. thaliana GI gene template in cis in a 1:1 ratio (Appendix 1 -Discussion 2). Surprisingly, for the primers used here, there was no apparent influence of additional HM-tagging cycles, as 7-10 HM-tagging cycles yielded the same distribution of host and 16S rDNA ASV abundances as two cycles (Kolmogorov-Smirnov test, p > 0.47). This was true for hamPCR and for 16S rDNA primers alone (Figure 2d and e). This ideal result may not be achievable for all primer pairs and should be tested experimentally, but it is consistent with data that either 5 or 7 HM-tagging cycles gave comparable results for quantifying the human immune receptor repertoire (Carlson et al., 2013), and with the fact that properly designed multiplex reactions can be used in qPCR that is carried out with many cycles (Vet et al., 1999). We noticed that application of hamPCR to the 1:1 synthetic template yielded an average of 56.5% host GI and 43.5% bacteria, invariant with HM-tagging cycle number ( As a further exploration of the robustness of the protocol, we applied hamPCR to a range of total A. thaliana leaf template concentrations of between 5 and 500 ng total DNA per reaction, covering a typical template range of 5-100 ng. Through the typical range, there was no difference in microbe or host ASV abundances. At 200 ng or above, the host amplicon seemed to be slightly favored, possibly because the 16S rDNA primers started to become limiting at these concentrations (Appendix 1- figure 5).
Pre-sequencing adjustment of host-to-microbe ratio Some host DNA must be present so that microbial load can be calculated, but sequencing too much host DNA would add unnecessary expense. We realized that the size difference between host and microbe bands in hamPCR affords not only independent visualization of both amplicons on a single  gel, but also allows convenient and easy adjustment of the host and microbial signals in the pooled library prior to sequencing, in order to improve cost effectiveness. We developed a strategy by which the final hamPCR amplicons are pooled and one aliquot of the pool is rebarcoded to form a reference sample that preserves the original host-to-microbe ratio. The remainder of the pool is run on a gel and the host and microbial bands are separately purified, quantified, and remixed (e.g. to reduce host and gain more microbial resolution). The rebarcoded reference sample, which was not remixed and thereby preserves the original ratio, can be sequenced separately or spiked into the remixed library prior to sequencing. Following sequencing, the reference sample provides the key to the correct host and microbe proportions, allowing simple scaling of the entire library back to original levels ( Figure 3a To demonstrate this concept, we prepared four replicate reference samples for our HiSeq 3000 run, which included much of the data from Figure 1 and Figure 2, and then separately purified the host and microbial fractions of the library (Figure 3-figure supplements 1 and 2). Based on estimated amplicon molarities of the host and microbial fractions, we remixed them, aiming for a sufficient but cost-efficient amount of 5% host DNA, added the reference samples, and sequenced the final mix as part of a new HiSeq 3000 lane. A stacked-column plot of relative abundances for all samples on the original run clearly showed the host A. thaliana GI ASV highly abundant in some samples, on average responsible for about 22% of total sequences in the run (Figure 3b and e). The remixed reduced host library had nearly 10-fold less total host GI ASV, 2.6%, slightly lower than our target of 5% (Figure 3c and e). The reference samples averaged 19.2% of host GI ASV, very close to the 22% host fraction in the original library. After using the reference samples to reconstruct the original host abundance in the remixed dataset, we recreated the shape of the stacked column plot from the original library (compare Figure 3b-d). When the fourth-root abundances for ASVs above a 0.05% threshold were compared between the original and reconstructed libraries, the R 2 coefficient of determination was 0.99, with no significant difference between the distributions (Kolmogorov-Smirnov test, p > 0.86). Thus, if hamPCR libraries have already been prepared and the host amplicon is overabundant, the host representation in the pooled libraries can be easily and accurately reduced using basic agarose gel technologies to enable more efficient sequencing.
hamPCR with different 16S rDNA regions compared to whole metagenome sequencing We next applied hamPCR to leaf DNA from eight wild A. thaliana plants that we had previously analyzed by WMS, and from which we therefore had an accurate estimate of the microbial load as the number of microbial reads divided by the number of plant chromosomal reads . We applied hamPCR with primer combinations targeting the host GI gene and either the V3V4, V4, or V5V6V7 variable regions of the 16S rDNA. We produced three independent replicates for each primer set, which we averaged for final analysis ( Figure 4, Figure 4-figure supplement 1). Across WMS and the three hamPCR amplicon combinations, the relative abundance of bacterial families was consistent (Figure 4a-d: i), with slight deviations likely due to the different taxonomic classification pipeline used for the metagenome reads , as well as known biases resulting from amplification or classification of different 16S rDNA variable regions (Graspeuntner et al., 2018;Thijs et al., 2017). After converting both WMS and hamPCR bacterial reads to load by dividing by the plant read count in each sample, we recovered a similar pattern despite the quantification method, with decreasingly lower total loads progressing from plant S1 to plant S8, and individual bacterial family loads showing similar patterns (Figure 4a-d: ii, iii). Relative differences in load estimates when comparing the different hamPCR amplicons are likely in part due to different affinities of the 16S rDNA primer pairs for their targets in different bacterial species, and rDNA copy number variation among the microbial families (Kembel et al., 2012). To quantify the consistency of hamPCR load estimates with WGS load estimates, we plotted the loads against each other and found strong positive correlations, with hamPCR using V4 rDNA having the highest correlation to WGS (Figure 4e-g).
It is important to note that while relative load ratios between samples were consistent across hamPCR primer sets, the total microbe-to-host ratio varied substantially, with the maximum V5V6V7 16S rDNA total load at less than three times host, and the maximum V4 16S rDNA total load near 16  times host. This is likely due to variation in GI and 16S rDNA primer efficiencies. To make a statement about the ratio of plant cells to bacterial cells using hamPCR, it would be important to include standard samples with known bacterial load ratios, and to normalize each bacterial taxon by its average rDNA copy number.

Three-amplicon hamPCR for simultaneous determination of oomycete and bacterial load
One disadvantage of 16S rDNA primers is that they readily amplify sequences from bacterial species, but their targets are absent from other microbes such as fungi, oomycetes, and archaea; other primer sets must therefore be used to detect these other groups. We attempted to overcome this limitation by setting up hamPCR not only with 16S rDNA and host GI sequences as described above, but also adding a third primer pair targeting oomycetes, which include important pathogens of A. thaliana. We first tested universal ITSo primers broadly targeting oomycete internal transcribed spacer rDNA (Materials and methods) in combination with A. thaliana GI primers and 16S rDNA primers targeting the bacterial V4, V3V4, or V5V6V7 regions, using as template our synthetic plasmid that includes templates for the three primer sets in equal proportion (Materials and methods, Appendix 1 -Discussion 2). A combination of all three amplicons seemed to work efficiently for the V4 region ( Figure 5-figure supplement 1), and with this encouraging result, we set up a simple infection experiment. As pathogens, we prepared local strain 466-1 of the obligate biotrophic oomycete Hyaloperonospora arabidopsidis (Hpa) (Coates and Beynon, 2010) and the welldescribed bacterial pathogen Pseudomonas syringae pv. tomato (Pst) DC3000 (Xin and He, 2013). We used two A. thaliana genotypes: the reference accession Col-0, which is resistant to Hpa 466-1 but susceptible to Pst DC3000, and an enhanced disease susceptibility 1 (eds1-1) mutant, which has a well-studied defect in a lipase-like protein necessary for many disease resistance responses and which is susceptible to both pathogens (Bhandari et al., 2019).
We infected seedlings with either Hpa 466-1 alone, a mix of Hpa 466-1 and Pst DC3000, or a buffer control, and maintained them for 7 days under cool, humid conditions ideal for Hpa growth (Materials and methods). The eds1-1 plants inoculated with Hpa 466-1 became heavily infected and sporangiophores were too numerous to count. No visible bacterial disease symptoms were present on any of the plants, likely because the cool temperature decelerated bacterial growth and symptom appearance (Huot et al., 2017). We ground pools of four to five seedlings in a buffer and used a small aliquot to count Pst DC3000 CFUs, and the remainder of the lysate for DNA isolation and hamPCR. Despite the lack of bacterial symptoms, we recovered Pst DC3000 CFUs from the inoculated plants.
We applied hamPCR to these samples using the ITSo/16S/GI primer set, but due to excessive ITSo product, we repeated library construction replacing the ITSo primers with primers for a single copy Hpa actin gene (Figure 5-figure supplement 1; Anderson and McDowell, 2015). Intensity of the actin product correlated with visual Hpa symptoms (Figure 5-figure supplement 1). Sequencing the libraries confirmed Hpa and Pst ASVs in the inoculated samples, as expected. A standard bacterial relative abundance plot, as would be obtained from pure 16S rDNA data, confirmed the presence of Pst DC3000 in the bacteria-infected samples, and in addition revealed that Hpa-infected samples had a different bacterial community than uninfected samples ( Figure 5a). Importantly, it failed to detect obvious differences between microbial communities on Col-0 and eds1-1 plants. However, after including the actin ASV from Hpa and converting all abundances to microbial load, a striking difference became apparent between Col-0 and eds1-1, with eds1-1 supporting higher bacterial and Hpa abundances ( Figure 5b). This is expected from existing knowledge (Bhandari et al., 2019), and supported by Pst DC3000 CFU counts from the same plants ( Figure 5c). The microbial load plot also revealed that Hpa-challenged plants supported more bacteria than buffer-treated plants, indicating either that successful bacterial colonizers were unintentionally co-inoculated with Hpa, or that Hpa caused changes in the native flora ( Figure 5b). We noted one outlier sample (red arrows, Figure 5b and c) with especially high microbial load. This sample was also an outlier by CFU counting, and because hamPCR fell within the quantitative range defined by host abundance and sequencing depth (Appendix 1-figure 3), we conclude that this outlier was likely not due to limitations of hamPCR.
To confirm that the abundances for all three amplicons accurately reflected the concentration of their original templates, we prepared a stepwise titration panel with real samples, mixing increasing amounts of DNA from an uninfected eds1-1 plant (low load) into decreasing amounts of DNA from an Hpa-infected eds1-1 plant (high load). Sequencing triplicate hamPCR libraries revealed a stepwise increase in ASV levels for all amplicons, consistent with the expectation based on pipetting ( Figure 5d). These data, combined with the infection experiment, show that hamPCR is quantitative for at least two independent microbial amplicons in real-world samples.

Utility in diverse hosts and crops with large genomes
To demonstrate the utility of hamPCR outside of plants, we prepared samples from the nematode worm Pristionchus pacificus. Small hosts like nematodes where the entire individual is typically lysed during DNA preparation are ideal for hamPCR; the choice of this nematode was due to the convenience of a lab specializing in studies of this species in our institute (Sommer, 2006). P. pacificus was fed on a diet of either pure E. coli OP50, or alternatively a mix of E. coli OP50 with Pst DC3000 and Lysinibacillus xylanilyticus. The worms were washed extensively with PBS buffer to remove epidermally-attached bacteria, enriching the worms for gut-associated bacteria, and we prepared DNA from each sample. In the same manner as described in the previous section, we titrated the two DNA samples into each other to create a panel of samples representing a continuous range of colonization at biologically relevant levels. Over three replicates, hamPCR accurately captured the changing bacterial loads of the gut microbes ( We similarly validated the technique for fungal and bacterial microbes of Triticum aestivum (bread wheat), because it is the most widely grown crop in the world yet one of the most difficult to study due to being hexaploid and having a 16 Gb haploid genome (Appels et al., 2018). To simulate different levels of infection, we titrated DNA from axenically-grown wheat leaves into DNA from wheat roots that had been cultivated in non-sterile soil and applied hamPCR, using as a host gene RNA polymerase A1 (PolA1), which is present as a single copy in each of the A, B, and D subgenomes (Rai et al., 2012). We recovered expected abundance patterns in the panel both for ITS1 rDNA primers ( . We noticed the original ratio of ITS1 to PolA1 sequences recovered was low; because ITS primers produce amplicons that are highly variable in length, some of which may co-migrate with the host amplicon on a gel, the cut-and-mix approach described in Figure 3 could not be used to improve ITS1 representation. However, increasing the ratio of the ITS1:PolA1 HM-tagging primers from 1:1 to 2:1 (Materials and methods) successfully enriched the ITS1 amplicon without sacrificing relative load determination between samples ( Figure 5-figure supplements 3  and 4).
To go beyond model organisms and controlled titrations and to demonstrate the ability of hamPCR to yield biological insights into complex study systems, we conducted two additional experiments with crop plants. First, we set up a growth curve in bell pepper (Capsicum annuum), which has a 3.5 Gb genome (Kim et al., 2014), approximately 25Â larger than A. thaliana, and the pepper pathogenic bacterium Xanthomonas euvesicatoria (Xe) strain 85-10 (Thieme et al., 2005). As proof-of-concept preparation for the growth curve, to confirm that hamPCR could accurately capture absolute changes in pathogen abundance in pepper leaves, we constructed an infiltration panel in which Xe 85-10 was diluted to final concentrations of 10 4 , 10 5 , 10 6 , 10 7 , and 10 8 CFU / mL and infiltrated into four replicate leaves per concentration. Immediately afterwards, without further bacterial growth, we harvested leaf discs within inoculated areas using a cork borer. We ground the discs and used some of the lysate for Xe 85-10 CFU counting, and the remainder for DNA extraction and hamPCR targeting the V4 16S rDNA and the pepper GI gene (CaGI), and qPCR, targeting the xopQ gene for a Xe type III effector (Doddaraju et al., 2019), and the C. annuum UBI-3 gene for a ubiquitin-conjugating protein (Wan et al., 2011).
Sequencing the hamPCR libraries revealed that as the Xe 85-10 infiltration concentration increased, so did the resulting load of the ASV corresponding to Xe 85-10 ( When we scaled hamPCR and qPCR values to fit the range of CFU counts recovered from the same lysates (Materials and methods), the hamPCR and qPCR Xe 85-10 ASV loads showed nearly the same exponential differences between samples, although at lower infiltration concentrations, qPCR and hamPCR gave a slightly higher estimate than CFU counts ( Figure 6-figure supplement 2c). The presence of a low level of native, antibiotic-sensitive Xe on the leaves could potentially explain this discrepancy, because this could be detected by DNA-based methods but not culturing.
For the pepper growth-curve, we infiltrated six C. annuum leaves of six different plants with Xe 85-10 at a concentration of 10 4 CFU / mL, and took samples from each plant at 0, 2, 4, 7, 9, and 11 dpi for CFU counting, qPCR, and hamPCR. We observed a rapid increase in Xe 85-10 ASV abundance as a result of rapid bacterial growth, leveling off at 7 dpi ( Figure 6a). By 7 dpi, bacterial growth had reduced the host GI amplicon abundances for most samples to levels at which load calculations become less reliable at their sequencing depth (as defined in Appendix 1-figure 3d); this was also the case at 9 and 11 dpi (gray box, Figure 6a). Scaled Xe 85-10 ASV loads compared very closely to CFU counts and to scaled qPCR abundances up to 7 dpi ( Figure 6c). Notably, the other major bacterial classes, Actinobacteria and the Alpha-, Beta-, and Gammaproteobacteria, also increased in microbial load through time, a trend significant even comparing 2 dpi to 0 dpi (Figure 6b, Mann-Whitney U-test, p < 0.001). This increase in load for the other classes was not a PCR artifact due to high Xe 85-10 titers, because in the infiltration panel, measurements for these classes had not changed even at higher pathogen concentrations ( Figure 6-figure supplement 2b). This subtle but biologically significant effect of infection on growth of commensal bacteria would be completely invisible in a pure 16S rDNA amplicon analysis, which would only show Xe 85-10 overtaking the community.
Finally, we applied hamPCR to DNA from 201 leaf samples from mature, isogenic maize (B73) growing in a field site in Tü bingen, Germany. We used the V4 region of 16S rDNA for bacteria and the single copy LUMINIDEPENDENS (LD) gene as a host marker, and plotted both the relative abundance of bacterial genera ( Figure 6d) and the bacterial load of these genera ( Figure 6e). In some samples, the genus Sphingomonas exceeded 80% of the bacterial community, creating especially strong compositionality effects; other abundant genera Perlucidibaca, Limnohabitans, and Acidovorax visibly increased in relative abundance as Sphingomonas became less abundant (Figure 6d). In contrast, the bacteria load of these same genera appeared mostly unaffected by Sphingomonas bacterial load ( Figure 6e). As expected, a Pearson correlation network made with relative abundance data revealed that Sphingomonas was negatively correlated with many genera, a well-known and problematic artifact of compositionality (Friedman and Alm, 2012;Figure 6f). A Pearson correlation network made with microbial load data was remarkable in that Sphingomonas, despite having the highest median abundance of any genus, is among the genera least correlated with others ( Figure 6f). We also calculated a correlation network using SparCC (Friedman and Alm, 2012), which estimates Pearson correlations on log-transformed components to avoid compositionality artifacts. This network did indeed avoid the spurious negative correlations with Sphingomonas, although it still implicated the genus more strongly than the correlation network built with hamPCR data. Each network has a very different biological interpretation, with negative Sphingomonas correlations implying that the genus as a whole can greatly influence the colonization of other microbes. The weak connectedness of Sphingomonas in the microbial load data does not necessarily imply that Sphingomonas strains do not influence colonization of other microbes, but rather that such effects, if they exist, can not be inferred from the abundance of the genus as a whole. Future study will be necessary to resolve these issues. Overrepresentation of a few abundant organisms is a feature shared by most ecological communities (McGill et al., 2007), including microbial communities (Zhou et al., 2013); overcoming this compositionality problem is broadly relevant to studies of hostassociated microbiomes.

Discussion
We developed hamPCR, a simple and robust method to quantitatively co-amplify one or more microbial marker genes along with an unrelated host gene, allowing accurate determination of microbial load and microbial community composition from a single sequencing library (Figure 1, Figure 2). Furthermore, we developed a method to predictably optimize the amount of sequencing effort devoted to microbe vs. host, without losing information about the original microbe to host ratio ( Figure 3). This is an important advance in our approach that greatly increases cost-efficiency.
The principle behind hamPCR stands on a body of literature describing related, firmly established techniques, which bodes well for wide-spread adoption of our approach. Using two steps in a PCR protocol is common in amplicon sequencing, including of microbial marker genes (de Muinck et al., 2017;Gohl et al., 2016;Holm et al., 2019;Lundberg et al., 2013;Symeonidi et al., 2020). Twostep PCR protocols provide the major advantage that only a one-time investment is needed in a set of universal barcoding primers for a flexible step two. These can be easily adapted to any amplicon (s) by simply swapping in different template-specific primers for step one. For labs already equipped for two-step PCR, implementing hamPCR involves only slight adjustments to cycling conditions and template-specific HM-tagging primers (Appendix 1 -Discussion 1). Whiskers extend from the box up to 1.5 times the interquartile range. (a) Xe 85-10 was inoculated into C. annuum leaves at 10 4 CFU/mL. Leaf samples were taken at 0, 2, 4, 7, 9, and 11 days post inoculation (dpi), and hamPCR performed. The corrected load is shown for the particular ASV corresponding to Xe 85-10, as well as for the major bacterial classes. (b) The total load for all bacterial classes shown in a at 0, 2, and 4 dpi (***p<0.001, Mann-Whitney U-test). (c) Actual CFU counts for Xe 85-10 in the growth curve experiment juxtaposed with scaled qPCR and hamPCR loads. (d-e) Fieldgrown Z. mays collection. (d) Relative abundance (RA) of bacterial genera found in Z. mays leaf hole punches, ordered by Sphingomonas relative abundance. The genera corresponding to the first 15 colors from bottom to top are shown in reverse order in the legend. The relative abundance of four isolated genera is highlighted (colored boxes in legend). (e) Same as (d) but showing microbial load rather than relative abundance and ordered by  Quantitative co-amplification using multiple primer pairs also has proven reliable (Carlson et al., 2013;Weller et al., 2000;Wen and Zhang, 2012), and PCR biases affecting co-amplification of diverse fragments are manageable and well-understood from popular RNA-seq and WMS protocols (Bowers et al., 2015;Rinke et al., 2016). This rich literature should increase confidence when implementing hamPCR in microbiome research, and it also provides resources for optimization and further development. For example, the use of fewer cycles in exponential PCR could reduce noise and bias, hamPCR HM-tagging primers could be fitted with unique molecular identifiers (UMIs) for higher precision, and the protocol could be adapted for sequencing platforms with longer read lengths.
We have demonstrated that microbial load measurement is sensitive to the relative concentrations between the host and microbe primers in the HM-tagging step (Appendix 1-figure 4), consistent with the effects of primer concentration on amplification efficiency in qPCR (Bustin and Huggett, 2017;Pierce et al., 2005;Sanchez et al., 2004). This property makes it possible to finetune the primer ratios, either to yield the expected ratio of products (Carlson et al., 2013), or to intentionally increase the representation of a microbial amplicon for more efficient sequencing (Figure 5-figure supplement 4a-d). The effect of primer concentration has important implications for how a large project should be prepared. We recommend that the HM-tagging primers be carefully pipetted into a multiplexed primer master mix sufficiently large to be used for the entire project, or alternatively the same control samples should be sequenced across sample batches to allow correction of slight batch differences.
Because hamPCR can only quantify the DNA available in the template, choice of sample and appropriate DNA extraction methods are very important. In particular, the sample must in the first place include a meaningful quantity of host DNA. For example, although there is some host DNA in mammalian fecal samples or in plant rhizosphere soil samples, this host DNA does not accurately represent the sample volume, and therefore relating microbial abundance to host abundance probably has less value in these cases. Further, the DNA extraction method chosen must lyse both the host and microbial cell types. An enzymatic lysis suitable for DNA extraction from pure cultures of E. coli may not lyse host cells or even other microbes. Appropriate DNA preparation methods for metagenomics have been thoroughly evaluated elsewhere (Albertsen et al., 2015;Yuan et al., 2012), and a common point of agreement is that strong bead-beating increases the yield and completeness of the DNA extraction, but comes at the cost of some DNA fragmentation. Especially for short reads, as we have used here, this fragmentation is not a problem, and we recommend to err on the side of a harsher lysis, using strong bead beating potentially preceded by grinding steps using a mortar and pestle as necessary for tougher tissue.
A limitation of hamPCR is reduced accuracy at the highest microbial loads (Appendix 1- figure  3). Only a minority of our samples reached a level of infection that interfered with accurate quantification, and we expect that this will be the case for most colonized hosts. If not, there are three straightforward adjustments that can increase host signal to acceptable levels. First, altering the host and microbe amplicon ratio in the pooled library prior to sequencing, as demonstrated in Figure 3, could be used to increase the overall host representation. Second, a host gene with a higher copy number could be chosen for HM-tagging throughout the entire project, which would increase host representation by a factor of that copy number (Kembel et al., 2012). Finally, adjusting the concentration of the host primers in the HM-tagging reaction could also increase the representation of host (Appendix 1-figure 4; Carlson et al., 2013). hamPCR is best suited to comparing microbial loads within an experimental system using consistent host and microbe primers, because different microbe primer pairs can differ in their amplification efficiency and therefore yield different load measurements on the same template DNA (Figure 4). Additionally, researchers may decide to adjust primer ratios for more efficient sequencing, which can also change the calculated load. However, looking toward future best practices, this lack of cross-comparability can be overcome by including a simple standard curve. This can be prepared similarly to our synthetic titration panel in Figure 1 which used pure plant DNA mixed with bacterial DNA. Alternatively, a pure host amplicon can be mixed with a pure microbial amplicon at different known ratios. By using hamPCR to sequence standards along with experimental samples, the measured microbial loads can be correlated with known 'standardized' load ratios, which in turn can be compared across primer sets and labs.
In summary, we have demonstrated that hamPCR is agnostic to the taxonomic identities of the organisms studied on both the host and microbe side, their genome sizes, or the functions of the regions amplified. We have also shown that hamPCR can monitor three amplicons at the same time for interkingdom microbial quantification, and in principle can multiplex more with careful design. Our focus here has been on tracking hosts and their closely-associated microbes, but the protocol could also be adapted to quantitatively relate different amplicons targeting archaea, bacteria, and fungi in diverse 'host-free' environments like soil. Besides whole organisms, hamPCR also enables quantitative monitoring of bacterial populations and sub-genomic elements, such as plasmids or pathogenicity islands that might not be shared by all strains in a population. An exciting application of hamPCR is the study of endophytic microbial colonization and infection in crop plants, many of which have very large genomes that preclude the analysis of any sizable number of samples by WMS. In a previous study, we sequenced leaf metagenomes from over 200 A. thaliana plants, at not insignificant costs . In wheat, assuming comparable microbial loads, the same investment in sequencing would barely be sufficient for two samples due to the size of the wheat genome of over 16 Gb. WMS of the >200 samples we processed of field-grown maize likewise would be prohibitively expensive, and supplementing these data with an orthogonal method on this scale requires at least double the number of samples and double the experimental time.
Other exciting applications are the recognition of cryptic infections (Stergiopoulos and Gordon, 2014), tracking of mixed infections, and measurement of pathogen abundances on hosts showing quantitative disease resistance -this could even be accomplished by spiking hamPCR amplicons into the same sequencing run used to genotype the hosts (St Clair, 2010). In sparsely colonized samples, hamPCR will help prevent inflating the abundance of ultra-low abundance microbes, such as reagent contaminants. Finally, for projects with many samples, the fact that hamPCR derives microbial composition and load from the same library not only saves costs and uses less of the sample, but also simplifies analysis and project organization. hamPCR protocol hamPCR requires two steps: a short 'HM-tagging' reaction of two cycles, and a longer 'exponential' reaction. We used 30 cycles throughout this work, although fewer can and should be used if the signal is clear for better quantitative results. The primers employed in the HM-tagging reaction were used at ⅛ the concentration of the exponential primers, as this still represents an excess in a reaction run for only two cycles, prevents waste, and reduces dimer formation. See Appendix 1 and Supplementary file 1 for detailed information about the primers. In particular, Appendix 1 -Discussion 1 provides guidance on adapting HM-tagging primers to other two-step PCR protocols, and Appendix 1 -Discussion 3 discusses the design of new HM-tagging primers.

HM-tagging reaction
We used Taq DNA Polymerase (NEB, Ipswich, MA, USA) for the first HM-tagging step, and set up 25 mL reactions as follows. Each well received 20 mL of master mix and 5 mL of DNA (around 50 ng). Completed reactions were thoroughly mixed on a plate vortex and placed into a preheated thermocycler. We used the following standard cycling conditions: 1. 94˚C for 2 min. Denature 2. 78˚C for 10 s. PNA annealing 3. 58˚C for 15 s. Primer annealing 4. 55˚C for 15 s. Primer annealing 5. 72˚C for 1 min. Extension 6. GO TO STEP 1 for 1 additional cycle 7. 16˚C forever Hold The HM-tagging reaction was cleaned with Solid Phase Reversible Immobilization (SPRI) beads (Rohland and Reich, 2012). All ITS amplicons were cleaned with a 1.1:1 ratio of SPRI beads to DNA, or 27.5 mL beads mixed in 25 mL of tagged template. After securing beads and DNA to a magnet and removing the supernatant containing primers and small fragments, beads were washed twice with 80% ethanol, air dried briefly, and eluted in 17 mL of water. For primer sequences, see Appendix 1 and Supplementary file 1.

Exponential reaction
Of the tagged DNA from step one, 15 mL was used as template for the exponential reaction. To reduce errors during the exponential phase, we used the proof-reading enzyme Q5 from NEB, with its included buffer. We prepared reactions in 25 mL for technical tests with replicated samples. For samples prepared without sequenced replicates, we prepared most in triplicate reactions in which a 40 mL mix was split into three parallel reactions of~13 mL prior to PCR to reduce bias, although this is likely unnecessary .

Master Mix
For 25  We first distributed 8.75 mL (or 23 mL for 40 mL mixes) of master mix to each well. We then added 15 mL of the DNA from the HM-tagging reaction and 1.25 mL (or 2 mL for 40 mL mixes) of 5 mM barcoded reverse primer. For the 40 mL mixes, 13 mL was pipetted into two new PCR wells. The PCR reactions were placed into a hot thermocycler and cycled with the following standard conditions: 1. 94˚C for 2 min. Denature 2. 94˚C for 20 s. Denature 3. 78˚C for 5 s. PNA annealing 4. 60˚C for 30 s. Primer annealing 5. 72˚C for 45 s. Extension 6. GO TO STEP 2 for 29 additional cycles 7. 16˚C forever Hold Following PCR, sets of three 13 mL reactions were recombined to 40 mL. For primer sequences, see Appendix 1 and Supplementary file 1.

Library quality control and pooling
For visualization, 5 mL of PCR product was mixed with 3 mL of 6x loading dye and all 8 mL loaded on a 2% agarose gel and stained with ethidium bromide. The remaining PCR products were cleaned with a SPRI-to-DNA ratio of 1.1:1.0 (v/v). The DNA concentrations in the cleaned products were measured with PicoGreen (Invitrogen, Carlsbad, CA, USA) and samples were pooled at equimolar total DNA ratios. We note that because host and microbial fractions are independently visible on the gel, it would also be possible to measure the quantity of microbial products with image analysis software such as ImageJ (Rueden et al., 2017) and pool at equimolar microbial ratios.
The pooled library was diluted to~1 ng/mL and run on a Bioanalyzer High Sensitivity DNA chip (Agilent, Santa Clara, CA, USA) to check library purity and to estimate the expected ratio of host to microbial amplicons in the sample.

Pre-sequencing adjustment of host: microbe ratio
To adjust the host-to-microbe ratio in the 'synthetic template panel' and 'cycle number test' prior to sequencing on a HiSeq3000 instrument (Illumina, San Diego, CA, USA), four reference samples were first made by rebarcoding the original pooled library (Figure 3-figure supplements 1 and 2). To accomplish this,~5 ng of of the pooled library was used in a 30 mL PCR reaction as follows:

Master Mix
For 30  After distributing 23.5 mL of master mix to each well, 5 mL of the diluted original library was added to each well (5 ng total), along with 1.5 mL of 5 mM barcoded reverse primer. Just prior to placing the reactions in the thermocycler, a 5 mL pre-PCR aliquot was removed from each one and kept on ice to preserve the pre-PCR concentrations. The remaining 25 mL reaction was placed into a preheated thermocycler and run for eight cycles, using the following cycling conditions: 1. 94˚C for 2 min. Denature 2. 94˚C for 30 s. Denature 3. 78˚C for 5 s. PNA annealing 4. 60˚C for 1 min. Primer annealing 5. 72˚C for 1 min. Extension 6. GO TO STEP 2 for seven additional cycles 7. 16˚C forever Hold Following PCR, the pre-PCR aliquots were run alongside 5 mL of post-PCR product on a 2% gel to confirm successful amplification of the reference libraries (Figure 3-figure supplement 2). The remaining 20 mL of PCR reactions were then cleaned with SPRI beads (1.5: 1.0 [v/v]) and set aside. A large aliquot of the original library (approximately 50 ng) was also run on a 2% gel to separate the host and microbe bands for individual purification. The bands were cut out of the gel and each band was put into a separate Econospin spin column (Epoch Life Sciences, Missouri City, TX, USA) without any other liquids or binding buffer. The gel slices were centrifuged at maximum speed to force the liquid containing the DNA into the bottom chamber, leaving the dried gel on top. The eluted DNA was cleaned with SPRI beads at 1.5: 1.0 (v/v) and eluted in EB.
The purified pooled host library fraction, pooled microbe library fraction, and each of the four reference libraries were quantified with Picogreen and the molarity of each was estimated. The pools were then mixed together, targeting host molarity at 5% of the total and each reference library at 1% of the total.

Illumina sequencing
Pooled and quality-checked sequencing libraries were cleaned of all remaining dimers and off-target fragments using a BluePippin (Sage Science, Beverly, MA, USA) set to a broad range of 280 to 720 bp. The libraries were then diluted for Illumina sequencing following manufacturers' protocols. Libraries were first diluted to 2.5-2.8 nM in elution buffer (EB, 10 mM Tris pH 8.0) and spiked into a compatible lane of the HiSeq3000 instrument (2 x 150 bp paired end reads) to occupy 2-3% of the lane. Samples were sequenced across four total lanes (Supplementary file 1).

Sequence processing
The sequences were demultiplexed first by the 9 bp barcode on the PCR primers (Supplementary file 1), of which there are 96, not allowing for any mismatches. In some cases in which two samples differed in both their host and microbe primer sets, we amplified both samples with the same 9 bp barcode to increase multiplexing; such samples were further demultiplexed using regular expressions for the forward primer and reverse primer sequences. Following demultiplexing, all samples were filtered to remove sequences with any mismatches to the expected primers. With HiSeq3000 150 bp read lengths, overlap of read 1 and read 2 was not possible for our amplicons, and therefore only read 1 was processed further.
All primer sequences were removed. Additional quality filtering, removal of chimeric sequences, preparation and Amplicon Sequence Variant (ASV) tables, and taxonomic assignment were done with a combination of VSEARCH (Rognes et al., 2016) and USEARCH10 (Edgar, 2010). ASVs were prepared as 'zero-radius OTUs' (zOTUs) (Edgar, 2010). The 16S rDNA taxonomy was classified based on the RDP training set v16 (13 k seqs.) (Cole et al., 2014), and ITS1 taxonomy of the top 10 most abundant fungal ASVs was classified manually using the UNITE database (Nilsson et al., 2019) (https://unite.ut.ee/). To reduce memory usage, data from the five lanes was processed into four independent ASV tables (Supplementary Data), as described in the sample metadata (Supplementary file 1).

Synthetic titration panel
Seeds from the Arabidopsis thaliana accession Col-0 were surface sterilized by immersion for 1 min in 70% ethanol with 0.1% Triton X-100, soaking in 10% household bleach for 12 min, and washing three times with sterile water. Seeds were germinated axenically on ½ strength MS media with MES, and about 2 g of seedlings were harvested after 10 days. DNA was extracted in the sterile hood as in Regalado et al., 2020 and diluted to 10 ng/mL in elution buffer (10 mM Tris pH 8.0, hereafter EB). Pure E. coli and Sphingomonas sp. cultures were likewise grown with LB liquid and solid media respectively, and DNA was extracted using a bead beating protocol . E. coli DNA was used separately, or alternatively pooled with the mixed Sphingomonas DNA, and diluted to 10 ng/mL. The plant DNA and microbial DNA were then combined according to the following

Synthetic equimolar plasmid template
The ITS1 region from Agaricus bisporus, a fragment of the GI gene from A. thaliana Col-0 accession, the 16S rRNA gene from Pst DC3000, and the ITS1 region from H. arabidopsidis were PCR amplified individually, combined into one fragment via overlap extension PCR, and cloned into pGEM-T Easy (Promega, Madison, WI, USA). The sequences of these templates can be found in Appendix 1 -Discussion 2.
Wild A. thaliana samples DNA from chosen samples previously analyzed by conventional 16S rDNA-only sequencing of the V4 region and WMS  was reused, chosen to capture a wide range of realistic bacterial loads. The samples were individually assayed with hamPCR using 5 mL DNA template (approximately 50 ng). V4 HM-tagging was performed with 515_F1_G-46603 and 806_R1_G-46631 (V4 16S rDNA) and At.GI_F1_G-46602 and At.GI_R502bp_G-46614 (502 bp A. thaliana GI). These were the only V4 samples tagged with the 806R primer instead of the nearby 799R primer, and it was used to enable direct comparison to the dataset in Regalado et al., 2020. V3V4 HM-tagging was performed with 341_F1_G-46605 and 799_R1_G-46601 (V3V4 16S rDNA) and At.GI_F1_G-46602 and At.GI_R502bp_G-46614 (502 bp A. thaliana GI). V5V6V7 HM-tagging was performed with 799_F1_G-46628 and 1192_R1_G-46629 (V5V6V7 16S rDNA) and At.GI_F1_G-46602 and At. . Each exponential PCR reaction was completed in a single reaction of 25 mL; each sample was replicated three times.
Wild A. thaliana mixed sample DNA from samples previously analyzed by conventional 16S rDNA-only sequencing of the V4 region and WMS  were pooled to prepare a single abundant mixed sample to be used repeatedly for technical tests.
The infected plants were grown at 16˚C for 8 days (10 hr light, 14 hr dark) and harvested by pooling all seedlings in each pot into a sterile pre-weighed tube, which was again weighed to find the mass of the seedlings. Three 5 mm glass balls and 300 mL 10 mM MgCl 2 were added to each tube and the plant cells were lysed at a speed of 4.0 m/s for 20 s in a FastPrep-24 Instrument (MP Biomedicals, Illkirch-Graffenstaden, France) to release the live bacteria from the leaves. From the pure lysate, 20 mL was used for a serial log dilution series, and 5 mL of each dilution was plated on LB agar supplemented with 100 mg/mL rifampicin. Colony-forming units (CFUs) were counted after 2 days of incubation at 28˚C. The remaining 280 mL of lysate were combined with 520 mL DNA lysis buffer, 0.5 mL of 1 mm garnet sharp particles (BioSpec, Bartlesville, OK, USA). Of 20% SDS, 60 mL was added to make a final SDS concentration of 1.5%, and DNA was extracted using a bead beating protocol . The number of Hpa sporangiophores was too high to be accurately quantified by visual counting.

Titration with plant DNA infected with H. arabidopsidis
A titration panel was made combining different amounts of DNA from uninfected plants (eds1-1 treated only with 10 mM MgCl 2 ) and DNA from Hpa-infected plants (eds1-1 infected with Hpa as described above). Infected and uninfected pools were each diluted to 6 ng/mL, and combined in 0:7, 1:6, 2:5, 3:4, 4:3, 5:2, 6:1, and 7:0 ratios. These were tagged using the same three primer sets described above for Hpa actin, A. thaliana GI, and V4 16S rDNA above. Each exponential PCR reaction was completed in a single reaction of 25 mL; hamPCR was replicated on the titration three times.

Capsicum annuum infections with Xanthomonas Leaf infiltration log series
Using pressure infiltration with a blunt-end syringe, C. annuum cultivar Early Calwonder (ECW) leaves were inoculated with Xanthomonas euvesicatoria (Xe). Xe strain 85-10 (Thieme et al., 2005) was resuspended in 10 mM MgCl 2 to final concentration of 10 8 CFU / mL (OD 600 =0.4) and further diluted to 10 7 , 10 6 , 10 5 , and 10 4 CFU / mL. Upon infiltration, five leaf discs (7 mm diameter) were punched from each leaf per sample and placed in a 2 mL round-bottom tube with two SiLibeads (type ZY-S 2.7-3.3 mm, Sigmund Lindner GmbH, Warmensteinach, Germany) and 300 mL 10 mM MgCl 2 . The samples were ground by bead beating for 25 s at 25 Hz using a Tissue Lyser II machine (Qiagen, Hilden, Germany). For CFU-based bacterial enumeration, 30 mL of the lysate or 30 mL of serial dilutions were plated on NYG medium (0.5% peptone, 0.3% yeast extract, 0.2% glycerol and 1.5% agar [Daniels et al., 1984] containing rifampicin 100 mg/ml). Xe bacteria were counted 3 days post incubation at 28˚C. The remaining 250 mL of lysate was combined with 600 mL of DNA lysis buffer containing 2.1% SDS (for a 1.5% final SDS concentration) and transferred to screw cap tubes filled with 1 mm garnet sharp particles, for a bead-beating DNA prep as previously described .
Growth curve: Xe strain 85-10, resuspended in 10 mM MgCl 2 to a final concentration of 10 4 CFU / mL, was infiltrated via a blunt end syringe into 6 C. annuum (ECW) leaves of six different plants. Upon 0,2,4,7,9,and 11 dpi (days post inoculation) four leaf discs (7 mm diameter) from each inoculated leaf were harvested and bacterial numbers were determined as described above. Of leaf lysates, 250 mL were used for a bead-beating DNA prep as described for all other samples above.

Pristionchus pacificus titration panel
Pristionchus pacificus strain PS312 (Dieterich et al., 2008) was grown on nematode growth media (NGM) plates supporting a bacterial lawn of either pure E. coli OP50 or alternatively a mix of E. coli OP50, Pst DC3000, and Lysinibacillus xylanilyticus (a strain isolated from wild P. pacificus). The worms were washed extensively with PBS buffer to remove epidermally-attached bacteria, and DNA was prepared from whole worms using the same bead beating protocol as described for A. thaliana . Worm DNA from the pure culture and the mixed culture were each diluted to 6 ng/mL, and combined in 0:7, 1:6, 2:5, 3:4, 4:3, 5:2, 6:1, and 7:0 ratios to create a titration panel. Each hamPCR template (5 mL template or 30 ng total) was used to perform the HM-tagging reaction, using primers 799F1_G-46628 and 1192R1_G-46629 (V5V6V7 16S rDNA), and Pp_csq-1_F1_G-46691 and Pp_csq-1_R1_G-46692 (P. pacificus csq-1). Each exponential PCR reaction was completed in a single reaction of 25 mL; the titration was replicated three times.

Triticum aestivum titration panel
Triticum aestivum (wheat) seeds (Rapunzel Naturkost, Legau, Germany) were surface-sterilized by immersion in 70% ethanol and 0.1% Triton X-100 for 1 min, soaking for 15 min in 10% household bleach, and finally washing three times in sterile autoclaved water. Axenic plants were grown on 1% agar supplemented with 1/2 strength MS medium buffered with MES. About 1 g of sterile leaf tissue was harvested after 10 days, and DNA was extracted in the sterile hood as described in ref. . Roots that had been spontaneously colonized by microbes were obtained by growing by transplanting germinated seeds outdoors into potting soil. Roots were harvested from approximately 4-week-old plants and surface-sterilized by immersion in 10% household bleach with 0.1% Triton X-100 for 5 min, followed by three washes with sterile water. Axenic leaf DNA and spontaneously colonized root DNA were each diluted to 60 ng/mL and combined in 0:7, 1:6, 2:5, 3:4, 4:3, 5:2, 6:1, and 7:0 ratios to create a titration panel of eight samples. Each hamPCR HM-tagging reaction used 3 mL (~180 ng) template; fungal ITS1 HM-tagging was performed with primers ITS1_F1_G-46622 and ITS2_R1_G-46623 (ITS1 rDNA), and PolA1_F1_G-46750 and PolA1_R1_G-46751 (T. aestivum RNA polymerase one gene, PolA1). Bacterial 16S rDNA HM-tagging was performed with the same PolA1 primers and with 515_F1_G-46603 and 799_R1_G-46601 (V4 16S rDNA). To make an additional ITS1 library enriched for ITS1 amplicons, the ITS1 primer pair concentration was increased by a factor of 1.33 and the PolA1 primer pair concentration was decreased by a factor of 0.66, giving a 2:1 ratio instead of the standard 1:1 ratio, and the tagged products were amplified with 7 HM-tagging and 25 PCR cycles instead of the standard 2 HM-tagging and 30 PCR cycles.

Zea mays field samples
Samples of leaves from mature Zea mays (maize) genotype B73 were harvested by standard hole punch from a field side in Tü bingen. Permission to punch the leaves was graciously provided by Dr. Marja Timmermans (University of Tü bingen). Each sample comprised five leaf discs, which were immediately shaken in 1 mL of sterile water in a screw cap tube to remove dust from the field. The water was removed by pipetting and the leaf discs were snap frozen in liquid nitrogen and taken back to the lab for processing. DNA was extracted with the bead beating protocol described above, with the difference that prior to addition of lysis buffer and garnet rocks, the deep frozen leaf discs were pre-ground with three metal ball bearings at a speed of 5.0 m/s in a FastPrep-24 instrument. We found this pre-grind was helpful to break down the fibrous maize leaf tissue. Prior to adding garnet rocks and lysis buffer, the metal balls were removed by magnet, as metal balls can crack the tubes at the speed of 6.0 m/s used for the primary DNA extraction. Each hamPCR HM-tagging reaction used 10 mL (~120 ng) template; Bacterial 16S rDNA was tagged with one of the forward primers 515F_bcGA_G-47188, 515F_bcTC_G-47189, 515F_bcAG_G-47190, or 515_F3_G-46694 paired with the reverse primer 799_R1_G-46601 (V4 16S rDNA). Maize LUMINIDEPENDENS (LD) was tagged with one of the forward LDP1 primers Zm_LD_bcGA_G-47184, Zm_LD_bcTC_G-47185, Zm_LD_bcAG_G-47186, or Zm_LD_bcCT_G-47187 paired with the LD reverse primer Zm_LD_R_G-47158. Two HM-tagging cycles were paired with 30 exponential cycles. To reduce host representation in the final library from the original~75% to approximately 40%, we used the gel remixing technique described in Figure 3.

Test of HM-tagging step cycle numbers
As templates, we used a pool of mixed wild A. thaliana leaf DNA (~50 ng / reaction) and the 'synthetic equimolar plasmid template' (~0.05 ng / reaction, Appendix 1 -Discussion 2). For the wild A. thaliana leaf DNA, we tested V4 16S rDNA primers alone in the HM-tagging step vs. hamPCR with V4 16S rDNA primers plus primers for the host GI gene. For the 'synthetic equimolar plasmid template', we used only hamPCR. Specifically, we used 515_F1_G-46603 and 799_R1_G-46601 (V4 16S rDNA) and At.GI_F1_G-46602 and At.GI_R502bp_G-46614 (502 bp GI gene).

Tests of template concentrations
A panel of eight concentrations of wild A. thaliana leaf DNA was prepared, ranging from 5 to 500 ng per reaction. Primers for the wild A. thaliana leaf DNA were 515_F1_G-46603 and 799_R1_G-46601 (V4 16S rDNA) and At.GI_F1_G-46602 and At.GI_R502bp_G-46614 (502 bp GIGANTEA gene), with both primer pairs in equal ratio.
Quantitative real-time PCR on C. annuum samples A primer set targeting the gene for the type III effector XopQ of pathogenic Xanthomonas was used to measure abundance of Xe 85-10 (Doddaraju et al., 2019). For C. annuum, primers targeting the UBI-3 gene encoding a ubiquitin-conjugating protein were used (Wan et al., 2011). Two reagent mastermixes were prepared, one for each primer set, to help improve primer dose consistency. Each sample was amplified using three 10 mL technical replicates per primer set that were averaged for analysis. Each 10 mL reaction included 2.5 mL of DNA, to which was added, as a mastermix, 5 mL SYBR Green PCR Master Mix (Life Technologies, Carlsbad, California), 1.5 mL water, 0.5 mL of 5 mM forward primer, and 0.5 mL of 5 mM reverse primer. qPCR was performed on a BioRad CFX384 Real- The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication. . Transparent reporting form Data availability All data in this manuscript have been deposited in the European Nucleotide Archive (ENA) under the project number PRJEB38287.

Author contributions
The following dataset was generated:

Primers: Microbe tagging
Standard desalting is sufficient purification for the HM-tagging primers.
V4 region (16S rDNA), 515F -799 R / 515F -806R,~414 bp final amplicon Underlined letters represent the universal overhang. 799F is commonly used in combination with 1192R because it avoids plant plastids by mismatch. Shown here is the reverse complement of that primer. Used in combination with PNAs (Lundberg et al., 2013), it avoids nearly all plastids.
Bracketed letters represent 'linkers' -sequences designed to anneal to as few 16S rDNA molecules as possible, and ensure that only the gene-specific region binds to the template.
Lowercase letters are template-specific. Template-specific primer sequences for 515F and 806R are the same as used in the Earth Microbiome Project Thompson et al., 2017 andLundberg et al., 2013. Linker sequences for 515F and 806R are as in Lundberg et al., 2013. Template-specific primer sequence for 799F as in Agler et al., 2016b. 'F1', 'F3', 'bcGA', 'bcTC', and 'bcAG' forward primers perform identically but each introduces different sequenced bases which serve as an additional barcode. See Lundberg et al., 2013. FORWARD: 799F is commonly used in combination with 1192R because it avoids plant plastids by mismatch. Shown here is the reverse complement of that primer. Used in combination with PNAs (Lundberg et al., 2013), it avoids nearly all plastids.
Bracketed letters represent 'linkers' -sequences designed to anneal to as few 16S rDNA molecules as possible, and ensure that only the gene-specific region binds to the template.
Lowercase letters are template-specific. Template-specific primer sequence for 341F is as in Agler et al., 2016b. FORWARD:
Bracketed letters represent 'linkers' -sequences designed to anneal to as few 16S rDNA molecules as possible, and ensure that only the gene-specific region binds to the template.
Lowercase letters are template-specific.
Bracketed letters represent 'linkers' -sequences designed to anneal to as few ITS molecules as possible, and ensure that only the gene-specific region binds to the template.
Lowercase letters are template-specific. Template-specific primer sequences and linker sequences are as used in the Earth Microbiome Project (Thompson et al., 2017). Lowercase letters are template-specific. The template-specific primer sequences were chosen for this study; the choice of the Actin gene was based on (Anderson and McDowell, 2015). Lowercase letters are template-specific. Here, the template-specific portion extends into the universal overhang for the At.GI_F1_G-46602 primer.
'F1' and 'F3' forward primers perform identically but F3 introduces a 'CT' causing a frameshift that can also serve as an additional barcode. See (Lundberg et al., 2013). Here, the additional 'CT' extends into the universal overhang for the At.GI_F3_G-46640 primer.

Discussion 1: Note on Adapting HM-tagging Sets for Other Protocols
Simply changing the universal overhangs and adjusting annealing temperatures accordingly may be sufficient to use hamPCR with other two-step PCR protocols. The following are starting suggestions only, and have not been tested experimentally.
To adapt a forward HM-tagging primer to that in Gohl et al., 2016, replace the upper-case, unbracketed letters in the forward primer for that template (shown in orange in the example below) with 'TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG'.
For example, for 'At.GI_F1_G-46602' for A. thaliana GI, 'TCCCTACACGACGCTCTTCCGATct gtaaagataaatgggtcatctaa' would become: 'TCGTCGGCAGCG TCAGATGTGTATAAGAGACAG ctgtaaagataaatgggtcatctaa' To adapt a reverse HM-tagging primer to that in Gohl et al., 2016, replace the upper-case, unbracketed letters (shown in orange in the example below) in the reverse primer for that template with: 'GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG'.
For example, for 'At.GI_R502bp_G-46614' for A. thaliana GI, 'GGAGTTCAGACGTGTGCTCTTCCGATCT tccttctgaaccggtgtattc' (this work) would become: 'GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG tccttctgaaccggtgtattc'. Rather than adapt the forward HM-tagging primers to work with the forward PCR primer in Lundberg et al., 2013, we recommend using the forward primer here. The reverse HM-tagging primers can be used directly with the reverse PCR primers in Lundberg et al., 2013. Primers: Exponential PCR For the full PCR primer set, please see Supplementary file 1. PCR primers should be HPLC-purified or purified with an alternative method to reduce truncated primers.

REVERSE:
Example PCR_R_indexed primers. Same set as that published in Lundberg et al., 2013.  Appendix 1-figure 2. SPRI ratios. SPRI beads in a polyethylene glycol (PEG) solution, such as AMPure XP, preferentially bind longer DNA fragments, making them useful for removing free primers and primer dimers from reactions. As the PEG concentration decreases, a wider range of short fragments can no longer bind to the beads, and primer dimers are more completely eliminated. However, if the PEG concentration is too low, the range of fragment sizes eliminated could include DNA of interest. For hamPCR, it is important that the SPRI cleanup does not affect the ratio of the host and microbial amplicons, which could lead to systematic bias and noise. To determine an acceptable PEG concentration, we tested cleanups with SPRI (AMPure XP) beads at different SPRI: DNA ratios (resulting in a range of PEG concentrations) on a standard DNA size ladder (GeneRuler DNA Ladder Mix, Thermo Scientific, Waltham, MA, USA), and quantified abundance of the purified fragments with a Bioanalyzer (Agilent, Santa Clara, CA, USA). Using the pure, uncleaned ladder, we calculated the ratio of each peak's abundance to the adjacent larger peak (200: 300 bp, 300: 400 bp, etc.), and used this set of abundance ratios as a baseline (no SPRI, far left). With each successive decrease in SPRI: DNA ratio (from left to right), we looked for a decrease in the abundance ratio between adjacent bands, which would indicate elimination of the smaller fragment. Of highest interest is the 300: 400 bp ratio (black), because the smallest tagged templates in hamPCR are around 300 bp. We determined SPRI: DNA ratios less than 1.0 endangered the 300: 400 ratio, and thus decided to conservatively use a SPRI: DNA solution of 1.1: 1 or higher for the cleanup of tagged products. . The x-axis shows the percentage of the sample occupied by the host amplicon, decreasing from left to right. The lines show a simple linear, mutually-exclusive relationship between host and microbe. (b) The microbial abundances from (a) were converted to microbial load by dividing by the host abundances. Note that as the host amplicon abundance approaches 0, the microbial load climbs towards infinity. The red point at 0% host amplicon abundance represents infinite load. The vertical red dotted line indicates 3% host amplicon abundance; below 3% host Appendix 1-figure 4. Effect of primer pair concentration on product. The same wild A. thaliana DNA pool was amplified with hamPCR using eight different ratios of the 16S rDNA primer pair to the GI primer pair, ranging from 1/10 to 10. (a) 2% agarose gel of the prepared libraries (b) Relative abundance (RA) of bacteria and host amplicons in the sequence data. (c) Scatterplot of microbe-tohost product ratios plotted against the microbe-to-host primer ratios used to produce them. A tenfold difference in primer ratios resulted in an over 200-fold difference in product ratios, underscoring the importance of steady primer ratios (through the use of a mastermix of primers) across an experiment. thaliana GI ASV appears to increase at the highest template concentrations, but remains of constant abundance through the standard template range of 5-100 ng. (c) The A. thaliana GI ASV has been removed and the bacterial ASVs have been rescaled to give bacterial relative abundance.

TTTGGCGGCGGCTGCTGGCTTAATTGTTGGCGGCTGCTGCTGAGTGAGCCCTATCAAAAAAAAGGCGAACG TTTGGGCTTCGGCCTGATTTAGTAGTCTTTTTTTCTTTTAAACCCCTTCCTTAATACTGAATATACTG TGGGGACGAAAGTCTCTGCTTTTAACTAGATAGCAACTTTcagcagtggatgtctaggct
The four amplicons were gel purified and combined by overlap extension PCR. First, the ITS amplicon was joined with the 16S rDNA amplicon, and the GIGANTEA amplicon was joined with the ITSo amplicon. These two fragments were gel purified and combined by overlap extension PCR to make the final fragment below, which was cloned into pGEM-T Easy (Promega, Madison, WI, USA). taaAAGTCGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTATTGAATTATGTTTTCTAGA  TGGGTTGTAGCTGGCTCTTCGGAGTATGTGCACGCCTGTCTGGACTTCATTTTCATCCACCTGTGCACC  TTTTGTAGTCTTTTTCAGGTATTGGAGGAAGTGGTCAGCCTATCAGCTCTTTGCTGGATGTAAGGAC  TTGCAGTGTGAAAACAGTGCTGTCCTTTACCTTGGCCATGGAATCTTTTTCCTGTTAGAGTCTATGTTA  TTCATTATACTCTTAGAATGTCATTGAATGTCTTTACATGGGCTATGCCTATGAAAATTATTATACAAC  TTTCAGCAACGGATCTCTTGGCTCTCgcatcgatgaagaacgcagctattgaggcgaatcaacctgtATC  TAACAATCAAACTGCTAACCGTAAAAGTAGGAATGTCAAGGGACAGGGACCTGTGGCAGCATTTGATTCA  TACGTTCTTGCTGCTGTTTGTGCTCTTGCCTGTGAGGTTCAGCTGTATCCTATGATCTCTGG  TCGGGGGAACTTTTCCAATTCTGCCGTGGCTGGAACTATTACAAAGCCTGTAAAGATAAATGGGTCATC  TAAAGAGTATGGAGCTGGGATTGACTCGGCAATTAGTCATACGCGCCGAATTTTGGCAATCCTAGAGGCAC  TCTTTTCATTAAAACCATCTTCTGTGGGGACTCCATGGAGTTACAGTTCTAGTGAGATAGTTGC  TGCGGCCATGGTTGCAGCTCATATTTCCGAACTGTTCAGACGTTCAAAGGCCTTGACGCATGCATTGTC  TGGGTTGATGAGATGTAAGTGGGATAAGGAAATTCATAAAAGAGCATCATCATTATATAACCTCATAGATG  TTCACAGCAAAGTTGTTGCCTCCATTGTTGACAAAGCTGAACCCTTGGAAGCCTACCTTAAGAA  TACACCGGTTCAGAAGGATTCTGTGACCTGTTTAAACTGGAAACAAGAGAACACATGTGCAAGCACCACA  TGCTTTGATACAGCGGTGACATCCGCCTCAAGGACTGAAATGAATCCAAGAGGAAACCATAAGTATGCTAG  acattcagatgaaggctcaggagagtttgatcctggctcagATTGAACGCTGGCGGCAGGCCTAACACA  TGCAAGTCGAGCGGCAGCACGGGTACTTGTACCTGGTGGCGAGCGGCGGACGGGTGAGTAATGCCTAGGAA  TCTGCCTGGTAGTGGGGGATAACGCTCGGAAACGGACGCTAATACCGCATACGTCCTACGGGAGAAAG-CAGGGGACCTTCGGGCCTTGCGCTATCAGATGAGCCTAGGTCGGATTAGCTAGTTGGTGAGGTAATGGC  TCACCAAGGCGACGATCCGTAACTGGTCTGAGAGGATGATCAGTCACACTGGAACTGAGACACGGTCCA-GACTCCTACGGGAGGCAGCAGTGGGGAATATTGGACAATGGGCGAAAGCCTGATCCAGCCATGCCGCGTG  TGTGAAGAAGGTCTTCGGATTGTAAAGCACTTTAAGTTGGGAGGAAGGGCAGTTACCTAATACGTATCTG  TTTTGACGTTACCGACAGAATAAGCACCGGCTAACTCTGTGCCAGCAGCCGCGGTAATACAGAGGG  TGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCGCGTAGGTGGTTTGTTAAGTTGAATGTGAAA  TCCCCGGGCTCAACCTGGGAACTGCATCCAAAACTGGCAAGCTAGAGTATGGTAGAGGGTGGTGGAA  TTTCCTGTGTAGCGGTGAAATGCGTAGATATAGGAAGGAACACCAGTGGCGAAGGCGACCACCTGGACTGA  TACTGACACTGAGGTGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCG  TAAACGATGTCAACTAGCCGTTGGGAGCCTTGAGCTCTTAGTGGCGCAGCTAACGCATTAAGTTGACCGCC  TGGGGAGTACGGCCGCAAGGTTAAAACTCAAATGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGG  TTTAATTCGAAGCAACGCGAAGAACCTTACCAGGCCTTGACATCCAATGAATCCTTTAGAGATAGAGGAG  TGCCTTCGGGAGCATTGAGACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAG  TCCCGTAACGAGCGCAACCCTTGTCCTTAGTTACCAGCACGTTAAGGTGGGCACTCTAAGGAGACTGCCGG  TGACAAACCGGAGGAAGGTGGGGATGACGTCAAGTCATCATGGCCCTTACGGCCTGGGCTACACACGTGC  TACAATGGTCGGTACAGAGGGTTGCCAAGCCGCGAGGTGGAGCTAATCTCACAAAACCGATCGTAG  TCCGGATCGCAGTCTGCAACTCGACTGCGTGAAGTCGGAATCGCTAGTAATCGCGAATCAGAATGTCGCGG  TGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCACACCATGGGAGTGGGTTGCACCAGAAGTAGCTAG  TCTAACCTTCGGGGGGACGGTTACCACGGTGTGATTCATGACTGGGGTGaagtcgtaacaaggtacg-gaaggatcattaccacACCTAAAAAACTTTCCACGTGAACCGTTTCAACCCAATAGTTGGGGGTCTTA  TTTGGCGGCGGCTGCTGGCTTAATTGTTGGCGGCTGCTGCTGAGTGAGCCCTATCAAAAAAAAGGCGAACG  TTTGGGCTTCGGCCTGATTTAGTAGTCTTTTTTTCTTTTAAACCCCTTCCTTAATACTGAATATACTG  TGGGGACGAAAGTCTCTGCTTTTAACTAGATAGCAAC  TTTcagcagtggatgtctaggcttcgagcaccaccaccaccac Discussion 3: Designing hamPCR HM-tagging primers Investigators wishing to use hamPCR for a study system not demonstrated in this manuscript will need to design suitable HM-tagging primers specific to that system. Before embarking on new primer design, please first check our public resource of functional primers, which includes new primer sequences and primer combinations contributed by the community: https://docs.google. com/spreadsheets/d/190VcSCMXmKEAIawp3GE-ZHXf1v1G3V7uuMTo4anYsJw/edit?

>Synthetic_equimolar_templatesXagaaggagatataccatggcttggtcatttagaggaag-
Here, we provide some helpful tips for new primer design. Researchers should have already decided on a two-step PCR system (see Appendix 1 -Discussion 1), which defines the universal overhangs that will be added to the HM-tagging primers.

Choose the microbial amplicon
For example, choose the V4 region of 16S rDNA. Calculate its approximate length (~285 bp) and GC content (~56%), and find the length and annealing/melting temperatures of the primers needed to amplify it (~73˚C for 515F and~52˚C for 799R or 806R). Numerous melting temperature calculators are freely accessible online. At the time of writing, Thermo Fisher's 'Multiple Primer Analyzer' was our favorite.

Find candidate host genes
Research single or low-copy host genes in the host to be targeted. Such genes are often of interest for making phylogenies (Li et al., 2017) and therefore for many organisms have already been identified and written about. If sequences for candidate genes are publicly available, download candidate gene sequences into a text editor. If the host organism has no or few sequences available, look for candidate genes in two or more related organisms for which sequences are available. By aligning the genes in these related organisms, conserved regions in those genes can be identified that have a high chance of also being conserved in the host of interest. Those conserved areas may be used to design primers. Be wary of using the cDNA sequence of such genes, as the introns present in the genome are not part of the cDNA. Please note that this concern also applies to genes from eukaryotic microbes.
3 Find primer binding sites in the host candidate genes Search the candidate genes for a region of similar length to the microbial amplicon that does not have long stretches of the same repeated base. Genes that appear by eye to have a random distribution of bases are excellent. If the candidate genes differ in GC content, start with the gene for which the GC content is most similar to the microbial amplicon. Locate a short (18-25 bp) region suitable for the forward primer that has a similar annealing temperature to that used for the microbial amplicon, erring on the side of a higher annealing temperature. Follow normal primer design guidelines for selecting a region (excellent advice and software are freely offered by many companies that sell oligos). When a candidate forward primer has been identified, look downstream in the host sequence for a similarly suitable reverse primer sequence. Ideally, the distance between the primers will differ from the length of the microbial amplicon by about 80-120 bp, allowing visualization of both amplicons on an agarose gel, and ensuring both amplicons will be sequenced with similar efficiency. This means for each forward primer candidate, there is about 40 bp of sequence in which to locate an ideal reverse primer candidate. A difference of fewer than 80 bp is also acceptable, especially if the host amplicon is not of overwhelming abundance and gel separation of the bands (as we show in Figure 3) is not necessary. Length and GC biases can be mitigated to some extent by choice of polymerase and adjusting cycling conditions (Aird et al., 2011;Dabney and Meyer, 2012).

Assemble the full HM-tagging primer sequences
Add the universal overhangs to each forward and reverse primer that will bind to the index primers. Appendix 1-figure 6 below helps to visualize how the universal overhangs on the HM-tagging primers interact with the indexing PCR primers. If there is sequence variation in the amplicon being targeted (which can be the case in particular for microbial amplicons such as ITS and 16S rDNA), a short linker sequence may be used, positioned just 5' of the template-binding part of the primer. The purpose of the linker sequence is to not bind to as many template sequences as possible. Thus, it helps the 5' end of the primer avoid contact with template molecules, which could bias amplification. Primers targeting host genes (for which sequence variation is not expected) will likely not benefit from such linker sequences. Keep in mind throughout that shorter HM-tagging primers tend to work better, so do not add unnecessary bases.
After deciding on candidate full primer sequences, align the full primer sequences to the host template again, and check if any of the 3' bases of the universal overhang also by chance match the host gene (this would effectively extend the template-specific portion of the primer). If any bases of the universal overhang can be considered as part of the template-specific portion of the primer, the 3' end of the template-specific portion of the primer may be shortened. For example, in the following primer that binds the wheat PolA1 gene, the final 'ct' of the underlined universal overhang (highlighted purple) also is complementary to the PolA1 template, and should be part of the length and annealing temperature calculation for the template-specific portion of the primer.
>Ta.PolA1_F1_G-46750 TCCCTACACGACGCTCTTCCGATct gatgttgtggaaggaattgaa 5 Check full primer sequences for predicted incompatibility in the same reaction Some primers may have strong and unexpected complementarity to other primers or to themselves, leading them to form dimers rather than bind to the intended template. Dimerization can ruin the reaction. We used Thermo Fisher's 'Multiple Primer Analyzer', freely available online, to look for primer interaction. Many well-functioning combos will show some weak interactions -although it is important to minimize dimers, especially those with binding near the 3' end of the primer which polymerase could potentially extend into a product, it is not necessary to eliminate all predicted interactions. For example, primers for V4 16S rDNA and the A. thaliana GI gene (515_F1_G-46603, 799_R1_G-46601, At.GI_F1_G-46602, and At.GI_R502bp_G-46614) are very robust together, and they show potential weak dimers as follows: Self-Dimers: 2XdimersXfor:XAt.GI_F1_G-46602 5-tccctacacgacgctcttccgatctgtaaagataaatgggtcatctaa-> XXXXXXXXXXX||X|XX|XXXXX||||X||X||||XXXXX|XX|X|| XXXXXX<-aatctactgggtaaatagaaatgtctagccttctcgcagcacatccct-5 XXXX5-tccctacacgacgctcttccgatctgtaaagataaatgggtcatctaa-> XXXXXXXXXXXX||XXXXX||X|X|X||||X|X|X||XXXXX|| <-aatctactgggtaaatagaaatgtctagccttctcgcagcacatccct-5 CrossXPrimerXDimers: 515_F1_G-46603XwithXAt.GI_F1_G-46602 515_F1_G-46603 XX5-tccctacacgacgctcttccgatctgtaaagataaatgggtcatctaa-> XXXXX||XXXXXX|XX|XXXX||X||||X|X|X||XXXXX|| <-aatggcgccgmcgacygtgagtctagccttctcgcagcacatccct-5 6 Experimentally test primer candidates Primers sometimes defy predictions, working better or worse than expected. For example, if primers for which dimerization is not predicted nonetheless have some off-target complementarity to microbial or host genomes, they may not provide clear products. We recommend ordering at least three host primer pairs at once, and testing all of them. First, test each primer pair alone in a normal (not hamPCR) reaction of 35 cycles using a sample containing host DNA as template, and ensure that a product of the correct size is formed. Primers that work well alone can then be tested in combination with microbial primers in hamPCR. For initial testing, it is helpful to a DNA template that has an equal amount of host and microbial templates, so that both bands are expected. For example, host and microbial templates can be mixed in an equimolar ratio (which takes into account any differences in genome size between host and the microbial target). Alternatively, template amplicons can be cloned to make a single chimeric DNA fragment as we described in Appendix 1 -Discussion 2, forming a template with a known 1:1 host-to-microbe ratio.