Microbiota DNA isolation, 16S rRNA amplicon sequencing, and bioinformatic analysis for bacterial microbiome profiling of rodent fecal samples

Summary Fecal samples are frequently used to characterize bacterial populations of the gastrointestinal tract. A protocol is provided to profile gut bacterial populations using rodent fecal samples. We describe the optimal procedures for collecting rodent fecal samples, isolating genomic DNA, 16S rRNA gene V4 region sequencing, and bioinformatic analyses. This protocol includes detailed instructions and example outputs to ensure accurate, reproducible results and data visualization. Comprehensive troubleshooting and limitation sections address technical and statistical issues that may arise when profiling microbiota. For complete details on the use and execution of this protocol, please refer to Gubert et al. (2022).


STEP-BY-STEP METHOD DETAILS Fecal collection
Timing: 5 min per mouse Mouse fecal collection is to be completed in a sterile environment and as quickly as possible, to avoid potential microbial contamination. Ensure appropriate PPE is worn and all items used in the collection process are sterilized to minimize sample contamination.
1. Assemble cage base with lid and label each cage with appropriate mouse number prior to beginning fecal collection. 2. Spray each cage with 80% ethanol and wipe with paper towel. Wait until the cage is completely dry as an additional sterilization measure. 3. Place each mouse into the appropriately labeled clean cage for up to 5 min or until the mouse has excreted 4-8 fresh fecal pellets or a minimum of 35 mg of feces. 4. Return the mouse to the home cage. 5. Using a fresh needle for each collection, use the needle tip to carefully collect the pellets and place into an appropriately labeled 5 mL Eppendorf tube.
Note: Do not collect fecal pellets that have been in contact with urine, see troubleshooting 1.
6. Place the Eppendorf tube immediately into a cooler box with dry ice, before storing at À80 C until further processing.
Note: All cages are sterilized using a commercial cage and rack washer and commercially available chemicals. The wash cycle runs for 240 s at 55 C using TP Alka detergent followed by a dripping cycle for 30 s and neutralization for 4 s using TP Acid. Neutralization is followed by a rinse cycle for 20 s at 82 C, steam sanitization and an exhaust cycle for 60 s at 45 C.
Note: Immediately freezing and storing fecal samples at À80 C is considered best practice for preserving microbial composition (Fouhy et al., 2015;Wu et al., 2010). Bacterial species remain viable for up to 10 years when stored at À80 C (Simione et al., 1991).

DNA extraction from fecal pellets
Timing: 5-7 h for 94 samples and 2 controls This step aims to extract genomic DNA from the previously collected fecal samples. Ensure appropriate PPE is worn during the extraction process. Pause point: The PCR cycle will take approximately 2 h 45 min and the PCR reactions can remain on the thermocycler until the next step.

Pooling and normalizing amplicon libraries:
a. Pool the triplicate PCR reactions for each sample into a single volume of 75 mL. Run the amplicon libraries on an agarose gel to verify the presence of PCR product with the expected size of $390 bp. b. Quantify the amplicons using Qubit HS assay kit, Quant-iT PicoGreen dsDNA Assay kit or equivalent. The expected ranges for positive and negative control are 10-30 ng/mL and 0-2 ng/mL respectively. Fecal samples are expected to generate an amplicon concentration of 10-30 ng/mL. Troubleshooting 4. c. Combine 240 ng of each sample into a single tube. Pool 2 mL of the negative control and 10 mL of the positive control in the final tube. 13. Clean-up the amplicon library pool using Ampure XP beads (Beckman Coulter).
a. Allow the beads to come to 23 C and vortex to resuspend before usage. b. In a clean Eppendorf tube, add 200 mL of the resuspended Ampure XP beads to 400 mL of amplicon pool and repeat until all the pool has been transferred. c. Mix by inverting the tube 10 times. d. Incubate at 5 min at 23 C. e. Pellet the beads on a magnetic rack for approximately 2 min or until the solution turns clear and a visible bead is formed close to the magnet. f. Prepare 80% v/v ethanol in nuclease-free water. g. Wash the beads: i. Pipette off the supernatant and discard.
ii. Wash the beads with 500 mL of freshly prepared 80% v/v ethanol without disturbing the pellet. iii. Pipette off the 80% v/v ethanol and discard. h. Repeat the wash one more time. i. Pulse centrifuge the tube and replace the tubes on the magnetic rack. j. Pipette off residual ethanol and allow the pellet to dry for $30 s. Do not overdry the pellet to the point of cracking, a good rule of thumb is that the pellet should be wet but not shiny. k. Remove the tube from the magnetic rack and resuspend pellet in 200 mL nuclease-free water.
Incubate at room temperature for 2 min. l. Return the tubes to the magnetic rack until the solution is clear. Remove and retain 200 mL of eluate from all tubes into a single tube. i. Quantify the pool using Qubit HS assay kit, Quant-iT PicoGreen dsDNA Assay kit or equivalent. The pool should be between 0.5-5 ng/mL. Given that the expected amplicon size is 390 bp, and dsDNA is 660 g/mol/bp, determine concentration of the pool in molarity as below: 15. Prepare the 16S Read 1, Read 2 and Index sequencing primers: a. Resuspend the lyophilized primer in nuclease-free water to 100 mM. b. Incubate at room temperature for 2 min. c. Vortex for 10 s and then centrifuge the primers at a speed of 15,000 g for 30 s. 16. Thaw the MiSeq v2 reagent cartridge in a water bath at room temperature for at least 1 h before usage. 17. Denature and dilute the libraries to 6.5 pM: a. Add 10 mL of freshly diluted 0.2 N NaOH to 10 mL of 2 nM library in a clean microfuge tube. b. Mix by pipetting gently 10 times. c. Incubate for 5 min at room temperature. d. Quench the reaction with 980 mL of pre-chilled HT1 provided in the kit, vortex and put on ice. e. Further dilute the sample to 6.5 pM by adding 325 mL of the reaction from the previous step to 675 mL of HT1 in a new microfuge tube. f. Mix and chill until use. 18. Denature and dilute PhiX Control v3 (Illumina) to 6.5 pM: a. Add 2 mL of 10 nM PhiX and 8 mL nuclease-free water to a new microfuge tube. b. Add 10 mL 0.2 N NaOH to the tube. c. Mix by pipetting gently 10 times. d. Incubate for 5 min at room temperature. e. Quench with 980 mL of pre-chilled HT1 provided in the kit, vortex and put on ice. f. Further dilute the PhiX by adding 81.2 mL of the reaction from the previous step to 168.8 mL HT1 in a new microfuge tube. g. Mix and chill until use. 19. Mix the reagents in the MiSeq cartridge by inverting 5-10 times and to ensure all reagents have defrosted. Gently tap down the cartridge to settle the reagents to the bottom of the well. 20. Load the sequencing primers and library onto the MiSeq cartridge. Pierce wells with a clean 1 mL pipette tip and add: a. To well 12: 4 mL of Read 1 primer (100 mM). b. To well 13: 4 mL of Index primer (100 mM). c. To well 14: 4 mL of Read 2 primer (100 mM). d. To well 17: Prepare the final loading sample with 15% PhiX by adding 850 mL of 6.5 pM sample and 150 mL of 6.5 pM PhiX Control in a clean microfuge and load 600 mL into the well. 21. Run the sequencing on the MiSeq instrument using either Base Space or sample sheet. Troubleshooting 5.
Note: To ensure that there are at least 100,000 sequencing reads per sample, which is adequate for microbiome profiling, no more than 192 samples should be multiplexed per sequencing run given the MiSeq output of 20 million reads.
Note: 10%-15% of the high diversity PhiX spike-in is commonly used for best results when sequencing 16S libraries given its low diversity.

OPEN ACCESS
Bioinformatic analysis & data pre-processing

Timing: 3-5 h including data visualization
This step aims to generate amplicon sequence variants (ASVs), taxonomic classification of ASVs, calculation of alpha and beta diversity and multivariate statistical analysis. R programming was used. Specific to this study we analyzed results based on sex and housing groups. Computational resources used included a 3.1 GHz Dual-Core Intel Core i5 processor and 16 GB 2,133 MHz LPDDR3 memory.
As in Gubert et al. (2022), we present the specific steps to investigate sex and housing groups using R and graphical outputs. The R code below can be modified appropriately for similar statistical analyses, and for further improved analyses.
22. Process Illumina MiSeq sequence raw FASTQ data using Qiita for quality control, demultiplexing sequences and trimming to generate ASVs.
Note: Qiita has built in quality control. Default options can be used to run the pre-processing pipeline. Please refer to the Qiita Processing Data page for relevant quality scores.
23. Load all required libraries into R.
24. Read and pre-process the data. a. Read the ASV count table saved as an '.biom' file in the data folder using 'biomformat' R package and filter out the brain samples that were included in the biom file.
Example of an expected output of 'physeq_count_with_mit' R object: e. Remov non-bacterial sequences (i.e., mitochondria) from the phyloseq object. Example of an expected output of 'physeq_count' R object: 25. Calculate beta diversity measures (i.e., Bray Curtis distance and Unweighted Unifrac distance) and relative abundances for each sample by normalizing counts to 1.
26. Inspect the library sizes (i.e., the number of reads) in each sample to identify any heterogeneous library sizes using 'ggplot2' and 'gghighlight' R packages.
Note: Based on the results, 'Sample 65' is identified with a low library size. However, this sample is not removed from further analysis as there is no compelling reason to do so.
Example of an expected output: 27. Calculate alpha diversity using 'phyloseq' R package. a. Following Kong et al., 2018 reads are rarefied to 15,000 using 'phyloseq' R package. Since the lowest library size is reported to be 17871, all the samples are rarefied to 15,000.

OPEN ACCESS
Note: The rationale behind rarefaction is to adjust the differences in library sizes across samples to aid comparisons of alpha diversity. However, alpha diversity may not be accurate on rarefied data. Users interested in alpha diversity can normalize samples to a median sequencing depth, which is preferred over rarefaction (McMurdie and Holmes, 2014).
ii. Shannon diversity index, which considers richness and relative abundance or evenness of ASVs. iii. Inverse Simpson diversity index which also considers richness and relative abundance or evenness of ASVs but is less sensitive to rare species compared to the Shannon index.
28. Visualize alpha diversity using boxplots with the use of 'ggplot2' and 'ggpubr' R packages.  Example of an expected output: b. Visulaize PCoA for Unweighted Unifrac distance.

OPEN ACCESS
Example of an expected output: 35. Identify ASVs which segregate and contribute to stratification of samples according to housing conditions (for a given sex and genotype) using Sparse Partial Least Squares regression-Discriminant Analysis (sPLS-DA) (Lê Cao et al., 2011) using 'mixOmics' and 'dplyr' R packages. Example of an expected output for the female-HD data.
a. Choose the optimal parameters for sPLS-DA (number of components and number of variables to select on each component).
b. Run sPLS-DA with the optimal parameters, output as 'ncomp' and 'select.keep' from above.
c. Visualize sample plots with 0.95 confidence ellipse plots showing discrimination between housing conditions. ll OPEN ACCESS tify which fixed effects and their interactions may explain ASV abundance using 'nlme' R package. Example for the female data.
a. To account for compositional data Gloor et al. (2017), first transform raw counts using centred log-ratio transformation.
Example of an expected output:

EXPECTED OUTCOMES
From the DNA extraction, the positive control is expected to yield a DNA concentration of more than 2 ng/mL. The water blank is expected to yield a DNA concentration of close to 0 ng/mL. Fecal samples are expected to yield DNA concentrations of 2-100 ng/mL. Antibiotic treated mice may yield a DNA concentration less than 1 ng/uL. Adjust the elution volume as necessary. The acceptable value of A260/230 and A260/280 ratio should be between 1.8-2.1. Abnormal values for these two ratios may indicate contamination in the extracted sample which may hinder downstream assays. Low concentrations of nucleic acids may result in skewed 260/230 ratio.
When sequencing the 16S amplicon library, loading of 6.5 pM DNA library is expected to obtain the cluster density of 1,000-1,200 K/mm 2 using MiSeq v2 reagents, although this is highly variable depending on pipetting accuracy when quantitating DNA library concentration or liquid handling. Cluster density of around 800-1,000 K/mm 2 is also acceptable, lower data output is expected in this case. Both the Q30 and clusters PF should be more than 80%. The generated fastq.gz files are expected to be around 4-5 Gb in total, containing around 24-30 million paired end reads. For 96 samples, the sequencing will generate 125,000-150,000 reads per sample, with the Zymo positive control expected to have around or more than 50,000 reads, while the water negative control is expected to have 0-1,000 reads. website. Since microbiome data analysis is a fast-moving research area, several methods have been recently proposed, which will improve future research.

LIMITATIONS
The 16S rRNA gene sequences which are used in this protocol are widely used in prokaryotic strains. However, the primers used in the sequencing may be biased toward specific taxa (Martí et al., 2021). Thus, deep amplicon sequencing that includes 18S rRNA gene, internal transcribed spacer for eukaryotes; 16S rRNA and chaperonin-60 genes for bacteria; and Gene 23 and RNA-dependent RNA polymerase for certain viruses, would be more appropriate for a more holistic understanding of the microbiome (Uyaguari-Diaz et al., 2016).
In this protocol, we considered the reference database ''silva_nr99_v138.1_wSpecies_train_set. fa.gz'' for taxonomic assignment. In addition to SILVA 16S database (Pruesse et al., 2007) there are other databases that can be explored in the future for taxonomy classifications such as Ribosomal Database Project (Cole et al., 2009), Greengenes (DeSantis et al., 2006 and EzTaxon (Chun et al., 2007). However, these reference databases are currently incomplete, resulting in many unclassified sequences at the species level. In addition, there are numerous classification conflicts among these databases (DeSantis et al., 2006).
In the statistical analysis, we compared alpha-diversity measures across samples using rarefied data. However, if the rarefaction curves of the samples do not plateau with respect to library size, it indicates that the sequencing depth and coverage may not be sufficient to cover the total diversity in the gut. Thus, in those scenarios, other normalization methods such as median sequencing depth should be considered (McMurdie and Holmes, 2014).
Since microbiome data are compositional, several authors including Gloor et al. (2017) consider beta-diversity measures, Bray-Curtis and unifrac distances, inappropriate. Thus, recently proposed compositional beta diversity such as Information UniFrac, Ratio UniFrac (Wong et al., 2016) and robust Aitchison PCA (Martino et al., 2019) should be considered instead in future analyses. Using these distances will change the outputs presented in steps 33-35. We also note that PERMANOVA was used to test for microbial divergence among populations. However, other methods such as analysis of group similarities (ANOSIM), multi-response permutation procedures (MRPP), and Mantel's test (MANTEL) can also be used to test group differences in microbiome data (Xia and Sun, 2017).
Due to the cage effects in the data, we only use LMMs to identify which effects may explain ASV abundance. As discussed in Nearing et al. (2022), one should use multiple methods to ensure robust biological inferences when the sample size is large enough. For instance, if the analytical objective is to test for differential abundance then methods such as edgeR (Robinson et al., 2010), DESeq2 (Love et al., 2014), ANCOM (Mandal et al., 2015) or ZIGDM can be applied (Tang and Chen, 2019). Additionally, if the objective is to infer correlated taxa, then methods such as SparCC (Friedman and Alm, 2012), CCLasso (Fang et al., 2015), REBACCA (Ban et al., 2015), SpiecEasi (Kurtz et al., 2015), HARMONIES (Jiang et al., 2020) and SPRING (Yoon et al., 2019) can be used. There are also several statistical methods developed for longitudinal microbiome data .

TROUBLESHOOTING Problem 1
Urine in fecal samples.
Mice may urinate within the sterilized cage during fecal collection, which could potentially contaminate fecal pellets with the urobiome.

Potential solution
Ensure that no urine has contaminated the fecal pellets by only collecting feces that has not come in to contact with urine, via visual inspection of the area around each fecal pellet.

OPEN ACCESS
Problem 2 Low DNA yield from extraction.
Low DNA concentration during the extraction process may be observed in samples from mice that underwent antibiotic treatment, or when low amounts of raw fecal material was used for this process.

Potential solution
Use more raw fecal material during the extraction process or reduce the elution volume.

Problem 3
Potential PCR Contamination.
Genomic DNA contamination from other sources during PCR amplification can cause inaccurate or unreliable results.

Potential solution
Physically separate pre and post PCR areas within the lab, ensuring separate sinks, water purification systems, protocol supplies, equipment and storage spaces. Also make sure all PCR areas are cleaned daily using 80% ethanol to reduce risk of contamination. Ensure that a water blank is processed along with the samples for quality control. Taxa identified in the samples which were also found in water control post-sequencing and denoising can be removed from the analysis.
Problem 4 DNA does not amplify during PCR.
Samples may have low library yield post-PCR which may stem from lows DNA input or contamination in the extracted DNA which can inhibit PCR amplification. Excess amounts of DNA may also inhibit PCR reactions.

Potential solution
Make sure to check the DNA yield using fluorometric methods such as Qubit or Quant-iT assay. The purity of the extracted DNA should be checked using the spectrophotometer. If the A260/230 or A260/280 ratio is abnormal, DNA clean-up using SPRI beads can be performed. If DNA concentration is too high, the extracted DNA can be diluted with the elution buffer.
Problem 5 Low number of sequencing reads.
Samples may have low read counts post sequencing. This may stem from inaccurate DNA quantification of each library for normalization or when quantifying the library pool for loading onto the sequencer resulting in low sequencing output.

Potential solution
Ensure that fluorometric assays such as Qubit or Quant-iT kit are used for DNA quantification. Accurate pipetting techniques should be employed here as well. Samples may also be quantitated in replicates to improve accuracy.

Problem 6
Data interpretation in high-dimensional data.
16S sequencing produces high-dimensional data that are harder to interpret. Potential solution Data visualizations in projection-based methods (i.e., PCA, sPLS-DA) used in this protocol support data interpretation. For example, in step 36, the PCA plot visualizes the samples colored according to cages to detect potential cage effects in the data. Methods such as LMM produce p-values which can be used to infer significant genotype and housing effects on ASV abundance among the two sexes separately.

RESOURCE AVAILABILITY
Lead contact Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Anthony J. Hannan (anthony.hannan@florey.edu.au).

Materials availability
This study did not generate new unique reagents.

Data and code availability
The datasets and metadata related to this study have been deposited in the NCBI Sequence Read Archive. The accession number for the raw sequence reads reported in this paper is BioProject number PRJNA770470. The reproducible R code and report for the statistical analysis has been uploaded to a GitHub repository -https://github.com/SarithaKodikara/Gene_environment_gut_ interactions_in_Huntington-s_disease.