Species-level evaluation of the human respiratory microbiome

Abstract

Background
Changes to the human respiratory tract microbiome may contribute significantly to the progression of respiratory diseases. However, few studies have examined the relative abundance of microbial communities at the species level along the human respiratory tract.

Findings
Bronchoalveolar lavage, throat swab, mouth rinse, and nasal swab samples were collected from 5 participants. Bacterial ribosomal operons were sequenced using the Oxford Nanopore MinION to determine the relative abundance of bacterial species in 4 compartments along the respiratory tract. More than 1.8 million raw operon reads were obtained from the participants, with ∼600,000 rRNA reads passing quality assurance/quality control (70–95% identity; >1,200 bp alignment) by Discontiguous MegaBLAST against the EZ BioCloud 16S rRNA gene database. Nearly 3,600 bacterial species were detected overall (>750 bacterial species within the 5 dominant phyla: Firmicutes, Proteobacteria, Actinobacteria, Bacteroidetes, and Fusobacteria). The relative abundance of bacterial species along the respiratory tract indicated that most microbes (95%) were being passively transported from outside into the lung. However, a small percentage (<5%) of bacterial species were at higher abundance within the lavage samples. The most abundant lung-enriched bacterial species were Veillonella dispar and Veillonella atypica, while the most abundant mouth-associated bacterial species were Streptococcus infantis and Streptococcus mitis.

Conclusions
Most bacteria detected in lower respiratory samples do not seem to colonize the lung. However, >100 bacterial species were found to be enriched in bronchoalveolar lavage samples (compared to mouth/nose) and may play a substantial role in lung health.

In this study we utilized the Oxford Nanopore MinION to sequence nearly complete bacterial ribosomal operons, resulting in longer sequencing reads [35,36] with species-level detection [37][38][39] in respiratory tract samples from 5 subjects. We also chose to use the MinION rRNA operon profiling because it has been shown to quantitatively reflect relative changes in target gene abundance for the top 10% of the microbial community [39]. Our hypothesis was that microbial populations living within the lung will display a relative abundance gradient along the respiratory tract. Therefore, samples were collected by bronchoalveolar lavage (BAL; indicated as "lung" in the figures), throat swab, mouth wash, and nasal swab for rRNA operon profiling (Fig 1A).
Our hope was to distinguish those bacteria which displayed an outside-in pattern (highest relative abundance in mouth/nose) from those bacteria with an inside-out distribution (highest relative abundance in the lung compared to the mouth/nose) (Fig 1B). The critical sample to assess this pattern is the throat swab, representing an

Data description
Raw MinION sequence reads were collected as fast5 files with MinKNOW (Oxford Nanopore Technologies), basecalled, separated by barcode, and converted to fastq files using Albacore (v 2.2.7). Reads between 3700 and 5700 bp in length from each sample were imported into Geneious (v 11) and screened against the EZ BioCloud 16S rRNA gene database [40] by Discontiguous MegaBLAST to determine operational taxonomic units (OTUs) [39]. The top-hit data were exported as a .csv file and analyzed using pivot tables in Excel. Fastq data are available at NCBI SRA (Bioproject # PRJNA564314).
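
The size selection and top-hit tabulation described above were performed in Geneious and Excel. As a rough scripted equivalent, the following Python sketch filters fastq reads to the 3700-5700 bp window and tallies top hits per species and sample. The identity and alignment cutoffs are those reported under Quality control. The function names and the CSV column names (`species`, `sample`, `identity`, `align_len`) are hypothetical and do not correspond to an actual Geneious export format.

```python
import csv
from collections import Counter

MIN_LEN, MAX_LEN = 3700, 5700            # operon size window from the text (bp)
MIN_IDENTITY, MAX_IDENTITY = 70.0, 95.0  # percent identity window (Quality control)
MIN_ALIGN_LEN = 1200                     # alignment must be longer than this (bp)


def filter_fastq_by_length(handle, min_len=MIN_LEN, max_len=MAX_LEN):
    """Yield (header, sequence, quality) for reads inside the size window.

    Assumes plain 4-line fastq records (no wrapped sequence lines).
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        if min_len <= len(seq) <= max_len:
            yield header, seq, qual


def passes_qc(identity, align_len):
    """Apply the identity and alignment-length cutoffs to one top hit."""
    return MIN_IDENTITY <= identity <= MAX_IDENTITY and align_len > MIN_ALIGN_LEN


def read_top_hits(handle):
    """Read top-hit rows from a CSV with the hypothetical columns named above."""
    return csv.DictReader(handle)


def pivot_top_hits(rows):
    """Count QC-passing top hits per (species, sample), like a pivot table."""
    counts = Counter()
    for row in rows:
        if passes_qc(float(row["identity"]), int(row["align_len"])):
            counts[(row["species"], row["sample"])] += 1
    return counts
```

With an open file handle `fh` for an exported top-hit table, `pivot_top_hits(read_top_hits(fh))` would return the per-species, per-sample read counts that the later enrichment comparison starts from.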

Study Approval
This study was approved by the Institutional Review Board of Rutgers, The State University of New Brunswick (protocol #20140000953). All study subjects provided signed written informed consent prior to any study interactions.

Human Subjects for the study
Six adult volunteers were recruited from patients who presented at Robert Wood Johnson Hospital for a scheduled diagnostic lavage, primarily because of a suspicious shadow on a lung x-ray. The admitting clinician (SH) asked whether they were interested in participating in a research study in which excess lavage sample would be analyzed for bacteria in their lung and in which they would provide a series of non-invasive samples (e.g., throat and nose swabs, oral cavity rinse). They were assured that their decision whether to participate would not affect their medical care. The follow-up diagnosis was not obtained for these subjects.

Bacterial DNA Extractions and Purification
Collection of bronchial lavage (BAL), throat swab, mouth wash, and nasal swab samples was performed or overseen by the attending physician (SH Houston, TX, USA) as previously described [39].

Library Preparation and Sequencing by MinION
MinION library construction employed the 1D sequencing kit (SQK-LSK108; Oxford Nanopore, Oxford, England). Two 12 barcoded amplicons (1800 ng total in each) were combined, end-repaired, and dA-tailed as per ONT instructions using NEB kits (New England Biolabs, Ipswich, MA, USA) and the modified Ampure bead purification described above. Ligation of the ONT adaptor employed the Blunt/TA ligase master mix (NEB) with the addition of 1 µL of freshly prepared ATP solution (~4 mg/mL) to facilitate ligation. All libraries were analyzed on R9 flow cells.

Availability of data and materials
All raw sequence data are currently being made available at NCBI SRA (Bioproject # PRJNA564314).

Quality control
BAL, throat swab, mouth wash, and nasal swab samples were collected from 6 subjects, DNA was extracted, and rRNA operons were amplified (with universal rRNA operon primers and barcode primers). Unfortunately, 1 lavage sample from Subject 1 failed to amplify properly (Suppl. Fig 1). The remaining respiratory samples from this subject were included in the overall community analysis, but this subject was not characterized for lung enrichment by relative abundance. A total of ~1.8 × 10^6 raw reads were obtained, of which ~1.2 × 10^6 reads passed Albacore basecalling and were separated by barcode. After size selection (3.7-5.7 kb), a total of 623,271 barcoded sequences were screened against the EZ BioCloud database by Discontiguous MegaBLAST (Suppl. Fig 2). Of these BLAST-screened reads, a total of 599,053 sequences passed an additional QA/QC step, having an identity between 70-95% and an alignment with >1200 bp of the 16S rRNA genes in the database (Suppl. Fig. 3).

Data validation
The BLAST screening indicated the respiratory tract was dominated by 5 phyla: Firmicutes, Proteobacteria, Actinobacteria, Bacteroidetes, and Fusobacteria (representing over 98% of the QA/QC reads) (Fig. 2A). The number of different species within the top 5 genera of these abundant phyla is presented in Fig 2B

To identify lung-enriched bacterial genera and species, we subtracted the read counts of mouth and nose from bronchial lavage counts after normalization for each subject. Over 1300 lung-enriched bacterial species were discerned across all samples.
However, most of these differences in read counts were <50, which may represent methodological variation in raw read results from MinION sequencing. Our prior work has shown that replicate read numbers of >100 have a coefficient of variation of ~12% or less [39]. Therefore, a conservative threshold of 150 read differences was used to define those bacteria enriched in the lower respiratory tract. This yielded 114 bacterial species from all subjects with a stronger rRNA operon signal in bronchial lavage compared with the upper respiratory tract samples (Suppl. Table 1). To determine whether comparable lung enrichment of particular OTUs was observed across subjects, a heat map was generated using the lung-mouth and lung-nose read differences in relative abundance that were >150 reads (Fig. 4). Overall, those lung-enriched OTUs were nearly equal in the bronchial lavage samples for subjects 6, 12, and 15.
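
The enrichment calculation described in this paragraph can be sketched in Python as follows. The normalization method is not specified in the text, so scaling each sample to a common read depth (`target_depth`) is an assumption, as is requiring both the lung-mouth and the lung-nose difference to exceed the 150-read threshold. All function names are illustrative.

```python
ENRICHMENT_THRESHOLD = 150  # read-difference cutoff stated in the text


def normalize(counts, target_depth):
    """Scale one sample's species counts to a common total depth.

    Assumed normalization; the text does not state the method used.
    """
    total = sum(counts.values())
    if total == 0:
        return {species: 0.0 for species in counts}
    return {species: n * target_depth / total for species, n in counts.items()}


def lung_enriched(lung, mouth, nose, threshold=ENRICHMENT_THRESHOLD):
    """Return {species: (lung - mouth, lung - nose)} for lung-enriched species.

    A species is kept only if both differences exceed the threshold
    (an assumption; the text does not say whether one or both must).
    Species absent from a mouth or nose sample are treated as 0 reads.
    """
    enriched = {}
    for species, lung_reads in lung.items():
        d_mouth = lung_reads - mouth.get(species, 0.0)
        d_nose = lung_reads - nose.get(species, 0.0)
        if d_mouth > threshold and d_nose > threshold:
            enriched[species] = (d_mouth, d_nose)
    return enriched
```

Applied per subject to normalized lung, mouth, and nose counts, the returned differences are the kind of values a heat map of lung-mouth and lung-nose differences would display.
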
The predominant lung-enriched bacterial genera for this group were Veillonella spp., micronuciformes. In contrast, subjects 7 and 8 were largely missing these particular

However, because of the low biomass within the lung, end-member analysis to determine the microbial differences along the respiratory tract is difficult to verify. Furthermore, studies resolving only from the bacterial genus to phylum levels will not detect differences within bacterial species or strain levels from the lung. In this study,

Re-use potential
Our study found over 100 different bacterial species which are capable of colonizing the human lung and followed an inside-out distribution with respect to upper respiratory samples. Understanding which specific bacteria can colonize the lower respiratory tract will help in discerning which microbiota constitute a "healthy lung microbiota" and provide a diagnostic tool for studying the role of the microbiome in the development of lung-related diseases.

Fig. 5 Histogram of normalized reads for Veillonella spp
Fig. 6 Histogram of normalized reads for Streptococcus spp