Hidden in the blow - a matrix to characterise cetaceans’ respiratory microbiome: short-finned pilot whale as case study

Cetaceans are key sentinel species of marine ecosystems and ocean health, forming a strategic taxonomic group for evaluating the well-being of aquatic habitats and detecting harmful environmental trends. Respiratory diseases are amongst the main causes of death in these animals, so identifying the microbial community in their exhaled breath condensates (EBC), i.e. blow, has been proposed as a key biomarker for assessing respiratory health. Yet, to characterise microbiomes related to these animals’ respiratory tract and use them as a proxy for health status, it is necessary to develop baseline data on the microorganisms associated with cetaceans. Here, the short-finned pilot whale (SFPW, Globicephala macrorhynchus) was used as a case study to validate the most suitable primer set to explore the prokaryotic diversity of the cetaceans’ respiratory tract. DNA extracted from blow samples (n = 12) of animals off Madeira Island was amplified and sequenced for both the V3-V4 and V4-V5 hypervariable regions of the 16S rRNA gene, using the same sequencing platform (Illumina MiSeq). Independently of the primer set used, all blows shared the phyla Actinobacteria, Bacteroidetes, Firmicutes and Proteobacteria in their composition. V3-V4 resulted in a higher diversity of taxa with relative abundance above 1%, whereas the V4-V5 primers captured a higher number of microbial Amplicon Sequence Variants, detecting the rare microbial biosphere with pathogen potential. Additionally, the V4-V5 primers captured the core microbiome more efficiently. Thus, this study provides a detailed characterisation of SFPW respiratory-associated microbial communities, strengthening the idea of sociality influencing microbiome composition in the respiratory tract. Moreover, it supports the use of blow as a relevant biomarker for the physiological state of the airways in free-ranging cetaceans.


Introduction
As keystone species, cetaceans play vital ecological functions, considering their role as nutrient vectors, position in the food chains and use as bioindicators of environmental health (Ballance 2018). Currently, multiple global threats pose a serious concern to the conservation of these aquatic mammals (Evans 2018), affecting individual health and ultimately compromising population viability (Nicol et al. 2020). Addressing their impacts on cetacean populations is, therefore, crucial yet challenging, as many species are elusive and inhabit remote habitats with wide distributions (Parsons et al. 2015).
Respiratory tract infections are amongst the primary causes of death observed in these marine mammals (Díaz-Delgado et al. 2018; Cuvertoret-Sanz et al. 2020). Therefore, the identification of the microbial community present in their exhaled breath condensates (EBC), i.e. blow, has been proposed as a key biomarker for assessing respiratory health and for informing conservation management (Acevedo-Whitehouse et al. 2010). Cetacean blow is a relatively unexplored subject (Mello and Oliveira 2016), though more recent studies have examined it as a biological matrix (Atkinson et al. 2021; O'Mahony et al. 2024). It is mostly composed of the host's DNA, hormones, respiratory microbes and a range of metabolites and compounds related to inflammation and the immune system (Hunt et al. 2013). Several studies have reported important differences between the microbial communities detected in the blow of cetaceans and those occurring in the external environment, with the former thus being considered specific to the cetaceans' respiratory tract (Pirotta et al. 2017; Geoghegan et al. 2018; Vendl et al. 2021). Within this material, potential specific microbial pathogens have been identified (Acevedo-Whitehouse et al. 2010), highlighting the value of blow collection and characterisation as a method for monitoring respiratory microbial communities in cetaceans (Lima et al. 2012).
Previous research on the airway microbiota of cetaceans has relied on the analysis of blow samples, using different sampling methodologies and DNA extraction and amplification protocols and targeting different hypervariable gene regions, in some cases with a metabarcoding approach. Nevertheless, the majority of the data available on the cetacean-associated microbiome, namely pathogens, diseases and parasites, come from captive, stranded, sick or injured individuals, which cannot be considered representative of free-ranging populations (Johnson et al. 2009; Acevedo-Whitehouse et al. 2010; Lima et al. 2012). Therefore, although some studies have provided insights from free-ranging baleen whales (Apprill et al. 2017), knowledge of the respiratory microbiome of free-ranging cetaceans is rather limited. Here, we focus on short-finned pilot whales (Globicephala macrorhynchus; SFPW), a thoroughly studied species due to its global distribution, abundance and propensity to mass strandings (Betty et al. 2023). Living in social groups with long-lasting relationships and exhibiting distinct levels of site fidelity to specific areas (Boran and Heimlich 2019; Aguilar de Soto and Alves 2023), SFPW represents an interesting case study of how social complexity might shape microbiome composition.
To the best of our knowledge, there are no published studies on the SFPW respiratory microbiome. In addition, no previous studies on other cetacean species have compared amplification of different hypervariable gene regions using the same methodology on the same samples. In light of this, the present study addresses the respiratory microbiome of free-ranging cetaceans, using SFPW as a model species. The main goal is to provide a network of consistent microbiome core taxa of the SFPW blow, by comparing the V3-V4 and V4-V5 hypervariable regions of the 16S rRNA gene. The V4-V5 regions are currently recommended to target marine microbes (including both bacteria and archaea) by the Earth Microbiome Project (Walters et al. 2015). On the other hand, the primers 341F and 806R, used in this study to amplify the V3-V4 regions of the 16S rRNA bacterial and archaeal genes, were already applied in a previous blow study (Centelleghe et al. 2020). Other prior studies analysing the blow microbiota have amplified the V3-V4 regions, but with primers different from those used in the present study (Bik et al. 2016; Robles-Malagamba et al. 2020). The expected outputs will serve as a baseline to set a working and optimised methodology for SFPW health monitoring, from sample collection to laboratory analysis.

Blow Sampling
Blow sampling was conducted in September and October 2018, during at-sea research monitoring campaigns in the southern waters of Madeira Island, Portugal, targeting SFPW (Fig. 1). In addition to blow collection, the natural behaviour (travelling, resting, feeding or socialising), age class (following Betty et al. (2023) and Aguilar de Soto and Alves (2023)) and the number of individuals sampled in the group (Suppl. material 1: table S1) were recorded.
Sample collection was carried out using a PERFORMAgene™ PG-100 swab collection kit (DNA Genotek®). This kit was attached to an extendable 5-metre aluminium pole and used as a sampling device to collect the blow (Suppl. material 1: fig. S1). As described by the company (www.dnagenotek.com), the "PERFORMAgene is a simple all-in-one non-invasive swab kit for the collection, stabilisation and transportation of animal DNA samples", being very practical to handle. Additionally, with this kit, the samples are stable for a year under room temperature conditions, which allows for some safe flexibility in terms of temperature fluctuation during fieldwork and transportation. An 8-metre rigid inflatable boat with a 150 horsepower outboard engine was used to slowly approach the animals (under permits 508/2018 and 10661/2018 from the Instituto de Florestas e Conservação da Natureza IP-RAM) and, when they surfaced to exhale, the blow collector device was positioned about 40 to 50 cm above the blowhole, in the exhaled plume, to collect the droplets in the blow. Simultaneously, the sampled animals were photographed to distinguish between sampled individuals and to compare them with the Madeira photographic-identification (ID) catalogue of the species (OOM/MARE-ARDITI), following Alves et al. (2019). This comparison allowed us to confirm whether they belonged to the island-associated population of short-finned pilot whales in Madeira (i.e. regularly captured in different seasons throughout the years; following Alves et al. (2013, 2020)) or whether they were transient (i.e. captured only once), with the categories detailed in Suppl. material 1: table S1. Nevertheless, the residency pattern was attributed as a reference only and no statistical comparison was made, due to the low number of individuals/samples in each category.

DNA extraction
DNA extraction from blow samples was performed using the QIAamp® DNA Mini Kit (QIAGEN), following the manufacturer's instructions. An additional concentration step in an Eppendorf Concentrator Plus™ was added to increase the concentration of the DNA extracted from blow samples. A Qubit™ 3 Fluorometer with a Qubit™ dsDNA High Sensitivity (HS) assay kit (Invitrogen™) was used for DNA quantification, after running negative (0 ng/µl dsDNA) and positive (10 ng/µl dsDNA) controls.
Amplification and sequencing of both regions were performed through a two-step PCR protocol followed by high-throughput sequencing. The PCR reactions included 2.5 μl of template DNA in a total volume of 25 μl. The KAPA HiFi HotStart PCR Kit was used in the first PCR reaction, following the manufacturer's suggestions. PCR conditions involved a 3 min denaturation step at 95 °C, followed by 35 cycles of 98 °C for 20 s, 60 °C for 30 s and 72 °C for 30 s and, finally, an extension stage at 72 °C for 5 min. A second PCR reaction was performed to add indexes and sequencing adapters to the target amplicon, according to the manufacturer's recommendations (Illumina 2013). Negative controls without templates were included in all PCR reactions. A detailed description of the protocol is reported in Ribeiro et al. (2018). The amplified products obtained were purified and normalised with a SequalPrep 96-well plate kit (ThermoFisher Scientific, USA). Paired-end sequencing was carried out in the Illumina MiSeq® sequencer with the V3 chemistry (Illumina, Inc., San Diego, CA, USA) at Genoinseq laboratories (Cantanhede, Portugal).

Upstream analysis
The "DADA2" software package (v.1.20) (Callahan et al. 2016) in RStudio (v.4.1.1) was used to process the FASTQ files obtained after Illumina MiSeq sequencing. Initially, the raw sequences were quality-filtered. Raw reads were truncated at 275 bp and 215 bp for forward and reverse sequences, respectively, the positions that yielded the highest final number of sequences. Afterwards, the trimmed sequences were dereplicated, denoised and merged. Sequences were then processed to obtain the Amplicon Sequence Variants (ASVs) table for taxonomic resolution. ASVs distinguish sequences that differ by as little as a single nucleotide (Callahan et al. 2017). Additionally, chimeric sequences were identified and excluded. To determine the taxonomic classification of each ASV, the Naïve Bayes classifier was used against the SILVA database v.132 (Quast et al. 2013). SILVA is a curated taxonomic database that provides comprehensive, quality-checked and regularly updated datasets of aligned 16S ribosomal RNA sequences and is predominantly used for microbial molecular identification.
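The truncation and dereplication steps described above can be illustrated with a minimal Python sketch. This is not the DADA2 implementation: the function name `truncate_and_dereplicate` and the toy reads are hypothetical, and denoising, merging and chimera removal are deliberately left out. It assumes, as in DADA2's length truncation, that reads shorter than the truncation length are discarded.

```python
from collections import Counter


def truncate_and_dereplicate(reads, trunc_len):
    """Truncate reads to a fixed length and count identical sequences.

    Reads shorter than trunc_len are dropped (assumption mirroring
    fixed-length truncation); the rest are cut to trunc_len bases and
    collapsed into unique sequences with their abundances.
    """
    kept = [read[:trunc_len] for read in reads if len(read) >= trunc_len]
    return Counter(kept)


# Toy example: three reads collapse to one unique sequence, one is too short.
toy_reads = ["ACGTAC", "ACGTTT", "ACG", "ACGTAA"]
unique_sequences = truncate_and_dereplicate(toy_reads, trunc_len=4)
```

In the study itself the truncation lengths were 275 bp (forward) and 215 bp (reverse); the toy length of 4 is used here only to keep the example readable.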

Downstream analysis
The ASV counts and taxonomy tables from the upstream analysis, together with the metadata table containing the sample information (sample ID, sampling day, geographical coordinates and residency pattern), were used as input for the "phyloseq" R package (v.1.36) for downstream analysis (McMurdie and Holmes 2013). In summary, "phyloseq" analyses and graphically displays complex phylogenetic sequencing data already clustered into ASVs. The ASVs that were taxonomically unclassified at phylum rank or were not assigned to bacterial or archaeal lineages and the undesirable lineages, such as "Chloroplast", "Eukaryota" and "Mitochondria", were excluded from further analysis.
The distribution and diversity of the prokaryotic community across the two primer sets were investigated. Alpha diversity was analysed for different subsets of samples by calculating four indexes, also using the "phyloseq" package: Observed and Chao1 as estimates of ASV richness and Shannon and Inverse Simpson as diversity measures. Differences in alpha diversity were tested using the Wilcoxon signed-rank test, considering a level of significance of 0.05. Moreover, beta diversity was analysed using the "vegan" package and a Non-metric MultiDimensional Scaling (NMDS) plot, based on the Bray-Curtis dissimilarity ("phyloseq" package). This measure is a statistical index used to quantify the compositional dissimilarity between two different sites, based on the counts of each taxon in the two communities (Bray and Curtis 1957). A PERMANOVA statistical test was implemented to test whether the set of primers used had a significant effect on the prokaryotic communities in the different blow samples. Two hypotheses were therefore formulated: H0: the use of different primer sets does not influence the distribution of prokaryotic communities; H1: the use of different primer sets influences the distribution of prokaryotic communities. The level of significance was set to 0.05.
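For readers less familiar with these measures, the following Python sketch shows one common way of computing the four alpha diversity indexes and the Bray-Curtis dissimilarity from raw ASV counts. It is an illustration under stated assumptions, not the "phyloseq" or "vegan" code: the function names are hypothetical, Chao1 is given in its bias-corrected form and the count vectors are invented.

```python
import math


def observed(counts):
    """Number of ASVs with at least one read."""
    return sum(1 for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1 richness from singleton and doubleton counts."""
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return observed(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))


def shannon(counts):
    """Shannon index (natural logarithm) on relative abundances."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def inverse_simpson(counts):
    """Inverse Simpson index: 1 / sum of squared relative abundances."""
    total = sum(counts)
    return 1.0 / sum((c / total) ** 2 for c in counts if c > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two non-empty count vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / (sum(a) + sum(b))


# Invented communities: 'skewed' has more ASVs but is dominated by one of them.
even = [25, 25, 25, 25]
skewed = [94, 1, 1, 1, 1, 1, 1]
```

With these invented vectors, `skewed` has the higher Observed richness (7 versus 4) but the lower Shannon and Inverse Simpson values, which illustrates how richness and diversity indexes can rank the same two communities in opposite order.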

Core microbiome
To identify the set of microbial taxa common to the samples in each dataset, the core microbiome was calculated using the "microbiome" R package, which builds on "phyloseq" (Lahti and Shetty 2017). The core microbiome was determined using the "core_members" function, considering a detection threshold of 0.001 and a prevalence of 50% across all blow samples. The comparative analysis of the primers used to identify this core microbiota was then carried out using an NMDS plot, based on Bray-Curtis dissimilarity.
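The detection and prevalence criteria can be sketched in a few lines of Python. This is a simplified stand-in for the selection logic, not the "microbiome" package code: the function below is a hypothetical re-implementation, it assumes strictly-greater comparisons for both thresholds (the R function's handling of boundary values may differ) and the taxa and counts are invented.

```python
def core_members(abundance, detection=0.001, prevalence=0.5):
    """Return taxa detected in more than `prevalence` of the samples.

    abundance: dict mapping sample name -> dict of taxon -> read count.
    A taxon is 'detected' in a sample when its relative abundance
    exceeds `detection` (strict comparison, an assumption of this sketch).
    """
    n_samples = len(abundance)
    detected_in = {}
    for counts in abundance.values():
        total = sum(counts.values())
        if total == 0:
            continue  # empty samples cannot contribute detections
        for taxon, count in counts.items():
            if count / total > detection:
                detected_in[taxon] = detected_in.get(taxon, 0) + 1
    return sorted(t for t, k in detected_in.items() if k / n_samples > prevalence)


# Invented counts for four samples and three taxa.
toy = {
    "s1": {"A": 500, "B": 500, "C": 0},
    "s2": {"A": 900, "B": 0, "C": 100},
    "s3": {"A": 10, "B": 990, "C": 0},
    "s4": {"A": 300, "B": 5, "C": 695},
}
core = core_members(toy)
```

In this toy table, taxa A and B exceed the detection threshold in four and three of the four samples, respectively, and are retained; taxon C is detected in exactly half of the samples and is excluded under the strict comparison assumed here.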

Blow sampling characterisation
A total of 12 blow samples were collected from six sampling events (Fig. 1), of which nine were from individual animals and three from a pool of animals (Table 1, Suppl. material 1: table S2). The photographic-ID comparison showed that the nine individual samples corresponded to seven individuals, i.e. two animals were sampled twice, in separate sampling events (Table 1, Suppl. material 1: table S2).

Sequencing output
After the sequencing process, a total of 685,692 reads (with a mean of 57,141 ± 7,920.8 reads per sample) was obtained for the V3-V4 dataset and 862,763 for the V4-V5 dataset (with a mean of 71,896.9 ± 12,990.5 reads per sample). After the quality-filtering steps, the V3-V4 dataset showed a smaller decrease in the number of sequences throughout the workflow than V4-V5, with 60.4 ± 5.7% and 48.5 ± 7.7% of sequences retained per sample, respectively. Non-target sequences were then eliminated. Consequently, the final output was 415,340 sequences in the V3-V4 dataset, assigned to 2,764 ASVs (with a mean of 34,611.7 ± 6,228.9 reads per sample), and 417,733 sequences in the V4-V5 dataset, assigned to 3,665 ASVs (with a mean of 34,811.1 ± 7,554.7 reads per sample).
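The per-sample means above can be cross-checked from the reported totals with a short Python calculation. The variable and function names are illustrative only. Note that the pooled retention computed here (final reads divided by raw reads, per dataset) is not the same statistic as the mean of per-sample percentages reported in the text, so small differences between the two are expected.

```python
N_SAMPLES = 12

# Totals as reported in the text.
raw_reads = {"V3-V4": 685_692, "V4-V5": 862_763}
final_reads = {"V3-V4": 415_340, "V4-V5": 417_733}
asv_counts = {"V3-V4": 2_764, "V4-V5": 3_665}


def per_sample_mean(total, n=N_SAMPLES):
    """Mean number of reads per sample, rounded to one decimal."""
    return round(total / n, 1)


def pooled_retention_pct(kept, total):
    """Percentage of all raw reads that remain in the final dataset."""
    return round(100 * kept / total, 1)


summary = {
    region: {
        "raw_mean": per_sample_mean(raw_reads[region]),
        "final_mean": per_sample_mean(final_reads[region]),
        "pooled_retention_pct": pooled_retention_pct(
            final_reads[region], raw_reads[region]
        ),
    }
    for region in raw_reads
}

# Relative excess of ASVs (all lineages) in V4-V5 over V3-V4, in percent.
extra_asv_pct = round(100 * (asv_counts["V4-V5"] / asv_counts["V3-V4"] - 1), 1)
```

The computed means reproduce the reported values (57,141.0 and 34,611.7 reads per sample for V3-V4; 71,896.9 and 34,811.1 for V4-V5), and the pooled retention comes out at 60.6% and 48.4%, close to the per-sample means of 60.4% and 48.5%.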

Alpha and beta diversity
Alpha diversity metrics varied with the set of primers used (Fig. 2). The Observed ASVs and Chao1 had lower minimum, maximum and mean values in the V3-V4 dataset than in the V4-V5 dataset. On the other hand, the Shannon and Inverse Simpson values were higher for V3-V4, although with a markedly high standard deviation. For the V4-V5 dataset, despite lower values for these two diversity indexes, there was less variation amongst samples (lower standard deviations). In the V4-V5 dataset, sample Gma_03 was an outlier, presenting the highest values in the Observed and Chao1 indexes. The Shannon and Inverse Simpson indexes indicated higher ASV diversity in the V3-V4 region. The Wilcoxon signed-rank test showed that the Observed, Chao1 and Inverse Simpson measures differed significantly between the two primer sets (see Suppl. material 1: table S3), with ca. 32% more bacterial ASVs in the V4-V5 dataset. Statistically significant differences were not found for the Shannon index.
The NMDS plot for beta diversity of microbial communities depicted a separation of the samples, based on the primer set used, with four distinct main clusters, two for each primer set (Fig. 3). One of the clusters of the V3-V4 dataset was composed of samples Gma_01a, 02, 03, 04 and 06 and the other of samples Gma_01b, 07a, 07b, P1, P2 and P3. Gma_05 was an outlier for this dataset in this analysis. In the V4-V5 dataset, one of the clusters was composed of samples Gma_01a, 01b, 02, 03, 04 and 05 and the other of samples Gma_06, 07a, 07b, P1, P2 and P3. Blow samples from individuals travelling together clustered in the same group for the V4-V5 dataset, which was not the case for the V3-V4 dataset. The most similar values were obtained for the samples of the individuals Gma_04 and 05 (sampled on the same day and in the same group of animals), within the V4-V5 dataset.
Regarding the PERMANOVA test results, the primer set influenced the prokaryotic communities' distribution (p = 0.006). Specifically, approximately 25% (R² = 0.249) of the variation in prokaryotic community distribution was explained by the use of different primers (Suppl. material 1: table S4).
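To make the R² and p-value above easier to interpret, the sketch below implements a basic one-factor PERMANOVA on a distance matrix in plain Python. It is a didactic approximation, not the "vegan" routine used in the study: the function names are hypothetical, group labels are permuted freely (which ignores the paired design of the two primer sets applied to the same samples) and the distance matrix is invented.

```python
import random


def _sum_sq(dist, members):
    """Sum of squared pairwise distances among members, divided by their number."""
    n = len(members)
    pairs = ((members[i], members[j]) for i in range(n) for j in range(i + 1, n))
    return sum(dist[a][b] ** 2 for a, b in pairs) / n


def permanova(dist, groups, n_perm=999, seed=0):
    """One-factor PERMANOVA: returns (pseudo-F, R2, permutation p-value)."""
    n = len(groups)
    ss_total = _sum_sq(dist, list(range(n)))

    def pseudo_f(labels):
        levels = set(labels)
        ss_within = sum(
            _sum_sq(dist, [i for i in range(n) if labels[i] == g]) for g in levels
        )
        ss_among = ss_total - ss_within
        f = (ss_among / (len(levels) - 1)) / (ss_within / (n - len(levels)))
        return f, ss_among / ss_total

    f_obs, r2 = pseudo_f(groups)
    rng = random.Random(seed)
    labels = list(groups)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # free permutation of labels (assumption)
        if pseudo_f(labels)[0] >= f_obs:
            as_extreme += 1
    return f_obs, r2, (as_extreme + 1) / (n_perm + 1)


# Invented distances: 0.2 within a primer set, 0.8 between primer sets.
toy_groups = ["V3-V4"] * 3 + ["V4-V5"] * 3
toy_dist = [
    [0.0 if i == j else (0.2 if toy_groups[i] == toy_groups[j] else 0.8)
     for j in range(6)]
    for i in range(6)
]
f_stat, r_squared, p_value = permanova(toy_dist, toy_groups)
```

R² is the share of the total sum of squared distances attributable to the grouping factor, so the reported R² = 0.249 means that about a quarter of the total variation in community dissimilarity is associated with the primer set; the toy matrix, by construction, gives a much larger value (0.92).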

Blow taxonomic characterisation
The composition of the blow prokaryotic communities was investigated in the two datasets (V3-V4 and V4-V5) to analyse the differences and commonalities between the two primer sets. In terms of richness, 1,890 bacterial and 38 archaeal taxa were recorded in the V3-V4 dataset, whereas 2,327 bacterial and 30 archaeal taxa were recorded in the V4-V5 dataset. Taxonomic analysis at the phylum level (Fig. 4A) revealed a core of four most abundant phyla identified in both datasets: Actinobacteria, Bacteroidetes, Firmicutes and Proteobacteria. In the V3-V4 dataset, the prokaryotic community was distributed across 15 different phyla, seven of which accounted for more than 1% of all the ASV sequences of this dataset. This dataset was dominated by sequences of the phylum Proteobacteria (47%). On the other hand, the V4-V5 community was distributed across 21 phyla, with only five phyla accounting for more than 1% of the sequences of the V4-V5 dataset. This dataset was dominated by Proteobacteria (37%) and Actinobacteria (34%). Cyanobacteria was present in more samples of the V4-V5 dataset.
Regarding the family level (Suppl. material 1: fig. S2), the prokaryotic community in the V3-V4 dataset was distributed amongst 108 families, 24 of which comprised more than 1% of the total proportion. This dataset was dominated by sequences of the family Pseudomonadaceae (18%). The V4-V5 community was distributed amongst 153 families, of which only 17 had more than 1% of the total proportion. Although the V4-V5 dataset presented more families, most had a relative abundance of < 1%. Propionibacteriaceae (25%) was the predominant family of this dataset. A comparison between the two datasets regarding the sequence proportion of the major taxonomic families identified can be found in the supplementary material (Suppl. material 1: fig. S3).
Concerning the genus level (Fig. 4B), the V3-V4 dataset detected 24 genera (with more than 1% of the total proportion of the sequences) associated with 128 different ASVs. This dataset was dominated by sequences of the genus Pseudomonas (18%), with only three ASVs associated with this genus. On the other hand, the V4-V5 dataset had only 14 genera (with more than 1% of the total proportion) associated with 79 different ASVs and was dominated by Cutibacterium (25%). The top 10 genera included some taxa common to both datasets: Cutibacterium, Dyella, Flavobacterium, Prochlorococcus_MIT9313, Pseudomonas, Rhodococcus and Sphingomonas. In the V3-V4 dataset, the ASVs were merged into 104 different genera and 53 lineages affiliated with higher taxonomic ranks. In the V4-V5 dataset, the ASVs were merged into 184 different genera and 68 lineages affiliated with higher taxonomic ranks. Overall, at this level, 110 (36.8% of the total) lineages were observed in both datasets. Considering all the taxa, in the V3-V4 dataset, 47 (15.7% of the total) lineages were absent from the V4-V5 dataset. Moreover, in the V4-V5 dataset, 142 lineages (47.5% of the total) were absent from the V3-V4 dataset.
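The shared and dataset-specific percentages follow directly from the lineage counts given above, as the short Python check below shows. The function name `overlap_summary` is hypothetical; the inputs are the reported totals per dataset (104 + 53 = 157 lineages for V3-V4 and 184 + 68 = 252 for V4-V5) and the 110 shared lineages.

```python
def overlap_summary(n_a, n_b, n_shared):
    """Split two lineage sets into shared and exclusive parts.

    n_a, n_b: number of lineages in each dataset; n_shared: lineages in both.
    Percentages are relative to the union of the two sets.
    """
    only_a = n_a - n_shared
    only_b = n_b - n_shared
    union = only_a + only_b + n_shared

    def pct(x):
        return round(100 * x / union, 1)

    return {
        "union": union,
        "shared": (n_shared, pct(n_shared)),
        "only_a": (only_a, pct(only_a)),
        "only_b": (only_b, pct(only_b)),
    }


# V3-V4: 104 genera + 53 higher-rank lineages; V4-V5: 184 genera + 68.
lineages = overlap_summary(n_a=104 + 53, n_b=184 + 68, n_shared=110)
```

The union contains 299 lineages, of which 110 (36.8%) are shared, 47 (15.7%) are exclusive to V3-V4 and 142 (47.5%) are exclusive to V4-V5, matching the figures reported in the text.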

Core microbiome
The NMDS for the core microbiome analysis demonstrated an obvious separation of the samples based on the primer set used, making it possible to differentiate three distinct main clusters: two originated from the V3-V4 dataset and a unique cluster for the V4-V5 dataset (Fig. 5).

Sampling methodology
The main goal of non-invasive sampling techniques for cetaceans is to avoid disturbing, injuring or negatively influencing the sampled individual during sample collection (Robinson and Nuuttila 2020). None of the SFPW sampled here appeared to be negatively impacted by the sampling procedure: no evident changes in swimming patterns or anomalous behaviour were observed during sampling, suggesting that the blow-sampling method causes minimal distress to the individuals. For future work, however, we suggest designing methodologies that run in parallel with sampling to monitor the animals' behavioural responses, as tested by O'Mahony et al. (2024).
The sampling kit used (PERFORMAgene) has been employed in previous studies and applied to different animals, such as livestock (Neary et al. 2014) and pets (Sacco et al. 2017; Colpitts et al. 2022). This kit is suitable for genotyping, sequencing, parentage testing and biobanking (Foley et al. 2011). Until now, no studies in the literature have reported using the PERFORMAgene kit to sample the blow of free-ranging cetaceans. In the present study, this kit was tested for the first time to collect blow samples from SFPW. DNA was extracted from the swab of the PERFORMAgene kit and, although low concentrations were obtained (Suppl. material 1: table S2), the method yielded enough DNA for metabarcoding analysis.

Comparison of primer sets
In this study, the prokaryotic community harboured in the respiratory tract of SFPW was described for the first time and a comprehensive comparison of the performance of two different 16S primer sets (V3-V4 and V4-V5) was conducted. Both hypervariable 16S regions are recommended in the literature for assessing marine microbial diversity (Klindworth et al. 2013; Walters et al. 2015).
Using different hypervariable regions, as well as different types of samples, storage, methods of DNA extraction and 16S databases, can influence the data obtained and the interpretation of the results. McNichol et al. (2021) compared commonly used primers against > 300 million rRNA gene sequences retrieved from marine metagenomes around the world and found that the best-performing primers, in terms of predicted median coverage of bacterial and archaeal 16S rRNA, were 515Y/926R (regions V4-V5) and 515Y/806RB (region V4). The V4-V5 regions are currently recommended to target marine microbes (including bacteria and archaea) (Walters et al. 2015). On the other hand, prior studies analysing the blow microbiota of cetaceans have amplified the V3-V4 regions (Bik et al. 2016; Robles-Malagamba et al. 2020). Specifically, Centelleghe et al. (2020) used the primers 341F and 806R (as in this study) to amplify the V3-V4 regions of the 16S rRNA bacterial and archaeal genes. One of the aims of the present research was to assess the advantages and limitations of amplifying the V3-V4 and V4-V5 regions with regard to prokaryotic diversity and richness, without the influence of technical biases (same experimental and bioinformatic processes on the same samples). Moreover, with this work, we also intended to select an optimal PCR primer set (or different applications of each primer set) that can be applied to the study of the microbiome of cetacean blow samples. The results indicate that the hypervariable regions tested provide different degrees of resolution in taxonomic identification, resulting in different estimates of microbial community composition.
The alpha diversity measures captured by each primer set differed significantly, except for the Shannon index. Our results showed that the V4-V5 dataset captured a higher richness of unique ASVs (higher values in the Observed ASVs and Chao1 measures) and, consequently, more identified taxa. On the other hand, V3-V4 resulted in higher values for the diversity indexes applied (Shannon and Inverse Simpson measures). This is probably because these indexes weight ASVs by their relative abundance and are therefore driven mainly by the more abundant ASVs (such as those with a relative abundance above 1%) rather than by rare ones. Beta diversity revealed that samples were grouped according to the primer set used. Despite the samples being separated into different clusters according to the primer set, the V4-V5 dataset appeared to better represent the distribution of prokaryotic communities. In this dataset, individuals that were travelling together when they were sampled (namely Gma_2/Gma_3, Gma_4/Gma_5 and Gma_7/Gma_P1/Gma_P2) appeared within the same cluster and had closer values in the NMDS than in the V3-V4 dataset. There is evidence that sociality affects microbes in the respiratory tract (Vendl et al. 2020), which seems to be the case for the SFPW sampled in the present work. Some types of behaviour, such as surfacing and breathing near each other or feeding cooperatively, could facilitate the transfer of microbes and the spread of pathogens between individuals (Bogomolni et al. 2008; Apprill et al. 2017). Such types of behaviour are common in highly social and matrilineal species, such as the SFPW (Olson 2009; Boran and Heimlich 2019), as demonstrated in the target population (Alves et al. 2013; Esteban et al. 2022) to which the sampled animals belong. These processes have been recognised as exclusive and important aspects of social living, providing animal health benefits and acting as a driving force in the evolution of social behaviour (Lombardo 2008). Nevertheless, this hypothesis needs further evidence, since the V3-V4 dataset does not corroborate these results, possibly not reflecting the true blow prokaryotic community composition. Vendl et al. (2020) targeted solely the V4 region to study the microbiome in the blow of different whale species and observed a species-specific clustering in the microbiome beta-diversity, detecting a positive correlation between sociality and microbial diversity. Analysing the microbial composition in the breath of a representative number of individuals with different levels of site fidelity to the study area, which could not be statistically assessed in the present study due to the low number of individuals in each category, would be an interesting future line of enquiry, particularly as this would shed further light on the effect of sociality and habitat use on breath microbiota. Additionally, this type of comparative study could be paired with other predictors of health, such as photogrammetry-sourced measures of body condition.
The analysis of the blow core microbiome could provide useful features for the health monitoring of cetaceans worldwide. All samples from both datasets shared a main core microbiota in their blow, composed of the phyla Actinobacteria, Bacteroidetes, Firmicutes and Proteobacteria. Nevertheless, the dominant ASVs were not the same between the results obtained from the amplification of the V3-V4 and V4-V5 regions. Lima et al. (2012) described a temporal analysis of the blow in captive dolphins, suggesting that microbial community composition in healthy animals is quite stable and that individual dolphins harbour consistently unique microbial communities. Several studies provided preliminary evidence that cetaceans host a core group of bacteria associated with the respiratory system (Johnson et al. 2009; Lima et al. 2012; Bik et al. 2016; Apprill et al. 2017). Apprill et al. (2017) detected, on average, 25 sequence clusters in 100% of humpback whales, accounting for 36% of the microbiota, being one of the most extensive core microbiotas found in any mammal. This large core microbiome was shared across individual whales from populations separated geographically into two ocean basins. Vendl et al. (2019) found low microbiota richness and a small core microbiome in humpback whales off the coast of Australia. Those sampled animals had been migrating for four months, which is the fasting period, whereas the animals sampled in the study by Apprill et al. (2017) were at their foraging sites and in the early stages of migration. Vendl et al. (2019) hypothesised that the lack of a core microbiome might be related to the animals' physiological state at the sampling time.
In this study, at the phylum level, the most dominant taxon recovered from both datasets was Proteobacteria. This aligns with previous studies of other species of cetaceans, both baleen whales (Apprill et al. 2017; Pirotta et al. 2017; Vendl et al. 2019; Atkinson et al. 2021) and dolphins (Nelson et al. 2019; Centelleghe et al. 2020; Atkinson et al. 2021), where this taxon is highly abundant. Lima et al. (2012) showed that the aforementioned common phyla, here present in both datasets (Proteobacteria, Actinobacteria, Bacteroidetes and Firmicutes), plus the Fusobacteria and unclassified bacteria, represented 98% of the total community composition in bottlenose dolphins. In the present work, Fusobacteria was also detected, but in lower abundance (< 1% of the total sequence reads) in both datasets, contrasting with the results from Nelson et al. (2019), where this taxon accounted for 7.2% of the total community. The presence of Cyanobacteria, a seawater phylum, in blow samples is not surprising, having been detected in previous studies of the cetacean blow microbiota (Pirotta et al. 2017). Control water samples were not taken in this study, but we consider this to be extremely important for future work, in order to analyse the extent to which the detected microbial composition of the blow samples is influenced by seawater at the time of sampling. Geoghegan et al. (2018) and Centelleghe et al. (2020) have previously reported the presence of prokaryotes in blow samples that are also common to the microbiome found in surface layers of seawater. In both studies, the authors reported a low abundance of prokaryotic taxa shared between blow and seawater, with a clear distinction between the two. Epsilonbacteraeota, a phylum with a marked presence in the V3-V4 dataset, has already been identified as part of the core microbiota of bottlenose dolphins' respiratory system (Lima et al. 2012). In the current study, we found the phyla Euryarchaeota, Dadabacteria, Kiritimatiellaeota, Acidobacteria and Deinococcus-Thermus exclusively in the V4-V5 dataset; these were not previously reported in studies investigating the blow microbiota of various cetacean species. Similarly, Dependentiae, found in the V3-V4 dataset, has not been documented in prior research. In contrast, Planctomycetes, Chloroflexi (Bik et al. 2016) and Spirochaetes (Lima et al. 2012; Bik et al. 2016; Vendl et al. 2021), here only found in the V4-V5 dataset, have been described in other studies concerning blow microbiota.
At the genus level, Cutibacterium (dominant in the V4-V5 dataset) is a typical dominant member of the nasal microbiota (Kumpitsch et al. 2019). The genus Prochlorococcus, detected in both datasets, is one of the most abundant bacteria in the ocean (Zehr et al. 2017) and is probably a seawater-associated bacterium within the SFPW blow. Four of the most abundant genera detected in both datasets (Pseudomonas, Flavobacterium, Rhodococcus and Sphingomonas) are potentially pathogenic genera within the blow microbiome, commonly related to different diseases, such as various pulmonary infections (Higgins 2000; Venn-Watson et al. 2008; Venn-Watson et al. 2012; Apprill et al. 2017). Their presence is not necessarily indicative of disease, since not all organisms belonging to these taxa are pathogenic. The V4-V5 dataset detected more diversity among the less abundant taxa in the blow microbiome, highlighting the value of this primer set for detecting the "rare biosphere" in the blow. This can be especially relevant, considering that low-abundance taxa can potentially play a pathogenic role in respiratory diseases. Other potentially pathogenic genera were less abundant in the blow samples analysed by Nelson et al. (2019): Staphylococcus, with 0.01% relative abundance of identified genera in blow samples, and Streptococcus, with 0.09%. Identification of taxa with pathogenic potential may, therefore, be of special relevance in assessing the health status of cetacean populations, such as the SFPW targeted in this case study. Here, the identification of various potentially pathogenic genera across all samples may suggest that this resident population of SFPW might be vulnerable to respiratory disease. Nevertheless, the low number of samples and the low taxonomic resolution of the microbiome results (the pathogenic species could not be identified, as previously discussed) do not allow us to properly infer the population's vulnerability. We would also require additional data from transient individuals found within the study area and/or additional samples from the same species in other areas to rule out the hypothesis that this microbiome might be a baseline within healthy individuals of this species. Other health assessment methodologies, such as nucleic acid-derived indices, have also recently been tested to study the ecophysiological traits of these animals, which commonly occur in the waters surrounding the Island of Madeira (Alves et al. 2020). In that study, the authors concluded that this SFPW population showed good ecophysiological condition, although significantly lower than that of other species from the same study area, which could be due to interspecific variation rather than environmental conditions. Thus, the present blow microbiome work provides complementary information to evaluate the health status of this marine mammal population.
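The distinction between dominant taxa and the "rare biosphere" rests on relative abundance, with 1% of reads being the cut-off referred to in this study. The following Python sketch shows one way such a split could be computed. The function names, the treatment of taxa at exactly 1% as abundant and the read counts are illustrative assumptions, not the pipeline or data used here.

```python
def relative_abundance(counts):
    """Convert raw read counts per taxon into fractions of the sample total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: n / total for taxon, n in counts.items()}


def split_by_threshold(counts, threshold=0.01):
    """Split taxa into abundant (>= threshold) and rare (< threshold) groups.

    Assumption: a taxon at exactly the threshold is counted as abundant.
    """
    rel = relative_abundance(counts)
    abundant = {t: r for t, r in rel.items() if r >= threshold}
    rare = {t: r for t, r in rel.items() if r < threshold}
    return abundant, rare


# Hypothetical read counts for a single blow sample (not data from this study).
example_counts = {
    "Pseudomonas": 4200,
    "Cutibacterium": 3100,
    "Flavobacterium": 2600,
    "Staphylococcus": 60,
    "Streptococcus": 40,
}

abundant, rare = split_by_threshold(example_counts)
print("abundant:", sorted(abundant))
print("rare:", sorted(rare))
```

With these made-up counts, Staphylococcus (0.6%) and Streptococcus (0.4%) fall into the rare group, which is the fraction where a primer set with greater sensitivity to low-abundance taxa would matter most.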
Regarding the comparison of the core microbiome between the two datasets, our results show that V4-V5 yields less variation across samples, with all the blow microbiomes being similar to one another. Furthermore, within the cluster formed by this dataset, the NMDS values of the sampled individuals that travelled together (Gma_2/Gma_3, Gma_4/Gma_5 and Gma_7/Gma_P1/Gma_P2) tend to lie close together. This reinforces the influence of sociality on microbiome composition, similar to what was inferred from the beta-diversity analysis. Therefore, our results suggest that the V4-V5 dataset could be more consistent in determining the core microbiome present in the respiratory tract of free-ranging SFPW, showing less variation in this parameter among the tested samples. Nevertheless, the higher variation documented by V3-V4 may also be relevant, since it provides a broader representation of the taxa detected within the blow. In this regard, the two primer sets could be complementary, and their combination may represent a more robust way of characterising the microbiome. This is a widely used approach in metabarcoding studies, as combining different primers has been shown to improve taxa coverage, while also helping to reduce diversity bias (Lee et al. 2023).
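To make the two quantities behind this comparison concrete, the sketch below computes the Bray-Curtis dissimilarity between pairs of samples (the distance underlying the NMDS ordinations) and extracts a core set of genera. The core definition used here, genera present in every sample, is an assumption for illustration, as are the function names, sample labels and counts; none of them are taken from this study.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance profiles (dicts).

    0 means identical composition, 1 means no shared taxa.
    """
    taxa = set(a) | set(b)
    shared = sum(min(a.get(t, 0), b.get(t, 0)) for t in taxa)
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        raise ValueError("both samples are empty")
    return 1 - 2 * shared / total


def core_taxa(samples, min_prevalence=1.0):
    """Taxa detected in at least `min_prevalence` (a fraction) of samples.

    Assumption: the default of 1.0 means present in every sample.
    """
    n = len(samples)
    all_taxa = set().union(*(s.keys() for s in samples.values()))
    return {
        t
        for t in all_taxa
        if sum(1 for s in samples.values() if s.get(t, 0) > 0) / n >= min_prevalence
    }


# Hypothetical genus-level counts for three samples (illustrative only).
samples = {
    "sample_1": {"genus_A": 6, "genus_B": 4},
    "sample_2": {"genus_A": 4, "genus_B": 4, "genus_C": 2},
    "sample_3": {"genus_A": 5, "genus_C": 5},
}

d12 = bray_curtis(samples["sample_1"], samples["sample_2"])
d13 = bray_curtis(samples["sample_1"], samples["sample_3"])
core = core_taxa(samples)
print(round(d12, 3), round(d13, 3), sorted(core))
```

In this toy example, sample_1 is closer to sample_2 than to sample_3, which is the kind of pairwise structure an NMDS plot summarises in two dimensions. Lowering `min_prevalence` gives a more permissive core definition.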

Conclusions
The comparison between the primer sets showed that all samples from both datasets (V3-V4 and V4-V5) shared the main taxa, namely the phyla Actinobacteria, Bacteroidetes, Firmicutes and Proteobacteria. This study provides a detailed characterisation of the microbial richness present in the blow of SFPW across multiple taxonomic levels. Based on these results, we conclude that the selection of the primer set for microbiome assessment in cetacean blow samples should depend mainly on the goal of the analysis. If the main goal is to capture more of the diversity present at higher relative abundance, the V3-V4 primer set performs better; whereas, if the purpose is to gather more information in the form of unique ASVs and to identify the microbial rare biosphere, we propose the use of the primer set targeting the hypervariable regions V4-V5. Although the V4-V5 dataset detected a higher number of unique ASVs and taxa, most had a relative abundance of < 1%. The V4-V5 dataset showed more consistent results in determining the core microbiome in the blow samples, while V3-V4 had higher variation. In this regard, a combination of primers may prove to be the more robust way to characterise the blow microbiome. This study also offers evidence that social behaviour influences the microbiome composition of the respiratory tract in cetacean species.
Nevertheless, several other aspects require consideration and future development to advance the blow microbiome as a health monitoring tool for cetaceans. Besides optimisation of the sampling and processing protocols, it is also relevant to test different methodologies to enhance sequencing efficiency and downstream procedures. Moreover, crossing this type of data with photogrammetry datasets to assess body condition is relevant to properly infer the pathogenic potential of these microbial communities in cetacean species.
In conclusion, this study represents an important contribution towards understanding the microbiome present in the respiratory tract of free-ranging cetaceans and marks the initial step in characterising the blow microbiome of SFPW. Finally, this study further underscores the potential of the blow microbiome as a future biomarker for assessing the health status and physiological state of the airways in free-ranging cetaceans.

Additional information
The samples were kept refrigerated after collection, during transportation and until storage in the laboratory. Under lab conditions, samples were kept frozen until extraction. The maximum time between collection and extraction was approximately one year.

Figure 1. Blow sampling locations and dates in the Madeira Archipelago, with an illustration of the short-finned pilot whale (Globicephala macrorhynchus; © E. Berninsone / ARDITI).

Figure 2. Box plot diagrams representing the median, first quartile and third quartile of observed richness, Chao1, Shannon and Inverse Simpson alpha diversity indices of the blow microbiome. The primer sets are represented by different colours (blue for V3-V4 and orange for V4-V5).
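For readers less familiar with the four alpha diversity indices shown in Figure 2, the following Python sketch computes them from a vector of ASV read counts for a single sample. It uses the standard textbook definitions, with the bias-corrected form of Chao1 chosen here as an assumption. The function names and example counts are illustrative, and this is not necessarily how the indices were calculated in this study.

```python
import math


def observed_richness(counts):
    """Number of ASVs with at least one read."""
    return sum(1 for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1 richness estimate.

    S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)), where F1 and F2 are the numbers
    of ASVs observed exactly once and exactly twice.
    """
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return observed_richness(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))


def shannon(counts):
    """Shannon index H' = -sum(p * ln p) over ASVs with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def inverse_simpson(counts):
    """Inverse Simpson index 1 / sum(p^2)."""
    total = sum(counts)
    return 1 / sum((c / total) ** 2 for c in counts if c > 0)


# Hypothetical ASV read counts for one sample (illustrative only).
example = [10, 5, 2, 2, 1, 1, 1]

print("observed:", observed_richness(example))
print("chao1:", chao1(example))
print("shannon:", round(shannon(example), 3))
print("inverse simpson:", round(inverse_simpson(example), 3))
```

Observed richness and Chao1 respond mainly to how many ASVs are present, including rare ones, whereas Shannon and Inverse Simpson give progressively more weight to evenness among the abundant ASVs.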

Figure 3. Non-metric multidimensional scaling (NMDS) of the microbiota found in short-finned pilot whale blow samples, in merged V3-V4 and V4-V5 datasets, based on Bray-Curtis dissimilarity. The primer sets used are represented by different shapes (circles for V3-V4 and triangles for V4-V5) and the sampling days by different colours.

Figure 4. Relative abundance of prokaryotic phyla (A) and top 10 genera (B) identified across the SFPW blow samples, for the V3-V4 (top) and V4-V5 (bottom) datasets.

Figure 5. Non-metric multidimensional scaling (NMDS) of the core microbiota found in short-finned pilot whale blow samples (genus level), in merged V3-V4 and V4-V5 datasets, based on Bray-Curtis dissimilarity. The primer sets used are represented by different shapes (circles for V3-V4 and triangles for V4-V5) and the sampling days by different colours.

Table 1. Summarised metadata of blow samples collected and analysed. * Samples gma_01a and gma_01b correspond to the same individual, as do samples gma_07a and gma_07b; ** Samples gma_P1, gma_P2 and gma_P3 were collected from a pool of individuals within the same group.