Introduction

Viruses cause a wide range of human diseases ranging from acute self-resolving conditions to chronic fatal diseases [1]. Viral disease outbreaks, including the current COVID-19 pandemic as well as the severe acute respiratory syndrome (SARS) epidemic in 2003, the H1N1 influenza pandemic in 2009, and the Middle East respiratory syndrome epidemic in 2014, have documented the airborne spread of infectious diseases from expiratory emissions, especially in the indoor environment [2, 3]. As of March 2022, COVID-19 has resulted in more than 6.1 million deaths and 478 million infections [4]. Epidemics and pandemics have threatened not only the health of the public but also become one of the biggest threats to the world’s global economy and financial markets [5].

Airborne transmission occurs when respiratory particles, including aerosols (<100 μm) and droplets (>100 μm), are emitted into the air by an infected individual and are breathed in by a susceptible individual, infecting the individual that inhales them [6, 7]. Both large and small particles may spread infectious diseases. The risk of infection from large particles is primarily within the near field plume, while smaller particles are primarily responsible for exposure in the far field [6, 8]. Once expelled, respiratory particles rapidly evaporate under standard indoor relative humidity conditions (30–65% RH), reducing in size to slightly less than half of their original diameter [7], and thereby increasing the number of particles that are suspended as aerosols. The average residence time in the air of respiratory particles (estimated from Stokes law for spherical particles) increases by four orders of magnitudes for a one order of magnitude decrease in particle size from 10 to 1 μm [6]. Respiratory activities, which include coughing, sneezing, talking, singing, and breathing, can generate thousands of particles per liter of exhaled air [9,10,11]. Respiratory emissions produce particles that vary in diameter between 0.1 and 200 μm, with a multimodal size distribution where the majority of the particles are <5 μm [12]. Consequently, most of the emitted particles are of sufficiently small diameter that they can stay suspended in the air as an aerosol for extended lengths of time (minutes to hours) and can travel significant distances [13,14,15].

It is now widely recognized that aerosols are the primary vector for SARS-CoV-2 transmission [8, 16, 17]. Due to the higher incidence of speaking versus coughing, particularly for asymptomatic individuals, vocalizations are likely a significant transmission source and lead to increased transmission for individuals in close contact or in indoor environments [3, 11, 18]. The quantity and size distribution of respiratory aerosols during vocalization has been investigated for sustained phonation [11, 12], speech [12, 19], and singing [10, 19].

Although the mechanics of aerosol production during speech are poorly understood, it has been proposed that respiratory aerosols arise due to bronchial fluid film bursting within the pulmonary tract, vibration of the vocal folds during phonation, and articulation of the tongue/lips/teeth [11, 20, 21]. Consistent with this theory, studies have shown that increased vocal intensity levels (perceived as loudness) during phonation leads to much higher particle production rates. Initial investigations revealed that increasing speech intensity from quiet, to medium, and loud, increased particle production rates by up to an order of magnitude [11]. Subsequent studies have found similar results [12, 22, 23], including identifying a positive correlation (R2 = 0.824) between sound pressure level (SPL) such that an increase of 10 dB of SPL resulted in an order of magnitude increase in particles [24].

Despite these insights, questions regarding the etiology of aerosol production during speech remain. This is evidenced by the identification of “superemitters” that, even when accounting for changes in vocal intensity, produce respiratory particles at a rate that is up to an order of magnitude higher than the mean [11, 25, 26]. Superemitters have been suspected of contributing to significant community outbreaks that could be linked to a single event [3, 18].

It has been proposed [11, 23] that varying respiration mechanics may lead to increased aerosol production within the bronchioles via increased activation of fluid film bursting. Laryngeal biomechanics may also play a role, as phonation arises due to collision of the mucosal surfaces of the vocal folds. This likely occurs due to changing rheological properties of the mucosal covering, which is known to be influenced by hydration [27], as well as differences in the morphometry of the larynx. This can lead to altered flow patterns and vocal fold oscillation dynamics [28,29,30]. The frequency of speech (i.e., pitch) also increases the frequency with which the mucosalivary fluid covering the vocal folds come into contact. This has the potential to increase particle production through the repeated formation, and breakup, of capillary bridges that span between the opening and closing vocal folds [31] similar to the mechanics of liquid fluid film bursting within the bronchioles [32]. Interestingly, the influence of frequency of phonation on respiratory aerosol production remains unexplored. This is particularly relevant as phonation frequency during speech is known to vary with age and sex [33], as well as emotional state [34] and dialect [35].

In response, the objective of this study is to statistically quantify the dependency of respiratory particle production rate on frequency during sustained phonation and compare it with the influence of vocal intensity of phonation. The ability of these two variables to explain the broad spread in respiratory particle production rates within the general population is also explored.

Methodology

Experimental method

Laboratory measurements were performed with human subjects to determine how emission rates and size distributions of respiratory aerosols varied with frequency and vocal intensity during sustained vowel intonations. The study was comprised of individuals recruited from the Clarkson University student, staff and faculty body. The study was approved by the Clarkson University Institutional Review Board (#20-56), and written informed consent was obtained from each participant. A total of 40 participants were recruited, including 16 females (F1–F16) and 24 males (M1–M24), with an age range of 20–60 years. Professional singers are known to generate increased respiratory aerosol emissions, although the reason remains unknown [24]. To avoid potential confoundment of the results, formally educated or trained singers were excluded from the study.

Each participant followed the protocol outlined below. A sustained vowel /a/ was produced at a specific frequency and held for 5 s, followed by 15 s of rest (comfortable breathing), repeated six times, sequentially. Each sequence of data collection for a particular frequency, therefore, lasted 120 s. The user was prompted when to begin phonating, when to stop, and when to rest via a timed recording that was played through one headphone placed over one ear. This recorded audio file was synchronized with the respiratory aerosol measurement acquisition. The frequency of phonation was controlled by asking the participant to match their pitch to a particular note, played on an online piano keyboard. The participant was allowed to practice matching the note prior to the collection of data. During data collection, the note was played on the keyboard in the background during the rest periods to make pitch-matching by the participant easier. This protocol was repeated seven times for notes (frequencies): C3 (131 Hz), E3 (165 Hz), G#3 (208 Hz), C4 (262 Hz), E4 (330 Hz), G#4 (392 Hz), and C5 (523 Hz). The order of the notes was randomized for each participant. If large deviations (±20% of the desired frequency) were observed in the audio signal of the participant (discussed below), the trial was not reported and was repeated. Participants were instructed to phonate at a comfortable vocal intensity for each frequency. If the monitored SPL exceeded ~95 dB (discussed below), the sample was repeated because SPL values greater than 95 dB resulted in clipping in the recorded audio file. Due to the inherent tendency to increase the intensity with increasing frequency [36] and the natural variation between participants, a broad range in vocal intensity was produced; the normalized vocal intensity values recorded by the microphone varied between 0.02–0.5. To preclude any effects that may arise due to varying hydration, participants were advised to drink at least 0.5 l of water in the hour preceding the experiment. During the experiment, they were not allowed to drink any fluids.

A schematic of the facility used for measuring the respiratory aerosol emissions is shown in Fig. SF1. Respiratory particle emission rates and size distributions were acquired using a TSI (Shoreline, MN, USA) Model 3321 Aerodynamic Particle Sizer (APS). The APS was placed inside a (LABCONCO, Purifier Clean Bench) laminar flow hood equipped with a HEPA filter. It was oriented so that it produced a uniform velocity flowing toward the participant at 0.6 m/s, as measured with an anemometer. The participant phonated into a 10.0 cm diameter funnel at the opening that tapered down to 1.9 cm at the exit, over a length of 8.0 cm. A piece of thread was strung across the opening of the funnel and used as a reference against which the participant placed the underside of their nose. This ensured that the position of the head of each participant, relative to the funnel, was consistent across all measurements. This resulted in the mouth being positioned 15 cm from the exit of the funnel. A 25.0 cm long piece of conductive tubing connected the funnel exit to the inlet of the APS. The tubing had an inner diameter of 1.9 cm and surface resistivity 105 Ohm/m2. Total particle losses (sampling and transportation) through the inlet system were estimated to be ~0–5% for particle diameters ranging from 0.5–10 μm [37].

A REED Instruments (Wilmington, NC) Model R8050 SPL meter with a sampling frequency of up to 8 kHz was positioned within the laminar flow hood so that it was 12.7 cm from the exit of the participant’s mouth and enabled visual observation of the radiated SPL during intonations. In addition, a Logitech (Lausanne, Switzerland) Blue SNOWBALL iCE microphone was positioned within the laminar flow hood at a distance of 22.9 cm from the participant’s mouth and acquired the acoustic signal. The microphone was connected to a personal computer to record the audio file. The vocal intensity was estimated as the root-mean-square of the amplitude of the digital audio signal produced during phonation, denoted as ARMS. This measure varied from 0.02 to 0.5 across all participants and trials. Because the audio was not recorded using a calibrated microphone from which an accurate measure of SPL could be determined, ARMS provides a relative measure of vocal intensity. This is sufficient for correlating particle production rates with vocal intensity, and is the same approach that has been implemented in prior studies [11]. The sampling frequency of the recorded audio was 44.1 kHz. Similarly, the fundamental frequency (fo) of each sample was determined by performing a fast Fourier transform of the audio sample. The fundamental frequency was calculated across all samples and for all participants. The background SPL was found to be ~69–71 dB. This relatively high value was due to the noise produced by the fan of the laminar flow hood. Temperature and relative humidity in the laminar flow hood and laboratory were monitored with Onset (Bourne, MA USA) HOBO UX100-003 data loggers.

Figure 1 presents data acquired from one subject and is representative of the measurements that were obtained. Figure 1A shows the acoustic waveform measured by the microphone, indicating the six repeated instances of 5 s of phonation, followed by 15 s of rest. Note that the amplitude of the signal is sufficiently high to differentiate it from the background noise, which is evident during the period of rest (no phonation). The particle emission was quantified using a 5 s sample rate from the APS. Figure 1B shows how the particle emission rate varies during measurement from one participant (F1), and at one note (C4). The figure shows that when not phonating, the emission rate drops to approximately zero, indicating that the influence of ambient/background particles is negligible. There is, however, some carryover evidenced by particles produced during one sample window being detected in the next sample window due to the finite transit time of the particle passing through the tubing and into the APS.

Fig. 1: Examples of audio and aerosol recordings.
figure 1

(A) sample audio file of one note (C4) from one subject (F1), B aerosol production acquired during recording shown in A.

Statistical analyses

Vocal frequency, vocal intensity (i.e., ARMS), and particle emission rates were collected through the experiments (discussed in “Experimental method” section). Data pre-processing was carried out using Mahalanobis distance, an effective metric for analyzing multivariate data, to detect outliers in the measures of frequency and vocal intensity [38].

Correlations between variables were examined by calculating Pearson’s correlation coefficients (r) between the predictors (frequency, vocal intensity, and individual subject) and response variables (particle emission rates). A multiple linear regression was implemented to evaluate the contribution of predictor variables in particle emission rates. The residuals of the regression models were checked for model adequacy. Lilliefors normality test to check the normality of the residuals after constructing the linear models was applied [39]. The goodness of the model was interpreted based on the adjusted coefficient of determination (R2) value. The contributions of each predictor variable, frequency, vocal intensity, and individuals were checked. The p values of the regression coefficients of the predictor variables were used to assess if their incorporation made a meaningful addition to the model. All analyses used two-sided statistical tests, and p values <0.05 were considered significant.

Violin plots were created as visualization tools for frequency and vocal intensity for each of the seven notes, C3, E3, G#3, C4, E4, G#4, and C5, to explore the distribution of numeric data for each note. Next, the frequency and vocal intensity data were divided into five subcategories based on 20th, 40th, 60th, and 80th percentile cut-offs. Visualizations of the dependent variable (particles/s) were created by generating density plots utilizing different categories of frequency and vocal intensity data. Violin plots, which are hybrids of boxplots and kernel density plots, were employed to show the summary statistics as well as to depict the full distribution of the data.

Results

Figure 2A presents violin plots of the variation in the vocal intensity (i.e., ARMS) for each note. Because participants were instructed to phonate at a comfortable level, the distribution is quite broad across all of the notes, although it generally increases with frequency, which is a known phenomenon [11, 36].

Fig. 2: Variation in vocal intensity and frequency for all of the subjects.
figure 2

Violin plots show the variation of (A) vocal intensity (ARMS) and (B) frequency as a function of target note (frequency).

Although participants were instructed to match a specific note/frequency during the trials, there was some variation in the frequency that was ultimately produced. This is presented in Fig. 2B, where violin plots show how the frequency varied as a function of the targeted note (frequency) across all of the trials. The width of the plots denotes the density of the data, with the upper and lower limits indicating the maximum and minimum values, respectively.

The frequency distributions increased with higher notes, although the distinction from note-to-note was still quite clear.

The respiratory particle emission rate, defined as the total number of respiratory particles expelled from the mouth during a single vocalization divided by the time of vocalization (in seconds), is an important variable to predict the risk of airborne virus transmission. Figure SF2 presents violin plots of the respiratory particle emission rate as a function of gender (female, and male). Note that each 5 s sample acquisition period is recorded as a separate data point. The median particle emission is slightly higher for males (7.0 particles/s) than for females (4.6 particles/s). However, it can be observed that the bulk of the data falls within a similar range of 0–30 particles/s. The standard deviations for emissions from males and females are 10.5 particles/s and 12.0 particles/s, respectively. Differences between male and female emission rates as a function of note (frequency) were also investigated but are not shown here for brevity. While statistical significance (p < 0.05) in the particle production rate between male and female participants was found for the two lowest notes (C3 and E3), there was no difference for the higher notes. In addition, the trends were reversed for the two notes, with females emitting particles at a greater rate for C3, while males emitted particles at a greater rate for E3. The combined mean emission rate for males and females is 9.5 particles/s, with a standard deviation of 11.1 particles/s.

Figure 3 presents a boxplot of the mean particle production for all participants. The particle production rate for each participant is averaged across all of the sampling periods (six periods of 5 s each) and across all frequencies (and hence, vocal intensity levels). For the 40 participants, the median and inner quartiles are specified as a red line and blue box, respectively. While previous work [11] has defined superemitters as those that emit at a rate greater than one standard deviation above the mean, we propose a more statistically robust definition, defining superemitters as statistical outliers. For univariate data, this can be defined as those that fall outside the range of the third quartile (Q3) plus 1.5 times the inner quartile, where the inner quartile is the difference between the third and first quartiles (Q3–Q1). Based on this definition, one outlier (i.e., superemitter) is identified out of 40 participants. This is represented in the boxplot shown in Fig. 3, where the median is 9.1 particles/s, the first and third quartiles are 4.1 and 13.0 particles/s, respectively, the lower and upper fences are 2.1 and 22.0 particles/s, respectively, and the outlier produces 31.2 particles/s.

Fig. 3
figure 3

Boxplot representation of the mean number of particles per second produced by all participants.

Prior research efforts have explored expiratory particle emissions as a function of vocal intensity [10, 11, 23], with only one study explicitly exploring the effect of frequency of phonation (i.e., pitch) on particle production [11]. In that work, the authors considered a relatively confined range of frequency variations (216–264 Hz), which is much smaller than the range of human frequency during vocalization (~80–220 Hz), and is at the upper limit of the normal, or “chest”, register of phonation; particularly for males [40]. Figure 4A presents the mean particle number distribution for all of the subjects, with separate symbols/lines that are color coded by the target phonation frequency. The mean of the raw data from the APS measurements is denoted by symbols, while a fit to each distribution is shown as solid lines, which were computed by a high-order polynomial fit (order of 20). A clear trend is visible, indicating that expiratory particle emission increases with increasing frequency. Comparing the concentration rate at the lowest frequency of C3 (123 Hz) with the highest frequency of C5 (523 Hz) reveals that the particle number concentration increases more than three-fold. The geometric mean particle diameter was consistently ~0.6 μm, as computed from the fitted curves, regardless of phonation frequency. Approximately 98.3% of all emitted particles had diameters between 0.53–3 μm.

Fig. 4: Aerosol emissions as a function of frequency.
figure 4

A Particle emissions for all seven notes as a function of particle size distribution, B box plots of the particle size as a function of note (frequency), and C particle emission variation for one note (C4).

The box plots of Fig. 4B quantify how the median of the particle distribution varies as a function of note (frequency). It should be clarified that for each note, the minimum mode size is 0.53 μm, because that is the smallest particle diameter that can be resolved by the APS. As indicated by the plots, the median particle size across all of the notes is ~0.6 μm, with very similar distributions. For the note G#3 and C4, the median values are equal to the first quartile values.

Figure 4C provides a statistical view of the distribution of the particle number concentration across all subjects for the specific note C4 (262 Hz). The red line indicates the median particle concentration as a function of particle size, with the shaded region bounded by the blue lines indicating the first and third inter quartiles, and the black lines showing the maximum and minimum particle number concentrations. Note that comparing the median line of Fig. 4C with the mean value for the same note (C4) in Fig. 4A reveals that the mean concentration is higher than the median, indicative of a skewed distribution, as was similarly observed in the violin plots of Fig. 2. For the note that is shown (C4) the normalized particle concentration varies from a minimum of 0 particles/cm3 to a peak of 0.5 particles/cm3. The inset plot in the top right corner of Fig. 4C presents a boxplot of the corresponding particle emission rate across all subjects for the same note C4 (262 Hz). Similar to the broad distribution in average emission rates presented in Fig. 3, for the specific note of C4 (262 Hz), the median value of the total emission rate was 6.4 particles/s, with first and third quartiles of 3.8 particles/s and 15.2 particles/s, respectively. The maximum and minimum emission rates were 34.2 and 0.4 particles/s, respectively. While data are only presented for one specific note (frequency), the other notes showed similar trends.

These results demonstrate that respiratory particle production increases with both vocal intensity and phonation frequency. To quantify the relative importance of both of these variables, Pearson correlation coefficients (r) were computed. Figure 5 presents correlation plots of particle production versus frequency (Fig. 5A) and vocal intensity (Fig. 5B), with the associated correlation values shown. There is a positive correlation for both variables, with a value of 0.37 for vocal intensity, and 0.28 for frequency. Both correlations are statistically significant (p < 0.05).

Fig. 5: Correlation plots with Pearson correlation value (r) and p value for particle production.
figure 5

Data are presented as a function of (A) frequency, and (B) vocal intensity (ARMS).

Similar to prior results [10, 11, 19], a significant spread in the particle production rate as a function of individuals was reported herein. To further explore this relationship, and the ability of both frequency and vocal intensity of phonation to act as predictive variables, an additive regression model was applied to the data where particle emission rate was the response variable and frequency, vocal intensity, and individual subject were the predictor variables. This approach enables the determination of whether frequency and intensity of phonation are the primary variables responsible for the variation in particle production rate, or if there are additional variables that have not been considered. Table 1 presents the statistical information of the additive regression model, and Figure SF3 presents the residual plots for the model, which exhibit a normal distribution. The model has an accuracy of 40% with a confidence interval of 95%, with all variables significant with p value <0.001 and sufficiently large t-test statistic values.

Table 1 Additive linear regression models with coefficients for all of the variables, and the model summary.

To demonstrate the distribution of particle production according to binned levels of frequency and vocal intensity, diagonal cell data of the dependent measurements, particle/s, are plotted in Fig. 6. The groups G1 to G5 correspond to data along the diagonals of the subcategory space created by subdividing both the intensity and frequency levels into categories distributed by 20th percentile increments. G1 corresponds to the lowest frequency and intensity percentile level, with G5 corresponding to the highest percentile of both. Density plots of the groups were constructed to show the distribution of the segregated particle production rates. As can be seen, all kernel density plots are skewed to the right, with the first three groups G1, G2, and G3, having less dispersion compared to the last two groups G4, and G5, respectively. Moreover, G3, and G5 both have clear bimodal distributions for particle production when compared with the others. The increased dispersion in particle production with increasing frequency and vocal intensity is profound, with the standard deviation increasing from 4.1 particle/s for G1 to 21.2 and 17.1 particle/s for G4 and G5, respectively.

Fig. 6: Density plots of particle production rates for categories of frequency and vocal intensity (ARMS) subdivided by 20th percentiles.
figure 6

The subplots span the diagonal of increasing frequency and intensity, indicated by increasing values from G1 (lowest subcategory of frequency and intensity) to G5 (highest subcategory of frequency and intensity).

Discussion

For the 5 s acquisition periods over which measurements were recorded, production rates were found to vary over a wide range between 0.2 and 142 particles/s. These values are similar to those reported in some prior works [11, 12], where particle production rates during phonation were reported to be in the range of O(1–100) particles/s. In contrast, other studies of respiratory emissions during phonation reported production rates that are approximately one order of magnitude higher O(100–1000) particles/s [10, 19]. Interestingly, the studies reporting lower values, including the work contained herein, performed the measurements inside a laminar flow hood, which aims to reduce sample contamination from ambient/background particles. Prior studies reporting higher values did not utilize a laminar flow hood. The specific reason for these large discrepancies, however, remains unresolved, and warrants further investigation.

The wide range of observed particle production rates lends credence to the notion of “superemitters”, who produce particles at a far greater rate than the normal population. Although, it appears from the current work that the prevalence of superemitters is relatively low. The newly proposed statistical definition resulted in the identification of one of 40 participants that was classified as a superemitter. Care should be taken, however, in the usage of this term as it does not provide insight into the infectivity of emissions, which is an important consideration in airborne transport of infectious diseases, and which can cause confusion when used interchangeably with the term “superspreader”. The definition of superemitter that is proposed herein refers only to the quantity of respiratory particle emissions and does not consider infectivity.

Both vocal intensity and frequency were found to positively correlate with particle production during phonation. The higher correlation value for vocal intensity indicates its influence is greater than that of phonation frequency. The observation that particle production increases with increasing phonation frequency lends credence to the hypothesis that liquid capillary bridges form and rupture during voiced speech [41]. Higher phonation frequency also increases the surface velocity of the vocal folds, which may lead to increased instability and subsequent breakup of mucosalivary bridges [41, 42], thereby further increasing the rate of droplet production. Additional mechanisms of particle production during vocal fold oscillation may also exist, such as spume droplet formation [43], which arises due to shear forces acting on a fluid layer. This occurrence is most probable at the onset of vocal fold opening when the airflow velocity passing over the mucosalivary coating of the vocal folds is very high (O(30 m/s)). Increased vocal intensity, which is characterized by increased airflow velocity through the vocal folds, would similarly drive increased spume droplet formation. While these theories support the finding that increased frequency and vocal intensity positively correlate with droplet production, it is emphasized that the additive regression model, which considered frequency, vocal intensity, and individual subject, resulted in a relatively low goodness of fit with adjusted R2 = 0.40. This suggests that factors other than phonation frequency and vocal intensity are responsible for the wide range of respiratory particle production rates observed in the general population. This is an important finding as it emphasizes the need for continued research efforts to elucidate these additional etiologies. Possible explanations could include hydration, body-mass index, laryngeal biomechanics, and morphometry, etc.

To this end, density plots of the particle production rate also revealed that simultaneously increasing frequency and vocal intensity not only increased mean particle production rates, but also the standard deviation. The data also transitioned to a bimodal distribution. No clear pattern with statistical significance was observed when these behaviors were differentiated by sex, as might be reasonably expected because females can more easily reach high notes versus males who may struggle to do so and subsequently transition to a falsetto register. Nevertheless, the emergence of a bimodal distribution seems to indicate that changes in laryngeal biomechanics that occur when increasing vocal frequency and vocal intensity in tandem may also influence particle production rates. Clearly, further investigations are warranted.

Finally, it should be stressed that the aim of this study was limited to sustained phonation of a single vowel /a/, which is a very simplified representation of the complex range of sounds produced during speech. While prior studies show similar particle size distributions between sustained vowel production and running speech [10, 11, 19], care should be taken in extrapolating the findings herein beyond the stated scope.

Conclusions

In this experimental study the effect of vocal intensity and frequency of phonation on respiratory particle emissions during sustained phonation of the vowel /a/ was investigated in 40 healthy individuals ranging in age from 20–60 years. A broad range of particle production rates was observed, and a statistical measure for defining superemitters was introduced as being greater than the third quartile (Q3) plus 1.5 times the inner quartile (Q3–Q1). This criterion identified the presence of one superemitter within the sample size of 40 individuals.

The experimental data confirmed prior findings that particle production rates increase with vocal intensity while finding that it also increases as a function of phonation frequency (i.e., pitch). An additive regression model applied to the expansive data set revealed that relatively poor goodness of fit was found when accounting for these two variables. Subsequent analysis of the distribution of particle production rates as a function of increasing frequency and vocal intensity similarly indicated that there are likely other unexplored variables that are responsible for the wide range of particle production rates that are observed in respiratory emissions during vocalization.