How much COVID-19 face protections influence speech intelligibility in classrooms?

The ongoing pandemic caused by the COVID-19 virus is challenging many aspects of daily life. Several personal protective devices have become essential in our lives. Face protections are mostly used in order to stop the air aerosol coming out of our mouths. Nevertheless, this fact may also have a negative effect on speech transmission both in outdoor and indoor spaces. After a severe lockdown, classes have now started again. The adoption of face protection by teachers is either recommended or mandatory even though this is affecting speech intelligibility and thus students’ comprehension. This study aims to understand how protections may affect the speech transmission in classrooms and how this could be influenced by the several typologies of face protections. An experimental campaign was conducted in a classroom in two different reverberant conditions, measuring and comparing the variation in speech transmission and sound pressure level at different receiver positions. Furthermore, a microphone array was used to investigate the distribution of the indoor sound field, depending on the sound source. Results clearly show how different types of personal protection equipment do affect speech transmission and sound pressure level especially at mid-high frequency and that the source emission lobes vary when wearing certain types of personal devices.


Introduction
At present, the world is dealing with the SARS-CoV-2 (COVID-19) pandemic. Several precautions were taken to limit the spread of the virus. Social distancing and lockdowns were imposed in all countries, leading to a decrease in contagious behaviour and situations. When these measures were relaxed, governments required individuals to use protection, at least in indoor environments. Thus, face masks, shields and respirators were adopted everywhere, becoming a necessary element of everybody's life. Personal facial protections were already widely used in medical care and in hospitals to protect healthcare workers from virus infections carried by aerosol particles [1]. They have also been shown to be efficient personal devices that slow down the spread of the coronavirus [2]. However, there are several different types of protective equipment available on the market: (i) surgical masks, (ii) cloth or community masks [3] or respirators (also known as Filtering Face Piece) and (iii) face shields [4]. This variety of protections not only has an impact on the filtering of aerosol particles [5], but also on communication capabilities. Moreover, face masks are a visible obstacle in body language, impacting: (i) verbal communication [6], (ii) people's emotional expressivity and signs of understanding [7], (iii) lip reading [8], which is a very useful aid in ordinary communication and extremely useful for people with hearing loss, and finally (iv) voice [9].
For the above-mentioned reasons, it is therefore essential to study how face protections may influence noise source in the built learning environment. The adoption of these devices is certainly needed for the pandemic [10], but they could negatively influence communication. Indeed, several studies have already demonstrated the impact of these devices on speech. Aranaz Andrés et al. [11], in their study on the secondary effects of face masks, highlighted the difficulties related to speech understanding, when speakers are wearing face protection. Hulzen and Fabry [8] investigated their effects on COVID-19 infection protection and the discomfort caused to individuals with hearing loss due to the use of face masks. Accordingly, face protections pose two problems for these people: they cannot receive any clues from lip reading [12] and furthermore, other people's voices may be attenuated or in some way distorted.
Many available studies focus on communication difficulties in healthcare environments. Wittum et al. [13] investigated the side-effects of wearing surgical masks in hospitals. Goldin et al. [14] applied masks to a simulator emitting white noise through an artificial mouth. They compared the difference in sound pressure levels between all the masks offered by their healthcare providers, highlighting that each type of device essentially reduced the speech transmission in this environment. Other studies have demonstrated the impact of face masks on communication and on speech. Corey et al. [15] measured the sound reduction caused by several masks, focusing on sound pressure levels in a laboratory with sound-absorbing walls. Palmiero et al. [16] instead studied the Speech Transmission Index of three masks to understand how these may compromise sound transmission.
During the pandemic, in several countries the use of masks inside educational environments has either become mandatory for teachers and/or for students [17,18], or recommended [19], in order to protect teachers, students and their families [20].
From this perspective, the investigation of how these protections may influence learning and communication performance assumes a key role. There are various types of educational environments. In some countries there are standards for acoustic conditions in the classes that have thresholds for reverberation times, noise levels and sound pressure levels at receivers' positions [21]. However, generally, the classroom can have a high reverberation time (reverberant conditions) [22] or may feature sound absorbing material to reduce the reverberation [23,24].
Moreover, the educational environment is strongly influenced by the indoor acoustic conditions [25]. Indeed, in classrooms more than one factor is generally taken into account: (i) background noise [26], (ii) acoustic field [27], (iii) effects of noise on the speaker/teacher [28] and (iv) effects on the listener/students [29]. Skarlatos and Manatakis [30] found that reverberation time is one of the main causes of high noise levels in classrooms. As also mentioned by Hodgson et al. [31], this parameter is of paramount importance and it should be taken into account when characterizing indoor spaces. It refers to the reflection of sound waves [32] and it has been shown to have a strong impact on classroom soundscape [33] and on students' achievements [34]. Indeed, control of the indoor sound field improves pupils' speech comprehension [35] and it has been shown that a good acoustic environment helps to increase voice and word understanding [36]. Generally, in order to assess the acoustic quality of a room, specific objective parameters are used, coupled with reverberation time. When focusing on rooms dedicated to speech, clarity is one of the most utilized parameters [37]. Clarity considers reverberation time and acoustic strength as a single index and thus it is very useful in determining how the indoor sound field varies when moving from one student's desk to another [38]. However, these parameters are not affected by possible source signal amplitude variations. Thus, they can effectively assess the indoor sound field, but they cannot be used to investigate source-receiver sound wave transmission.
Accordingly, Zannin and Zwirtis [39] measured the sound pressure levels in three Brazilian classes and found that this parameter is not only related to the quality of verbal communication, but also to the listening effort, which can impact students' activities. Astolfi et al. [40] highlighted how (i) sound pressure levels measured inside the classroom, (ii) reverberation time and (iii) numerical models should be integrated to obtain a systematic procedure for approaching this complex indoor environment. Within the learning environment, Prodi et al. [41] found that the Speech Transmission Index (STI) is also a relevant parameter, strongly related to clarity and to the sound pressure levels measured at the receivers' positions. Seetha et al. [42] reported that the STI measurement coupled with the analysis of the sound pressure levels is enough to give a complete overview of the sound field in a classroom. Peng et al. studied STI in 28 Chinese schools, finding that the speech intelligibility score is a very important parameter for all ages. Radosz [43] illustrated a method to assess the acoustic quality of classrooms that includes all the factors mentioned above in a single global index.
The importance of the students'/listeners' distance from the source/teacher has also been assessed. Bilzi et al. [44] highlighted how in classrooms STI could significantly vary from one position to another, while Stewart and Cabrera [45] found that the directivity of the source strongly influences the STI results in noise-free conditions. Thus, reverberation time, clarity and STI are the parameters most often used to characterize the indoor sound field, while STI and sound pressure levels at the receivers' positions are needed to investigate the quality of the transmitted speech.
Among all these medical and acoustic studies, there is a lack of focus on the impact of the face protections within educational environments. Students in classrooms are careful listeners [46,47]. Wearing masks may impact the communication between teachers and students, affecting linguistic and non-verbal information [48]. Transparent masks or shields may represent one solution as they may help by making lip reading possible and helping to overcome speech degradation [49]. However, to the authors' knowledge, no studies have been conducted on these issues.
For these reasons, the aim of this research is to understand to what extent face protections influence (i) speech intelligibility, (ii) indoor sound pressure level distribution and (iii) source directivity within a classroom. Another aim of this research is to understand if the impact of these devices varies, according to diverse indoor sound field conditions.

Materials and methods
This study was conducted in the Free University of Bozen living lab, which provides reconfigurable educational environments that are used for everyday activities. The facility includes a classroom, wherein it is possible to change indoor sound absorption and thus to study diverse acoustic conditions. Two different scenarios were built: Scenario A (Higher reverberant room), where a reverberant classroom was recreated, emulating a traditional school environment; Scenario B (Lower reverberant room), where 16-square-meter sound absorbing panels were installed on the room ceiling. Their sound absorption coefficient is reported in Fig. 1. Panels were composed of 5 cm polyester fibre panels with a porosity of 0.99. The air gap between them and the room ceiling is 10 cm.
Fig. 1. Sound absorption coefficient of the removable panels.

Indoor sound field acoustic characterizations were performed in accordance with the ISO 3382 standard [50][51][52] using an omnidirectional (dodecahedral) sound source and a logarithmic sine sweep as a signal, as depicted in Fig. 2. One excitation per receiver position was played, recorded and then post-processed by means of a convolution, in order to obtain the impulse response. The source position was chosen very close to the teacher's desk. The positions of the receivers were uniformly distributed along the existing rows of student desks. Reverberation time T30 and clarity C50 were derived from measurements and used to characterize the different scenarios. In order to understand how face protections may influence the sound source (teachers), speech intelligibility tests were carried out using a repeatable and robust procedure so as to ensure reliable results. In particular, a constant and directive sound source is needed to ensure comparable values. This procedure determines the Speech Transmission Index (STI). It assesses the quality of the received speech in different student positions and it is normally used to determine whether a classroom is suitable for teaching and learning or not. STI evaluation is based on IEC 60268-16 [53], which provides gender-specific octave band weighting and redundancy factors. Gender-related factors are expressed differently due to signal spectra and different weighting factors.
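The two room descriptors named above, T30 and C50, can both be derived from a measured impulse response. The following is a minimal Python sketch of that step, not the processing chain actually used in this study: it assumes a broadband impulse response stored as a NumPy array, applies Schroeder backward integration with a linear fit between -5 dB and -35 dB for T30, and uses the 50 ms early-to-late energy ratio for C50. The function names and the synthetic decaying-noise test signal are illustrative assumptions, and no octave-band filtering is applied.

```python
import numpy as np


def clarity_c50(ir, fs):
    """C50 in dB: ratio of early (0-50 ms) to late energy of an impulse response."""
    energy = np.asarray(ir, dtype=float) ** 2
    n50 = int(round(0.050 * fs))  # number of samples in the first 50 ms
    return 10.0 * np.log10(energy[:n50].sum() / energy[n50:].sum())


def reverberation_t30(ir, fs):
    """T30 in seconds from the Schroeder decay curve, fitted from -5 dB to -35 dB."""
    energy = np.asarray(ir, dtype=float) ** 2
    edc = np.cumsum(energy[::-1])[::-1]          # backward-integrated energy
    edc_db = 10.0 * np.log10(edc / edc[0])       # normalised decay curve in dB
    t = np.arange(len(energy)) / fs
    fit = (edc_db <= -5.0) & (edc_db >= -35.0)   # evaluation range for T30
    slope, _ = np.polyfit(t[fit], edc_db[fit], 1)
    return -60.0 / slope                         # extrapolate the slope to 60 dB


def synthetic_ir(rt60, fs, duration=2.0, seed=0):
    """Hypothetical test signal: white noise whose energy decays 60 dB in rt60 s."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(duration * fs)) / fs
    envelope = 10.0 ** (-3.0 * t / rt60)         # amplitude envelope (-60 dB at rt60)
    return rng.standard_normal(t.size) * envelope


if __name__ == "__main__":
    fs = 16000
    for rt60 in (1.0, 0.5):                      # illustrative values only
        ir = synthetic_ir(rt60, fs)
        print(rt60, reverberation_t30(ir, fs), clarity_c50(ir, fs))
```

With such idealised exponential decays, a shorter reverberation time goes together with a higher C50, which is the qualitative relation between the two descriptors discussed for the two scenarios.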
The STI measurement is strongly influenced by several factors related to the acoustic conditions of the classroom [54]: the reverberation time, background noise, clarity and sound pressure levels. Evaluations connected to speech were assessed using speech intelligibility tests [19], developed using a directive MLS-equipped noise source, positioned on the teacher's desk and receivers located at the students' positions [55]. Unlike clarity and reverberation time, which depend exclusively on (i) the classroom volume, (ii) the internal reverberation time and (iii) the source-receiver position, the STI also relies on the sound source frequency level. If this is altered, the final result varies.
Background noise was regularly checked so that it would not disturb or influence the acoustic tests. In this way, especially for STI measurements, it was possible to consider the uncertainty to be lower than a JND [56]. Since in these tests the face protections are not attached to real mouths, special accessories were fitted to the surface of the directive speaker in order to keep the masks' membrane at a distance of about 4 mm to 7 mm from the speaker, simulating the profile of a mouth and nose (Fig. 3).
Protections were then mounted on the directive speaker, using this configuration. In order to analyze as many options as possible within the two described scenarios A and B, ten kinds of individual face protections were tested. They were mounted on the directive sound source, covering the noise emission so as to simulate real mouth and nose overlapping. The devices used are listed in Table 1 and examples of applications are reported in Fig. 4. Complete identification of the face masks under consideration and transparent shields are reported in the Appendix tables (see Fig. 5).
Tests were performed using a B&K 4720 directive noise source and a B&K 2270 sound level meter, connected to a sound card controlled by B&K Dirac 6.0 software. The receiving points were placed at 1.4 m from the ground in six representative positions, according to the class layout (Fig. 6). The room is box-shaped, 7.3 m × 7.6 m × 3.6 m, for a volume of 196 m³. The source was placed at 1.5 m from the ground and near the teacher's desk. Consistent with COVID guidelines, the source (teacher) and the positions of the receivers (students) are not expected to change [58]. For every scenario, firstly a traditional STI measurement was performed, i.e. without any face protection (case 0), as a starting point for the comparison. For the sake of simplicity, only frequency average results will be presented. These are calculated by taking into consideration the octave band measured values at 500 Hz, 1000 Hz and 2000 Hz. An arithmetic average is then performed, in order to obtain an average STI result.
Reverberation time, clarity and speech transmission index variations are evaluated using the Just Noticeable Difference (JND) approach [50]. Seraphim [59] stated that a reverberation time variation is perceived when it is about 5%. Karjalainen and Järveläinen [60] reported that a 10% variation is more accurate than 5%. For this reason, in this paper a variation of 10% is used to consider a reverberation time variation to be significant. For clarity and STI, variations of 1 dB and 0.03 respectively are taken as the minimum JNDs, according to Bradley et al. [61]. In order to better understand indoor sound field variations, synchronous sound pressure level measurements were carried out by means of omnidirectional microphones (model ECM 999, ½") hung from the ceiling and controlled by a Zoom F8 sound card. The positions are depicted in Fig. 6. They were located at a height of 2 m (Fig. 2).
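The band averaging and the JND thresholds described above can be written down in a few lines. The Python sketch below is only an illustration: the thresholds (10% for reverberation time, 1 dB for clarity, 0.03 for STI) and the 500 Hz, 1000 Hz and 2000 Hz bands come from the text, while the function names, the choice of counting JNDs as a simple ratio, the use of the reference value as the base of the 10% step, and all numerical inputs are assumptions made for the example.

```python
# JND thresholds as stated in the text
JND_STI = 0.03        # absolute STI step
JND_C50_DB = 1.0      # absolute clarity step in dB
JND_RT_REL = 0.10     # relative reverberation time step (10 %)


def band_average(values_by_band, bands=(500, 1000, 2000)):
    """Arithmetic mean of the octave-band values at 500, 1000 and 2000 Hz."""
    return sum(values_by_band[b] for b in bands) / len(bands)


def jnd_absolute(reference, value, jnd):
    """Number of JNDs between two values for a parameter with a fixed JND step."""
    return abs(value - reference) / jnd


def jnd_relative(reference, value, rel_jnd):
    """Number of JNDs when the step is a fraction of the reference (assumed base)."""
    return abs(value - reference) / (rel_jnd * reference)


if __name__ == "__main__":
    # Hypothetical octave-band values, not measured data
    sti_no_device = band_average({500: 0.58, 1000: 0.60, 2000: 0.62})
    sti_with_mask = band_average({500: 0.55, 1000: 0.54, 2000: 0.50})
    print(jnd_absolute(sti_no_device, sti_with_mask, JND_STI))
    print(jnd_relative(1.10, 0.72, JND_RT_REL))   # reverberation time in s
    print(jnd_absolute(-0.5, 2.5, JND_C50_DB))    # clarity in dB
```

In practice the resulting ratio would be rounded or truncated to a whole number of JNDs; the text does not state which convention was used, so the sketch leaves the value unrounded.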
Furthermore, since source directivity may be affected by mouth obstruction, acoustic camera high frequency analyses were carried out using a planar 40 cm × 40 cm microphone array, equipped with 40 MEMS noise sensors coupled with an A/D converter. Every device features a linear frequency response from 60 Hz to 15000 Hz. Software analysis was performed using a Robust Asymptotic Functional Beamforming algorithm, providing high quality 5 megapixel real-time images and videos. The acoustic camera was situated in receiver 4 position (Fig. 6), in order to better monitor the phenomenon.

Results and discussion
In terms of the acoustic characterization of the room scenarios, Tables 2 and 3 report reverberation time and clarity for mid-high frequencies (500 Hz -2000 Hz) and STI results for case 0 (no device).
It can be seen that in Scenario A (higher reverberant room), most of the parameters present values which are usually not conducive to a good indoor learning environment [62] at most considered frequencies. Accordingly, reverberation time is over 1 s in all frequency bands in all positions, while clarity results are close to zero or below in almost all cases. As for the STI results, it is clear that Scenario A provides poor speech intelligibility [63] in all positions, regardless of the teacher's gender, except in position 1 (very close to the source) and in position 3 (in front of the source).
In terms of Scenario B (lower reverberant room), it can be seen that indoor sound field quality greatly improved. Reverberation time is always lower than 0.75 s, highlighting a variation of up to 4 JNDs. Clarity is always higher than 2 dB, except in position 2 and 4 at 500 Hz, where, however, the values are always higher than zero. This shows a positive variation up to 4 JNDs. The value of the STI also improved, gaining up to 3 JND per position per device type.
As an overall result, it is possible to state that the indoor sound field significantly varied its conditions and thus STI measurements using face masks in scenario A and scenario B start from clearly different indoor conditions.

STI results
The results of STI measurements performed using individual protection were grouped based on the receiver position and catalogued based on the wearable protective equipment. For the sake of brevity, all tables are reported in appendix A. As a reference, the case without any face protections (case 0, no device) was used, which as expected presents the best STI values. As an overall result (Table 4), it can be highlighted that many individual protections highly limit sound wave propagation. Considering the maximum JND results, up to 2 JNDs were found in all positions. Maximum values show that the male STI varies by 2 to 3 JNDs in scenario A and by 2 to 5 JNDs in scenario B. The female STI variations are lower: 1 to 2 JNDs in scenario A and 1 to 3 JNDs in scenario B.
In Fig. 7, a representative background noise level is reported. This was measured during the test. As it can be seen, levels are very low in every frequency. Since the noise source level has to be 60 dB (A) at 1 m from the noise source, it can be concluded that background noise cannot influence the final results.
Another general outcome is that differences are higher in scenario B (lower reverberant room), rather than scenario A (higher reverberant room). It is also evident how the STI male in both cases suffers more compared to the female STI. Positions 3 and 6 are where maximum variation are verified for both scenarios, followed by Position 2 and Position 1, which have a similar variation of STI in both scenarios. In addition, as expected, the face shield coupled with the community mask featuring a paper sheet is the worst combination, followed by the community mask featuring a paper filter (ID 4b). The maximum JNDs assessed are mostly referable to the face shield coupled with the community mask (ID 10), followed by the community mask with a paper filter (ID 4b), NCN MR2 mask (ID 7), the face shield (ID 8) and the FFP2 safety mask (ID 6). This is attributable to the fact that they are all medical or high efficiency protection devices.
Referring to minimum differences in JNDs, it can be seen that surgical masks (ID 1) are the best masks, followed by kids' masks (ID 5). Values show that very few JNDs are caused by surgical masks in most receivers' positions in both scenarios and for both male and female sources.

Sound pressure level results
In order to understand how the indoor acoustic field is affected by source protection, sound pressure level measurements were taken. For the sake of brevity, results for the receivers 2, 3 and 6 are shown in Figs. 8 and 9, reporting on the x-axis the difference between the reference case (ID 0, no face protection) and the investigated device (e.g. ID 0 minus ID 1). In order to present an overview of both the least and the most impacting face protection, the results are listed in Table 5. In this table, the sum of the sound pressure levels at voice frequency (630-5000 Hz) Lp and the difference D between the reference case (ID 0, no face protection) and each device are reported. It is evident that surgical masks, even those featuring a transparent window, do not significantly modify sound pressure levels at the receivers. Other types of masks provide significant deviations of up to 2.7 dB. Face shields clearly alter the sound wave transmission. Negative differences mean that visors reflect a significant part of the sound waves to the upper part of the room, thus considerably affecting the source-receiver propagation path.

Fig. 8. SPL relative differences frequency trends for face protections which least affect indoor sound field. Positions 2, 3 and 6.

Table 4. Maximum JNDs and corresponding device for each receiver position.

Position   Scenario A                                  Scenario B
           STI male max JNDs   STI female max JNDs     STI male max JNDs   STI female max JNDs
1          3 (ID 10)           1 (ID 10)               4 (ID 10)           3 (ID 10)
2          3 (ID 6)            2 (ID 6)                4 (ID 10)           2 (ID 10)
3          3 (ID 10)           2 (ID 17)               5 (ID 10)           3 (ID 10)
4          2 (ID 10)           1 (ID 10)               4 (ID 8)            2 (ID 8)
5          2 (ID 4b)           1 (ID 4b)               2 (ID 4b)           1 (ID 4b)
6          3 (ID 7)            2 (ID 7)                4 (ID 4b)           3 (ID 4b)
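The voice-band level Lp and the difference D described for Table 5 can be illustrated with a short Python sketch. The text only states that the sound pressure levels between 630 Hz and 5000 Hz are summed, so two assumptions are made here and should be read as such: the spectrum is taken in one-third octave bands, and the sum is energetic (levels are converted to energy, added and converted back to dB). The function names and the spectra in the example are hypothetical.

```python
import math

# One-third octave band centre frequencies from 630 Hz to 5000 Hz (assumed resolution)
VOICE_BANDS_HZ = (630, 800, 1000, 1250, 1600, 2000, 2500, 3150, 4000, 5000)


def energetic_sum_db(levels_db):
    """Energetic (power) sum of band levels given in dB."""
    return 10.0 * math.log10(sum(10.0 ** (level / 10.0) for level in levels_db))


def voice_band_level(spectrum_db):
    """Overall level Lp over the 630-5000 Hz bands of a {frequency: level} spectrum."""
    return energetic_sum_db(spectrum_db[f] for f in VOICE_BANDS_HZ)


def device_difference(reference_db, device_db):
    """Difference D: reference case (no protection) minus the case with a device."""
    return voice_band_level(reference_db) - voice_band_level(device_db)


if __name__ == "__main__":
    # Hypothetical spectra: flat reference and a device attenuating 2000-4000 Hz
    reference = {f: 50.0 for f in VOICE_BANDS_HZ}
    device = dict(reference)
    for f in (2000, 2500, 3150, 4000):
        device[f] -= 8.0
    print(voice_band_level(reference), device_difference(reference, device))
```

A large attenuation confined to a few bands produces a much smaller change in the overall sum, which is one reason why the band-by-band trends of Figs. 8 and 9 are discussed separately from the single-number values.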
In the frequency domain, some results offer interesting insights. In Figs. 8 and 9, the results of respectively the least and the most influencing devices are reported. It can be seen that all individual protections provide differences at mid-high frequencies, and volcano-shaped trends at 2000 Hz can easily be identified in both scenarios. In the cases reported in Fig. 8, levels can vary from 5 to 10 dB in both scenarios, while the ones included in Fig. 9 fluctuate from 8 dB to 11 dB in scenario A and from 9 dB to 14 dB in scenario B. Thus, these protections may modify the original emitted signal in a range where human voices are very active. A reduction of up to 10 dB in both reverberant and absorbing rooms was found, thus implying that sound pressure reductions do not depend on indoor sound field conditions.
At higher frequencies, some differences are still present. They linearly decrease up to 4000 Hz, where they stop and in some cases start to slightly increase again. Accordingly, both scenarios present the same overall trends. The face shields (ID 8 and ID 10) offer a positive effect at low and high frequency, but when a shield is coupled with a mask, it negatively influences sound wave propagation only at high frequency.
In this light, it is interesting to present tridimensional relative difference trends of sound pressure levels related to the single most influencing face protection in order to understand if there are some significant differences among the diverse devices. Fig. 10 shows scenario A, while Fig. 11 shows scenario B. For the first scenario, it can be deduced that all face protections provide similar 3D distribution. Differences are present in all receiver positions and always show significant values, starting from the closest to the most distant positions. Specifically, it can be highlighted that position 6, which is also the furthest from the noise source, always shows the highest value. Conversely, locations 1, 2 and 3 very often provide the same results. As an overall result, we can deduce that the majority of the face protections, which clearly influence sound propagation in a reverberant classroom, will provide almost the same sound field variation.

In the case of a sound absorbing environment, differences are always distributed using the same pattern. The worst location (except mask ID 10) is position 3, where the highest difference can be found. In the first row of desks, values are often the lowest ones and in position 3 results are the lowest for mask ID 4b and 6. Furthermore, in scenario B, we can conclude that all face protections, which regularly affect noise propagation, present common sound field differences.

Fig. 9. SPL relative differences frequency trends for face protections which most affect indoor sound field. Positions 2, 3 and 6.

Table 5. Sound pressure level of voice frequency (630-5000 Hz) Lp and relative case difference D for positions 2, 3 and 6. Grey: minimum differences; plain: medium differences; bold: maximum differences.
In terms of sound pressure level, as a global result it can be determined that differences are significantly high in both scenarios and that they fall within the same ranges. This implies that there is no significant difference provided by the indoor environment conditions, and that the sound limitations are intrinsically related to the face mask. The asymmetry could be caused by the distribution of the sound absorbing panels on the ceiling. Another paramount result is that face protections mostly act in a difference range of 8-14 dB at 2000 Hz. This means that 8 dB of source reduction is to be expected even if a receiver (student) stands close to the sound source (teacher), regardless of the position and of the indoor acoustic conditions. Furthermore, the last row of receivers, which is the furthest away from the source (teacher), presents the worst difference values, up to 14 dB. This means that wearing one of these face protections may reduce the sound wave emitted by the speaker by up to 14 dB.

Acoustic camera results
Acoustic camera results are reported in Fig. 12 for scenario A and in Fig. 13 for scenario B, both for 2000 Hz, for the sake of brevity. When comparing the two different indoor acoustic fields, as an overall result it is evident that in a more reverberant room the source emission presents a wider emission lobe. The application of face masks modifies the received sound pressure levels but does not significantly alter the lobes' dimensions. On the other hand, face shields do change emission patterns, splitting them into two distinct parts. This effect is caused by the sudden reflection of the direct sound wave against the polymeric shield. However, these frequencies are related to the human voice. If teachers wore a face shield, the direct sound would represent a minor contribution, compared to the reflected sound. This may not be perceived as a comfortable situation by the receivers, since they would be able to see the teacher at a definite point in the indoor space but hear his/her voice coming from other directions.

Overall considerations
Masks negatively affect voice propagation in classrooms, regardless of the indoor acoustic conditions, but with differences linked to characteristic male and female emissions. Furthermore, the results clearly identify how face shields split the emitted noise into two different directions, thus compromising the direct propagation.
The results above (Table 4) demonstrate that wearing masks influences the male voice more than the female voice. Accordingly, more JNDs are found for the male voice than for the female voice in both scenarios. This depends on how the face masks behave when a sound wave propagates through them. A membrane-like model most appropriately explains the above results, with the mask stretched from nose to mouth. In this case, a sound absorbing effect can be predicted at the resonance frequency of the system, in accordance with Eq. (1), where the membrane is the mask, the solid is the mouth and the typical weight range is about 2-8 g. Masks are worn differently and this causes differences in the membrane free area, which are however balanced by mass variations. In this calculation, it is also difficult to define a perfect distance from the mouth as well as the free vibrating areas related to single masks, due to the morphological diversity in human faces. However, ranges may be determined based on real mask weights and average facial differences, retrieved from the literature [64][65][66]. Thus, a range starting from about 1600 Hz to about 2200 Hz can be assessed. This is clearly consistent with what is shown in Fig. 9, where the major difference peak starts to rise at 1600 Hz, presents its maximum at 2000 Hz and then decreases. The noise limitation effect observed at higher frequency ranges could be caused by sound absorption of the textiles (masks) or transmission loss produced by transparent polymer (face shields). In this light, the morphology, shape and components of the face protections act together to limit the sound wave generated by the source.

When analyzing the male-female voice difference, it may be helpful to consider the A-weighted reference speech levels used to calculate STI male and STI female. In Fig. 14, the STI male and female weighting trends (expressed in percentage of reduction) are reported and combined with the range where the face protections mostly act.
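The membrane-like behaviour discussed above can be explored numerically. The Python sketch below does not reproduce the paper's Eq. (1), which is not available here; it uses the textbook resonance of a limp membrane backed by an air gap in front of a rigid surface, f0 = (1/2π)·sqrt(ρ0·c² / (m''·d)), as an assumed stand-in. The mask weight range (2-8 g) and the 4-7 mm spacing are taken from the text, whereas the free vibrating area, the air properties and the function name are assumptions chosen for illustration.

```python
import math

RHO_AIR = 1.204   # air density in kg/m^3 at about 20 °C (assumed)
C_AIR = 343.0     # speed of sound in m/s at about 20 °C (assumed)


def membrane_resonance_hz(mass_g, free_area_m2, gap_m):
    """Resonance of a limp membrane of given mass and area over an air gap.

    Textbook membrane-absorber relation, used here as an assumption:
    f0 = (1 / (2*pi)) * sqrt(rho0 * c^2 / (m_surface * d)).
    """
    surface_mass = (mass_g / 1000.0) / free_area_m2   # kg/m^2
    stiffness = RHO_AIR * C_AIR ** 2 / gap_m          # air-spring term
    return math.sqrt(stiffness / surface_mass) / (2.0 * math.pi)


if __name__ == "__main__":
    # Hypothetical free vibrating area of 0.015 m^2; mass and gap within the text's ranges
    for mass_g, gap_mm in ((3.0, 4.0), (4.0, 5.0), (4.0, 7.0)):
        f0 = membrane_resonance_hz(mass_g, 0.015, gap_mm / 1000.0)
        print(f"{mass_g} g, {gap_mm} mm gap: {f0:.0f} Hz")
```

Under these assumptions, plausible combinations of mass, area and gap place the resonance in the region of one to a few kilohertz, which is the order of magnitude of the 1600-2200 Hz range quoted in the text. The result is sensitive to the assumed free area, so the sketch should be read as a plausibility check rather than a prediction for any specific mask.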
It is evident how face protections act in a range where the female voice presents a higher percentage reduction than the male voice, with a difference of around 5% (3 dB in the standard). Thus, the STI female is influenced less by wearing face protections than the male STI, because the weighting penalizes the female voices more, where the masks provide the most efficient sound reduction.

Conclusions
In this study, the influence of face protections on sound propagation in classrooms was experimentally investigated. Two different scenarios were considered, featuring very different indoor sound fields (more reverberant, less reverberant). In both scenarios, 10 different masks and face shields were tested. The Speech Transmission Index, sound pressure levels and emission lobes were assessed.
The main findings are summarised as follows:

4) The Speech Transmission Index for male and female voices presents different reduction patterns. In particular, female speech is less influenced by face protection compared to male speech. This is due to the membrane resonance natural frequency and textile sound absorption.
5) The most affected frequencies lie in a range which starts from 1600 Hz and ends at 6300 Hz. The most affected octave band is 2000 Hz for all face protections. At this frequency, a minimum of 8 dB and a maximum of 14 dB reduction was recorded, depending on the type of device worn and the chosen position in the classroom.
6) All of the most influencing face protections present similar 3D reduction patterns along the indoor acoustic field in both scenarios, but with different values. It can thus be concluded that the indoor sound field will be affected in the same way in most of the cases (different face protection), but with some differences in noise reduction.
7) Source emission and directivity are affected by face protection, especially by face shields. In this case, mostly at high frequencies, the emission lobe is always split into two parts in both scenarios. This negatively affects sound perception, as the source is not clearly identified in the space.
Finally, it can be concluded that in classrooms surgical masks should be adopted as they protect from mouth spray and do not significantly reduce speech intelligibility and indoor sound field, both without and with a transparent window for lip reading. Other types of tested facial masks significantly influence speech transmissions and thus teacher-student communication in learning environments.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.