One year of surgical mask testing at the University of Bologna labs: Lessons learned from data analysis

The outbreak of the SARS-CoV-2 pandemic highlighted the worldwide lack of surgical masks and personal protective equipment, which represent the main defense available against respiratory diseases such as COVID-19. At the time, the mask shortage was dramatic in Italy, the first European country seriously hit by the pandemic. To address the emergency and to support the Italian industrial reconversion to the production of surgical masks, a multidisciplinary team of the University of Bologna organized a laboratory to test surgical masks according to European regulations. The group, driven by the expertise of chemical engineers, microbiologists, and occupational physicians, set up the test lines to perform all the functional tests required. The laboratory started its activity in late March 2020, and by the end of December of the same year 435 surgical mask prototypes had been tested, with only 42 masks compliant with the European standard. From the analysis of the materials used, as well as of the production methods, it was found that a compliant surgical mask is most likely composed of three layers: a central meltblown filtration layer and two external spunbond comfort layers. An increase in the material thickness (grammage), or in the number of layers, does not improve the filtration efficiency but leads to poor breathability, indicating that filtration does not depend on pure size exclusion alone and that other mechanisms, driven by electrostatic charge, are taking place. The study critically reviewed the European standard procedures and identified their weak aspects; among others, the control of the aerosol droplet size during the bacterial filtration test proves to be crucial, since it can change the classification of a mask when its performance lies near the limiting values of 95 or 98%.


Introduction
SARS-CoV-2 appeared in Italy in early February 2020, and the outbreak then spread rapidly, although at different times and at different rates in each Italian region. The Italian government enacted measures to contain the COVID-19 epidemic in March 2020 [1]. The National Health system soon reached its limits, and intensive care units (ICUs) were soon overwhelmed. In that scenario, the lack of personal protective equipment (PPE) and the shortage of facial masks for healthcare workers and for the general population were dramatic.
A medical face mask is a device covering the mouth, nose and chin that provides a barrier limiting the transmission of an infective agent between the hospital staff and the patient [2]. Such masks are used by healthcare workers to prevent respiratory droplets and splashes from reaching the mouth and the nose of the wearer, and to help reduce and/or control at the source the spread of large respiratory droplets from the person wearing the mask. To a certain extent, they also protect the wearer from external infectious contaminants [3]. The protection offered is, however, limited by the loose fit between the mask edge and the wearer's face, which leads to leakages in the range 12-25% [4]. The need for such protection is not a temporary problem strictly tied to the current health emergency; it will probably last for a very long time, as a new awareness of work and lifestyle habits aimed at preventing other health disasters like the COVID-19 pandemic emerges. In addition to adequate spacing, hygiene rules and behavior, the use of PPE and medical devices such as facial masks represents the main defense available [5]. The monthly need in Italy is estimated at about 90 million facial masks, with a consumption for each hospital of about 10 thousand masks per day. Therefore, some industries converted their production to tackle the severe shortage of protective masks in a short time, in order to face the outbreak at its initial stage.
A medical face mask needs to comply with specific standards (e.g., EN 14683:2019 [6] in Europe, ASTM F2100-19 [7] in the United States, or analogous standards). During the COVID-19 outbreak, however, the Italian National Institute of Health (ISS) allowed, limited to the emergency period, the use of surgical masks without the CE mark after evaluation by the ISS itself. The procedure requires the execution of the different tests indicated by international standard procedures, and in particular by the EN 14683:2019 European standard. Masks defined under this standard are classified according to their ability to capture contaminated breath, expressed as bacterial filtration efficiency (BFE), and to their breathability. Specific limits for both properties are given in the EN standard, which identifies two types of surgical masks, namely Type I, with a BFE of at least 95%, and Type II, with better filtration performance (BFE ≥ 98%). A third type, Type IIR, is also defined for masks that must resist possible blood jets, which may hit the mask during a surgical operation. For this reason, Type IIR masks are also required to pass a splash test, which measures the resistance to penetration by a splash of synthetic blood at a given velocity. Masks that meet the level 1 and level 2 requirements of the ASTM F2100-19 standard are very similar to Type I and Type II masks of the EN 14683:2019 standard, respectively, although some differences can be observed. In particular, the thresholds for differential pressure are slightly different (less conservative in the ASTM norm) and, more importantly, the American standard requires a further test, namely submicron particulate filtration, which is not considered in the European norm.
Additionally, the requirements for medical face masks include microbial cleanliness (i.e., the bioburden of the mask), with a maximum limit of colony forming units that has to be respected [8,9]. Medical face masks should also be tested for their biocompatibility according to ISO 10993-5:2009 and 10993-10:2009 [10,11], which specify cytotoxicity and skin sensitivity test methods to ensure that the materials are not harmful to the wearer [12,13].
Nonetheless, the rapid spread of SARS-CoV-2, together with the rapidly growing supply of protective devices, prompted laboratories to evaluate even the performance of homemade masks [14,15], while companies started to convert their production to increase mask availability, together with that of other products essential to fight the COVID-19 pandemic [16].
Research laboratories worldwide started working on mask testing, setting up apparatuses and test lines to verify whether the masks comply with the standard requirements. Notably, no laboratories accredited to carry out such standard tests were present in the Italian territory in early 2020. Therefore, the government decided to allow universities and research institutes to set up testing rigs, aiming to support the industrial reconversion to the production of new surgical face masks and to provide a simple and straightforward procedure for the certification of such devices [17,18].
In this scenario, in Italy and in other countries, not only were the setups required by the EN standard developed, but simpler test rigs were also proposed as faster screening methods to evaluate mask effectiveness. Amendola et al., for instance, developed a system made of two interconnected chambers separated by the tested mask specimen. The generated aqueous aerosol was loaded into the first chamber, where the initial content of liquid particles was measured; the aerosol was then forced by a vacuum pump into the second chamber, where the droplets that had passed through the specimen in this simulated respiratory action were counted by an optical particle counter [19]. That study measured the filtration efficiency, as a function of particle size, of prototypes produced by reconverted industries, and compared the data with the results of certified medical face masks. The filtration performance of the medical masks tested was equal to 97.3% (for aerosol particle sizes d > 0.28 µm), significantly larger than the 84.4% of the non-medical prototypes produced with cotton and comparable with the 96.7% obtained with woven-nonwoven fabrics [19]. It is noteworthy that in most human breathing, coughing or speaking, the size of the liquid droplets emitted from either the nose or the mouth, which potentially carry the viral load [21], is in the range 0.6-10 µm [20]. Such size may vary in air shortly after emission due to water evaporation, droplet fragmentation or coalescence [22].
Whiley et al. proposed a different approach for the evaluation of the viral filtration efficiency of fabric masks, replacing the Staphylococcus aureus strain indicated in the ASTM F2101-14 Standard Test Method with the bacteriophage MS2 because of its smaller dimensions (27 nm diameter) [23]. Various commercially available non-medical masks made of fabrics, essentially cotton, were tested, and their performance was compared with that guaranteed by surgical and N95 masks. In this case too, surgical masks proved to be the most efficient mask type for bacterial droplet filtration, while most of the fabric masks reached a viral filtration efficiency of about 50%. This value was found to increase, even up to 98.8%, when a section of vacuum cleaner bag or a baby wipe was added as an alternative to a disposable pocket filter [23]. Unfortunately, no data were collected on air permeability, so good breathability through the masks cannot be ensured.
In this context, several Italian universities and research centers created a joint laboratory capable of testing 120 masks over 3 months in terms of both bacterial filtration efficiency and breathability [24]. Of the 120 prototypes, 54 (45%) satisfied the Type I performance requirements for BFE and differential pressure, while 34 (28.3%) were compliant with the Type II requirements (more stringent criteria in terms of BFE). The data collected showed a correlation between the material, the number of layers, and the surgical mask performance. In particular, the masks made of nonwoven polypropylene with at least three layers (spunbond-meltblown-spunbond) showed the best results, while masks made of woven/knitted materials, including pure cotton and cotton/artificial fibers, performed poorly in BFE, in breathability, or in both [24]. Interestingly, the same study reported that the consistency among the test sites of the joint laboratory, and of the procedures adopted, was assessed mainly on the basis of two parameters, considered the key numbers for prototype characterization: the Mean Particle Size (MPS) of the aerosol and the Colony Forming Units (CFUs) of the positive controls used for the BFE analysis. The EN 14683 standard indeed requires the MPS to be in the range 2.7-3.3 µm and the CFUs of the positive controls to be in the range 1700-3000, to ensure a reliable evaluation of the bacterial filtration efficiency of the mask.
Various reports and experimental data on Type I and Type II masks are available in the scientific and technical literature, while the third category, Type IIR, whose main use is in surgical operations with possible blood splashes, has been only partially analyzed. Douglas et al. tested so-called Fluid Resistant Surgical Masks (Type IIR candidates) to assess their ability to block smoke particles (0.1 µm), used to simulate SARS-CoV-2 particles (0.12 µm) [25]. Five minutes of intense smoke exposure was estimated to be equivalent to an 8-h working shift. The masks offered no protection against inhaled smoke particles. Modifications with tape and three mask layers increased the filtering ability of the masks, but they were still not considered fully suitable for use. However, it should be stressed that medical masks have been developed to block contaminants exhaled by the wearer, not to ensure the cleanliness of the inhaled air, and thus cannot be considered equipment for complete respiratory protection. In this light, the dispersion of very fine smoke particles used as a probe hardly resembles the typical droplet size distribution generated by human breath, where viruses and bacteria are encapsulated in liquid particles significantly larger than 0.1 µm [26].
The impermeable outer layer of Type IIR masks was investigated in detail by Melayil et al., who examined the surface wettability after the application of a superhydrophobic coating. Such a coating was found to be critical for face masks, as it splits the aqueous particles released by breathing, giving rise to a number of small droplets that can linger in air for longer times and eventually contribute to the transmission of potential viral loads [27].
The mass usage of surgical masks by the whole population raised some concerns related to the long-term use of such devices, including the possibility that an excessive CO2 content may build up in the breathing zone of the mask over time. For this reason, various studies were carried out on medical masks, cloth masks and KN95 masks, aiming to evaluate the CO2 level in the breathing area [28] and some physiological parameters of the wearer [29,30]. Interestingly, no appreciable differences were observed between the three types of face masks tested in terms of CO2 content within the mask. The surgical mask was tested under different conditions: at work in an office setting, during slow walking and during fast walking. The measured carbon dioxide concentrations ranged between 2100 and 2900 ppm, which is quite high compared with the normal carbon dioxide concentrations in indoor environments (500-900 ppm), but far below the threshold values typically identified in the literature as hazardous for the toxicological effects of inhaled CO2 [28].
To address the emergency and to support the Italian industrial reconversion to the production of surgical masks, a multidisciplinary group of the University of Bologna created a laboratory to test surgical masks according to the European regulations and in compliance with the EN standard. The group, driven by the expertise of chemical engineers, microbiologists, and occupational medicine doctors, set up from scratch the four different test lines needed to perform all the functional tests required by EN 14683:2019. To our knowledge, this was the first Italian laboratory able to completely test surgical masks according to the European standard.
This work reports the effort dedicated to the creation of the interdisciplinary laboratory, and the preliminary characterizations carried out to verify the accuracy and the appropriateness of the test lines and procedures. The results obtained from the mask tests are also reviewed, with the aim of illustrating the effectiveness of the industrial reconversion to such productions and of identifying general correlations of the performance with materials and structure. Last, some considerations on the test standard and procedures are provided, based on the experience gained during this emergency and on the different perspectives offered by the varied expertise of the multidisciplinary team.

Materials and methods
The conception, design and construction of the test rigs for the four different experiments were carried out according to the requirements and test methods for medical face masks indicated in the European standard EN 14683:2019 [6] and in the standards it references:
• EN ISO 10993-1, Biological evaluation of medical devices - Part 1: Evaluation and testing within a risk management process [31];
• EN ISO 11737, Sterilization of health care products - Microbiological methods - Part 1: Determination of a population of microorganisms on products [32];
• ISO 22609, Clothing for protection against infectious agents - Medical face masks - Test method for resistance against penetration by synthetic blood [33].
In particular, the current regulation requires the mask prototype to be analyzed by means of the following tests:
• Differential pressure (also called "breathability"): It is the direct measure of the respiratory resistance provided by the mask, evaluated as the pressure drop at a fixed flow, and it is thus correlated to the effort required by the wearer in order to breathe with conventional inhalation/exhalation rates. Taking advantage of the uniformity of the surgical mask, the tests are executed on a representative section, a circular sample of 25 mm diameter (i.e., 4.91 cm² of area), across which the pressure drop needs to be measured under an air flow of 8 L/min.
This test is of paramount importance not only for the respiratory effort of the wearer, but also because, unlike respirators (personal protective equipment, PPE), surgical masks do not contain specific elements that provide good adhesion to the face, and thus the sealing of the device is poor. Hence, if the resistance to the airflow caused by the mask is too high, a fraction of the air passes around the edges instead of through the mask, reducing the protection offered by the device.
• Bacterial filtration efficiency (BFE): The test is the direct evaluation of the effectiveness of the filtering device, as it measures the number of liquid droplets containing bacteria that permeate through a circular sample (80 mm diameter) fed with a two-phase mixture (gas + liquid droplets containing bacteria). The efficiency is calculated from the ratio between the number of droplets (i.e., bacteria) that permeate and the number fed to the sample mask, the latter being directly measured in a "blank" experiment (i.e., with no filtering mask).
• Resistance against penetration by synthetic blood (also called "splash test"): The test aims to verify the protection offered to the operator wearing a surgical mask against a blood squirt, which may occur during surgery or emergency operations. The test is required for Type IIR masks only. The penetration of a fixed volume of blood simulant, which hits the sample frontally at a known velocity, is evaluated by simple visual inspection. This ensures that the blood does not reach the internal layer of the mask and thus does not come into contact with the operator's lips and nose.
• Sterilization of health care products ("Bioburden"): The test aims to quantify the number of microorganisms present on the mask before wearing it for the first time. Although it is crucial for the final evaluation of a surgical mask to be made available in the market and used by the population, it may be considered of a minor importance with respect to the previous tests when a mask prototype is first evaluated to determine its performances. To some extent, the bioburden evaluates the cleanliness of the production line and the effectiveness of cleansing and packaging lines after production.
• Biological evaluation of medical devices: The test inspects the biocompatibility of the masks towards human face and body in general. Basically, it requires the assessment of the possible biological effects caused by the mask on the wearer's skin upon exposure, including for instance skin irritation, cytotoxicity, sensitization, intracutaneous reactivity, etc. Such tests were not carried out in our laboratory and will not be described further.
The first three tests evaluate the performance of surgical masks, and the classification in the three categories is carried out according to the threshold values reported in Table 1.
It is noteworthy that the EN standard uses quite unusual units for the differential pressure, namely pressure per unit area, whereas the pressure drop should be independent of the specimen area as long as the specimen is crossed by an air flow with the same velocity, which is 0.272 m/s in the case indicated in the standard. However, the requirement is unambiguous, as the specimen size (and thus the active area) is clearly indicated, and the differential pressure needs to be lower than 196 and 294 Pa for Type I/II and Type IIR, respectively. Fig. 1 illustrates the layout of the test rig, which follows the guidelines indicated in the EN standard (Fig. 1a), and a picture of the apparatus set up in our laboratory (Fig. 1b); it contains the sample holder, a U-tube manometer (maximum reading 2 kPa, accuracy 1 Pa), a flow meter (range 0-500 L/h), and a regulation valve.
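The face velocity and the classification logic described above can be illustrated with a minimal sketch. It uses only the limits quoted in the text (BFE of 95 and 98%, differential pressure of 196 and 294 Pa on the 25 mm specimen); the function names are hypothetical, the splash test is reduced to a pass/fail flag, and Type IIR is assumed to share the Type II BFE limit, as in the EN 14683:2019 requirements table.

```python
import math

# Limits quoted in the text; differential pressure is in Pa for the 25 mm specimen.
SPECIMEN_DIAMETER_M = 0.025
TEST_FLOW_L_MIN = 8.0
BFE_MIN_PERCENT = {"I": 95.0, "II": 98.0, "IIR": 98.0}  # IIR limit: assumption, see lead-in
DP_MAX_PA = {"I": 196.0, "II": 196.0, "IIR": 294.0}


def face_velocity_m_s(flow_l_min=TEST_FLOW_L_MIN, diameter_m=SPECIMEN_DIAMETER_M):
    """Superficial air velocity through the circular specimen."""
    area_m2 = math.pi * diameter_m ** 2 / 4.0
    flow_m3_s = flow_l_min / 1000.0 / 60.0
    return flow_m3_s / area_m2


def classify_mask(bfe_percent, dp_pa, splash_passed=False):
    """Return 'IIR', 'II', 'I', or None if no type is met (hypothetical helper)."""
    if (splash_passed and bfe_percent >= BFE_MIN_PERCENT["IIR"]
            and dp_pa < DP_MAX_PA["IIR"]):
        return "IIR"
    if bfe_percent >= BFE_MIN_PERCENT["II"] and dp_pa < DP_MAX_PA["II"]:
        return "II"
    if bfe_percent >= BFE_MIN_PERCENT["I"] and dp_pa < DP_MAX_PA["I"]:
        return "I"
    return None


if __name__ == "__main__":
    print(f"face velocity: {face_velocity_m_s():.3f} m/s")
    print(classify_mask(98.5, 150.0))
```

The velocity check reproduces the 0.272 m/s figure from the 8 L/min flow and the 25 mm specimen diameter alone, which is why the pressure-per-area units do not introduce any ambiguity.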

Differential pressure
In the apparatus set up during the first COVID-19 outbreak in Italy, two different configurations were adopted. The first was in close agreement with the EN standard: the air flow (and thus the pressure difference) was generated by a vacuum pump that pulled air from the section downstream of the mask sample, with the upstream section open to the atmosphere. In the second, the upstream compartment was kept in slight overpressure with compressed air, with the downstream section at atmospheric pressure. The results obtained with the two configurations were compared after testing different samples, and an average difference of (1.6 ± 0.8)% was observed in the differential pressure measured at the inspected flow rates (see the results reported in the Supporting Information for the differential pressure tests performed on three samples, Table S1). Notably, this difference is significantly lower than the average relative standard deviation resulting from the analysis of all the tests performed (equal to 6.9%), which is mainly attributed to the unevenness of the five samples analyzed in each test. The second configuration was therefore implemented, in light of its slightly easier set-up and of the absence of regions under vacuum in the test rig, which rules out any possible infiltration of external air.
In brief, compressed air was fed to the apparatus, and its flow rate was controlled by a dedicated valve and measured by an analog flowmeter. The sample holder was composed of two stainless steel T-pipes, with an internal diameter of 25 mm and tri-clamp connections at all ends, arranged as shown in Fig. 1b for the connections with the differential manometer.
The samples were prepared by punching the prototype mask in different positions with a hollow cutter (25 mm diameter). Then, after conditioning for at least 4 h at 85% R.H. and room temperature (obtained with supersaturated potassium chloride (KCl) solutions [34,35] in a closed box), the samples were clamped between the connections of the two tubes and sealed with flat rubber ring gaskets (internal diameter of 25 mm) placed above and below the specimen to ensure tightness and the correct sample cross-sectional area.
The experimental procedure was developed to minimize the experimental error and to ensure the repeatability of the tests: the pressure drop was measured at multiple flow rates, namely 100, 200, 300, 400, 450, and 500 L/h, at least twice for each specimen. A linear correlation was found in all cases, and its slope allowed the determination of the pressure difference at 8 L/min (480 L/h). The final value reported for each specimen was the arithmetic mean of the different measurements (at least two), divided by the sample area, as indicated in the EN standard. An example of the linear correlations resulting from the measurements for a typical surgical mask sample is reported in Fig. S1.
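The evaluation step described above can be sketched as follows. This is only an illustration: it assumes a least-squares line forced through the origin (the text states only that the slope was used), and the function name and the example readings are hypothetical.

```python
import math


def pressure_drop_at_test_flow(flows_l_h, dp_pa, test_flow_l_h=480.0, diameter_cm=2.5):
    """Fit dp = slope * Q through the origin and evaluate it at the test flow.

    Returns (dp at the test flow in Pa, dp per unit area in Pa/cm^2).
    The through-origin fit is an assumption, not a detail given in the text.
    """
    if len(flows_l_h) != len(dp_pa) or not flows_l_h:
        raise ValueError("flows and pressure drops must be non-empty and of equal length")
    # Least-squares slope of a line constrained to pass through the origin.
    slope = sum(q * p for q, p in zip(flows_l_h, dp_pa)) / sum(q * q for q in flows_l_h)
    dp_test = slope * test_flow_l_h
    area_cm2 = math.pi * diameter_cm ** 2 / 4.0
    return dp_test, dp_test / area_cm2


if __name__ == "__main__":
    flows = [100, 200, 300, 400, 450, 500]     # L/h, as in the procedure
    readings = [31, 59, 91, 119, 136, 150]     # Pa, made-up manometer readings
    dp, dp_per_area = pressure_drop_at_test_flow(flows, readings)
    print(f"dp at 480 L/h: {dp:.1f} Pa ({dp_per_area:.1f} Pa/cm^2)")
```

Interpolating from a fit over six flow rates, rather than reading a single point at 480 L/h, averages out the reading error of the analog flowmeter and of the U-tube manometer.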

Bacterial filtration
According to the indications of the EN standard, the design of the BFE apparatus required several specific instruments, as shown in the layout of the system in Fig. 2a. Due to the full lockdown during the pandemic outbreak in spring 2020 and the extreme urgency of the emergency period, not all such pieces of equipment were readily available on the market. Therefore, we were forced to adapt components already present in our laboratories, spare parts, and even pieces disassembled from other apparatuses. Some components were generously donated by companies, citizens, or other departments of the University of Bologna. Over the subsequent months and throughout 2020, the initial version of the apparatus was progressively improved.
The first set-up of the BFE apparatus, illustrated in Fig. 2b, was installed in a disused surgery room at the University hospital, Policlinico Sant'Orsola, which ensured the required sterile conditions and the safety of the workers. The air, coming from the sterile room, was fed into a 1.5 m long, transparent polymethyl methacrylate (PMMA) tube, with an internal diameter of 80 mm and flanged at both ends. At about half of the tube height (700 mm from the bottom), a pressure-driven nebulizer (Collison single-jet nebulizer by CHT technologies) created an aerosolized suspension of Staphylococcus aureus. The EN standard requires a glass tube with an internal diameter of 80 mm and a height of 600 mm, with the aerosol entering from the top. The excess height of the tube (not involved in the aerosol flux) and the tube material were considered not to affect the results. Regarding the pipe material, no interference with the aerosol bacteria was observed, as shown by all the negative control runs executed during the entire period of investigation, which demonstrated the absence of tube contamination. In all such tests, not reported here, the air flow with bacteria-free droplets was collected in the impactor (described later) and no bacterial colonies were detected. As for the tube height, the aim was to provide a chamber long enough to ensure uniform mixing between the air and the bacterial aerosol. Since the aerosol enters the tube at 70 cm from the bottom, where the mask is placed, the part of the tube above the inlet has no influence from a fluid dynamic point of view, and the 10 cm of additional path is not expected to affect the mixing or the pressure to any significant extent. Nevertheless, after the emergency and by the end of the lockdown period, the tall PMMA tube was replaced by a standard glass tube, prepared in strict compliance with the EN standard. Since no differences were observed in the results, both are simply indicated as the BFE tube in the following.
The two-phase mixture produced in the cylinder reached the bottom, where the mask sample, cut to the size required by the EN standard, was placed between two flanges. The bottom flange was fitted to the impactor underneath by means of a connection specifically designed with the software Autodesk Inventor 2019 and produced with a 3D printer (Prusa i3 MK3S+, Prusa Research a.s., Prague, Czech Republic). The bacteria-containing droplets permeating through the mask sample were collected in a six-stage Andersen impactor, generously donated by Cavazza Anna sas (Bologna, Italy). The impactor was able to collect droplets of different sizes, with diameters down to 0.65 µm.
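Because the impactor separates the droplets by size, the plate counts of a positive control run can be used to check the two acceptance ranges quoted in the Introduction (MPS of 2.7-3.3 µm and 1700-3000 CFU). The sketch below is illustrative only: the count-weighted mean is the usual form of the MPS calculation, the stage cut-off diameters are nominal values for a six-stage Andersen impactor and are an assumption to be replaced by the calibration data of the actual instrument, and the function names are hypothetical.

```python
# Nominal 50% cut-off diameters (µm) for stages 1 to 6 of a six-stage Andersen
# impactor. Assumed values: use the manufacturer's calibration for a real instrument.
STAGE_CUTOFF_UM = (7.0, 4.7, 3.3, 2.1, 1.1, 0.65)


def mean_particle_size(stage_counts, cutoffs_um=STAGE_CUTOFF_UM):
    """Count-weighted mean of the stage cut-off diameters, in µm."""
    if len(stage_counts) != len(cutoffs_um):
        raise ValueError("one count per impactor stage is required")
    total = sum(stage_counts)
    if total <= 0:
        raise ValueError("no colonies counted")
    return sum(p * c for p, c in zip(cutoffs_um, stage_counts)) / total


def positive_control_is_valid(stage_counts, mps_range=(2.7, 3.3), cfu_range=(1700, 3000)):
    """Check a positive control run against the MPS and total CFU ranges."""
    mps = mean_particle_size(stage_counts)
    total = sum(stage_counts)
    return mps_range[0] <= mps <= mps_range[1] and cfu_range[0] <= total <= cfu_range[1]


if __name__ == "__main__":
    counts = [100, 300, 600, 700, 400, 100]   # made-up corrected counts per stage
    print(f"MPS = {mean_particle_size(counts):.2f} µm, "
          f"valid = {positive_control_is_valid(counts)}")
```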
To avoid any possible diffusion of bacteria into the environment, smaller contaminated droplets were collected in a vacuum trap, preceded by a water-cooled glass condenser. Two different cooling configurations were used: the initial one relied on the tap water available in the operating theatre, while the later one used a mini-chiller, added to the system to reduce water consumption and to ensure proper cooling also during the warmest periods.
The droplet-free air finally reached a flow indicator (Bronkhorst elflow, range 0-100 L/min) generously donated by IMA SpA (Ozzano Emilia (Bologna), Italy), a valve for flow regulation and a vacuum pump that provided the required driving force for the air flow.
The aerosol used during the tests consisted of a bacterial suspension of Staphylococcus aureus ATCC6538, which was prepared at an initial concentration of 5 × 10⁵ colony forming units (CFU)/mL by diluting 7000 times a bacterial culture with 1.8 McFarland turbidity in 50 mL of modified peptone water (peptone 5 g/L, NaCl 5 g/L). This inoculum preparation yielded approximately 1.7-3 × 10³ CFU in the control Petri dishes (as suggested by the EN standard).
The bacterial aerosol was transported inside the tube at a flow rate of 28.3 L/min by means of the vacuum pump located at the end of the test line. This allowed the aerosol to pass through the mask, and the bacteria that were not retained by the mask were collected in the stages of the impactor, where 6 Horse Blood Agar Petri dishes were placed, one plate within each impactor stage. The bacterial colonies (CFU) grown on these plates were counted after 24-48 h of incubation at 37 °C and then corrected using the positive hole correction table (Fig. S2 reports the relationship between the number of counted CFU and the corrected value). To test a prototype, 8 runs are required:
• 2 positive control runs: performed in the absence of the mask sample, to evaluate the number of CFU delivered to the system;
• 5 mask runs: one for each mask sample;
• 1 negative control run: only air flows through the system, in the absence of both the mask sample and the nebulized bacterial aerosol, to verify the absence of contamination inside the system.
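The positive hole correction mentioned above accounts for the chance that more than one viable particle passes through the same impactor hole and grows into a single colony. A minimal sketch of the formula commonly used to generate such tables is given below; it assumes 400 holes per stage, as in a standard Andersen viable impactor, and it is not confirmed to be the exact table of Fig. S2. The function name is hypothetical.

```python
def positive_hole_correction(counted, holes=400):
    """Expected number of viable particles for `counted` filled holes.

    Implements P_r = N * (1/N + 1/(N-1) + ... + 1/(N-r+1)), with N holes
    per stage (N = 400 is an assumption) and r counted colonies.
    """
    if not 0 <= counted <= holes:
        raise ValueError("counted colonies must lie between 0 and the number of holes")
    return holes * sum(1.0 / (holes - k) for k in range(counted))


if __name__ == "__main__":
    for r in (10, 100, 200, 300):
        print(r, round(positive_hole_correction(r)))
```

The correction is negligible for low counts and grows quickly as the plate fills, which is one reason why the positive control counts are kept within a bounded range.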
The BFE of each mask sample "j" was calculated using Eq. (1):
$$\mathrm{BFE}_j = \frac{\mathrm{CFU}_{PC} - \mathrm{CFU}_j}{\mathrm{CFU}_{PC}} \times 100 \qquad (1)$$
where $\mathrm{CFU}_{PC}$ is the average of all the CFU collected by the impactor during the two positive control runs and $\mathrm{CFU}_j$ is the total number of CFU collected while testing the sample "j". The BFE of the prototype was finally calculated as the average of the BFE of the five samples tested.
Interestingly, the EN standard contains specific indications about the conditioning protocol for the specimens (room temperature at 85% R.H. for at least 4 h), but makes no mention of the humidity of the feed air used for the test, although this is very relevant in the multiphase transport of air containing water droplets. The atmosphere in the operating theatre was maintained in rather dry conditions, approximately 30% R.H., so the feed air entering the top of the cylinder had to be humidified in order to simulate human breath and ensure the reliability of the results. The air sent to the top of the cylinder was conditioned by bubbling it through demineralized water contained in a polyvinyl chloride (PVC) tube. The bubbler was connected by a T-junction to the top of the BFE tube and to the atmosphere, and the conditioned air was supplied in excess to the system. This procedure ensured that all the air entering the apparatus (28.3 L/min) was conditioned, while the excess portion was vented to the atmosphere. The relative humidity inside the BFE chamber was continuously measured by a digital thermo-hygrometer (XS UR 200, XS Instruments, Modena, Italy), and the desired value of 85% was achieved and kept constant by regulating the height of the water column. Small variations, in the order of ± 1.5% R.H., were usually observed and were ascribed mainly to the uncertainty of the thermo-hygrometer. The water contained in the bubbler was changed every day to reduce the possibility of contamination. The evaporation rate was not sufficient to cause an appreciable variation of the height of the water column and, consequently, of the R.H. of the air sent to the system, which was observed to be constant during the entire working day.
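The BFE calculation described for Eq. (1) can be illustrated with a short sketch: each sample is referred to the mean of the two positive controls, and the prototype value is the mean over the five samples. The function names and the example counts are hypothetical.

```python
from statistics import mean


def bfe_percent(cfu_positive_control, cfu_sample):
    """BFE of one sample: percentage of the delivered CFU retained by the mask."""
    if cfu_positive_control <= 0:
        raise ValueError("positive control count must be positive")
    return (cfu_positive_control - cfu_sample) / cfu_positive_control * 100.0


def prototype_bfe(positive_control_counts, sample_counts):
    """Return (mean BFE of the prototype, list of per-sample BFE values)."""
    cfu_pc = mean(positive_control_counts)
    per_sample = [bfe_percent(cfu_pc, c) for c in sample_counts]
    return mean(per_sample), per_sample


if __name__ == "__main__":
    # Made-up corrected CFU totals: two positive controls and five mask runs.
    overall, per_sample = prototype_bfe([2000, 2400], [22, 44, 66, 88, 110])
    print(f"prototype BFE = {overall:.1f}%", [f"{b:.1f}" for b in per_sample])
```

With counts of this kind, a shift of a few tens of colonies is enough to move a prototype across the 95 or 98% limit, which is why the control of the positive control runs matters so much near the thresholds.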

Droplet size and air humidification
The aerosol was generated by means of a standard nebulizer. To ensure that the normative requirements were met, in terms of mass of liquid nebulized and droplet size, the following procedure was adopted to identify the correct operating conditions.
Measurement of the amount of generated aerosol.
The nebulizer, filled with the operative fluid, was turned on and kept at a fixed pressure for 30 min, and the amount of aerosol generated (liquid fraction) was determined by weighing on an analytical balance every 10 min. The volumetric flow rate of aerosol produced in each interval "j", $Q_{ap,j}$, was calculated by dividing the difference between the starting weight, $m_i$, and the final weight, $m_f$, by the density of water, $\rho_w$ (considered equal to that of the operative fluid), and by the interval time, $\Delta t$:
$$Q_{ap,j} = \frac{m_i - m_f}{\rho_w \, \Delta t}$$
The amount of liquid nebulized was taken as the arithmetic mean of the values of the three intervals.
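The weighing procedure above translates into a few lines of code. The sketch assumes that the balance readings refer to the liquid remaining in the nebulizer at 0, 10, 20 and 30 min and that the density is 1 g/mL; the function names and the example readings are hypothetical.

```python
def aerosol_flow_ml_min(mass_start_g, mass_end_g, interval_min=10.0, density_g_ml=1.0):
    """Volumetric flow rate of nebulized liquid over one weighing interval."""
    if interval_min <= 0 or density_g_ml <= 0:
        raise ValueError("interval and density must be positive")
    return (mass_start_g - mass_end_g) / (density_g_ml * interval_min)


def mean_aerosol_flow_ml_min(masses_g, interval_min=10.0, density_g_ml=1.0):
    """Arithmetic mean of the flow rates over consecutive weighing intervals."""
    if len(masses_g) < 2:
        raise ValueError("at least two weighings are needed")
    flows = [aerosol_flow_ml_min(m0, m1, interval_min, density_g_ml)
             for m0, m1 in zip(masses_g, masses_g[1:])]
    return sum(flows) / len(flows)


if __name__ == "__main__":
    readings_g = [50.0, 48.0, 46.1, 44.0]   # made-up readings at 0, 10, 20, 30 min
    print(f"mean aerosol flow = {mean_aerosol_flow_ml_min(readings_g):.3f} mL/min")
```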

Measurement of the droplet size distribution (DSD).
At the outlet section of the nebulizer, a Spraytec laser diffraction system (Malvern Panalytical, UK) was used to measure the size of the generated liquid droplets. Measurements were performed 20 mm downstream of the outlet pipe of the nebulizer to minimize the effects of evaporation. This choice was made to characterize the nebulizer before installing it in the experimental loop; the check on the droplet size distribution was repeated each week to verify that the performance of the nebulizer was maintained over time. More in detail, the DSD analysis can be carried out and reported in different ways, and the frequency curve of the different particle sizes may be reported on a numerical or volume basis. The relative numerical frequency, $f_{i,n}$, in each size class was evaluated as:

$$f_{i,n} = \frac{n_i}{n_{tot}} \qquad (3)$$

where $n_i$ is the number of droplets in the i-class and $n_{tot}$ is the total number of droplets. However, in order to account for the larger volume and weight of larger droplets with respect to the smaller ones, and consequently the larger number of bacteria they contain, the volumetric frequency, $f_{i,v}$, was also evaluated as:

$$f_{i,v} = \frac{n_i d_i^3}{\sum_{k=1}^{j} n_k d_k^3} \qquad (4)$$

where j is the number of analyzed classes and $d_i$ is the mean diameter of the droplets in the i-class. The obtained DSD and the effective diameter associated with the generated aerosol can be characterized by lumped parameters, and it is thus useful to define, in an objective way, the mean diameter as:

$$d_{im} = \left( \frac{\sum_{k=1}^{j} n_k d_k^{\,i}}{\sum_{k=1}^{j} n_k d_k^{\,m}} \right)^{1/(i-m)} \qquad (5)$$

According to the values of i and m (which here denote the exponents, not the class index), it is possible to compute the numerical mean diameter ($d_{10}$), the surface-volume mean diameter ($d_{32}$), usually called the Sauter diameter, and the volume-weighted mean diameter ($d_{43}$), usually called the De Brouckere diameter. It is also important to define a lumped parameter that gives quantitative information on the uniformity of the DSD, so that different laboratories working with the same mean diameter do not obtain different results because of different DSDs.
A parameter that can be simply evaluated from the experimental DSD data is the Span factor, defined as:

$$S = \frac{d_{0.9} - d_{0.1}}{d_{0.5}} \qquad (6)$$

where $d_{0.9}$ is the diameter below which 90% of the total volume of the droplet distribution is contained; $d_{0.1}$ and $d_{0.5}$ are defined in the same way and refer to 10% and 50% of the total volume, respectively. Fig. 3 shows an example of the droplet size distribution that could be used for the mask test. In this case, the droplet size distribution is characterized by $d_{0.9}$ = 12.4 μm, $d_{0.1}$ = 1.76 μm, $d_{0.5}$ = 4.28 μm and S = 2.48.
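The lumped DSD parameters discussed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class counts and diameters are invented, the exponent names `p` and `q` are chosen here, and the mean diameter uses the common moment-ratio form, which reduces to a plain ratio of sums for the three diameters named in the text. The Span check uses the percentile values quoted for Fig. 3.

```python
def number_frequency(counts):
    """Relative numerical frequency of each size class."""
    total = sum(counts)
    return [n / total for n in counts]


def volume_frequency(counts, diameters):
    """Relative volumetric frequency, weighting each class by n * d**3."""
    volumes = [n * d ** 3 for n, d in zip(counts, diameters)]
    total = sum(volumes)
    return [v / total for v in volumes]


def mean_diameter(counts, diameters, p, q):
    """Moment-ratio mean diameter d_pq, e.g. (1, 0), (3, 2) or (4, 3)."""
    num = sum(n * d ** p for n, d in zip(counts, diameters))
    den = sum(n * d ** q for n, d in zip(counts, diameters))
    return (num / den) ** (1.0 / (p - q))


def span(d_01, d_05, d_09):
    """Span factor from the 10%, 50% and 90% volume percentile diameters."""
    return (d_09 - d_01) / d_05


# Hypothetical three-class distribution (diameters in micrometres).
counts = [100, 50, 10]
diameters = [1.0, 2.0, 4.0]
d10 = mean_diameter(counts, diameters, 1, 0)
d32 = mean_diameter(counts, diameters, 3, 2)
d43 = mean_diameter(counts, diameters, 4, 3)
s_fig3 = span(1.76, 4.28, 12.4)  # percentile values quoted for Fig. 3
print(round(d10, 3), round(d32, 3), round(d43, 3), round(s_fig3, 3))
```

Even in this toy case the few large droplets pull the Sauter and De Brouckere diameters well above the numerical mean, which is why the text distinguishes number-based from volume-based statistics.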

Calculation of the mean particle size (MPS) produced.
The MPS of the aerosol generated was verified by placing the atomizer immediately above the impactor during two consecutive positive control runs (i.e., in the absence of the mask sample) and by analyzing the granulometric distribution of the droplets collected. The MPS was calculated using Eq. (7):

$$\mathrm{MPS} = \frac{\sum_{i} P_i \, C_i}{\sum_{i} C_i} \qquad (7)$$

where $P_i$ and $C_i$ are the size and the corrected number of the viable particles collected in each stage "i", respectively.
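A short Python sketch of the MPS calculation as a count-weighted average of the stage sizes is given below. The stage sizes are the nominal cut-off diameters commonly quoted for a six-stage Andersen impactor operated at 28.3 L/min; they are an assumption here, not values taken from this work, and the corrected counts are hypothetical.

```python
# Assumed nominal stage cut-off diameters (micrometres), stages 1 to 6.
STAGE_SIZES_UM = [7.0, 4.7, 3.3, 2.1, 1.1, 0.65]


def mean_particle_size(stage_sizes, corrected_counts):
    """Count-weighted mean particle size over the impactor stages."""
    total = sum(corrected_counts)
    if total <= 0:
        raise ValueError("no viable particles collected")
    weighted = sum(p * c for p, c in zip(stage_sizes, corrected_counts))
    return weighted / total


# Hypothetical corrected CFU counts for stages 1 to 6 of a positive control run.
corrected_counts = [10, 30, 100, 380, 250, 30]
mps = mean_particle_size(STAGE_SIZES_UM, corrected_counts)
print(f"MPS = {mps:.2f} um")
```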

Calculation of the mean particle size (MPS) at the mask sample.
The analysis of the granulometric distribution of the droplets collected by the impactor during the positive control runs of a conventional BFE test (with the atomizer placed 70 cm above the sample) was used to estimate the MPS of the aerosol right below the sample holder zone (about 10 cm). Again, the MPS was calculated using Eq. (7).
To evaluate the effects of the relative humidity of the air on the evaporation and coalescence of the droplets while flowing through the BFE tube and, in turn, on the measured BFE of a surgical mask, two prototypes were tested using dry (30% R.H.) and humid (85% R.H.) air.

Splash apparatus
The blood resistance test rig was assembled as indicated by the EN standard. The whole apparatus is shown in Fig. 4. In brief, the apparatus was equipped with a specimen-holding frame, able to accommodate the whole surgical mask prototype, and with a fluid dosing system in which a pneumatically controlled valve dispensed a specified volume of blood in a jet stream (Fisnar JB1113N, NJ, United States). The dosing system made use of compressed air to activate the syringe, which was equipped with a cylindrical needle 12.7 mm long and 0.83 mm in diameter, as requested by the EN standard. Compressed air was used by the instrument to regulate the pressure (double checked by an additional external manometer, Druck PTX-1400, UK), while the valve opening time, set by the dosing system, controlled the volume of synthetic blood directed towards the mask surface. More in detail, the whole frame was fabricated according to the specifications and the dimensions indicated in the EN standard, while the mask-holder was produced by 3D printing.
The synthetic blood was prepared according to the following method. Poly(ethylene glycol)-b-poly(propylene glycol)-b-poly(ethylene glycol) (Pluronic® F-108, Merck, Italy) was added at 5% w/v to 1 L of distilled water previously boiled for at least 5 min. The solution was kept under agitation for 1 h and subsequently sonicated for 15 min. Then, 30 g of Rhodamine (purity > 95%, Millipore-Sigma USA) was added to the solution, with further agitation for 40-60 min using an orbital shaker.
The key characteristics of the synthetic blood that govern the splash tests were the fluid surface tension and density. In particular, the surface tension of the synthetic blood was measured in triplicate by the pendant drop method using a Theta Lite tensiometer (Biolin Scientific, Sweden). Density was measured in triplicate using a 1.0 mL Hamilton syringe and an AX224 Sartorius balance (Lab Instruments GmbH & Co. KG, Goettingen, Germany) with 0.0001 g precision. The solution obtained was characterized by a surface tension equal to 41.45 mN/m and a density of 1015 kg/m³.
Before each test, the sample was preconditioned in a chamber with a relative humidity of 85 (±5) % at a room temperature of 21 (±5) °C, and the pressure of the air and the volume of the blood sprayed were checked.
After conditioning, the sample was mounted on the mask-holder and a synthetic blood jet of 2 mL was sprayed at a pressure of 16.0 kPa, with a distance from the needle to the mask of 300 (±10) mm and the center of the specimen as the target area. After testing, the pass/fail evaluation of the surgical face mask was based on simple visual detection of synthetic blood. However, mask samples with particular colors, writing or drawings required the use of talcum powder to rule out the presence of blood traces, as shown in Fig. S3.
The EN standard uses the acceptance quality limit (AQL) criterion to decide whether to accept or reject the mask prototypes; in general, the AQL represents the worst tolerable quality level, as defined by the standard ISO 2859-1 [36]. In particular, EN 14683:2019 requires an AQL of 4%, which is defined on a number of samples equal to 32. In the first stage of the pandemic outbreak, however, the number of mask prototypes produced by the reconverted industrial productions was typically much smaller, and therefore not enough specimens were available for such an analysis. Hence, aiming to rescale the 4% AQL onto a smaller number of samples, a set of 5 different surgical masks of the same prototype was classified as eligible for Type IIR if all 5 tests were successfully passed.

Bioburden
The microbial cleanliness test (Bioburden) was carried out under aseptic conditions by separately treating 5 masks (randomly chosen within the mask batch) in sterile bottles containing 300 mL of extraction liquid (composed of 5 g/L NaCl, 1 g/L peptone and 2 g/L Tween 20). After 5 min of vigorous shaking (200-250 rpm), 100 mL of the extraction solution was filtered through a 0.45 µm filter placed in a 1225 Sampling Manifold (Millipore-Merck, Darmstadt, Germany) that allowed the simultaneous vacuum filtration of 12 samples. The filter was then placed upon Tryptic Soy Agar (TSA) plates for the enumeration of bacterial CFU after 3 days of incubation at 30 °C. An additional 100 mL of the extraction solution was filtered in the same way and the filters were placed on Sabouraud Dextrose Agar (SDA) plates with chloramphenicol for the enumeration of fungal colonies after 7 days of incubation at 22 °C. The total count of microbial CFU was therefore obtained by summing up the number of bacterial and fungal colonies obtained on the filters. The final "microbial cleanliness" parameter (or Bioburden) was calculated by dividing this number by the weight (in grams) of the mask. The Bioburden procedure was carried out for each of the 5 masks (of the same type) under analysis to calculate the mean value and the standard deviation. A microbial cleanliness value <30 CFU/g of mask is required for Type I, II and IIR masks. For each tested mask batch, to validate the sterility of the system, a control was carried out by filtering 100 mL of sterile extraction liquid in parallel to the extraction liquids derived from the 5 treated masks. Two control filters were placed on each of the two agar plates used for the mask analysis (TSA and SDA).
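The bioburden calculation described above can be sketched as follows. The colony counts and mask masses are hypothetical; the sketch follows the text literally (sum of bacterial and fungal colonies divided by the mask mass) and applies no further scaling for the filtered volume, which is an assumption of this example.

```python
from statistics import mean, stdev

BIOBURDEN_LIMIT_CFU_PER_G = 30.0  # limit quoted in the text


def bioburden_cfu_per_g(bacterial_cfu, fungal_cfu, mask_mass_g):
    """Microbial cleanliness of one mask: total colonies per gram of mask."""
    return (bacterial_cfu + fungal_cfu) / mask_mass_g


# Hypothetical data for 5 masks: (bacterial CFU on TSA, fungal CFU on SDA, mass in g).
masks = [(12, 3, 3.0), (20, 4, 3.0), (9, 0, 3.0), (30, 6, 3.0), (15, 6, 3.0)]
values = [bioburden_cfu_per_g(b, f, m) for b, f, m in masks]
mean_bioburden = mean(values)
sd_bioburden = stdev(values)  # sample standard deviation over the 5 masks
compliant = mean_bioburden < BIOBURDEN_LIMIT_CFU_PER_G
print(f"{mean_bioburden:.1f} +/- {sd_bioburden:.1f} CFU/g, compliant: {compliant}")
```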
In a second phase, the bioburden system with the structure indicated in the ISO standard was acquired from SpeedFlow Manifold (Crami Group srl, Milano, IT) and used to test some mask batches and further validate the results obtained with the 1225 Sampling Manifold.

Results and discussion
To complete the analysis of the characteristics and performances of the mask prototype, and thus the investigation of its compliance with the EN 14683:2019 standard, a specific test order was followed. In particular, the differential pressure test was considered the first step on the road to certification. Such a measurement was quite simple and fast to perform, so it was used as a first screening of mask prototypes, as it allowed non-certifiable masks to be rapidly excluded for poor breathability. Considering the first 10 months of operation of the laboratory, the success rate for differential pressure was about 64%, even though about 1 mask out of 2 was found not to comply with the breathability requirements during the first period of emergency. That is not surprising as, in the very first period, a very broad spectrum of prototypes was proposed for certification, including materials and configurations far from those of conventional surgical masks.
Concerning the masks that successfully passed the differential pressure test, about 60% of them presented a value lower than 40 Pa/cm² and were thus subjected to the second test on the road to certification, namely the bacterial filtration efficiency, BFE. The remaining 40% of masks, with differential pressure values between 40 and 60 Pa/cm², were also tested for the BFE, as they could be potentially classified as Type IIR masks (required filtration efficiency of at least 98%) upon verification of the splash resistance to synthetic blood.
Only the mask prototypes that passed the differential pressure test were thus tested for the BFE, and only 1 out of 5 of the Type I and Type II candidates (i.e., differential pressure lower than 40 Pa/cm²) showed a BFE value higher than 95% (15% of them above 98% of bacterial filtration, and 5% in the range 95-98%). In the case of masks with differential pressure between 40 and 60 Pa/cm², only 27% of them could be potentially classified as Type IIR masks, showing a BFE value higher than or equal to 98%. In order to be classified as Type IIR, after showing satisfactory results in both the breathability and BFE tests, the prototype needed to comply with the requirements for blood resistance (for blood jets produced at 16 kPa of pressure, corresponding to a velocity of 550 cm/s), which had to be verified by the splash test.
The final test to assess the EN standard requirements verified the microbial cleanliness of the masks, since it evaluates the bioburden of the entire mask that has to be lower than 30 CFU/g of mask. Bioburden is not a real mask performance, but rather it evaluates the cleanliness of the production and packaging process. Hence, a negative result should not be correlated with the mask performances, and it does not potentially affect the possibility of its certification, once a cleaner process is implemented for its production or a sterilization step (e.g., by UV-C irradiation [37]) is included in the production lines.
Finally, considering a total of 435 masks for which all tests were performed, only 42 of them (corresponding to about 10%) reached the finish line on the road to certification, while the remaining 393 masks (corresponding to about 90%) failed. Interestingly, looking at the 42 compliant prototypes, only 8 were classified as surgical masks of Type I (19% of the successful masks), 20 as Type II (48%) and 14 as Type IIR (33%) after the splash test analysis. Fig. 5 summarizes the overall test results with the indication of the reason for failing of the different masks tested.
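The decision logic of this test sequence can be summarized in a minimal Python sketch. The thresholds are those quoted in the text (40 and 60 Pa/cm², 95% and 98% BFE, splash resistance at 16 kPa, 30 CFU/g); the function name and the handling of values exactly on a threshold are assumptions of this example, not a statement of the standard.

```python
from typing import Optional


def classify_mask(dp_pa_cm2: float, bfe_percent: float,
                  splash_pass: bool, bioburden_cfu_g: float) -> Optional[str]:
    """Return 'I', 'II', 'IIR', or None when no type is reached."""
    if bioburden_cfu_g >= 30.0:
        return None  # microbial cleanliness requirement not met
    if bfe_percent >= 98.0 and dp_pa_cm2 < 60.0 and splash_pass:
        return "IIR"
    if bfe_percent >= 98.0 and dp_pa_cm2 < 40.0:
        return "II"
    if bfe_percent >= 95.0 and dp_pa_cm2 < 40.0:
        return "I"
    return None


# Hypothetical prototypes: (differential pressure, BFE, splash result, bioburden).
examples = [(35.0, 96.5, False, 10.0), (35.0, 98.5, False, 10.0),
            (55.0, 98.5, True, 10.0), (55.0, 98.5, False, 10.0)]
for sample in examples:
    print(sample, "->", classify_mask(*sample))
```

The last example shows why masks between 40 and 60 Pa/cm² were worth testing only as Type IIR candidates: without a passed splash test they cannot reach any type.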

Laboratory workload
The laboratory started its activity on 24th March 2020, only a few days after the publication of the ministerial decree (DPCM) on 17th March 2020 [38], and by 31st December 2020 more than 400 mask prototypes had been tested, with nearly 1000 tests performed in total. The highest workload was nearly 39 mask types per week during the period of March-April 2020. However, such a large number of tests unfortunately did not correspond to a large number of masks suitable for certification. In particular, during March and April 2020 only 6 masks out of the 193 prototypes tested in our laboratory (3.1%) satisfied the requirements of the EN standard (Fig. 6). That was a direct consequence of the lack of fundamental and technical knowledge in the fabrication of medical masks: the samples produced by reconverting any type of production line (mainly for clothes and various types of fabric, but also sanitary pads, swimming costumes or available nonwovens) were inadequate either due to their poor filtration ability or to a limited breathability (i.e., too large a pressure drop). Over time, thanks to an increasing knowledge of the main requirements for such medical devices, accompanied by the increased availability of technical filter materials specifically designed for similar applications, such as meltblown or spunbond/meltblown/spunbond (SMS) filter materials [39], the percentage of prototypes successfully passing all the tests, and thus potentially ready to enter the market, increased to 30.8% during November and December 2020. Interestingly, the number of tests carried out dramatically decreased over time, as the number of new prototypes developed by the various industries decreased. Fewer new systems were proposed, but with a more solid knowledge of how to make a working surgical mask, to the point that the success rate increased significantly (see Fig. 6).
Interestingly, the large number of prototypes tested allowed a qualitative analysis on filter material and morphology, and its production method. Woven, nonwoven, and knitted materials are the three main categories that can be identified according to the material production method. Fig. 7 reports scanning electron microscopy (SEM) analysis, illustrating the most frequent morphologies of the main types of masks tested.
In general, a cloth mask is composed of one or two layers of fabric, typically cotton (Fig. 7a-d). Conversely, most surgical face masks consist of three layers. In particular, the external and the internal layer of the mask are made of polypropylene nonwoven (PP), commonly named spunbond (Fig. 7 e, h, g, j). The middle layer is the actual filtering layer that is usually made of polypropylene as well. The nonwoven meltblown in the middle layer is composed of randomly oriented fibers with a very low diameter (Fig. 7 f and i). The filtration efficiency of meltblown nonwovens is mostly related to the low diameters of the fibers and to the electrostatic charge acquired by the fibers during the meltblown production process.
The raw material used, the filter morphology, and processing techniques are crucial for the achievement of the required performances and the optimal balance between differential pressure and filtration efficiency [40]. In filtration science a trade-off may be defined for these two quantities, as an increase in filtration efficiency often leads to poorer gas transport through the filter, and thus to an increased differential pressure (poor breathability) in this case [41,42].
In general, the capture of contaminated aerosol, i.e., the filtration process, occurs via different mechanisms, spanning from gravity sedimentation to inertial impaction, interception and diffusion, but also thanks to electrostatic attraction [43].
The woven and knitted mask prototypes tested in this work were frequently made of cotton, but in some cases also of polymeric fiber materials, such as polypropylene (PP) or polyethylene (PE). Nonwoven masks were typically produced by meltblowing or spunbonding techniques or by their combination. PP was the material prevalently used for producing meltblown filters, while SMS, a three-layer bonded material, was mainly composed of two polyethylene terephthalate (PET) spunbond external layers and a middle polybutylene terephthalate (PBT) meltblown layer.
The weight per surface area (grammage) may be considered indicative of the thickness of the mask layers or of the density of the fibers. The grammage is measured by weighing each sample and dividing its mass by the area. Interestingly, there is a certain variability in the resulting weight, to be attributed to a certain inhomogeneity in the mask samples; the calculated error is about 10% of the average value. Such extensive work evidences data reproducibility and the consistency of the analytical method for a large number of samples made of the same material. Considering only the 42 masks that satisfied the standard and were certified, 5 (12%) were made with an SMS filter, 4 of which had a grammage of 70 g/m² and 1 of 93 g/m². Of these masks, one was classified as Type I, one as Type II and the remaining ones as Type IIR. The other 37 masks (88%) were made using a meltblown filter, with an average grammage of 26.2 ± 2.5 g/m².
The main difference between the SMS material that was tested and the meltblown filters consisted in the different manufacturing techniques. A single PP meltblown layer presented a random fiber network with a narrow fiber diameter distribution, and thus an electrostatic charge was added by the corona poling method in order to increase its filtration performance. Conversely, an SMS is produced using an internal meltblown layer that showed a wider fiber diameter distribution, including fibers of submicrometric diameter that enhanced its filtration performance, with no need for any additional electrostatic charging to improve the filtration properties. However, the increased mechanical filtration efficiency was obtained at the expense of breathability, which was often lower in masks produced using SMS filters than in all the other mask types tested. The nature and the grammage of the filter are not a guarantee for the production of a compliant face mask, as shown in Fig. 8, where one can notice that a significant number of prototypes made with a meltblown or an SMS filter failed the BFE. These results indicate the importance of the production process for the filter performance. However, the use of meltblown filters typically ensures a filtration efficiency significantly higher than that of spunbonds, whose measured BFE values were always below 80%, even at very high grammage.
To better understand the difference between a high-performing surgical mask made with a meltblown and a non-compliant mask made only of spunbond layers, two masks were tested:
- Mask MB was a three-layer surgical mask, constituted by two external spunbond layers and an internal meltblown filter, and showed an average BFE of 98.5 ± 0.6%;
- Mask SB was a three-layer surgical mask, constituted by three spunbond layers, and showed an average BFE of 86.6 ± 3.5%.
For both masks, the penetration as a function of the particle size was calculated as the average number of bacteria collected in each stage of the impactor, for the 5 samples tested, divided by the average number of bacteria collected in the same stage during the positive control runs. The results, reported in Fig. 9, show that Mask SB, despite the relatively high filtration efficiency, is able to completely block only very large particles, while it is much less effective against small aerosols compared to Mask MB. On the other hand, the spunbond masks offer a much lower resistance to the airflow compared to meltblowns and SMSs, as shown in Fig. 8a. For this reason, spunbond layers are typically used as external layers:
- the internal fabric is hydrophilic and absorbs the larger respiratory droplets emitted by the wearer, shielding the internal filter and reducing the formation of condensate;
- the external fabric is hydrophobic and protects the filter from external contaminants;
- both the fabric layers provide mechanical resistance to the mask.

Fig. 7. SEM analysis of a two-layer cloth cotton mask and of a three-layer surgical mask at different magnifications. External layer of a cloth mask at 400× (a) and at 3000× (c); internal layer of a cloth mask at 400× (b) and at 3000× (d); external layer of a surgical mask at 400× (e) and at 3000× (h); middle layer of a surgical mask at 400× (f) and at 3000× (i); internal layer of a surgical mask at 400× (g) and at 3000× (j).

This statement is confirmed by Fig. 10, which shows that the highest certification likelihood (90.5%) was found for masks with three layers of filtering media.

Review of the experimental protocol and best practices
The development of the testing lab by the interdisciplinary team of researchers, with quite broad and diverse expertise, and the experience gained after testing a large number of prototypes in such short time during the pandemic outbreak, led to identify possible critical points of the test procedures and to draw some considerations on the experimental protocols.

Differential pressure
The values indicated in the EN standard as limiting values for classification, i.e., 40 and 60 Pa/cm², refer to a specific volumetric flow rate through the specified sample area, equal to 8 L/min, which corresponds to the estimated flow rate during breathing. The measurement of the pressure drop at a single flow rate may be prone to a larger experimental error than a scan over a broader range of flow rates that includes the one indicated in the EN standard. Such an approach also tends to exclude possible errors due to instrument accuracy or to the execution of the test, considering that in all cases a linear relationship between Q and ΔP was observed. In particular, the procedure developed that considers five different flow rates for each test was found

Fig. 8. Performance of the 42 compliant surgical masks and of other prototypes that passed the differential pressure test and for which the composition was known: a) Differential pressure variation with grammage, b) BFE variation with grammage. Lines are a guide to the eye.
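The multi-flow-rate approach to the differential pressure test can be illustrated with a least-squares line through the five measured points, evaluated at the reference flow rate of 8 L/min. The readings below are hypothetical, and the test area of 4.9 cm² is an assumption of this sketch (a 25 mm diameter circle), not a value stated in the text.

```python
TEST_AREA_CM2 = 4.9         # assumed test area
REFERENCE_FLOW_L_MIN = 8.0  # flow rate quoted in the text


def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y versus x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical readings: flow rate (L/min) and pressure drop (Pa).
flows = [4.0, 6.0, 8.0, 10.0, 12.0]
pressure_drops = [100.0, 150.0, 200.0, 250.0, 300.0]
slope, intercept = fit_line(flows, pressure_drops)
dp_at_reference = slope * REFERENCE_FLOW_L_MIN + intercept
dp_per_area = dp_at_reference / TEST_AREA_CM2
print(f"differential pressure: {dp_per_area:.1f} Pa/cm2")
```

Using all five points rather than the single 8 L/min reading averages out the scatter of individual measurements, provided the linear relationship between Q and ΔP holds.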

Effect of mask shape and wearability.
The standard for surgical masks (EN 14683:2019) requires only the evaluation of the pressure loss in a portion of the sample, relying on a flat and uniform configuration of the mask. However, during the practical use of the mask, the geometrical configuration is very different from the testing one. In particular, exhalation losses through the lateral surface of the mask could significantly reduce the performance of the device, thus potentially increasing the infection risk [44]. The evaluation of these losses is requested by the EN standard for semi-facial masks like PPE respirators, and the procedure requires a set-up analogous to that of the standard differential pressure tests described in this work, except for the use of a dummy head to simulate the respiratory act and of the whole mask area for testing. That evaluation should be addressed, being relevant not only for the determination of the real breathability of the mask but also for other tests needed for the certification, such as the bacterial filtration efficiency, BFE.

Bacterial Filtration Efficiency
Since the capture of the droplets and the bacteria contained depends significantly on the size distribution of the liquid particles that impact the mask, the characteristics of the nebulizer and its performances, as well as the humidity in the column, may significantly affect the efficiency measured. Hence, it would be recommended to identify the cutoff for the separation, i.e., the size of the droplet that shows a 100% BFE, often named as dp100. Such data may be conveniently accompanied by the BFE with the effectiveness of bacterial capture for smaller droplets. Furthermore, the exhalation losses are surely very significant in the determination of the real rate of capture of bacteria (and viruses).
Correlated to the droplet size, the humidity in the column may have significant effects on BFE performances. Table 2 shows the effects of the air humidity on the MPS of the droplets that reach the impactor and, in turn, on the BFE of a compliant and a non-compliant prototype. Additional information about the variation of the performance of these two masks with the MPS are reported in Fig. S4.
Excessively dry air fed to the column eventually results in the evaporation of water from the droplets, ultimately leading to appreciably smaller liquid particles. The effect of a lower MPS on the BFE was negligible for the high-performing compliant mask, while a significant decrease was observed for the non-compliant mask. Such a difference could be crucial in the evaluation of those prototypes whose BFE value is close to 95% or 98%, where a slight variation of the MPS could lead to an error in the mask classification. It is noteworthy that the EN standard requires a droplet size of about 3 μm in the generated aerosol fed to the mask specimen, but no clear prescription is provided for the section downstream, i.e., at the mask sample height, although a simple method for particle size determination is provided, mainly aimed at controlling the reproducibility of the tests.
Furthermore, the EN standard prescribes the use of modified peptone water as the base for bacterial suspension. Still, even if the range of inoculum concentration is prescribed, it becomes essential to verify that, during the positive control runs (before and after the mask is tested), the number of forming colonies remains approximately the same to rule out any possible variation of the colonies during the test. The composition of peptone water as indicated by the EN standard requires 5 g/L of peptone and 5 g/L of NaCl. In order to verify the effect of the peptone supply in the bacterial suspension, BFE control tests (without any mask in the system) were carried out using saline solution in place of the peptone water as bacterial suspension solution. As a result, the number of bacterial colonies grown on the recovery plates using the saline solution was significantly lower (50-80% lower) than the number of colonies detected using the peptone water. That result suggests the presence of peptone in the bacterial suspension is required to maintain bacterial vitality. Indeed, the production of bacteria-containing aerosol might represent a stressful condition that could be overcome by bacteria only in the presence of some nutrient (e.g. peptone) in the suspension medium. Therefore, the natural conclusion is that peptone is needed to preserve the biomass concentration over time.
Another consideration about the EN standard requirements concerns the bacterial culture used for testing. While EN 14683:2019 indicates the use of a Staphylococcus aureus strain, the use of an Escherichia coli laboratory strain (e.g., DH5α, JM109) may be recommended as an alternative. Indeed, Staphylococcus aureus can form "grape-like" clusters including more than one cell, which might influence the BFE calculation based on the droplet size. Escherichia coli, on the contrary, has a rod-shaped, single-cell morphology in suspension, potentially limiting counting errors. Another option would be the use of an Escherichia coli laboratory strain carrying an antibiotic resistance cassette (e.g., neo gene for kanamycin resistance, bla gene for ampicillin resistance), which would allow working in a non-sterile environment typically inhabited by various environmental microbial strains. In this context, the presence of an antibiotic resistance gene in the bacterial strain used in the BFE test would allow the selection of the test strain only, by using recovery plates in the impactor that are supplied with the corresponding antibiotic. That would avoid the growth of other environmental strains that would affect the colony enumeration and therefore the final BFE calculation.
In addition, the EN standard requires an average number of CFU in the positive control runs (after the application of the positive hole conversion) in the range 1700-3000 CFU. To achieve such a value, a very high number of CFU should be counted in each plate, especially that of stage no. 4, for which a typical number is in the range 375-395 CFU (an example of typical positive control runs is reported in Table S2). Since this number is close to the limit (400 CFU), the number of CFU calculated after the correction by the positive hole conversion table could be substantially different from the real value. Furthermore, such a high number of colonies tend to merge while growing and become difficult to identify well before the limit of 52 h of incubation indicated by the EN standard. Based on experimental data, a number of CFU in the range 800-1200 provides similar results in terms of measured bacterial filtration efficiency and standard deviation among the 5 samples tested, and makes the CFU counting simpler and more precise. More information about the experimental results is provided in Table S3.
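The sensitivity of the corrected count near the 400 CFU limit can be illustrated with the formula that is commonly used to generate positive hole conversion tables for 400-hole impactor stages. The formula is an assumption of this sketch and is not quoted from the EN standard or from this work; the function name is hypothetical.

```python
def positive_hole_correction(filled_holes: int, total_holes: int = 400) -> float:
    """Expected number of viable particles for a given number of positive holes.

    Assumed formula: N * (1/N + 1/(N-1) + ... + 1/(N-r+1)), with N holes
    in the stage and r holes showing a colony.
    """
    if not 0 <= filled_holes <= total_holes:
        raise ValueError("filled_holes must be between 0 and total_holes")
    return total_holes * sum(1.0 / (total_holes - k) for k in range(filled_holes))


# Corrected counts across the range quoted for stage no. 4.
for raw in (375, 390, 395):
    print(raw, "->", round(positive_hole_correction(raw)))
```

Under this assumption, raw counts of 375 and 395 colonies map to corrected values of roughly 1100 and 1700, so a miscount of a few colonies close to the limit shifts the corrected value by hundreds of CFU. This is consistent with the observation that lower plate counts make the enumeration more robust.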
Last, the count of bacterial colonies (CFU) may become challenging if done by visual inspection. For this reason, an apparatus consisting of a camera (GoPro HERO4, San Mateo, USA) and a free, open-source software for digital image analysis (OpenCFU, Q. Geissmann) was used in the present work. Such an instrument allowed the tests to be performed more efficiently and in less time, eventually increasing the accuracy of the results. Thus, an update of the EN standard toward an automatic CFU count using digital image analysis is highly advisable.

Splash test
The description of the test method, and of the criteria indicated in ISO 22609:2004, are rather qualitative in nature. That may potentially undermine the reproducibility of splash test between different laboratories. In particular, more stringent standards for the synthetic blood formulation should be provided in order to achieve similar fluid properties used in the test. The recipe used in this experimental campaign is carefully described in the Materials and Methods section.
The pass/fail procedure is also intrinsically qualitative. The mask is considered adequate as Type IIR if, after the blood spurt on the test area, no trace of fluid permeation can be detected on the internal side of the mask. In several cases, we observed that a very small amount of blood actually reached the internal side; it was hard to detect with the naked eye, and it was necessary to use a powder, such as talcum, to confirm this penetration (Fig. S3). Such an observation leads to the conclusion that the evaluation of the result is highly dependent on the carefulness of the operator and is thus highly subject to human error. Therefore, an evaluation through digital image analysis should be encouraged.

Comparison with other works
The production of homemade masks and the use of cloth masks that was useful when approved masks were not available to the population is investigated by several authors [13,19,35,37]. While a selection of the best materials could be a useful indication [13] in spring 2020, the use of homemade masks must be discouraged. The study of Whiley et al. supports the use of fabric masks for community protection, still the levels of viral filtration efficiency (VFE) hardly reach those of surgical masks, so some kind of mitigation could be achieved but these devices should not be used as personal protection devices [19]. Now that either good disposable surgical masks or PPE, such as N95 or FFP2, are broadly available, the use of fabric masks should be avoided, as confirmed by the results obtained in this study, at least in case of good availability of certified masks. As a matter of fact, a non-adequate bacterial filtration even if not dramatically lower than the thresholds indicated by the standard norm, would imply a significantly larger spread of bacteria and viruses (e.g. a BFE of about 90% means a permeability of the pathogens at least doubled with respect to a Type I certified mask and a 5-fold larger than in the case of a Type II mask). Some fabric masks were reported to be effective in blocking large droplets, while they are all not capable to capture smaller particles, in which large number of pathogens are often present. Such feature is also supported by the tests above reported in Table 2, which shows how the BFE values of a non-compliant fabric mask are strongly dependent on the droplets size. It is worthy to point out, however, that the efficiency of surgical masks as a protection against COVID-19 may be limited by the lateral losses due to the mask geometry that could compromise the mask efficiency [45].
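The arithmetic behind the comparison of a 90% BFE with the Type I and Type II thresholds can be made explicit by looking at the penetrating fraction (100% minus the BFE). The short sketch below only restates the thresholds quoted in the text; the function name is chosen here for illustration.

```python
def penetration_percent(bfe_percent: float) -> float:
    """Fraction of the challenge aerosol (in %) that passes through the mask."""
    return 100.0 - bfe_percent


fabric = penetration_percent(90.0)   # example value quoted in the text
type_i = penetration_percent(95.0)   # Type I threshold
type_ii = penetration_percent(98.0)  # Type II threshold
ratio_vs_type_i = fabric / type_i
ratio_vs_type_ii = fabric / type_ii
print(ratio_vs_type_i, ratio_vs_type_ii)
```

A difference of a few percentage points in BFE therefore corresponds to a two-fold to five-fold difference in the amount of pathogens passing through the mask.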
An automatic method for CFU counting was also implemented by Pourchez et al. [9], confirming the need for a more reliable, operator-independent method for counting the bacterial colonies in the Petri dishes. We agree with their suggestion that a revision of the visual counting procedure reported in the standard EN 14683:2019 is needed. In the same work, Pourchez et al. compared different ways of measuring the droplet size; however, these methods are all indirect techniques that rely on the use of the six-stage Andersen impactor. In our opinion, the aerosol droplet separation efficiency should be evaluated experimentally by measuring, with a laser instrument, the droplet size distribution and the droplet concentration both upstream and downstream of the tested medical face mask, avoiding the use of the six-stage Andersen impactor.
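A minimal sketch of how such a direct measurement could be turned into a separation efficiency is given below. It assumes that the laser instrument returns droplet number concentrations per size bin upstream and downstream of the mask; the size bins and concentration values are invented for illustration and are not measurements from this work.

```python
def size_resolved_efficiency(upstream, downstream):
    """Separation efficiency (%) per size bin from upstream/downstream concentrations."""
    if len(upstream) != len(downstream):
        raise ValueError("upstream and downstream must have the same number of bins")
    efficiencies = []
    for c_up, c_down in zip(upstream, downstream):
        if c_up <= 0:
            efficiencies.append(None)  # no droplets in this bin: efficiency undefined
        else:
            efficiencies.append(100.0 * (1.0 - c_down / c_up))
    return efficiencies


def overall_efficiency(upstream, downstream):
    """Overall number-based efficiency (%) over all size bins."""
    total_up = sum(upstream)
    if total_up <= 0:
        raise ValueError("no droplets measured upstream")
    return 100.0 * (1.0 - sum(downstream) / total_up)


# Hypothetical size bins (micrometres) and concentrations (droplets per cm^3)
bin_edges_um = [(0.5, 1.0), (1.0, 2.0), (2.0, 4.0), (4.0, 8.0)]
upstream = [400.0, 300.0, 200.0, 100.0]
downstream = [40.0, 15.0, 4.0, 1.0]

for (low, high), eff in zip(bin_edges_um, size_resolved_efficiency(upstream, downstream)):
    print(f"{low}-{high} um: {eff:.1f}%")
print(f"overall: {overall_efficiency(upstream, downstream):.1f}%")
```

Reporting the efficiency per size bin makes explicit how the result shifts with droplet size, which a single overall value hides.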
A study collecting the results of tests performed by several Italian laboratories confirmed the diversity in the performance of the masks produced in (or imported into) Italy in the first pandemic wave. That diversity reflects not only a mask material problem, but also the different equipment, setup and experience of the different laboratories, which tested a limited number of mask specimens [20]. The results plotted as a function of the number of layers of the mask are of great importance for the production of suitable masks, but the dispersion of the results is too large to obtain a clear correlation. In our case, the large number of tested masks made it possible to identify that masks produced with three layers had the highest probability of complying with the EN 14683:2019 requirements. In general, the results obtained by the large Italian research group are consistent with those reported in this paper. However, a higher number of tested specimens gives a more general view of the relationship between the results. This can be observed in the data reported in Fig. 11, where all the results obtained from testing the surgical mask prototypes are plotted in the same chart, with the standard deviation reported for each test performed. These deviations have to be ascribed to the large differences between the prototypes, in terms of both BFE and differential pressure, and not to measurement error. The data included refer to the specimens for which we carried out both of the main tests included in the EN standard. Since a general trade-off between differential pressure and filtration efficiency is to be expected, we looked for a general trend. As can be seen in the plot, high breathability (i.e. a very small pressure drop across the filter) is typically associated with poor BFE performance, whereas high BFE is obtained at the cost of a dramatic loss in breathability. However, due to the very broad spectrum of prototypes tested, in terms of materials, morphologies and filtration mechanisms, a clear correlation cannot be detected.

Conclusions
This work reported the effort of a multidisciplinary team of the University of Bologna that led to the creation of a dedicated laboratory to test surgical masks according to European regulations during the first COVID-19 outbreak in early spring 2020, at the peak of the health emergency. Four test lines were set up very rapidly according to the EN 14683:2019 standard, covering breathability, bacterial filtration efficiency (BFE), bioburden, and the splash test (for Type IIR masks), with the aim of supporting the reconversion of many companies to the production of such protective devices and of helping to face the rising pandemic. In the first nine months of activity, 435 surgical mask prototypes and several materials for mask manufacturing were tested, with nearly 1200 tests performed in total. The test procedures and methodologies adopted helped to identify the questionable aspects of surgical mask characterization and possible ambiguities in the standard.
The results obtained on such a large number of mask prototypes, with a wide variability of materials and morphologies, made it possible to identify general trends and correlations with the resulting performance, thus providing useful indications for the fabrication of an effective surgical mask. In particular, three-layer systems proved to be the most suitable solution for this application, being the ones with the highest success rate. Three-layer face masks are composed of an intermediate filtration layer and two external layers. As for the filtration layer, meltblown and SMS fabrics are the only commercially available materials for the production of surgical masks that are compliant in terms of bacterial filtration efficiency and breathability. Other materials (such as spunbond or cotton) proved to be inappropriate, especially in terms of filtration efficiency for very small respiratory droplets and aerosols. However, not all meltblown and SMS fabrics are equally efficient, especially the former, whose filtration efficiency also depends on the electrostatic charge of the fibers. As for the external layers, several materials can be used; the most common are polypropylene spunbond nonwoven fabrics. The outer layer is typically hydrophobic, while the inner layer can be coupled with other materials, such as cotton, to absorb large respiratory droplets and to increase comfort for the wearer.
Furthermore, the critical aspects of the standard tests prescribed by the EN standard were identified, pointing out in particular that the control of the aerosol droplet size during the bacterial filtration test is crucial in the evaluation of the BFE, and that it can dramatically affect the classification of a mask when its performance lies near the limiting values of 95 or 98%.