Design of a Low-Cost Configurable Acoustic Sensor for the Rapid Development of Sound Recognition Applications

Abstract: Concerned about noise pollution in urban environments, the European Commission (EC) created the Environmental Noise Directive 2002/49/EC (END), which requires Member States to publish noise maps and noise management plans every five years for cities with a high density of inhabitants, major roads, railways and airports. The END also requires the noise pressure levels of these sources to be presented independently. Currently, the measurements and the representation of the noise pressure levels in such maps are performed semi-manually by experts. This process is time-consuming and costly, and it yields only a static picture of the noise levels. To overcome these issues, we propose the deployment of Wireless Acoustic Sensor Networks (WASNs) with several nodes in urban environments, which can enable the generation of real-time noise level maps and detect the source of each sound by means of machine learning algorithms. In this paper, we briefly review the state of the art of the hardware used in wireless acoustic applications and propose a low-cost sensor based on an ARM Cortex-A microprocessor. This node is able to run machine learning algorithms for sound source detection in situ, allowing the deployment of highly scalable sound identification systems.


Introduction
The number of people living in urban areas has been greater than in rural areas since 2010, with around 50.5% of the world's population residing in towns or cities [1]. Moreover, according to the United Nations (UN), this tendency is expected to increase in the next four decades [1,2]. In order to cope with this growth, the authorities of major cities envision an evolution towards smart cities or smart regions [3,4], with a focus on improving the quality of life of urban inhabitants. Doing so requires significant changes in governance, decision-making and the development of action plans [5].
The smart city concept encompasses several quality of life and health indicators, among them the availability of digital services and acoustic pollution [6]. The latter is justified by large-scale studies in Europe that revealed severe adverse effects on the health and life expectancy of the inhabitants of acoustically polluted environments [7,8]. In order to address these issues, the European Commission has created the Environmental Noise Directive 2002/49/EC (END) [9], in force since December 2018, and the Common Noise Assessment Methods in Europe (CNOSSOS-EU) [10]. They define, respectively, the obligations of Member States regarding the management of urban noise and the common methods that they are expected to follow. In particular, the END requires Member [...] sensor's sampling frequency and data precision, of processing algorithms, and the choice of whether to receive samples of audio files with variable length. This paper is structured as follows: Section 2 briefly describes the context of this development and the state of the art of acoustic urban sensing projects. Section 3 explains the proposed architecture for a Smart WASN and lists the system requirements. Section 4 describes the elements that make up the sensor. Finally, Section 5 discusses the implications of our design and concludes the paper.

Background and Related Work
In this section, we describe representative approaches that automatically measure the noise levels of cities in order to create noise maps [28], and we also review their platform design and hardware. In the first subsection, we review noise monitoring projects that do not take into account the identification of the sound source. These projects are relevant because most of them use low-cost sensor networks, close to the work we present in this paper. The second subsection focuses on sensor networks that do identify the source of the sound, but that in most cases require complex hardware in the sensor nodes, or use the cloud or dedicated nodes to run the signal processing and machine learning algorithms. Our goal in this work is to develop a sensor that is, at the same time, (i) low cost, (ii) capable of running machine learning algorithms in situ for real-time operation within each sensor, and (iii) configurable to allow the rapid development and evolution of sound monitoring solutions.

Urban Noise Monitoring
Several noise monitoring projects have been developed in Europe since the publication of the END [29]. Their goal is usually to monitor city noise automatically in order to generate the noise maps that were originally produced with the support of technicians, hence lowering their cost. Some aim to collect this information to improve urban planning, while others are concerned with noise monitoring and its effects on people, but almost none of them focus on identifying the source of the sound. Instead, they measure the equivalent level LAeq collected at each location over a certain time period. In addition, most solutions require a rapid adaptation to the place where the nodes will be deployed.
Here we describe the most representative approaches found in the literature.

Urban Noise Monitoring with Noise Meters
These projects are based on short-term measurement periods using expensive noise meters, normally operated by experienced and costly personnel. These devices sample audio with high accuracy, but their price is prohibitive for deploying a sensor network with several nodes.
Traffic noise simulations of 100 roads have been performed in Xiamen City (China) using a neural network [30]. This noise monitoring system is divided into data collection, transmission, and processing. Each noise meter collects the data, Zigbee and General Packet Radio Service (GPRS) are used as communication technologies, and finally, the computation-intensive data processing takes place in a remote data center, where the neural network is trained, tested and used. In this project, the measurements were carried out in only a few selected places and, thanks to the neural network, those results have been extended through simulations to 100 roads. Our proposal, on the other hand, aims to enable low-cost massive WASNs, where the processing algorithm runs in the nodes.
Additionally, low-density roads have been studied by Dekoninck et al. [31] using mobile and fixed sensing platforms that collected noise data, among other measurements. In this solution, the mobile measurements are carried out by sensors installed on bicycles [32]. Mobile acoustic sensors have also been used for soundscape identification [33], where geo-information is derived from audio data to determine the position of a mobile device in motion. Those projects take into account both the LAeq values and the location of the measurements. The hardware used in Dekoninck et al. [31] is based on the Svantek 959 noise meter. In Fišer et al. [33], lower-cost custom hardware is used. However, both solutions are more expensive than our proposal, making them prohibitively expensive to deploy across networks with a massive number of nodes. Moreover, the mobile systems are also limited by battery life, since the measurement nodes are mobile.

WASNs for Urban Noise Monitoring
These projects use a low-capacity single board with a low-cost sound card and microphones. These solutions have the lowest cost among the noise measurement projects. However, they are limited to data acquisition, simple equivalent level calculations over a certain time span (such as the computation of the LAeq), and/or transmission of the results to a remote server, due to their low processing capabilities.
Filipponi et al. [34] evaluated which wireless protocol is the most suitable for communication between the nodes in noise monitoring applications in terms of power consumption. In this work, the power consumption is measured on the wireless devices, namely the Tmote Sky [35] (a prototyping platform with a cost of 77 €) and the Tmote Invent (a platform from Moteiv Corp). Tmote Sky and Tmote Invent are ultra-low-power wireless modules for use in sensor networks, monitoring applications, and rapid application prototyping. They are based on a low-power 8 MHz Texas Instruments MSP430 microcontroller, provided with several peripherals such as ADCs, DACs and sensors. These modules are suitable for deploying a WASN that senses and transmits information, but their processing capacity is low; hence, they can neither run a sound recognition algorithm in situ nor transmit the raw audio data of a WASN with a high number of sensors.
The IDEA (Intelligent Distributed Environmental Assessment) project presents a WASN based on cheap consumer electronic nodes and a high-performance server. More concretely, Botteldooren et al. [36] depict a new wireless infrastructure for sound recognition networks in urban applications. Within the same project context, Domínguez et al. [37] present an active self-testing system to validate the microphones of the nodes, which have a cost of 50 €. Finally, Bell and Galatioto [38] describe a WASN of 50 nodes, where the LAeq is obtained in the node and transmitted to the remote server to be further processed. The hardware used is a custom mote, which is a low-cost sensor based on a simple microcontroller for low-capacity processing.

WASNs with Sound Source Detection
This section summarizes the main hardware platforms in the scientific literature that include sound source detection. They aim to evaluate the equivalent level Leq, with some sensors also incorporating machine learning algorithms for filtering the results (while others run such algorithms in the cloud). Some of these projects also require a high-cost sensor design to compute the required algorithms. Our proposal focuses not only on the real-time evaluation of acoustic features, but also on the identification of the sound source locally within a low-cost flexible sensor.
Paulo et al. [39] built a low-cost monitoring system to analyze the urban sound environment. This system identifies and detects events, as well as the direction of the source of these sounds. The system is based on the FIWARE platform, which is an open initiative for creating an environment for the wave of digitization brought by recent internet technologies. The main processing unit used in this work is the Intel Edison embedded system [40], which costs around 90 € and requires a four-microphone structure. The FI-Sonic project [41] has also processed and analyzed audio signals using a WASN based on the FIWARE platform. This project aims to deploy low-cost technologies to capture sound with Class I microphones and the direction of the source with accelerometers, interconnecting the WASN with other stakeholders in the Smart City. This system seeks to process the audio in situ and transmit the results remotely to the cloud, but the cost of the Edison platform is much higher than that of our proposal, as will be explained in Section 4.6: the microprocessor platform alone costs about 90 €, and other expensive components must be added, such as the microphone, 3G module, power supply unit, etc.
The SONYC (Sounds Of New York City) [42] project monitors and describes the urban acoustic environment of New York City. It implements a smart, low-cost, static acoustic sensing network based on consumer hardware (e.g., mini-PC devices and Microelectromechanical Systems (MEMS) microphones), working at a sampling frequency of 44.1 kHz with 16-bit audio data. More concretely, the core of its platform [43] is the single-board Tronsmart MK908ii mini PC, which was priced at 50 USD in August 2015. With a 1.6 GHz quad-core processor, 2 GB of RAM, 8 GB of flash storage, USB I/O, and Wi-Fi connectivity, this platform could allow complex digital signal processing on the devices, alleviating the need to transmit large amounts of audio data for centralized processing. These acoustic sensor nodes can be deployed in varied urban locations for long periods of time in order to collect longitudinal urban acoustic data for changing policies and developing action plans. In the same project, Bello et al. [23] developed a representative dataset with data gathered from the 56 sensors deployed in different neighbourhoods of New York, identifying the ten common urban sound sources that were highly frequent in areas with urban noise complaints. Nevertheless, in the SONYC project, the sensors do not identify the source of the sounds in real time and in situ; instead, this work is conducted off-line on the central servers of the project. The algorithms are trained using the UrbanSound dataset, which was created by artificially mixing events from Freesound with the background noise collected in the project [44]. The use of consumer platforms reduces the time and cost of development, but constrains the connectivity of subsystems such as sound acquisition (microphones) or GPRS and Wi-Fi communications to USB, which increases the total cost of the platform, possibly making it prohibitive when the number of sensors in the network is very high.
Achieving a good trade-off between cost and accuracy is also the core idea of the WASN designed in the DYNAMAP project [21,45], which identifies the sound sources in the sensor by means of the ANED algorithm [26]. However, it only distinguishes between road traffic noise and any other Anomalous Noise Events (ANEs). This project deployed pilot WASNs in the Italian cities of Rome [25] and Milan [24] to evaluate the noise impact of road infrastructures in suburban and urban areas, respectively. The WASNs deployed in DYNAMAP are hybrid, combining two types of sensors [45]: (i) high-capacity ARM-based sensors that use signal processing techniques to analyze and process the acoustic signals within the nodes, and (ii) low-capacity microcontroller (µC)-based sensors, which, despite having fewer computational capabilities, maximize the coverage of the network, as their solar panels allow for greater flexibility in terms of sensor positioning. Our approach differs in that it proposes low-cost sensor nodes with enough computational capacity to run improved machine learning algorithms for a more accurate detection of the sound source in each situation under test. While the low-capacity µC is the lowest-cost solution, it is not able to run the sound recognition algorithms used in our solution. On the other hand, while the high-capacity sensors in DYNAMAP are able to run the sound recognition algorithm at a similar cost, the time to market and the cost of developing the custom platform are much higher than in our proposal, which builds the sensor from off-the-shelf subsystems and, in terms of hardware, simply connects a Raspberry Pi with peripherals. Developing custom hardware requires a huge effort in terms of human resources and time to design the schematics and the printed circuit boards (PCBs) and to test and validate the system, which increases the time to market and the cost.

System Description
In this section, we define the architecture for a smart WASN; that is, a WASN capable of identifying the source of noise events and, consequently, providing a more accurate measurement of the different noise types. In order to make the WASN scalable, we designed it as a distributed intelligent system in which smart acoustic sensors (i) capture the audio, (ii) process the audio frames to obtain the label of the noise event using machine learning algorithms, and (iii) send this information to a remote server. The transmitted data are the acoustic pressure level (Leq), the label of the acoustic event, and its timestamp. Finally, the output of this processing is plotted with Sentilo, an IoT platform that we use for representing the audio information at the different locations.

Description of the WASN Architecture
The aforementioned system is made up of two elements, shown in Figure 1: (i) the wireless acoustic sensors and (ii) the remote server. The sensors run the SmartSound technology, which is a set of data-intensive algorithms for processing audio frames and for classifying noise events (see Section 3.2). The remote server stores the Leq and the output of these processing algorithms, and it allows the remote configuration of the sensors through the SmartSense platform, which is a research tool for the rapid development of sensor-based solutions (see Section 3.3).
Distributing the intelligence for noise source identification among the sensors allows us to deploy a WASN with any number of nodes. Moreover, it lowers the requirements on the wireless network and enhances privacy, because the system only sends the label and the Leq of the recorded audio frame every second instead of sending the raw acoustic data at 48 ksps. This architecture, therefore, requires the smart acoustic sensors to have sufficient computing capability to process all this information. In order to evaluate the trade-off between performance and cost when designing such a sensor, we carried out a comparison of the computing platforms available for embedded systems. This comparison is detailed in Section 4.

Figure 1. Elements of the Wireless Acoustic Sensor Network (WASN) architecture to capture the audio, process it and send it to a remote server for monitoring the soundscape of an urban area.
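The reduction in network load obtained by sending only results can be illustrated with a rough calculation. The sketch below compares streaming raw audio at 48 ksps with sending one small message per second. The 16-bit sample width is one of the precisions supported by the sensor, whereas the 64-byte message size is purely an assumption made for illustration.

```python
# Rough uplink comparison: raw audio streaming vs. one result message per second.
# MESSAGE_BYTES is a hypothetical payload size, not a figure from this paper.

SAMPLE_RATE_SPS = 48_000   # raw sampling rate (48 ksps)
BITS_PER_SAMPLE = 16       # one of the supported precisions (16 or 32 bits)
MESSAGE_BYTES = 64         # assumed size of a label + Leq + timestamp message
MESSAGES_PER_SECOND = 1    # one result per second


def raw_bitrate_bps(sample_rate: int, bits_per_sample: int) -> int:
    """Bit rate needed to stream uncompressed mono audio."""
    return sample_rate * bits_per_sample


def message_bitrate_bps(message_bytes: int, messages_per_second: int) -> int:
    """Bit rate needed to send only the processed results."""
    return message_bytes * 8 * messages_per_second


raw_bps = raw_bitrate_bps(SAMPLE_RATE_SPS, BITS_PER_SAMPLE)
msg_bps = message_bitrate_bps(MESSAGE_BYTES, MESSAGES_PER_SECOND)
reduction = raw_bps / msg_bps

print(f"raw audio: {raw_bps / 1000:.0f} kbit/s")
print(f"results:   {msg_bps} bit/s")
print(f"reduction: {reduction:.0f}x")
```

Under these assumptions, the per-node uplink drops by about three orders of magnitude, which is what makes a network with many nodes practical over shared wireless links.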

The SmartSound Technology
The SmartSound technology is a sound recognition system that listens to an audio stream and uses signal processing and machine learning algorithms to identify, for each acoustic event, its type and absolute measurements. Figure 2 represents how SmartSound can be embedded in sensors at strategic points within the cities and send information to servers for various purposes.
The technology is based on supervised learning and consists of two main processes, as shown in Figure 3: (i) signal feature extraction and (ii) noise event identification. The signal feature extraction process obtains a feature set representing the acoustic characteristics of the noise signal. A feature vector is computed for each frame, thus obtaining a compact representation of the signal. Then, a supervised learning system is trained with multiple samples of noise events recorded in the real environment. As a result of this training process, the system is capable of distinguishing between different types of Anomalous Noise Events (ANEs), and is thus able to label new incoming noise samples as belonging to different noise sources. Examples of ANEs are the sounds of people, cars, motorcycles, bells, etc.
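The two processes described above can be sketched in a few lines. The example below is only an illustration of the general idea (frame-wise feature extraction followed by a supervised classifier): the two features (frame level and zero-crossing rate), the nearest-centroid classifier, the synthetic tones and the labels are all assumptions made here, and they are not the features or the model used by SmartSound.

```python
import math
from collections import defaultdict

SAMPLE_RATE_HZ = 16_000   # illustrative value for the synthetic signals
FRAME_LEN = 512           # illustrative frame length in samples


def tone(freq_hz, n=FRAME_LEN, rate=SAMPLE_RATE_HZ):
    """Synthetic test frame: a pure tone standing in for a recorded event."""
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]


def frame_features(frame):
    """Feature extraction: (RMS level in dB, zero-crossing rate) for one frame."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    level_db = 20 * math.log10(max(rms, 1e-12))
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return (level_db, crossings / (len(frame) - 1))


def train_centroids(labelled_frames):
    """Supervised training: average the feature vectors of each labelled class."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for frame, label in labelled_frames:
        level_db, zcr = frame_features(frame)
        acc = sums[label]
        acc[0] += level_db
        acc[1] += zcr
        acc[2] += 1
    return {label: (s[0] / s[2], s[1] / s[2]) for label, s in sums.items()}


def classify(frame, centroids):
    """Identification: return the label of the closest class centroid."""
    feats = frame_features(frame)
    return min(centroids, key=lambda label: math.dist(feats, centroids[label]))


# Hypothetical labelled training data (two classes of synthetic tones).
training_set = [
    (tone(200), "low_tone"), (tone(250), "low_tone"),
    (tone(3000), "high_tone"), (tone(3200), "high_tone"),
]
centroids = train_centroids(training_set)

print(classify(tone(220), centroids))
print(classify(tone(2900), centroids))
```

A real deployment would use richer spectral features and a stronger classifier, but the structure (features per frame, training on labelled events, labelling of new frames) stays the same.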

The SmartSense Platform
The SmartSense platform is a research and development tool used by the Research Group on Media Technology (GTM) of La Salle, University Ramon Llull. The platform allows the rapid development of proofs-of-concept of the SmartSound technology. It enables the GTM researchers to develop the core of distinct applications by combining pre-programmed signal processing and machine learning algorithms and to deploy them in remote sensors. While these applications are intended to carry out the processing entirely within the sensor (for privacy and bandwidth reasons), during their development, researchers may choose to receive samples of audio data of varied sizes when certain ANEs are detected to double check the classification of the noise events. Finally, the researchers can use the platform to remotely configure several options on the sensor, as explained in Section 3.4.

Sensor Requirements
To handle all the features described above, the smart acoustic sensor must fulfill the following requirements:

• Price: Accurate acoustic sensors based on noise meters, commonly found in WASNs, have long cost thousands of euros (see the class/type 1 and 2 microphones in Section 4.2). Such high costs limit the number of nodes and, consequently, the coverage of the network that city administrators can deploy. We envision smart WASNs deployed permanently across the entire city, as well as temporarily at points of interest, by both the public administration and private businesses. Therefore, low cost is an essential requirement of our sensor.

• Processing capacity: The sensor should be able to run the SmartSound technology to identify the source of noise events by processing an audio stream in real time.

• Storage capacity: The storage requirement is negligible because, for privacy reasons, the audio is deleted as soon as it has been processed and the label has been obtained.

• Microphone: The sensor should be able to analyse the raw audio signal and identify the sources of noise that are audible to the human ear. Therefore, it should ideally support an operating frequency range of 20 Hz to 20 kHz.

• Power supply: We assume that the services to be developed by our research group based on WASNs are not critical and that the sensors will be located in strategic places with an Alternating Current (AC) power supply from the city, such as light posts and building facades. Therefore, this first version of the sensor does not require a battery as a power backup system.

• Wireless communications: The sensor should be able to send the results of the processing to a server and to an IoT platform for visualization, as well as allow the on-the-fly configuration of parameters and the replacement of processing algorithms. To this end, both Wi-Fi and 3G should be supported.

• Outdoor exposure: The device has to operate outdoors for long periods of time, during which it will be exposed to wind and rain. Therefore, it requires effective protection against adverse atmospheric conditions.

• Re-configurable: The researchers can configure the sampling frequency (up to 48 kHz) and the data precision of the sensor (16 or 32 bits per sample), replace processing algorithms on the fly, and request audio samples of varied sizes when ANEs of interest are detected. These configurations will be adjusted to develop and test the proof-of-concept applications.
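The re-configurable options listed above could be represented and validated as in the following minimal sketch. Only the limits stated in the requirements (a sampling frequency of up to 48 kHz and 16 or 32 bits per sample) come from the text; the class name, field names, default values and the remaining checks are hypothetical.

```python
from dataclasses import dataclass

MAX_SAMPLE_RATE_HZ = 48_000     # upper limit stated in the requirements
ALLOWED_BITS = (16, 32)         # supported bits per sample
ALLOWED_LINKS = ("wifi", "3g")  # supported wireless links


@dataclass(frozen=True)
class SensorConfig:
    """Hypothetical container for the remotely configurable options."""
    sample_rate_hz: int = 48_000
    bits_per_sample: int = 16
    clip_seconds: float = 1.0   # assumed: length of the audio sample sent on request
    link: str = "wifi"

    def __post_init__(self) -> None:
        # Reject values outside the ranges the sensor is meant to support.
        if not 0 < self.sample_rate_hz <= MAX_SAMPLE_RATE_HZ:
            raise ValueError(f"sample rate must be in (0, {MAX_SAMPLE_RATE_HZ}] Hz")
        if self.bits_per_sample not in ALLOWED_BITS:
            raise ValueError(f"bits per sample must be one of {ALLOWED_BITS}")
        if self.clip_seconds <= 0:
            raise ValueError("clip length must be positive")
        if self.link not in ALLOWED_LINKS:
            raise ValueError(f"link must be one of {ALLOWED_LINKS}")


config = SensorConfig(sample_rate_hz=32_000, bits_per_sample=32, link="3g")
print(config)
```

Validating a configuration before applying it means that a mistyped remote update fails immediately instead of silently changing the acquisition settings of a deployed node.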

Smart Acoustic Sensor Design
This section describes the elements that make up the smart acoustic sensor. We report the hardware components used, and finally, we also briefly describe the main characteristics of the developed software.

Computing Platform
Nowadays, an embedded system's core can be based on microcontrollers, microprocessors, Field Programmable Gate Arrays (FPGAs), Digital Signal Processors (DSPs) and Application-Specific Integrated Circuits (ASICs) [46] or even High Performance Computing (HPC) devices, such as GPUs.
Embedded systems are mostly based on the ARM architecture, which is offered in three processor families: (i) Cortex-A, application processor cores for performance-intensive systems; (ii) Cortex-R, high-performance cores for real-time applications; and (iii) Cortex-M, microcontroller cores for a wide range of embedded applications. Table 1 shows some computing platforms used in the literature, some of which are presented in Section 2. We can observe that several different architectures have been used, such as a proprietary 8- or 16-bit Reduced Instruction Set Computer (RISC), an ARM Cortex-M, an ARM Cortex-A and a 32-bit x86, depending on the final application or system requirements. An Operating System (OS) such as Linux makes it much easier to implement the aforementioned requirements. For this reason, we have chosen a low-cost embedded platform that allows us to run a Linux distribution with all the drivers required to control the hardware mentioned above.
A Raspberry Pi 3, equipped with a 1.2 GHz 64-bit quad-core ARM Cortex-A53 CPU, has been chosen because it is a commonly used platform for low-cost applications and costs approximately 30 €. It has been increasingly used in recent years by researchers, universities and amateur engineers, and the community behind it creates and documents device drivers, tutorials, example applications, etc. Moreover, the Raspberry Pi 3 matches our requirements because it ships with Bluetooth Low Energy (BLE) and Wi-Fi modules for wireless communications.

Microphone
The acquisition circuit and the microphone compose the acoustic data acquisition subsystem. Usually, the selection criteria for microphones are a wide frequency range, a flat frequency response and high sensitivity. In fact, IEC 61672 defines two classes of sound level meters according to the tightness of the error tolerances, where class 1 microphones have a wider frequency range with less error tolerance than class 2 (see Table 2). This applies to the measurement instruments and also to the calibrator. However, we also have to take into consideration the trade-off between performance and cost, since keeping the cost low is one of our main requirements. Table 3 shows some acoustic acquisition systems found in the literature. This table contains low-cost microphone capsules, high-performance microphone capsules, embedded microphones with pre-amplifiers, and embedded microphones with pre-amplifiers and an outdoor kit, respectively. We have developed the acoustic acquisition system based on the Adafruit I2S MEMS Microphone Breakout (SPH0645LM4H) measuring microphone. This device exhibits a good trade-off between performance and price, presenting a flat frequency response, as we can see in Figure 4. This microphone costs 6.95 € and has a bandwidth between 50 Hz and 16 kHz. Although the bandwidth is slightly narrower than our initial ambitions (between 20 Hz and 20 kHz), this sensor is very affordable, and the range up to 16 kHz includes most of the regular audible sounds that we are interested in. During the study, we relaxed the requirements of the sensor for two reasons: on the one hand, it made the sensor cheaper; on the other hand, the sound pressure level is usually measured in octave bands in the range between 31.5 Hz and 16,000 Hz, and the tails of this range are highly attenuated by the A-weighting filter.
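The statement that the tails of the 31.5 Hz to 16 kHz range are strongly attenuated by A-weighting can be checked numerically. The sketch below evaluates the standard closed-form A-weighting magnitude response (the expression associated with IEC 61672, normalized to approximately 0 dB at 1 kHz) at the nominal octave-band centre frequencies. The function name and the list of bands are chosen here for illustration.

```python
import math

# Nominal octave-band centre frequencies between 31.5 Hz and 16 kHz.
OCTAVE_BANDS_HZ = [31.5, 63, 125, 250, 500, 1000, 2000, 4000, 8000, 16000]


def a_weighting_db(freq_hz: float) -> float:
    """A-weighting gain in dB at freq_hz (standard analog magnitude response)."""
    f2 = freq_hz ** 2
    ra = (12194.0 ** 2 * f2 ** 2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    # The +2.00 dB term normalizes the curve to about 0 dB at 1 kHz.
    return 20 * math.log10(ra) + 2.00


for band in OCTAVE_BANDS_HZ:
    print(f"{band:>8} Hz  {a_weighting_db(band):7.1f} dB")
```

The low end of the range is attenuated by close to 40 dB and the top band by several dB, so a microphone whose response starts to roll off outside 50 Hz to 16 kHz has a limited effect on A-weighted levels.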
It should be noted that the sensor does not meet the class 1 and class 2 tolerances for frequencies below 50 Hz, because the frequency response has a gain lower than −5 dB instead of −3 dB. Nevertheless, the gain provided by the manufacturer shows that the requirements for class 1 and class 2 are fulfilled in the frequency range between 60 Hz and 16,000 Hz (see Table 2 and Figure 4). On the other hand, the microphone selected for the design of the sensor (Adafruit I2S MEMS Microphone Breakout) exhibits a lower sensitivity (−42 dB) than class 1 or class 2 microphones, which are in the range between −26 dB and −28 dB, as we can observe in Table 3. This means that sounds with a low sound pressure level may not be detected. However, the aim of this platform is to measure sounds with a relevant sound pressure level and to detect the source of those undesirable noises.
Finally, the signal will be captured, processed and transmitted remotely by the Raspberry Pi 3 platform. Currently, we are working on the adaptation of the microphone to outdoor environments, because the commercial solutions based on sound level meters have a prohibitive price for our application. To illustrate this, complete outdoor microphones are listed in Table 3 and outdoor kits (microphones not included) in Table 4.

Power Supply
The device will be powered by connecting it to the AC mains of the city, a common practice when deploying urban devices [52]. In order to power all the electronic components inside it, an Alternating Current to Direct Current (AC-DC) converter is used: the TT Electronics-IoT Solutions SGS-15-5. This Power Supply Unit (PSU) offers a single fixed-voltage output of 3 A at 5 V, for a total power of 15 W. These ratings fit our application, since the two components that require the most power are the Microprocessor Unit (MPU), with 2.5 A at 5 V, and the 3G module, which can be powered at 5 V and has a maximum power consumption of 2 W during transmission. These power requirements are shown in Table 5. Moreover, this PSU has a small footprint (2.30 × 3.34 inches), which makes it suitable for integration in the box and leaves enough room for the other components.
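The power budget behind this choice can be verified with a short calculation using the ratings quoted above. The sketch considers only the two dominant loads; the consumption of the microphone and of any other small components is neglected here, which is an assumption of this example.

```python
# Worst-case power budget using the ratings quoted in the text.
SUPPLY_VOLTAGE_V = 5.0        # PSU output voltage
PSU_MAX_POWER_W = 15.0        # 3 A at 5 V
MPU_MAX_CURRENT_A = 2.5       # Raspberry Pi 3 supply rating
MODEM_3G_MAX_POWER_W = 2.0    # 3G module while transmitting

mpu_power_w = SUPPLY_VOLTAGE_V * MPU_MAX_CURRENT_A
total_power_w = mpu_power_w + MODEM_3G_MAX_POWER_W
headroom_w = PSU_MAX_POWER_W - total_power_w

print(f"MPU:      {mpu_power_w:.1f} W")
print(f"3G modem: {MODEM_3G_MAX_POWER_W:.1f} W")
print(f"total:    {total_power_w:.1f} W of {PSU_MAX_POWER_W:.1f} W")
print(f"headroom: {headroom_w:.1f} W")
```

The worst-case total stays within the 15 W rating, although with little margin, so any additional peripheral should be checked against the remaining headroom.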

Wireless Communications
An important feature of this device is the wireless communication to connect to the IoT platform, to send the results of the processing and data samples to the SmartSense server, and to allow the remote configuration of the sensor (which also reduces human effort and cost).
The device supports both Wi-Fi (module included in the Raspberry Pi 3) for urban locations where Barcelona WiFi is available [53] and 3G (external module from Adafruit) for other cases.
As the Raspberry Pi 3 already contains a Wi-Fi module, only the 3G functionality has been added using an Adafruit board. The FONA 3G Cellular Breakout was used, as this board offers

Boxing
As this sensor will be placed outdoors, it should be protected by a box with an International Protection (IP) rating [54] of 65. The first digit of the rating indicates the protection against contact with hazardous parts and the ingress of solid foreign objects; a value of 6 means "No ingress of dust; complete protection against contact". The second digit represents the protection of the equipment inside the enclosure against harmful ingress of water, with a value of 5 meaning "Water projected by a nozzle (6.3 mm) against the enclosure from any direction shall have no harmful effects". An enclosure such as the BUD Industries PNR-2601-C, which has a price of 10.31 €, can be used. However, a further analysis has to be conducted to select the most suitable option depending on the characteristics of the project, such as thermal and humidity isolation, size, price, etc.

Sensor Proposal
In summary, the proposed MPU is the Raspberry Pi 3, which includes BLE, Wi-Fi and a Linux-based OS. Its power supply, an SGS-15-5, is able to provide up to 15 W at low cost. The microphone, an SPH0645LM4H, has a bandwidth between 50 Hz and 15 kHz at a very low cost. Finally, the 3G module can be removed to save money when a Wi-Fi network is available.
All these components are summarized in Table 6, along with their prices. The total cost of the sensor is therefore around 139 €. The simplified block diagram in Figure 5 depicts the main hardware components and how they relate to each other. Figure 6 shows a photo of the sensor.

Software Implementation
The software of this smart wireless acoustic sensor has been implemented in Python due to the extensive documentation and libraries available online. The algorithm comprises the three main tasks shown in Figure 7: (i) acquisition of audio frames, (ii) classification of the source of the sound and, finally, (iii) transfer of the results to Sentilo and the SmartSense platform. This algorithm uses between 55% and 75% of the CPU and 8.5% of the RAM.
Python_Stream.py captures the raw audio provided by the microphone through the I2S protocol. The data acquisition can be configured remotely by modifying the file Config.dat by means of the Socket.py library. The configuration parameters are (i) the sampling frequency, (ii) the number of bits of each sample, (iii) the size of the audio data frame and (iv) the wireless communication. These parameters, as well as the machine learning algorithms themselves, can be configured remotely from the SmartSense platform. Classifier.py uses the previously trained model to label the audio; the inputs of this task are the raw audio data and the configuration parameters. This task also computes the Leq and obtains the timestamp. Finally, the algorithm takes the label, the Leq and the timestamp to prepare a data frame and sends it wirelessly to the remote server through Wi-Fi or 3G.

Figure 7. Data path and software dependencies of the proposed system.
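The last two steps of this data path (computing the Leq and preparing the data frame) can be sketched as follows. This is not the code of Classifier.py: the function names, the JSON layout, the sensor identifier and the 120 dB calibration offset are assumptions made for illustration, the label is passed in rather than produced by a trained model, and the actual transmission to the server is left out.

```python
import json
import math
from datetime import datetime, timezone


def leq_db(samples, calibration_offset_db=0.0):
    """Equivalent level of one frame: 10*log10(mean square) plus a calibration offset.

    Samples are assumed to be normalized to [-1, 1]; the offset (hypothetical)
    maps the digital level to a sound pressure level for a given microphone.
    """
    mean_square = sum(x * x for x in samples) / len(samples)
    return 10.0 * math.log10(max(mean_square, 1e-20)) + calibration_offset_db


def build_message(samples, label, sensor_id, calibration_offset_db=0.0, now=None):
    """Pack label, Leq and timestamp into a JSON string (assumed message layout)."""
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    return json.dumps({
        "sensor_id": sensor_id,
        "timestamp": timestamp,
        "label": label,
        "leq_db": round(leq_db(samples, calibration_offset_db), 1),
    })


# Synthetic frame with constant amplitude 0.1, i.e. -20 dB relative to full scale.
frame = [0.1] * 4800
message = build_message(
    frame,
    label="siren",                  # would come from the classifier
    sensor_id="node-01",            # hypothetical identifier
    calibration_offset_db=120.0,    # hypothetical calibration value
    now=datetime(2019, 1, 1, tzinfo=timezone.utc),
)
print(message)
```

Only this short message leaves the node; the audio frame itself can be discarded once the label and the level have been computed.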

Preliminary Results of Data Acquisition
So far, we have performed preliminary measurements in a street in the Netherlands at 32 and 48 ksps to evaluate the data acquisition subsystem. Figure 8a depicts the spectrogram and the A-weighted equivalent noise levels of 5 min of sound acquired at 48 ksps in the Netherlands. Independent sources of sound have been identified and highlighted to show how they exhibit differences both in time and frequency. Spectrograms of these sounds captured with another sensor can be found in [55,56], where a class 2 microphone was used to acquire the sounds. The analysis of characteristic sounds such as a siren and a motorbike shows that they have the same spectrum and time distribution, despite having been collected in different cities. Therefore, we have validated the ability to capture these sounds with the low-cost SPH0645LM4H microphone.
Finally, this simple experiment shows the system fulfills most of our original requirements:

• Processing capacity: The sensor was capable of sampling, storing and transmitting data in this test. However, more intensive tests have to be conducted to analyse the performance of the whole recognition system. More concretely, the sound recognition algorithms have to be run over a long period of time in order to evaluate the results in real operation.

• Microphone: The audio has been sampled at 48 ksps with 18 bits. Moreover, the sensor provides a flat frequency response in the bands commonly used with A-weighted filtering, between 31.5 Hz and 16,000 Hz, which include most of the audible sounds that interest us. Finally, the sensor has a high SNR of 65 dB(A) [51] and a quantization noise of 108 dB.

• Wireless communications: The data has been sent to the remote server over the Wi-Fi connection.
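The acquisition figures quoted above can be cross-checked with two textbook relations. This is only a back-of-the-envelope sketch: it assumes an ideal quantizer (about 6.02 dB per bit) and the Nyquist limit of half the sampling rate, and the helper names are introduced here for illustration.

```python
import math


def ideal_dynamic_range_db(bits):
    """Ratio between full scale and one quantization step, in dB: 20*log10(2^bits)."""
    return 20.0 * math.log10(2 ** bits)


def nyquist_hz(sample_rate_hz):
    """Highest frequency representable without aliasing at the given sampling rate."""
    return sample_rate_hz / 2.0


print(round(ideal_dynamic_range_db(18), 1))    # 18-bit samples
print(nyquist_hz(48000), nyquist_hz(32000))    # the two sampling rates tested
```

Under these assumptions, 18 bits correspond to roughly 108 dB, which is consistent with the figure given for the microphone, and both tested sampling rates reach the 16 kHz upper edge of the band of interest (32 ksps only just).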

Conclusions
In this work, we have reviewed several hardware architectures and devices used in WASN applications. These platforms have been analyzed to study their suitability for our application, considering their strengths and weaknesses.
The main goal of this work is to describe a low-cost configurable acoustic sensor that can be deployed rapidly and easily in any city to create smart WASNs. The sensor includes a quad-core ARM Cortex-A53 CPU, Wi-Fi and 3G connectivity, an IP65-rated enclosure and an acoustic acquisition system. The latter supports a narrower frequency range (50 Hz to 15 kHz) than we initially aimed for, but allows us to measure the Leq and to run the SmartSound algorithms [26,27] on most of the urban sounds we are interested in. The total cost of the sensor is only 139 €, and it is extremely easy to assemble and deploy. As such, it can be used to create truly smart WASNs, capable of identifying and more precisely measuring the different sources of noise in a city. With that, we hope to enable the development of a number of smart city services that exploit the rich information that can be extracted from sound.
Additionally, the sensor has been specially designed to be used with the SmartSense platform. As such, it allows a number of configurations that facilitate the development and testing of proof-of-concept applications of the SmartSound technology, enabling fast data collection and on-site testing in any environment. On-the-fly configurations include the sampling frequency, the data precision, the processing algorithms, and the choice of sending audio samples together with the classification of anomalous noise events.
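The format of the configuration file is not described in this paper. The sketch below therefore assumes a hypothetical `key = value` text layout, and all key names, defaults (such as a 1024-sample frame) and the `parse_config` helper are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class SensorConfig:
    """On-the-fly settings; field names and defaults are assumptions."""
    sample_rate_hz: int = 48000
    bits_per_sample: int = 18
    frame_size: int = 1024
    link: str = "wifi"
    send_audio: bool = False


def parse_config(text):
    """Parse 'key = value' lines ('#' starts a comment) into a SensorConfig."""
    cfg = SensorConfig()
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key in ("sample_rate_hz", "bits_per_sample", "frame_size"):
            setattr(cfg, key, int(value))
        elif key == "link":
            if value not in ("wifi", "3g"):
                raise ValueError(f"unsupported link: {value}")
            cfg.link = value
        elif key == "send_audio":
            cfg.send_audio = value.lower() in ("1", "true", "yes")
        else:
            raise ValueError(f"unknown key: {key}")
    return cfg


example = """
sample_rate_hz = 32000   # 32 and 48 ksps were used in the preliminary tests
bits_per_sample = 18
link = 3g
send_audio = true
"""
cfg = parse_config(example)
print(cfg)
```

Rejecting unknown keys and unsupported values at parse time means a faulty remote update is refused before it can affect the acquisition.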
This sensor does not provide a large storage capacity to save raw data, mainly because storage capacity was not a requirement, and we avoided adding a large one to keep the price as low as possible. Moreover, the system has not been provided with an external rechargeable battery, so it needs to be connected to the power grid at all times, which makes it suitable for urban and suburban scenarios, and less so for remote sensing. Finally, the system has been conceived to transfer small quantities of data, e.g., the Leq values and the labels of the acoustic events, instead of raw data. As a result, it does not overload the WASN when it is deployed with many sensors, and it can better protect citizens' privacy in urban environments.
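The difference in network load can be estimated with simple arithmetic. The sketch below compares the raw audio bit rate at 48 ksps and 18 bits with the rate of a small labelled report. The report contents, its JSON encoding and the rate of one report per second are assumptions made only for this illustration, and protocol overhead is ignored.

```python
import json


def raw_audio_bits_per_second(sample_rate_hz, bits_per_sample):
    """Bit rate of one uncompressed mono audio stream."""
    return sample_rate_hz * bits_per_sample


def report_bits_per_second(report, reports_per_second):
    """Bit rate of JSON-encoded reports sent at a fixed rate (payload only)."""
    payload = json.dumps(report).encode("utf-8")
    return len(payload) * 8 * reports_per_second


raw = raw_audio_bits_per_second(48000, 18)
# Hypothetical report: the label, level and timestamp values are made up.
report = {"label": "traffic", "leq_db": 62.4, "timestamp": 1500000000}
summary = report_bits_per_second(report, reports_per_second=1)
print(raw, summary, round(raw / summary))
```

Even with a verbose text encoding, the labelled report is orders of magnitude smaller than the raw stream, which is the reason the design scales to many sensors per network.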
We plan to test the sensor in the near future in two different environments and with two different purposes:

• Taking our university campus as a SmartCampus living lab, we first plan to install a small WASN just outside the student residence, which is next to the university restaurant, a basketball court and a football field. The aim is to test the sensors and their integration with the SmartSense platform in a nearby, yet real, environment by classifying the sources of noise around the student residence.

• At a second stage, we plan to install the sensor in the center of Barcelona, in a neighborhood that has both heavy traffic and restaurants/bars. The goal is to analyse the noise at different times of the day and over a longer period of time to determine how much of it is caused by traffic and how much by leisure activities. According to our contacts in Barcelona city council, complaints from neighbours in such areas are common, with noise originating from people and music in bars, people accessing leisure zones and venues, and traffic. However, it can be difficult for the council to create effective plans if it does not understand the distribution of these noise sources.
We expect that these tests will allow us to perform an initial evaluation of the suitability of the sensor for our purposes (i.e., the creation of smart WASNs and the rapid development of sound recognition applications). So far, we have evaluated the spectrogram of a 5 min recording collected at 48 ksps in the Netherlands, depicted in Figures 8 and 9, where the spectrograms differ in both time and frequency for different noise sources.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: