Source Traffic Modelling in WSN for Acoustic Sensing in Reverberant Environment

Abstract: Recent research efforts show that Wireless Sensor Networks (WSN) for acoustic sensing have the potential for ubiquitous sensing. The key challenge in delivering such a WSN architecture is the trade-off between the cost of wireless sensors and the complexity of acoustic analysis. In this paper, we propose a source traffic model for a system in which sensors are connected to a cloud-based processing service. In practice, acoustic sensing generates a large amount of traffic. A new model of sensing in a reverberant environment is therefore necessary to better understand the nature and volume of that traffic. To lower the sensor cost, we consider low-complexity acoustic event detection on the sensor side, which prevents unnecessary traffic. The proposed acoustic sensor acts like a noise gate. Sound propagation in a reverberant environment is modelled using a modified Image-Source Modelling (ISM) method. The source traffic model is validated in a Matlab simulation and in a real reverberant environment. The results show an example of the bandwidth used by a simple acoustic sensor.


I. INTRODUCTION
The main paradigm of the Internet of Things is to provide services using low-resource devices connected over the conventional Internet. In this paper, we consider an approach in which sensors are connected to a cloud-based processing service. Sensors connected to powerful processing units can benefit from many sensor data fusion techniques. Traditionally, microphone arrays have been a powerful tool for extracting acoustic data, source localization and denoising. Recent research shows good potential for module clouds [1] and audio streaming [2], [4]. The system we consider is deployed using the existing Internet infrastructure, so the obvious bottleneck is the link to the cloud-based processing. Conventional network implementations use routers based on FIFO buffers, which cannot assure quality of service. The usual "best effort" service can neither guarantee bandwidth or delay nor prevent packet loss.
A protocol such as the Real-time Transport Protocol (RTP) includes a sequence number and a timestamp in each packet. A WSN needs to synchronize the clocks of all sensors so that their timestamps are reasonably correlated. The IEEE 1588 packet-based timing mechanism can be used on an embedded device to synchronize clocks with sub-microsecond resolution [5]. This provides good temporal fidelity on the receiver side, which is essential for the analysis typical of microphone array systems. Real-time system performance can still be impeded by two delays: the one introduced by the digital signal encoder on the acoustic sensor node, and the time needed to transfer data through the conventional Internet.
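To make the role of the sequence number and timestamp concrete, the sketch below packs and parses the 12-octet fixed RTP header defined in RFC 3550 (version 2, no padding, no extension, no CSRCs). The payload type 96 is an assumption: a dynamic type that would have to be negotiated for 16-bit PCM at 16 kHz. The function names and the SSRC value used in the test are hypothetical. This is an illustration only, not the sensor firmware used in the paper.

```python
import struct

RTP_VERSION = 2
DYNAMIC_PT = 96  # assumption: dynamic payload type negotiated for 16-bit PCM at 16 kHz


def rtp_header(seq, timestamp, ssrc, payload_type=DYNAMIC_PT, marker=False):
    """Pack the 12-octet fixed RTP header (RFC 3550) with no CSRCs or extension."""
    byte0 = RTP_VERSION << 6  # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    # Network byte order: 2 single octets, 16-bit sequence number, 32-bit timestamp, 32-bit SSRC
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def parse_rtp_header(data):
    """Unpack the fields written by rtp_header()."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    return {"version": byte0 >> 6, "marker": bool(byte1 >> 7),
            "payload_type": byte1 & 0x7F, "seq": seq,
            "timestamp": timestamp, "ssrc": ssrc}
```

For audio, the RTP timestamp counts sampling instants. A sensor sending block k of N samples could therefore use timestamp k·N, while the sequence number increments only for packets actually sent. When blocks are gated out, a timestamp jump without a sequence gap signals silence rather than packet loss.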
Existing research on source traffic modelling is not applicable to WSN for acoustic sensing. A data traffic model for WSN-based intrusion detection is considered in [6]. Reference [7] models the traffic of medical sensor data on the basis of an empirical data set. Both models are important contributions to this field, but they cannot be used for the WSN we consider. The approach described in [8] is to some extent applicable to WSN for acoustic sensing, because the signals in that target tracking system have the same nature as acoustic signals. However, the nature of sound propagation in a reverberant environment still needs to be taken into account. A new model therefore has to be derived so that the properties of WSN source traffic can be analysed before the described system is deployed.

II. SOUND PROPAGATION MODEL
Allen and Berkley proposed a very accurate method for modelling sound propagation in a reverberant environment, called Image-Source Modelling (ISM) [9]. The ISM is simplest for rectangular rooms, but it is not restricted to such enclosures: in theory, an enclosed space of any shape can be modelled with ISM. A recent paper demonstrates object tracking in non-convex rooms using ISM in the reverse direction [10]. In this paper we model only rectangular enclosures, since the majority of living-space rooms can be approximated as rectangular. The basic idea is to calculate the impulse response between any two points in a small reverberant room. The calculation takes into account the room dimensions, shape, wall reflection properties and similar factors. Once the room impulse response (RIR) is generated, it can be convolved with any desired input signal to simulate reverberation effects. The original ISM is computationally intensive. Lehmann and Johansson [11] later proposed an improved method that uses energy decay estimation.
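Here we give the minimal subset of ISM equations needed to express and understand the simulation parameters used in our experimental work. The Cartesian coordinate system is used as follows:

$$p_{source} = [x_s, y_s, z_s]^T, \quad p_{sensor} = [x, y, z]^T, \quad d_{room} = [L_x, L_y, L_z]^T \quad (1)$$

where $p_{source}$ is the position of a sound source, $p_{sensor}$ is the position of a point of interest and $d_{room}$ holds the dimensions of the room. The simulator calculates the RIR in the time domain, $h(t)$, for every point of interest using

$$h(t) = \sum_{u_{sum}} \sum_{l_{sum}} A(u_{sum}, l_{sum})\, \delta\big(t - \tau(u_{sum}, l_{sum})\big) \quad (2)$$

where $\delta(\cdot)$ denotes the Dirac impulse function. Here $u_{sum}(u, v, w)$ and $l_{sum}(l, m, n)$ each represent a triple sum over the triplet's internal indices, with $u, v, w \in \{0, 1\}$ and $l, m, n \in \mathbb{Z}$. The triplets $(u, v, w)$ and $(l, m, n)$ index the image sources in all three dimensions. The attenuation factor $A(u_{sum}, l_{sum})$ and the time delay $\tau(u_{sum}, l_{sum})$ in (2) are defined as follows:

$$A(u_{sum}, l_{sum}) = \frac{\beta_{x1}^{|l-u|}\,\beta_{x2}^{|l|}\,\beta_{y1}^{|m-v|}\,\beta_{y2}^{|m|}\,\beta_{z1}^{|n-w|}\,\beta_{z2}^{|n|}}{4\pi\, d(u_{sum}, l_{sum})} \quad (3)$$

$$\tau(u_{sum}, l_{sum}) = \frac{d(u_{sum}, l_{sum})}{c} \quad (4)$$

In these equations, $\beta_{x1}$ and $\beta_{x2}$ are the reflection coefficients of the walls at $x = 0$ and $x = L_x$, and the $y$ and $z$ coefficients are defined analogously. As (3) shows, the attenuation models the reverberation effect of reflections from the enclosure through the reflection coefficients $\beta$, together with spherical spreading over the path length. The delay (4) results from sound propagating at the speed of sound $c$. The coefficients $\beta$ are calculated from the absorption coefficients $\alpha$:

$$\beta = \sqrt{1 - \alpha} \quad (5)$$

The distance $d(u_{sum}, l_{sum})$ from the image source to the sensor node is the Euclidean norm of the image-source position relative to the sensor. The image-source position is obtained by multiplying the diagonal matrices $\operatorname{diag}(\cdot)$, built from $(u, v, w)$ and $(l, m, n)$, with $p_{source}$ and $d_{room}$, respectively:

$$d(u_{sum}, l_{sum}) = \left\| \operatorname{diag}(1-2u, 1-2v, 1-2w)\, p_{source} + 2\,\operatorname{diag}(l, m, n)\, d_{room} - p_{sensor} \right\| \quad (6)$$

In the simulation we use the room frequency response $H(\omega)$, computed as

$$H(\omega) = \sum_{u_{sum}} \sum_{l_{sum}} A(u_{sum}, l_{sum})\, e^{-j\omega\tau(u_{sum}, l_{sum})} \quad (7)$$

where $\omega$ is the angular frequency. The signal $y_{sensor,i}(t)$ for sensor $i$ is calculated by first taking the discrete Fourier transform $F(y_{source}(t))$. The spectrum is then multiplied by the transfer function $H_i(\omega)$, which is equivalent to convolving the source signal with the RIR in the time domain.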
The signal is then transformed back to the time domain with the inverse Fourier transform:

$$y_{sensor,i}(t) = F^{-1}\big(H_i(\omega)\, F(y_{source}(t))\big) \quad (8)$$

For the sake of brevity, a detailed derivation of the absorption coefficient $\alpha$ is omitted. An example derivation is given in Lehmann and Johansson's paper [11].
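The following minimal Python sketch shows how the image-source terms of the ISM combine. It computes the taps $(\tau, A)$ of (3) to (6) for a truncated set of images ($|l|, |m|, |n| \le$ `order`) and places each tap at the nearest sample to obtain a discrete RIR, as in (2). It then applies the RIR by direct convolution. Direct convolution gives the same result as the frequency-domain product in (7) and (8) when the transforms are zero-padded to at least len(x) + len(h) - 1 samples. The sound speed of 343 m/s, the truncation order, the nearest-sample rounding and all function names are assumptions for illustration. This is not Lehmann's simulator, which also supports fractional delays and energy-decay estimation.

```python
import itertools
import math


def beta_from_alpha(alpha):
    """Reflection coefficient from absorption coefficient, as in (5)."""
    return math.sqrt(1.0 - alpha)


def image_source_taps(p_source, p_sensor, d_room, beta, c=343.0, order=2):
    """Return (tau, A) for every image source with |l|, |m|, |n| <= order.

    beta = ((bx1, bx2), (by1, by2), (bz1, bz2)): reflection coefficients of
    the walls at 0 and at L along each axis.
    """
    taps = []
    for uvw in itertools.product((0, 1), repeat=3):
        for lmn in itertools.product(range(-order, order + 1), repeat=3):
            # Image position from (6): diag(1-2u, ...) p_source + 2 diag(l, ...) d_room
            image = [(1 - 2 * u) * ps + 2 * l * L
                     for u, l, ps, L in zip(uvw, lmn, p_source, d_room)]
            d = math.dist(image, p_sensor)
            # Product of wall reflections: numerator of (3)
            gain = 1.0
            for u, l, (b1, b2) in zip(uvw, lmn, beta):
                gain *= b1 ** abs(l - u) * b2 ** abs(l)
            taps.append((d / c, gain / (4.0 * math.pi * d)))  # (4) and (3)
    return taps


def discrete_rir(taps, fs, length):
    """Place each tap at the nearest sample (no fractional delay)."""
    h = [0.0] * length
    for tau, a in taps:
        k = round(tau * fs)
        if k < length:
            h[k] += a
    return h


def convolve(x, h):
    """Direct linear convolution; equals F^-1(H * F(x)) with enough zero padding."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y
```

With all $\beta = 0$, only the direct path survives. The sketch then reduces to a single tap with delay $d/c$ and amplitude $1/(4\pi d)$, which is a quick sanity check of (3) and (4).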

III. SENSOR NODE MODEL
When the WSN described in this paper is deployed, we assume that the sensor nodes use low-cost microphones for practical and economic reasons. Sound fidelity is therefore not expected to be high, which limits the dynamic range of the sensing. Each sensor node has to adjust itself to its own physical characteristics. Sensing is performed by continuously recording sound locally in the node, and the node should send data only in the presence of acoustic events. It is also imperative that the WSN is energy efficient. For these reasons, we propose a simple acoustic sensor with a noise gate (SASwNG). We use a digital signal processing (DSP) algorithm similar to [12]. In our case, the foreground-to-background separation is less computationally intensive and therefore less accurate. The main aim of our modelling and simulation is to evaluate the performance of a simple noise gate in a reverberant environment.
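The recorded signal $y(t)$ is represented in the time domain as

$$y(t) = s(t) + n(t) \quad (9)$$

where $s(t)$ is the signal of interest and $n(t)$ is the noise. For a block of $N$ samples starting at $t = kN$, the signal power is estimated as

$$P(y(t), k, N) = \frac{1}{N} \sum_{t=kN}^{kN+N-1} y^2(t) \quad (10)$$

When powered up, SASwNG has to be calibrated. Sensor calibration is performed on the first $M$ recorded samples ($t = 0$; $k = 0$). Calibration yields the estimated noise power

$$P(n(t)) = P(n(t), 0, M) = P(y(t), 0, M) \quad (11)$$

Calibration assumes that the signal $s(t)$ is zero for the first $M$ samples. This measure is taken as the power of the noise floor, which includes both the sensor's imperfections and the environmental noise. After calibration, the signal-to-noise ratio (SNR) of each block is estimated as

$$SNR(y(t), k, N) = 10 \log_{10} \frac{P(y(t), k, N)}{P(y(t), 0, M)} \quad (12)$$

The SNR estimate is compared with the noise gate parameter $SNR_{threshold}$. The algorithm operates on blocks of $N$ samples and replaces each block without a significant acoustic event with a block of silence:

$$y_{NG}(t) = \begin{cases} y(t), & SNR(y(t), k, N) \ge SNR_{threshold} \\ 0, & \text{otherwise} \end{cases} \qquad kN \le t < (k+1)N \quad (13)$$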
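A minimal Python sketch of the block-wise gate in (10) to (13) follows. It assumes that the samples are floats scaled to full scale, that blocks are non-overlapping and start at t = kN, and that trailing samples which do not fill a whole block are dropped. The `eps` guard against a silent calibration segment is a hypothetical addition, not part of the described algorithm. The sketch illustrates the equations and is not the authors' sensor code.

```python
import math


def block_power(y, k, n):
    """P(y, k, N) from (10): mean power of the k-th block of N samples."""
    block = y[k * n:(k + 1) * n]
    return sum(s * s for s in block) / len(block)


def noise_gate(y, m, n, snr_threshold_db, eps=1e-12):
    """Block-wise noise gate following (11)-(13).

    Returns the gated signal and the gate state per block (1 open, 0 closed).
    """
    p_noise = max(block_power(y, 0, m), eps)  # calibration on the first M samples, (11)
    gated, states = [], []
    for k in range(len(y) // n):  # complete blocks only
        p = block_power(y, k, n)
        snr_db = 10.0 * math.log10(p / p_noise) if p > 0 else -math.inf  # (12)
        is_open = snr_db >= snr_threshold_db
        block = y[k * n:(k + 1) * n]
        gated.extend(block if is_open else [0.0] * n)  # (13)
        states.append(int(is_open))
    return gated, states
```

Because the calibration window is also part of the stream, its own blocks are gated like any other. With a pure-noise window their SNR is close to 0 dB, so they stay closed for any positive threshold.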
Equation (8) produces the signal modified by sound propagation. Using that signal as the input to (13) yields the noise-gated signal. This is, in essence, our proposed source traffic model.
We assume that the background noise level varies over time, but more slowly than the sounds of interest, and that the background noise contains no sharp frequency peaks. SASwNG does not send blocks of silence. Each remaining block is sent in one network datagram, which consists of a header and a payload. The payload contains N samples of audio data. For RTP over UDP/IPv4, the combined header size is 40 octets: 20 for IPv4, 8 for UDP and 12 for RTP. To keep SASwNG as simple as possible, we assume that no compression is performed on the audio data, as suggested in [13]. Each RTP packet therefore contains raw 16-bit PCM data. Fig. 1 illustrates an example of datagram generation. Table I gives an overview of how our model relates to the referenced work and shows that data are usually compressed before sending. To provide an exact calculation of data rates, we do not consider any data compression.
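The datagram generation described above can be sketched as follows. The sketch assumes one RTP/UDP/IPv4 datagram per open block and samples scaled to [-1, 1] and quantized to 16-bit PCM. Network byte order is used because the RTP linear 16-bit audio format carries samples that way. The helper names are hypothetical, and the octet count covers only the IP layer and above, excluding link-layer framing.

```python
import struct

HEADER_OCTETS = 40  # 20 (IPv4) + 8 (UDP) + 12 (RTP)


def pcm16_payload(block):
    """Quantize samples in [-1.0, 1.0] to 16-bit PCM in network byte order."""
    ints = [max(-32768, min(32767, round(s * 32767))) for s in block]
    return struct.pack(f">{len(ints)}h", *ints)


def datagrams(y, states, n):
    """Return (rtp_timestamp, payload) for every open block; closed blocks are not sent."""
    return [(k * n, pcm16_payload(y[k * n:(k + 1) * n]))
            for k, is_open in enumerate(states) if is_open]


def octets_on_wire(packets):
    """Total IP-layer octets: 40-octet header plus 2*N payload octets per datagram."""
    return sum(HEADER_OCTETS + len(payload) for _, payload in packets)
```

For the parameters in Fig. 1 (N = 1000), each open block therefore costs 40 + 2000 = 2040 octets.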

Aspect | Our work | Related work
Data processing | Processing data chunks | Batch processing; loading all data in memory [14]
Traffic reduction | Foreground vs. background separation; no compression | Compressed data [1], [2], [4]; metadata packets [3]; no compression indicated [13]

IV. EXPERIMENTAL RESULTS
The acoustic simulation of our room is based on Lehmann's implementation, available at [14]. Our main contribution to the ISM simulation is a rework of the main loops of the simulation program so that it can process a large audio data set. The original implementation loads the whole input audio file into memory. The input can also contain multiple source signals, and the ISM is computed in a single batch process, from loading the input to producing all output signals. This caused an "out of memory" problem, so we reorganized the processing to operate on data chunks.
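One standard way to convolve a long recording with an RIR without loading it all into memory is overlap-add. The sketch below convolves each input chunk separately and carries the last len(h) - 1 output samples (the reverberation tail) into the next chunk. Concatenating the outputs reproduces the full linear convolution. It is given only as an illustration of chunked processing; the paper does not state that its reworked loops use this exact scheme, and the function name is hypothetical.

```python
def overlap_add(chunks, h):
    """Convolve a stream of input chunks with the RIR h, chunk by chunk.

    Memory use is bounded by one chunk plus len(h) - 1 carried tail samples,
    instead of the whole input file.
    """
    tail = [0.0] * (len(h) - 1)
    for chunk in chunks:
        y = [0.0] * (len(chunk) + len(h) - 1)
        for i, x in enumerate(chunk):
            for j, hj in enumerate(h):
                y[i + j] += x * hj
        for i, t in enumerate(tail):  # add the tail left over from earlier chunks
            y[i] += t
        yield y[:len(chunk)]
        tail = y[len(chunk):]
    yield tail  # flush the reverberation tail after the last chunk
```

Because `overlap_add` is a generator, it can consume chunks read lazily from a file and write each output chunk before the next one is read.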
After that, the ISM is validated. Since SASwNG relies on signal power estimation, energy decay is measured in a simulated room. The signal source is placed at a stationary point, and sensors are placed to cover the whole area of the room. Validation uses various source signals, and the root mean square (RMS) value is calculated at each sensor. The RMS value is expressed in decibels as

$$RMS_{dB} = 20 \log_{10}\big(RMS(y(t))\big) \quad (14)$$

where 0 dB represents the full-scale amplitude. Fig. 2(a) shows the case without reverberation, in which only direct-path propagation is calculated and generated white noise is used as the input signal. The signal energy decays as $1/d^2$. In Fig. 2(b) the same input signal is used, but SASwNG is now placed in the simulated reverberant environment. Because of reverberation, the signal energy decays more slowly and is distributed more uniformly over the room. Fig. 2(c) shows the superposition of the direct-path wave with reflected waves. A 128 Hz sine wave provides enough resolution to see the wave pattern in the energy distribution. When a 1 kHz sine wave is used, a complex energy decay pattern is produced, as Fig. 2(d) demonstrates. The patterns in both Fig. 2(c) and Fig. 2(d) are caused by constructive and destructive wave interference, which depends on the sensor position. Simulation parameters are listed in Table II. T60 is a simulation parameter that specifies the desired reverberation time, that is, the time needed for the sound energy to decay by 60 dB.

Next, SASwNG performance is verified using sequences of recorded voice. The authors of [15] propose a high sampling frequency, such as 44.1 kHz, as a requirement for source localization techniques. Our data set consists of voice recordings at 16 kHz, but the simulation is not limited to that frequency. Data rates are measured for SASwNG processing of the original recording and of the signal produced by the ISM. For this experiment, a sensor is placed on a different side of the room. Simulation parameters are given in Table III. An average data rate is calculated for different block sizes N (in samples). The theoretical maximum data rate results from a naïve approach in which audio is streamed continuously and all packets are sent. In general, the traffic data rate $R(F_s, N, \lambda)$ produced by SASwNG is given as

$$R(F_s, N, \lambda) = \lambda\, F_s\, \frac{N Q + H}{N} \quad (15)$$

where $F_s$ is the sample rate, $H$ is the header size in bits, $Q$ is the number of quantization bits and $\lambda$ is the ratio between the duration of the signal of interest (foreground signal) and the total recorded time. The factor $\lambda$ is obtained by hand-annotating, for each recorded sequence, the intervals in which a voice is present. With $\lambda = 1$, (15) gives the theoretical maximum data rate. Multiplying the maximum data rate by $\lambda$ gives the minimum theoretical data rate necessary.
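As a worked illustration of (15), the sketch below evaluates the data rate with values taken from the text: H = 40 octets = 320 bits, Q = 16 bits and F_s = 16 kHz. The function name and the default arguments are our own choices for the example.

```python
def data_rate(fs, n, lam=1.0, header_bits=40 * 8, q_bits=16):
    """R(Fs, N, lambda) = lambda * Fs * (N*Q + H) / N in bit/s, as in (15).

    lam = 1 gives the theoretical maximum (continuous streaming);
    lam < 1 gives the minimum rate an ideal gate would need.
    """
    return lam * fs * (n * q_bits + header_bits) / n
```

For N = 1000, the theoretical maximum is 16000 · 16320 / 1000 = 261,120 bit/s. For N = 100 it rises to 307,200 bit/s, because the 320-bit header is then paid ten times as often. Raw 16-bit PCM alone would need 256,000 bit/s, so small blocks carry a noticeable header overhead.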
To validate our simulation findings, we conducted an experiment in the lab. An audio recording was processed by the SASwNG algorithm, and data rates were calculated as in the previous experiment. The absorption coefficients α must be measured with great care if the simulation is to resemble the real reverberant environment. This measurement is not done in this paper and is left for future work. Instead, a decay time matching the real reverberant environment is used, and uniformly distributed sound absorption is assumed. The lab results again confirm our assumption and show how reverberation affects data rates. The results of the experiments are depicted in Fig. 3.
The data rate noticeably increases for higher values of N. This is caused by misalignment between voice segments and processing block boundaries: a block that contains only a short part of a voice segment may still be sent in full. In principle, this could be overcome by moving the processing window sample by sample. However, that would add a significant amount of processing and undermine the simplicity of the SASwNG design. In the experiments conducted in a reverberant environment, SASwNG reduced traffic by up to 43.78 % compared with the theoretical maximum. The experiments also show that reverberation lowers the bandwidth savings. We should also state that high-performance microphone arrays in indoor environments [16] will always perform better than a WSN. What we have considered here is a low-cost sensor that could be easily installed in a building of interest using the existing network, which is the approach advocated by the IoT paradigm. SASwNG focuses on the design aspect of source traffic modelling when sensor processing power is very low. The simple noise gate described here has a small memory footprint. It leaves enough processing power for sensor synchronization, such as that used in [17], or for other tasks in the WSN.
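The block-misalignment effect can be illustrated with a short calculation. The sketch assumes an idealized gate that sends every N-sample block overlapping a voice segment [start, end). It counts the samples transmitted, which is an upper bound, because a real gate may keep a block closed if too little of the voice falls inside it. The segment boundaries below are hypothetical numbers chosen only to show the trend.

```python
def sent_samples(start, end, n):
    """Samples transmitted if every N-sample block overlapping [start, end) is sent."""
    first_block = start // n
    last_block = (end - 1) // n
    return (last_block - first_block + 1) * n
```

For a hypothetical segment from sample 1500 to 9700 (8200 samples of voice), N = 100 sends exactly 8200 samples. N = 1000 sends 9000 samples and N = 4000 sends 12000, so larger blocks carry more padding around the same voice.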

V. CONCLUSIONS
We have presented a source traffic model for acoustic sensing in a reverberant environment. Traffic modelling is an important part of network design. A good traffic model enables analysis and simulation of a WSN before it is actually deployed, which can lower the cost and improve WSN efficiency. The presented source traffic model can be used to explore traffic volume in various reverberant environments, such as small rooms.
The algorithm is compared with the naïve case, in which the sensor generates traffic continuously. Experimental results confirm that a simple separation of foreground and background signals can save significant bandwidth. The model is validated with an audio recording in a reverberant room. When SASwNG is placed farther from the signal source, it produces more data because of the distortion caused by reverberation. The described WSN is indeed more suitable for algorithms such as energy-based acoustic source localization. Algorithms that rely on the time difference of arrival or on phase information demand low, predictable jitter, which a WSN usually cannot provide.

Fig. 1. SASwNG source traffic generation for a recorded voice sequence. Gate state: 0 = gate closed; 1 = gate open. Parameters: M = 5000 samples; N = 1000 samples.
Different aspects of the model are shown in Table I, which compares our work with the related work in the references.
Here we introduce new parameters for the desired sensor hysteresis. The attack time $T_{attack}$ is the time the noise gate takes to start releasing packets. Similarly, the release time $T_{release}$ is the time it takes to stop releasing packets. In practice, $T_{attack}$ is set to a value an order of magnitude smaller than $T_{release}$. When $T_{attack}$ and $T_{release}$ are set to zero, the algorithm reduces to the simplified block-wise gate (13).
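One possible reading of the attack and release times is sketched below. The times are expressed in whole blocks, which could be obtained, for example, by rounding $T \cdot F_s / N$. The gate opens only after more than `attack_blocks` consecutive above-threshold blocks and closes only after more than `release_blocks` consecutive below-threshold blocks. The block granularity, the counting rule and the function name are assumptions, not the paper's specification. With both values set to zero, the output equals the per-block decisions of (13).

```python
def hysteresis_gate(above, attack_blocks=0, release_blocks=0):
    """Turn per-block threshold decisions into gate states (1 open, 0 closed).

    Opens after more than attack_blocks consecutive above-threshold blocks;
    closes after more than release_blocks consecutive below-threshold blocks.
    """
    states = []
    is_open = False
    run = 0  # consecutive blocks whose decision disagrees with the gate state
    for decision in above:
        if bool(decision) != is_open:
            run += 1
            needed = release_blocks if is_open else attack_blocks
            if run > needed:
                is_open = bool(decision)
                run = 0
        else:
            run = 0
        states.append(int(is_open))
    return states
```

Without a look-ahead buffer, the blocks that elapse during the attack time are not sent. This is one reason a short $T_{attack}$ and a longer $T_{release}$ is the usual choice.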

TABLE I. COMPARISON WITH RELATED WORK.

TABLE III. DATA RATE SIMULATION PARAMETERS.