Hybrid Sensor Network-Based Indoor Surveillance System for Intrusion Detection

This paper presents a novel hybrid sensor-based intrusion detection system for low-power surveillance of an empty, sealed indoor space with or without illumination. The proposed system consists of three functional steps: (i) initial detection of an intrusion event using a sound field sensor; (ii) automatic lighting control based on the detected event; and (iii) detection and tracking of the intruder using an image sensor. To reduce power consumption, the proposed hybrid surveillance system uses the sound field sensor to detect abnormal events in a very low-light or completely dark environment 24 h a day. After the sound sensor detects an intrusion, a collaborating image sensor takes over the accurate detection and tracking tasks. The proposed hybrid system can be applied to various surveillance environments such as an office after work hours, an empty automobile, a safe room in a bank, and an armory. This paper deals with the fusion of computer-aided pattern recognition and physics-based sound field analysis, reflecting the symmetric aspects of computer vision and physical analysis.


Introduction
An automatic indoor surveillance system should be able to detect dangerous events such as illegal intrusion for 24 h a day without human monitoring. However, a single image sensor-based system often fails to detect the event because of various unstable illumination conditions. To solve this problem, multiple hybrid sensors can collaborate to increase the detection accuracy and stability [1].
The performance of a general vision-based surveillance system is limited under low illumination. Although many low-light image enhancement methods have recently been proposed [2,3], the surveillance function does not work if there is no light at all, and illuminating an empty indoor space to acquire high-quality images inevitably consumes considerable power. Typical non-visual sensors such as passive infrared (PIR) and thermal sensors can detect objects under low-light conditions; however, they cannot detect, for example, movement behind an obstacle such as a wall. A combined procedure of generating a sound wave and monitoring changes in the sound wave configuration, referred to as 'sound field technology', solves this problem because most kinds of intrusion can be detected even in blind spots [4]. To cope with both low illumination and blind spots, a sound field sensor can efficiently detect an abnormal intrusion using a speaker-microphone pair, even in a completely dark environment. On the other hand, a sound field sensor-based surveillance system can only detect an intrusion; it cannot provide additional information such as the shape, behavior, color, or position of the intruding object. Furthermore, the sound field sensor is sensitive to vibration due to outdoor wind, air conditioners, and acoustic noise. The sound field sensor also has different features from in-ground intrusion detection sensors such as optical fiber, magnetic, and radio frequency (RF) sensors. At real surveillance sites, environmental noise causes many false alarms, which can be filtered out by using a relatively strong sound source consisting of multiple high-frequency sine waves. To take advantage of both image and sound field sensors, we present a combined hybrid sensor-based surveillance system.
The proposed surveillance system can first detect an intruding event in a dark, sealed space, such as an office after work, a safe room of a bank, and an armory room, as shown in Figure 1a. After the sound field sensor detects the intrusion, the system turns on the light for the vision sensor to start tracking the intruding object as shown in Figure 1b.
The main contribution of this paper consists of two parts: (i) combining an image sensor and a sound field sensor to detect intrusion in a low-illumination environment, and (ii) power-efficient surveillance that needs no illumination before an intrusion occurs. The paper is organized as follows: Section 2 briefly summarizes existing intrusion detection techniques, Section 3 describes the image sensor-based intrusion detection method, Section 4 presents a novel sound field sensor-based intrusion detection technique, and Section 5 presents the combined hybrid sensor-based surveillance system. After the performance of the proposed system is demonstrated with experimental results in Section 6, Section 7 concludes the paper.

Related Works
Various image processing and computer vision algorithms have been proposed to detect illegal intrusion using image sensors such as closed-circuit television (CCTV) or internet protocol (IP) cameras [5]. A general visual surveillance system generates an alarm signal if an object enters a pre-specified region, using background generation, frame differences, or motion information. Zhan et al. detected an intruding object using frame differences and edge detection [6]. Yuan et al. detected intrusion events by first detecting the object using background differences and then generating Gabor features for support vector machine (SVM) classification [7]. Dastida et al. first generated a background image and then detected an intruding object using the correlation between the object of interest and the background [8]. Zhang et al. used the Fourier descriptor (FD) and histogram of oriented gradients (HoG) to develop a perimeter intrusion detection (PID) system [9]. Chen et al. pre-assigned a dangerous region of interest, detected objects using a Gaussian mixture model (GMM)-based background image, and generated an alarm if an object fell in the pre-specified region [10]. Although background generation-based frame difference methods are sensitive to the accuracy of the generated background, they can efficiently detect foreground objects in fixed-camera surveillance systems. Hariyono et al. detected moving objects using optical flow with the Kanade-Lucas-Tomasi (KLT) method [11], and Hossen et al. detected abnormal objects using motion vectors based on Horn-Schunck optical flow [12]. Optical flow-based object detection methods can detect objects even with a moving camera; however, the estimated motion is very sensitive to image noise. Chauhan et al. proposed a moving object detection method that combines GMM-based background differences and optical flow to compensate for the disadvantages of typical motion estimation-based and background difference-based methods [13]. Additional research on visual surveillance techniques can be found in [14-20].
However, an image sensor cannot successfully detect an intrusion event without sufficient illumination. To solve the low-light problem, sensors of various modalities have been used, such as acoustic sensors [21], robots [22], radar sensors [23-25], radio frequency (RF) sensors [26], infrared (IR) sensors [27,28], and wireless sensor networks (WSN) [22,26,29-32]. To keep the implementation cost low, a sound field sensor is an efficient way to detect an intrusion in an empty, sealed space. In this context, Lee et al. successfully detected intrusion events using a sound field sensor [4].
To compensate for the weakness of a single sensor, many researchers have proposed surveillance systems using several sensors. Andersson et al. proposed a two-stage fusion method based on acoustic and optical sensor data to detect abnormal events [33]. Castro et al. proposed a multi-sensor intelligent system that integrates information obtained from multiple sensors such as surveillance cameras, motion sensors, and microphones [1]. Azzam proposed a surveillance system that detects and tracks an object using image and acoustic sensors [34].

Image Sensor-Based Intrusion Detection
The proposed image sensor-based intrusion detection method determines whether a moving object crosses a pre-specified boundary in the region of interest (ROI). More specifically, a user assigns the ROI a priori, together with its inbound and outbound directions. A correlation filter is used to track the object because it is faster than other spatial-domain tracking algorithms [35-39]. Since the correlation filter-based MOSSE (Minimum Output Sum of Squared Error) tracker is simple and learns the target appearance in each frame, it gives accurate and fast tracking results. The MOSSE filter-based tracker has been shown to be more suitable for estimating the trajectory of a moving object than other types of trackers such as KLT [40,41], mean-shift [42], Kalman [43], and particle filter [44] trackers.
The MOSSE filter, referred to as $H$, minimizes the sum of squared errors of the residual
$$\min_{H^*} \sum_i \left| F_i \odot H^* - G_i \right|^2, \tag{1}$$
where $F_i$ and $G_i$ respectively represent the Fourier transforms of a set of training input images and of the corresponding desired correlation outputs, $H$ represents the Fourier transform of the correlation filter, and $\odot$ and $^*$ denote the element-wise multiplication operator and the complex conjugate, respectively. To minimize Equation (1), we take the partial derivative with respect to $H^*$ and solve the minimality condition
$$\frac{\partial}{\partial H^*_{wv}} \sum_i \left| F_{iwv} H^*_{wv} - G_{iwv} \right|^2 = 0, \tag{2}$$
where $w$ and $v$ represent the indices of the frequency variables. Solving this optimization problem for $H^*$ yields a closed-form expression for the MOSSE filter:
$$H^* = \frac{\sum_i G_i \odot F_i^*}{\sum_i F_i \odot F_i^*}. \tag{3}$$
Finally, the MOSSE filter tracks the target by locating the strongest peak in the correlation response map and evaluating its peak-to-sidelobe ratio (PSR).
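To make the closed-form filter and the peak search concrete, here is a minimal NumPy sketch that trains the filter on a single patch and then localizes a shifted copy of it. It is a simplification under stated assumptions: the Gaussian target width `sigma`, the regularizer `eps`, and the 11×11 sidelobe exclusion window are illustrative choices, the preprocessing (log transform, cosine window) and per-frame running-average update of a full MOSSE tracker are omitted, and all function names are hypothetical.

```python
import numpy as np


def gaussian_target(shape, center, sigma=2.0):
    """Desired correlation output g_i: a 2-D Gaussian peaked at the target center (assumed sigma)."""
    rows, cols = np.indices(shape)
    cy, cx = center
    return np.exp(-((rows - cy) ** 2 + (cols - cx) ** 2) / (2.0 * sigma**2))


def train_mosse(patches, targets, eps=1e-5):
    """Closed-form filter: H* = sum(G_i * conj(F_i)) / sum(F_i * conj(F_i))."""
    num = np.zeros(patches[0].shape, dtype=complex)
    den = np.zeros(patches[0].shape, dtype=complex)
    for f, g in zip(patches, targets):
        F, G = np.fft.fft2(f), np.fft.fft2(g)
        num += G * np.conj(F)
        den += F * np.conj(F)
    return num / (den + eps)  # eps (an assumption) avoids division by near-zero energy


def correlate(H_conj, patch):
    """Correlation response map: inverse FFT of F (element-wise) H*."""
    return np.real(np.fft.ifft2(np.fft.fft2(patch) * H_conj))


def peak_and_psr(response, exclude=5):
    """Location of the strongest response and its peak-to-sidelobe ratio."""
    r, c = np.unravel_index(np.argmax(response), response.shape)
    sidelobe = np.ones(response.shape, dtype=bool)
    sidelobe[max(r - exclude, 0):r + exclude + 1, max(c - exclude, 0):c + exclude + 1] = False
    side = response[sidelobe]
    psr = (response[r, c] - side.mean()) / (side.std() + 1e-12)
    return (int(r), int(c)), float(psr)


# Usage: train on one patch, then locate the same texture shifted by (3, -4) pixels.
rng = np.random.default_rng(0)
patch = rng.random((64, 64))
H_conj = train_mosse([patch], [gaussian_target(patch.shape, (32, 32))])
moved = np.roll(patch, (3, -4), axis=(0, 1))
peak, psr = peak_and_psr(correlate(H_conj, moved))
print(peak)  # (35, 28): the peak follows the shift from (32, 32)
```

Because correlation in the frequency domain is shift-equivariant, the response peak moves with the target; a high PSR indicates a confident match, whereas a low PSR would typically signal occlusion or tracking failure.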
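The trajectory is obtained by connecting the object centers estimated by the MOSSE filter in the previous and current frames. Given the motion trajectory, a geometric analysis determines whether the trajectory crosses the boundary line, which indicates an illegal intrusion, as shown in Figure 2. In Figure 2, line 1 represents the line assigned a priori by the user, and line 2 represents the moving trajectory estimated by the MOSSE filter. Line 1 passes through points $P_1 = (x_1, y_1)$ and $P_2 = (x_2, y_2)$, and line 2 passes through $P_3 = (x_3, y_3)$ and $P_4 = (x_4, y_4)$, so their parametric equations can be written with parameters $t, s \in [0, 1]$ as
$$P_a = P_1 + t(P_2 - P_1), \qquad P_b = P_3 + s(P_4 - P_3). \tag{4}$$
The crossing point of lines 1 and 2 is determined by solving
$$P_1 + t(P_2 - P_1) = P_3 + s(P_4 - P_3), \tag{5}$$
which can be decomposed into two equations in the $(x, y)$ coordinates:
$$x_1 + t(x_2 - x_1) = x_3 + s(x_4 - x_3), \qquad y_1 + t(y_2 - y_1) = y_3 + s(y_4 - y_3). \tag{6}$$
Solving for $t$ and $s$ gives
$$t = \frac{(x_4 - x_3)(y_1 - y_3) - (y_4 - y_3)(x_1 - x_3)}{(y_4 - y_3)(x_2 - x_1) - (x_4 - x_3)(y_2 - y_1)}, \qquad s = \frac{(x_2 - x_1)(y_1 - y_3) - (y_2 - y_1)(x_1 - x_3)}{(y_4 - y_3)(x_2 - x_1) - (x_4 - x_3)(y_2 - y_1)}, \tag{7}$$
where $t$ and $s$ are the parameter values at which the two lines meet, and the corresponding crossing point is
$$(x, y) = \big(x_1 + t(x_2 - x_1),\; y_1 + t(y_2 - y_1)\big). \tag{8}$$
In Equation (8), the two line segments cross each other only when both $t$ and $s$ lie in $[0, 1]$. After detecting that the object has crossed the pre-assigned line, we decide whether the object is moving in the inbound or outbound direction using the angle θ between the two lines, where the inbound direction is considered an illegal intrusion (see Figure 3).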
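The crossing test above translates directly into plain Python. The sketch below is a hedged illustration with hypothetical function names. Because the exact form of the angle-θ rule is not spelled out above, the direction test substitutes the sign of sin θ, computed from the 2-D cross product of line 1 and the trajectory, and treats a counterclockwise turn from the directed line P1 to P2 as inbound; that convention is an assumption a user would fix when assigning the line (in image coordinates, where y points down, the on-screen sense is mirrored).

```python
from typing import Optional, Tuple

Point = Tuple[float, float]


def crossing(p1: Point, p2: Point, p3: Point, p4: Point) -> Optional[Tuple[float, float, Point]]:
    """Return (t, s, crossing point) if segment P1P2 and trajectory P3P4 cross, else None."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, p3, p4
    den = (y4 - y3) * (x2 - x1) - (x4 - x3) * (y2 - y1)
    if den == 0:  # parallel or collinear: no unique crossing point
        return None
    t = ((x4 - x3) * (y1 - y3) - (y4 - y3) * (x1 - x3)) / den
    s = ((x2 - x1) * (y1 - y3) - (y2 - y1) * (x1 - x3)) / den
    if not (0.0 <= t <= 1.0 and 0.0 <= s <= 1.0):
        return None  # the infinite lines meet, but outside both segments
    return t, s, (x1 + t * (x2 - x1), y1 + t * (y2 - y1))


def direction(p1: Point, p2: Point, p3: Point, p4: Point) -> str:
    """Hypothetical rule: inbound if sin(theta) > 0, i.e. the trajectory turns counterclockwise from P1->P2."""
    cross = (p2[0] - p1[0]) * (p4[1] - p3[1]) - (p2[1] - p1[1]) * (p4[0] - p3[0])
    return "inbound" if cross > 0 else "outbound"


# Usage: boundary from (0, 0) to (10, 0); the object moves from (5, -2) to (5, 3).
line, track = ((0, 0), (10, 0)), ((5, -2), (5, 3))
hit = crossing(*line, *track)
if hit is not None:
    print(hit, direction(*line, *track))  # (0.5, 0.4, (5.0, 0.0)) inbound
```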

Sound Field Sensor-Based Intrusion Detection
To detect abnormal events such as intrusion or fire without any blind zones, we use a sound field sensor. Sound field sensor technology is also advantageous compared with a simple sound threshold method using a microphone. Intrusion events that generate sound, such as footsteps or breaking windows, can be detected by the sound threshold method. In contrast, the proposed sound field sensor system can also detect intrusion events that generate no sound, because it detects changes in the sound transfer function between a speaker and a microphone that depend on the configuration of objects and the boundary condition of the secured space. Frequent false alarms are another bottleneck of the sound threshold method, whereas the sound field sensor reduces false alarms, even in the presence of environmental noise, by using sinusoidal sounds of optimized intensity. A sound field sensor generally consists of a sound generator, a microphone, a speaker, and a signal processor [4,45]. We first generate a multi-tone sinusoidal sound. Next, the microphone measures the sound, and the recorded signal is transformed into the frequency domain. The transformed sound spectrum is then compared and analyzed in terms of its temporal periodicity.
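The generate, record, and transform steps can be sketched with NumPy as follows. Everything specific here is an assumption: the 48 kHz sample rate, the four tone frequencies (the text does not list them; these sit inside the 500 Hz to 6 kHz speaker band quoted in Section 6), the 0.1 s frame, the amplitude, and the function names. The frame length is chosen so that every tone falls exactly on an FFT bin, which avoids spectral leakage; the per-tone dB ratio is the magnitude of the log-scale transfer function defined below, with phase discarded.

```python
import numpy as np

FS = 48_000                               # sample rate in Hz (assumption)
TONES = (1000.0, 2000.0, 3000.0, 4000.0)  # hypothetical tones inside the 500 Hz-6 kHz speaker band
N = 4800                                  # 0.1 s frame: 10 Hz FFT bins, so every tone falls on a bin


def multitone(amplitude=0.2, n=N, fs=FS, tones=TONES):
    """Source signal q(t): a sum of equal-amplitude sinusoids sent to the speaker."""
    t = np.arange(n) / fs
    return sum(amplitude * np.sin(2 * np.pi * f * t) for f in tones)


def tone_amplitudes(x, fs=FS, tones=TONES):
    """Single-sided amplitude of x at each tone, read from that tone's FFT bin."""
    spectrum = np.fft.rfft(x)
    bins = [int(round(f * len(x) / fs)) for f in tones]
    return np.abs(spectrum[bins]) * 2.0 / len(x)


def transfer_db(mic, src, fs=FS, tones=TONES):
    """Per-tone magnitude of the transfer function, 20 log10(|P| / |Q|), in dB."""
    return 20.0 * np.log10(tone_amplitudes(mic, fs, tones) / tone_amplitudes(src, fs, tones))


# Usage: a simulated "room" that halves every tone gives about -6.02 dB at each frequency.
q = multitone()
print(np.round(transfer_db(0.5 * q, q), 2))
```

A real microphone signal would also contain delay, reverberation, and noise; the ideal scaling here only checks that the per-tone measurement chain is consistent.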
A general security space is an acoustic space governed by the sound wave equation with boundary conditions given by the initial pressure $P(r_0)$ and velocity $U(r_0)$, as shown in Figure 4. We can estimate the transfer function of the security space by comparing the input signal to the speaker with the output of the microphone. If an object enters the security area, the sound wave is distorted by refraction, reflection, and absorption. Such abnormal events can be expressed in terms of the transfer function, as shown in Figure 5. Let the sound source be $q(t)$ with Laplace transform $Q(s)$. If $H(s)$ represents the transfer function of the space and $P(s)$ the sound pressure at a certain position, the transfer function of the sound field can be expressed in log scale as
$$X = 20\log_{10} H(s) = 20\log_{10}\frac{P_{\mathrm{rms}}(s)}{Q(s)}. \tag{11}$$
When an object intrudes into the security space, the sound field is distorted and the transfer function of the space becomes
$$X' = 20\log_{10} H'(s) = 20\log_{10}\frac{P'_{\mathrm{rms}}(s)}{Q(s)}. \tag{12}$$
The difference between Equations (11) and (12) and its magnitude are given by
$$Y = X' - X = 20\log_{10}\frac{P'_{\mathrm{rms}}(s \to j\omega)}{P_{\mathrm{rms}}(s \to j\omega)}, \tag{13}$$
$$|Y| = \left|20\log_{10}\frac{P'_{\mathrm{rms}}(j\omega)}{P_{\mathrm{rms}}(j\omega)}\right|, \tag{14}$$
which is equivalent to the difference between the sound pressure levels before and after the intrusion, since the source term $Q(s)$ cancels. In other words, the absolute magnitude of the sound pressure ratio is used to detect an intrusion in the security space. To detect an intrusion, we devised an analysis algorithm that monitors the deviation between the measured sound pressure level and the reference level over the multi-tone frequencies, as shown in Figure 6. More specifically, we used a signal-to-noise ratio (SNR), where the signal represents the difference between the measured and reference sound pressure levels, and the noise represents the maximum deviation among multiple measurements of the reference sound pressure level, as shown in Figure 7. SNR is the average of S/N over the $K$ multi-tone frequencies $f_k$:
$$\mathrm{SNR} = \frac{1}{K}\sum_{k=1}^{K}\frac{S_{\text{int-ref}}(f_k)}{N_{\text{ref}}(f_k)}, \tag{15}$$
where $S_{\text{int-ref}}$ is the difference between the averaged $TF_{\text{int}}$ and $TF_{\text{ref}}$, $TF_{\text{int}}$ is the transfer function in the intrusion state, $TF_{\text{ref}}$ is the transfer function in the reference state, and $N_{\text{ref}}$ is the difference between the maximum and minimum of the reference sound pressure level. We calculated S/N at every multi-tone frequency. If SNR is larger than a pre-specified threshold, an intrusion event is detected. The threshold was set to 1 because SNR exceeds 1 whenever the signal S exceeds the noise N at every frequency.
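Following the SNR definition above, the decision can be sketched in plain Python. The frame layout (one list of per-tone levels in dB per measurement), the `noise_floor` guard against a perfectly stable reference, the function names, and the example levels are all assumptions made for illustration; the threshold of 1 is taken from the text.

```python
from statistics import mean


def sound_field_snr(ref_frames, test_frames, noise_floor=1e-6):
    """Average S/N over the multi-tone frequencies.

    Each frame is a list of per-tone transfer-function levels in dB.
    noise_floor is an assumption that avoids dividing by a zero-spread reference.
    """
    ratios = []
    for k in range(len(ref_frames[0])):
        ref_k = [frame[k] for frame in ref_frames]
        test_k = [frame[k] for frame in test_frames]
        signal = abs(mean(test_k) - mean(ref_k))           # S_int-ref at tone k
        noise = max(max(ref_k) - min(ref_k), noise_floor)  # N_ref at tone k (max - min)
        ratios.append(signal / noise)
    return mean(ratios)


def intrusion_detected(snr, threshold=1.0):
    """Declare an intrusion when the averaged S/N exceeds the threshold of 1."""
    return snr > threshold


# Usage with made-up levels for two tones: three reference measurements, then one test.
reference = [[-10.0, -12.0], [-10.5, -12.4], [-10.2, -12.2]]
print(intrusion_detected(sound_field_snr(reference, [[-13.0, -10.0]])))  # True
print(intrusion_detected(sound_field_snr(reference, [[-10.1, -12.3]])))  # False
```

Normalizing by the reference spread makes the threshold scale-free: a tone whose level naturally fluctuates a lot needs a proportionally larger change before it counts as evidence of intrusion.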

Hybrid Sensor-Based Intrusion Detection
We present a hybrid sensor-based surveillance system that combines an image sensor and a sound field sensor to detect intrusion in a sealed indoor space without illumination and without blind zones. The overall block diagram of the proposed system is shown in Figure 8. In a dark, empty indoor space, the sound field sensor first detects intruding events using the multi-tone sound field transfer function. To reduce the power consumption of the sound sensor, the multi-tone signal is emitted with only a 15% duty cycle, as shown in Figure 9.
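The effect of the 15% duty cycle on speaker energy can be sketched as a simple burst schedule. The burst period, the 1 W drive power, and the assumption that the speaker draws power only while a burst is playing are hypothetical; only the 15% figure comes from the text. Under these assumptions the average speaker power is simply the duty cycle times the on-power.

```python
def burst_schedule(period_s, duty, total_s):
    """(start, stop) times of the multi-tone bursts: on for duty * period_s in each period."""
    on_s = duty * period_s
    bursts = []
    k = 0
    while k * period_s < total_s:
        start = k * period_s
        bursts.append((start, min(start + on_s, total_s)))
        k += 1
    return bursts


def speaker_energy_j(power_w, bursts):
    """Energy if the speaker draws power_w only while a burst is playing (assumption)."""
    return power_w * sum(stop - start for start, stop in bursts)


# Usage with a hypothetical 2 s period and 1 W drive over 10 s.
bursts = burst_schedule(period_s=2.0, duty=0.15, total_s=10.0)
print(len(bursts), round(speaker_energy_j(1.0, bursts), 3))  # 5 bursts, 1.5 J versus 10 J if always on
```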
After the sound field sensor detects an intruding object, the proposed system turns on the light so that the image sensor can start tracking the detected object. The image sensor-based tracking algorithm is described in Section 3.
Figure 9. The multi-tone signal with a 15% duty cycle.
Data from the image and sound field sensors can be fused in several ways. For example, combining the two sensors with an 'OR' decision rule improves the detection sensitivity, whereas an 'AND' decision rule reduces false alarms. The proposed intrusion detection algorithm uses the 'AND' rule:
$$\text{decision} = \begin{cases} \text{intrusion event}, & \text{if visual sensor} = \text{true and sound field sensor} = \text{true},\\ \text{no intrusion event}, & \text{otherwise}. \end{cases} \tag{16}$$
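The two decision rules can be written as a small function. This is a plain-Python sketch with hypothetical names; it defaults to the 'AND' rule of Equation (16) and keeps 'OR' as an option for applications that favor sensitivity over false-alarm suppression.

```python
from enum import Enum


class FusionRule(Enum):
    AND = "and"  # both sensors must agree: fewer false alarms (the rule in Equation (16))
    OR = "or"    # either sensor suffices: higher detection sensitivity


def fuse(visual: bool, sound_field: bool, rule: FusionRule = FusionRule.AND) -> bool:
    """Combine the two binary sensor decisions into one intrusion decision."""
    if rule is FusionRule.AND:
        return visual and sound_field
    return visual or sound_field


# Truth table: visual, sound field, AND decision, OR decision.
for v, s in [(True, True), (True, False), (False, True), (False, False)]:
    print(v, s, fuse(v, s), fuse(v, s, FusionRule.OR))
```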

Experimental Results
The microphone and speaker used to implement the sound field sensor do not need to be of high quality. A microphone with a sensitivity of 5 mV/Pa and an S/N ratio of 58 dB over the frequency range of 500 Hz to 8 kHz is sufficient, and a speaker with a sound pressure level (SPL) of 96 dB @ 10 cm @ 1 W over the frequency range of 500 Hz to 6 kHz is sufficient. Most commercially available microphones and speakers embedded in CCTV or smartphone cameras can be used to implement a sound field sensor. A Texas Instruments (Dallas, TX, USA) TMS320C674x digital signal processing (DSP) chip, or an Advanced RISC Machine (ARM) processor with similar performance, is sufficient for the sound field signal processing, including data acquisition and sound analysis with the fast Fourier transform (FFT).
To demonstrate its performance, we tested the proposed intrusion detection system in a residential area, as shown in Figure 10. The test area has an empty, sealed room whose lighting is controlled by the proposed intrusion detection system. We set the moment of system initialization as the reference time (t = 0 s), and an intruder enters the room two seconds later (t = 2.0 s). The sound field sensor test results are summarized in Figures 11-13. During the sound field sensor test, we measured the sound pressure level at the multi-tone frequencies every 3.5 s; the results are shown in Figures 11a, 12a and 13a. The S/N values of the sound pressure level at each multi-tone frequency are shown in Figures 11b, 12b and 13b.
The final detection results in terms of SNR are shown in Figures 11c, 12c and 13c, where the detection threshold was set to 1 because SNR exceeds 1 whenever the signal S exceeds the noise N at every frequency. As shown in these figures, the system was initialized at t = 0, and the intruder entered the room at t = 2.0 s. Since the image and sound field sensors have different initialization times, we started the test events only after all components were operating normally; this two-second delay is acceptable for a practical implementation. The system detects the intrusion at t = 3.5 s because it measures the sound pressure level every 3.5 s, as shown in Figures 11a, 12a, 13a, 11b, 12b and 13b. The reliability of the sound field variation-based intrusion detection was tested in 20 trials with a similar intrusion scenario. After the system first detects the intruder, it turns on the light in the room so that the image sensor can take over the tracking task, as shown in Figure 14.
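The detection time follows from the measurement schedule alone. The sketch below uses a simplified timing model (an assumption): measurements are instantaneous and occur at multiples of the 3.5 s period starting from initialization, and acquisition and processing time are ignored. Under this model the detection latency is at most one measurement period.

```python
import math


def first_detection_time(t_event, period_s, t0=0.0):
    """Earliest measurement instant t0 + k * period_s at or after the intrusion."""
    k = math.ceil((t_event - t0) / period_s)
    return t0 + k * period_s


print(first_detection_time(2.0, 3.5))  # 3.5, matching the experiment
print(first_detection_time(3.6, 3.5))  # 7.0: the worst-case latency approaches one period
```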
As shown in Figures 11c, 12c and 13c, since the sound field sensor detects the intrusion 3.5 s after initialization, the light is turned on at t = 3.5 s, as shown in Figure 14c. The image sensor-based detection was carried out using a fixed video camera without an auto white balancing (AWB) function, and the detection algorithm uses a single background image and frame differences. In Figures 11c, 12c and 13c, the SNR value exceeds the detection threshold, which means that the intruder is in the room. In Figure 14d, the image sensor detects the intrusion event because the intruder crosses the pre-assigned line. As shown in Figures 11c, 12c, 13c and 14d, the system declares an intrusion event when both the sound field sensor and the image sensor detect the intrusion. The proposed decision process is shown in Figure 15.
The image sensor-based methods in [5-20,46,47] detected intruding objects in general environments with normal illumination. Although many object detection systems adopt a low-light enhancement algorithm as preprocessing [2,3], such systems still cannot work properly if there is no illumination at all. Even with a very small amount of illumination, a general low-light enhancement algorithm usually amplifies noise along with image intensity. In contrast, the proposed system uses the sound field sensor to physically turn on the light when there is no illumination. As a result, intrusion can be detected with a simple image-based detection algorithm, without a complicated low-light enhancement algorithm. Although infrared sensors have been used to detect intrusion under low illumination [16,48], they cannot detect intrusion in blind spots. To address the blind spot problem, Refs. [49,50] placed passive infrared (PIR) sensors in all blind spots; however, this is too expensive for practical applications.
An intrusion detection method using a thermal camera [51] has also been proposed for low-light conditions. However, it is sensitive to temperature and cannot detect intrusion in blind spots. The proposed algorithm, by contrast, can detect intrusion events within the bounded secured space by using the sound field sensor, even if an object intrudes behind a wall under low-light conditions, as shown at t = 3.5 s in Figure 15. In Refs. [33,34,52], an acoustic sensor can detect an abnormal situation even without light, but it requires additional abnormal-sound data for detection and is very sensitive to sound intensity and surrounding noise. The proposed system detects the intruding object using only its own sound signals, without additional abnormal-sound data. An object detection system using optical fiber [53,54] can provide accurate intrusion detection even in low-light conditions, but it is not suitable for indoor intrusion detection. The proposed system, on the other hand, provides accurate intrusion detection in low-illumination environments and blind spots using an inexpensive, low-quality microphone and speaker together with an image sensor.
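The image-sensor stage in these experiments relied only on a single background image and frame differences, once the light was on. A minimal NumPy sketch of that step follows, assuming 8-bit grayscale frames and a background captured with the light on before monitoring begins (the text does not say how the background was obtained). The threshold, minimum blob size, and function names are hypothetical. The returned centroid is the kind of object center that could feed the line-crossing test of Section 3.

```python
import numpy as np


def foreground_mask(frame, background, threshold=25):
    """Pixels whose absolute difference from the single background exceeds threshold."""
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))  # int16 avoids uint8 wrap-around
    return diff > threshold


def object_center(mask, min_pixels=50):
    """Centroid (row, col) of the foreground, or None if the blob is too small."""
    rows, cols = np.nonzero(mask)
    if rows.size < min_pixels:
        return None
    return float(rows.mean()), float(cols.mean())


# Usage: an empty, lit room as background and a 10x10 bright "intruder" in the frame.
background = np.zeros((100, 100), dtype=np.uint8)
frame = background.copy()
frame[40:50, 60:70] = 200
print(object_center(foreground_mask(frame, background)))  # (44.5, 64.5)
```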

Conclusions
The proposed hybrid sensor-based surveillance system can robustly detect an illegal intrusion by exploiting the advantages of both sound field and image sensors. Many security systems already include a microphone and a speaker as well as an image sensor. For that reason, the sound field sensing algorithm can be embedded into a DSP module without significantly increasing the cost. Existing image sensor-based surveillance systems cannot avoid mis-detection or failure due to unstable illumination such as low contrast, darkness, flickering, or noise.
To solve the problems of image sensor-based detection, the proposed system adopts a sound field sensor to detect illegal intrusion even without sufficient illumination. The hybrid system first detects an intruding event using the sound field sensor by analyzing the multi-tone frequency spectrum of the sound pressure and the distortion in the transfer function of the sound field. Next, the system automatically turns on the light based on the detected event. The image sensor then takes over the detection task from the sound field sensor for seamless, accurate analysis and tracking of the intruding object. As a result, the proposed surveillance system has two major advantages: (i) it is power-efficient, since the illuminating light does not need to be turned on for the initial detection; and (ii) it has no blind spots, since the sound field sensor can detect an intruding object behind an obstacle. On the other hand, the source signal may produce an unpleasant audible noise; however, this is not a serious problem because detection operates only in the absence of people. The proposed hybrid sensor network-based system can be used for low-power, robust surveillance for various security purposes, such as an office after work hours, an empty automobile, a safe room in a bank, and an armory. The proposed surveillance system can also provide additional functions such as sending alarm messages to a smartphone.
Author Contributions: H.P., J.P., H.K. and K.-H.P. initiated the research and designed the experiment. K.-H.P. and S.Q.L. evaluated the performance of the proposed algorithm. J.P. wrote the paper.