Intelligent Machine Vision for Automated Fence Intruder Detection Using Self-organizing Map

Abstract: This paper presents an intelligent machine vision system for automated fence intruder detection. A series of still images containing fence events, captured using Internet Protocol cameras, was used as input data to the system. Two classifiers were used: the first classifies human posture and the second classifies intruder location. Both classifiers were implemented using a Self-Organizing Map after several image segmentation processes were applied. The Human Posture Classifier classifies the detected subject's posture patterns from the subject's silhouette. The Intruder Localization Classifier estimates the location of the intruder with respect to the fence, using geometric features from the images as inputs. When an intruder is detected, the system activates the alarm, displays the actual image, and depicts the location of the intruder. In detecting intruder posture, the system achieved a success rate of 88%. Overall system accuracy is 83% for day-time intruder localization and 88% for night-time intruder localization.


I. INTRODUCTION
Everyone wants a safe home, property, or office. A Self-Organizing Map classifier for vision-based automatic intruder detection, built on the Artificial Neural Network (ANN) approach, is intended to help defend residences and property against thieves, intruders, and burglars. Crime is increasing day by day, so the demand for reliable, fast, and accurate security systems is rising rapidly. Whether or not a person has been a victim of crime, the mere thought of an unwanted visitor lurking around the house can make him cringe. Many home security systems have proven effective in preventing home burglaries. They are grouped according to their usage and the technical features they offer. There are magnetic, electric circuit, and motion detecting systems, infrared systems, and wireless security systems. There are systems with security cameras, electric fences, and guard dog perimeters. Other systems, called X10 security systems, communicate via household electric wiring, and some are operated remotely via the Internet.
Although the security systems mentioned above offer remarkable results, none of them can identify and analyze the current situation. They simply give a warning when an emitted signal or an established circuit is disturbed. This type of security system usually has a high error rate or a high false alarm rate.
Establishments such as banks, airports, casinos, convenience stores, and military installations use surveillance cameras to monitor and record activities both inside and outside the building, but these are not really effective in preventing break-ins. The recorded events can serve as firm evidence for catching and prosecuting the burglar. Hidden cameras are an improvement, but they still have no power to stop a crime. Observing the visual display or video stream of surveillance cameras (i.e., Closed-Circuit Television (CCTV) cameras) can be time consuming because the activity captured by every installed camera is shown on one monitor at the same time. If 9 cameras are installed in a building, the person assigned to observe the situation will surely have difficulty watching all the video streams at once, which also makes the system ineffective in preventing burglaries. This type of system is even more vulnerable to intruders when the assigned person falls asleep.
Accordingly, the study focuses on the development of an intelligent machine vision fence intruder detection system using a Self-Organizing Map. It aims to identify the intrusion level within the set perimeter of a building or residence with a high detection rate and a low false alarm rate. It also aims to make the job of the assigned person much easier by providing a simple but accurate visual display based on the intruder position. The system triggers the alarm in case the assigned person falls asleep.

II. SYSTEM ARCHITECTURE
The system classifies the still images acquired from the data acquisition unit as Not Intruder, Potential Intruder, Intruder Level-1 (L-1), or Intruder Level-2 (L-2), using two classifiers, the Human Pose Intruder Classifier and the Intruder Localization Classifier, after applying several image segmentation steps. The system then triggers the alarm and displays the actual location and position of the intruder. Intruder localization data were subdivided into three groups: potential intruder data, Level-1 intruder data, and Level-2 intruder data. A potential intruder is an object identified as having human properties that the system's classification technique labels as one that may or may not be an intruder. Similarly, a Level-1 (L-1) intruder and a Level-2 (L-2) intruder are objects classified as human and located at a specific point on the fence (see Figure 5).
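The way the two classifier outputs combine into one of the four labels can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the names `Alert`, `classify_frame`, and `should_alarm` are hypothetical, and the rule that only Level-1 and Level-2 raise the alarm is an assumption made for the example.

```python
from enum import Enum


class Alert(Enum):
    NOT_INTRUDER = "Not Intruder"
    POTENTIAL = "Potential Intruder"
    LEVEL_1 = "Intruder Level-1"
    LEVEL_2 = "Intruder Level-2"


def classify_frame(is_human, location):
    """Combine the pose result (bool) with the localization result (Alert or None)."""
    # An object is only reported as an intruder when the pose classifier
    # says "human" AND the localization classifier returns a level.
    if not is_human or location is None:
        return Alert.NOT_INTRUDER
    return location


def should_alarm(alert):
    # Assumption: only confirmed intrusion levels raise the alarm.
    return alert in (Alert.LEVEL_1, Alert.LEVEL_2)
```

Keeping the decision rule separate from the classifiers means either map can be retrained without touching the alarm logic.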
Unlike human recognition ability, computer recognition ability is largely limited to numbers. Accordingly, the developed system viewed the input data, in the form of still images, as a series of numbers that represented every color, line, and shape in the picture. Part of the system goal was to extract the human and the fence from the given data and make them available for further processing. A static method of object extraction was used to extract the fence from the input image. The fence was modeled using four line segments defined by four corner points. From the four corner points, the system calculated each image pixel at position $(x_i, y_i)$ that was part of the fence model using the equation of a line,

$$y_i = m x_i + b \qquad (1)$$

where the slope is $m = (y2_i - y1_i) / (x2_i - x1_i)$, and the pixel y-coordinate $y_i(x_i)$ was evaluated for $x1_i < x_i < x2_i + 1$ using a constant interval of 1.
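The fence model described above can be sketched in a few lines. This is an illustrative reading of the text, not the original code: the function names `line_pixels` and `fence_model` are hypothetical, and the handling of vertical segments (where the slope is undefined) is an assumption the paper does not spell out.

```python
def line_pixels(p1, p2):
    """Integer pixels on the segment p1 -> p2 using y = m*x + b, stepping x by 1."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        # Vertical segment: slope is undefined, so step along y instead (assumption).
        step = 1 if y2 >= y1 else -1
        return [(x1, y) for y in range(y1, y2 + step, step)]
    if x1 > x2:
        # Walk left to right so that range() is ascending.
        (x1, y1), (x2, y2) = (x2, y2), (x1, y1)
    m = (y2 - y1) / (x2 - x1)   # slope
    b = y1 - m * x1             # intercept
    return [(x, round(m * x + b)) for x in range(x1, x2 + 1)]


def fence_model(corners):
    """Union of the pixels on the closed polygon through the given corner points."""
    pixels = set()
    for i, corner in enumerate(corners):
        pixels.update(line_pixels(corner, corners[(i + 1) % len(corners)]))
    return pixels
```

For example, `fence_model([(0, 0), (4, 0), (4, 3), (0, 3)])` returns the outline pixels of a small rectangle; in practice the four corners would be picked once for the fixed camera view.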
The foreground extractor eliminated unwanted data and extracted the target object from the given image. A human intruder most likely falls into the category of a moving object, since it constantly changes its position over time. Given a single video frame as input data, the value of the image pixel at position $(x, y)$ was observed over a certain period of time. This value was referred to as $I_{x,y}(t)$ and treated as a random process of variable $X_t$:

$$X_t = I_{x,y}(t) \qquad (2)$$

The current pixel value was modeled as a mixture of $K$ Gaussian distributions:

$$P(X_t) = \sum_{i=1}^{K} \pi_i \, N(X_t;\ \mu_{i,t}, \Sigma_{i,t}) \qquad (3)$$

where $\pi_i$ is the estimated weight of the $i$-th Gaussian, and $N$ is the evaluation of a standard Gaussian with mean $\mu_{i,t}$ and covariance matrix $\Sigma_{i,t}$:

$$N(X_t;\ \mu_{i,t}, \Sigma_{i,t}) = \frac{1}{(2\pi)^{n/2} \, |\Sigma_{i,t}|^{1/2}} \exp\left(-\tfrac{1}{2} (X_t - \mu_{i,t})^{T} \Sigma_{i,t}^{-1} (X_t - \mu_{i,t})\right) \qquad (4)$$

with $n$ the dimension of $X_t$. Based on the calculated Gaussian values, each pixel of the input data was grouped as either a background pixel or a foreground pixel, and the background subtraction process was then implemented.
The procedural approach used in foreground detection was as follows. For each pixel in a video frame:
• For every N values taken from the pixel:
• Find the K Gaussians and weights that best fit the N sample values using the Expectation Maximization (EM) algorithm.
• Find the Gaussian with the largest weight and store its mean as the value of the background image for that pixel.
• Subtract the background image from the frame.
In the resulting difference image, any value larger than three standard deviations from the mean was considered foreground, and any other value was considered background.
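The procedure above can be sketched for grayscale frames stored as nested lists. This is a simplified illustration under stated assumptions, not the system's code: the pixel values are treated as scalars (so each Gaussian is 1-D), the EM initialization and the variance floor `var_floor` are choices made for the example, and the threshold is read as "absolute difference greater than the mean plus three standard deviations of the difference image". All function names are hypothetical.

```python
import math


def fit_gmm_1d(samples, k=2, iters=50, var_floor=1.0):
    """Fit a 1-D mixture of k Gaussians with EM; return (weights, means, variances)."""
    lo, hi = min(samples), max(samples)
    # Spread the initial means evenly over the observed range (assumption).
    means = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    variances = [max(((hi - lo) / k) ** 2, var_floor)] * k
    weights = [1.0 / k] * k
    n = len(samples)
    for _ in range(iters):
        # E-step: responsibility of each Gaussian for each sample.
        resp = []
        for x in samples:
            p = [w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            resp.append([pj / total if total > 0 else 1.0 / k for pj in p])
        # M-step: re-estimate weight, mean and variance of each Gaussian.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-9:
                continue  # component has no support; leave it unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, samples)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, samples)) / nj
            variances[j] = max(spread, var_floor)
    return weights, means, variances


def background_image(history, k=2):
    """Per pixel: mean of the heaviest Gaussian fitted to that pixel's history."""
    rows, cols = len(history[0]), len(history[0][0])
    bg = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            weights, means, _ = fit_gmm_1d([f[r][c] for f in history], k)
            bg[r][c] = means[max(range(k), key=lambda j: weights[j])]
    return bg


def foreground_mask(frame, background):
    """1 where |frame - background| exceeds mean + 3 standard deviations, else 0."""
    diff = [[abs(f - b) for f, b in zip(fr, br)] for fr, br in zip(frame, background)]
    flat = [d for row in diff for d in row]
    mean = sum(flat) / len(flat)
    std = math.sqrt(sum((d - mean) ** 2 for d in flat) / len(flat))
    return [[1 if d > mean + 3 * std else 0 for d in row] for row in diff]
```

A production system would more likely update the mixture incrementally per frame instead of refitting with EM, but the batch form follows the listed steps most directly.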
Feature vector extraction and formulation was the final stage of eliminating unwanted data, refining and finalizing the target object using several image segmentation techniques. The techniques used in this stage are depicted in Figure 6.
For simplification, the input data was converted into a binary image with values 0 and 1. A value of 0 represented an unwanted pixel, and 1 represented a pixel of interest that formed part of the structure of the target objects. Blob detection was applied to the input images to detect points or regions that differ in properties such as brightness or color from their surroundings. In this study, the value of the individual pixels and the minimum blob area of 250 pixels were the properties used as the basis for region detection. Blob area analysis was realized by counting the number of touching pixels, that is, pixels that share a side and have a value of 1. Image re-composition aimed to create a new image representation containing only the detected regions of interest generated by the blob detection stage. At this point the processed image still contained unwanted data and was still considered noisy, but it was much more refined than the data produced after region filtration. The image morphology correction technique used in the project was a collection of non-linear operations related to the shape of the target image. Its goal was to correct imperfections or distorted pixels in the target object. Even after pixel correction, parts of the image still needed further refinement that the earlier pixel distortion correction did not provide. Filling of holes was one such concern. An image hole refers to pixels with a value of 0 that are surrounded by pixels with a value of 1.
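The blob filtering and hole filling steps described above can be sketched on a binary image stored as nested lists. This is a minimal sketch, assuming 4-connectivity (pixels that share a side) for both blobs and holes; the function names `flood`, `keep_large_blobs`, and `fill_holes` are hypothetical, and only the 250-pixel minimum area comes from the text.

```python
from collections import deque

NEIGHBORS_4 = ((1, 0), (-1, 0), (0, 1), (0, -1))


def flood(image, start, value, seen):
    """Collect the 4-connected region of pixels equal to `value` containing `start`."""
    rows, cols = len(image), len(image[0])
    queue, region = deque([start]), []
    seen.add(start)
    while queue:
        r, c = queue.popleft()
        region.append((r, c))
        for dr, dc in NEIGHBORS_4:
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in seen and image[nr][nc] == value):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return region


def keep_large_blobs(image, min_area=250):
    """Re-compose the image from blobs whose area is at least `min_area` pixels."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    seen = set()
    for r in range(rows):
        for c in range(cols):
            if image[r][c] == 1 and (r, c) not in seen:
                blob = flood(image, (r, c), 1, seen)
                if len(blob) >= min_area:
                    for br, bc in blob:
                        out[br][bc] = 1
    return out


def fill_holes(image):
    """Set to 1 every 0-pixel that cannot be reached from the image border."""
    rows, cols = len(image), len(image[0])
    outside = set()
    for r in range(rows):
        for c in range(cols):
            on_border = r in (0, rows - 1) or c in (0, cols - 1)
            if on_border and image[r][c] == 0 and (r, c) not in outside:
                flood(image, (r, c), 0, outside)
    return [[1 if image[r][c] == 1 or (r, c) not in outside else 0
             for c in range(cols)] for r in range(rows)]
```

Calling `fill_holes(keep_large_blobs(binary_image))` applies the two steps in the order the paragraph describes them.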
An unwanted blob was identified by counting the number of ones within the detected blob, or equivalently by applying the minimum blob area. A human blob had a greater number of ones, and thus a greater area, than an unwanted blob, which made unwanted blobs easier to identify and eventually remove from the target object. Object contour extraction involved data conversion or transformation and was used to extract the object contour in preparation for object classification. Only the object's outline details were considered, instead of using the entire object blob as the object representation, as shown in Figure 7(b). The silhouette, or object contour, was therefore the refined object representation taken from the object blob.
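One simple way to reduce a blob to its outline is to keep only the foreground pixels that touch the background. The sketch below assumes that definition (with 4-connectivity and the image border counted as background); the paper does not state which contour method was used, and `extract_contour` is a hypothetical name.

```python
def extract_contour(mask):
    """Keep foreground pixels that have at least one 4-neighbor in the background."""
    rows, cols = len(mask), len(mask[0])

    def is_background(r, c):
        # Pixels outside the image count as background (assumption).
        return not (0 <= r < rows and 0 <= c < cols) or mask[r][c] == 0

    contour = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] == 1 and any(
                    is_background(r + dr, c + dc)
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))):
                contour[r][c] = 1
    return contour
```

The contour carries far fewer pixels than the filled blob, which is what makes the later feature vector smaller.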
The Feature Vector was the data structure that contained all the unique details of the target object needed for object classification. The Feature Vector dimension was formulated using Equation 5 and Equation 6.
[Equations 5 and 6 and the definitions of their symbols are illegible in the source.]
Furthermore, the Feature Vector dimension can be reduced to half of its size by using the distance formula from the equation of a line (Equation 7) to calculate the unique attribute of the object.
The number of rows of the Feature Vector dimension was taken from the total number of object details from Equation 7.
The accuracy of the newly created classifiers was tested using the data specified below.
For the Human Pose Intruder Classifier:
• 1000 samples for intruder human pose
• 1000 samples for non-human
For the Intruder Localization Classifier (both daytime and nighttime):
• 100 samples per meter in potential intrusion, total of 700 samples
• 100 samples per meter in level-1 intrusion, total of 700 samples
• 100 samples per meter in level-2 intrusion, total of 700 samples
A total of 2100 data samples were used to test the accuracy of the Intruder Localization Classifier, and a total of 2000 samples were used to test the Human Pose Intruder Classifier.

III. TESTING RESULTS
Given the constant change of light intensity and the instability of the image background, eliminating unwanted data was never easy. During noise reduction, the system could damage the desired object pixels. A detected object was classified as a human intruder only if it was positively identified as human and its location belonged to one of the intruder localization classes. Classifying the detected object was also possible using a single classifier that merged the object details and the object localization into one object feature vector. The data used in mapping were the same data used during map development. The results are presented according to intruder level and horizontal distance from the camera. The intruder hit sample results showed a clear boundary between the three intruder level classifications. As expected, the Level-1 intruder classification lay close to both the potential intruder level and the Level-2 intruder level. The Level-2 intruder and potential intruder levels, shown in Figures 10 and 11, were separated by a clear boundary. This implies that a potential intruder could hardly be classified as a Level-2 intruder and vice versa, but could possibly be classified as a Level-1 intruder. However, there were a few occurrences of classification fluctuation, as shown in Figure 10(c); these usually happened when the noise reduction process failed to preserve some important object details.
Similar to the night-time hit sample results, the Level-2 intruder classification was expected to neighbor the Level-1 intruder classification, and the Level-1 classification to neighbor the potential intruder classification. A Level-2 intruder could possibly be classified as Level-1 or vice versa. Additionally, based on the hit sample results in Figure 11, an intruder-level sample could rarely be classified as a potential intruder.
The first experiment was done using 1000 images each for the human and not-human data samples. The human samples were acquired using the same acquisition device used in the actual experiments, and all of them contained the pose of a human intruder at different levels and distances. The data used to represent the not-human pose were taken from the Internet and were composed of basic geometrical shapes and animal shapes that were already in masked format. The Human Pose Classifier, trained for 10 thousand epochs, achieved a hit rate of 0.884 and a false rate of 0.116. The Intruder Localization Classifier, trained for 30 thousand epochs, achieved a hit rate (α) of 0.8942857 and a false hit rate (β) of 0.105714286. The hit rate ranges between 0 and 1 (0 <= α <= 1). A value close to 1 means a high hit rating, while a value of 0.5 is the neutral hit rating and implies a system accuracy of 50 percent (50%). Figure 13 presents sample images that were falsely identified as not human, and Figure 14 presents sample images falsely identified as human from the set of not-human data samples. Figure 12 and Figure 13 are sample results from the simulated experiment using a 10 x 10 map trained 10 thousand times (10K epochs). To compensate for the deficiency of the Human Pose Intruder Classifier, the Intruder Localization Classifier was used to confirm the position of the identified human. Non-human objects were mostly located below or at a distance from the fence, whereas human objects were usually located near or climbing the fence.
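The relation between the hit rate α and the false hit rate β can be made concrete with a short calculation. The correct-classification counts below (1768 of 2000 and 1878 of 2100) are not stated in the paper; they are assumptions inferred from the reported rates and the stated test-set sizes, and `hit_rates` is a hypothetical helper.

```python
def hit_rates(correct, total):
    """Return (alpha, beta): hit rate and false hit rate, where alpha + beta = 1."""
    if total <= 0:
        raise ValueError("total must be positive")
    alpha = correct / total
    return alpha, 1.0 - alpha


# Counts are inferred (assumption) from the reported rates and sample sizes.
pose_alpha, pose_beta = hit_rates(1768, 2000)   # Human Pose Classifier
loc_alpha, loc_beta = hit_rates(1878, 2100)     # Intruder Localization Classifier
print(round(pose_alpha, 3), round(pose_beta, 3))
print(round(loc_alpha, 7), round(loc_beta, 9))
```

Under these assumed counts, the computed values agree with the rates reported in the text to the printed precision.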
In intruder localization classification, false classifications were most likely to occur between the potential intruder and Level-1 intruder classes because their locations were nearly identical. The same was true of Level-1 and Level-2 intruders. The classifier test result summary using 30 thousand epochs and a 10x10 map is shown in Table 1 for the Human Pose Classifier, Table 2 for the night-time Intruder Localization Classifier, and Table 3 for the day-time Intruder Localization Classifier.
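For readers unfamiliar with how a 10x10 Self-Organizing Map is trained and then used as a classifier, a minimal sketch follows. It is a generic SOM, not the authors' network: the linear decay of the learning rate and neighborhood radius, the majority-vote labeling of map nodes, the fallback for unlabeled nodes, and all function names are assumptions. One `iteration` here presents a single randomly drawn sample, which may differ from what the paper counts as an epoch.

```python
import math
import random
from collections import Counter


def best_matching_unit(weights, x):
    """Return the (row, col) of the node whose weight vector is closest to x."""
    best, best_dist = (0, 0), float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            dist = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best


def train_som(data, rows=10, cols=10, iterations=1000, lr0=0.5, seed=0):
    """Train a rows x cols map on feature vectors scaled to roughly [0, 1]."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    sigma0 = max(rows, cols) / 2.0
    for t in range(iterations):
        decay = 1.0 - t / iterations          # linear decay (assumption)
        lr = lr0 * decay
        sigma = max(sigma0 * decay, 0.5)      # neighborhood radius on the grid
        x = rng.choice(data)
        br, bc = best_matching_unit(weights, x)
        for r in range(rows):
            for c in range(cols):
                # Gaussian neighborhood centered on the best matching unit.
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
                w = weights[r][c]
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
    return weights


def label_map(weights, data, labels):
    """Label each node with the majority class of the training samples it wins."""
    votes = {}
    for x, y in zip(data, labels):
        votes.setdefault(best_matching_unit(weights, x), Counter())[y] += 1
    return {node: count.most_common(1)[0][0] for node, count in votes.items()}


def classify(weights, node_labels, x):
    """Classify x by its best matching unit; fall back to the nearest labeled node."""
    node = best_matching_unit(weights, x)
    if node not in node_labels:
        node = min(node_labels,
                   key=lambda n: (n[0] - node[0]) ** 2 + (n[1] - node[1]) ** 2)
    return node_labels[node]
```

A typical use is `weights = train_som(features)`, then `node_labels = label_map(weights, features, labels)`, then `classify(weights, node_labels, new_feature)` for each new image; the iteration count would be raised toward the 10 thousand and 30 thousand figures quoted in the text at the cost of longer training.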
The results in Table 1 depict map accuracy with respect to the number of training epochs. The experiment showed clearly different results for the three tested training epochs: 3k, 5k, and 10k. The results met the expectation that the greater the number of training epochs, the better the classifier performance. For Table 2 and Table 3, the classifier was trained and tested using 10k and 30k training epochs. Here the results were slightly off from expectation: the accuracy of the classifier using 30k epochs was lower by one unit than that of the classifier with 10k training epochs, but the result does [...]
The second experiment was done between around 10:00 am and 2:00 pm. Because of the Foreground Detection and Background Subtraction technique, the time at which the experiment was conducted was not much of a concern, as long as the sun was still high. The acquisition device was installed approximately 2.35 meters above the ground, and system accuracy was tested at horizontal distances of 2 meters, 4.5 meters, 5.5 meters, 7 meters, 8.5 meters, and 10.5 meters between the intruder and the camera. Recall that the sample data (intruder) used to train both networks (the Human Pose Intruder Classifier and the Intruder Localization Classifier) were taken at the following horizontal distances from the camera: 2 to 3 meters, 4 meters, 6 meters, 7 meters, 9 meters, 10 meters, and 12 meters for the day-time data samples. The distances used in the second experiment were deliberately offset from these to test whether the system could still classify the object at untrained distances.
The experiment used 10 intruder samples per designated distance, as shown below. For a single intruder, the system was tested on its accuracy in detecting the non-intruder level, the potential intruder level, the Level-1 intruder level, and the Level-2 intruder level. The accuracy of the system was somewhat low for intruders at horizontal distances of 2 meters and 4.5 meters, closer to the acquisition device. This was because the acquisition device could not obtain a clear view or a good image representation of the potential intruder level, due to the physical setup of the acquisition device, as shown in Figure 6. The system had a clear view of the intruder only when the intruder started to climb the fence, by which point the situation could already be identified as Level-1 intrusion.
The red line was the target system performance, the blue curve was the approximation of the actual system performance, and the green curve was the average system performance.
The results of experiment 2 show that the system was more accurate when the sample intruder was far from the acquisition device (the farther the sample, the more accurate the system), provided that the horizontal distance of the sample intruder did not exceed the set distance limit of 12 meters. The average detection rate for intruder Level-1 was low compared with the potential intruder and intruder Level-2 detection rates; this was due to the system turnaround time. The problem usually occurred when an intruder climbed a little faster. During the second experiment, the system was also tested using non-intruder samples. These subjects simply walked along the fence and sometimes extended a hand to hold the side of the fence. The experiment was done using ten (10) samples passing back and forth from 1 meter up to 12 meters from the camera. The detector system did not positively identify any of these samples as a fence intruder, probably because of the system filter that screened the location of non-intruder samples. The system was also tested using multiple intruders positioned at random horizontal distances from the camera and climbing concurrently. The system still detected the intruders: if not all of them, then at least the first climber or the intruders who were clearly visible from the perspective of the acquisition device.
The third experiment was done between around 10:00 pm and 1:00 am without the moon's presence. With or without the moon during testing, the system remained operational. The Foreground Detection and Background Subtraction technique was well suited to slight changes in image background illumination.
The image background light, shown in Figure 4.21, was the light emitted by the acquisition device itself. As in the earlier experiment, the acquisition device was set up 2.35 meters from the ground. The intruder horizontal distances used in the third experiment were measured at designated distances from the acquisition device: 2.5 meters, 5 meters, 7 meters, 9 meters, 10 meters, and 11 meters. The designated distances were slightly different from those of the sample data used to train the network. As in the second experiment, the trained distances were purposely avoided to test the classification accuracy of the system, which was one of the reasons the SOM technique was developed.
The third experiment tested the system accuracy using 10 intruder samples per designated distance. Each intruder was identified at the potential level, at the Level-1 position, and at the Level-2 position with respect to the fence of interest. As in experiments one and two, the map size was 10x10, trained 10 thousand times for the Human Pose Intruder Classifier and 30 thousand times for the Intruder Localization Classifier.
The intruder samples presented at 6 different designated positions in both experiment 2 and experiment 3 were all detected by the system. Some levels of intrusion were not identified, or did not trigger the classifier, because of the processing time required to detect or classify the intruder.
The system classified an intruder and gave a visual display in approximately less than 3 seconds. For intruder classification alone, the system classified the acquired image in less than 1.5 seconds. However, because the system used wireless data transmission from the acquisition unit to the image classification unit and on to the monitoring unit, the time to complete a single image classification reached approximately 3 seconds in both experiments 2 and 3. System accuracy in both experiments was affected when the intruder or intruders climbed in less than 1 second, since classifying a single image took up to 1.5 seconds. When the intruder climbed the fence in less than 1 second, the system could still classify the intruder in the Level-1 intrusion category, provided that it had not detected the intruder at the potential level position. If the intruder had been detected at the potential level position, there was a strong possibility that the intruder could no longer be detected at Level-1, but could still be detected at the Level-2 position.

IV. CONCLUSIONS
In spite of all the factors that affected the system's accuracy, such as sudden changes in illumination, movement of undesirable objects, noise, and the speed of the intruders, the developed detector system successfully detected the presented intruder samples located within 12 meters horizontal distance of the acquisition device. With the classifiers developed using the Self-Organizing Map, the detector system was able to identify the pose of the extracted object as well as the current position of the object with respect to the fence of interest. The developed intruder detector system was also able to provide a simple yet accurate visual presentation of the detected intruders and triggered the system alarm when the image tested positive for intruder(s).

Figure 3: Day-time human sample

Figure 8 illustrates the logical representation of object detail reduction, from the originally acquired image to the processed or converted image. Three classifiers were used in the project: the Human Pose Intruder Classifier, the Day-time Intruder Localization Classifier, and the Night-time Intruder Localization Classifier.

Figure 6: Feature vector extraction

Although a single classifier was possible, the time required to train the network was very expensive. The training period to develop a single classifier merging the object details and object localization reached approximately one month using 10 thousand epochs of network training, which was exhausting and time consuming. A further downside of a single classifier was that any update to the intruder samples required re-training the network, which would take approximately another month. In contrast, using two classifiers (HPIC and ILC) effectively reduced the network training period to approximately 2.5 hours: approximately 2 hours for HPIC training using 10 thousand training epochs, and approximately 0.5 hours for ILC training using 30 thousand training epochs.

Figure 12: ID as not human

TABLE 1: HUMAN POSE INTRUDER CLASSIFIER