Performance evaluation of neural network assisted motion detection schemes implemented within indoor optical camera based communications.

This paper investigates the performance of neural network (NN) assisted motion detection (MD) over an indoor optical camera communication (OCC) link. The study evaluates various NN training algorithms, which provide efficient and reliable MD functionality alongside vision, illumination, data communications and sensing in indoor OCC. To evaluate the proposed scheme, we carried out an experimental investigation of a static indoor downlink OCC link employing a mobile phone front camera as the receiver and an 8 × 8 red, green and blue light-emitting diode array as the transmitter. In addition to data transmission, MD is achieved by using the camera to observe the user's finger movement, in the form of centroids, via the OCC link. The captured motion is applied to the NN and evaluated for a number of MD schemes. The results show that the resilient backpropagation based NN offers the fastest convergence, with a minimum error of 10^−5 within a processing time window of 0.67 s and a success probability of 100 % for MD, compared with the other algorithms. We demonstrate that the proposed system with motion offers a bit error rate below the forward error correction limit of 3.8 × 10^−3 over a transmission distance of 1.17 m.


Introduction
Optical wireless communications (OWC) technology, covering the ultraviolet, infrared and visible bands, is complementary to the dominant radio frequency (RF) based wireless systems. It could be used to address the bandwidth bottleneck and is a possible option for the Internet of things (IoT) [1,2]. The visible spectrum band (i.e., 370-780 nm), used in visible light communications (VLC), is being considered as a possible option in 5th generation (5G) wireless networks for indoor environments. VLC utilizing light-emitting diode (LED) based lighting fixtures offers four independent functionalities: data communications, illumination, localization and sensing in indoor environments [3,4]. In addition, VLC can offer massive MIMO (multiple-input multiple-output) capabilities using LED and photodetector (PD) arrays for IoT applications in both indoor and outdoor environments [5]. This feature of VLC is unique compared to massive MIMO in RF-based systems, which is too complex to implement.
The widespread use of smartphones (six billion of them) with high-spec cameras is opening up new possibilities for VLC in applications where the need for high-data rate [6,7]. Such applications include indoor localization, sensing, intelligent transportation systems, shopping areas, etc. Camera-based VLC, also termed optical camera communications (OCC), has been studied within the framework of OWC and considered as part of the IEEE 802.15.7r1 standard [8,9]. OCC utilizes the built-in complementary metal-oxide-semiconductor camera in smart devices as the receiver (Rx) for capturing two-dimensional data in the form of image sequences, thus enabling multidimensional data transmission. OCC, with its multiple functionalities of vision, data communications, localization and motion detection (MD) [8-10], can be used in all-optical IoT (OIoT) [5] based network applications including device-to-device communications, mobile atto-cells, vehicle-to-everything (V2X), smart environments (home, office, surveillance), etc. [11]. In smart environments (i.e., homes and offices), OCC-based MD can be utilized to effectively control smart devices [10,12,13]. This is convenient and cost-effective because users carry smartphones with built-in cameras, which can be used as an Rx for both OCC and MD, in contrast to other user interface methods such as gesture control (using a single webcam) and an infrared 3D camera for PC [14,15]. A number of MD-based schemes have been proposed: (i) Li-Tech, offering shape detection and 3-D monitoring using visible light sensors [16]; (ii) LiSense, offering data communication and fine-grained, real-time human skeleton reconstruction using visible light, which utilizes the shadowing effect and 324 PDs [17]; and (iii) a number of gesture recognition schemes [14,15].
In OCC, image processing is critical for retrieving the transmitted data from the captured image frames. In recent years, intelligent machine-learning techniques (i.e., neural networks (NN)) have been adopted in image recognition for identifying the shape of objects in an image, transcribing speech into text, matching classified items and predicting the relevant results from network training [18]. In NN-based feature recognition schemes, multiple hidden layers with artificial neurons are used to train the network. These artificial neurons are the main constituent of the network and receive multiple input samples in order to train the NN.
In [12], the authors first reported initial results of MD performance based on images and centroid data samples (both used as NN inputs representing the motion), using the variable learning rate backpropagation algorithm for training. The results in [12] demonstrate that the NN trained with centroid data samples performs only 5000 iterations in a time window of up to 4 s, while the conventional NN trained using images can perform up to 8138 iterations in a time window of up to 9 s. Even though [12] provides a promising approach for MD, such long time windows could not be applied in real-time cases. Since these time windows were obtained using a basic backpropagation algorithm, a more detailed analysis of the proposed scheme is needed, based on the centroid data samples and using different transfer function-based algorithms for the NN, in order to reduce the time window and the number of iterations. In this paper, the focus is on the experimental investigation of NN-based MD for OCC using a number of transfer function-based training algorithms. In doing so, we consider a wide range of training parameters including the processing time (PT), the number of iterations carried out by the NN for MD, the percentage of success for MD and the mean squared error (MSE). Unlike conventional NN schemes [18], the proposed NN-based MD is trained with centroid data samples and different transfer function algorithms, thus providing more accurate detection. In this work, experimental investigations are conducted for an indoor static downlink OCC link with a smartphone front camera used as the Rx. The NN training is performed using eight different transfer function-based training algorithms for MD over a transmission distance L of up to 2 m. The OCC link quality, in terms of the bit error rate (BER) and peak signal-to-noise ratio (PSNR) with respect to L, is also analyzed.
The proposed NN-based MD can be used for control of data communications in OIoT networks.
The rest of the paper is structured as follows: Section 2 provides details of the proposed NN-based MD in OCC. Experimental results are discussed in Section 3. Conclusions are drawn in Section 4.
Figure 1(a) illustrates the system overview of the proposed OCC-based NN-assisted MD in an indoor environment. The output of the data packet generator, in a 12.8 kbits non-return-to-zero (NRZ) on-off keying (OOK) format, is first mapped according to the addresses of an 8 × 8 red, green and blue (RGB) NeoPixel LED array using an Arduino Uno board (an open-source microcontroller board based on the ATmega328 [19]). The intensity modulated (IM) light signal is transmitted over the free space channel. On the Rx side, an Android smartphone's front camera with a frame rate of 30 frames per second (fps) and a resolution of 1920 × 1080 pixels is used to capture images (i.e., a video stream) of the IM LED array. In this work, the mobile phone is assumed to be located in a static position directly beneath the LED transmitter (Tx) at a distance of 20 to 200 cm. Note that motion is produced by the user's finger moving over the camera. Both the RGB LEDs and the finger movement are simultaneously captured by the camera in the form of a video stream, which is then divided into frames for image post-processing using MATLAB. Typically, the recorded video length depends on the motion duration ∆t, with mean and maximum values of ∼ 2.5 s and 4.5 s, respectively. For a camera with a frame rate of 30 fps, 75 and 135 frames are captured for ∆t of 2.5 s and 4.5 s, respectively. As shown in Fig. 1(a), the user's finger hovering a few centimeters above the camera's screen results in shadowing and reflected light rays. Note that the illuminated finger is readily traceable by the camera using a tracking function, and its motion is expressed as centroids, which represent the center of the moving finger in the form of consecutive coordinate points [20], see Fig. 2. In Fig. 2, each coordinate point represents the center of the moving finger tracked in a particular time frame. The key principle of MD is to compare the changes between the frames (a series of images) following video processing. The frame resolution is measured in pixels, and the inter-frame time is 33.3 ms (1 s/30 fps). The motion between two consecutive frames can be simply determined as the difference between the centroid coordinates (x_2 − x_1, y_2 − y_1) in the (N + 1)th and Nth frames. The coordinate position of the motion centroid (MC), which is obtained from the user's finger movement, is applied to a pre-trained NN system for detection and identification of the user's motions. For demonstration purposes, we consider five motion patterns, which are created from two simple natural motions in straight, circular and curved lines. These motions can be used to control smart devices, e.g., straight and circular motions can be used for turning a device ON and OFF.
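The frame count and the centroid difference described above can be illustrated with a minimal Python sketch. The centroid track below is hypothetical pixel data chosen for illustration, and the function names are not taken from the paper; only the 30 fps frame rate and the (x_2 − x_1, y_2 − y_1) difference come from the text.

```python
import numpy as np

FPS = 30  # camera frame rate stated in the text


def frames_for_duration(duration_s, fps=FPS):
    """Number of frames captured during a motion lasting duration_s seconds."""
    return int(round(duration_s * fps))


def centroid_displacements(centroids):
    """Motion vectors (x2 - x1, y2 - y1) between consecutive frame centroids."""
    pts = np.asarray(centroids, dtype=float)
    return np.diff(pts, axis=0)


# Hypothetical centroid track (pixel coordinates), roughly a straight line.
track = [(100, 200), (110, 200), (120, 201), (130, 201)]
steps = centroid_displacements(track)

# Displacement per frame scaled by the frame rate gives pixels per second.
speeds = np.linalg.norm(steps, axis=1) * FPS

print(frames_for_duration(2.5), frames_for_duration(4.5))
print(steps)
```

The first print reproduces the 75 and 135 frames quoted for ∆t of 2.5 s and 4.5 s.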

Data compensation scheme
We have adopted a transmit data compensation scheme based on anchor LEDs (four per frame) and a synchronization LED for time synchronization, which is located in the first frame as in [12], in order to overcome blocking or shadowing due to mobility, as depicted in Fig. 3(a). The data compensation scheme is based on discarding frames damaged by the blocking of the anchor LEDs and requesting re-transmission. Note that obstacles may fully or partially block one or more anchor LEDs, thus resulting in damaged frames, see Fig. 3(b), which will lead to increased BERs. The use of anchor LEDs (i.e., four bits per frame in this case) results in reduced data throughput per frame, hence the trade-off between the BER and the data throughput. For the proposed scheme with the transmit data compensation scheme, the data rate can be given as:
$$R_d = N_L L_{FR} - N_A$$
where N_L and N_A denote the number of data transmission LEDs and the number of anchor plus synchronization LEDs (counted over one second, i.e., 4 × 20 + 1 = 81 here), respectively, and L_FR is the flickering rate of the LEDs (20 pulses per second in this work). For the proposed system, the maximum achievable R_d is 1.199 kbps (i.e., 64 × 20 − 81 bits per second). Note that, if the number of anchor LEDs is reduced, the data throughput will slightly increase. For example, for 3, 2 and 1 anchor LEDs, the data rates are 1.219 kbps, 1.239 kbps and 1.259 kbps, respectively. For the proposed OCC-based scheme, we have adopted an efficient detection scheme, the differential detection threshold (DDT) [10,21]. In the DDT scheme, the threshold level is defined in terms of the quantized intensity level within the range [0-255]. Figure 4(a) represents the identified data area within the frame, while Fig. 4(b) provides the quantized intensity of the detected data. Based on DDT, the initial threshold was set to a quantized intensity level of 181, as in [10,21]. Note that the threshold level can be adaptively set based on the intensity levels in the image frame.
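The quoted data rates can be checked with a short Python sketch. It assumes, as the worked figure 64 × 20 − 81 suggests, that each anchor LED costs one bit in every frame and that the synchronization LED costs a single bit in the first frame only; the function and parameter names are illustrative.

```python
def data_rate_bps(n_leds=64, flicker_rate=20, n_anchors=4, n_sync=1):
    """Payload bits per second for an LED array with anchor/sync overhead.

    Assumption: every anchor LED is reserved in each of the flicker_rate
    frames per second, while the synchronization LED is reserved once.
    """
    overhead = n_anchors * flicker_rate + n_sync
    return n_leds * flicker_rate - overhead


# Data rate for 4, 3, 2 and 1 anchor LEDs, in kbps.
rates_kbps = {a: data_rate_bps(n_anchors=a) / 1000 for a in (4, 3, 2, 1)}
for anchors, rate in rates_kbps.items():
    print(f"{anchors} anchor LEDs: {rate:.3f} kbps")
```

Under these assumptions the sketch reproduces the 1.199, 1.219, 1.239 and 1.259 kbps values given in the text, with each removed anchor LED recovering 20 bits per second.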

NN-based MD for OCC
The proposed scheme can be trained using the transfer function algorithms in order to improve the MD performance by identifying only the predefined motions. Figure 5 illustrates the NN structure for MD performance evaluation within the context of OCC. The input nodes are the coordinate positions of 100 centroid data samples, i.e., 20 centroid data samples per predefined motion (variants of linear, circular and curved movements), for an OCC link span ranging from 20 cm to 200 cm. There are two hidden layers of 100 and 5 neurons, respectively. The hidden layers are used to detect and identify the user's motion, and their output is expressed in the form of five-bit training labels representing the five predefined motions.
Note that, for NN training, we have used eight transfer function-based algorithms, as listed in Table 1. When selecting the most suitable training algorithm, a number of factors need to be considered, including the number of neurons N_n in the hidden layers, the PT, the error measurement and the type of network used for pattern recognition [22]. In this work, we train the NN with the MC and use pattern recognition to classify the input signals or patterns in order to evaluate the link performance.
Table 1. Training algorithms and acronyms (recoverable entries: one step secant, OSS; gradient descent, GDX).
The key system and NN training parameters are given in Table 2. The training goal, number of iterations and training time were set to the same values for all training algorithms in order to evaluate their performance under the same training environment.
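As an illustration of how one of the evaluated algorithms, resilient backpropagation (RP), updates the weights, the following Python sketch implements a generic sign-based Rprop update (the variant without weight backtracking). It is not the authors' training code: the increase and decrease factors (1.2 and 0.5), the step bounds and the quadratic test function are common textbook choices assumed here for demonstration.

```python
import numpy as np


def rprop_minimize(grad_fn, w0, n_iter=200, step0=0.1,
                   eta_plus=1.2, eta_minus=0.5,
                   step_min=1e-6, step_max=50.0):
    """Minimize a function with Rprop, using only the sign of its gradient."""
    w = np.asarray(w0, dtype=float).copy()
    step = np.full_like(w, step0)      # one adaptive step size per weight
    prev_grad = np.zeros_like(w)
    for _ in range(n_iter):
        grad = grad_fn(w)
        sign_change = grad * prev_grad
        # Gradient kept its sign: the step was safe, so enlarge it.
        step = np.where(sign_change > 0,
                        np.minimum(step * eta_plus, step_max), step)
        # Gradient flipped sign: a minimum was overshot, so shrink the step.
        step = np.where(sign_change < 0,
                        np.maximum(step * eta_minus, step_min), step)
        # After a sign flip, skip the update for that weight in this iteration.
        grad = np.where(sign_change < 0, 0.0, grad)
        w -= np.sign(grad) * step
        prev_grad = grad
    return w


# Toy problem (assumed): minimize sum((w - target)^2), gradient 2 (w - target).
target = np.array([3.0, -2.0])
w_opt = rprop_minimize(lambda w: 2.0 * (w - target), np.zeros(2))
print(w_opt)
```

Because the update depends only on the gradient sign, the step size is insensitive to the gradient magnitude, which is one commonly cited reason Rprop behaves well when neuron outputs saturate.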

Results
Figures 6(a)-6(c) show the experimental results of the detected MC representing variants of linear and circular motions as well as curvatures. The solid grey line represents the actual motions considered, while the dots represent the detected MC tracked from the user's finger movement over the smartphone's front camera when receiving data from the Tx. The coordinate points of these MC are then used to determine the probability of success for MD. Note that, due to tracking, some centroids deviate from the actual motion path (highlighted in small blue circles), while parts of other light sources in the surroundings (highlighted in small red boxes) are also captured. However, the NN training output is not affected by this small number of deviated centroids and other light sources. To evaluate the system performance, we have used two criteria, the MSE and the PT, for all transfer function algorithms, obtained by averaging over 1000 training iterations for OCC link spans ranging from 20 to 200 cm, as depicted in Fig. 7(a). As given in Table 2, the training time limit was set to infinite in order to properly examine all the training algorithms, considering that some will take longer to converge to the predicted accurate output. Note that, in a real-world application using NN with infinite networks, a time complexity approach based on the Markov Chain Monte Carlo method, which is compatible with large networks, can be considered [23]. As shown in Fig. 7(a), the RP algorithm converges faster than the others, reaching the minimum MSE and PT of 5.1 × 10^−5 and 0.67 s, respectively. The conjugate gradient algorithms (SCG, CGB, CGF and CGP) also perform well and can be used in networks with a large number of neuron weights due to their modest memory requirements [21]. Note that the LM algorithm offers the worst performance in terms of both the PT and MSE.
This is because LM is designed for least squares problems, which are approximately linear, in contrast to pattern recognition problems where the output neurons are generally saturated [21]. Both the GDX and OSS algorithms converge rapidly if the training is stopped early, but at the cost of inconsistent results [21]. Figure 7(b) illustrates the percentage of success for MD, obtained over a total of 100 experiments, with respect to L for all algorithms listed in Table 1. The percentage of success for MD was determined by comparing the exact input with the five-bit output of the NN, which represents the five predefined motions. It can be seen that RP displays the best performance, with MD accuracies of 100 and 96.5 % over link spans of 1.6 and 2 m (i.e., the maximum range in this work), respectively. The reduction in accuracy with increasing L is due to the fact that the illumination level of the finger becomes lower as it moves away from the Tx. However, this does not have a significant impact on NN training and, therefore, these reduced accuracy levels are still acceptable. With RP displaying the best performance, we have further investigated the complexity of its NN in terms of MSE and PT. Note that, in general, N_n in the hidden layers can be larger or smaller than the number of input nodes (i.e., data samples). A large N_n results in a complex NN, whereas a small N_n results in a higher number of training iterations and a longer PT. This trade-off between N_n and NN training complexity is illustrated in Table 3. Since the proposed scheme offers simultaneous indoor data transmission via OCC and MD, we next evaluated the link's BER and PSNR performance. Since in OCC the data is captured in the form of a two-dimensional image, a conventional SNR measurement cannot fully reflect the quality of the link.
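Therefore, we have adopted PSNR, which is widely used as a quality metric in image processing systems, as given by [24]:
$$\mathrm{PSNR} = 10\log_{10}\left(\frac{I_{\mathrm{peak}}^{2}}{\mathrm{E}\left[(I_{Tx}-I_{Rx})^{2}\right]}\right)$$
where I_peak^2 denotes the squared peak intensity of the measured frame, I_Tx and I_Rx are the intensities of the transmitted and received frames, and the denominator is the mean squared difference between them.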
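A minimal Python sketch of a PSNR computation along these lines is given below. It assumes the standard image-processing definition (squared peak intensity divided by the mean squared difference between the two frames, in dB) and, by default, takes the peak from the received frame; the 2 × 2 frames are hypothetical 8-bit intensities chosen for illustration.

```python
import numpy as np


def psnr_db(tx_frame, rx_frame, peak=None):
    """PSNR in dB between a transmitted and a received intensity frame."""
    tx = np.asarray(tx_frame, dtype=float)
    rx = np.asarray(rx_frame, dtype=float)
    if peak is None:
        peak = rx.max()                # peak intensity of the measured frame
    mse = np.mean((tx - rx) ** 2)      # mean squared frame difference
    if mse == 0:
        return float("inf")            # identical frames
    return 10.0 * np.log10(peak ** 2 / mse)


# Hypothetical frames: one OFF pixel is received with a residual level of 51.
tx = [[255, 0], [0, 0]]
rx = [[255, 51], [0, 0]]
print(psnr_db(tx, rx))
```

In this example the mean squared difference is 51²/4 = 650.25 and the peak is 255, so the ratio is 100 and the PSNR is 20 dB.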
Note that the user's motion will result in partial shadowing, which will ultimately affect the BER performance. Figure 8 shows the link's BER and PSNR performance against L for 12.8 kbits of data and a four-bit header at an R_d of 1.199 kbps, where error-free data transmission is achieved at L up to 80 cm. Note that the BER reaches the forward error correction (FEC) limit of 3.8 × 10^−3 at L of 1.17 m, which is achieved because of the data compensation scheme. The transmission span of 1.17 m is a typical range in environments such as hospital wards. Figure 8 depicts the BER performance as a function of PSNR for the proposed link. At a BER of 10^−5, well below the FEC limit of 3.8 × 10^−3, the PSNR is ∼ 20 dB. Finally, we compared the performance of the proposed NN-assisted MD in OCC systems with MoC [10], TNMD [12], VLC-based MD [13] and LiSense [17], as shown in Table 4. In MoC, TNMD and the proposed NN-assisted MD in OCC systems, an Android smartphone front camera is used as the Rx, whereas VLC-based MD and LiSense use a PD-based Rx. Note that NN-based schemes offer improved performance compared to VLC-based systems. The highest percentage of success for MD of 96 % at L up to 200 cm is observed for the proposed scheme with the RP algorithm (with a measured PSNR of 16.18 dB). The same percentage of success for MD is achieved for MoC, a complex but reliable quadrant division based MD algorithm, but at L of 12 cm. A higher percentage of success for MD of 97 % is observed for TNMD with the basic NN algorithm at a maximum L of 125 cm. The improvement offered by the proposed NN-assisted MD in the OCC link, which uses a mobile phone camera as the Rx for MD and data transmission, is due to the use of the RP algorithm within the NN.

Conclusion
The performance of an NN-assisted MD OCC link was experimentally evaluated for eight different transfer function based training algorithms, with training parameters of PT, the number of iterations, the percentage of success for MD and the MSE. We showed that the best performance was achieved using the RP algorithm, with the fastest convergence at a minimum error (MSE) and a PT of 10^−5 and 0.67 s, respectively, as well as a percentage of success for MD of 100 % for OCC links of up to 1.6 m. For higher L, the OCC link will experience shadowing due to the fingers' movement, hence the need for a diversity-based Rx. We also demonstrated that, using the transmit data compensation scheme, high-quality data transmission at the FEC limit of 3.8 × 10^−3 was achieved over a 1.17 m OCC link. The reliability and efficiency of the proposed scheme were assessed by comparing it with other existing techniques. The NN for MD analysis can be further extended to increase the link span based on pattern recognition algorithms and using different transmitter configurations for mobility and multiuser indoor smart home environments. In addition, the data rate can be enhanced using a high capture speed camera with a rolling shutter as the Rx and a larger LED array as the Tx in a MIMO OCC link.